Archive-It Utilities is a Python library for extracting information from Archive-It collections. Most work is currently done through a single class,
ArchiveItCollection, which uses screen-scraping to acquire general collection metadata, seed lists, and seed metadata.
This package is called aiu on PyPI. Installation is handled via:

    pip install aiu
The heart of Archive-It Utilities is a class named
ArchiveItCollection that has many methods for extracting information about an Archive-It collection using its collection identifier.
For example, to use IPython to get information about Archive-It collection number 5728, one can execute the following:
    In : from aiu import ArchiveItCollection
    In : aic = ArchiveItCollection(5728)
    In : aic.get_collection_name()
    Out: 'Social Media'
    In : aic.get_collectedby()
    Out: 'Willamette University'
    In : aic.get_archived_since()
    Out: 'Apr, 2015'
    In : aic.is_private()
    Out: False
    In : seeds = aic.list_seed_uris()
    In : len(seeds)
    Out: 107
From this session we now know that the collection's name is Social Media, it was collected by Willamette University, it has been archived since April 2015, it is not private, and it has 107 seeds.
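The same calls can be wrapped in a small script for use outside an interactive session. The following is a minimal sketch, not part of the library: the helper names summarize_collection and summarize_by_id and the dictionary keys are assumptions made for illustration, and only the five ArchiveItCollection methods shown in the session are used.

```python
def summarize_collection(collection):
    """Gather the fields shown in the session into one dictionary.

    `collection` is expected to be an ArchiveItCollection, but any object
    offering the same five methods will work. The dictionary keys are
    illustrative names, not something defined by aiu.
    """
    seeds = collection.list_seed_uris()
    return {
        "name": collection.get_collection_name(),
        "collected_by": collection.get_collectedby(),
        "archived_since": collection.get_archived_since(),
        "private": collection.is_private(),
        "seed_count": len(seeds),
    }


def summarize_by_id(collection_id):
    """Build an ArchiveItCollection for the identifier and summarize it."""
    # Imported here so that summarize_collection can be used (and tested)
    # without aiu installed or network access available.
    from aiu import ArchiveItCollection

    return summarize_collection(ArchiveItCollection(collection_id))
```

Calling summarize_by_id(5728) requires aiu to be installed and Archive-It to be reachable. The result would be expected to mirror the values in the session, although the live collection may have changed since it was recorded.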
For now, examine the source in
aiu/archiveit_collection.py for a full list of methods to use with this class.
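As a complement to reading the source, the public methods of a class can also be listed with the standard library's inspect module. This is a generic sketch that works for any Python class; the helper name list_public_methods is an assumption for illustration and is not provided by aiu.

```python
import inspect


def list_public_methods(cls):
    """Return the sorted names of the public methods defined on `cls`.

    Names beginning with an underscore are treated as private and skipped.
    """
    members = inspect.getmembers(cls, predicate=inspect.isroutine)
    return [name for name, _ in members if not name.startswith("_")]
```

With aiu installed, passing ArchiveItCollection to list_public_methods should produce the method names available on the class, including those used in the session above.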