Skip to content
master
Go to file
Code
This branch is 1 commit ahead, 2 commits behind internetarchive:master.

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 

README.md

collections-cleaners

This is a collection of scripts in use by the collections team to clean up the collections at the Internet Archive. They require administration privileges and the use of the InternetArchive Python interface script.

These scripts are being published on github to allow commentary and learning materials for the internal maintenance of the Archive's stores of materials.

  • frontpage.sh is a script that goes through all the items in a collection (with a texts mediatype) and ensures the first page of every item's document is the thumbnail, and that the online reader starts on the cover.
  • semi.sh is run against an item and if the "Subject" field is a set of subjects separated by semicolons instead of a set of different Subject settings with each entry, it will split them up and add them in properly.

About

Resources

Releases

No releases published

Languages

You can’t perform that action at this time.