
SherlockNet

Using Convolutional Neural Networks to Explore Over 400 Years of Book Illustrations


Starting in February 2016, as part of the British Library Labs Competition, we embarked on a collaboration with the British Library Labs and the British Museum to tag and caption the entire British Library 1M Collection, a set of 1 million book illustrations scanned from books published between 1500 and 1900. We proposed to use convolutional neural networks (CNNs) to perform this automatic tagging and captioning. In addition, we proposed deeper analysis of temporal trends in these images using the explanatory power provided by neural networks. Below we list our deliverables, along with short explanations of the IPython notebooks we wrote for the project.

Our tags and captions can be found at our web portal here. We have also uploaded all our tags to Flickr here.

Disclaimer: This code is research quality only and should be treated as such.

Writeups and Slides

  1. Poster for Stanford CS231N class (March '16)
  2. Writeup for Stanford CS231N class (March '16)
  3. Supplemental Figures (March '16)
  4. Proposal for British Library funding (April '16)
  5. Announcement of Finalist status (May '16)
  6. Progress Notes #1 (June '16)
  7. Progress Presentation (Sep '16)
  8. Progress Notes #2 (Sep '16)
  9. Final Presentation Slides (Nov '16)
  10. Final Reflections (Dec '16)

Key pieces of code

1. Data Preprocessing

  • preprocess.py: Makes all images 256x256 and grayscale. It scales each image so that its smaller dimension is 256 pixels, then crops the image to a square (see the sketch after this list).
  • augment_data.py and image_util.py: These augment our training set with rotations, crops, and other transformations to increase robustness.
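A minimal sketch of the resize-and-crop step described above, using Pillow; the center-crop choice and file names are illustrative assumptions rather than necessarily what preprocess.py does:

```python
from PIL import Image

def preprocess(path, size=256):
    """Grayscale the image, scale its smaller side to `size`, then crop to a square."""
    img = Image.open(path).convert('L')                      # 'L' = 8-bit grayscale
    w, h = img.size
    scale = size / min(w, h)                                 # smaller dimension -> 256 px
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2             # center crop (assumed)
    return img.crop((left, top, left + size, top + size))

# Example: preprocess('illustration.jpg').save('illustration_256.png')
```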

2. Training and Tagging

  • retrain.py: TensorFlow retraining script, modified for our needs; we performed two training steps, first on a manually classified 1.5K-image training set, then on a machine-classified and manually validated 10K training set.
  • tag_analysis_on_manual_tags.ipynb: We analyzed how well our 1.5K training set was classified by our model.
  • tag_analysis_on_10K_tags.ipynb: We analyzed how well this 1.5K model performed on a larger 10K test set.
  • tag_1M.py: Tagging all 1M images using the new 10K model (see the inference sketch after this list).
  • tags_net_analysis.ipynb: Plots of the loss function as our model trained.
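As a rough illustration of the tagging step, below is a minimal sketch of running inference with a graph produced by a TensorFlow retraining script, assuming TensorFlow 1.x; the file names and tensor names ('DecodeJpeg/contents:0', 'final_result:0') follow the defaults of the stock Inception retraining tutorial and may not match what retrain.py and tag_1M.py actually use.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Load the frozen graph written out by the retraining script (illustrative path).
with tf.gfile.GFile('retrained_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# One tag name per line, in the order the retrained softmax emits them.
labels = [line.strip() for line in open('retrained_labels.txt')]

with tf.Session(graph=graph) as sess:
    image_bytes = tf.gfile.GFile('illustration.jpg', 'rb').read()
    probs = sess.run('final_result:0',
                     feed_dict={'DecodeJpeg/contents:0': image_bytes})[0]
    for i in np.argsort(probs)[::-1][:3]:        # top-3 tags
        print(labels[i], probs[i])
```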

3. Analysis of Tags and Trends

  • 1M Tag Analysis.ipynb: Gathering statistics for our dataset -- how many images carry each tag, and how tags trend over time.
  • look_at_dual_tags.ipynb: How many images have two almost equally likely tags, and what does that say about the images?
  • analyze_maps_by_decade.ipynb: We trained a new model to categorize maps into 4 eras, then analyzed which neurons in this model became more or less active over time. To understand what each neuron represented, we found images that either highly activated the neuron or did not activate it at all (see the sketch after this list).
  • analyze_decorations_by_decade.ipynb: Similar analysis but with decorations.
  • retrain_decades.py: Script used by the above two analyses to retrain the model to categorize images into eras.
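A rough sketch of that per-neuron trend analysis in NumPy, assuming a matrix of penultimate-layer activations and a decade label per image have already been computed; the file names, the simple first-vs-last trend measure, and the top-9 image count are illustrative assumptions.

```python
import numpy as np

# Assumed precomputed: penultimate-layer activations and a decade label per image.
acts = np.load('map_activations.npy')       # shape (n_images, n_neurons), illustrative
decades = np.load('map_decades.npy')        # shape (n_images,), e.g. 1700, 1710, ...

# Mean activation of each neuron in each era -> which neurons rise or fall over time.
eras = np.unique(decades)
per_era = np.stack([acts[decades == e].mean(axis=0) for e in eras])   # (n_eras, n_neurons)
trend = per_era[-1] - per_era[0]             # crude first-vs-last trend measure
rising = np.argsort(trend)[::-1][:5]         # neurons becoming more active over time
falling = np.argsort(trend)[:5]              # neurons becoming less active over time

# To interpret a neuron, look at the images that activate it most and least.
neuron = rising[0]
most_active = np.argsort(acts[:, neuron])[::-1][:9]   # indices of top-activating images
least_active = np.argsort(acts[:, neuron])[:9]
print('neuron', neuron, 'top image indices:', most_active)
```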

4. Obtaining text around each image, and generating tags using a nearest neighbor voting process

  • get_json_text.py: Save the text from the 3 pages around each image.
  • extract_noun_phrases.py: For each image, extract all noun phrases from its surrounding OCR text.
  • cluster_images_by_ocr.ipynb: Exploring ways to cluster images into topics using tf-idf, PCA, and LDA.
  • nearest_neighbor_ocr_tagging_pca.ipynb: Perform PCA within each category using 10K random images chosen from that category. Transform each image's 2048-dimensional CNN representation into a 40-dimensional vector, and find the 20 most similar images for each image in this 40D space.
  • get_final_tags.py: For each image, have its 20 most similar images vote, keeping the 20 words that appear most often across them. Before voting, the script also spell-checks and stems the words (see the sketch after this list).
  • filter_final_tags.ipynb: For each image, check that its tags are correctly spelled and remove stopwords.
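The PCA / nearest-neighbor / voting pipeline above could be sketched roughly as follows with scikit-learn and NLTK; the variable names, the loading code, and the omission of the spell-check step are placeholder assumptions rather than the real scripts' behavior.

```python
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from nltk.stem import PorterStemmer

# Illustrative inputs: 2048-d CNN features and per-image OCR noun-phrase words,
# both assumed to have been produced by the earlier scripts.
features = np.load('category_cnn_features.npy')             # shape (n_images, 2048)
ocr_words = [line.split() for line in open('category_noun_phrases.txt')]

# 1. Reduce the CNN features to 40 dimensions within this category.
reduced = PCA(n_components=40).fit_transform(features)

# 2. Find the 20 most similar images for every image in the 40-d space.
nn = NearestNeighbors(n_neighbors=21).fit(reduced)           # 21 = the image itself + 20 neighbors
_, neighbors = nn.kneighbors(reduced)

# 3. Neighbors vote: keep the 20 stemmed words occurring most often in their OCR text.
stem = PorterStemmer().stem
def tags_for(i):
    votes = Counter(stem(w) for j in neighbors[i][1:] for w in ocr_words[j])
    return [word for word, _ in votes.most_common(20)]
```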

5. Training and generating captions

Note: This part leverages the open-source package neuraltalk2 for training and evaluating image captions, with slight modifications. Many thanks to its author, Andrej Karpathy.

  • prepro.py: Converts a JSON file of captions and images into a form that can easily be fed into Torch (see the sketch after this list).
  • train.lua: Training script. For the list of hyperparameters, see the top of the file.
  • eval.lua: Evaluation script that generates captions in the /vis folder. See eval_10k.sh for an example of usage.
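As a rough sketch of the kind of input prepro.py consumes, here is how one might assemble the raw caption JSON, assuming the list-of-{'file_path', 'captions'} record format described in the neuraltalk2 documentation; the image paths and caption text below are made up.

```python
import json

# Illustrative records in the list-of-dicts format prepro.py reads:
# one entry per image, each with a file path and a list of reference captions.
records = [
    {'file_path': 'imgs/000001.jpg',
     'captions': ['an engraved map of a coastline with a decorative border']},
    {'file_path': 'imgs/000002.jpg',
     'captions': ['a woodcut portrait of a man wearing a ruff collar']},
]

with open('raw_captions.json', 'w') as f:
    json.dump(records, f)
```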

For more usage details, please also consult the documentation of neuraltalk2.

6. Preprocessing, training, tagging & captioning experiments for the British Museum Prints and Drawings (BM) dataset

  • All of the code for preprocessing, training, tagging, and captioning on the British Museum Prints and Drawings dataset is contained in the /captionings/bm_prints_drawings folder. The notebooks within are named in order; please consult the in-line headers for more details on each notebook.

7. Uploading tags and captions to Flickr

Data

We will publish our data, both in its raw form and its processed form, at a separate portal. Details coming soon!
