Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc.

book-aligner

This repository contains experimental scripts for aligning books between HathiTrust, Internet Archive, Google Books, etc.

By "alignment", I mean that for a given volume in one repository, I want to try to find any matching volumes in the other repositories.

Ultimately, I want to be able to mash in a HT/IA/GB/etc. URL or other identifier and get a list of potential matches elsewhere on the web.
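As a rough sketch of the first step of that lookup, the snippet below turns a pasted URL into a repository name and identifier. It is not part of the existing scripts: `parse_book_url` is a hypothetical helper, the URL shapes it recognizes (`?id=` on hathitrust.org and books.google.*, `/details/<identifier>` on archive.org) are assumptions about the common public URL forms, and the identifiers in the usage lines are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not in this repository): map a book URL to a
# [repository, identifier] pair, or nil if the URL is not recognized.
# Only the most common URL shapes are handled; these are assumptions.
def parse_book_url(url)
  uri = URI.parse(url)
  query = URI.decode_www_form(uri.query.to_s).to_h
  case uri.host.to_s
  when /(\A|\.)hathitrust\.org\z/
    # e.g. https://babel.hathitrust.org/cgi/pt?id=<htid>
    query['id'] && [:hathitrust, query['id']]
  when /(\A|\.)archive\.org\z/
    # e.g. https://archive.org/details/<identifier>
    match = uri.path.to_s.match(%r{\A/details/([^/]+)})
    match && [:internet_archive, match[1]]
  when /(\A|\.)books\.google\.[a-z.]+\z/
    # e.g. https://books.google.com/books?id=<volume id>
    query['id'] && [:google_books, query['id']]
  end
rescue URI::InvalidURIError
  nil
end

# Placeholder identifiers, for illustration only.
p parse_book_url('https://archive.org/details/exampleitem00')
p parse_book_url('https://books.google.com/books?id=AbCdEfGhIjK')
```

The returned pair could then be used as a key into the match table to list candidate volumes in the other repositories.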

Requirements

  • make
  • curl
  • Ruby

Usage

The default make target should download and run everything.

WARNING: this currently produces about 4.3GB of output.

Algorithm

The book-aligner.rb script uses bulk metadata downloads from HathiTrust and the Internet Archive to find the complete set of identifiers that have any matching OCLC/LCCN/ISSN/ISBN identifier (~41M matches). These results are then filtered down to those that have a matching volume number or publication year.
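The two stages described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in book-aligner.rb: the `Record` struct, the `"oclc:1"`-style identifier strings, and the helper names are all hypothetical, and the real script works from bulk metadata files rather than in-memory arrays.

```ruby
require 'set'

# Hypothetical record shape, assumed for illustration only.
Record = Struct.new(:id, :identifiers, :volume, :year, keyword_init: true)

# Build a lookup from each identifier (OCLC/LCCN/ISSN/ISBN) to its records.
def index_by_identifier(records)
  index = Hash.new { |hash, key| hash[key] = [] }
  records.each do |record|
    record.identifiers.each { |identifier| index[identifier] << record }
  end
  index
end

# Stage 1: every HT/IA pair that shares at least one identifier.
def candidate_pairs(ht_records, ia_records)
  ia_index = index_by_identifier(ia_records)
  pairs = Set.new
  ht_records.each do |ht|
    ht.identifiers.each do |identifier|
      ia_index.fetch(identifier, []).each { |ia| pairs << [ht, ia] }
    end
  end
  pairs.to_a
end

# True only when both values are present and equal.
def same_value?(left, right)
  !left.nil? && left == right
end

# Stage 2: keep pairs with a matching volume number or publication year.
def filter_pairs(pairs)
  pairs.select do |ht, ia|
    same_value?(ht.volume, ia.volume) || same_value?(ht.year, ia.year)
  end
end

# Made-up sample data.
ht = [Record.new(id: 'ht-1', identifiers: ['oclc:1'], volume: '2', year: 1890),
      Record.new(id: 'ht-2', identifiers: ['oclc:1'], volume: '3', year: 1891)]
ia = [Record.new(id: 'ia-1', identifiers: ['oclc:1', 'lccn:a'], volume: '2', year: nil)]

p filter_pairs(candidate_pairs(ht, ia)).map { |h, i| [h.id, i.id] }
```

Indexing one side by identifier first avoids comparing every HT record against every IA record, which matters at the scale of tens of millions of candidate matches.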

[Figure: HT/IA/GB relationship diagram]

Because there's no freely-available bulk metadata download for Google Books, we'll have to rely on the 1.1M associations we get for free from Internet Archive metadata.

The second component of this project is a GitHub Pages HTML frontend that includes a small JavaScript library for querying book-aligner.rb matches loaded into Fusion Tables. The code for this is in js/book-aligner.coffee.

Examples

Some examples of what I want for "matching volumes":