Tools for analyzing data from TinEye's MatchEngine service.

MatchEngine Data Analysis

This project is a collection of shell and Cypher scripts for Neo4j that consume MatchEngine image similarity data and generate a queryable graph for further analysis.

At the moment, the code and scripts in this repository mostly exist to replicate the existing research and results from the analysis of the Frick Photoarchive's Anonymous Italian photo archive and the Zeri Foundation's Italian art photo archive. More information about this research, and the results, can be found here:

Importing Data into Neo4j

To start, make sure that you have a copy of Neo4j installed on your computer. Once it is installed, start it and make sure that it's running locally and is available on the default port.
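As a quick sanity check before importing, something like the following could confirm that a server is answering locally. This is only a sketch: `neo4j_is_up` is a hypothetical helper that is not part of this repository, and it assumes `curl` is installed and that Neo4j's HTTP interface is at `http://localhost:7474/` (the address used later in this README).

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# Returns 0 if an HTTP server answers at the given URL, non-zero otherwise.
# ASSUMPTION: Neo4j's HTTP interface listens at http://localhost:7474/.
neo4j_is_up() {
    url="${1:-http://localhost:7474/}"
    # --silent:   no progress output
    # --fail:     treat HTTP errors (status >= 400) as a failure
    # --max-time: give up after 5 seconds
    curl --silent --fail --max-time 5 --output /dev/null "$url"
}
```

It could then be used as `neo4j_is_up && ./import.sh`, so the import only starts when the server responds.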

Once you have done that you should be able to run the following command from your shell:

./import.sh

This will import all the existing data (found in the data/ directory) into your personal copy of Neo4j. Once the import has completed, you can open your browser and visit:

http://localhost:7474/

From there you'll be able to query the imported data using Neo4j's Cypher query language.

Generating Data

Currently tools and scripts are provided for generating data from sources at the Frick Photoarchive and the Zeri Foundation. You will need to generate your own data, likely using your own tools, if you wish to analyze your own archive of images.

That being said, this repository does contain all the data from the analysis done on the Frick's and Zeri's Italian art collections, and you can replicate those results by simply importing the data (as detailed above).

Artwork-Image Mapping

You'll need to have a list of image IDs with their corresponding artwork IDs. The exact format for this data is detailed here.

In the case of the Frick's and Zeri's collections, specific tools were needed to convert the data from their existing formats into the preferred format linked to above. Those utilities can be found in the utils/ directory.

The final data resides in data/artwork-image-map.csv.
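If you generate this file with your own tools, a small structural check can catch malformed rows before the import. The sketch below is not part of this repository, and the layout it checks is an assumption made for illustration: one `image_id,artwork_id` pair per line, with no quoting or embedded commas. The real format is the one referenced above, so adjust the check to match it.

```shell
#!/bin/sh
# Hypothetical check, not part of this repository.
# ASSUMPTION: every row is "image_id,artwork_id" (two non-empty fields,
# no quoting, no embedded commas). The real format may differ.
check_map_csv() {
    awk -F',' '
        NF != 2 || $1 == "" || $2 == "" {
            # Report each malformed row and remember that one was seen.
            printf "line %d: expected two non-empty fields\n", NR > "/dev/stderr"
            bad = 1
        }
        END { exit bad + 0 }   # exit status 1 if any row was malformed
    ' "$1"
}
```

For example, `check_map_csv data/artwork-image-map.csv` exits with status 0 only when every row has exactly two non-empty fields.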

Known Mapping

Optionally, you can provide a hand-curated list of mappings between artworks in different collections. This was done for the Frick Photoarchive's anonymous Italian art archive and the Zeri Foundation's 15th century Italian art archive. The hand-generated matches can be found in the data/known-map.csv file. This data can be used to check the quality of the matches generated by MatchEngine against those identified by a known expert.

The final data resides in data/known-map.csv.
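One simple way to compare generated matches against the hand-curated ones is to count how many of the known pairs also appear among the generated pairs. The following is a minimal sketch rather than the method used in this project: `known_pairs_found` is a hypothetical helper, and it assumes that both input files hold one `id_a,id_b` pair per line, which may not match the actual layout of data/known-map.csv or of the generated matches.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# ASSUMPTION: both files contain one "id_a,id_b" pair per line.
# Usage: known_pairs_found GENERATED_PAIRS.csv KNOWN_PAIRS.csv
# Note: the first file must not be empty (the NR == FNR test relies on it).
known_pairs_found() {
    awk -F',' '
        # Normalise each pair so that "a,b" and "b,a" compare equal.
        # Appending "" forces a string comparison even for numeric IDs.
        function key(a, b) { return ((a "") < (b "")) ? a SUBSEP b : b SUBSEP a }
        NR == FNR { found[key($1, $2)] = 1; next }     # first file: generated pairs
        { total++; if (key($1, $2) in found) hit++ }   # second file: known pairs
        END { printf "%d of %d known pairs found\n", hit, total }
    ' "$1" "$2"
}
```

The resulting count is a rough recall figure: the closer it is to the total, the more of the expert's matches were also found automatically.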

Image Similarity Data

Finally, you'll need the image similarity data itself, as provided by MatchEngine. All image similarity data is generated using the MatchEngine tools. The tool produces a JSON file, which can then be converted into a usable CSV file. The script to do this can be found in shared/gen-similarity.js.

The final data resides in data/similarity.csv.

Credits

Created by John Resig. Released under an MIT license.

Funding for this project was provided by a Digital Resources grant from the Kress Foundation, in cooperation with the Frick Photoarchive.