MatchEngine Data Analysis
This project is a collection of Shell and Cypher scripts for Neo4j that consume MatchEngine image similarity data and generate a queryable graph for further analysis.
At the moment, the code and scripts in this repository exist mostly to replicate the research and results that have already been produced using the Frick Photoarchive's Anonymous Italian photo archive and the Zeri Foundation's Italian art photo archive. More information about this research, and the results, can be found here:
Importing Data into Neo4j
To start you'll need to make sure that you have a copy of Neo4j installed on your computer. After you have it installed you'll need to start it. Make sure that it's running locally and is available on the default port.
Once you have done that you should be able to run the following command from your shell:
./import.sh
This will import all the existing data (seen in the data/ directory) into your personal copy of Neo4j. After this has been completed you can then open your browser and visit:
http://localhost:7474/
And you'll be able to query the imported data using Neo4j's Cypher query language.
Generating Data
Currently tools and scripts are provided for generating data from sources at the Frick Photoarchive and the Zeri Foundation. You will need to generate your own data, likely using your own tools, if you wish to analyze your own archive of images.
That being said this repository does contain all the data from the analysis done on the Frick and Zeri's Italian art collections and you can replicate those results by simply importing the data (as detailed above).
Artwork-Image Mapping
You'll need to have a list of image IDs with their corresponding artwork IDs. The exact format for this data is detailed here.
In the case of the Frick and Zeri's collections, specific tools were needed to convert the data from their existing formats into the preferred format linked to above. Those utilities can be found in the utils/ directory.
The final data resides in data/artwork-image-map.csv.
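As a rough illustration of how such a mapping can be used, the sketch below groups image IDs by artwork. It assumes one `imageID,artworkID` pair per line with no header row and no quoted fields. That layout is an assumption made for this example, not the documented format, and `groupImagesByArtwork` is a hypothetical helper that is not part of this repository.

```javascript
// Group image IDs by the artwork they depict.
// ASSUMPTION: each line is "imageID,artworkID", with no header row and
// no quoted fields. Check the real file format before relying on this.
function groupImagesByArtwork(csvText) {
  const byArtwork = new Map();
  for (const line of csvText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // ignore blank lines
    const [imageId, artworkId] = line.split(",").map((cell) => cell.trim());
    if (!byArtwork.has(artworkId)) byArtwork.set(artworkId, []);
    byArtwork.get(artworkId).push(imageId);
  }
  return byArtwork;
}
```

An artwork photographed more than once ends up with several image IDs in its list, which is the case the mapping exists to handle.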
Known Mapping
Optionally, you can provide a hand-curated list of mappings between artworks in different collections. This was done for the Frick Photoarchive's anonymous Italian art archive and the Zeri Foundation's 15th century Italian art archive. The hand-generated matches can be found in the data/known-map.csv file. This data can be used to check the quality of the matches generated by MatchEngine against those made by a known expert.
The final data resides in data/known-map.csv.
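One simple way to make that comparison is to count how many of the expert's pairs also show up among the generated matches. The sketch below assumes both inputs are lists of artwork ID pairs, and that the known map is a headerless two-column CSV. Both are assumptions for illustration. It also assumes the image-level matches have already been lifted to artwork IDs using the artwork-image mapping. The helper names are hypothetical and not part of this repository.

```javascript
// Parse a two-column CSV of ID pairs.
// ASSUMPTION: no header row and no quoted fields.
function parsePairs(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(",").slice(0, 2).map((cell) => cell.trim()));
}

// Order-independent key, so that (a, b) and (b, a) compare equal.
function pairKey(a, b) {
  return [a, b].sort().join("\u0000");
}

// Report how many expert-curated pairs also appear among the generated pairs.
function compareToKnown(knownPairs, generatedPairs) {
  const generated = new Set(generatedPairs.map(([a, b]) => pairKey(a, b)));
  const missed = knownPairs.filter(([a, b]) => !generated.has(pairKey(a, b)));
  const found = knownPairs.length - missed.length;
  return {
    known: knownPairs.length,
    found,
    missed, // expert pairs that the generated matches did not include
    recall: knownPairs.length === 0 ? 0 : found / knownPairs.length,
  };
}
```

The `missed` list is often the more useful output, since it points at the specific artworks worth inspecting by hand.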
Image Similarity Data
Finally, you'll need the image similarity data itself. All image similarity data is generated using the MatchEngine tools. The tools produce a JSON file, which can then be converted into a usable CSV file. The script to do this can be found in shared/gen-similarity.js.
The final data resides in data/similarity.csv.
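To give a sense of what a JSON-to-CSV conversion like this involves, here is a minimal sketch. It is not the repository's shared/gen-similarity.js. The input shape (an object mapping each image ID to a list of `{ id, score }` matches) and the output column names are assumptions made up for the example, as is the `similarityToCsv` function itself. It further assumes IDs contain no commas or quotes.

```javascript
// Flatten a similarity result object into CSV text.
// ASSUMED input shape (hypothetical, not MatchEngine's documented format):
//   { "imageA": [ { "id": "imageB", "score": 42.5 }, ... ], ... }
function similarityToCsv(results) {
  const seen = new Set();
  const rows = ["image1,image2,score"]; // hypothetical column names
  for (const [source, matches] of Object.entries(results)) {
    for (const match of matches) {
      if (match.id === source) continue; // skip self-matches
      // Similarity is treated as symmetric here, so each pair is kept once.
      const pair = [source, match.id].sort();
      const key = pair.join("\u0000");
      if (seen.has(key)) continue;
      seen.add(key);
      rows.push(`${pair[0]},${pair[1]},${match.score}`);
    }
  }
  return rows.join("\n") + "\n";
}
```

Keeping each pair only once is a design choice for this sketch: it avoids creating two relationships for the same match if the CSV is later loaded into a graph.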
Credits
Created by John Resig. Released under an MIT license.
Funding for this project was provided by a Digital Resources grant from the Kress Foundation, in cooperation with the Frick Photoarchive.