# Replica project

## Creating a topology of artworks from the Cini Foundation Fototeca

### The data

The Cini Foundation holds 300,000 cardboards, each containing a photograph of an artwork and some metadata. These have been digitized as part of the Replica project and are now available in IIIF format.

The artworks span the 12th to the 20th century, with most dating from between 1400 and 1699. They are mostly by Venetian artists, and most are Italian or European.

In [1]:
# loading the metadata
%load_ext autoreload
%autoreload 2

import pandas as pd
from IPython.display import Image


metadata = pd.read_csv('./CiniDatabases_August2021/Cini_20210623_WithImageURL.csv', sep=';')
metadata.head(5)


| Unnamed: 0 | Drawer | ImageNumber | AuthorOriginal | Author | AuthorULAN | AuthorULANLabel | AuthorUncertain | AuthorModifier | CardboardURL | ImageURL | ... | AuthorDeathLong | AuthorDeathLat | SimpleCollection | AuthorBirthCity | AuthorDeathCity | AuthorComplemented | AuthorComplement | AuthorNeighbour | AuthorNeighbourhood | uid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1A | 1 | ALLEGRINI, Francesco | ALLEGRINI Francesco | ulan:500115272 | Allegrini, Francesco | 0 |  | https://dhlabsrv4.epfl.ch/iiif_cini/1A%2F1A_1.... | https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F... | ... |  |  | Corpus Gernsheim |  |  | 0 | No complement | 0 | No complement | 253993c139284a45be233a13121ddeeb |
| 1 | 1A | 2 | BAROCCI, Federico | BAROCCI Federico | ulan:500115210 | Barocci, Federico | 0 |  | https://dhlabsrv4.epfl.ch/iiif_cini/1A%2F1A_2.... | https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F... | ... | 12.633333 | 43.716667 | Corpus Gernsheim | Urbino | Urbino | 0 | No complement | 0 | No complement | 1323356994c24635a11fdcd9d5f9284a |
| 2 | 1A | 3 | BASSANO, Leandro | BASSANO Leandro | ulan:500015945 | Bassano, Leandro | 0 |  | https://dhlabsrv4.epfl.ch/iiif_cini/1A%2F1A_3.... | https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F... | ... | 12.326667 | 45.438611 | Corpus Gernsheim | Bassano del Grappa | Venice | 0 | No complement | 0 | No complement | a4268385f6384e61a3dd092bc6b8c083 |
| 3 | 1A | 4 | CAMPIGLI, Massimo | CAMPIGLI Massimo | ulan:500029770 | Campigli, Massimo | 0 |  | https://dhlabsrv4.epfl.ch/iiif_cini/1A%2F1A_4.... | https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F... | ... | 6.639811 | 43.269316 | Corpus Gernsheim | Berlin | Var | 0 | No complement | 0 | No complement | 550f368cdb4442aab4d5e2ada702d6ad |
| 4 | 1A | 5 | CARRACCI, A. attr. | CARRACCI A attr |  |  | 0 |  | https://dhlabsrv4.epfl.ch/iiif_cini/1A%2F1A_5.... | https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F... | ... |  |  | Corpus Gernsheim |  |  | 1 | attr | 0 | No complement | 6047de5547b643cb87491be925748bee |


In [5]:
# taking, for example, a subset of the data
titian_data = metadata[metadata['AuthorOriginal'].fillna('').str.startswith('TIZIANO')]
titian_data.shape

(2265, 39)

The image URL can be used to retrieve the image from the IIIF source.

In [16]:
example_url = titian_data.reset_index()['ImageURL'][0]

print(example_url)
display(Image(url=example_url, width=200, height=200))

https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F1A%2F1A_583.jpg/full/full/0/default.jpg
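
The URL above follows the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, so a smaller rendition can be requested from the server instead of downloading the full image and scaling it in the notebook. A minimal sketch, assuming the server honours the `w,` size form; `resize_iiif_url` is a hypothetical helper, not part of the Replica code:

```python
def resize_iiif_url(url: str, width: int) -> str:
    """Return a IIIF image URL that requests the given width (height scales to match)."""
    # The last four path segments are region / size / rotation / quality.format.
    base, region, _size, rotation, quality_format = url.rsplit('/', 4)
    return '/'.join([base, region, f'{width},', rotation, quality_format])


example_url = 'https://dhlabsrv4.epfl.ch/iiif_replica/cini%2F1A%2F1A_583.jpg/full/full/0/default.jpg'
thumbnail_url = resize_iiif_url(example_url, 200)
print(thumbnail_url)
```

Requesting thumbnails this way keeps the transfer small when many images are displayed or fed to a model at a fixed input size.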


#### Data loader for model?

### The morphograph

The morphograph is an annotated graph of image pairs: two images are linked when they are considered similar, that is, when they share a visual or physical trait. Where is this set stored?
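
To make the graph structure concrete, the sketch below treats each annotated pair as an edge and groups images into connected components. The field names `img1`, `img2` and `type` follow the saved link data; the identifiers and the `'POSITIVE'` / `'NEGATIVE'` label values are made-up placeholders, since the actual values are not documented here.

```python
from collections import defaultdict

# Toy pairs standing in for morphograph rows (identifiers and labels are placeholders).
pairs = [
    {'img1': 'a', 'img2': 'b', 'type': 'POSITIVE'},
    {'img1': 'b', 'img2': 'c', 'type': 'POSITIVE'},
    {'img1': 'd', 'img2': 'e', 'type': 'POSITIVE'},
    {'img1': 'a', 'img2': 'e', 'type': 'NEGATIVE'},
]

# Build an undirected adjacency list from the pairs annotated as similar.
adjacency = defaultdict(set)
for row in pairs:
    if row['type'] == 'POSITIVE':
        adjacency[row['img1']].add(row['img2'])
        adjacency[row['img2']].add(row['img1'])


def connected_components(adjacency):
    """Group nodes that are linked directly or through a chain of similar pairs."""
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


components = connected_components(adjacency)
print(components)
```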

#### Data loader for this set?

In [None]:
# in jeanne's code
import pickle

with open('../../../../../../scratch/students/jeanne/replica_data/save_link_data_2018_08_02.pkl', 'rb') as f:
    morpho_graph_complete = pickle.load(f)

# columns: uid, img1, img2, type, annotated

### Model

According to Seguin (2016), the model that finds similar images is trained with a loss over triplets of images (A, B, C) that enforces the ordering:

d(A,B) < d(A,C)

where A is the input image, B is the morphograph ground truth for A, C is an image that a pre-trained model considers similar to A, and d is the distance between two images in the embedding space.
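
One common way to turn such an ordering constraint into a trainable objective is a triplet margin loss, in which the anchor A is pulled towards B and pushed away from C. The sketch below is an illustration with NumPy, assuming Euclidean distance between embeddings and a margin of 0.1; both are assumptions and may differ from what Seguin (2016) and the Replica code use.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=0.1):
    """Mean of max(0, d(A, B) - d(A, C) + margin) over a batch of embeddings."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # distance A -> B (should be small)
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # distance A -> C (should be large)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy 2-D embeddings: the first triplet satisfies the constraint, the second violates it.
A = np.array([[0.0, 0.0], [0.0, 0.0]])
B = np.array([[0.1, 0.0], [1.0, 0.0]])
C = np.array([[1.0, 0.0], [0.1, 0.0]])

loss = triplet_margin_loss(A, B, C)
print(loss)
```

Only the violating triplet contributes to the loss, which is what drives fine-tuning towards embeddings where B ends up closer to A than C. In a PyTorch training loop, `torch.nn.TripletMarginLoss` provides the same kind of objective.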

So the steps are:

Model 1: take any well-performing pre-trained model, e.g. from https://pytorch.org/vision/stable/models.html, pass all images in the data through it, and keep the embedding (i.e. the output of the last CNN layer) of each image. Then, for each image in the morphograph, find the images whose embeddings are most similar to its own.

Model 2: fine-tune the model with the loss above. The network should stop at the embedding layer, so that its output is the embedding itself.
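
The retrieval step of Model 1 can be sketched as a cosine-similarity search over the embedding matrix. The embeddings below are random placeholders rather than real CNN features, and `top_k_similar` is a hypothetical helper; in practice the rows would come from the chosen torchvision model.

```python
import numpy as np


def top_k_similar(embeddings, query_index, k=3):
    """Return indices of the k embeddings most cosine-similar to the query, excluding itself."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarities = normed @ normed[query_index]
    similarities[query_index] = -np.inf  # never return the query image itself
    return np.argsort(-similarities)[:k]


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))  # placeholder for (n_images, embedding_dim) features
embeddings[7] = embeddings[0] + 0.01 * rng.normal(size=16)  # plant a near-duplicate of image 0

nearest = top_k_similar(embeddings, query_index=0, k=3)
print(nearest)
```

Retrieved images that are not linked to the query in the morphograph are natural candidates for C when building triplets for Model 2.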

#### Are the loss and training defined somewhere?
#### Can we improve this? E.g. another loss, more fine-tuning layers on artworks?

Store the final weights of the fine-tuned model for prediction.

Evaluation:
- metrics introduced by Seguin (2016), in replica_search.train_retrieval

### Topological Data Analysis

Since we want to map the space of images so that similar ones are close together and the structure of these similarities becomes visible, we use the **mapper** algorithm to find this structure and its substructures.

- Input: the embeddings of the images
- Method: TDA (mapper)
- Output: skeleton of the space, clusters
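
As an illustration of what mapper computes, the sketch below implements a toy one-dimensional version with NumPy: project the points with a lens, cover the lens range with overlapping intervals, cluster the points inside each interval, and connect clusters that share points. The lens (first coordinate), the interval count, the overlap and the distance threshold are all arbitrary assumptions, and the clustering is a simple threshold-linkage stand-in; it is not the `kmapper` or `MapperGraph` implementation.

```python
import numpy as np


def threshold_clusters(points, indices, eps):
    """Single-linkage clusters: points closer than eps end up in the same cluster."""
    remaining, clusters = set(indices), []
    while remaining:
        stack, cluster = [remaining.pop()], set()
        while stack:
            i = stack.pop()
            cluster.add(i)
            near = {j for j in remaining if np.linalg.norm(points[i] - points[j]) < eps}
            remaining -= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def toy_mapper(points, lens, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are clusters of point indices, edges link clusters sharing points."""
    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval on both sides so that neighbouring intervals overlap.
        start = lo + k * length - overlap * length
        stop = lo + (k + 1) * length + overlap * length
        inside = np.flatnonzero((lens >= start) & (lens <= stop))
        nodes.extend(threshold_clusters(points, inside.tolist(), eps))
    edges = [(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]]
    return nodes, edges


# Points sampled on a circle: the skeleton should itself form a loop.
angles = np.linspace(0, 2 * np.pi, 80, endpoint=False)
points = np.column_stack([np.cos(angles), np.sin(angles)])
nodes, edges = toy_mapper(points, lens=points[:, 0])
print(len(nodes), 'nodes,', len(edges), 'edges')
```

On the image data, the points would be the embeddings and the lens would be derived from them (for example a low-dimensional projection); the nodes then play the role of clusters and the edges the role of the skeleton.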

Evaluation: use the clusters and the closeness of the clusters in the skeleton to evaluate the structure.
Based on: the morphograph and the predictions of the model for different inputs.

In [None]:
from replica_analysis.topology_analysis.Mapper_Tools_Repl import MapperGraph
import kmapper as km


### Visualize the results

Using Flask, improve the visualization of the results.

### More:

#### Can we compare the process to a similar process on the metadata?
#### Do the clusters / structure of clusters mirror what we already know?