## Uso de TF-IDF para calcular distancia entre vectores de descripción

En este notebook se utilizará el algoritmo TF-IDF como approach alternativo para calcular similitud entre diferentes imágenes de culturas diferentes.

### Cargamos el dataset

In [1]:
import pandas as pd

In [2]:
df = pd.read_excel('../../data/dataset_v3.xlsx')
df = df.drop('Unnamed: 0', axis=1)
df.head()

Unnamed: 0,catalogation_id,cronology,cronology_time,culture_cl,morfofunctional_category,description,principal_scene,decoration_tecnique_external_body_section1,color_external_body_section1,color_internal_body_section1,...,trait_n89,trait_n90,trait_n100,trait_n101,trait_n102,trait_n103,trait_n104,trait_n105,file_path,image_path
0,ML020107,Horizonte Medio,7,Sican,botella doble cuerpo asa puente cintada silbadora,botella doble cuerpo asa puente cintada silbad...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020107a.jpg
1,ML020108,Horizonte Medio,7,Sican,botella doble pico asa puente cintada escultorica,botella doble pico asa puente cintada escultor...,,pintado escultorico,rojo y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020108a.jpg
2,ML020109,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020109a.jpg
3,ML020110,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020110a.jpg
4,ML020111,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y marron,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020111a.jpg


### Utilizamos la librería NLTK que nos permitirá implementar el algoritmo

In [2]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

Definimos el lematizador y removemos las stopwords

Instalamos el tokenizador y lematizador

In [22]:
import nltk

nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/ldavico/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/ldavico/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /Users/ldavico/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [23]:
stop_words = set(stopwords.words("spanish"))    
lemmatizer = WordNetLemmatizer()


In [43]:
texto1 = 'Hola soy Luciano'
texto2 = 'Hola soy Luciano y me gusta jugar futbol'

In [44]:
text1_tokenized = ' '.join([
    lemmatizer.lemmatize(word.lower()) 
        for word in word_tokenize(texto1) if word.isalnum() and word.lower() not in stop_words
    ])
text2_tokenized = ' '.join([
    lemmatizer.lemmatize(word.lower()) 
        for word in word_tokenize(texto2) if word.isalnum() and word.lower() not in stop_words
    ])

### Vectorizamos por medio de TFidf vectorizer y calculamos la similitud

In [49]:
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform([text1_tokenized, text2_tokenized])

In [46]:
similitud = 1 - (tfidf_matrix * tfidf_matrix.T).toarray()[0, 1]
similitud

0.4976712217743283

### Ahora lo aplicamos a nuestro dataset y guardamos nuestras features

In [62]:
df = df[~df.description.isnull()][['catalogation_id', 'culture_cl', 'description']]
descriptions = df.description.tolist()

In [67]:
def tokenize(description, lemmatizer, stop_words) -> str:
    return ' '.join([
    lemmatizer.lemmatize(word.lower()) 
        for word in word_tokenize(description) if word.isalnum() and word.lower() not in stop_words
    ])

def tokenize_descriptions(descriptions, lemmatizer, stop_words) -> list:
    tokenized = [tokenize(desc, lemmatizer, stop_words) for desc in descriptions]
    return tokenized

In [68]:
tokenized_descriptions = tokenize_descriptions(descriptions, lemmatizer, stop_words)

In [92]:
def get_vectors_tf_idf(descriptions, vectorizer):
    tfidf_matrix = vectorizer.fit_transform(descriptions)
    return tfidf_matrix.toarray()

In [93]:
description_vectors_tfidf = get_vectors_tf_idf(tokenized_descriptions, vectorizer).tolist()


In [94]:
df['tfidf_vector'] = description_vectors_tfidf

In [95]:
df

Unnamed: 0,catalogation_id,culture_cl,description,tfidf_vector
0,ML020107,Sican,botella doble cuerpo asa puente cintada silbad...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
1,ML020108,Sican,botella doble pico asa puente cintada escultor...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
2,ML020109,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
3,ML020110,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,ML020111,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
...,...,...,...,...
33577,ML038832,Tiahuanaco,plato con diseños geometricos de lineas horizo...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
33578,ML038833,Tiahuanaco,plato con diseños geometricos de eses ( s) y l...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
33579,ML015075,Cajamarca,cuenco escultorico que representa a un felino ...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
33580,ML015241,Cajamarca,cuenco con representacion de cabeza estilizada...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [97]:
df.to_csv('../../data/tfidf_vectors.csv', sep=';', index=False)

### Calculamos las distancias

In [3]:
import pandas as pd
import numpy as np
import time
from numpy.linalg import norm

In [4]:
df = pd.read_csv('../../data/tfidf_vectors.csv', sep=';')
df.head()

Unnamed: 0,catalogation_id,culture_cl,description,tfidf_vector
0,ML020107,Sican,botella doble cuerpo asa puente cintada silbad...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
1,ML020108,Sican,botella doble pico asa puente cintada escultor...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
2,ML020109,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
3,ML020110,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,ML020111,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [5]:
def str_to_list(string):
    string_list = string.strip('][').split(', ')
    float_list = [float(i) if len(i) > 0 else 0.0 for i in string_list]
    return float_list

df.tfidf_vector = df.tfidf_vector.apply(str_to_list)

In [4]:
cultures = df.culture_cl.unique().tolist()
culture_pairs = [(a, b) for idx, a in enumerate(cultures) for b in cultures[idx + 1:]]
len(culture_pairs)

171

In [5]:
len(df.tfidf_vector[0])

2426

Creamos una funcion que optimiza el calculo de distancia cosenoidal

In [6]:
def get_positive_positions(v1, v2):
    positions = [i for i in range(2426) if v1[i] != 0.0 and v2[i] != 0.0]
    return positions

def _cosine_similarity(v1, v2):
    positions = get_positive_positions(v1, v2)
    if len(positions) == 0:
        return 0.0
    
    reshaped_v1 = [v1[i] for i in positions]
    reshaped_v2 = [v2[j] for j in positions]

    distance = np.dot(reshaped_v1, reshaped_v2) / (norm(v1) * norm(v2))
    return 1 - distance

In [7]:
df.head()

Unnamed: 0,catalogation_id,culture_cl,description,tfidf_vector
0,ML020107,Sican,botella doble cuerpo asa puente cintada silbad...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
1,ML020108,Sican,botella doble pico asa puente cintada escultor...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
2,ML020109,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
3,ML020110,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,ML020111,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [8]:
df.groupby('culture_cl')['culture_cl'].count().sort_values(ascending=False)

culture_cl
Moche         14250
Chimu          4834
Wari           4749
Nasca          3097
Sican          1367
Chancay         931
Cajamarca       889
Tiahuanaco      710
Salinar         671
Inca            622
Cupisnique      528
Vicus           374
Recuay          327
Paracas         136
Chincha          39
Pukara           31
Gallinazo        20
Chanca            4
Lima              2
Name: culture_cl, dtype: int64

In [9]:
seconds_per_data = 0.000110168477982
distances = {p: list() for p in culture_pairs}

for i in range(12, len(culture_pairs)):
    start_time = time.time()
    c1 = culture_pairs[i][0]
    c2 = culture_pairs[i][1]
    
    print(f'{i} -> Cultures: {c1}, {c2}')
    df_c1 = df[df.culture_cl == c1]
    df_c2 = df[df.culture_cl == c2]

    # Time estimation
    len_c1 = len(df_c1)
    len_c2 = len(df_c2)
    total_data = len_c1 * len_c2
    estimated_s = total_data * seconds_per_data
    estimated_minute = '0' + str(round(estimated_s / 60)) if round(estimated_s / 60) < 10 else str(round(estimated_s / 60))
    estimated_second = '0' + str(np.round(estimated_s % 60, 2)) if np.round(estimated_s % 60, 2) < 10 else str(np.round(estimated_s % 60, 2))
    print(f'Estimated time in seconds: {np.round(estimated_s, 2)}')
    print(f'Estimated time in minutes: {estimated_minute}:{estimated_second}')

    embed_c1 = df_c1.tfidf_vector
    embed_c2 = df_c2.tfidf_vector

    for e1 in embed_c1:
        for e2 in embed_c2:
            dist = _cosine_similarity(e1, e2)
            distances[(c1, c2)].append(dist)
    
    end_time = time.time()

    # Save distances in file
    mean = np.mean(distances[(c1, c2)])
    file = open(f'../../data/distances/mean_cosine_distances_tfidf.txt', 'a')
    file.write(f'{i}: {c1}-{c2} -> {mean}\n')
    file.close()

    print(f'Time distances between: {end_time - start_time}s')
    print(f'Mean distance: {mean}')
    print()

12 -> Cultures: Sican, Chimu
Estimated time in seconds: 728.0
Estimated time in minutes: 12:08.0
Time distances between: 930.1245942115784s
Mean distance: 0.6631710247118131

13 -> Cultures: Sican, Chanca
Estimated time in seconds: 0.6
Estimated time in minutes: 00:00.6
Time distances between: 0.7780289649963379s
Mean distance: 0.6087750872460312

14 -> Cultures: Sican, Chancay
Estimated time in seconds: 140.21
Estimated time in minutes: 02:20.21
Time distances between: 105.23850703239441s
Mean distance: 0.26108251323516535

15 -> Cultures: Sican, Inca
Estimated time in seconds: 93.67
Estimated time in minutes: 02:33.67
Time distances between: 94.43828511238098s
Mean distance: 0.4717432343575313

16 -> Cultures: Sican, Tiahuanaco
Estimated time in seconds: 106.93
Estimated time in minutes: 02:46.93
Time distances between: 79.75015997886658s
Mean distance: 0.24987923978735557

17 -> Cultures: Sican, Chincha
Estimated time in seconds: 5.87
Estimated time in minutes: 00:05.87
Time distanc

## Calculamos la matriz por periodo

In [1]:
import pandas

In [6]:
dataset_df = pd.read_excel('../../data/dataset_v4.xlsx')
dataset_df = dataset_df.drop('Unnamed: 0', axis=1)
dataset_df.head()

Unnamed: 0,catalogation_id,cronology,cronology_time,culture_cl,morfofunctional_category,description,principal_scene,decoration_tecnique_external_body_section1,color_external_body_section1,color_internal_body_section1,...,trait_n89,trait_n90,trait_n100,trait_n101,trait_n102,trait_n103,trait_n104,trait_n105,file_path,image_path
0,ML020107,Horizonte Medio,7,Sican,botella doble cuerpo asa puente cintada silbadora,botella doble cuerpo asa puente cintada silbad...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020107a.jpg
1,ML020108,Horizonte Medio,7,Sican,botella doble pico asa puente cintada escultorica,botella doble pico asa puente cintada escultor...,,pintado escultorico,rojo y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020108a.jpg
2,ML020109,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020109a.jpg
3,ML020110,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y naranja,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020110a.jpg
4,ML020111,Horizonte Medio,7,Sican,botella gollete asa puente cintada protoma sil...,botella gollete asa puente cintada protoma sil...,,pintado escultorico,crema y marron,,...,0,0,1,1,1,0,1,0,data/sican_7/7 ADMINISTRADOR COLECCIONES VIRTU...,data/images/ML020111a.jpg


In [7]:
df['cronology_time'] = dataset_df['cronology_time']

In [8]:
df

Unnamed: 0,catalogation_id,culture_cl,description,tfidf_vector,cronology_time
0,ML020107,Sican,botella doble cuerpo asa puente cintada silbad...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",7
1,ML020108,Sican,botella doble pico asa puente cintada escultor...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",7
2,ML020109,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",7
3,ML020110,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",7
4,ML020111,Sican,botella gollete asa puente cintada protoma sil...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",7
...,...,...,...,...,...
33576,ML038832,Tiahuanaco,plato con diseños geometricos de lineas horizo...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",6
33577,ML038833,Tiahuanaco,plato con diseños geometricos de eses ( s) y l...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",6
33578,ML015075,Cajamarca,cuenco escultorico que representa a un felino ...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",6
33579,ML015241,Cajamarca,cuenco con representacion de cabeza estilizada...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",5


In [9]:
ctimes = df.cronology_time.unique().tolist()
ctimes

[7, 6, 8, 5, 9]

In [10]:
ctimes.remove(9)

In [11]:
from numpy.linalg import norm

def get_positive_positions(v1, v2, vector_length):
    positions = [i for i in range(vector_length) if v1[i] != 0.0 and v2[i] != 0.0]
    return positions

def _cosine_similarity(v1, v2):
    positions = get_positive_positions(v1, v2, len(v1))
    if len(positions) == 0:
        return 0.0
    
    reshaped_v1 = [v1[i] for i in positions]
    reshaped_v2 = [v2[j] for j in positions]

    distance = np.dot(reshaped_v1, reshaped_v2) / (norm(v1) * norm(v2))
    return 1 - distance

In [12]:
ctimes

[7, 6, 8, 5]

In [14]:
for ctime in ctimes:
    filtered_df = df[df['cronology_time'] == ctime]
    print(f'Cronology_time: {ctime}')
    display(filtered_df.groupby('culture_cl')['culture_cl'].count().sort_values(ascending=False))

    cultures = filtered_df.culture_cl.unique().tolist()
    culture_pairs = [(a, b) for idx, a in enumerate(cultures) for b in cultures[idx + 1:]]
    print(f'Culture pairs: {len(culture_pairs)}')

    distances = {p: list() for p in culture_pairs}

    for i in range(1, len(culture_pairs)):
        start_time = time.time()
        c1 = culture_pairs[i][0]
        c2 = culture_pairs[i][1]
        
        print(f'{i} -> Cultures: {c1}, {c2}')
        df_c1 = filtered_df[filtered_df.culture_cl == c1]
        df_c2 = filtered_df[filtered_df.culture_cl == c2]

        embed_c1 = df_c1.tfidf_vector
        embed_c2 = df_c2.tfidf_vector

        for e1 in embed_c1:
            for e2 in embed_c2:
                dist = _cosine_similarity(e1, e2)
                distances[(c1, c2)].append(dist)
        
        end_time = time.time()

        # Save distances in file
        mean = np.mean(distances[(c1, c2)])
        file = open(f'../../data/distances/mean_cosine_distances_tfidf_ctime_{ctime}.txt', 'a')
        file.write(f'{i}: {c1}-{c2} -> {mean}\n')
        file.close()

        print(f'Time distances between: {end_time - start_time}s')
        print(f'Mean distance: {mean}')
        print()


Cronology_time: 7


culture_cl
Wari          4749
Sican         1365
Tiahuanaco     698
Nasca           70
Cajamarca        4
Chincha          1
Name: culture_cl, dtype: int64

Culture pairs: 15
1 -> Cultures: Sican, Cajamarca
Time distances between: 0.35634517669677734s
Mean distance: 0.1279917424395847

2 -> Cultures: Sican, Nasca
Time distances between: 7.813749074935913s
Mean distance: 0.30057266722496945

3 -> Cultures: Sican, Tiahuanaco
Time distances between: 74.55452013015747s
Mean distance: 0.2507606244209369

4 -> Cultures: Sican, Chincha
Time distances between: 0.09331917762756348s
Mean distance: 0.15944027322011936

5 -> Cultures: Wari, Cajamarca
Time distances between: 1.6052350997924805s
Mean distance: 0.26877209837481914

6 -> Cultures: Wari, Nasca
Time distances between: 31.62102508544922s
Mean distance: 0.38326872218514446

7 -> Cultures: Wari, Tiahuanaco
Time distances between: 319.6554629802704s
Mean distance: 0.3871902469056842

8 -> Cultures: Wari, Chincha
Time distances between: 0.443774938583374s
Mean distance: 0.32699440535996166

9 -> Cultures: Cajamarca, Nasca
Time distances between: 0.03208303451538086s
Mean distance: 0.419703595852

culture_cl
Moche         14250
Nasca          3027
Cajamarca       883
Recuay          327
Gallinazo        20
Tiahuanaco       10
Lima              2
Pukara            2
Salinar           2
Chimu             1
Vicus             1
Name: culture_cl, dtype: int64

Culture pairs: 55
1 -> Cultures: Recuay, Lima
Time distances between: 0.06767010688781738s
Mean distance: 0.4179146775785226

2 -> Cultures: Recuay, Gallinazo
Time distances between: 0.6701850891113281s
Mean distance: 0.47665323281147115

3 -> Cultures: Recuay, Moche
Time distances between: 505.87014293670654s
Mean distance: 0.5210296510899314

4 -> Cultures: Recuay, Salinar
Time distances between: 0.0761568546295166s
Mean distance: 0.4745593229872898

5 -> Cultures: Recuay, Cajamarca
Time distances between: 19.947176933288574s
Mean distance: 0.18332565440224538

6 -> Cultures: Recuay, Pukara
Time distances between: 0.06794595718383789s
Mean distance: 0.4620462926548127

7 -> Cultures: Recuay, Nasca
Time distances between: 88.43652701377869s
Mean distance: 0.34315100081121896

8 -> Cultures: Recuay, Chimu
Time distances between: 0.017881155014038086s
Mean distance: 0.005995790570764781

9 -> Cultures: Recuay, Tiahuanaco
Time distances between: 0.30277109146118164s
Mean distance: 0.3365

culture_cl
Chimu      4833
Chancay     931
Chincha      38
Chanca        4
Sican         2
Inca          1
Pukara        1
Name: culture_cl, dtype: int64

Culture pairs: 21
1 -> Cultures: Sican, Chanca
Time distances between: 0.0023000240325927734s
Mean distance: 0.8408937849044532

2 -> Cultures: Sican, Chancay
Time distances between: 0.15228271484375s
Mean distance: 0.23669026485934783

3 -> Cultures: Sican, Inca
Time distances between: 0.0008370876312255859s
Mean distance: 0.0

4 -> Cultures: Sican, Chincha
Time distances between: 0.006162166595458984s
Mean distance: 0.17354537730657185

5 -> Cultures: Sican, Pukara
Time distances between: 0.0006730556488037109s
Mean distance: 0.0

6 -> Cultures: Chimu, Chanca
Time distances between: 2.4069156646728516s
Mean distance: 0.6240278037449077

7 -> Cultures: Chimu, Chancay
Time distances between: 369.22514820098877s
Mean distance: 0.2928021507383897

8 -> Cultures: Chimu, Inca
Time distances between: 0.40375494956970215s
Mean distance: 0.3260712680853003

9 -> Cultures: Chimu, Chincha
Time distances between: 16.03919291496277s
Mean distance: 0.3443179499651776

10 -> Cultures: Chimu, Pukara

culture_cl
Salinar       668
Cupisnique    528
Vicus         373
Paracas       136
Pukara         28
Cajamarca       2
Tiahuanaco      2
Name: culture_cl, dtype: int64

Culture pairs: 21
1 -> Cultures: Paracas, Vicus
Time distances between: 3.9243459701538086s
Mean distance: 0.24943234373857315

2 -> Cultures: Paracas, Salinar
Time distances between: 8.09825611114502s
Mean distance: 0.31954503319880806

3 -> Cultures: Paracas, Tiahuanaco
Time distances between: 0.028102874755859375s
Mean distance: 0.3046112048711121

4 -> Cultures: Paracas, Pukara
Time distances between: 0.3842151165008545s
Mean distance: 0.42159852232682593

5 -> Cultures: Paracas, Cajamarca
Time distances between: 0.0181119441986084s
Mean distance: 0.12870092285642626

6 -> Cultures: Cupisnique, Vicus
Time distances between: 20.994359016418457s
Mean distance: 0.4787484262904773

7 -> Cultures: Cupisnique, Salinar
Time distances between: 33.56985092163086s
Mean distance: 0.3644625842904113

8 -> Cultures: Cupisnique, Tiahuanaco
Time distances between: 0.07407784461975098s
Mean distance: 0.15125154761297244

9 -> Cultures: Cupisnique, Pukara
Time distances between: 1.1090278625488281s