# Part 2: Word Representations and Lexical Similarities

This part has 20 points in total.

Here we will compare different measures of semantic similarity between words: (1) WordNet depth distance, (2) cosine similarity between the word vectors of a pretrained GloVe model, and (3) cosine similarity between ResNet50 image features.

For more reading on vector semantics, go to Chapter 6, sections 6.4 and 6.8:
https://web.stanford.edu/~jurafsky/slp3/6.pdf

To learn about WordNet: https://www.nltk.org/howto/wordnet.html

For additional WordNet discussions, see Chapter 19: https://web.stanford.edu/~jurafsky/slp3/19.pdf

The GloVe word embeddings are described on the [GloVe project page](https://nlp.stanford.edu/projects/glove/).

ResNet50 is described in [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).

## Part 2.1: Semantic similarity with WordNet

In [1]:
# load wordnet
from nltk.corpus import wordnet as wn

# load the pretrained GloVe word vectors (50-dimensional, trained on Wikipedia + Gigaword)
import gensim.downloader as gensim_api
glove_model = gensim_api.load("glove-wiki-gigaword-50")

from itertools import combinations, product
from scipy.stats import spearmanr
import numpy as np

In [2]:
some_words = ['car', 'dog', 'banana', 'delicious', 'baguette', 'jumping', 'hugging', 'election']

### Explore Word Representations in English WordNet 

In [3]:
# For each word above print their synsets
# for each synset print all lemmas, hypernyms, hyponyms

import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

for word in some_words:
    synsets = wn.synsets(word)
    print(f"{word} synsets: {synsets}")
    for synset in synsets:
        # for each synset print all lemmas
        lemmas = synset.lemmas()
        print(f"\t{synset} lemmas: {lemmas}")
        # for each synset print all hypernyms
        hypernyms = synset.hypernyms()
        print(f"\t{synset} hypernyms: {hypernyms}")
        # for each synset print all hyponyms
        hyponyms = synset.hyponyms()
        print(f"\t{synset} hyponyms: {hyponyms}")

[nltk_data] Downloading package wordnet to /Users/ezz/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /Users/ezz/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


car synsets: [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
	Synset('car.n.01') lemmas: [Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
	Synset('car.n.01') hypernyms: [Synset('motor_vehicle.n.01')]
	Synset('car.n.01') hyponyms: [Synset('ambulance.n.01'), Synset('beach_wagon.n.01'), Synset('bus.n.04'), Synset('cab.n.03'), Synset('compact.n.03'), Synset('convertible.n.01'), Synset('coupe.n.01'), Synset('cruiser.n.01'), Synset('electric.n.01'), Synset('gas_guzzler.n.01'), Synset('hardtop.n.01'), Synset('hatchback.n.01'), Synset('horseless_carriage.n.01'), Synset('hot_rod.n.01'), Synset('jeep.n.01'), Synset('limousine.n.01'), Synset('loaner.n.02'), Synset('minicar.n.01'), Synset('minivan.n.01'), Synset('model_t.n.01'), Synset('pace_car.n.01'), Synset('racer.n.02'), Synset('roadster.n.01'), Synset('sedan.n.01'), Synset('sport_utility.n.01'), Synset

#### Measure The Lexical Similarity 

In [4]:
# Wu-Palmer similarity measures how similar two senses are based on their depths
# in the WordNet taxonomy and the depth of their lowest common ancestor.
#
# For each pair of words, find their closest senses according to Wu-Palmer similarity.
# List all word pairs and their highest possible wup_similarity.
# Use wn.wup_similarity(s1, s2) and itertools (combinations and product).
# If there is no connection between two words, put 0.

wn_sims = []
for word1, word2 in combinations(some_words, 2):
    max_sim = 0
    # compare every sense of word1 with every sense of word2
    for s1, s2 in product(wn.synsets(word1), wn.synsets(word2)):
        sim = wn.wup_similarity(s1, s2)
        # wup_similarity can return None when no common ancestor is found
        if sim is not None and sim > max_sim:
            max_sim = sim
    wn_sims.append(max_sim)
    print(f"{word1:9} {word2:9} {max_sim:6.3f}")

# Which word pair is the most similar? (here: banana / delicious, 0.750)

car       dog        0.667
car       banana     0.421
car       delicious  0.364
car       baguette   0.211
car       jumping    0.167
car       hugging    0.235
car       election   0.133
dog       banana     0.632
dog       delicious  0.556
dog       baguette   0.556
dog       jumping    0.333
dog       hugging    0.286
dog       election   0.182
banana    delicious  0.750
banana    baguette   0.556
banana    jumping    0.167
banana    hugging    0.250
banana    election   0.143
delicious baguette   0.500
delicious jumping    0.500
delicious hugging    0.400
delicious election   0.222
baguette  jumping    0.154
baguette  hugging    0.222
baguette  election   0.125
jumping   hugging    0.400
jumping   election   0.667
hugging   election   0.200
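The Wu-Palmer score used above is defined from depths in the taxonomy: wup(s1, s2) = 2 · depth(LCS) / (depth(s1) + depth(s2)), where the LCS (lowest common subsumer) is the deepest hypernym shared by both senses. The sketch below computes it on a small hand-made tree. The tree, the node names and the convention that the root has depth 1 are assumptions for illustration; NLTK's implementation also handles details (multiple inheritance, a simulated root for verbs) that this toy version ignores.

```python
# Hypothetical toy taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "animal": "entity",
    "food": "entity",
    "dog": "animal",
    "cat": "animal",
    "banana": "food",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1 (an assumed convention)."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    # In a tree, the first shared node met while walking up from a is the deepest one.
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    return None


def wup(a, b):
    """Wu-Palmer similarity; 0 if the two nodes share no ancestor."""
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(f"{wup('dog', 'cat'):.3f}")     # LCS 'animal' (depth 2): 2*2/(3+3) -> 0.667
print(f"{wup('dog', 'banana'):.3f}")  # LCS 'entity' (depth 1): 2*1/(3+3) -> 0.333
```

Because a deeper shared ancestor raises the score, senses that meet low in the hierarchy (dog and cat under animal) score higher than senses that only meet near the root.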


## Part 2.2: Semantic similarity with GloVe and comparison with WordNet

### Measure the similarities on GloVe Word Vectors

In [5]:
glov_sims = []
for word1, word2 in combinations(some_words, 2):
    # GloVe has a single vector per word, so no maximum over senses is needed
    sim = glove_model.similarity(word1, word2)
    glov_sims.append(sim)
    print(f"{word1:9} {word2:9} {sim:6.3f}")


car       dog        0.464
car       banana     0.219
car       delicious  0.068
car       baguette   0.046
car       jumping    0.516
car       hugging    0.278
car       election   0.333
dog       banana     0.333
dog       delicious  0.404
dog       baguette   0.018
dog       jumping    0.539
dog       hugging    0.410
dog       election   0.181
banana    delicious  0.487
banana    baguette   0.450
banana    jumping    0.108
banana    hugging    0.127
banana    election   0.164
delicious baguette   0.421
delicious jumping    0.042
delicious hugging    0.142
delicious election   0.028
baguette  jumping   -0.075
baguette  hugging    0.161
baguette  election  -0.091
jumping   hugging    0.447
jumping   election   0.206
hugging   election  -0.076


#### Examine if two measures correlate

In [6]:
# Spearman's rank correlation coefficient of the two lists
print("Spearman's rho", spearmanr(glov_sims, wn_sims))

# Higher correlation (closer to 1.0) means two measures agree with each other.

Spearman's rho SpearmanrResult(correlation=0.4222499442309076, pvalue=0.02519986065189366)
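Spearman's rho is the Pearson correlation of the ranks of the two lists, where tied values receive the average of the ranks they span. A minimal NumPy sketch of that definition follows; the helper names are hypothetical, and for real work `scipy.stats.spearmanr` (used above) is the function to rely on.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")  # indices that sort the values
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # sorted positions i..j (0-based) are ranks i+1..j+1; give them all the mean
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


print(round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 3))  # 8 / sqrt(95) -> 0.821
```

Because only ranks matter, any monotonic rescaling of either similarity measure leaves rho unchanged, which is why it suits comparing measures on different scales (WordNet scores versus cosines).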


How do the two similarities compare? 

In [7]:
correlation, pvalue = spearmanr(glov_sims, wn_sims)
corr = """
Spearman's rho is a rank correlation: it measures how well the ranking of the word
pairs by one similarity measure agrees with their ranking by the other measure.
It ranges from -1 to 1. A value of 1 means the two rankings are identical (as one
measure increases, the other always increases), -1 means the rankings are exactly
reversed, and 0 means there is no monotonic relationship between the two measures.
"""

pv = """
The p-value does not measure the strength of a correlation. It is the probability
of observing a correlation at least as strong as this one if the two measures were
in fact unrelated. A p-value below 0.05 is conventionally taken to mean that the
correlation is statistically significant, i.e. unlikely to be the result of chance
alone. A p-value above 0.05 means we cannot rule out that the correlation arose
by chance.
"""

tt = """
The p-value (about 0.025) is below 0.05, so the correlation is statistically
significant. The correlation itself (about 0.42) is positive but moderate: word
pairs that WordNet rates as more similar also tend to be rated as more similar by
GloVe, but the two rankings agree only partly. For example, WordNet rates
jumping/election as fairly similar (0.667) while GloVe does not (0.206), and GloVe
rates car/jumping much higher (0.516) than WordNet does (0.167).

The correlation tells us how strongly, and in which direction, the two rankings
agree; it does not mean that one measure influences the other. The p-value tells
us how likely agreement at least this strong would be if the measures were unrelated.
"""

print(f'{corr}\nIn this case the correlation is: {correlation}')
print(f'{pv}\nIn this case the p-value is: {pvalue}')
print(f'{tt}')



### Word Vector Representations in GloVe

In [8]:
# Each word is represented as a vector (50 dimensions in this model):
print('dog =', glove_model['dog'])

# GloVe learns the matrix of word vectors from global word co-occurrence counts:
# it fits the dot product of two word vectors (plus bias terms) to the log of
# how often the two words co-occur.
#
# Two words co-occur (are in each other's context) when they appear within a
# small window of each other in the same text.

dog = [ 0.11008   -0.38781   -0.57615   -0.27714    0.70521    0.53994
 -1.0786    -0.40146    1.1504    -0.5678     0.0038977  0.52878
  0.64561    0.47262    0.48549   -0.18407    0.1801     0.91397
 -1.1979    -0.5778    -0.37985    0.33606    0.772      0.75555
  0.45506   -1.7671    -1.0503     0.42566    0.41893   -0.68327
  1.5673     0.27685   -0.61708    0.64638   -0.076996   0.37118
  0.1308    -0.45137    0.25398   -0.74392   -0.086199   0.24068
 -0.64819    0.83549    1.2502    -0.51379    0.04224   -0.88118
  0.7158     0.38519  ]
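For reference, Pennington et al. train GloVe with a weighted least-squares objective over the word co-occurrence matrix $X$, where $X_{ij}$ counts how often word $j$ occurs in the context of word $i$, rather than by predicting one word at a time:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```

Here $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are bias terms, and $V$ is the vocabulary size; the paper uses $x_{\max} = 100$ and $\alpha = 3/4$. Since $f(0) = 0$, pairs that never co-occur get zero weight, so $\log 0$ never enters the sum.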


### Implement Cosine Similarity 

In [9]:
# based on equation 6.10 J&M (2019)
# https://web.stanford.edu/~jurafsky/slp3/6.pdf
#
def cosine_sim(v1, v2):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = np.dot(v1, v2)
    norm = np.linalg.norm(v1) * np.linalg.norm(v2)
    return dot / norm

cosine_sim(glove_model['car'], glove_model['automobile'])

0.6956217
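As a small worked example of the formula behind `cosine_sim`, take v = (1, 2, 2) and w = (2, 0, 1): the dot product is 2 + 0 + 2 = 4, ‖v‖ = 3 and ‖w‖ = √5, so the cosine is 4 / (3√5) ≈ 0.596. The sketch below checks this with NumPy and shows that scaling a vector leaves its cosine unchanged, so it does not matter whether raw or normalised GloVe vectors are compared. The function `cosine` is a standalone copy for illustration; the guard against zero vectors is an addition not present in `cosine_sim`.

```python
import numpy as np


def cosine(v1, v2):
    """Cosine similarity; raises ValueError for a zero vector, where it is undefined."""
    norm = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(v1, v2) / norm)


v = np.array([1.0, 2.0, 2.0])
w = np.array([2.0, 0.0, 1.0])

print(round(cosine(v, w), 3))       # 4 / (3 * sqrt(5)) -> 0.596
print(round(cosine(v, 10 * w), 3))  # scaling w leaves the angle unchanged -> 0.596
```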

### Implement top-n most similar words 

In [10]:
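# search in glove_model:
def top_n(word, n):
    # example: top_n('dog', 3) =
    # [('cat', 0.9218005537986755),
    #  ('dogs', 0.8513159155845642),
    #  ('horse', 0.7907583713531494)]
    # similar to glove_model.most_similar('dog', topn=3)
    #
    # Note: unlike most_similar, this version does not skip the query word,
    # so it comes first with a similarity of about 1.0 (see the output below).

    out = []

    word_vec = glove_model[word]
    # `candidate` is used as the loop variable so it does not shadow the argument `word`
    for candidate, vector in zip(glove_model.index_to_key, glove_model.get_normed_vectors()):
        sim = cosine_sim(word_vec, vector)
        out.append((candidate, sim))
    out.sort(key=lambda x: x[1], reverse=True)
    return out[:n]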

top_n('dog', 3)


[('dog', 1.0000001), ('cat', 0.9218006), ('dogs', 0.8513159)]
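Looping over all 400,000 GloVe words in Python is slow. A common alternative is to normalise every vector once, get all cosine similarities with a single matrix-vector product, and pick the largest scores with `np.argpartition`. The sketch below does this on a tiny hypothetical vocabulary so it runs on its own; with the real model, `glove_model.get_normed_vectors()` and `glove_model.index_to_key` would take the place of `unit` and `vocab`. Unlike `top_n` above, it skips the query word, as `most_similar` does.

```python
import numpy as np

# Hypothetical toy vocabulary and 2-d vectors, for illustration only.
vocab = ["dog", "cat", "puppy", "car", "truck"]
vectors = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [1.0, 0.0],
    [0.1, 1.0],
    [0.0, 0.9],
])

# Normalise each row once so that a dot product equals the cosine similarity.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
index = {w: i for i, w in enumerate(vocab)}


def top_n_fast(word, n):
    """Return the n words most similar to `word` (excluding it), best first."""
    q = index[word]
    sims = unit @ unit[q]        # cosine with every word in one product
    sims[q] = -np.inf            # never return the query word itself
    n = min(n, len(vocab) - 1)
    best = np.argpartition(-sims, n - 1)[:n]   # unordered indices of the top n
    best = best[np.argsort(-sims[best])]       # sort only those n by similarity
    return [(vocab[i], float(sims[i])) for i in best]


print(top_n_fast("dog", 2))  # puppy first, then cat
```

`np.argpartition` avoids fully sorting all similarities, which matters when the vocabulary is large and n is small.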

## Part 2.3: Semantic similarity with visual features (ResNet)


### Measure the similarities with the ResNet vectors

In this part we will use visual features of images representing these words. If you are interested in how we extracted these features, have a look at `visual-feature-extraction.ipynb`. Understanding that notebook is not necessary to complete this part: we have saved the features for you, and they are loaded in the code below.

In [11]:
# run the feature extractor on all images
# make sure that the order of features is identical to the order of words (variable some_words)

In [12]:
some_words

['car',
 'dog',
 'banana',
 'delicious',
 'baguette',
 'jumping',
 'hugging',
 'election']

In [13]:
object_indices = {v:k for k,v in enumerate(some_words)}
print(object_indices)

{'car': 0, 'dog': 1, 'banana': 2, 'delicious': 3, 'baguette': 4, 'jumping': 5, 'hugging': 6, 'election': 7}


In [14]:
import torch
image_features = torch.load('image_features.pt')

In [15]:
image_features.shape

torch.Size([8, 2048])

In [16]:
from sklearn.metrics.pairwise import cosine_similarity

resnet_sims = []

# Compare the ResNet feature vectors pairwise, in the same order as combinations(some_words, 2)
for w1, w2 in combinations(some_words, 2):
    # look up each word's image feature vector; sklearn expects 2-D arrays of shape (1, 2048)
    w1_visfeat = image_features[object_indices[w1]].unsqueeze(0).detach().numpy()
    w2_visfeat = image_features[object_indices[w2]].unsqueeze(0).detach().numpy()
    sim = cosine_similarity(w1_visfeat, w2_visfeat)[0][0]
    resnet_sims.append(sim)
    print(f"{w1:9} {w2:9} {sim:6.3f}")


car       dog        0.779
car       banana     0.749
car       delicious  0.751
car       baguette   0.738
car       jumping    0.752
car       hugging    0.734
car       election   0.752
dog       banana     0.770
dog       delicious  0.786
dog       baguette   0.769
dog       jumping    0.763
dog       hugging    0.803
dog       election   0.802
banana    delicious  0.791
banana    baguette   0.776
banana    jumping    0.749
banana    hugging    0.751
banana    election   0.735
delicious baguette   0.778
delicious jumping    0.787
delicious hugging    0.748
delicious election   0.758
baguette  jumping    0.750
baguette  hugging    0.731
baguette  election   0.746
jumping   hugging    0.765
jumping   election   0.758
hugging   election   0.785
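The pairwise loop above can also be written with one matrix product: normalise the rows of the feature matrix, multiply it by its transpose to get all cosine similarities, and read off the upper triangle. `np.triu_indices(n, k=1)` lists the pairs (0, 1), (0, 2), ..., (1, 2), ... in the same order as `itertools.combinations(range(n), 2)`, so the result lines up with the other similarity lists. The sketch uses random stand-in features; with the real data, `image_features.detach().numpy()` would replace `features`.

```python
from itertools import combinations

import numpy as np


def pairwise_cosine(features):
    """Cosine similarity for every pair of rows, in itertools.combinations order."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim_matrix = unit @ unit.T                         # (n, n) matrix of all cosines
    rows, cols = np.triu_indices(len(features), k=1)   # pairs i < j, row-major order
    return sim_matrix[rows, cols]


# Stand-in features: 8 items x 2048 dims, like image_features (random, for illustration).
rng = np.random.default_rng(0)
features = rng.random((8, 2048))

fast = pairwise_cosine(features)
slow = [
    float(np.dot(features[i], features[j])
          / (np.linalg.norm(features[i]) * np.linalg.norm(features[j])))
    for i, j in combinations(range(len(features)), 2)
]
print(len(fast), np.allclose(fast, slow))  # 28 True
```

Note that even these random non-negative vectors have cosines around 0.75, because vectors with only non-negative entries can never point in opposite directions. If the ResNet features were taken after the network's final ReLU (an assumption about the extraction notebook), the same effect would help explain why all visual similarities above fall in a narrow band between 0.73 and 0.80.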


#### Examine if Resnet and GloVe similarities correlate

In [17]:
# Spearman's rank correlation coefficient of the two lists
print("Spearman's rho", spearmanr(resnet_sims, glov_sims))

# Higher correlation (closer to 1.0) means two measures agree with each other.

Spearman's rho SpearmanrResult(correlation=0.3459222769567597, pvalue=0.07136834743507672)
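To make the meaning of the p-value concrete: it is the probability of seeing a correlation at least as strong as the observed one if the two lists were unrelated. A permutation test estimates this directly by shuffling one list many times and counting how often the shuffled correlation is at least as extreme. The sketch below illustrates that idea; it is not how `scipy.stats.spearmanr` computes its p-value (scipy uses an analytic approximation by default), so its estimate need not match scipy's value exactly. The function names and the number of permutations are assumptions.

```python
import numpy as np


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho."""
    rng = np.random.default_rng(seed)
    observed = abs(spearman(x, y))
    y = np.asarray(y)
    hits = 0
    for _ in range(n_permutations):
        # shuffling y destroys any real association with x
        if abs(spearman(x, rng.permutation(y))) >= observed:
            hits += 1
    # the +1 counts the observed ordering itself and keeps the estimate above 0
    return (hits + 1) / (n_permutations + 1)


x = np.arange(20)
print(permutation_pvalue(x, x))  # perfectly correlated lists -> very small p-value
```

With only 28 word pairs, a moderate correlation such as 0.35 is quite often matched by chance shuffles, which is what a p-value of 0.071 expresses.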


How does semantic similarity from word vectors compare with the visual similarity? Are there differences between different words? 

In [1]:
# Write your answer here
part1 = """
To compare semantic similarity from word vectors with visual similarity, we compute
the Spearman correlation between the two lists of similarity scores for the same
28 word pairs. The correlation coefficient lies between -1 and 1 and measures the
strength and direction of the relationship between the two rankings. A positive
correlation means that pairs ranked higher by one measure also tend to be ranked
higher by the other, a negative correlation means the opposite, and a correlation
of 0 means there is no monotonic relationship.
"""

part2 = """
The p-value does not measure how strong the correlation is. It is the probability
of observing a correlation at least this strong if the two measures were actually
unrelated. A low p-value (conventionally below 0.05) indicates that the correlation
is statistically significant, while a higher p-value means we cannot rule out that
the observed correlation arose by chance.
"""

part3 = """
Spearman's rho is computed over all word pairs at once, so on its own it says
nothing about individual words. To see differences between words, we have to look
at the pairwise scores themselves.
"""

comparison_of_results = """
The semantic similarity from word vectors is the cosine similarity of the words'
GloVe embeddings, which are learned from word co-occurrences in large amounts of
text. The visual similarity is the cosine similarity of ResNet50 features of images
representing the words, i.e. how similar the images look to the network.

Spearman's rho shows a weak-to-moderate positive correlation (0.346) between the
two measures. However, the p-value (0.071) is above 0.05, so the correlation is not
statistically significant at the usual 5% level.

There are clear differences between words. The ResNet similarities all lie in a
narrow range (0.731 to 0.803), while the GloVe similarities range from -0.091 to
0.539, so the visual measure separates the pairs much less. Some pairs are rated
very differently by the two measures: dog/election is among the most visually
similar pairs (0.802) but has a low GloVe similarity (0.181), while car/jumping has
a fairly high GloVe similarity (0.516) but an average visual similarity (0.752).
Abstract words such as "delicious", "election" or "hugging" are hard to depict in a
single image, which may explain part of the disagreement.
"""

print(part1)
print(part2)
print(part3)
print(comparison_of_results)



## Part 2.4 (Optional): Examine Fairness in Data-Driven Word Vectors

There are no points for this part, but you are welcome to explore this topic further if you are interested in it. We will address it again in the Computational Semantics course.

Caliskan et al. (2017) argue that word vectors learn human biases from data.

Try to replicate one of the tests of the paper:

Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. “Semantics derived automatically from language corpora contain human-like biases.” Science
356.6334 (2017): 183-186. http://opus.bath.ac.uk/55288/


For example on gender bias:
- Male names: John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill.
- Female names: Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna.
- Career words: executive, management, professional, corporation, salary, office, business, career.
- Family words: home, parents, children, family, cousins, marriage, wedding, relatives.


Report the average cosine similarity of male names to career words, and compare it with the average similarity of female names to career words. Repeat for family words.

Tokens in the GloVe model are all lowercase.

Write at least one sentence to describe your observation.

The semantic similarity of two words measures how related they are by comparing their word embeddings, which are like map positions for words, learned by analyzing lots of text. Visual similarity measures how much two pictures look alike.

The Spearman's rho test showed a weak-to-moderate positive relationship (0.346) between semantic similarity and visual similarity. However, the p-value (0.071) is above 0.05, so we cannot rule out that this relationship arose by chance.

Different word pairs are rated very differently by the two measures. For example, dog and election look similar to the image model (0.802) but are not close in GloVe (0.181). Disagreements like this are plausible for abstract words such as "election" or "delicious", which are hard to show in a single picture.

In [19]:
import numpy as np

# The gender-bias test needs the GloVe word vectors (glove_model, loaded in the first
# cell). image_features.pt only holds the 8 ResNet vectors for some_words (shape
# [8, 2048]), so indexing it with these 32 words failed with an IndexError.
male_names = ["john", "paul", "mike", "kevin", "steve", "greg", "jeff", "bill"]
female_names = ["amy", "joan", "lisa", "sarah", "diana", "kate", "ann", "donna"]
career_words = ["executive", "management", "professional", "corporation", "salary", "office", "business", "career"]
family_words = ["home", "parents", "children", "family", "cousins", "marriage", "wedding", "relatives"]

def unit_vectors(words):
    """Stack the GloVe vectors of `words` (lowercase tokens) and scale each row to length 1."""
    vectors = np.array([glove_model[w] for w in words])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def average_cosine(words_a, words_b):
    """Mean cosine similarity over all pairs (a, b) with a in words_a and b in words_b."""
    return (unit_vectors(words_a) @ unit_vectors(words_b).T).mean()

# Compare male and female names on career words, then repeat for family words
for label, attribute_words in [("career", career_words), ("family", family_words)]:
    male_sim = average_cosine(male_names, attribute_words)
    female_sim = average_cosine(female_names, attribute_words)
    print(f"Average similarity of male names to {label} words:   {male_sim:.3f}")
    print(f"Average similarity of female names to {label} words: {female_sim:.3f}")
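Caliskan et al. go one step further than comparing averages. For each target word w they compute s(w, A, B), its mean cosine with attribute set A minus its mean cosine with attribute set B, and report an effect size: the difference between the mean s over target set X and the mean s over target set Y, divided by the standard deviation of s over all words in X ∪ Y. The sketch below implements that formula on hypothetical 2-d vectors; with GloVe one would pass the stacked vectors of, for example, male names (X), female names (Y), career words (A) and family words (B). Using the sample standard deviation (`ddof=1`) is an assumption, and the paper's permutation-based significance test is not included.

```python
import numpy as np


def unit_rows(m):
    """Scale every row of m to length 1."""
    m = np.asarray(m, dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def association(W, A, B):
    """s(w, A, B) for every row w of W: mean cosine with A minus mean cosine with B."""
    W, A, B = unit_rows(W), unit_rows(A), unit_rows(B)
    return (W @ A.T).mean(axis=1) - (W @ B.T).mean(axis=1)


def weat_effect_size(X, Y, A, B):
    """Difference of the mean associations of X and Y, scaled by their pooled std."""
    s_x = association(X, A, B)
    s_y = association(Y, A, B)
    return (s_x.mean() - s_y.mean()) / np.concatenate([s_x, s_y]).std(ddof=1)


# Hypothetical toy vectors: X points towards A, Y points towards B.
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

print(round(weat_effect_size(X, Y, A, B), 3))  # positive: X is closer to A than Y is
```

Swapping X and Y flips the sign of the effect size, so its sign says which target set leans towards A, and its magnitude says how strongly, relative to the spread within the targets.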