# Evaluating Word Embeddings Using WordSim-353

In this lab, you will learn how to evaluate word embeddings using the WordSim-353 word-similarity dataset and, in Part 2, using word analogies.

Word embeddings are a way to represent words as vectors in a continuous vector space. Evaluating these embeddings is essential to understand their effectiveness in capturing semantic and syntactic relationships.

The WordSim-353 dataset is a standard benchmark for this purpose. It contains 353 pairs of English words, each with a human-assigned similarity score. The task is to compute a similarity score for each pair from the embeddings (typically the cosine similarity of the two word vectors) and compare these scores to the human judgments using correlation metrics.


### Step 1: Load the WordSim-353 Dataset

The WordSim-353 dataset is publicly available. You can download it or use the copy included in this lab. Gensim also bundles a copy, available as `datapath('wordsim353.tsv')` from `gensim.test.utils`.
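A minimal loader sketch, assuming the tab-separated layout of Gensim's bundled copy: lines starting with `#` are comments, and every other line is `word1<TAB>word2<TAB>score`. The function name `load_word_pairs` is our own, not a library API; rows that do not parse (such as a column-header row) are skipped.

```python
def load_word_pairs(path, delimiter="\t"):
    """Read (word1, word2, human_score) triples from a WordSim-353-style file.

    Assumes lines starting with '#' are comments and every other line is
    word1<TAB>word2<TAB>score. Rows that do not parse are skipped.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            parts = line.split(delimiter)
            if len(parts) != 3:
                continue  # wrong number of columns
            w1, w2, score = parts
            try:
                pairs.append((w1, w2, float(score)))
            except ValueError:
                continue  # non-numeric score, e.g. a header row
    return pairs
```

For example, `pairs = load_word_pairs(datapath('wordsim353.tsv'))` returns a list of `(word1, word2, score)` tuples.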

### Step 2: Load Pre-trained Word Embeddings

Word embeddings such as GloVe or Word2Vec are commonly used for these evaluations. This lab uses pre-trained GloVe vectors; the code refers to the 200-dimensional set (`glove.6B.200d.txt`) as `wv_glove_200`.

Download pre-trained embeddings (if not already downloaded):
```bash
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
```

#### Load the Embeddings Using Gensim
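Gensim (version 4.0 or later) can read GloVe's plain-text format directly with `KeyedVectors.load_word2vec_format(..., no_header=True)`, because GloVe files lack the vocabulary-size header that the word2vec text format expects. The sketch below wraps that call and also shows a dependency-free parser that makes the file format explicit: one word per line, followed by its space-separated vector components. Both function names are hypothetical helpers.

```python
def load_glove_keyedvectors(path):
    """Load a GloVe .txt file as gensim KeyedVectors (assumes gensim >= 4.0)."""
    # Imported here so the plain-Python loader below works without gensim installed.
    from gensim.models import KeyedVectors

    # GloVe text files have no "<vocab_size> <dim>" header line, hence no_header=True.
    return KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)


def load_glove_dict(path, limit=None):
    """Dependency-free alternative: map each word to a list of floats.

    Reads at most `limit` lines when `limit` is given.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            # Each line: the word, then its vector components separated by spaces.
            word, *values = line.rstrip().split(" ")
            if values:
                vectors[word] = [float(v) for v in values]
    return vectors
```

After unzipping, `wv_glove_200 = load_glove_keyedvectors('glove.6B.200d.txt')` loads the 200-dimensional vectors; the path assumes the file sits in the working directory.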

### Step 3: Calculate Similarities
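The similarity for each pair is usually the cosine of the angle between the two word vectors. With Gensim, `wv_glove_200.similarity(w1, w2)` returns it directly. The plain-Python sketch below spells out the computation and handles out-of-vocabulary (OOV) pairs by skipping and counting them. It assumes `vectors` maps words to equal-length lists of floats, and it lowercases words because the GloVe 6B vocabulary is lowercase. The helper names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_pairs(vectors, pairs, lowercase=True):
    """Return (model_scores, human_scores, n_oov) for (word1, word2, score) triples.

    Pairs with a word missing from `vectors` are skipped and counted as OOV.
    """
    model_scores, human_scores, n_oov = [], [], 0
    for w1, w2, human in pairs:
        if lowercase:  # the GloVe 6B vocabulary is lowercased
            w1, w2 = w1.lower(), w2.lower()
        if w1 not in vectors or w2 not in vectors:
            n_oov += 1
            continue
        model_scores.append(cosine_similarity(vectors[w1], vectors[w2]))
        human_scores.append(human)
    return model_scores, human_scores, n_oov
```

The two returned score lists stay aligned pair by pair, which is what the correlation step needs.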

### Step 4: Evaluate Using Spearman Correlation

The **Spearman correlation coefficient** (denoted $\rho$ or $r_s$) measures the strength and direction of a monotonic relationship between two variables. Unlike the Pearson correlation, Spearman's correlation does not assume a linear relationship or normally distributed variables. Instead, it evaluates how well the relationship between the two variables can be described by a monotonic function.

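Spearman correlation is calculated by first converting the data values into ranks, then applying the Pearson correlation formula to the ranked data. It is especially useful for ordinal data or when the relationship between variables is not linear. This suits WordSim-353 well: the human scores (on a 0 to 10 scale) and the cosine similarities from the embeddings live on different scales, so what matters is whether they order the word pairs the same way.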
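A sketch of that definition in plain Python: rank each list (giving tied values the average of their positions, the usual convention and the one `scipy.stats.spearmanr` uses), then take the Pearson correlation of the ranks. In practice `scipy.stats.spearmanr(model_scores, human_scores)` does the same job; the helper names here are our own.

```python
import math


def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For the three-point example later in this section, `spearman([10, 20, 30], [20, 30, 10])` evaluates to -0.5, up to floating-point rounding.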

#### Formula

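When there are no tied ranks, the Spearman correlation coefficient is given by the formula below. When ties occur, use the rank-then-Pearson definition above, with tied values assigned the average of their ranks.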

$$
\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

Where:
- $d_i = R(x_i) - R(y_i)$: The difference between the ranks of corresponding values of $x$ and $y$.
- $R(x_i)$ and $R(y_i)$: The ranks of the $i$-th observation in $x$ and $y$, respectively.
- $n$: The number of data points.

#### Steps to Calculate Spearman Correlation

1. Assign ranks to the data values for each variable.
2. Compute the rank differences ($d_i$).
3. Square the rank differences and sum them ($\sum d_i^2$).
4. Apply the formula to find $\rho$.

The Spearman correlation ranges from $-1$ to $1$:
- $\rho = 1$: Perfect positive monotonic relationship.
- $\rho = -1$: Perfect negative monotonic relationship.
- $\rho = 0$: No monotonic relationship.
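The four steps and the $d_i$ formula translate directly into code. Below is a plain-Python sketch (the function name `spearman_from_rank_differences` is illustrative, not a library API). It refuses tied data, since this shortcut formula is only exact when all ranks are distinct.

```python
def spearman_from_rank_differences(x, y):
    """Spearman's rho via 1 - 6 * sum(d_i^2) / (n (n^2 - 1)).

    Only exact when neither x nor y contains ties, so tied input is rejected.
    """
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("need two equal-length sequences with at least 2 values")
    if len(set(x)) < n or len(set(y)) < n:
        raise ValueError("the d_i formula assumes distinct values; use ranks + Pearson for ties")

    def ranks(values):
        # Step 1: rank 1 for the smallest value, n for the largest.
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for position, index in enumerate(order, start=1):
            r[index] = position
        return r

    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Steps 2 and 3
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))  # Step 4
```

On the example data ($x$ = 10, 20, 30 and $y$ = 20, 30, 10) it returns exactly -0.5.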

#### Example

If we have the data:

| $x$ | $y$ |
|--------|--------|
| 10     | 20     |
| 20     | 30     |
| 30     | 10     |

We would rank the values of $x$ and $y$, calculate $d_i$, and apply the formula.
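Working the example through by hand, with rank 1 assigned to the smallest value:

| $x$ | $y$ | $R(x_i)$ | $R(y_i)$ | $d_i$ | $d_i^2$ |
|-----|-----|----------|----------|-------|---------|
| 10  | 20  | 1        | 2        | $-1$  | 1       |
| 20  | 30  | 2        | 3        | $-1$  | 1       |
| 30  | 10  | 3        | 1        | $2$   | 4       |

$$
\rho = 1 - \frac{6 \cdot 6}{3(3^2 - 1)} = 1 - \frac{36}{24} = -0.5
$$

So $\rho = -0.5$: as $x$ increases, $y$ tends to decrease, but not consistently enough for a perfect negative relationship.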

### Step 5: Interpret Results

A higher Spearman correlation indicates that the word embeddings better capture the semantic relationships as perceived by humans. Typical results for good embeddings range from 0.6 to 0.8, depending on the dataset and model.

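Gensim can also run the whole evaluation in one call. `evaluate_word_pairs` returns the Pearson result, the Spearman result (each a statistic and a p-value), and the percentage of pairs skipped as out of vocabulary:

```python
from gensim.test.utils import datapath
pearson, spearman, oov_ratio = wv_glove_200.evaluate_word_pairs(datapath('wordsim353.tsv'))
```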

### Part 2: Evaluating GloVe with Analogies

Word embeddings are often evaluated using analogy tasks. In these tasks, we assess whether embeddings can correctly complete analogies such as "man : king :: woman : ?" (answer: "queen").

### Step 1: Load the Analogy Dataset

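Analogy questions are usually stored in plain-text files with four space-separated words per line: word1, word2, word3, and word4. In the widely used `questions-words.txt` file, lines starting with `:` mark section headers such as `: capital-common-countries`.
The goal is to predict word4, given word1, word2, and word3.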

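```python
import urllib.request

# Download the Google analogy questions; ':'-prefixed lines are section headers.
analogy_data_url = "https://raw.githubusercontent.com/nicholas-leonard/word2vec/master/questions-words.txt"
with urllib.request.urlopen(analogy_data_url) as response:
    analogy_lines = response.read().decode("utf-8").splitlines()
```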
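The raw lines still need to be split into questions. A minimal parser sketch, assuming the `questions-words.txt` layout: a line beginning with `:` names a section, and every other line holds four space-separated words. The name `parse_analogies` is hypothetical. Words are lowercased to match the lowercase GloVe 6B vocabulary.

```python
def parse_analogies(lines, lowercase=True):
    """Group (word1, word2, word3, word4) questions by section name.

    Assumes ':'-prefixed lines are section headers and other non-empty lines
    contain exactly four space-separated words; anything else is skipped.
    """
    sections = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            current = line[1:].strip()
            sections.setdefault(current, [])
            continue
        words = line.split()
        if len(words) != 4 or current is None:
            continue  # malformed line, or a question outside any section
        if lowercase:
            words = [w.lower() for w in words]
        sections[current].append(tuple(words))
    return sections
```

`questions = parse_analogies(analogy_lines)` then yields a dict mapping each section name to its list of `(word1, word2, word3, word4)` tuples.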

### Step 2: Solve Analogies Using GloVe

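To solve an analogy, we use the vector-arithmetic property of embeddings, then pick the vocabulary word whose vector is closest (by cosine similarity) to the result, excluding the three query words:
$$\mathrm{vec}(\text{word2}) - \mathrm{vec}(\text{word1}) + \mathrm{vec}(\text{word3}) \approx \mathrm{vec}(\text{word4})$$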
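The same arithmetic in plain Python, as a sketch of the additive (3CosAdd) method: unit-normalise the three input vectors, form vec(word2) - vec(word1) + vec(word3), and rank every other vocabulary word by cosine similarity to the result. Gensim's `most_similar(positive=[...], negative=[...])` works along these lines, but `solve_analogy` is our own illustration, and `vectors` is assumed to map words to lists of floats. A linear scan over GloVe's 400,000-word vocabulary is slow in pure Python, so use it to understand the method rather than to evaluate the full dataset.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def solve_analogy(vectors, word1, word2, word3, topn=1):
    """Return the topn (word, cosine) pairs closest to word2 - word1 + word3.

    The three query words are excluded from the candidates.
    """
    a, b, c = (normalize(vectors[w]) for w in (word1, word2, word3))
    target = normalize([bi - ai + ci for ai, bi, ci in zip(a, b, c)])
    scored = []
    for word, vec in vectors.items():
        if word in (word1, word2, word3):
            continue
        unit = normalize(vec)
        scored.append((sum(t * u for t, u in zip(target, unit)), word))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [(word, score) for score, word in scored[:topn]]
```

With toy vectors where "queen" combines the "female" and "royal" directions, `solve_analogy(vecs, "man", "king", "woman")` ranks "queen" first.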

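```python
# wv_glove_200: the GloVe KeyedVectors from Part 1; returns (word, cosine similarity) pairs, best first.
wv_glove_200.most_similar(positive=['woman', 'king'], negative=['man'])
```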

### Step 3: Interpret Results

The analogy accuracy (the fraction of questions whose top-ranked prediction is exactly word4) gives us insight into how well the embeddings capture relational semantics. Common analogies include relationships like gender (man:king::woman:queen) and geography (Paris:France::Berlin:Germany).
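Analogy accuracy is then a simple count. A sketch, assuming a `solve(word1, word2, word3)` callable that returns the predicted word, or `None` when a word is out of vocabulary. As with Gensim's `evaluate_word_analogies` under its default `dummy4unknown=False`, such questions are skipped rather than scored as wrong. The function name `analogy_accuracy` is hypothetical.

```python
def analogy_accuracy(questions, solve):
    """Fraction of (word1, word2, word3, word4) questions answered exactly.

    `solve(word1, word2, word3)` returns the predicted word, or None when it
    cannot answer; unanswerable questions are skipped, not counted as wrong.
    """
    correct = attempted = 0
    for w1, w2, w3, w4 in questions:
        prediction = solve(w1, w2, w3)
        if prediction is None:
            continue
        attempted += 1
        correct += prediction == w4  # True counts as 1
    return correct / attempted if attempted else 0.0
```

With Gensim, one possible `solve` is `lambda a, b, c: wv_glove_200.most_similar(positive=[b, c], negative=[a], topn=1)[0][0] if all(w in wv_glove_200 for w in (a, b, c)) else None`. Results will not match Gensim's built-in evaluation exactly, because `evaluate_word_analogies` also restricts the search to the 300,000 most frequent words by default.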

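```python
from gensim.test.utils import datapath
# Returns the overall accuracy and a per-section list of {'section', 'correct', 'incorrect'} dicts.
score, sections = wv_glove_200.evaluate_word_analogies(datapath('questions-words.txt'))
```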
