# GloVe (Gensim)

To analyze word vectors, we will use Gensim. Although it is not strictly a deep learning library, Gensim is an efficient, scalable, and widely used tool for modeling text and word similarity. It originally focused on topic models such as LDA and has since expanded to include SVD and various neural word representations. We are going to use **GloVe** embeddings, which are published on [the GloVe page](https://nlp.stanford.edu/projects/glove/) inside [this zip file](https://nlp.stanford.edu/data/glove.6B.zip). The code below loads the 100-dimensional Wikipedia + Gigaword vectors through Gensim's downloader.



In [1]:
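import gensim.downloader as api

# Load the 100-dimensional GloVe vectors once; api.load returns a
# KeyedVectors object and caches the download locally.
model = api.load("glove-wiki-gigaword-100")
glove = model  # alias for the same vectors, used by the capital/country cell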


In [2]:
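def predict_analogy(a, b, c, model):
    """Predict d in the analogy a : b :: c : d, or return None if no answer is found."""
    # Skip analogies that contain out-of-vocabulary words
    if any(w not in model.key_to_index for w in (a, b, c)):
        return None

    # most_similar ranks words by cosine similarity to b - a + c
    for word, _ in model.most_similar(positive=[b, c], negative=[a], topn=10):
        # Defensive check: never return one of the query words
        if word not in {a, b, c}:
            return word
    return None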
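
The arithmetic behind `most_similar` can be illustrated without the full model. The following is a minimal sketch on made-up 3-dimensional vectors; the `toy_vectors` values and the `analogy` helper are hypothetical and not part of Gensim. It unit-normalizes each vector, combines them as b - a + c, and ranks the remaining words by cosine similarity, which is roughly the idea behind Gensim's method rather than its actual implementation.

```python
import math

# Hypothetical toy embeddings; the real GloVe vectors used here have 100 dimensions.
toy_vectors = {
    "walk":   [1.0, 0.0, 0.2],
    "walked": [1.0, 1.0, 0.2],
    "jump":   [0.2, 0.0, 1.0],
    "jumped": [0.2, 1.0, 1.0],
    "table":  [-1.0, 0.1, -0.5],
}

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def analogy(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding a, b and c."""
    ua, ub, uc = (unit(vectors[w]) for w in (a, b, c))
    target = unit([xb - xa + xc for xa, xb, xc in zip(ua, ub, uc)])
    best_word, best_score = None, -2.0  # cosine similarity is never below -1
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        score = sum(x * y for x, y in zip(target, unit(vec)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

print(analogy("walk", "walked", "jump", toy_vectors))  # prints "jumped"
```

On these toy values the second coordinate plays the role of a "past tense" direction, so adding it to `jump` lands nearest to `jumped`.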



In [3]:
def evaluate_analogies(file_path, model):
    """Return (accuracy, correct, total) over the in-vocabulary analogies in file_path."""
    total = 0
    correct = 0

    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(":"):
                continue

            words = line.split()
            if len(words) != 4:
                continue

            a, b, c, d = words

            # Skip if any word is OOV
            if any(w not in model.key_to_index for w in (a, b, c, d)):
                continue

            prediction = predict_analogy(a, b, c, model)
            if prediction is None:
                continue

            total += 1
            if prediction == d:
                correct += 1

    accuracy = correct / total if total > 0 else 0.0
    return accuracy, correct, total


In [4]:
evaluate_analogies("past-tense.txt", model)

(0.5544871794871795, 865, 1560)
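
The evaluation above skips lines that start with `:`. In the common analogy-file convention those lines name a section, so when a file contains several sections a per-section breakdown can be more informative than a single number. A minimal sketch, assuming that convention holds for the file at hand: `accuracy_by_section`, the sample lines, and the stand-in predictor are all hypothetical, and `predict` is any callable taking `(a, b, c)` and returning a word or `None`.

```python
from collections import defaultdict

def accuracy_by_section(lines, predict):
    """Map each section name to a (correct, total) pair."""
    counts = defaultdict(lambda: [0, 0])  # section -> [correct, total]
    section = "unlabeled"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            # A header line such as ": past-tense" starts a new section
            section = line[1:].strip() or "unlabeled"
            continue
        words = line.split()
        if len(words) != 4:
            continue
        a, b, c, d = words
        prediction = predict(a, b, c)
        if prediction is None:
            continue
        counts[section][1] += 1
        if prediction == d:
            counts[section][0] += 1
    return {name: tuple(pair) for name, pair in counts.items()}

# Hypothetical sample data and a stand-in predictor that just appends "ed".
sample = [
    ": past-tense",
    "walk walked jump jumped",
    "walk walked go went",
]
result = accuracy_by_section(sample, lambda a, b, c: c + "ed")
print(result)  # {'past-tense': (1, 2)}
```

With the real vectors, the predictor could be `lambda a, b, c: predict_analogy(a, b, c, model)` and `lines` an open file handle.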

In [5]:
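def glove_predict_capital_country(cap1, country1, cap2, glove):
    """Predict the country for cap2, given that cap1 is the capital of country1."""
    # The glove.6B vocabulary is lowercased, so "Athens" would be out of vocabulary
    cap1, country1, cap2 = (w.lower() for w in (cap1, country1, cap2))
    try:
        return glove.most_similar(
            positive=[country1, cap2],
            negative=[cap1],
            topn=1
        )[0][0]
    except KeyError:
        # Raised when any of the three words is missing from the vocabulary
        return None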


In [6]:
glove_predict_capital_country("Athens", "Greece", "Berlin", glove)

## Summary: GloVe Analysis using Gensim

This notebook explores the use of pre-trained **GloVe (Global Vectors for Word Representation)** embeddings to perform semantic tasks and evaluate model performance using the **Gensim** library.

### Key Tasks & Implementations:
* **Pre-trained Model Loading**: 
    * Utilized the `gensim.downloader` API to load the `glove-wiki-gigaword-100` model.
    * Leveraged efficient word vector storage via `KeyedVectors`.
* **Analogy Prediction**: 
    * Implemented a `predict_analogy` function using the standard vector arithmetic formula: $Word_b - Word_a + Word_c \approx Word_d$.
    * Used Gensim's `most_similar` method to find the closest vector match for target analogies.
* **Performance Evaluation**: 
    * Evaluated the model on the `past-tense.txt` analogy dataset, skipping analogies that contain out-of-vocabulary words.
    * Measured an accuracy of about 55.45% (865 correct out of 1,560 evaluated analogies) as an indication of how well the model captures this grammatical transformation.
* **Semantic Applications**: 
    * Developed specific utility functions, such as `glove_predict_capital_country`, to demonstrate the model's ability to map geographical and political relationships (e.g., Athens is to Greece as Berlin is to Germany).

