
# GloVe: A Comprehensive Overview

This notebook provides an in-depth overview of GloVe (Global Vectors for Word Representation), including its history, mathematical foundation, implementation, usage, advantages and disadvantages, and more. We'll also include visualizations and a discussion of the model's impact and applications.



## History of GloVe

GloVe was introduced by Jeffrey Pennington, Richard Socher, and Christopher Manning at Stanford University in 2014 in the paper "GloVe: Global Vectors for Word Representation." The model was developed to address some of the limitations of earlier word embedding models like Word2Vec. GloVe differs from Word2Vec by incorporating global word co-occurrence statistics from a corpus, allowing it to capture a more nuanced understanding of word relationships. GloVe has since become one of the most widely used w...



## Mathematical Foundation of GloVe

### Co-occurrence Matrix

GloVe starts by constructing a word co-occurrence matrix from a large corpus, where each entry \(X_{ij}\) counts how many times word \(j\) occurs in the context of word \(i\), that is, within a fixed-size window around it. Because the matrix is accumulated over the entire corpus, it summarizes global co-occurrence statistics rather than individual local contexts.
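
The counting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the function name `build_cooccurrence` and the toy sentences are assumptions, and it uses unweighted counts, whereas the original GloVe paper down-weights a pair that is \(d\) tokens apart by \(1/d\).

```python
from collections import defaultdict

def build_cooccurrence(sentences, window=2):
    """Count how often each context word falls within `window` tokens of each target word."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[(target, tokens[j])] += 1.0
    return dict(counts)

# Hypothetical toy corpus of pre-tokenized sentences
toy_sentences = [
    ["king", "queen", "man", "woman"],
    ["king", "man", "kingdom"],
]
X = build_cooccurrence(toy_sentences, window=2)
print(X[("king", "man")])       # "man" is within 2 tokens of "king" in both sentences
print(("king", "woman") in X)   # "woman" is 3 tokens from "king", outside the window
```

Storing only the observed pairs in a dictionary keeps the structure sparse, which matters because most word pairs never co-occur.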

### Objective Function

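The key insight of GloVe is that ratios of co-occurrence probabilities, rather than the raw probabilities themselves, encode meaningful word relationships. With \(P_{ik} = X_{ik} / X_i\) and \(X_i = \sum_k X_{ik}\), the ratio \(P_{ik} / P_{jk}\) for a probe word \(k\) is large when \(k\) is related to word \(i\) but not to word \(j\), small in the reverse case, and close to 1 when \(k\) is related to both or to neither. The model defines a weighted least-squares objective function to learn word vectors that capture these relationships: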

\[
J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
\]

Where:
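- \(X_{ij}\) is the co-occurrence matrix entry for words \(i\) and \(j\).
- \(|V|\) is the size of the vocabulary.
- \(w_i\) is the word vector for word \(i\) and \(\tilde{w}_j\) is the separate context vector for word \(j\).
- \(b_i\) and \(\tilde{b}_j\) are the corresponding bias terms.
- \(f(X_{ij})\) is a weighting function that gives small weights to rare co-occurrences and caps the weight of very frequent ones. Because \(f(0) = 0\), pairs that never co-occur (for which \(\log X_{ij}\) is undefined) drop out of the sum.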
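
To make the objective concrete, the following sketch evaluates \(J\) with NumPy for a small dense matrix. The function name `glove_loss`, the `weight_fn` argument, and the toy inputs are assumptions made for illustration; the weighting function is passed in as a callable so that any choice of \(f\) can be plugged in.

```python
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, weight_fn):
    """Evaluate J on a dense co-occurrence matrix X, skipping zero entries."""
    rows, cols = np.nonzero(X)  # pairs (i, j) with X_ij > 0
    x = X[rows, cols]
    # w_i . w~_j + b_i + b~_j for every non-zero pair
    pred = np.sum(W[rows] * W_tilde[cols], axis=1) + b[rows] + b_tilde[cols]
    return float(np.sum(weight_fn(x) * (pred - np.log(x)) ** 2))

# Hypothetical 2-word example: all parameters zero, unit weights as a placeholder for f
X_toy = np.array([[0.0, np.e], [np.e ** 2, 0.0]])
zero_vectors = np.zeros((2, 3))
zero_biases = np.zeros(2)
loss = glove_loss(X_toy, zero_vectors, zero_vectors, zero_biases, zero_biases,
                  lambda x: np.ones_like(x))
print(loss)  # squared residuals (0 - 1)^2 + (0 - 2)^2, so approximately 5.0
```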

### Weighting Function

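The weighting function \(f(X_{ij})\) is designed so that neither end of the frequency range dominates the objective: rare co-occurrences, which tend to be noisy, receive small weights, while the weight of very frequent co-occurrences is capped at 1: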

\[
f(X_{ij}) = \left\{
    \begin{array}{ll}
    \left(\frac{X_{ij}}{X_{\max}}\right)^\alpha & \text{if } X_{ij} < X_{\max} \\
    1 & \text{otherwise}
    \end{array}
\right.
\]

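Where \(X_{\max}\) is the cutoff above which the weight stays at 1 and \(\alpha\) controls how quickly the weight grows below that cutoff. The original paper uses \(X_{\max} = 100\) and \(\alpha = 3/4\).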
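
As a quick illustration, the piecewise definition translates directly into NumPy. The function name `glove_weight` is an assumption, and the defaults follow the values reported in the original paper (\(X_{\max} = 100\), \(\alpha = 3/4\)).

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Piecewise weighting: (x / x_max)^alpha below the cutoff, 1 at or above it."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

weights = glove_weight([0, 1, 10, 100, 1000])
print(weights)  # roughly [0, 0.032, 0.178, 1, 1]
```

Note that a count of zero receives weight zero, and every count at or above the cutoff is treated identically.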

### Training

GloVe is trained by minimizing the objective function \(J\) with stochastic gradient descent (SGD) or one of its variants; the original paper uses AdaGrad, sampling the non-zero entries of the co-occurrence matrix. The resulting word vectors are learned such that words appearing in similar contexts have similar vector representations, capturing semantic relationships between words.
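
The training loop can be sketched with plain per-entry SGD in NumPy. This is a simplified illustration under stated assumptions, not the reference implementation: it uses a fixed learning rate instead of AdaGrad, a dense matrix, and hypothetical names (`train_glove`, `X_small`). For each non-zero entry, the residual \(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\) is scaled by \(2 f(X_{ij})\) and used to update both vectors and both biases.

```python
import numpy as np

def train_glove(X, dim=5, epochs=200, lr=0.05, x_max=100.0, alpha=0.75, seed=0):
    """Minimize the GloVe objective over non-zero entries of X with plain SGD."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.uniform(-0.5, 0.5, (n, dim)) / dim        # word vectors
    W_tilde = rng.uniform(-0.5, 0.5, (n, dim)) / dim  # context vectors
    b = np.zeros(n)
    b_tilde = np.zeros(n)
    rows, cols = np.nonzero(X)
    history = []  # weighted squared error accumulated during each epoch
    for _ in range(epochs):
        total = 0.0
        for k in rng.permutation(len(rows)):  # visit entries in random order
            i, j = rows[k], cols[k]
            x = X[i, j]
            f = (x / x_max) ** alpha if x < x_max else 1.0
            diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(x)
            total += f * diff ** 2
            g = 2.0 * f * diff  # shared factor in all four gradients
            grad_w, grad_w_tilde = g * W_tilde[j], g * W[i]
            W[i] -= lr * grad_w
            W_tilde[j] -= lr * grad_w_tilde
            b[i] -= lr * g
            b_tilde[j] -= lr * g
        history.append(total)
    return W, W_tilde, b, b_tilde, history

# Hypothetical symmetric co-occurrence counts for a 3-word vocabulary
X_small = np.array([[0.0, 4.0, 1.0],
                    [4.0, 0.0, 2.0],
                    [1.0, 2.0, 0.0]])
W, W_tilde, b, b_tilde, history = train_glove(X_small)
print(history[0], history[-1])  # the loss should shrink over the epochs
```

The original paper reports using the sum \(W + \tilde{W}\) of the two learned matrices as the final word vectors.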



## Implementation in Python

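We'll train a small GloVe model using the `glove-python-binary` library, which is imported as `glove`. The example demonstrates how to train GloVe embeddings on a sample corpus and visualize the resulting word vectors. The corpus is deliberately tiny, so the vectors it produces only illustrate the workflow and should not be expected to be semantically meaningful.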


In [None]:

import numpy as np
from glove import Corpus, Glove
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Sample corpus: each sentence is a list of pre-tokenized words
sentences = [
    ["king", "queen", "man", "woman"],
    ["king", "man", "kingdom"],
    ["queen", "woman", "monarchy"],
    ["man", "woman", "child"],
    ["woman", "queen", "lady"],
    ["man", "king", "lord"]
]

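# Build the co-occurrence matrix from the corpus (context window of 2 tokens)
corpus = Corpus()
corpus.fit(sentences, window=2)

# Train 50-dimensional GloVe vectors on the co-occurrence matrix
glove = Glove(no_components=50, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=10, no_threads=1, verbose=True)
# Attach the word-to-index mapping so vectors can be looked up by word
glove.add_dictionary(corpus.dictionary)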

# Print word vectors
print("Vector for 'king':", glove.word_vectors[glove.dictionary['king']])
print("Vector for 'queen':", glove.word_vectors[glove.dictionary['queen']])

# Plot word vectors using PCA
words = list(glove.dictionary.keys())
vectors = np.array([glove.word_vectors[glove.dictionary[word]] for word in words])

pca = PCA(n_components=2)
result = pca.fit_transform(vectors)

plt.figure(figsize=(8, 6))
plt.scatter(result[:, 0], result[:, 1])

for i, word in enumerate(words):
    plt.annotate(word, xy=(result[i, 0], result[i, 1]))

plt.title('GloVe Word Embeddings')
plt.show()
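
Once vectors are trained, a common way to use them is to rank words by cosine similarity. The sketch below is self-contained and uses hypothetical 2-dimensional vectors rather than the trained model; the helper names `cosine_similarity` and `nearest_neighbors` are assumptions. The same function should work with the trained model by passing `glove.dictionary` and `glove.word_vectors`.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbors(query, dictionary, vectors, top_k=3):
    """Return the top_k words most similar to `query`, with their scores."""
    q = vectors[dictionary[query]]
    scores = [(word, cosine_similarity(q, vectors[idx]))
              for word, idx in dictionary.items() if word != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical vectors chosen by hand for illustration only
toy_dictionary = {"king": 0, "queen": 1, "child": 2}
toy_vectors = np.array([[1.0, 0.0],
                        [0.9, 0.1],
                        [0.0, 1.0]])
neighbors = nearest_neighbors("king", toy_dictionary, toy_vectors, top_k=2)
print(neighbors)  # "queen" ranks above "child" for these hand-picked vectors
```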



## Pros and Cons of GloVe

### Advantages
- **Global Context**: GloVe is fit to co-occurrence counts aggregated over the entire corpus, so it uses corpus-wide statistics directly, whereas Word2Vec learns from one local context window at a time.
- **Efficient Training**: Once the co-occurrence matrix has been built, training cost depends on the number of non-zero entries rather than on the length of the corpus, so GloVe scales well to large corpora and vocabularies.
- **Semantic Relationships**: The embeddings generated by GloVe capture semantic regularities such as word similarity and analogies, making them useful as input features for various NLP tasks.

### Disadvantages
- **Memory Usage**: The co-occurrence matrix grows with the vocabulary, so building and storing it can require substantial memory for large vocabularies.
- **Context Independence**: Like Word2Vec, GloVe assigns each word a single static vector regardless of the sentence it appears in, so it cannot distinguish the different senses of polysemous words (words with multiple meanings).
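
To give a rough sense of the memory concern, the following back-of-the-envelope calculation estimates the size of a fully dense co-occurrence matrix. The function name `dense_matrix_gib`, the 4-byte entry size, and the vocabulary sizes are illustrative assumptions; in practice only the non-zero entries need to be stored, which is far smaller but still grows with corpus and vocabulary size.

```python
def dense_matrix_gib(vocab_size, bytes_per_entry=4):
    """Memory in GiB for a dense vocab_size x vocab_size matrix."""
    return vocab_size ** 2 * bytes_per_entry / 2 ** 30

for vocab_size in (10_000, 100_000, 400_000):
    print(f"{vocab_size:>7} words: {dense_matrix_gib(vocab_size):8.1f} GiB")
```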



## Conclusion

GloVe is a powerful model for generating word embeddings that capture semantic relationships between words by leveraging global word co-occurrence statistics. It has become one of the most widely used methods for word representation in NLP, providing high-quality embeddings that are effective for a variety of tasks. While GloVe has some limitations, such as context independence and memory usage, its impact on the field of NLP is significant, and it remains a valuable tool for many applications.
