# GloVe

## Overview
- **Model Objective**: Generate word vectors that capture as much semantic and syntactic information as possible.
- **Input**: A text corpus
- **Output**: One vector per vocabulary word

**Method Summary**:
1. Construct a word co-occurrence matrix based on the corpus.
2. Train word vectors using the co-occurrence matrix and the GloVe model.

**Flow**:
Start -> Compute co-occurrence matrix -> Train word vectors -> End
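
The first step of the flow can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function name `build_cooccurrence`, the toy corpus, and the window size of 2 are all assumptions made for the example. It weights each pair by the inverse of the distance between the two words, which is the scheme the GloVe reference code uses by default.

```python
from collections import defaultdict


def build_cooccurrence(sentences, window_size=2):
    """Count distance-weighted co-occurrences in a symmetric context window.

    Hypothetical helper for illustration; returns {(word, context): weight}.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Scan only the left context and update both directions,
            # which keeps the resulting matrix symmetric.
            start = max(0, i - window_size)
            for j in range(start, i):
                weight = 1.0 / (i - j)  # closer words contribute more
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return dict(counts)


# Toy corpus, made up for this example
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
cooc = build_cooccurrence(corpus, window_size=2)
print(cooc[("sat", "on")])
```

Only pairs that actually co-occur get an entry, so the result is a sparse dictionary rather than a dense vocabulary-by-vocabulary matrix.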

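```python
from gensim.models import KeyedVectors

# Load pretrained GloVe vectors. GloVe text files have no header line, so
# no_header=True (gensim 4.0 or later) reads them directly, without first
# converting them to Word2Vec format.
glove_path = "glove.6B.100d.txt"  # Download: https://nlp.stanford.edu/projects/glove/
model = KeyedVectors.load_word2vec_format(glove_path, binary=False, no_header=True)

# Look up a word vector and the five nearest neighbours of "king"
vector = model["apple"]
similar_words = model.most_similar("king", topn=5)
```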

## GloVe vs. Word2Vec: Comprehensive Comparison

### Core Differences

| Feature             | GloVe                                         | Word2Vec                               |
|---------------------|-----------------------------------------------|----------------------------------------|
| **Approach**        | Global co-occurrence statistics               | Local context window prediction        |
| **Training Method** | Factorization of the log co-occurrence matrix | Shallow neural network (Skip-gram/CBOW) |
| **Data Usage**      | Co-occurrence counts over the entire corpus   | One sliding-window context at a time   |
| **Optimization**    | Weighted least squares minimization           | Negative sampling/hierarchical softmax |

### Performance Comparison

| Aspect              | GloVe                                                                     | Word2Vec                          |
|---------------------|---------------------------------------------------------------------------|-----------------------------------|
| Word Analogies      | Often slightly better (e.g., king → queen); depends on corpus and tuning  | Often slightly worse              |
| Training Speed      | Slower (the co-occurrence matrix must be built first)                     | Faster                            |
| Memory Usage        | Higher (stores the co-occurrence counts)                                  | Lower (streams over the corpus)   |
| Rare Word Handling  | More stable                                                               | Less stable                       |
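
The word analogy row refers to queries of the form "man is to woman as king is to ?", answered by vector arithmetic followed by a nearest-neighbour search. The sketch below shows the idea on hand-made three-dimensional vectors. The vectors are invented so that the arithmetic works out and are not real GloVe or Word2Vec embeddings, and the `analogy` helper is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as analogy benchmarks usually do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Invented toy vectors: roughly [is-a-person, female, royal]
toy_vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [-1.0, 0.3, -0.5],
}

result = analogy(toy_vectors, "man", "woman", "king")
print(result)
```

With real embeddings loaded through gensim, a comparable query is `model.most_similar(positive=["woman", "king"], negative=["man"])`.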