## FastText

FastText is a word embedding model that extends Word2Vec by representing each word as a **bag of character n-grams**. This lets it use subword information such as **prefixes, suffixes, and roots**, which makes it especially useful for morphologically rich languages and for rare or misspelled words.


### 🧠 How FastText Works

Instead of learning a single vector per word, FastText:

- Wraps each word in boundary markers `<` and `>` and breaks it into character n-grams (e.g., with n = 3, “where” → `<wh, whe, her, ere, re>`; in practice n typically ranges from 3 to 6).
- Learns an embedding for each n-gram.
- Represents a word as the sum of its n-gram vectors (the whole word, e.g. `<where>`, is also kept as one extra unit).

So even if a word was not seen during training, FastText can still build a vector for it from its subword parts.
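The n-gram extraction step above can be sketched in a few lines of plain Python. This is a simplified illustration, not gensim's implementation: `char_ngrams` is a hypothetical helper, and its defaults `min_n=3`, `max_n=6` are assumed to mirror the usual FastText range. The whole-word unit `<where>` is handled separately in FastText, so it is not produced here.

```python
def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Return the character n-grams of `word`, FastText-style (sketch).

    The word is wrapped in '<' and '>' so that a prefix such as '<wh'
    is distinguishable from the same letters in the middle of a word.
    """
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the padded word
        for start in range(len(padded) - n + 1):
            grams.append(padded[start:start + n])
    return grams


# Only trigrams, to reproduce the "where" example
print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

With the default range (3 to 6), "where" yields 14 n-grams, including the prefix `<where` and the suffix `where>`.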


### ✅ Advantages

- Handles out-of-vocabulary (OOV) words using subword units.
- Better for rare words and morphologically complex languages.
- Fast to train and supports both CBOW and Skip-Gram.
- Pretrained models are available for many languages.
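To see how subword units give an OOV word a vector, here is a toy sketch. It assumes, as FastText implementations do, that n-grams are hashed into a fixed number of buckets, each bucket holding one vector. The specific hash (32-bit FNV-1a), the tiny table size, and the random "trained" vectors are illustrative assumptions, and `fnv1a_32` and `oov_vector` are hypothetical helper names, not gensim's API.

```python
import random

DIM = 8         # toy embedding size (assumption)
BUCKETS = 1000  # toy bucket count; real models use millions


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of `text` (assumed hash)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Character n-grams of `word`, padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


# Stand-in for a trained n-gram table: one random vector per bucket.
rng = random.Random(0)
ngram_table = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)]
               for _ in range(BUCKETS)]


def oov_vector(word: str) -> list[float]:
    """Average the bucket vectors of the word's n-grams (hypothetical helper)."""
    rows = [ngram_table[fnv1a_32(g) % BUCKETS] for g in char_ngrams(word)]
    if not rows:  # word too short to have any n-grams
        return [0.0] * DIM
    return [sum(col) / len(rows) for col in zip(*rows)]


vec = oov_vector("learnify")
print(len(vec))  # 8
```

This sketch averages the n-gram vectors; summing them instead, as in the original FastText formulation, changes only the vector's length, not its direction. Because "learnify" shares n-grams such as `<lea` and `arn` with "learning", the two words reuse some of the same table rows, which is what lets an unseen word inherit information from related known words.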


### ❌ Disadvantages
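- Larger model size, because vectors for character n-grams (hashed into a fixed number of buckets) are stored alongside the word vectors.
- Less effective than contextual models (such as BERT) at capturing nuanced, context-dependent meaning.
- No context awareness: it is still a static embedding model, so each word gets one vector regardless of the sentence it appears in.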
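A back-of-the-envelope calculation shows why the n-gram table dominates model size. The numbers assume gensim's default `bucket=2000000` and `vector_size=100` with 32-bit floats; the vectors for the vocabulary words come on top of this. `ngram_table_bytes` is a hypothetical helper for the arithmetic only.

```python
def ngram_table_bytes(buckets: int, dim: int, bytes_per_float: int = 4) -> int:
    """Size in bytes of a dense (buckets x dim) matrix of floats."""
    return buckets * dim * bytes_per_float


# Assumed settings: 2,000,000 buckets, 100 dimensions, float32
size = ngram_table_bytes(2_000_000, 100)
print(f"{size / 1e6:.0f} MB")  # 800 MB
```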


```python
import gensim
dir(gensim.models)
```

```text
['AtireBM25Model',
 'AuthorTopicModel',
 'BackMappingTranslationMatrix',
 'CoherenceModel',
 'Doc2Vec',
 'EnsembleLda',
 'FAST_VERSION',
 'FastText',
 'HdpModel',
 'KeyedVectors',
 'LdaModel',
 'LdaMulticore',
 'LdaSeqModel',
 'LogEntropyModel',
 'LsiModel',
 'LuceneBM25Model',
 'Nmf',
 'NormModel',
 'OkapiBM25Model',
 'Phrases',
 'RpModel',
 'TfidfModel',
 'TranslationMatrix',
 'VocabTransform',
 'Word2Vec',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_fasttext_bin',
 'atmodel',
 'basemodel',
 'bm25model',
 'callbacks',
 'coherencemodel',
 'doc2vec',
 'doc2vec_corpusfile',
 'doc2vec_inner',
 'ensemblelda',
 'fasttext',
 'fasttext_corpusfile',
 'fasttext_inner',
 'hdpmodel',
 'interfaces',
 'keyedvectors',
 'ldamodel',
 'ldamulticore',
 'ldaseqmodel',
 'logentropy_model',
 'lsimodel',
 'nmf',
 'nmf_pgd',
 'normmodel',
 'phrases',
 'rpmodel',
 'tfidfmodel',
 'translation_matrix',
 'utils',
 'word2vec',
```


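```python
from gensim.models import FastText

# Toy corpus: a list of tokenized sentences (far too small for meaningful embeddings)
sentences = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "is", "a", "subset", "of", "machine", "learning"],
    ["natural", "language", "processing", "is", "cool"]
]

# Train a FastText model
model = FastText(sentences,
                 vector_size=50,  # dimensionality of the word and n-gram vectors
                 window=3,        # max distance between target and context word
                 min_count=1,     # keep every word, even those seen only once
                 sg=1)            # 1 = Skip-Gram, 0 = CBOW

# Vector of an in-vocabulary word
print("Vector for 'learning':\n", model.wv['learning'])

# Out-of-vocabulary word: its vector is built from its character n-grams
print("\nVector for 'learnify' (OOV):\n", model.wv['learnify'])

# Nearest neighbours, ranked by cosine similarity
print("\nWords similar to 'machine':", model.wv.most_similar('machine'))
```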

```text
Vector for 'learning':
 [-0.00161744  0.00030467 -0.00056338 -0.0019255  -0.00086632  0.00120399
  0.00256037 -0.00128126 -0.00378282  0.00225328  0.00014133 -0.00116593
 -0.00028679  0.00124179  0.00103948 -0.00108299  0.00153361 -0.00042023
 -0.00056507  0.00373649  0.00175249 -0.00037233 -0.00339002  0.00169573
  0.00209186 -0.00228598 -0.00193779  0.00256243 -0.00136976  0.00425175
  0.00310678  0.00304489 -0.0020575   0.00160803 -0.00127648 -0.00108021
  0.00054358  0.00061267  0.00377098  0.00531652  0.00313346  0.00023878
  0.00367623  0.00027897 -0.00081464 -0.00064092 -0.00443881 -0.00462794
 -0.00243396  0.00139452]

Vector for 'learnify' (OOV):
 [-8.0400368e-04  9.2568365e-04  1.0074117e-03 -3.8599859e-03
  2.0799220e-04 -5.9383974e-04  1.7541428e-03 -5.6410010e-04
 -2.3562456e-03  4.5533427e-03  9.7036612e-04  2.2101323e-03
  2.9260556e-03  2.3477278e-03  2.5861780e-03 -9.7213913e-04
 -3.8842298e-04  1.1546517e-03 -7.9165213e-04  2.6887755e-03
 -2.3300997e-03 -2.8159944e-03
```
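`most_similar` ranks the vocabulary by cosine similarity to the query vector. With only three training sentences, the learned vectors are likely still close to their random initialization (note the values of order 1e-3 above), so the neighbours returned for 'machine' should not be read as meaningful. The stdlib sketch below shows only the ranking step, using hand-picked toy vectors; `cosine`, `rank_by_cosine`, and `toy` are hypothetical names, not gensim's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query, vectors, topn=3):
    """Rank every other word by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scores = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical 3-d vectors, chosen by hand for illustration
toy = {
    "machine":  [0.9, 0.1, 0.0],
    "learning": [0.8, 0.2, 0.1],
    "cool":     [0.0, 0.1, 0.9],
}
print(rank_by_cosine("machine", toy, topn=2))  # 'learning' ranks first, 'cool' second
```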

### Word2Vec vs FastText

|Feature|Word2Vec|FastText|
|---------|---------|----------|
|Unit of representation|Whole words only|Words + character n-grams (subwords)|
|OOV handling|Cannot handle out-of-vocabulary (OOV) words|Can generate vectors for unseen words from their subwords|
|Morphology awareness|Ignores the internal structure of words|Captures prefixes, suffixes, and roots|
|Training objective|Predict context or target word (CBOW/Skip-Gram)|Same, but words are represented through their subword vectors|
|Model size|Smaller|Larger, due to n-gram vector storage|
|Performance on rare words|Weaker|Stronger|
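The "Training objective" row can be made concrete for Skip-Gram. Following the notation of the original FastText paper, Word2Vec scores a (target word $w$, context word $c$) pair with a single dot product, while FastText replaces the target-word vector with a sum over the word's n-gram vectors:

$$
s_{\text{Word2Vec}}(w, c) = \mathbf{u}_w^\top \mathbf{v}_c
\qquad\qquad
s_{\text{FastText}}(w, c) = \sum_{g \in \mathcal{G}_w} \mathbf{z}_g^\top \mathbf{v}_c
$$

Here $\mathbf{u}_w$ is the word vector of $w$, $\mathcal{G}_w$ is the set of n-grams of $w$ (including the whole word `<w>` itself), $\mathbf{z}_g$ is the vector of n-gram $g$, and $\mathbf{v}_c$ is the context vector of $c$. The rest of the objective is unchanged; only the way the target word is represented differs.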