### FastText Cosine Similarity Method

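This method uses FastText, a word embedding model developed by Facebook AI Research (FAIR), to compute the semantic similarity between two sentences. Each sentence is represented by the average of its word vectors, and the two averages are compared with cosine similarity.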

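FastText differs from Word2Vec and GloVe in that it does not treat each word as an atomic unit: it also breaks words down into **character-level n-grams**. A full FastText model can therefore build embeddings for **out-of-vocabulary (OOV)** words such as typos, rare words, or newly coined terms, which makes it more robust than models that store only whole-word vectors. Note that the helper in this section averages only the words it finds in the loaded vocabulary (`word.lower() in model`), so any token without a vector is skipped.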
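
To make the n-gram idea concrete, here is a minimal sketch of how a word can be split into character n-grams. It assumes the `<` and `>` boundary markers and the 3 to 6 length range that FastText uses by default; `char_ngrams` is a hypothetical helper written for illustration, not part of gensim or the fastText library, and it leaves out details such as hashing n-grams into buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in '<' and '>' markers.

    Illustrative helper only (hypothetical name); not a library function.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

# Trigrams only, to keep the output short.
print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

A misspelling such as "wher" still shares several of these n-grams with "where", which is why a subword-aware model can place the two close together even if one of them was never seen during training.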


In [1]:
import gensim.downloader as api
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


In [5]:
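def average_vector(sentence, model):
    # Average the vectors of the in-vocabulary words in the sentence.
    # Tokens come from a plain whitespace split, so trailing punctuation stays
    # attached (e.g. "pets."); such tokens are skipped if they have no vector.
    words = [word.lower() for word in sentence.split() if word.lower() in model]
    if not words:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(model.vector_size)
    return np.mean([model[word] for word in words], axis=0)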

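# Load the pre-trained vectors once instead of on every call;
# the first load downloads the model, which is large.
model = api.load('fasttext-wiki-news-subwords-300')

def fasttext_cosine(sent1, sent2):
    # Compare the two averaged sentence vectors.
    vec1 = average_vector(sent1, model)
    vec2 = average_vector(sent2, model)
    return cosine_similarity([vec1], [vec2])[0][0]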


In [6]:
sent1 = "Dogs are wonderful pets."
sent2 = "Cats are amazing companions."
score = fasttext_cosine(sent1, sent2)
print(f"FastText Cosine Similarity: {score:.4f}")

FastText Cosine Similarity: 0.9142
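
For reference, the score above is a cosine similarity: the dot product of the two sentence vectors divided by the product of their lengths. The sketch below writes that out directly with NumPy on small toy vectors, so it runs without downloading the model. The `cosine` function and the vectors `a`, `b`, and `c` are made up for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (||u|| * ||v||)."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    if norm == 0.0:
        # Treat a zero vector (e.g. a sentence with no known words) as 0 similarity.
        return 0.0
    return float(np.dot(u, v) / norm)

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a
c = np.array([3.0, 0.0, -1.0])  # orthogonal to a

print(cosine(a, b))  # close to 1: the vectors point the same way
print(cosine(a, c))  # 0: the vectors are orthogonal
```

Because the measure depends only on direction, not magnitude, averaging word vectors and then comparing directions is a reasonable first approximation of sentence similarity, although it ignores word order.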
