# Comparison of FastText and Word2Vec 

Facebook Research recently open-sourced a great project, [fastText](https://github.com/facebookresearch/fastText): a fast (no surprise) and effective method for learning word representations and performing text classification. I was curious how these embeddings compare to other commonly used embeddings, and word2vec seemed like the obvious choice, especially since fastText embeddings are an extension of word2vec.

I used Gensim to train the word2vec models, and compared them with the fastText models on the analogical reasoning task (described in Section 4.1 of [[2]](https://arxiv.org/pdf/1301.3781v3.pdf)). All the embeddings compared here were trained with the skipgram architecture.

# Download data

```python
import os.path
import nltk

# Only the Brown corpus is needed
nltk.download('brown')

# Generate a Brown corpus text file of space-separated tokens
with open('brown_corp.txt', 'w') as f:
    for word in nltk.corpus.brown.words():
        f.write('{word} '.format(word=word))

# Download the text8 corpus (a 100 MB sample of cleaned Wikipedia text)
if not os.path.isfile('text8'):
    !wget http://mattmahoney.net/dc/text8.zip
    !unzip text8.zip
```

```
--2016-08-11 01:55:34--  http://mattmahoney.net/dc/text8.zip
Resolving mattmahoney.net (mattmahoney.net)... 98.139.135.129
Connecting to mattmahoney.net (mattmahoney.net)|98.139.135.129|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31344016 (30M) [application/zip]
Saving to: ‘text8.zip’

2016-08-11 01:57:26 (274 KB/s) - ‘text8.zip’ saved [31344016/31344016]

Archive:  text8.zip
  inflating: text8
```


# Train models

To train the models yourself, you'll need both [Gensim](https://github.com/RaRe-Technologies/gensim) and [FastText](https://github.com/facebookresearch/fastText) set up on your machine.

```python
# Make sure you set FT_HOME to your fastText directory root
FT_HOME = 'fastText/'
MODELS_DIR = 'models/'
!mkdir -p {MODELS_DIR}

# Hyperparameters shared by every model (fastText CLI names)
lr = 0.05
dim = 100
ws = 5
epoch = 5
minCount = 5
neg = 5
loss = 'ns'
t = 1e-4

if not os.path.isfile(os.path.join(MODELS_DIR, 'brown_ft.vec')):
    print('Training fasttext on brown corpus..')
    %time !{FT_HOME}fasttext skipgram -input brown_corp.txt -output {MODELS_DIR+'brown_ft'}  -lr {lr} -dim {dim} -ws {ws} -epoch {epoch} -minCount {minCount} -neg {neg} -loss {loss} -t {t}
else:
    print('\nUsing existing model file brown_ft.vec')

# -maxn 0 disables char n-grams
if not os.path.isfile(os.path.join(MODELS_DIR, 'brown_ft_no_ng.vec')):
    print('\nTraining fasttext on brown corpus (without char n-grams)..')
    %time !{FT_HOME}fasttext skipgram -input brown_corp.txt -output {MODELS_DIR+'brown_ft_no_ng'}  -lr {lr} -dim {dim} -ws {ws} -epoch {epoch} -minCount {minCount} -neg {neg} -loss {loss} -t {t} -maxn 0
else:
    print('\nUsing existing model file brown_ft_no_ng.vec')

# Training gensim skipgram model on brown corpus
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

# Same values as used for fastText training above.
# Keyword names are for gensim < 4.0 (gensim 4 renamed `size` to `vector_size` and `iter` to `epochs`).
params = {
    'alpha': lr,
    'size': dim,
    'window': ws,
    'iter': epoch,
    'min_count': minCount,
    'sample': t,
    'sg': 1,
    'hs': 0,
    'negative': neg
}

if not os.path.isfile(os.path.join(MODELS_DIR, 'brown_gs.vec')):
    print('\nTraining word2vec on brown corpus..')
    # Text8Corpus reads a file of space-separated words
    %time brown_gs = Word2Vec(Text8Corpus('brown_corp.txt'), **params)
    # Older gensim API; newer versions use brown_gs.wv.save_word2vec_format(...)
    brown_gs.save_word2vec_format(MODELS_DIR + 'brown_gs.vec')
else:
    print('\nUsing existing model file brown_gs.vec')
```

```
Training fasttext on brown corpus..
Read 1M words
Progress: 100.0%  words/sec/thread: 40798  lr: 0.000001  loss: 2.291000  eta: 0h0m
Train time: 39.000000 sec
CPU times: user 1.09 s, sys: 104 ms, total: 1.19 s
Wall time: 44.8 s

Training fasttext on brown corpus (without char n-grams)..
Read 1M words
Progress: 100.0%  words/sec/thread: 74234  lr: 0.000001  loss: 2.360271  eta: 0h0m
Train time: 22.000000 sec
CPU times: user 756 ms, sys: 88 ms, total: 844 ms
Wall time: 27.4 s

Training word2vec on brown corpus..
2016-08-21 11:40:29,440 : INFO : collecting all words and their counts
2016-08-21 11:40:29,444 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2016-08-21 11:40:29,838 : INFO : collected 56057 word types from a corpus of 1161192 raw words and 117 sentences
2016-08-21 11:40:30,012 : INFO : min_count=5 retains 15173 unique words (drops 40884)
2016-08-21 11:40:30,013 : INFO : min_count leaves 1095086 word corpus (94% of original 1161192)
2016-08-21 11:40:30,063 : INFO : deleting the raw counts dictionary of 56057 items
2016-08-21 11:40:30,067 : INFO : sample=0.0001 downsamples 340 most-common words
2016-08-21 11:40:30,068 : INFO : downsampling leaves estimated 540252 word corpus (49.3% of prior 1095086)
2016-08-21 11:40:30,068 : INFO : estimated required memory for 15173 words and 100 dimensions: 19724900 bytes
2016-08-21 11:40:30,134 : INFO : resetting layer weights
2016-08-21 11:40:30,304 : INFO : training model with 3 workers on 15173 vocabulary and 100 featur

CPU times: user 36.9 s, sys: 44 ms, total: 37 s
Wall time: 13.9 s
```
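For readers on a current Gensim release: the `params` dict above uses the pre-4.0 keyword names. The sketch below maps the fastText flags used in this notebook onto `Word2Vec` keyword arguments for either naming scheme. `gensim_params` is a hypothetical helper (not part of either library), and it assumes gensim 4.0's renames of `size` to `vector_size` and `iter` to `epochs`. The char n-gram flags (`-minn`, `-maxn`) have no `Word2Vec` counterpart, so they are not mapped.

```python
def gensim_params(lr, dim, ws, epoch, min_count, neg, t, gensim4=True):
    """Hypothetical helper: map the fastText skipgram flags used above to gensim
    Word2Vec keyword arguments (pre-4.0 names if gensim4 is False)."""
    params = {
        'alpha': lr,             # -lr: initial learning rate
        'window': ws,            # -ws: context window size
        'min_count': min_count,  # -minCount: minimum word frequency
        'sample': t,             # -t: subsampling threshold
        'sg': 1,                 # skipgram architecture
        'hs': 0,                 # -loss ns: no hierarchical softmax...
        'negative': neg,         # ...but negative sampling with -neg samples
    }
    if gensim4:
        params['vector_size'] = dim  # -dim (was `size` before gensim 4.0)
        params['epochs'] = epoch     # -epoch (was `iter` before gensim 4.0)
    else:
        params['size'] = dim
        params['iter'] = epoch
    return params


# Rebuild the dict used in this notebook, then its gensim 4 equivalent
legacy = gensim_params(lr=0.05, dim=100, ws=5, epoch=5, min_count=5, neg=5, t=1e-4, gensim4=False)
modern = gensim_params(lr=0.05, dim=100, ws=5, epoch=5, min_count=5, neg=5, t=1e-4)
print(sorted(modern))
```

With gensim 4 installed, `Word2Vec(Text8Corpus('text8'), **modern)` should then train the same configuration.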


```python
if not os.path.isfile(os.path.join(MODELS_DIR, 'text8_ft.vec')):
    print("Training fastText skipgram model on text8 corpus...")
    %time !{FT_HOME}fasttext skipgram -input text8 -output {MODELS_DIR+'text8_ft'}  -lr {lr} -dim {dim} -ws {ws} -epoch {epoch} -minCount {minCount} -neg {neg} -loss {loss} -t {t}
else:
    print('\nUsing existing model file text8_ft.vec')

if not os.path.isfile(os.path.join(MODELS_DIR, 'text8_ft_no_ng.vec')):
    print("\nTraining fastText skipgram model on text8 corpus (without n-grams)...")
    %time !{FT_HOME}fasttext skipgram -input text8 -output {MODELS_DIR+'text8_ft_no_ng'} -lr {lr} -dim {dim} -ws {ws} -epoch {epoch} -minCount {minCount} -neg {neg} -loss {loss} -t {t} -maxn 0
else:
    print('\nUsing existing model file text8_ft_no_ng.vec')

if not os.path.isfile(os.path.join(MODELS_DIR, 'text8_gs.vec')):
    print("\nTraining word2vec on text8 corpus..")
    %time text8_gs = Word2Vec(Text8Corpus('text8'), **params)
    text8_gs.save_word2vec_format(MODELS_DIR + 'text8_gs.vec')
else:
    print('\nUsing existing model file text8_gs.vec')
```

```
Training fastText skipgram model on text8 corpus...
Read 17M words
Progress: 100.0%  words/sec/thread: 48719  lr: 0.000001  loss: 1.835444  eta: 0h0m
Train time: 518.000000 sec
CPU times: user 8.52 s, sys: 944 ms, total: 9.46 s
Wall time: 8min 51s

Training fastText skipgram model on text8 corpus (without n-grams)...
Read 17M words
Progress: 100.0%  words/sec/thread: 107885  lr: 0.000001  loss: 1.878955  eta: 0h0m
Train time: 219.000000 sec
CPU times: user 3.72 s, sys: 540 ms, total: 4.26 s
Wall time: 3min 51s

Training word2vec on text8 corpus..
CPU times: user 9min 2s, sys: 1.22 s, total: 9min 3s
Wall time: 3min 6s
```


# Comparisons

```python
# Download questions-words.txt, the analogy questions used to compare the embeddings
!wget https://raw.githubusercontent.com/tmikolov/word2vec/master/questions-words.txt
```

```
--2016-08-21 10:31:16--  https://raw.githubusercontent.com/tmikolov/word2vec/master/questions-words.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 603955 (590K) [text/plain]
Saving to: ‘questions-words.txt’

2016-08-21 10:31:17 (554 KB/s) - ‘questions-words.txt’ saved [603955/603955]
```



Once you have downloaded or trained the models and downloaded `questions-words.txt`, you're ready to run the comparison.
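Each section of `questions-words.txt` starts with a `: section-name` line, followed by one four-word question per line (`a b c d`, read as "a is to b as c is to d"). Gensim only scores questions whose four words are all in the model's vocabulary, which is why the Brown models below are evaluated on far fewer questions (5086) than the text8 models (over 12,000). A minimal sketch of that grouping and filtering; `load_analogy_sections` is a hypothetical helper, and lower-casing every word is a simplifying assumption of the sketch, not necessarily what Gensim does.

```python
def load_analogy_sections(lines, vocab):
    """Group analogy questions by section, skipping questions with out-of-vocabulary words.

    `lines` is any iterable of text lines (e.g. an open questions-words.txt file);
    `vocab` is a set of lower-cased words.
    """
    sections = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(': '):
            # New section header, e.g. ": capital-common-countries"
            sections.append({'section': line[2:].strip(), 'questions': [], 'skipped': 0})
            continue
        words = line.lower().split()
        if len(words) != 4 or not sections:
            continue  # ignore malformed lines in this sketch
        if all(w in vocab for w in words):
            sections[-1]['questions'].append(tuple(words))
        else:
            sections[-1]['skipped'] += 1
    return sections


sample = """: capital-common-countries
Athens Greece Baghdad Iraq
: family
boy girl brother sister
king queen man woman
: gram1-adjective-to-adverb
amazing amazingly calm calmly""".splitlines()
vocab = {'boy', 'girl', 'brother', 'sister', 'king', 'queen', 'man',
         'amazing', 'amazingly', 'calm', 'calmly'}

for sec in load_analogy_sections(sample, vocab):
    kind = 'syntactic' if sec['section'].startswith('gram') else 'semantic'
    print(sec['section'], kind, len(sec['questions']), 'kept,', sec['skipped'], 'skipped')
```

Here the capital question and the `king queen man woman` question are skipped because `woman` and the place names are missing from the toy vocabulary.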

```python
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

def print_accuracy(model, questions_file):
    print('Evaluating...\n')
    # Older gensim API (newer versions provide KeyedVectors.evaluate_word_analogies).
    # Returns one dict per section of questions_file plus a final 'total' entry;
    # the first five sections are semantic, the remaining ones syntactic.
    acc = model.accuracy(questions_file)

    sem_correct = sum(len(acc[i]['correct']) for i in range(5))
    sem_total = sum(len(acc[i]['correct']) + len(acc[i]['incorrect']) for i in range(5))
    print('\nSemantic: {:d}/{:d}, Accuracy: {:.2f}%'.format(sem_correct, sem_total, 100*float(sem_correct)/sem_total))

    syn_correct = sum(len(acc[i]['correct']) for i in range(5, len(acc)-1))
    syn_total = sum(len(acc[i]['correct']) + len(acc[i]['incorrect']) for i in range(5, len(acc)-1))
    print('Syntactic: {:d}/{:d}, Accuracy: {:.2f}%\n'.format(syn_correct, syn_total, 100*float(syn_correct)/syn_total))

word_analogies_file = 'questions-words.txt'
print('\nLoading FastText embeddings')
# Older gensim API; newer versions use gensim.models.KeyedVectors.load_word2vec_format
brown_ft = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_ft.vec')
print('Accuracy for FastText (with n-grams):')
print_accuracy(brown_ft, word_analogies_file)

print('\nLoading Gensim embeddings')
brown_gs = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_gs.vec')
print('Accuracy for Word2Vec:')
print_accuracy(brown_gs, word_analogies_file)
```

```
2016-08-11 02:14:00,341 : INFO : loading projection weights from models/brown_ft.vec

Loading FastText embeddings

2016-08-11 02:14:01,471 : INFO : loaded (15173, 100) matrix from models/brown_ft.vec
2016-08-11 02:14:01,525 : INFO : precomputing L2-norms of word weight vectors

Accuracy for FastText (with n-grams):
Evaluating...

2016-08-11 02:14:01,912 : INFO : family: 16.5% (30/182)
2016-08-11 02:14:03,225 : INFO : gram1-adjective-to-adverb: 74.4% (522/702)
2016-08-11 02:14:03,491 : INFO : gram2-opposite: 80.3% (106/132)
2016-08-11 02:14:05,584 : INFO : gram3-comparative: 61.4% (648/1056)
2016-08-11 02:14:06,029 : INFO : gram4-superlative: 67.1% (141/210)
2016-08-11 02:14:07,406 : INFO : gram5-present-participle: 65.5% (426/650)
2016-08-11 02:14:10,038 : INFO : gram7-past-tense: 13.3% (168/1260)
2016-08-11 02:14:11,128 : INFO : gram8-plural: 59.1% (326/552)
2016-08-11 02:14:11,799 : INFO : gram9-plural-verbs: 72.5% (248/342)
2016-08-11 02:14:11,801 : INFO : total: 51.4% (2615/5086)
2016-08-11 02:14:11,803 : INFO : loading projection weights from models/brown_gs.vec

Semantic: 30/182, Accuracy: 16.48%
Syntactic: 2585/4904, Accuracy: 52.71%

Loading Gensim embeddings

2016-08-11 02:14:13,067 : INFO : loaded (15173, 100) matrix from models/brown_gs.vec
2016-08-11 02:14:13,144 : INFO : precomputing L2-norms of word weight vectors

Accuracy for Word2Vec:
Evaluating...

2016-08-11 02:14:13,517 : INFO : family: 25.3% (46/182)
2016-08-11 02:14:14,826 : INFO : gram1-adjective-to-adverb: 0.4% (3/702)
2016-08-11 02:14:15,093 : INFO : gram2-opposite: 0.0% (0/132)
2016-08-11 02:14:18,479 : INFO : gram3-comparative: 3.3% (35/1056)
2016-08-11 02:14:19,155 : INFO : gram4-superlative: 1.4% (3/210)
2016-08-11 02:14:20,671 : INFO : gram5-present-participle: 0.8% (5/650)
2016-08-11 02:14:23,697 : INFO : gram7-past-tense: 1.3% (16/1260)
2016-08-11 02:14:24,753 : INFO : gram8-plural: 5.1% (28/552)
2016-08-11 02:14:25,435 : INFO : gram9-plural-verbs: 2.3% (8/342)
2016-08-11 02:14:25,437 : INFO : total: 2.8% (144/5086)

Semantic: 46/182, Accuracy: 25.27%
Syntactic: 98/4904, Accuracy: 2.00%
```
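`print_accuracy` above relies on section order: it treats the first five sections as semantic and everything between them and the final `total` entry as syntactic. A slightly more robust sketch, assuming the same list-of-dicts result format (`'section'`, `'correct'`, `'incorrect'` keys) that `model.accuracy` returns here, classifies sections by name instead, since every syntactic section in `questions-words.txt` starts with `gram`. `split_accuracy` is a hypothetical helper; below it is checked against the per-section counts from the Brown word2vec log above.

```python
def split_accuracy(sections):
    """Sum (correct, total) over the semantic and syntactic sections of an accuracy result."""
    totals = {'semantic': [0, 0], 'syntactic': [0, 0]}
    for sec in sections:
        name = sec['section']
        if name == 'total':
            continue  # the overall summary entry is not a section
        kind = 'syntactic' if name.startswith('gram') else 'semantic'
        n_correct = len(sec['correct'])
        totals[kind][0] += n_correct
        totals[kind][1] += n_correct + len(sec['incorrect'])
    return {kind: tuple(counts) for kind, counts in totals.items()}


# (section, correct, total) copied from the Brown word2vec log above
brown_w2v_log = [
    ('family', 46, 182),
    ('gram1-adjective-to-adverb', 3, 702),
    ('gram2-opposite', 0, 132),
    ('gram3-comparative', 35, 1056),
    ('gram4-superlative', 3, 210),
    ('gram5-present-participle', 5, 650),
    ('gram7-past-tense', 16, 1260),
    ('gram8-plural', 28, 552),
    ('gram9-plural-verbs', 8, 342),
]
# Placeholder lists stand in for the question tuples gensim would store
fake_result = [{'section': name, 'correct': [None] * c, 'incorrect': [None] * (t - c)}
               for name, c, t in brown_w2v_log]
print(split_accuracy(fake_result))  # {'semantic': (46, 182), 'syntactic': (98, 4904)}
```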



On the Brown corpus, Word2Vec embeddings do somewhat better than fastText embeddings on the semantic analogies (25.27% vs 16.48%, though on only 182 questions), while the fastText embeddings do dramatically better on the syntactic analogies (52.71% vs 2.00%). This makes sense: fastText embeddings are designed to capture morphological nuances, and most of the syntactic analogies are morphology based.

Let me explain that in more detail.

According to the paper [[1]](https://arxiv.org/abs/1607.04606), each word is represented by the sum of the embeddings of its character n-grams (plus an embedding for the word itself). This is meant to be useful for morphologically rich languages: in theory, the embedding for `apparently` shares many character n-grams with `apparent` (such as `<app` and `rent`) and also includes suffix n-grams such as `ly>`, and these pieces combine in a simple, linear way. This closely mirrors the structure of most of our syntactic analogies.
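A minimal sketch of the n-gram extraction the paper describes: the word is wrapped in the boundary symbols `<` and `>`, and every substring of length `minn` to `maxn` is taken. The defaults of 3 and 6 match fastText's command-line defaults for `-minn` and `-maxn`; the real implementation also hashes n-grams into a fixed number of buckets, which this sketch skips, and `char_ngrams` is a name chosen for illustration. With these settings `apparent` itself is too long to be an n-gram of `apparently`; the link between the two words comes from the many shorter n-grams they share.

```python
def char_ngrams(word, minn=3, maxn=6):
    """All character n-grams of length minn..maxn of '<' + word + '>'."""
    w = '<' + word + '>'
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    return grams


# The paper's own example: the 3-grams of "where"
print(char_ngrams('where', 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

apparently = set(char_ngrams('apparently'))
apparent = set(char_ngrams('apparent'))
print(len(apparently & apparent))                      # 22 shared n-grams
print('ly>' in apparently, 'apparent' in apparently)  # True False
```

Every n-gram of `<apparent` (without the closing `>`) also occurs in `<apparently>`, which is where the 22 shared n-grams come from.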

Example analogy:

`amazing amazingly calm calmly`

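This analogy is marked correct if `calmly` is the word whose embedding is closest (by cosine similarity) to `embedding(amazingly)` - `embedding(amazing)` + `embedding(calm)`, excluding the three question words. Ideally, that means:

`embedding(amazing)` - `embedding(amazingly)` ≈ `embedding(calm)` - `embedding(calmly)`

With n-gram-based embeddings, both subtractions cancel the n-grams each word shares with its stem and leave mostly n-grams around the word ending, including the shared suffix n-gram `ly>`. It's no surprise that the fastText embeddings do extremely well on this.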
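For a question `a b c d`, the evaluation normalizes the vectors, forms `b - a + c`, and counts the question as correct if the nearest remaining word by cosine similarity is `d`. This is roughly what gensim's `most_similar(positive=[b, c], negative=[a])` does during evaluation. A minimal sketch of that rule; `solve_analogy` and the toy 3-dimensional vectors are made up for illustration.

```python
import math


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def solve_analogy(vectors, a, b, c):
    """Word closest to unit(b) - unit(a) + unit(c), excluding a, b and c (3CosAdd)."""
    ua, ub, uc = unit(vectors[a]), unit(vectors[b]), unit(vectors[c])
    target = [y - x + z for x, y, z in zip(ua, ub, uc)]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors: axis 0 ~ "amazing", axis 1 ~ "calm", axis 2 ~ an "-ly" direction
toy = {
    'amazing': [1.0, 0.0, 0.0],
    'amazingly': [1.0, 0.0, 1.0],
    'calm': [0.0, 1.0, 0.0],
    'calmly': [0.0, 1.0, 1.0],
    'calmer': [0.0, 1.0, -1.0],
}
print(solve_analogy(toy, 'amazing', 'amazingly', 'calm'))  # calmly
```

In the toy space the "-ly" offset is a shared direction, so `calmly` wins easily; n-gram embeddings get a similar effect for free whenever the analogy words share suffix n-grams.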

Let's run a small test to validate this hypothesis. fastText differs from word2vec only in that its scoring function uses the char n-gram embeddings, in addition to the embedding of the word itself, to compute the score (and from it the likelihood) of each word given a context word. Without char n-gram embeddings, this reduces (at least in theory) to the original word2vec model. In fastText this can be done by setting the maximum char n-gram length to 0 (`-maxn 0`), which is how the `_no_ng` models above were trained.
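A toy sketch of that reduction, assuming the paper's formulation (a word's vector is its own vector plus the sum of its char n-gram vectors). `lookup` stands in for the learned embedding tables by returning a fixed pseudo-random vector per key; it and `word_vector` are hypothetical names for illustration. With `maxn=0` the n-gram range is empty, so only the word's own vector remains, just as in word2vec.

```python
import random

DIM = 8


def lookup(key):
    """Hypothetical stand-in for a learned embedding row: a fixed pseudo-random vector."""
    rng = random.Random(key)  # str seeds are hashed deterministically
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def char_ngrams(word, minn=3, maxn=6):
    w = '<' + word + '>'
    return [w[i:i + n] for n in range(minn, maxn + 1) for i in range(len(w) - n + 1)]


def word_vector(word, minn=3, maxn=6):
    """The word's own vector plus the vectors of all its char n-grams, summed elementwise."""
    parts = [lookup('word:' + word)]
    parts += [lookup('ngram:' + g) for g in char_ngrams(word, minn, maxn)]
    return [sum(column) for column in zip(*parts)]


with_ngrams = word_vector('calmly')
without_ngrams = word_vector('calmly', maxn=0)  # analogous to training with -maxn 0
print(without_ngrams == lookup('word:calmly'))   # True
print(with_ngrams == lookup('word:calmly'))      # False
```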


```python
print('Loading FastText embeddings')
brown_ft_no_ng = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_ft_no_ng.vec')
print('Accuracy for FastText (without n-grams):')
print_accuracy(brown_ft_no_ng, word_analogies_file)
```


```
2016-08-11 02:14:25,450 : INFO : loading projection weights from models/brown_ft_no_ng.vec

Loading FastText embeddings

2016-08-11 02:14:26,645 : INFO : loaded (15173, 100) matrix from models/brown_ft_no_ng.vec
2016-08-11 02:14:26,725 : INFO : precomputing L2-norms of word weight vectors

Accuracy for FastText (without n-grams):
Evaluating...

2016-08-11 02:14:27,091 : INFO : family: 18.1% (33/182)
2016-08-11 02:14:28,402 : INFO : gram1-adjective-to-adverb: 0.0% (0/702)
2016-08-11 02:14:28,689 : INFO : gram2-opposite: 0.8% (1/132)
2016-08-11 02:14:30,772 : INFO : gram3-comparative: 2.7% (28/1056)
2016-08-11 02:14:31,189 : INFO : gram4-superlative: 1.0% (2/210)
2016-08-11 02:14:32,406 : INFO : gram5-present-participle: 0.6% (4/650)
2016-08-11 02:14:34,765 : INFO : gram7-past-tense: 1.0% (12/1260)
2016-08-11 02:14:35,957 : INFO : gram8-plural: 6.3% (35/552)
2016-08-11 02:14:37,044 : INFO : gram9-plural-verbs: 1.2% (4/342)
2016-08-11 02:14:37,046 : INFO : total: 2.3% (119/5086)

Semantic: 33/182, Accuracy: 18.13%
Syntactic: 86/4904, Accuracy: 1.75%
```



A-ha! The results for FastText without n-grams and Word2Vec look a lot more similar, as they should; the remaining differences could easily come from implementation differences between fastText and Gensim, and from randomization. Especially telling is that FastText's semantic accuracy improved slightly after removing n-grams (16.48% to 18.13%), while its syntactic accuracy took a giant dive (52.71% to 1.75%). Our hypothesis that char n-grams improve performance on syntactic analogies seems fair. It also seems possible that char n-grams hurt semantic accuracy a little. However, the Brown corpus is too small to draw any definite conclusions: the accuracies seem to vary significantly across runs.

Let's try a larger corpus now: text8, a collection of Wikipedia articles. I'm also curious about the impact on semantic accuracy; for the models trained on the Brown corpus, both the differences in semantic accuracy and the accuracy values themselves are too small to be conclusive. Hopefully a larger corpus helps, and text8 likely contains much more information about capitals, currencies, cities, etc., which should be relevant to the semantic tasks.

```python
print('Loading FastText embeddings')
text8_ft_no_ng = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_ft_no_ng.vec')
print('Accuracy for FastText (without n-grams):')
print_accuracy(text8_ft_no_ng, word_analogies_file)

print('Loading Gensim embeddings')
text8_gs = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_gs.vec')
print('Accuracy for word2vec:')
print_accuracy(text8_gs, word_analogies_file)

print('Loading FastText embeddings (with n-grams)')
text8_ft = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_ft.vec')
print('Accuracy for FastText (with n-grams):')
print_accuracy(text8_ft, word_analogies_file)
```

```
2016-08-11 02:14:37,067 : INFO : loading projection weights from models/text8_ft_no_ng.vec

Loading FastText embeddings

2016-08-11 02:14:43,506 : INFO : loaded (71290, 100) matrix from models/text8_ft_no_ng.vec
2016-08-11 02:14:43,691 : INFO : precomputing L2-norms of word weight vectors

Accuracy for FastText (without n-grams):
Evaluating...

2016-08-11 02:14:45,858 : INFO : capital-common-countries: 71.1% (360/506)
2016-08-11 02:14:52,498 : INFO : capital-world: 48.6% (706/1452)
2016-08-11 02:14:53,594 : INFO : currency: 22.0% (59/268)
2016-08-11 02:15:00,493 : INFO : city-in-state: 22.0% (332/1511)
2016-08-11 02:15:02,293 : INFO : family: 57.2% (175/306)
2016-08-11 02:15:06,980 : INFO : gram1-adjective-to-adverb: 13.9% (105/756)
2016-08-11 02:15:08,790 : INFO : gram2-opposite: 15.0% (46/306)
2016-08-11 02:15:14,361 : INFO : gram3-comparative: 44.0% (555/1260)
2016-08-11 02:15:16,740 : INFO : gram4-superlative: 22.3% (113/506)
2016-08-11 02:15:20,652 : INFO : gram5-present-participle: 22.6% (224/992)
2016-08-11 02:15:26,655 : INFO : gram6-nationality-adjective: 79.3% (1087/1371)
2016-08-11 02:15:31,947 : INFO : gram7-past-tense: 32.7% (436/1332)
2016-08-11 02:15:36,068 : INFO : gram8-plural: 53.2% (528/992)
2016-08-11 02:15:39,583 : INFO : gram9-plural-verbs: 26.8% (174/650)
2016-08-11 02:15:39,585 : INFO : total: 40.1% (4

Semantic: 1632/4043, Accuracy: 40.37%
Syntactic: 3268/8165, Accuracy: 40.02%

Loading Gensim embeddings

2016-08-11 02:15:45,542 : INFO : loaded (71290, 100) matrix from models/text8_gs.vec
2016-08-11 02:15:45,753 : INFO : precomputing L2-norms of word weight vectors

Accuracy for word2vec:
Evaluating...

2016-08-11 02:15:47,957 : INFO : capital-common-countries: 68.0% (344/506)
2016-08-11 02:15:54,026 : INFO : capital-world: 47.9% (695/1452)
2016-08-11 02:15:55,180 : INFO : currency: 20.9% (56/268)
2016-08-11 02:16:03,023 : INFO : city-in-state: 23.2% (365/1571)
2016-08-11 02:16:05,472 : INFO : family: 54.2% (166/306)
2016-08-11 02:16:09,815 : INFO : gram1-adjective-to-adverb: 16.0% (121/756)
2016-08-11 02:16:11,688 : INFO : gram2-opposite: 16.0% (49/306)
2016-08-11 02:16:18,558 : INFO : gram3-comparative: 55.2% (695/1260)
2016-08-11 02:16:20,817 : INFO : gram4-superlative: 31.6% (160/506)
2016-08-11 02:16:25,408 : INFO : gram5-present-participle: 28.0% (278/992)
2016-08-11 02:16:31,638 : INFO : gram6-nationality-adjective: 77.2% (1059/1371)
2016-08-11 02:16:38,305 : INFO : gram7-past-tense: 31.2% (416/1332)
2016-08-11 02:16:42,955 : INFO : gram8-plural: 48.4% (480/992)
2016-08-11 02:16:45,772 : INFO : gram9-plural-verbs: 30.9% (201/650)
2016-08-11 02:16:45,774 : INFO : total: 41.4% (5

Semantic: 1626/4103, Accuracy: 39.63%
Syntactic: 3459/8165, Accuracy: 42.36%

Loading FastText embeddings (with n-grams)

2016-08-11 02:16:51,881 : INFO : loaded (71290, 100) matrix from models/text8_ft.vec
2016-08-11 02:16:52,076 : INFO : precomputing L2-norms of word weight vectors

Accuracy for FastText (with n-grams):
Evaluating...

2016-08-11 02:16:54,100 : INFO : capital-common-countries: 62.6% (317/506)
2016-08-11 02:17:00,728 : INFO : capital-world: 43.0% (624/1452)
2016-08-11 02:17:01,834 : INFO : currency: 11.9% (32/268)
2016-08-11 02:17:08,550 : INFO : city-in-state: 19.5% (294/1511)
2016-08-11 02:17:09,774 : INFO : family: 47.4% (145/306)
2016-08-11 02:17:13,485 : INFO : gram1-adjective-to-adverb: 77.5% (586/756)
2016-08-11 02:17:14,836 : INFO : gram2-opposite: 61.1% (187/306)
2016-08-11 02:17:20,270 : INFO : gram3-comparative: 63.1% (795/1260)
2016-08-11 02:17:22,524 : INFO : gram4-superlative: 59.1% (299/506)
2016-08-11 02:17:26,654 : INFO : gram5-present-participle: 55.7% (553/992)
2016-08-11 02:17:32,705 : INFO : gram6-nationality-adjective: 93.9% (1288/1371)
2016-08-11 02:17:38,990 : INFO : gram7-past-tense: 36.0% (480/1332)
2016-08-11 02:17:43,387 : INFO : gram8-plural: 88.6% (879/992)
2016-08-11 02:17:46,392 : INFO : gram9-plural-verbs: 59.7% (388/650)
2016-08-11 02:17:46,393 : INFO : total: 56.2% (

Semantic: 1412/4043, Accuracy: 34.92%
Syntactic: 5455/8165, Accuracy: 66.81%
```



With the text8 corpus, we observe a similar pattern. Semantic accuracy drops by a small but noticeable amount when n-grams are included in FastText (40.37% without n-grams vs 34.92% with them), while FastText with n-grams performs far better on the syntactic analogies (66.81% vs 40.02%). FastText without n-grams performs much like Word2Vec (39.63% semantic, 42.36% syntactic).

My hypothesis for the lower semantic accuracy of the FastText-with-n-grams model is that most of the words in the semantic analogies are standalone words whose meaning is unrelated to their character n-grams (e.g. father, mother, France, Paris), so including the char n-grams in the scoring function may actually make these embeddings worse.

This trend is also observed in the original paper, where embeddings with n-grams perform worse on semantic tasks than both the word2vec CBOW and skipgram models.

A few other notes:

1. The semantic accuracy of all models increases significantly with corpus size.
2. However, the gain in syntactic accuracy from the larger corpus is smaller for the n-gram FastText model than for the other two models, in both relative and absolute terms. This could indicate that the advantage of incorporating morphological information shrinks as the corpus grows (the corpora used in the original paper seem to suggest this too).
3. Gensim's training times are lower than those of the fastText no-n-gram model (e.g. 3min 6s vs 3min 51s wall time on text8), and significantly lower than those of the n-gram variant (8min 51s). This is quite impressive considering fastText is implemented in C++ and Gensim in Python, although Gensim's word2vec training loop is compiled code written in Cython. You can read [this post](http://rare-technologies.com/word2vec-in-python-part-two-optimizing/) for more details on word2vec optimisation in Gensim.
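The arithmetic behind note 2, using the syntactic accuracies reported in the evaluation logs above for the Brown and text8 models; `gains` is an illustrative helper, not part of the original notebook.

```python
# Syntactic accuracy (%) per model and corpus, copied from the logs above
syntactic = {
    'fastText (n-grams)': {'brown': 52.71, 'text8': 66.81},
    'fastText (no n-grams)': {'brown': 1.75, 'text8': 40.02},
    'word2vec (gensim)': {'brown': 2.00, 'text8': 42.36},
}


def gains(acc):
    """Absolute gain (percentage points) and relative gain (ratio) from Brown to text8."""
    return acc['text8'] - acc['brown'], acc['text8'] / acc['brown']


for model, acc in syntactic.items():
    absolute, relative = gains(acc)
    print(f'{model:22} +{absolute:.2f} points, x{relative:.2f}')
```

The n-gram model gains about 14 points (a factor of roughly 1.27), while both n-gram-free models gain close to 40 points from a near-zero starting point.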

# Conclusions

These preliminary results suggest that fastText embeddings are significantly better than word2vec at encoding syntactic information. This is expected, since most syntactic analogies are morphology based and fastText's char n-gram approach takes such information into account. The original word2vec model seems to perform better on semantic tasks, probably because words in semantic analogies are unrelated to their char n-grams, so the added information from irrelevant char n-grams worsens the embeddings. It would be interesting to see how transferable these embeddings are to different kinds of tasks by comparing their performance on a downstream supervised task.

# References

[1] [Enriching Word Vectors with Subword Information](https://arxiv.org/pdf/1607.04606v1.pdf)

[2] [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781v3.pdf)