# Comparison of FastText and Word2Vec 

Facebook Research open-sourced a great project yesterday - [fastText](https://github.com/facebookresearch/fastText), a fast (no surprise) and effective method for learning word representations and performing text classification. I was curious how these embeddings compare to other commonly used embeddings, and word2vec seemed like the obvious choice, especially since fastText embeddings are an extension of word2vec.

I've used gensim to train the word2vec models, and the analogical reasoning task (described in Section 4.1 of [[2]](https://arxiv.org/pdf/1301.3781v3.pdf)) for comparing the word2vec and fastText models. I've compared embeddings trained using the skipgram architecture.

# Download data

In [None]:
import nltk
# Only the brown corpus is needed, so fetch just that instead of opening the interactive downloader.
# Alternatively, you can simply download the pretrained models below if you wish to avoid downloading and training
nltk.download('brown')

# Generate brown corpus text file
with open('brown_corp.txt', 'w') as f:
    for word in nltk.corpus.brown.words():
        f.write('{word} '.format(word=word))

In [None]:
# download the text8 corpus (a 100 MB sample of cleaned wikipedia text)
# alternatively, you can simply download the pretrained models below if you wish to avoid downloading and training
!wget http://mattmahoney.net/dc/text8.zip
# the training commands below read the extracted file `text8`
!unzip text8.zip

In [None]:
# download the file questions-words.txt to be used for comparing word embeddings
!wget https://raw.githubusercontent.com/arfon/word2vec/master/questions-words.txt

# Train models

If you wish to avoid training, you can download pre-trained models instead in the next section.
For training the fastText models yourself, you'll have to follow the setup instructions for [fastText](https://github.com/facebookresearch/fastText) and run the training with -

In [21]:
%%time
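# Make sure you set $FT_HOME to your fastText directory root
# fastText writes <output>.bin and <output>.vec to the current directory;
# move the .vec files into MODELS_DIR before running the comparisons below
# Training fastText skipgram model on brown corpus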
!$FT_HOME/fasttext skipgram -input brown_corp.txt -output brown_ft  -lr 0.05 -dim 100 -ws 5 -epoch 5 -minCount 5 -neg 5 -loss ns -t 0.0001

Read 1M words
Progress: 100.0%  words/sec/thread: 31519  lr: 0.000001  loss: 2.289203  eta: 0h0m 
Train time: 50.000000 sec
CPU times: user 1.53 s, sys: 164 ms, total: 1.69 s
Wall time: 1min 11s


In [15]:
%%time
# Training fastText skipgram model on text8 corpus
!$FT_HOME/fasttext skipgram -input text8 -output text8_ft -lr 0.05 -dim 100 -ws 5 -epoch 5 -minCount 5 -neg 5 -loss ns -t 0.0001

Read 17M words
Progress: 100.0%  words/sec/thread: 20536  lr: 0.000001  loss: 1.830005  eta: 0h0m 
Train time: 1257.000000 sec
CPU times: user 29.5 s, sys: 3.75 s, total: 33.3 s
Wall time: 21min 38s


For training the gensim models -

In [6]:
from nltk.corpus import brown
from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus

import logging
logging.root.handlers = []
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)


# Make sure you create a models dir in case it doesn't exist, or modify MODELS_DIR
MODELS_DIR = 'models/'

# Same values as used for fastText training above
params = {
    'alpha': 0.05,
    'size': 100,
    'window': 5,
    'iter': 5,
    'min_count': 5,
    'sample': 1e-4,
    'sg': 1,
    'hs': 0,
    'negative': 5
}

%time brown_gs = Word2Vec(brown.sents(), **params)
brown_gs.save_word2vec_format(MODELS_DIR + 'brown_gs.vec')

%time text8_gs = Word2Vec(Text8Corpus('text8'), **params)
text8_gs.save_word2vec_format(MODELS_DIR + 'text8_gs.vec')

2016-08-08 17:47:00,842 : INFO : collecting all words and their counts
2016-08-08 17:47:00,886 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2016-08-08 17:47:01,704 : INFO : PROGRESS: at sentence #10000, processed 219770 words, keeping 23488 word types
2016-08-08 17:47:02,695 : INFO : PROGRESS: at sentence #20000, processed 430477 words, keeping 34367 word types
2016-08-08 17:47:03,826 : INFO : PROGRESS: at sentence #30000, processed 669056 words, keeping 42365 word types
2016-08-08 17:47:04,638 : INFO : PROGRESS: at sentence #40000, processed 888291 words, keeping 49136 word types
2016-08-08 17:47:05,293 : INFO : PROGRESS: at sentence #50000, processed 1039920 words, keeping 53024 word types
2016-08-08 17:47:05,781 : INFO : collected 56057 word types from a corpus of 1161192 raw words and 57340 sentences
2016-08-08 17:47:05,978 : INFO : min_count=5 retains 15173 unique words (drops 40884)
2016-08-08 17:47:05,980 : INFO : min_count leaves 1095086 word corpu

CPU times: user 1min 14s, sys: 332 ms, total: 1min 14s
Wall time: 45.8 s


2016-08-08 17:47:47,979 : INFO : collecting all words and their counts
2016-08-08 17:47:47,983 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2016-08-08 17:47:59,375 : INFO : collected 253854 word types from a corpus of 17005207 raw words and 1701 sentences
2016-08-08 17:48:00,240 : INFO : min_count=5 retains 71290 unique words (drops 182564)
2016-08-08 17:48:00,241 : INFO : min_count leaves 16718844 word corpus (98% of original 17005207)
2016-08-08 17:48:00,682 : INFO : deleting the raw counts dictionary of 253854 items
2016-08-08 17:48:00,718 : INFO : sample=0.0001 downsamples 341 most-common words
2016-08-08 17:48:00,719 : INFO : downsampling leaves estimated 9386181 word corpus (56.1% of prior 16718844)
2016-08-08 17:48:00,721 : INFO : estimated required memory for 71290 words and 100 dimensions: 92677000 bytes
2016-08-08 17:48:01,104 : INFO : resetting layer weights
2016-08-08 17:48:03,003 : INFO : training model with 3 workers on 71290 vocabulary and 1

CPU times: user 18min 52s, sys: 2.36 s, total: 18min 54s
Wall time: 6min 55s
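The two training runs are only comparable if they share hyperparameters, so it can help to derive the fastText flags from the same `params` dict instead of typing them twice. The sketch below is a hypothetical helper, not part of either library: `GENSIM_TO_FASTTEXT`, `fasttext_flags` and `to_gensim4` are names made up for illustration, and the flag pairing simply mirrors the commands used in this notebook. It also notes the two keyword renames introduced in gensim 4.x (`size` became `vector_size`, `iter` became `epochs`), in case you are running a newer gensim than the one used here.

```python
# Hyperparameters copied from the gensim `params` dict in this notebook.
params = {
    'alpha': 0.05, 'size': 100, 'window': 5, 'iter': 5,
    'min_count': 5, 'sample': 1e-4, 'sg': 1, 'hs': 0, 'negative': 5,
}

# gensim keyword -> fastText CLI flag, as paired in the commands above.
GENSIM_TO_FASTTEXT = {
    'alpha': '-lr', 'size': '-dim', 'window': '-ws', 'iter': '-epoch',
    'min_count': '-minCount', 'negative': '-neg', 'sample': '-t',
}

# Keyword renames in gensim 4.x (older releases use the left-hand names).
GENSIM4_RENAMES = {'size': 'vector_size', 'iter': 'epochs'}


def fasttext_flags(params):
    """Render the shared hyperparameters as fastText CLI flags (hypothetical helper)."""
    flags = ['{} {:g}'.format(flag, params[key])
             for key, flag in GENSIM_TO_FASTTEXT.items()]
    # hs=0 together with negative>0 means negative sampling, i.e. `-loss ns`.
    if params.get('hs') == 0 and params.get('negative', 0) > 0:
        flags.append('-loss ns')
    return ' '.join(flags)


def to_gensim4(params):
    """Return a copy of `params` using the gensim 4.x keyword names."""
    return {GENSIM4_RENAMES.get(key, key): value for key, value in params.items()}


print(fasttext_flags(params))
print(to_gensim4(params))
```

The printed flags can be pasted after `fasttext skipgram -input ... -output ...`; `sg=1` has no flag of its own because it corresponds to choosing the `skipgram` subcommand.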


# Download models

In case you wish to avoid downloading the corpus and training the models, you can download pretrained models with -

In [None]:
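# download the fastText and gensim models trained on the brown corpus and text8 corpus
# (the URL is quoted so the shell does not try to expand the `?`)
!wget 'https://www.dropbox.com/s/d15f3eumu3i8ld6/models.tar.gz?dl=1' -O models.tar.gz
# extract the archive; move the .vec files into MODELS_DIR if they do not unpack there
!tar -xzf models.tar.gz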

Once you have downloaded or trained the models (make sure they're in the `models/` directory, or that you've appropriately changed `MODELS_DIR`) and downloaded `questions-words.txt`, you're ready to run the comparison.

# Comparisons

In [7]:
from gensim.models import Word2Vec

def print_accuracy(model, questions_file):
    print('Evaluating...\n')
    acc = model.accuracy(questions_file)

    sem_correct = sum((len(acc[i]['correct']) for i in range(5)))
    sem_total = sum((len(acc[i]['correct']) + len(acc[i]['incorrect'])) for i in range(5))
    print('\nSemantic: {:d}/{:d}, Accuracy: {:.2f}%'.format(sem_correct, sem_total, 100*float(sem_correct)/sem_total))
    
    syn_correct = sum((len(acc[i]['correct']) for i in range(5, len(acc)-1)))
    syn_total = sum((len(acc[i]['correct']) + len(acc[i]['incorrect'])) for i in range(5,len(acc)-1))
    print('Syntactic: {:d}/{:d}, Accuracy: {:.2f}%\n'.format(syn_correct, syn_total, 100*float(syn_correct)/syn_total))

MODELS_DIR = 'models/'

word_analogies_file = 'questions-words.txt'
print('\nLoading FastText embeddings')
brown_ft = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_ft.vec')
print('Accuracy for FastText:')
print_accuracy(brown_ft, word_analogies_file)

print('\nLoading Gensim embeddings')
brown_gs = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_gs.vec')
print('Accuracy for Word2Vec:')
print_accuracy(brown_gs, word_analogies_file)

2016-08-08 18:12:42,844 : INFO : loading projection weights from models/brown_ft.vec



Loading FastText embeddings


2016-08-08 18:12:44,951 : INFO : loaded (15173, 100) matrix from models/brown_ft.vec
2016-08-08 18:12:45,075 : INFO : precomputing L2-norms of word weight vectors


Accuracy for FastText:
Evaluating...



2016-08-08 18:12:46,081 : INFO : family: 14.8% (27/182)
2016-08-08 18:12:50,798 : INFO : gram1-adjective-to-adverb: 73.5% (516/702)
2016-08-08 18:12:51,852 : INFO : gram2-opposite: 81.8% (108/132)
2016-08-08 18:12:59,736 : INFO : gram3-comparative: 61.5% (649/1056)
2016-08-08 18:13:01,312 : INFO : gram4-superlative: 68.6% (144/210)
2016-08-08 18:13:04,920 : INFO : gram5-present-participle: 67.1% (436/650)
2016-08-08 18:13:19,036 : INFO : gram7-past-tense: 11.5% (145/1260)
2016-08-08 18:13:25,371 : INFO : gram8-plural: 60.5% (334/552)
2016-08-08 18:13:28,636 : INFO : gram9-plural-verbs: 71.1% (243/342)
2016-08-08 18:13:28,643 : INFO : total: 51.2% (2602/5086)
2016-08-08 18:13:28,655 : INFO : loading projection weights from models/brown_gs.vec



Semantic: 27/182, Accuracy: 14.84%
Syntactic: 2575/4904, Accuracy: 52.51%


Loading Gensim embeddings


2016-08-08 18:13:34,891 : INFO : loaded (15173, 100) matrix from models/brown_gs.vec


Accuracy for Word2Vec:
Evaluating...



2016-08-08 18:13:35,170 : INFO : precomputing L2-norms of word weight vectors
2016-08-08 18:13:36,717 : INFO : family: 24.2% (44/182)
2016-08-08 18:13:43,872 : INFO : gram1-adjective-to-adverb: 0.6% (4/702)
2016-08-08 18:13:45,106 : INFO : gram2-opposite: 0.8% (1/132)
2016-08-08 18:13:56,865 : INFO : gram3-comparative: 3.4% (36/1056)
2016-08-08 18:13:59,292 : INFO : gram4-superlative: 1.4% (3/210)
2016-08-08 18:14:06,379 : INFO : gram5-present-participle: 0.8% (5/650)
2016-08-08 18:14:18,778 : INFO : gram7-past-tense: 1.4% (18/1260)
2016-08-08 18:14:22,712 : INFO : gram8-plural: 4.9% (27/552)
2016-08-08 18:14:25,127 : INFO : gram9-plural-verbs: 1.5% (5/342)
2016-08-08 18:14:25,130 : INFO : total: 2.8% (143/5086)



Semantic: 44/182, Accuracy: 24.18%
Syntactic: 99/4904, Accuracy: 2.02%
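`print_accuracy` separates semantic from syntactic sections by position (the first five sections versus the rest), which is easy to get wrong if the questions file changes. A minimal alternative sketch is shown here: it splits by section name instead, treating every section whose name starts with `gram` as syntactic. It assumes the result has the shape gensim's `accuracy` returned at the time of writing (a list of dicts with `section`, `correct` and `incorrect` keys, ending in an aggregate `total` entry); `split_accuracy` and `fake_section` are hypothetical names, and the counts are copied from the fastText log above.

```python
def split_accuracy(sections):
    """Return (correct, total) counts for semantic and syntactic sections.

    Sections named 'gram*' count as syntactic; the aggregate 'total'
    entry is skipped so nothing is counted twice.
    """
    counts = {'semantic': [0, 0], 'syntactic': [0, 0]}
    for sec in sections:
        if sec['section'] == 'total':
            continue
        kind = 'syntactic' if sec['section'].startswith('gram') else 'semantic'
        counts[kind][0] += len(sec['correct'])
        counts[kind][1] += len(sec['correct']) + len(sec['incorrect'])
    return {kind: tuple(pair) for kind, pair in counts.items()}


def fake_section(name, n_correct, n_total):
    """Build a stand-in section with the right list lengths (illustration only)."""
    return {'section': name,
            'correct': [None] * n_correct,
            'incorrect': [None] * (n_total - n_correct)}


# Per-section counts from the fastText (brown corpus) log above.
brown_ft_sections = [
    fake_section('family', 27, 182),
    fake_section('gram1-adjective-to-adverb', 516, 702),
    fake_section('gram2-opposite', 108, 132),
    fake_section('gram3-comparative', 649, 1056),
    fake_section('gram4-superlative', 144, 210),
    fake_section('gram5-present-participle', 436, 650),
    fake_section('gram7-past-tense', 145, 1260),
    fake_section('gram8-plural', 334, 552),
    fake_section('gram9-plural-verbs', 243, 342),
    fake_section('total', 2602, 5086),
]

result = split_accuracy(brown_ft_sections)
for kind, (correct, total) in result.items():
    print('{}: {}/{}, Accuracy: {:.2f}%'.format(kind, correct, total, 100.0 * correct / total))
```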



On the brown corpus, Word2Vec embeddings do somewhat better than fastText embeddings on the semantic tasks (24.18% vs 14.84%, though on only 182 questions), while the fastText embeddings do dramatically better on the syntactic analogies (52.51% vs 2.02%). This makes sense, since fastText embeddings are trained to capture morphological nuances, and most of the syntactic analogies are morphology based.

Let me explain that better.

According to the paper [[1]](https://arxiv.org/abs/1607.04606), embeddings for words are represented by the sum of their n-gram embeddings. This is meant to be useful for morphologically rich languages - so theoretically, the embedding for `apparently` would include information from both character n-grams `apparent` and `ly` (as well as other n-grams), and the n-grams would combine in a simple, linear manner. This is very similar to what most of our syntactic tasks look like.
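To make the "sum of n-gram embeddings" idea concrete, here is a minimal sketch of character n-gram extraction and summation. It assumes fastText's default n-gram lengths of 3 to 6 and its `<` and `>` word-boundary markers; it leaves out the hashing of n-grams into buckets and the separate whole-word vector that the real implementation uses. `char_ngrams`, `word_vector` and the toy vector table are illustrative names and values, not fastText's API. Note that with these defaults `apparent` and `ly` do not appear as single n-grams; they are covered by overlapping ones such as `<appar` and `ly>`.

```python
def char_ngrams(word, minn=3, maxn=6):
    """List all character n-grams of '<word>' with lengths minn..maxn."""
    token = '<' + word + '>'
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


def word_vector(word, ngram_table, dim):
    """Sum the vectors of a word's n-grams; n-grams missing from the table add nothing."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        for k, value in enumerate(ngram_table.get(gram, [0.0] * dim)):
            total[k] += value
    return total


grams = char_ngrams('apparently')
print(len(grams), grams[:10])

# Toy 2-d vectors for two n-grams, made up for illustration.
toy_table = {'ly>': [1.0, 0.0], '<ap': [0.0, 2.0]}
print(word_vector('apparently', toy_table, dim=2))
```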

Example analogy:

`amazing amazingly calm calmly`

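This analogy is marked correct if `calmly` is the word whose embedding is closest to `embedding(amazingly)` - `embedding(amazing)` + `embedding(calm)`, which is roughly the case when:

`embedding(amazing)` - `embedding(amazingly)` ≈ `embedding(calm)` - `embedding(calmly)`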

In both subtractions the n-grams of the shared stem cancel out, and what remains overlaps in the n-grams that cover the `ly` suffix.
No surprise the fastText embeddings do extremely well on this.
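The effect of linear composition on an analogy query can be sketched with toy vectors. In the example below every `-ly` word is built as its stem vector plus one shared suffix vector, which is an idealised stand-in for summed n-gram embeddings, and the query is answered the usual way: take the vocabulary word closest (by cosine similarity) to `b - a + c`, excluding the three query words. All vectors and the `solve_analogy` helper are made up for illustration.

```python
import math


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stem vectors and one shared suffix vector (illustrative values).
STEMS = {
    'amazing': [1.0, 0.0, 0.0, 0.2],
    'calm': [0.0, 1.0, 0.0, 0.1],
    'quiet': [0.0, 0.9, 0.0, 0.4],
}
LY = [0.0, 0.0, 1.0, 0.0]

# Each adverb is stem + suffix, mimicking additive sub-word embeddings.
embedding = dict(STEMS)
for stem, vector in STEMS.items():
    embedding[stem + 'ly'] = add(vector, LY)


def solve_analogy(a, b, c, embedding):
    """Answer 'a is to b as c is to ?' by nearest cosine neighbour of b - a + c."""
    target = add(sub(embedding[b], embedding[a]), embedding[c])
    candidates = [w for w in embedding if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embedding[w], target))


answer = solve_analogy('amazing', 'amazingly', 'calm', embedding)
print(answer)
```

Because the suffix contributes the same vector to every adverb, `amazingly - amazing` recovers exactly that suffix vector, and adding it to `calm` lands on `calmly` even though `quietly` is a close distractor. Real embeddings are noisier, so the match is only approximate.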

Let's do a small test to validate this hypothesis. fastText differs from word2vec only in that it uses char n-gram embeddings as well as the actual word embedding in the scoring function to calculate scores and then likelihoods for each word, given a context word. If char n-gram embeddings are not present, this reduces (at least theoretically) to the original word2vec model. This can be implemented by setting the max length of char n-grams to 0 for fastText (`-maxn 0`).


In [10]:
%%time
# Training fastText skipgram model on brown corpus without n-grams
# If you chose to download the models, this model will already be present in the MODELS_DIR directory
!$FT_HOME/fasttext skipgram -input brown_corp.txt -output brown_ft_no_ng -lr 0.05 -dim 100 -ws 5 -epoch 5 -minCount 5 -neg 5 -loss ns -t 0.0001 -maxn 0

Read 1M words
Progress: 100.0%  words/sec/thread: 55755  lr: 0.000001  loss: 2.356848  eta: 0h0m 
Train time: 31.000000 sec
CPU times: user 1.32 s, sys: 240 ms, total: 1.56 s
Wall time: 57.7 s


In [11]:
%%time
# Training fastText skipgram model on text8 corpus without n-grams
# If you chose to download the models, this model will already be present in the MODELS_DIR directory
!$FT_HOME/fasttext skipgram -input text8 -output text8_ft_no_ng -lr 0.05 -dim 100 -ws 5 -epoch 5 -minCount 5 -neg 5 -loss ns -t 0.0001 -maxn 0

Read 17M words
Progress: 100.0%  words/sec/thread: 49050  lr: 0.000001  loss: 1.879224  eta: 0h0m 
Train time: 514.000000 sec
CPU times: user 13.2 s, sys: 1.89 s, total: 15.1 s
Wall time: 9min 16s


In [8]:
print('Loading FastText embeddings')
brown_ft_no_ng = Word2Vec.load_word2vec_format(MODELS_DIR + 'brown_ft_no_ng.vec')
print('Accuracy for FastText (without n-grams):')
print_accuracy(brown_ft_no_ng, word_analogies_file)

print('Accuracy for Word2Vec:')
print_accuracy(brown_gs, word_analogies_file)

print('Accuracy for FastText (with n-grams):')
print_accuracy(brown_ft, word_analogies_file)


2016-08-08 18:14:40,553 : INFO : loading projection weights from models/brown_ft_no_ng.vec


Loading FastText embeddings


2016-08-08 18:14:42,619 : INFO : loaded (15173, 100) matrix from models/brown_ft_no_ng.vec
2016-08-08 18:14:42,881 : INFO : precomputing L2-norms of word weight vectors


Accuracy for FastText (without n-grams):
Evaluating...



2016-08-08 18:14:44,144 : INFO : family: 17.6% (32/182)
2016-08-08 18:14:49,098 : INFO : gram1-adjective-to-adverb: 0.1% (1/702)
2016-08-08 18:14:50,116 : INFO : gram2-opposite: 0.0% (0/132)
2016-08-08 18:14:55,390 : INFO : gram3-comparative: 2.8% (30/1056)
2016-08-08 18:14:57,268 : INFO : gram4-superlative: 0.5% (1/210)
2016-08-08 18:15:02,244 : INFO : gram5-present-participle: 1.4% (9/650)
2016-08-08 18:15:11,160 : INFO : gram7-past-tense: 1.0% (12/1260)
2016-08-08 18:15:13,591 : INFO : gram8-plural: 6.3% (35/552)
2016-08-08 18:15:16,559 : INFO : gram9-plural-verbs: 1.2% (4/342)
2016-08-08 18:15:16,562 : INFO : total: 2.4% (124/5086)



Semantic: 32/182, Accuracy: 17.58%
Syntactic: 92/4904, Accuracy: 1.88%

Accuracy for Word2Vec:
Evaluating...



2016-08-08 18:15:17,910 : INFO : family: 24.2% (44/182)
2016-08-08 18:15:25,713 : INFO : gram1-adjective-to-adverb: 0.6% (4/702)
2016-08-08 18:15:26,494 : INFO : gram2-opposite: 0.8% (1/132)
2016-08-08 18:15:33,222 : INFO : gram3-comparative: 3.4% (36/1056)
2016-08-08 18:15:34,613 : INFO : gram4-superlative: 1.4% (3/210)
2016-08-08 18:15:40,954 : INFO : gram5-present-participle: 0.8% (5/650)
2016-08-08 18:15:54,275 : INFO : gram7-past-tense: 1.4% (18/1260)
2016-08-08 18:15:58,115 : INFO : gram8-plural: 4.9% (27/552)
2016-08-08 18:15:59,760 : INFO : gram9-plural-verbs: 1.5% (5/342)
2016-08-08 18:15:59,762 : INFO : total: 2.8% (143/5086)



Semantic: 44/182, Accuracy: 24.18%
Syntactic: 99/4904, Accuracy: 2.02%

Accuracy for FastText (with n-grams):
Evaluating...



2016-08-08 18:16:00,606 : INFO : family: 14.8% (27/182)
2016-08-08 18:16:05,544 : INFO : gram1-adjective-to-adverb: 73.5% (516/702)
2016-08-08 18:16:06,491 : INFO : gram2-opposite: 81.8% (108/132)
2016-08-08 18:16:13,179 : INFO : gram3-comparative: 61.5% (649/1056)
2016-08-08 18:16:14,931 : INFO : gram4-superlative: 68.6% (144/210)
2016-08-08 18:16:19,684 : INFO : gram5-present-participle: 67.1% (436/650)
2016-08-08 18:16:29,727 : INFO : gram7-past-tense: 11.5% (145/1260)
2016-08-08 18:16:33,346 : INFO : gram8-plural: 60.5% (334/552)
2016-08-08 18:16:37,013 : INFO : gram9-plural-verbs: 71.1% (243/342)
2016-08-08 18:16:37,017 : INFO : total: 51.2% (2602/5086)



Semantic: 27/182, Accuracy: 14.84%
Syntactic: 2575/4904, Accuracy: 52.51%



A-ha! The results for FastText with no n-grams and Word2Vec look a lot more similar (as they should) - the differences could easily result from differences in implementation between fastText and Gensim, and randomization. Especially telling is that the semantic accuracy for FastText has more or less remained the same after removing n-grams, while the syntactic accuracy has taken a giant dive. Our hypothesis that the char n-grams result in better performance on syntactic analogies seems fair.

Let's try with a larger corpus now - text8 (collection of wiki articles). I'm also curious about the impact on semantic accuracy - for models trained on the brown corpus, the difference in the semantic accuracy and the accuracy values themselves are too small to be conclusive. Hopefully a larger corpus helps, and the text8 corpus likely has a lot more information about capitals, currencies, cities etc, which should be relevant to the semantic tasks.

In [9]:
print('Loading FastText embeddings')
text8_ft_no_ng = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_ft_no_ng.vec')
print('Accuracy for FastText (without n-grams):')
print_accuracy(text8_ft_no_ng, word_analogies_file)

print('Loading Gensim embeddings')
text8_gs = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_gs.vec')
print('Accuracy for word2vec:')
print_accuracy(text8_gs, word_analogies_file)

print('Loading FastText embeddings (with n-grams)')
text8_ft = Word2Vec.load_word2vec_format(MODELS_DIR + 'text8_ft.vec')
print('Accuracy for FastText (with n-grams):')
print_accuracy(text8_ft, word_analogies_file)

2016-08-08 18:16:44,115 : INFO : loading projection weights from models/text8_ft_no_ng.vec


Loading FastText embeddings


2016-08-08 18:16:58,690 : INFO : loaded (71290, 100) matrix from models/text8_ft_no_ng.vec


Accuracy for FastText (without n-grams):
Evaluating...



2016-08-08 18:16:59,241 : INFO : precomputing L2-norms of word weight vectors
2016-08-08 18:17:08,464 : INFO : capital-common-countries: 71.5% (362/506)
2016-08-08 18:17:27,746 : INFO : capital-world: 48.6% (706/1452)
2016-08-08 18:17:31,516 : INFO : currency: 22.0% (59/268)
2016-08-08 18:17:54,621 : INFO : city-in-state: 23.7% (358/1511)
2016-08-08 18:17:59,452 : INFO : family: 63.7% (195/306)
2016-08-08 18:18:09,547 : INFO : gram1-adjective-to-adverb: 14.4% (109/756)
2016-08-08 18:18:14,506 : INFO : gram2-opposite: 14.4% (44/306)
2016-08-08 18:18:33,499 : INFO : gram3-comparative: 41.7% (526/1260)
2016-08-08 18:18:39,790 : INFO : gram4-superlative: 27.7% (140/506)
2016-08-08 18:18:55,154 : INFO : gram5-present-participle: 24.0% (238/992)
2016-08-08 18:19:15,897 : INFO : gram6-nationality-adjective: 78.8% (1080/1371)
2016-08-08 18:19:38,978 : INFO : gram7-past-tense: 35.1% (467/1332)
2016-08-08 18:19:53,881 : INFO : gram8-plural: 52.9% (525/992)
2016-08-08 18:20:02,998 : INFO : gram9-


Semantic: 1680/4043, Accuracy: 41.55%
Syntactic: 3301/8165, Accuracy: 40.43%

Loading Gensim embeddings


2016-08-08 18:20:18,229 : INFO : loaded (71290, 100) matrix from models/text8_gs.vec
2016-08-08 18:20:18,482 : INFO : precomputing L2-norms of word weight vectors


Accuracy for word2vec:
Evaluating...



2016-08-08 18:20:23,125 : INFO : capital-common-countries: 67.2% (340/506)
2016-08-08 18:20:45,232 : INFO : capital-world: 45.5% (661/1452)
2016-08-08 18:20:50,051 : INFO : currency: 22.4% (60/268)
2016-08-08 18:21:09,436 : INFO : city-in-state: 23.0% (361/1571)
2016-08-08 18:21:14,049 : INFO : family: 57.8% (177/306)
2016-08-08 18:21:25,828 : INFO : gram1-adjective-to-adverb: 16.4% (124/756)
2016-08-08 18:21:30,659 : INFO : gram2-opposite: 10.8% (33/306)
2016-08-08 18:21:48,540 : INFO : gram3-comparative: 53.1% (669/1260)
2016-08-08 18:21:55,544 : INFO : gram4-superlative: 30.8% (156/506)
2016-08-08 18:22:09,282 : INFO : gram5-present-participle: 25.3% (251/992)
2016-08-08 18:22:25,471 : INFO : gram6-nationality-adjective: 77.5% (1063/1371)
2016-08-08 18:22:45,242 : INFO : gram7-past-tense: 32.0% (426/1332)
2016-08-08 18:23:03,272 : INFO : gram8-plural: 51.1% (507/992)
2016-08-08 18:23:11,655 : INFO : gram9-plural-verbs: 27.4% (178/650)
2016-08-08 18:23:11,660 : INFO : total: 40.8% (5


Semantic: 1599/4103, Accuracy: 38.97%
Syntactic: 3407/8165, Accuracy: 41.73%

Loading FastText embeddings (with n-grams)


2016-08-08 18:23:25,010 : INFO : loaded (71290, 100) matrix from models/text8_ft.vec
2016-08-08 18:23:25,208 : INFO : precomputing L2-norms of word weight vectors


Accuracy for FastText (with n-grams):
Evaluating...



2016-08-08 18:23:34,556 : INFO : capital-common-countries: 57.5% (291/506)
2016-08-08 18:24:00,347 : INFO : capital-world: 42.2% (613/1452)
2016-08-08 18:24:04,067 : INFO : currency: 11.9% (32/268)
2016-08-08 18:24:30,388 : INFO : city-in-state: 18.3% (277/1511)
2016-08-08 18:24:36,499 : INFO : family: 51.6% (158/306)
2016-08-08 18:24:48,265 : INFO : gram1-adjective-to-adverb: 74.5% (563/756)
2016-08-08 18:24:53,082 : INFO : gram2-opposite: 59.8% (183/306)
2016-08-08 18:25:13,457 : INFO : gram3-comparative: 68.7% (865/1260)
2016-08-08 18:25:20,501 : INFO : gram4-superlative: 53.2% (269/506)
2016-08-08 18:25:35,751 : INFO : gram5-present-participle: 57.1% (566/992)
2016-08-08 18:25:58,592 : INFO : gram6-nationality-adjective: 94.7% (1299/1371)
2016-08-08 18:26:16,917 : INFO : gram7-past-tense: 36.6% (487/1332)
2016-08-08 18:26:30,900 : INFO : gram8-plural: 91.3% (906/992)
2016-08-08 18:26:41,516 : INFO : gram9-plural-verbs: 56.6% (368/650)
2016-08-08 18:26:41,520 : INFO : total: 56.3% (


Semantic: 1371/4043, Accuracy: 33.91%
Syntactic: 5506/8165, Accuracy: 67.43%



With the text8 corpus, we observe a similar pattern. Semantic accuracies for all three models are in the same range, while FastText with n-grams performs far better on the syntactic analogies.
The semantic accuracy for all models increases significantly with the increase in corpus size. However, the increase in syntactic accuracy from the larger corpus is lower (in both relative and absolute terms) for the n-gram FastText model than for the other two models. This could possibly indicate that the advantages gained by incorporating morphological information are less significant for larger corpus sizes (the corpora used in the original paper seem to indicate this too).
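The size of those gains can be read off directly from the syntactic accuracies reported in the logs above. The short calculation below uses only those numbers (brown corpus first, text8 second); the dictionary labels and the `gains` helper are just for illustration.

```python
# Syntactic accuracy in percent: (brown corpus, text8 corpus), from the logs above.
syntactic = {
    'fastText (no n-grams)': (1.88, 40.43),
    'word2vec (gensim)': (2.02, 41.73),
    'fastText (n-grams)': (52.51, 67.43),
}


def gains(before, after):
    """Return the absolute gain in percentage points and the relative gain as a ratio."""
    return after - before, after / before


results = {name: gains(*scores) for name, scores in syntactic.items()}
for name, (absolute, relative) in results.items():
    print('{:<22} +{:.2f} points, x{:.1f}'.format(name, absolute, relative))
```

The n-gram model gains roughly 15 points, against roughly 39 to 40 points for the two models without sub-word information, although it starts from a much higher baseline on the small corpus.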

# Conclusions

These preliminary results seem to indicate that fastText embeddings (with n-grams) are significantly better than word2vec at encoding syntactic information, while word2vec remains slightly ahead on the semantic tasks. The syntactic advantage is expected, since most syntactic analogies are morphology based, and the char n-gram approach of fastText takes such information into account. It'd be interesting to see how transferable these embeddings are by comparing their performance in a downstream supervised task.

# References

[1] [Enriching Word Vectors with Subword Information](https://arxiv.org/pdf/1607.04606v1.pdf)

[2] [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781v3.pdf)