# FastText implementation in Gensim
For further code examples, see the Gensim [FastText tutorial](https://github.com/RaRe-Technologies/gensim/blob/de8657e9b8d5192750296b6765175c31c8bb3298/docs/notebooks/FastText_Tutorial.ipynb).

The code currently suppresses two kinds of warnings; check with future installations whether this is still necessary. The first, a `UserWarning`, occurs only on Windows machines and tells the user that an alternative function will be used. The second filter suppresses a `FutureWarning` that comes from NumPy: "FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated".

In [1]:
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
warnings.filterwarnings(action='ignore', category=FutureWarning, module='gensim')
import gensim
import pickle
from gensim.models.word2vec import LineSentence
from gensim.models.fasttext import FastText as FT_gensim
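
To check later whether the filters are still needed, it can help to see how a filter behaves in isolation. The following is a minimal sketch using only the standard library; the function `noisy` is a hypothetical stand-in for library code that emits both kinds of warning, and it is not part of Gensim.

```python
import warnings


def noisy():
    """Hypothetical stand-in for library code that emits two warnings."""
    warnings.warn("conversion is deprecated", FutureWarning)
    warnings.warn("an alternative function will be used", UserWarning)
    return 42


# catch_warnings restores the previous filters on exit,
# so nothing set here leaks into the rest of the session.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # Filters added later take precedence, so FutureWarning is dropped.
    warnings.filterwarnings(action="ignore", category=FutureWarning)
    result = noisy()

remaining = [w.category.__name__ for w in caught]
print(remaining)
```

Removing the `filterwarnings` line and rerunning shows which warnings would otherwise appear, which is one way to test whether a suppression is still doing anything.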

In [2]:
file = "corpus/sux_lemm.txt"
data = LineSentence(file)

In [3]:
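model_gensim = FT_gensim(data, min_count=5, window=10, size=100, negative=20, sorted_vocab=1, min_n=1,
                         max_n=6, sg=1, iter=100)
# With these parameters, runtime is approx. 3.5 hours on a recent MacBook Pro.
# iter is the number of training epochs; sg=1 selects skip-gram (sg=0 is CBOW).
# min_n and max_n are the minimum and maximum character n-gram lengths.
# Note: in Gensim 4.0 and later, size is called vector_size and iter is called epochs.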
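
The `min_n` and `max_n` parameters control which character n-grams FastText uses to represent a word. The sketch below illustrates the idea in plain Python. The function `char_ngrams` is a hypothetical helper written for this illustration, not Gensim's own code, and it leaves out details such as the hashing of n-grams into buckets.

```python
def char_ngrams(word, min_n, max_n):
    """Return all character n-grams of length min_n..max_n.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from n-grams in the middle of a word.
    """
    extended = "<" + word + ">"
    ngrams = []
    for n in range(min_n, min(len(extended), max_n) + 1):
        for start in range(len(extended) - n + 1):
            ngrams.append(extended[start:start + n])
    return ngrams


# Same n-gram range as in the training call above (min_n=1, max_n=6).
a = set(char_ngrams("ur[dog]N", 1, 6))
b = set(char_ngrams("urgir[dog]N", 1, 6))
shared = a & b
print(len(shared), "n-grams shared, for example:", sorted(shared, key=len)[-3:])
```

Lemmas that share many n-grams, including those coming from the guide word and part-of-speech tag, share part of their representation. This may explain some of the neighbours a model like this returns.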

In [4]:
model_gensim.wv.most_similar(positive=["urmah[lion]N"])

[('ur[dog]N', 0.8886517286300659),
 ('urgir[dog]N', 0.8823392391204834),
 ('gu[eat]V/t', 0.8587689399719238),
 ('urbara[wolf]N', 0.8540603518486023),
 ('eden[back]N', 0.8312609195709229),
 ('dusu[equid]N', 0.8308379650115967),
 ('gud[ox]N', 0.8304731845855713),
 ('ŋiri[foot]N', 0.8284202814102173),
 ('agaʾus[soldier]N', 0.8192402124404907),
 ('sisi[horse]N', 0.8191594481468201)]
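
The scores returned by `most_similar` are cosine similarities between word vectors. As a minimal sketch of that ranking, the example below uses invented three-dimensional vectors. The names `toy_vectors` and `rank_neighbours` and all the numbers are assumptions made for illustration; none of them come from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors; a real model would have 100 dimensions here.
toy_vectors = {
    "lion": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "barley": [0.0, 0.2, 0.9],
}


def rank_neighbours(word, vectors, topn=2):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


ranking = rank_neighbours("lion", toy_vectors)
print(ranking)
```

A score close to 1 means the two vectors point in nearly the same direction, which is why all the top neighbours in the output above have high values.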

In [5]:
filename = "model/model_lemm.model"
model_gensim.save(filename)