## **fastText for NLP tasks**

---



### **Download and explore pre-trained models**

In [1]:
!pip install fasttext

Collecting fasttext
  Downloading fasttext-0.9.3.tar.gz (73 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pybind11>=2.2 (from fasttext)
  Using cached pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Using cached pybind11-2.13.6-py3-none-any.whl (243 kB)
Building wheels for collected packages: fasttext
  Building wheel for fasttext (pyproject.toml) ... done
  Created wheel for fasttext: filename=fasttext-0.9.3-cp310-cp310-linux_x86_64.whl size=4296183 sha256=7e2da65d302469d1ad163f434e06da612659b8a9c1e2d62b38b6fc9f15176e42
  Stored in directory: /root/.cache/pip/wheels/0d/a2/00/81db54d3e6a8199b829d58

## **(1) Explore English Model**

### **Word vectors for 157 languages**

We distribute pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText. These models were trained using CBOW with position-weights, in dimension 300, with character n-grams of length 5, a window of size 5 and 10 negatives. We also distribute three new word analogy datasets, for French, Hindi and Polish.
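
To make the "character n-grams of length 5" setting concrete, here is a minimal sketch of how a word can be split into character n-grams. The `<` and `>` boundary markers follow the convention described for fastText subwords; the helper name `char_ngrams` is made up for this illustration and is not part of the fastText API.

```python
def char_ngrams(word, n=5):
    """Return the character n-grams of `word` after adding boundary markers."""
    wrapped = f"<{word}>"  # mark the start and the end of the word
    # Slide a window of n characters over the wrapped word.
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

print(char_ngrams("chutney"))
```

A word's vector is then built from the vectors of these pieces, which is why misspellings such as "goood" can still land close to "good".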

web: https://fasttext.cc/

get started: https://fasttext.cc/docs/en/support.html#building-fasttext-python-module

installation guide + model list + model guide: https://fasttext.cc/docs/en/crawl-vectors.html

In [3]:
import fasttext

### **Downloading the model Ⓜ**

**English word vectors**

This page gathers several pre-trained word vectors trained using fastText.

Download pre-trained word vectors
Pre-trained word vectors learned on different sources can be downloaded below:

1. wiki-news-300d-1M.vec.zip: 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
2. wiki-news-300d-1M-subword.vec.zip: 1 million word vectors trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
3. crawl-300d-2M.vec.zip: 2 million word vectors trained on Common Crawl (600B tokens).
4. crawl-300d-2M-subword.zip: 2 million word vectors trained with subword information on Common Crawl (600B tokens).

doc: https://fasttext.cc/docs/en/english-vectors.html

In [2]:
import fasttext.util
fasttext.util.download_model('en', if_exists='ignore')  # English
ft = fasttext.load_model('cc.en.300.bin')

Downloading https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz



In [None]:
# import fasttext
# model_en = fasttext.load_model('C:\\Code\\nlp-tutorials\\downloads\\cc.en.300.bin')

# load a manually downloaded model

In [4]:
ft.get_nearest_neighbors('good')

# find the nearest (most similar) words for a given token/word

[(0.7517593502998352, 'bad'),
 (0.7426098585128784, 'great'),
 (0.7299689054489136, 'decent'),
 (0.7123614549636841, 'nice'),
 (0.6796907186508179, 'Good'),
 (0.6737031936645508, 'excellent'),
 (0.669592022895813, 'goood'),
 (0.6602178812026978, 'ggod'),
 (0.6479219794273376, 'semi-good'),
 (0.6417751908302307, 'good.Good')]
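
The scores in this output are similarity values between word vectors. As a rough sketch of the idea, the snippet below ranks words by cosine similarity. The three-dimensional vectors and the helper names are invented for illustration; the real model uses 300 dimensions and its own implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, made up for this example only.
toy_vectors = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad":   [0.7, -0.3, 0.1],
    "table": [0.0, 0.1, 0.9],
}

def nearest_neighbors(word, vectors, k=2):
    """Return the k most similar words as (score, word) pairs, best first."""
    query = vectors[word]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != word]
    return sorted(scored, reverse=True)[:k]

neighbors = nearest_neighbors("good", toy_vectors)
print(neighbors)
```

Note that "bad" ranks high for "good" in the real output too: antonyms tend to appear in similar contexts, so their vectors end up close together.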

In [6]:
dir(ft)  # list the available attributes and methods

['__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_labels',
 '_words',
 'f',
 'get_analogies',
 'get_dimension',
 'get_input_matrix',
 'get_input_vector',
 'get_label_id',
 'get_labels',
 'get_line',
 'get_meter',
 'get_nearest_neighbors',
 'get_output_matrix',
 'get_sentence_vector',
 'get_subword_id',
 'get_subwords',
 'get_word_id',
 'get_word_vector',
 'get_words',
 'is_quantized',
 'labels',
 'predict',
 'quantize',
 'save_model',
 'set_args',
 'set_matrices',
 'test',
 'test_label',
 'words']

In [None]:
ft.get_word_vector("good")

In [7]:
ft.get_word_vector("good").shape

(300,)

In [8]:
ft.get_analogies("berlin","germany","france")

# get_analogies(A, B, C) looks for words related to C in the way A is related to B
# here: Berlin is to Germany as ? is to France

[(0.7303731441497803, 'paris'),
 (0.6408537030220032, 'france.'),
 (0.6393311023712158, 'avignon'),
 (0.6316676139831543, 'paris.'),
 (0.5895596742630005, 'montpellier'),
 (0.5884554386138916, 'rennes'),
 (0.5850598812103271, 'grenoble'),
 (0.5832924246788025, 'london'),
 (0.5806092619895935, 'strasbourg'),
 (0.574320375919342, 'Paris.')]
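
Analogy queries are usually explained as vector arithmetic: take the vector for A, subtract B, add C, and look for the nearest remaining word. The sketch below does this on invented toy vectors (dimensions loosely meaning "is a capital", "German", "French"). It skips details such as vector normalization, so treat it as an illustration of the idea rather than of fastText's exact computation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy vectors, made up for this example only.
toy = {
    "berlin":  [1.0, 1.0, 0.0],
    "germany": [0.0, 1.0, 0.0],
    "france":  [0.0, 0.0, 1.0],
    "paris":   [1.0, 0.0, 1.0],
    "rome":    [1.0, 0.0, 0.0],
}

def analogies(a, b, c, vectors, k=1):
    """Rank words by similarity to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [(cosine(query, vec), word)
                  for word, vec in vectors.items() if word not in (a, b, c)]
    return sorted(candidates, reverse=True)[:k]

result = analogies("berlin", "germany", "france", toy)
print(result[0][1])
```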

In [9]:
ft.get_analogies("berlin","germany","india")

[(0.7148876190185547, 'delhi'),
 (0.6974374055862427, 'mumbai'),
 (0.648612916469574, 'jaipur'),
 (0.6349966526031494, 'kolkata'),
 (0.6279922723770142, 'pune'),
 (0.6277596354484558, 'bangalore'),
 (0.6044078469276428, 'hyderabad'),
 (0.6021745800971985, 'noida'),
 (0.6018899083137512, 'bhubaneswar'),
 (0.599077582359314, 'nashik')]

In [10]:
ft.get_analogies("driving","car","phone")

[(0.610385537147522, 'texting'),
 (0.5203558802604675, 'phone-calling'),
 (0.5153835415840149, 'cellphone'),
 (0.5135326981544495, 'cell-phone'),
 (0.5117910504341125, 'dialing'),
 (0.5087355971336365, 'texing'),
 (0.5079342722892761, 'text-messaging'),
 (0.500900387763977, 'txting'),
 (0.4960441589355469, 'texting.'),
 (0.4951859414577484, 'Texting')]

In [11]:
ft.get_analogies("driving","car","book")

[(0.5302355885505676, 'reading'),
 (0.517051637172699, 'book.I'),
 (0.5137901306152344, 'book--and'),
 (0.5090512633323669, 'book.That'),
 (0.5005884766578674, 'book--it'),
 (0.49395182728767395, 'book--I'),
 (0.49293914437294006, 're-reading'),
 (0.49156999588012695, 'book.This'),
 (0.49107635021209717, 'reading--and'),
 (0.48960915207862854, 'book--the')]

In [12]:
ft.get_nearest_neighbors("chutney")

[(0.8078702092170715, 'chutneys'),
 (0.7138292789459229, 'thokku'),
 (0.701572060585022, 'Chutney'),
 (0.6875490546226501, 'achaar'),
 (0.684525728225708, 'piccalilli'),
 (0.6737173199653625, 'raita'),
 (0.6715506911277771, 'chatni'),
 (0.6610829830169678, 'chutney.'),
 (0.6505922675132751, 'gojju'),
 (0.6398508548736572, 'kasundi')]

In [14]:
ft.get_nearest_neighbors("kottu")

[(0.7910846471786499, 'kothu'),
 (0.7650530934333801, 'Kottu'),
 (0.7085487842559814, 'kozhi'),
 (0.6973536014556885, 'biriyani'),
 (0.6850562691688538, 'Kothu'),
 (0.6849375367164612, 'paav'),
 (0.6835299730300903, 'bhath'),
 (0.6803717017173767, 'bhaaji'),
 (0.6743912696838379, 'appams'),
 (0.6742650270462036, 'parippu')]

In [15]:
ft.get_nearest_neighbors("saragva", k=3)

# for rare words that are poorly covered by the training data, the model can return garbage neighbors, such as the long concatenated tokens below

[(0.5384978652000427,
  'ReportsTabloidCrimeYakuzaTokyoGinzaIkebukuroKabukichoRoppongiShibuyaShimbashiShinjukuUenoJapanChibaFukuokaKobeKyotoNagoyaOkinawaOsakaSaitamaYokohamaSportsBaseballHorse'),
 (0.5373231768608093,
  'NoidaVaranasiBareillyMathuraAligarhMoradabadSaharanpurBijnorJaunpurGorakhpurMuzaffarnagarSultanpurDehradunHaridwarNainitalRoorkeeGarhwalBardhamanMurshidabadHooghlyMedinipurNorth'),
 (0.5331498980522156,
  'NagarBhiwaniKarnalKurukshetraMahendragarhSirsaPanipatJindJhajjarRewariSolanShimlaKangraHamirpurMandiJammuSrinagarRanchiJamshedpurMangaloreMysoreBelgaumGulbargaTumkurBijapurDavanagereDharwadShimogaUdupiHassanBidarHubliKolarBagalkotKannadaChitradurgaMandyaGadagBellaryRaichurThiruvananthapuramThrissurErnakulamMalappuramKochiKottayamKannurKozhikodeKollamPalakkadPathanamthittaCalicutTrivandrumAlappuzhaKasaragodBhopalIndoreGwaliorJabalpurUjjainSagarChhatarpurPuneNagpurAurangabadNashikKolhapurAhmed')]
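
One way to check whether a word has its own vocabulary entry is `get_word_id`, which appears in the `dir(ft)` listing above. The sketch below assumes that it returns -1 for words outside the vocabulary; the helper name `in_vocabulary` is hypothetical.

```python
def in_vocabulary(model, word):
    """True if `word` has its own entry in the model's dictionary."""
    # Assumption: get_word_id returns -1 for words outside the vocabulary.
    return model.get_word_id(word) != -1

# Usage with the model loaded earlier in this notebook (not run here):
# in_vocabulary(ft, "saragva")
```

If the word is missing, its vector comes only from character n-grams, which may explain neighbors that share letters but not meaning.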

## **(2) Explore Hindi Model**

In [1]:
import fasttext.util
fasttext.util.download_model('hi', if_exists='ignore')  # hindi
model_hi = fasttext.load_model('cc.hi.300.bin')

In [2]:
model_hi.get_nearest_neighbors("अच्छा")

[(0.6697985529899597, 'बुरा'),
 (0.6132625341415405, 'अच्छे'),
 (0.608695387840271, 'अच्चा'),
 (0.6058669090270996, 'अच्छाखासा'),
 (0.5848375558853149, 'कीअच्छा'),
 (0.5826330184936523, 'औरअच्छा'),
 (0.5811230540275574, 'हो.अच्छा'),
 (0.5805407762527466, 'हीअच्छा'),
 (0.5795978307723999, 'लगता'),
 (0.5777745246887207, '58अच्छा')]

In [3]:
model_hi.get_nearest_neighbors("गाय")

[(0.6485272645950317, 'गायों'),
 (0.6403631567955017, 'गोमाता'),
 (0.6264104247093201, 'बछड़े'),
 (0.6045769453048706, 'बछडे'),
 (0.6030024886131287, 'दुधारु'),
 (0.5880257487297058, 'भेंस'),
 (0.5822192430496216, 'भैंस'),
 (0.5819100737571716, 'दुधारू'),
 (0.5773836970329285, 'गौमाता'),
 (0.5771614909172058, 'गायें')]


### **Train custom word embeddings on Indian food recipes 😋**

dataset credits: https://www.kaggle.com/datasets/sooryaprakash12/cleaned-indian-recipes-dataset

In [4]:
import pandas as pd

df = pd.read_csv("/Cleaned_Indian_Food_Dataset.csv")
print(df.shape)
df.head(3)

(5938, 9)


Unnamed: 0,TranslatedRecipeName,TranslatedIngredients,TotalTimeInMins,Cuisine,TranslatedInstructions,URL,Cleaned-Ingredients,image-url,Ingredient-count
0,Masala Karela Recipe,"1 tablespoon Red Chilli powder,3 tablespoon Gr...",45,Indian,"To begin making the Masala Karela Recipe,de-se...",https://www.archanaskitchen.com/masala-karela-...,"salt,amchur (dry mango powder),karela (bitter ...",https://www.archanaskitchen.com/images/archana...,10
1,Spicy Tomato Rice (Recipe),"2 teaspoon cashew - or peanuts, 1/2 Teaspoon ...",15,South Indian Recipes,"To make tomato puliogere, first cut the tomato...",https://www.archanaskitchen.com/spicy-tomato-r...,"tomato,salt,chickpea lentils,green chilli,rice...",https://www.archanaskitchen.com/images/archana...,12
2,Ragi Semiya Upma Recipe - Ragi Millet Vermicel...,"1 Onion - sliced,1 teaspoon White Urad Dal (Sp...",50,South Indian Recipes,"To begin making the Ragi Vermicelli Recipe, fi...",https://www.archanaskitchen.com/ragi-vermicell...,"salt,rice vermicelli noodles (thin),asafoetida...",https://www.archanaskitchen.com/images/archana...,12


* CBOW and skip-gram are unsupervised approaches: the model scans a large body of raw text, builds its own training samples from context and target words, and then trains on them.

In [5]:
df.TranslatedInstructions[0]

'To begin making the Masala Karela Recipe,de-seed the karela and slice.\nDo not remove the skin as the skin has all the nutrients.\nAdd the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles.\nRelease the pressure immediately and open the lids.\nKeep aside.Heat oil in a heavy bottomed pan or a kadhai.\nAdd cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan.\nStir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again.\nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well.\nTurn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family.\n'

In [6]:
import re

text = 'To begin making the Masala Karela Recipe,de-seed the karela and slice.\nDo not remove the skin as the skin has all the nutrients.\nAdd the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles.\nRelease the pressure immediately and open the lids.\nKeep aside.Heat oil in a heavy bottomed pan or a kadhai.\nAdd cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan.\nStir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again.\nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well.\nTurn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family.\n'

re.sub(r"[^\w\s]", " ", text, flags=re.MULTILINE)

# replace punctuation and special characters with spaces using a regex (whitespace is kept)

'To begin making the Masala Karela Recipe de seed the karela and slice \nDo not remove the skin as the skin has all the nutrients \nAdd the karela to the pressure cooker with 3 tablespoon of water  salt and turmeric powder and pressure cook for three whistles \nRelease the pressure immediately and open the lids \nKeep aside Heat oil in a heavy bottomed pan or a kadhai \nAdd cumin seeds and let it sizzle Once the cumin seeds have sizzled  add onions and saute them till it turns golden brown in color Add the karela  red chilli powder  amchur powder  coriander powder and besan \nStir to combine the masalas into the karela Drizzle a little extra oil on the top and mix again \nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well \nTurn off the heat Transfer Masala Karela into a serving bowl and serve Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family \n'

In [15]:
re.sub(" +"," ","powder  masala powder    coriander")
# if the text contains runs of repeated spaces, replace each run with a single space

'powder masala powder coriander'

In [16]:
re.sub("[ \n]+"," ","powder  masala powder   \n  coriander")
# replace runs of spaces and newlines (\n) with a single space

'powder masala powder coriander'

In [17]:
def preprocess(text):
    text = re.sub(r'[^\w\s\']',' ', text)
    text = re.sub(r'[ \n]+', ' ', text)
    return text.strip().lower()

# 'text.strip()' removes leading and trailing spaces from the text.
# '.lower()' converts the text to lowercase.
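
As a quick check of what each step does, here is the same function applied to a short made-up sentence. Note that the first pattern keeps apostrophes, so contractions such as "don't" survive.

```python
import re

def preprocess(text):
    # Replace everything except word characters, whitespace, and apostrophes with a space.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of spaces and newlines into a single space.
    text = re.sub(r"[ \n]+", " ", text)
    # Trim the ends and lowercase.
    return text.strip().lower()

sample = "Add 2 tsp. salt,\nthen  stir; don't burn!"
cleaned = preprocess(sample)
print(cleaned)
```

The second pattern only targets spaces and newlines; if the data might contain tabs, `\s+` would be a broader alternative.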

In [8]:
text = 'To begin making the Masala Karela Recipe,de-seed the karela and slice.\nDo not remove the skin as the skin has all the nutrients.\nAdd the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles.\nRelease the pressure immediately and open the lids.\nKeep aside.Heat oil in a heavy bottomed pan or a kadhai.\nAdd cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan.\nStir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again.\nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well.\nTurn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family.\n'

preprocess(text)

'to begin making the masala karela recipe de seed the karela and slice do not remove the skin as the skin has all the nutrients add the karela to the pressure cooker with 3 tablespoon of water salt and turmeric powder and pressure cook for three whistles release the pressure immediately and open the lids keep aside heat oil in a heavy bottomed pan or a kadhai add cumin seeds and let it sizzle once the cumin seeds have sizzled add onions and saute them till it turns golden brown in color add the karela red chilli powder amchur powder coriander powder and besan stir to combine the masalas into the karela drizzle a little extra oil on the top and mix again cover the pan and simmer masala karela stirring occasionally until everything comes together well turn off the heat transfer masala karela into a serving bowl and serve serve masala karela along with panchmel dal and phulka for a weekday meal with your family'

In [19]:
df.TranslatedInstructions = df.TranslatedInstructions.map(preprocess)
# map applies the preprocess function to every entry in the column

In [21]:
df.head(3)

Unnamed: 0,TranslatedRecipeName,TranslatedIngredients,TotalTimeInMins,Cuisine,TranslatedInstructions,URL,Cleaned-Ingredients,image-url,Ingredient-count
0,Masala Karela Recipe,"1 tablespoon Red Chilli powder,3 tablespoon Gr...",45,Indian,to begin making the masala karela recipe de se...,https://www.archanaskitchen.com/masala-karela-...,"salt,amchur (dry mango powder),karela (bitter ...",https://www.archanaskitchen.com/images/archana...,10
1,Spicy Tomato Rice (Recipe),"2 teaspoon cashew - or peanuts, 1/2 Teaspoon ...",15,South Indian Recipes,to make tomato puliogere first cut the tomatoe...,https://www.archanaskitchen.com/spicy-tomato-r...,"tomato,salt,chickpea lentils,green chilli,rice...",https://www.archanaskitchen.com/images/archana...,12
2,Ragi Semiya Upma Recipe - Ragi Millet Vermicel...,"1 Onion - sliced,1 teaspoon White Urad Dal (Sp...",50,South Indian Recipes,to begin making the ragi vermicelli recipe fir...,https://www.archanaskitchen.com/ragi-vermicell...,"salt,rice vermicelli noodles (thin),asafoetida...",https://www.archanaskitchen.com/images/archana...,12


In [20]:
df.TranslatedInstructions[0]

'to begin making the masala karela recipe de seed the karela and slice do not remove the skin as the skin has all the nutrients add the karela to the pressure cooker with 3 tablespoon of water salt and turmeric powder and pressure cook for three whistles release the pressure immediately and open the lids keep aside heat oil in a heavy bottomed pan or a kadhai add cumin seeds and let it sizzle once the cumin seeds have sizzled add onions and saute them till it turns golden brown in color add the karela red chilli powder amchur powder coriander powder and besan stir to combine the masalas into the karela drizzle a little extra oil on the top and mix again cover the pan and simmer masala karela stirring occasionally until everything comes together well turn off the heat transfer masala karela into a serving bowl and serve serve masala karela along with panchmel dal and phulka for a weekday meal with your family'

* fastText expects its training data as a file in a specific format.

* CBOW and skip-gram train in an unsupervised way, so a file of raw text is all we need.

In [22]:
df.to_csv("food_receipes.txt", columns=["TranslatedInstructions"], header=None, index=False)
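
For unsupervised training the file is just plain text, here with one recipe per line. As an alternative sketch that avoids any CSV quoting concerns, the lines can also be written with ordinary file I/O. The `instructions` list and the temporary path below are stand-ins for illustration, not the notebook's actual data.

```python
import os
import tempfile

# Hypothetical stand-in for the preprocessed TranslatedInstructions column.
instructions = [
    "heat oil in a kadhai",
    "add cumin seeds and let it sizzle",
]

# Illustrative output location; the notebook itself writes food_receipes.txt.
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")

with open(corpus_path, "w", encoding="utf-8") as f:
    for line in instructions:
        f.write(line + "\n")  # one document per line

with open(corpus_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```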

### **Train a custom fastText model from scratch on our dataset**

In [31]:
import fasttext

model = fasttext.train_unsupervised("/content/food_receipes.txt")

* train_unsupervised uses an unsupervised approach (CBOW or skip-gram); by default it uses skip-gram.
* It reads all the text in the file (food_receipes.txt).
* It creates training pairs of context and target words from words that appear near each other.
* It then trains the model on those pairs, which yields the word vectors.
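
The pair-building step can be sketched in a few lines. This is a simplified illustration with a fixed window and a hypothetical helper name, `skipgram_pairs`; the real trainer adds refinements such as negative sampling that are left out here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for words within `window` positions of each other."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "add cumin seeds and let it sizzle".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:4])
```

With skip-gram the model learns to predict each context word from the target word; CBOW flips this and predicts the target from its surrounding context.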

In [32]:
model.get_nearest_neighbors("paneer")

[(0.7046746611595154, 'tikka'),
 (0.6630706191062927, 'tikkas'),
 (0.6622522473335266, 'tandoori'),
 (0.6518504619598389, 'bhurji'),
 (0.6466901302337646, 'reshmi'),
 (0.6369193196296692, 'nawabi'),
 (0.6190375685691833, 'makhanwala'),
 (0.6179590821266174, 'hariyali'),
 (0.6143130660057068, 'makhani'),
 (0.5987952351570129, 'malai')]

In [33]:
model.get_nearest_neighbors("chutney")

[(0.9275704622268677, 'chutneys'),
 (0.7463748455047607, 'dhaniya'),
 (0.7132056951522827, 'imli'),
 (0.7042087316513062, 'khajur'),
 (0.6639349460601807, 'kanchipuram'),
 (0.6590506434440613, 'pudina'),
 (0.6549491286277771, 'gothsu'),
 (0.6544407606124878, 'chammanthi'),
 (0.6525646448135376, 'south'),
 (0.6511055827140808, 'madurai')]

In [34]:
model.get_nearest_neighbors("halwa")

[(0.7467402815818787, 'khoya'),
 (0.7186369299888611, 'burfi'),
 (0.7104381322860718, 'rabri'),
 (0.6857462525367737, 'mawa'),
 (0.6752265095710754, 'badam'),
 (0.672613799571991, 'sheera'),
 (0.6673717498779297, 'kheer'),
 (0.6628114581108093, 'mohan'),
 (0.6588674187660217, 'basundi'),
 (0.6500763297080994, 'doodh')]

In [35]:
model.get_nearest_neighbors("dosa")

[(0.8473756909370422, 'dosai'),
 (0.8177902698516846, 'dosas'),
 (0.7941131591796875, "dosa's"),
 (0.7563254237174988, 'uthappam'),
 (0.7445687055587769, 'uttapam'),
 (0.7228896021842957, 'kanchipuram'),
 (0.7157869338989258, 'dose'),
 (0.7090448141098022, 'neer'),
 (0.7085314393043518, 'pesarattu'),
 (0.7060192227363586, 'chembaruthi')]

In [36]:
model.get_nearest_neighbors("moong")

[(0.7587465047836304, 'sprouted'),
 (0.7246676683425903, 'moth'),
 (0.7035178542137146, 'horse'),
 (0.6858246326446533, 'mooga'),
 (0.6832014918327332, 'horsegram'),
 (0.6807715892791748, 'dal'),
 (0.6562758684158325, 'moolangi'),
 (0.6540908217430115, 'sprout'),
 (0.6464773416519165, 'moongphali'),
 (0.6430541276931763, 'tuvar')]

In [39]:
model.get_word_vector("dosa")

array([-0.11950116,  0.28583047, -0.4654494 ,  0.08693456, -0.35507378,
        0.7289869 ,  0.20280266, -0.11838338,  0.23742546, -0.19064467,
       -0.31924918,  0.13865896, -0.6260956 , -0.6092955 , -0.23994708,
        0.23472966,  0.2875553 , -0.05696383,  0.35141674,  0.04170053,
       -0.6357446 , -0.18519005, -0.16451463,  0.16385582,  0.00592823,
       -0.02217221,  0.741354  , -0.13828816, -0.5270438 , -0.1519697 ,
       -0.20615944, -1.2354382 , -0.08859859, -0.19523752, -0.63037205,
       -0.09277713,  0.24950927, -0.00869391, -0.98898953,  0.44255862,
       -0.4225871 ,  0.02535464,  0.45536876,  0.8443987 ,  0.28843385,
       -0.08182858,  0.2496074 , -0.7621703 , -0.1465675 ,  0.02497284,
        0.06998996,  0.18063627, -0.0653295 , -0.13767588,  0.25840017,
       -0.05835231,  0.6013155 ,  0.1673816 , -0.43791327, -0.2440163 ,
       -0.18183254,  0.09999452,  0.63193876, -0.52141434, -0.17266056,
        0.33686408, -0.75262123,  0.16485716,  0.06816921, -0.31

In [37]:
model.get_word_vector("dosa").shape

(100,)

In [38]:
model.get_nearest_neighbors("saragva")

[(0.8921212553977966, 'fansi'),
 (0.8687644004821777, 'bhoplya'),
 (0.8530213832855225, 'phanu'),
 (0.8452553749084473, 'agathi'),
 (0.8449345827102661, 'saagu'),
 (0.8402233719825745, 'sookhi'),
 (0.8388527631759644, 'vaangi'),
 (0.8383015394210815, 'bhuga'),
 (0.8374607563018799, 'sukhi'),
 (0.8336049318313599, 'phalguni')]

* Compare these outputs with the earlier ones from the pre-trained English model (cc.en.300.bin), which returned garbage neighbors for "saragva".

Official supervised tutorial: https://fasttext.cc/docs/en/supervised-tutorial.html

Official unsupervised tutorial, with details on the parameters of the train_unsupervised function: https://fasttext.cc/docs/en/unsupervised-tutorial.html

Depending on the need, the following parameters can be used for tuning:

1. epoch = number of passes over the training data. The default is 5.
2. lr = learning rate. The default is 0.05.
3. thread = number of threads used for training.
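
A minimal sketch of passing these parameters, assuming the corpus file written earlier in this notebook. The values are illustrative, not tuned recommendations, and the `train` wrapper is a hypothetical helper; `fasttext` is imported inside it so the settings can be inspected without the package installed.

```python
# Illustrative hyperparameter values only.
train_kwargs = {
    "model": "skipgram",  # or "cbow"
    "epoch": 10,          # passes over the corpus (default is 5)
    "lr": 0.05,           # learning rate
    "thread": 4,          # number of worker threads
    "dim": 100,           # vector size; matches the (100,) shape seen above
}

def train(corpus_path, **kwargs):
    """Train unsupervised word vectors on a plain-text corpus file."""
    import fasttext  # lazy import: only needed when training actually runs
    return fasttext.train_unsupervised(corpus_path, **kwargs)

# Example call (not run here):
# model = train("/content/food_receipes.txt", **train_kwargs)
```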