### Why is FastText fast?
* https://www.analyticsvidhya.com/blog/2023/01/introduction-to-fasttext-embeddings-and-its-implication/#:~:text=FastText%20can%20provide%20better%20embeddings,it%20is%20faster%20than%20word2vec.
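    * supports character n-grams (subword information), so rare and out-of-vocabulary words still get a vector
    * also trains with the CBOW and skip-gram objectives
    * each word is represented as the average of the vector representations of its character n-grams along with the vector of the word itself
    * can use hierarchical softmax as the classifier's output layer, which is computationally cheaper than a full softmax when there are many labels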
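
To make the subword idea concrete, here is a minimal sketch of n-gram extraction and averaging. The function names, the dict-based `table`, and the toy vectors are illustrative assumptions, not fastText's API; the real library hashes n-grams into a fixed number of buckets rather than storing them in a dict. The `<` and `>` boundary markers and the 3 to 6 range follow the description in the fastText paper.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` plus the full marked word."""
    marked = f"<{word}>"  # boundary markers separate prefixes from suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    if marked not in grams:
        grams.append(marked)  # the whole word is kept as its own unit
    return grams


def word_vector(word, table, dim):
    """Average the vectors of every known unit (hypothetical dict lookup)."""
    known = [table[u] for u in char_ngrams(word) if u in table]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']

toy_table = {"<wh": [1.0, 0.0], "whe": [0.0, 1.0]}  # made-up vectors
print(word_vector("where", toy_table, dim=2))
# [0.5, 0.5]
```

Because an unseen word still shares n-grams with seen words, it gets a non-trivial vector even when the word itself is missing from the vocabulary.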

### Quick notes on hierarchical softmax
* Uses a binary tree structure to represent the output classes.
* Each leaf node represents an output class, and internal nodes act as binary decision points that group the classes below them.
* Instead of computing a score for every class, it computes the probability of a class as the product of the left/right decision probabilities along the path from the root to that class's leaf.
* Reduces the cost of computing one class probability from O(N) to O(log N), where N is the number of classes.
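
The path computation can be sketched in a few lines. This is a toy illustration under stated assumptions, not fastText's implementation: it uses a balanced tree with heap-style numbering and random scalar scores standing in for the dot product between the hidden vector and each internal node's vector. fastText itself builds a Huffman tree from label frequencies, so frequent labels get shorter paths.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(leaf, node_scores, num_classes):
    """Probability of class `leaf` (0-based) and the number of nodes visited.

    Heap numbering: the root is 1, the children of node i are 2i and 2i+1,
    and leaf k sits at index num_classes + k (num_classes a power of two).
    """
    node = num_classes + leaf
    prob, steps = 1.0, 0
    while node > 1:
        parent = node // 2
        p_left = sigmoid(node_scores[parent])
        # Even index means left child, odd index means right child.
        prob *= p_left if node % 2 == 0 else 1.0 - p_left
        node = parent
        steps += 1
    return prob, steps


random.seed(0)
num_classes = 8
# One made-up score per internal node (indices 1 .. num_classes - 1).
node_scores = {i: random.uniform(-2, 2) for i in range(1, num_classes)}

probs = [leaf_probability(k, node_scores, num_classes)[0] for k in range(num_classes)]
_, steps = leaf_probability(3, node_scores, num_classes)

print(sum(probs))  # very close to 1.0: the leaves form a valid distribution
print(steps)       # 3 sigmoid evaluations instead of 8 softmax terms
```

Each class probability touches only log2(N) internal nodes, which is where the O(log N) cost comes from, and the leaf probabilities still sum to one without any explicit normalisation over all classes.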


In [1]:
import fasttext

In [3]:
model = fasttext.train_supervised(input="../data/fasttext_cooking_data/cooking.train")

Read 0M words
Number of words:  14543
Number of labels: 735
Progress: 100.0% words/sec/thread:   59482 lr:  0.000000 avg.loss: 10.207205 ETA:   0h 0m 0s


In [4]:
model.save_model("model_cooking.bin")

In [5]:
model.predict("Which baking dish is best to bake a banana bread ?")

(('__label__baking',), array([0.06549904]))

In [7]:
# Returns (number of examples, precision@1, recall@1)
model.test("../data/fasttext_cooking_data/cooking.valid")

(3000, 0.12233333333333334, 0.052904713853250684)
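
The tuple above is (number of examples, precision at 1, recall at 1). Recall comes out lower than precision because each question can carry several labels while only one label is predicted per question. A minimal sketch of how these two numbers can be computed, assuming the usual definitions (correct predictions divided by predictions made, and correct predictions divided by gold labels); the helper name and toy data are made up for illustration:

```python
def precision_recall_at_k(predictions, gold, k=1):
    """Precision@k and recall@k for multi-label examples.

    predictions: one ranked list of labels per example
    gold: one set of true labels per example
    """
    correct = n_predicted = n_gold = 0
    for ranked, true in zip(predictions, gold):
        top_k = ranked[:k]
        correct += sum(1 for label in top_k if label in true)
        n_predicted += len(top_k)
        n_gold += len(true)
    return correct / n_predicted, correct / n_gold


# Toy data: three questions, one predicted label each, six gold labels in total.
predictions = [["baking"], ["equipment"], ["bread"]]
gold = [{"baking", "bread"}, {"knives"}, {"bread", "baking", "oven"}]

precision, recall = precision_recall_at_k(predictions, gold, k=1)
print(f"P@1 = {precision:.3f}, R@1 = {recall:.3f}")
# P@1 = 0.667, R@1 = 0.333
```
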

In [8]:
model.predict("Why not put knives in the dishwasher?", k=5)

(('__label__baking',
  '__label__food-safety',
  '__label__bread',
  '__label__substitutions',
  '__label__equipment'),
 array([0.0750299 , 0.06262442, 0.03546463, 0.03398636, 0.0327246 ]))

In [10]:
model = fasttext.train_supervised(input="../data/fasttext_cooking_data/cooking.train", epoch=25)

Read 0M words
Number of words:  14543
Number of labels: 735
Progress: 100.0% words/sec/thread:   59702 lr:  0.000000 avg.loss:  8.028993 ETA:   0h 0m 0s


In [11]:
model.test("../data/fasttext_cooking_data/cooking.valid")

(3000, 0.43766666666666665, 0.18927490269568978)

In [12]:
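# loss='hs' selects hierarchical softmax; wordNgrams=2 adds word bigram features, hashed into `bucket` slots
model = fasttext.train_supervised(input="../data/fasttext_cooking_data/cooking.train", lr=1.0, epoch=25, wordNgrams=2, bucket=200000, dim=50, loss='hs')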

Read 0M words
Number of words:  14543
Number of labels: 735
Progress: 100.0% words/sec/thread: 2362058 lr:  0.000000 avg.loss:  2.614207 ETA:   0h 0m 0s
