[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unboxai/examples-gallery/blob/main/text-classification/fasttext/fasttext.ipynb)


# Text classification using fastText

This notebook illustrates how fastText models can be uploaded to the Unbox platform.

## Importing the modules and loading the dataset

In [3]:
import fasttext
import numpy as np

In [4]:
%%bash

if [ ! -d ./data ]; then
    mkdir ./data
fi

curl https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz --output data/cooking.stackexchange.tar.gz && tar xvzf data/cooking.stackexchange.tar.gz -C data
head -n 12404 data/cooking.stackexchange.txt > data/cooking.train
tail -n 3000 data/cooking.stackexchange.txt > data/cooking.valid

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  446k  100  446k    0     0   194k      0  0:00:02  0:00:02 --:--:--  194k
x cooking.stackexchange.id
x cooking.stackexchange.txt
x readme.txt
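The training and validation files use fastText's supervised format: each line holds one example, with its labels written as `__label__`-prefixed tokens alongside the text. As a minimal sketch of how such a line breaks down (the helper `parse_fasttext_line` is hypothetical and not part of fastText, and the sample line is only illustrative of the format):

```python
def parse_fasttext_line(line, label_prefix="__label__"):
    """Split one fastText-format line into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(label_prefix):
            # Strip the prefix to recover the plain label name
            labels.append(token[len(label_prefix):])
        else:
            words.append(token)
    return labels, " ".join(words)


# Illustrative line in the same format as data/cooking.train
sample = "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
labels, text = parse_fasttext_line(sample)
print(labels)
print(text)
```

Running the helper over a few lines of `data/cooking.train` is a quick way to confirm the split produced well-formed examples before training.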


## Training and evaluating the model's performance

In [5]:
fasttext_model = fasttext.train_supervised(input="./data/cooking.train", lr=0.8, epoch=70, loss='hs')

Read 0M words
Number of words:  14543
Number of labels: 735
Progress: 100.0% words/sec/thread: 1863090 lr:  0.000000 avg.loss:  5.498450 ETA:   0h 0m 0s


In [6]:
fasttext_model.test("./data/cooking.valid")

(3000, 0.49633333333333335, 0.2146461006198645)
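The tuple returned by `test` contains the number of examples, the precision at 1, and the recall at 1. A short sketch that unpacks the values shown above and derives an F1 score from them (the variable names are chosen here for illustration):

```python
# Values copied from the output above: (number of examples, precision@1, recall@1)
n_examples, precision_at_1, recall_at_1 = (3000, 0.49633333333333335, 0.2146461006198645)

# F1 is the harmonic mean of precision and recall
f1_at_1 = 2 * precision_at_1 * recall_at_1 / (precision_at_1 + recall_at_1)

print(f"examples:    {n_examples}")
print(f"precision@1: {precision_at_1:.3f}")
print(f"recall@1:    {recall_at_1:.3f}")
print(f"F1@1:        {f1_at_1:.3f}")
```

Recall is much lower than precision here because each question in this dataset can carry several labels, while only the top prediction is scored.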

## Uploading to Unbox

### Instantiating the client

In [7]:
import unboxapi

client = unboxapi.UnboxClient("YOUR_API_KEY_HERE")
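Pasting the key directly into a notebook makes it easy to leak when the notebook is shared. One option, sketched below, is to read it from an environment variable. The variable name `UNBOX_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the Unbox client looks for on its own.

```python
import os


def load_api_key(env_var="UNBOX_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is not set."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key


# Usage, mirroring the cell above:
# client = unboxapi.UnboxClient(load_api_key())
```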

### Creating a project on the platform

In [14]:
from unboxapi.tasks import TaskType

project = client.create_project(name="Recipe classification",
                                task_type=TaskType.TextClassification,
                                description="Fasttext Demo Project")

Created your project. Check out https://unbox.ai/projects!


### Uploading the model

First, create a `predict_proba` function, which is how Unbox interacts with your model.

In [15]:
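# fastText prefixes every label with "__label__"; strip it to get plain class names
class_names = [s.replace("__label__", "") for s in fasttext_model.labels]

# Number of classes, used below as the `k` argument of `model.predict`
k = len(class_names)
# Lookup tables between class index and class name
idx_to_labels = {i: name for i, name in enumerate(class_names)}
labels_to_idx = {name: i for i, name in enumerate(class_names)}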

In [16]:
def predict_proba(model, text_list):
    """Return an array of shape (len(text_list), len(class_names)) with class probabilities."""
    # Ask fastText for all k labels so that every class can receive a probability
    labels_per_text, probs_per_text = model.predict(text_list, k=k)

    probabilities_full_list = []
    for label_list, prob_list in zip(labels_per_text, probs_per_text):
        # Map each returned label (without its "__label__" prefix) to its probability
        label_to_prob = {
            lbl.replace("__label__", ""): prob
            for lbl, prob in zip(label_list, prob_list)
        }
        # Reorder to match class_names; classes fastText did not return get 0.0
        probabilities_full_list.append(
            [label_to_prob.get(cls, 0.0) for cls in class_names]
        )

    return np.array(probabilities_full_list)

Let's test the `predict_proba` function to make sure the input-output format is consistent with what Unbox expects:

In [17]:
predict_proba(fasttext_model, ["cake"]*1000*10)

array([[1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       ...,
       [1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [1.6712982e-05, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00]])
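Eyeballing a truncated array only goes so far. The sketch below checks the format programmatically, assuming the expected output is one row per input text and one column per class, with every entry a probability. The helper `check_proba_output` is hypothetical and not part of the Unbox API, and the tolerance only allows for small floating-point overshoot.

```python
def check_proba_output(probs, n_inputs, n_classes, tol=1e-3):
    """Raise ValueError if `probs` is not an n_inputs x n_classes grid of probabilities."""
    # Works on nested lists and on NumPy arrays alike, since both iterate row by row
    rows = [list(row) for row in probs]
    if len(rows) != n_inputs:
        raise ValueError(f"expected {n_inputs} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != n_classes:
            raise ValueError(f"row {i}: expected {n_classes} columns, got {len(row)}")
        if any(p < 0.0 or p > 1.0 + tol for p in row):
            raise ValueError(f"row {i}: value outside [0, 1]")
    return True


# Toy stand-in for a predict_proba result: 2 inputs, 3 classes
toy_probs = [[0.7, 0.2, 0.1], [0.0, 0.5, 0.5]]
print(check_proba_output(toy_probs, n_inputs=2, n_classes=3))
```

With the objects from this notebook, the equivalent call would be `check_proba_output(predict_proba(fasttext_model, ["cake", "bread"]), 2, len(class_names))`.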

Now, we can upload the model:

In [18]:
from unboxapi.models import ModelType

model = project.add_model(
    function=predict_proba, 
    model=fasttext_model,
    model_type=ModelType.fasttext,
    class_names=class_names,
    name='Cooking Fast Text',
    commit_message='this is my fasttext model',
    requirements_txt_file='./requirements.txt'
)

Bundling model and artifacts...
Uploading model to Unbox! Check out https://unbox.ai/models to have a look!
