[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unboxai/examples-gallery/blob/main/text-classification/demo-banking.ipynb)


# Banking chatbot using sklearn

This notebook illustrates how sklearn models can be uploaded to the Unbox platform.

## Importing the modules and loading the dataset

In [2]:
import numpy as np
import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

In [3]:
data = pd.read_csv("training.csv")

data.head()

Unnamed: 0,text,category
0,I am still waiting on my card?,card_arrival
1,What can I do if my card still hasn't arrived ...,card_arrival
2,I have been waiting over a week. Is the card s...,card_arrival
3,Can I track my card while it is in the process...,card_arrival
4,"How do I know if I will get my card, or if it ...",card_arrival


In [7]:
data['category'] = data['category'].astype('category')
data['label_code'] = data['category'].cat.codes

## Splitting the data into training and validation sets

In [8]:
# shuffling the data
data = data.sample(frac=1, random_state=42)  

training_set = data[:7000]
validation_set = data[7000:]
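The shuffle-then-slice split above does not guarantee that every class is represented proportionally in both sets. As a hedged alternative (not what this notebook uses), sklearn's `train_test_split` can stratify on the label column. The sketch below runs on a toy frame whose rows are made up for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 rows, 4 balanced classes (stand-in for training.csv)
toy = pd.DataFrame({
    "text": [f"message {i}" for i in range(40)],
    "label_code": [i % 4 for i in range(40)],
})

# Hold out 20% of the rows, keeping the class proportions equal in both parts
train_part, val_part = train_test_split(
    toy,
    test_size=0.2,
    stratify=toy["label_code"],
    random_state=42,
)

print(len(train_part), len(val_part))
print(val_part["label_code"].value_counts().sort_index())
```

With the real data, `toy` would be replaced by `data` and `test_size` chosen to match the intended validation size.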

## Training and evaluating the model's performance

In [9]:
sklearn_model = Pipeline([('count_vect', CountVectorizer(ngram_range=(1,2), stop_words='english')), 
                          ('lr', LogisticRegression(random_state=42))])
sklearn_model.fit(training_set['text'], training_set['label_code'])

Pipeline(steps=[('count_vect',
                 CountVectorizer(ngram_range=(1, 2), stop_words='english')),
                ('lr', LogisticRegression(random_state=42))])

In [10]:
print(classification_report(validation_set['label_code'], sklearn_model.predict(validation_set['text'])))

              precision    recall  f1-score   support

           0       1.00      0.86      0.92        14
           1       1.00      1.00      1.00        11
           2       1.00      0.83      0.91        12
           3       1.00      0.94      0.97        17
           4       0.92      0.96      0.94        24
           5       0.89      0.84      0.86        19
           6       0.96      0.96      0.96        27
           7       0.50      0.86      0.63         7
           8       0.81      0.91      0.86        23
           9       0.93      0.74      0.82        19
          10       1.00      0.83      0.91        18
          11       0.93      0.74      0.82        19
          12       0.96      0.92      0.94        24
          13       0.83      0.89      0.86        28
          14       1.00      0.93      0.96        28
          15       0.90      1.00      0.95         9
          16       1.00      0.91      0.95        23
          17       1.00    
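The report above identifies classes by their integer codes, which is hard to read with many categories. As a sketch, `classification_report` accepts `labels` and `target_names` so that rows are printed with category names instead. The labels, predictions, and names below are made up for illustration:

```python
from sklearn.metrics import classification_report

# Hypothetical ground truth and predictions for three classes
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

# Hypothetical names; position i must correspond to label code i
class_names = ["atm_support", "card_arrival", "top_up_failed"]

report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    target_names=class_names,
)
print(report)
```

In this notebook, the names could come from `data['category'].cat.categories`, whose order matches the codes stored in `label_code`.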

## Unbox part!

### Instantiating the client

In [11]:
import unboxapi

client = unboxapi.UnboxClient("YOUR_API_KEY_HERE")

### Creating a project on the platform

In [12]:
project = client.create_project(name="Banking Project",
                                description="Evaluating ML approaches for a chatbot")

Creating project on Unbox! Check out https://unbox.ai/projects to have a look!


### Uploading the validation set

In [15]:
# Getting the label list
label_dict = dict(zip(data.category.cat.codes, data.category))

label_list = [None] * len(label_dict)
for index, label in label_dict.items():
    label_list[index] = label
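The same list can likely be obtained more directly from the categorical column, since pandas assigns code `i` to the `i`-th entry of `cat.categories`. The sketch below checks this on a toy frame with made-up rows. It assumes every category occurs at least once, which holds when the categories are inferred from the data as in this notebook:

```python
import pandas as pd

# Hypothetical toy data standing in for the banking dataset
toy = pd.DataFrame(
    {"category": ["card_arrival", "atm_support", "card_arrival", "top_up_failed"]}
)
toy["category"] = toy["category"].astype("category")

# Loop-based construction, mirroring the cell above
label_dict = dict(zip(toy.category.cat.codes, toy.category))
label_list = [None] * len(label_dict)
for index, label in label_dict.items():
    label_list[index] = label

# Direct construction from the categorical metadata
direct_list = list(toy["category"].cat.categories)

print(label_list)
print(direct_list)
```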

In [18]:
from unboxapi.tasks import TaskType

dataset = project.add_dataframe(
    df=validation_set,
    class_names=label_list,
    label_column_name="label_code",
    text_column_name="text",
    task_type=TaskType.TextClassification,
    name="Banking Test Dataset",
    description="my banking validation dataset"
)

Uploading dataset to Unbox! Check out https://unbox.ai/datasets to have a look!


### Uploading the model

First, create a `predict_proba` function, which is how Unbox interacts with your model.

In [19]:
def predict_proba(model, text_list):
    return model.predict_proba(text_list)

Let's test the `predict_proba` function to make sure the input-output format is consistent with what Unbox expects:

In [22]:
texts = ['some new text, sweet noodles', 'where is my card?', 'sad day']

predict_proba(sklearn_model, texts)

array([[0.00747527, 0.01526512, 0.01108279, 0.02171184, 0.00802631,
        0.0135237 , 0.00744551, 0.00552845, 0.02719528, 0.01900453,
        0.00765338, 0.00591675, 0.0055067 , 0.00604107, 0.00461798,
        0.00398385, 0.01183233, 0.00595058, 0.02658783, 0.00770957,
        0.00917609, 0.0058173 , 0.0072034 , 0.00401974, 0.05226695,
        0.01140503, 0.0174299 , 0.01393726, 0.00819253, 0.01311658,
        0.00638173, 0.01532488, 0.00365531, 0.00920743, 0.00990506,
        0.03138079, 0.00565234, 0.005253  , 0.19453984, 0.00640795,
        0.00866278, 0.01508223, 0.02100794, 0.0052098 , 0.00802055,
        0.00495851, 0.01288669, 0.02031846, 0.04890798, 0.03881957,
        0.02322006, 0.01212345, 0.0079488 , 0.00473297, 0.01675585,
        0.0072597 , 0.01710851, 0.01829279, 0.01528501, 0.0112113 ,
        0.01768058, 0.01217454],
       [0.00356983, 0.00400935, 0.01719766, 0.00998708, 0.00249705,
        0.0042326 , 0.00184991, 0.05857406, 0.08743378, 0.06827657,
        0.04324
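Rather than inspecting the printed array by eye, the output can be checked programmatically. The sketch below assumes the expected format is one row per input text and one column per class, with each row summing to 1; that is what the output above suggests, but it is an assumption rather than a documented Unbox requirement. It trains a tiny stand-in model on made-up sentences so that it runs on its own:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def predict_proba(model, text_list):
    return model.predict_proba(text_list)


# Hypothetical training data with three classes
toy_texts = [
    "where is my card", "card has not arrived",
    "atm swallowed my card", "atm is broken",
    "top up failed", "my top up did not work",
]
toy_labels = [0, 0, 1, 1, 2, 2]

toy_model = Pipeline([
    ("count_vect", CountVectorizer()),
    ("lr", LogisticRegression(random_state=42)),
])
toy_model.fit(toy_texts, toy_labels)

probs = predict_proba(toy_model, ["where is my card?", "sad day"])

# One row per text, one column per class
assert probs.shape == (2, 3)
# Each row is a probability distribution over the classes
assert np.allclose(probs.sum(axis=1), 1.0)
```

For the real model, the equivalent check would compare `probs.shape[1]` against `len(label_list)`.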

Now, we can upload the model:

In [25]:
from unboxapi.models import ModelType

model = project.add_model(
    function=predict_proba, 
    model=sklearn_model,
    model_type=ModelType.sklearn,
    task_type=TaskType.TextClassification,
    class_names=label_list,
    name='Banking Model',
    description='this is my sklearn banking model'
)

Bundling model and artifacts...
Uploading model to Unbox! Check out https://unbox.ai/models to have a look!


