[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unboxai/examples-gallery/blob/main/text-classification/sklearn/banking/demo-banking.ipynb)


# Banking chatbot using sklearn

This notebook illustrates how sklearn models can be uploaded to the Openlayer platform.

In [1]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/unboxai/examples-gallery/main/text-classification/sklearn/banking/requirements.txt" --output "requirements.txt"
fi

In [2]:
!pip install -r requirements.txt



## Importing the modules and loading the dataset

In [3]:
import numpy as np
import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

The dataset is stored in the following S3 bucket. If reading the CSV directly from it fails, copy the URL into your browser and download the file manually. The dataset is also available on [HuggingFace](https://huggingface.co/datasets/banking77).

In [4]:
DATASET_URL = "https://openlayer-static-assets.s3.us-west-2.amazonaws.com/examples-datasets/text-classification/banking.csv"

In [5]:
data = pd.read_csv(DATASET_URL)
data.head()

| Unnamed: 0 | text | category |
|---|---|---|
| 0 | I am still waiting on my card? | card_arrival |
| 1 | What can I do if my card still hasn't arrived ... | card_arrival |
| 2 | I have been waiting over a week. Is the card s... | card_arrival |
| 3 | Can I track my card while it is in the process... | card_arrival |
| 4 | How do I know if I will get my card, or if it ... | card_arrival |
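
If reading from the URL fails and the file has been downloaded manually, the fallback could look roughly like the sketch below. The helper name `load_csv_with_fallback` and the local file name `banking.csv` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def load_csv_with_fallback(url, local_path):
    """Try to read a CSV from `url`; fall back to a manually downloaded copy."""
    try:
        return pd.read_csv(url)
    except Exception as error:  # e.g. network or permission errors
        print(f"Could not read {url!r} ({error}); trying {local_path!r} instead.")
        return pd.read_csv(local_path)


# Hypothetical usage, assuming the file was saved next to the notebook:
# data = load_csv_with_fallback(DATASET_URL, "banking.csv")
```

The broad `except` is deliberate in this sketch, since the exact exception depends on how the remote read fails.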


In [6]:
data['category'] = data['category'].astype('category')
data['label_code'] = data['category'].cat.codes

## Splitting the data into training and validation sets

In [7]:
# shuffling the data
data = data.sample(frac=1, random_state=42)  

training_set = data[:7000]
validation_set = data[7000:]
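
The slice-based split above relies on shuffling alone, so it does not guarantee that each category is represented proportionally in both sets. As an alternative sketch (not what this notebook does), scikit-learn's `train_test_split` can shuffle and stratify in one call. The toy frame below stands in for `data`, and its values are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for `data`: 3 categories with 10 rows each (made-up values).
toy = pd.DataFrame({
    "text": [f"message {i}" for i in range(30)],
    "label_code": [i % 3 for i in range(30)],
})

# `stratify` keeps the per-class proportions the same in both splits.
toy_train, toy_val = train_test_split(
    toy, test_size=9, stratify=toy["label_code"], random_state=42
)

# Per-class row counts in each split.
print(toy_train["label_code"].value_counts().sort_index().to_dict())
print(toy_val["label_code"].value_counts().sort_index().to_dict())
```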

## Training and evaluating the model's performance

In [8]:
sklearn_model = Pipeline([('count_vect', CountVectorizer(ngram_range=(1,2), stop_words='english')), 
                          ('lr', LogisticRegression(random_state=42))])
sklearn_model.fit(training_set['text'], training_set['label_code'])

Pipeline(steps=[('count_vect',
                 CountVectorizer(ngram_range=(1, 2), stop_words='english')),
                ('lr', LogisticRegression(random_state=42))])

In [9]:
print(classification_report(validation_set['label_code'], sklearn_model.predict(validation_set['text'])))

              precision    recall  f1-score   support

           0       0.91      0.91      0.91        23
           1       1.00      1.00      1.00        15
           2       0.80      0.80      0.80        10
           3       1.00      0.88      0.94        17
           4       0.93      0.90      0.91        29
           5       0.89      0.76      0.82        21
           6       0.87      1.00      0.93        13
           7       0.75      0.75      0.75        12
           8       0.57      0.86      0.69        14
           9       0.86      0.80      0.83        15
          10       0.94      1.00      0.97        17
          11       0.89      0.73      0.80        11
          12       0.74      0.85      0.79        20
          13       0.70      0.84      0.76        19
          14       0.95      0.86      0.90        21
          15       1.00      0.88      0.93         8
          16       1.00      0.94      0.97        16
          17       0.59    
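
The report above identifies each class by its integer code. A sketch of how category names could be shown instead, using the `target_names` parameter of `classification_report`. The labels and predictions below are made up, and apart from `card_arrival` the category names are hypothetical placeholders.

```python
from sklearn.metrics import classification_report

# Made-up toy labels standing in for the validation labels and predictions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# Position i holds the name for label code i (names are illustrative).
names = ["card_arrival", "lost_card", "top_up"]

# Passing `labels` explicitly fixes which code each name refers to.
print(classification_report(y_true, y_pred, labels=[0, 1, 2], target_names=names))

# The same report as a dictionary, keyed by category name.
report = classification_report(
    y_true, y_pred, labels=[0, 1, 2], target_names=names, output_dict=True
)
```

In this notebook the names could come from `data['category'].cat.categories`, provided the list is ordered by label code.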

## Openlayer part!

### pip installing openlayer

In [10]:
!pip install openlayer







### Instantiating the client

In [12]:
import openlayer

openlayer.api.OPENLAYER_ENDPOINT = "http://localhost:8080/v1"
openlayer.api.STORAGE = openlayer.api.StorageType.ONPREM

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")
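
To keep the API key out of the notebook itself, one option is to read it from an environment variable. This is a minimal sketch: the variable name `OPENLAYER_API_KEY` and the helper `get_api_key` are assumptions made for illustration, not something the Openlayer client defines.

```python
import os


def get_api_key(env_var="OPENLAYER_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this cell."
        )
    return key


# Hypothetical usage in place of a literal key:
# client = openlayer.OpenlayerClient(get_api_key())
```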

### Creating a project on the platform

In [13]:
from openlayer.tasks import TaskType

project = client.create_or_load_project(name="Banking Project",
                                        task_type=TaskType.TextClassification,
                                        description="Evaluating ML approaches for a chatbot")

Created your project. Navigate to http://localhost:8000/projects/3 to see it.


### Uploading the validation set

In [14]:
# Getting the label list
label_dict = dict(zip(data.category.cat.codes, data.category))

label_list = [None] * len(label_dict)
for index, label in label_dict.items():
    label_list[index] = label
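
The loop above builds a list in which position `i` holds the category name for label code `i`. The toy example below, with made-up rows, shows that pandas already stores the categories in code order, so `cat.categories` yields the same list whenever every category occurs at least once in the column.

```python
import pandas as pd

# Made-up rows standing in for the `category` column of `data`.
toy = pd.DataFrame(
    {"category": ["top_up", "card_arrival", "lost_card", "card_arrival"]}
)
toy["category"] = toy["category"].astype("category")

# Loop-based construction, mirroring the cell above.
toy_label_dict = dict(zip(toy["category"].cat.codes, toy["category"]))
toy_label_list = [None] * len(toy_label_dict)
for index, label in toy_label_dict.items():
    toy_label_list[index] = label

# Shortcut: the categories attribute is already ordered by code.
shortcut = list(toy["category"].cat.categories)

print(toy_label_list)  # ['card_arrival', 'lost_card', 'top_up']
print(shortcut)        # ['card_arrival', 'lost_card', 'top_up']
```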

In [15]:
dataset = project.add_dataframe(
    df=validation_set,
    class_names=label_list,
    label_column_name="label_code",
    text_column_name="text",
    commit_message="First commit!"
)

Adding your dataset to Openlayer! Check out the project page to have a look.


### Uploading the model

First, create a `predict_proba` function, which is how Openlayer interacts with your model. It takes the model and a list of texts and returns an array of class probabilities.

In [16]:
def predict_proba(model, text_list):
    return model.predict_proba(text_list)

Let's test the `predict_proba` function to make sure the input-output format is consistent with what Openlayer expects:

In [17]:
texts = ['some new text, sweet noodles', 'where is my card?', 'sad day']

predict_proba(sklearn_model, texts)

array([[0.0085163 , 0.01654408, 0.01178879, 0.02213127, 0.00844311,
        0.01339975, 0.0079393 , 0.00566056, 0.03075455, 0.02148207,
        0.01012182, 0.00549518, 0.00623524, 0.00669953, 0.00485448,
        0.00411068, 0.01336741, 0.00489345, 0.03455118, 0.00903034,
        0.00979557, 0.00668368, 0.00690582, 0.00391537, 0.03479671,
        0.01138972, 0.01938556, 0.01421167, 0.00928368, 0.01478102,
        0.00781391, 0.01336767, 0.00386452, 0.00870904, 0.01031671,
        0.02792679, 0.00643292, 0.00554295, 0.14997774, 0.00636605,
        0.0082626 , 0.01493688, 0.02213682, 0.00570669, 0.00887831,
        0.00542705, 0.01374474, 0.02349695, 0.05837501, 0.04249346,
        0.02770299, 0.0132905 , 0.00882109, 0.00534068, 0.01733151,
        0.00759683, 0.01734308, 0.01937366, 0.0180614 , 0.01215107,
        0.01967441, 0.01236811],
       [0.00329319, 0.00398165, 0.01697288, 0.00698693, 0.00244146,
        0.00406684, 0.00183957, 0.0483668 , 0.0965095 , 0.07651167,
        0.04590
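
The check above is visual. A programmatic version could look like the sketch below, assuming the expected format is one row per input text and one probability per class, with each row summing to 1 (which is what scikit-learn's `predict_proba` returns). The helper name `check_proba_output` is hypothetical.

```python
import numpy as np


def check_proba_output(probs, n_texts, n_classes):
    """Raise AssertionError unless `probs` is an (n_texts, n_classes) array of probabilities."""
    probs = np.asarray(probs)
    assert probs.shape == (n_texts, n_classes), f"unexpected shape {probs.shape}"
    assert np.all((probs >= 0) & (probs <= 1)), "values outside [0, 1]"
    assert np.allclose(probs.sum(axis=1), 1.0), "rows do not sum to 1"
    return True


# Hypothetical usage in this notebook:
# check_proba_output(predict_proba(sklearn_model, texts), len(texts), len(label_list))
```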

Now, we can upload the model:

In [18]:
from openlayer.models import ModelType

model = project.add_model(
    function=predict_proba, 
    model=sklearn_model,
    model_type=ModelType.sklearn,
    class_names=label_list,
    name='Banking Model',
    commit_message='First commit!',
    requirements_txt_file='requirements.txt'
)

Bundling model and artifacts...
Adding your model to Openlayer! Check out the project page to have a look.
