# Intent classification on the Banking77 dataset

**Candidate approach**

* Set a baseline performance with the simplest interpretable model
* Set a target performance with the state-of-the-art model
* Find an interpretable model that trades off good performance against fast training


## Setup

### Dependencies

In [7]:
import os
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
# plot_confusion_matrix was removed in scikit-learn 1.2 and is not used in this notebook
from sklearn.metrics import confusion_matrix, classification_report

### Paths

In [8]:
proj_path = "/Users/steeve_laquitaine/desktop/CodeHub/intent/intent/"
train_data_path = proj_path + "data/01_raw/banking77/train.csv"
test_data_path = proj_path + "data/01_raw/banking77/test.csv"

## Load data

Most public intent corpora we found were small. The task-oriented banking dataset Banking77 is an exception, so we use it as a benchmark.

### Load

In [9]:
train_data  = pd.read_csv(train_data_path)
test_data  = pd.read_csv(test_data_path)

In [10]:
# preview
train_data.head(5)

| | text | category |
|---|---|---|
| 0 | I am still waiting on my card? | card_arrival |
| 1 | What can I do if my card still hasn't arrived ... | card_arrival |
| 2 | I have been waiting over a week. Is the card s... | card_arrival |
| 3 | Can I track my card while it is in the process... | card_arrival |
| 4 | How do I know if I will get my card, or if it ... | card_arrival |


### Preview

In [11]:
# preview
test_data.head(5)

| | text | category |
|---|---|---|
| 0 | How do I locate my card? | card_arrival |
| 1 | I still have not received my new card, I order... | card_arrival |
| 2 | I ordered a card but it has not arrived. Help ... | card_arrival |
| 3 | Is there a way to know when my card will arrive? | card_arrival |
| 4 | My card has not arrived yet. | card_arrival |


### Normalize columns

In [12]:
def standardize_col_names(data:pd.DataFrame):
    return data.rename(columns={"text":"text","category":"intent"})

In [13]:
train_data = standardize_col_names(train_data)
test_data = standardize_col_names(test_data)

### Summary description

In [14]:
print("\nValue count:\n")
print(train_data.count())
print("\nUnique values:\n")
print(train_data.nunique())


Value count:

text      10003
intent    10003
dtype: int64

Unique values:

text      10003
intent       77
dtype: int64


## Explore the data

* cleanliness?
* intent imbalance?
* Task complexity
    * average query length?
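
The questions above can be checked with a few pandas calls. The following is a minimal sketch, assuming a DataFrame with `text` and `intent` columns like `train_data`. The helper name `explore` and the toy rows are hypothetical and only there so that the snippet runs on its own.

```python
import pandas as pd


def explore(data: pd.DataFrame) -> dict:
    """Summarize cleanliness, intent balance and query length (hypothetical helper)."""
    counts = data["intent"].value_counts()
    n_words = data["text"].str.split().str.len()
    return {
        # cleanliness: missing values and duplicated queries
        "n_missing": int(data.isna().sum().sum()),
        "n_duplicated_text": int(data["text"].duplicated().sum()),
        # imbalance: size of the largest intent relative to the smallest
        "imbalance_ratio": float(counts.max() / counts.min()),
        # task complexity: average number of whitespace-separated tokens
        "mean_query_length": float(n_words.mean()),
    }


# toy data standing in for train_data (assumption, not the real corpus)
toy = pd.DataFrame({
    "text": ["Where is my card?", "My card has not arrived", "How do I top up?"],
    "intent": ["card_arrival", "card_arrival", "top_up"],
})
summary = explore(toy)
print(summary)
```

On the real data, `explore(train_data)` would be the natural call. An imbalance ratio close to 1 suggests that plain accuracy is a reasonable metric, while a large ratio argues for per-intent metrics.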

## Baseline modeling performance

### BOW + multinomial logistic regression

* feature engineering: bag of words  
    * unigrams
* model: logistic regression  

In [15]:
def get_txt(data):
    """Return the query text column."""
    return data['text']

def get_intent(data):
    """Return the intent label column."""
    return data['intent']

def do_bag_of_words(train_txt:pd.Series, test_txt:pd.Series, params:dict):
    """
    Encode queries as bags-of-words.
    The vocabulary is fit on the training text only and reused for the test text.
    """

    # build vectorizer
    cv = CountVectorizer(
        binary = params['binary'], 
        min_df = params['min_df'], 
        max_df = params['max_df'], 
        ngram_range = params['ngram_range']
    )  

    # encode as BOW
    train_features = cv.fit_transform(train_txt)
    test_features = cv.transform(test_txt)
    
    return train_features, test_features, cv

In [16]:
# set BOW params
params = dict({
    'binary': False,
    'min_df': 0.0,    
    'max_df': 1.0,
    'ngram_range':(1,1)
})

In [17]:
# get text and intents
train_txt = get_txt(train_data)
test_txt = get_txt(test_data)
train_intent = get_intent(train_data)
test_intent = get_intent(test_data)

In [18]:
# encode as BOW
train_feature, test_feature, cv = do_bag_of_words(train_txt, test_txt, params)

In [19]:
# checks
print("(# Queries, # Features):\n")
print("train:"+str(train_feature.shape))
print("train BOW:", train_feature.todense())
print("\ntest:"+str(test_feature.shape))
print("test BOW:", test_feature.todense())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print("\nPredictive features:\n\n"+str(list(cv.get_feature_names_out())[:100]))
print("\n Target intents:\n\n", train_intent.unique()[:30])

(# Queries, # Features):

train:(10003, 2320)
train BOW: [[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

test:(3080, 2320)
test BOW: [[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

Predictive features:

['00', '000', '10', '100', '13', '16', '18', '1818', '1l', '20', '200', '2018', '30', '3d', '40', '45', '50', '500', '5x', '60', '80', 'able', 'about', 'above', 'abroad', 'absolutely', 'accept', 'acceptable', 'accepted', 'accepting', 'accepts', 'access', 'accessed', 'accessible', 'accessing', 'accident', 'accidentally', 'accidently', 'accommodated', 'according', 'accordingly', 'account', 'accounts', 'accurate', 'achieve', 'acount', 'acquiring', 'across', 'acting', 'action', 'actions', 'activate', 'activated', 'activating', 'activation', 'active', 'activity', 'actual', 'actually', 'actuate', 'add', 'added', 'adding', 'addition', 'additional', '

In [20]:
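# fit multinomial logistic regression
# note: recent scikit-learn releases deprecate `multi_class`; the default lbfgs solver already fits a multinomial model
clf = LogisticRegression(random_state=0, multi_class='multinomial').fit(train_feature, train_intent)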
print("Training completed")

Training completed


In [21]:
# check predictions
predicted_test_intent = clf.predict(test_feature)
pd.DataFrame(predicted_test_intent).sample(n=10)

| | 0 |
|---|---|
| 644 | pending_top_up |
| 2655 | top_up_failed |
| 2766 | exchange_charge |
| 301 | country_support |
| 181 | extra_charge_on_statement |
| 1348 | topping_up_by_card |
| 2719 | balance_not_updated_after_bank_transfer |
| 1261 | get_physical_card |
| 736 | top_up_limits |
| 513 | age_limit |


In [22]:
# Evaluate the model
print("\nConfusion matrix:\n\n", confusion_matrix(test_intent, predicted_test_intent))
print("\nEvaluation metrics:\n\n", classification_report(test_intent, predicted_test_intent))


Confusion matrix:

 [[35  0  0 ...  0  0  0]
 [ 0 38  0 ...  0  0  0]
 [ 0  0 40 ...  0  0  0]
 ...
 [ 0  0  0 ... 32  0  0]
 [ 0  0  0 ...  0 35  0]
 [ 0  0  0 ...  0  0 31]]

Evaluation metrics:

                                                   precision    recall  f1-score   support

                           Refund_not_showing_up       0.97      0.88      0.92        40
                                activate_my_card       0.97      0.95      0.96        40
                                       age_limit       0.98      1.00      0.99        40
                         apple_pay_or_google_pay       0.98      1.00      0.99        40
                                     atm_support       0.93      0.93      0.93        40
                                automatic_top_up       1.00      0.88      0.93        40
         balance_not_updated_after_bank_transfer       0.72      0.78      0.75        40
balance_not_updated_after_cheque_or_cash_deposit       0.86      0.95      0.90

## Model verification

* check that performance on a dataset with randomized labels is at chance
* check that the most predictive features "make intuitive sense"

## Get SOTA performance (USE + ConveRT [1])

* USE + ConveRT
    * more robust to sample size than the other models compared in [1], so well suited to few-shot learning on small labelled datasets. Accuracy reported in [1] on Banking77:
        * full dataset: 93.36%
        * 30 samples per intent: 90.57%
        * 10 samples per intent: 85.19%
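
To compare the baseline under the same few-shot settings, the training set can be subsampled to a fixed number of queries per intent. The sketch below is a minimal illustration, assuming a DataFrame with an `intent` column. The helper `sample_per_intent` is hypothetical and is not necessarily the sampling protocol used in [1].

```python
import pandas as pd


def sample_per_intent(data: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Keep at most n randomly chosen rows for each intent."""
    # shuffle all rows once, then keep the first n rows of every intent
    shuffled = data.sample(frac=1.0, random_state=seed)
    return shuffled.groupby("intent").head(n).reset_index(drop=True)


# toy data standing in for train_data (assumption)
toy = pd.DataFrame({
    "text": [f"query {i}" for i in range(8)],
    "intent": ["a"] * 5 + ["b"] * 3,
})
few_shot = sample_per_intent(toy, n=2)
print(few_shot["intent"].value_counts())
```

Calling `sample_per_intent(train_data, 10)` or `sample_per_intent(train_data, 30)` and refitting the baseline on the result would give numbers that can be set against the figures above.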

## References
[1] Casanueva, I., Temčinas, T., Gerz, D., Henderson, M., & Vulić, I. (2020). Efficient Intent Detection with Dual Sentence Encoders. arXiv preprint arXiv:2003.04807. 