# Bitext - Customer Service Tagged Training Dataset
## Overview
This dataset can be used to train intent recognition models on Natural Language Understanding (NLU) platforms: LUIS, Dialogflow, Lex, RASA and any other NLU platform that accepts text as input.

The training dataset contains 8,100 utterances (300 per intent), because most platforms limit the number of utterances that can be used for training.
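
The per-intent figure can be checked once the CSV is loaded. The following is a minimal sketch, assuming the `intent` column that appears in the previews later in this notebook; the helper name `utterances_per_intent` and the toy rows are illustrative and not part of the original notebook.

```python
import pandas as pd


def utterances_per_intent(df: pd.DataFrame) -> pd.Series:
    """Return the number of rows for each value of the `intent` column."""
    return df["intent"].value_counts()


# Toy stand-in for the real training file (hypothetical rows).
toy_df = pd.DataFrame(
    {
        "utterance": ["cancel my order", "stop my order", "where is my order"],
        "intent": ["cancel_order", "cancel_order", "track_order"],
    }
)

counts = utterances_per_intent(toy_df)
print(counts)
```

On the full training set, `(utterances_per_intent(training_df) == 300).all()` would be expected to hold if the 300-per-intent figure is exact.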

## Cleaning up the dataset

We remove the categories that have fewer than 3 intents, which drops the following categories:
- CANCELLATION_FEE
- FEEDBACK
- NEWSLETTER
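
The filtering cells in this notebook hardcode the list of categories to keep. As an alternative, the categories could be derived from the data by counting distinct intents per category. This is a minimal sketch, assuming the `category` and `intent` columns; the function name, the `min_intents` parameter, and the toy rows are hypothetical.

```python
import pandas as pd


def categories_with_min_intents(df: pd.DataFrame, min_intents: int) -> list:
    """List categories that have at least `min_intents` distinct intents."""
    intents_per_category = df.groupby("category")["intent"].nunique()
    keep = intents_per_category[intents_per_category >= min_intents]
    return sorted(keep.index)


# Toy data: category A has 3 distinct intents, category B has only 1.
toy_df = pd.DataFrame(
    {
        "category": ["A", "A", "A", "A", "B", "B"],
        "intent": ["a1", "a2", "a3", "a1", "b1", "b1"],
    }
)

kept = categories_with_min_intents(toy_df, min_intents=3)
filtered = toy_df[toy_df["category"].isin(kept)]
print(kept)
```

Deriving the list this way keeps the filter in step with the stated threshold if the dataset changes, at the cost of being less explicit than a fixed list.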

## Code

### Installing Required Libraries

In [None]:
! pip install pandas numpy

### Importing Required Libraries

In [15]:
import pandas as pd
import numpy as np
import os

### Importing the Datasets

In [16]:
# path to the parent of the current working directory
parent_dir = os.path.dirname(os.getcwd())

# build the dataset file paths
training_file = os.path.join(parent_dir, 'data', 'train', 'Bitext_Sample_Customer_Service_Training_Dataset.csv')
testing_file  = os.path.join(parent_dir, 'data', 'test', 'Bitext_Sample_Customer_Service_Testing_Dataset.csv')

# load both datasets into DataFrames
training_df = pd.read_csv(training_file)
testing_df  = pd.read_csv(testing_file)

#### TRAINING DATASET

In [17]:
training_df.head()

| Unnamed: 0 | utterance | intent | entity_type | entity_value | start_offset | end_offset | category | tags |
|---|---|---|---|---|---|---|---|---|
| 0 | how can I cancel purchase 113542617735902? | cancel_order | order_id | 113542617735902 | 26.0 | 41.0 | ORDER | BIL |
| 1 | can you help me canceling purchase 00004587345? | cancel_order | order_id | 4587345 | 35.0 | 46.0 | ORDER | BIL |
| 2 | i want assistance to cancel purchase 732201349959 | cancel_order | order_id | 732201349959 | 37.0 | 49.0 | ORDER | BLQ |
| 3 | i want assistance to cancel order 732201349959 | cancel_order | order_id | 732201349959 | 34.0 | 46.0 | ORDER | BQ |
| 4 | I don't want my last item, help me cancel orde... | cancel_order | order_id | 370795561790 | 48.0 | 60.0 | ORDER | BCLN |
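
Judging from the preview above, `start_offset` and `end_offset` appear to be zero-based character positions into `utterance`, with an exclusive end. That reading is an inference from the rows shown, not a documented guarantee. A minimal sketch of recovering the entity text under that assumption (the helper name `entity_span` is hypothetical):

```python
import pandas as pd


def entity_span(utterance: str, start_offset: float, end_offset: float) -> str:
    """Slice the entity text out of an utterance using its character offsets."""
    # The offsets are loaded as floats (26.0, 41.0), so cast before slicing.
    return utterance[int(start_offset):int(end_offset)]


# Values copied from the first row of the preview above.
row = pd.Series(
    {
        "utterance": "how can I cancel purchase 113542617735902?",
        "start_offset": 26.0,
        "end_offset": 41.0,
    }
)

span = entity_span(row["utterance"], row["start_offset"], row["end_offset"])
print(span)
```

Note that the sliced text can differ from `entity_value`: in the second row the utterance contains `00004587345` while `entity_value` is `4587345`, so a direct string comparison between the two would need to account for dropped leading zeros.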


##### Removing Categories with Fewer than 3 Intents

In [18]:
# retain only categories with more than 2 intents
df = training_df[training_df["category"].isin(['ACCOUNT', 'CONTACT', 'ORDER', 'PAYMENT', 'REFUND', 'SHIPPING_ADDRESS'])]

In [19]:
df['category'].unique()

array(['ORDER', 'SHIPPING_ADDRESS', 'PAYMENT', 'REFUND', 'CONTACT',
       'ACCOUNT'], dtype=object)

In [20]:
df['intent'].unique()

array(['cancel_order', 'change_order', 'change_shipping_address',
       'check_payment_methods', 'check_refund_policy',
       'contact_customer_service', 'contact_human_agent',
       'create_account', 'delete_account', 'edit_account', 'get_refund',
       'payment_issue', 'place_order', 'recover_password',
       'registration_problems', 'set_up_shipping_address',
       'switch_account', 'track_order', 'track_refund'], dtype=object)

##### Export the new dataset

In [21]:
# Export the DataFrame to a CSV file
df.to_csv('./outputs/training_dataset.csv', index=False)
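
`DataFrame.to_csv` does not create missing directories, so the export cell assumes `./outputs` already exists. A minimal sketch of a more defensive export follows; the helper name `export_dataset` and the toy frame are hypothetical, and a temporary directory is used here only so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd


def export_dataset(df: pd.DataFrame, path: str) -> None:
    """Write `df` to `path` as CSV, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    df.to_csv(path, index=False)


# Round-trip a toy frame through a temporary directory (hypothetical data).
toy_df = pd.DataFrame({"utterance": ["cancel my order"], "intent": ["cancel_order"]})
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "outputs", "training_dataset.csv")
    export_dataset(toy_df, out_path)
    reloaded = pd.read_csv(out_path)

print(reloaded)
```

In the notebook itself, the equivalent call would be `export_dataset(df, './outputs/training_dataset.csv')`.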

#### TESTING DATASET

In [22]:
testing_df.head()

| Unnamed: 0 | utterance | intent | entity_type | entity_value | start_offset | end_offset | category | tags |
|---|---|---|---|---|---|---|---|---|
| 0 | I do not know how I can cancel purchase 00123842 | cancel_order | order_id | 123842 | 40.0 | 48.0 | ORDER | BEL |
| 1 | help to cancel purchase 00004587345 | cancel_order | order_id | 4587345 | 24.0 | 35.0 | ORDER | BL |
| 2 | cancelling purchase 00123842 | cancel_order | order_id | 123842 | 20.0 | 28.0 | ORDER | BKL |
| 3 | cancel purchase 00004587345 | cancel_order | order_id | 4587345 | 16.0 | 27.0 | ORDER | BKL |
| 4 | I don't know how to cancel order 732201349959 | cancel_order | order_id | 732201349959 | 34.0 | 46.0 | ORDER | BZ |


##### Removing Categories with Fewer than 3 Intents

In [23]:
# retain only categories with more than 2 intents
df = testing_df[testing_df["category"].isin(['ACCOUNT', 'CONTACT', 'ORDER', 'PAYMENT', 'REFUND', 'SHIPPING_ADDRESS'])]

In [24]:
df['category'].unique()

array(['ORDER', 'SHIPPING_ADDRESS', 'PAYMENT', 'REFUND', 'CONTACT',
       'ACCOUNT'], dtype=object)

In [25]:
df['intent'].unique()

array(['cancel_order', 'change_order', 'change_shipping_address',
       'check_payment_methods', 'check_refund_policy',
       'contact_customer_service', 'contact_human_agent',
       'create_account', 'delete_account', 'edit_account', 'get_refund',
       'payment_issue', 'place_order', 'recover_password',
       'registration_problems', 'set_up_shipping_address',
       'switch_account', 'track_order', 'track_refund'], dtype=object)
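
The intent lists printed for the training and testing sets look identical, but comparing them by eye is error-prone. A minimal sketch of a programmatic check, assuming both frames have an `intent` column; the function name `label_mismatch` and the toy frames are hypothetical.

```python
import pandas as pd


def label_mismatch(train_df: pd.DataFrame, test_df: pd.DataFrame, column: str = "intent"):
    """Return (labels only in train, labels only in test) for `column`."""
    train_labels = set(train_df[column].unique())
    test_labels = set(test_df[column].unique())
    return train_labels - test_labels, test_labels - train_labels


# Toy frames standing in for the filtered datasets (hypothetical rows).
toy_train = pd.DataFrame({"intent": ["cancel_order", "track_order", "get_refund"]})
toy_test = pd.DataFrame({"intent": ["cancel_order", "track_order"]})

only_train, only_test = label_mismatch(toy_train, toy_test)
print(only_train, only_test)
```

Both returned sets being empty means the two datasets cover the same labels. In this notebook both filtered frames are assigned to `df`, so the training result would need to be kept under a separate name before making this comparison.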

##### Export the dataset

In [27]:
# Export the DataFrame to a CSV file
df.to_csv('./outputs/testing_dataset.csv', index=False)