<a href="https://colab.research.google.com/github/shameer-phy/GenAI/blob/main/Customer_Support-Ticket-tagging/customer_support_ticket_tagger.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers datasets torch scikit-learn



## Importing necessary libraries

In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from datasets import Dataset

In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/shameer-phy/GenAI/refs/heads/main/Customer_Support-Ticket-tagging/customer_tickets.csv")
df.columns = ["text","labels"]
df.head()

Unnamed: 0,text,labels
0,"Dear Customer Support Team, We are experiencin...",Technical Support
1,"Dear Customer Support,<br><br>I hope this mess...",Product Support
2,"Dear Tech Online Store Customer Support,\n\nI ...",Returns and Exchanges
3,"Dear IT Services Customer Support, \n\nWe are ...",Product Support
4,"Greetings IT Services Customer Support,\n\nI a...",Technical Support


In [4]:
df.dropna(inplace=True)

In [5]:
df.head()

Unnamed: 0,text,labels
0,"Dear Customer Support Team, We are experiencin...",Technical Support
1,"Dear Customer Support,<br><br>I hope this mess...",Product Support
2,"Dear Tech Online Store Customer Support,\n\nI ...",Returns and Exchanges
3,"Dear IT Services Customer Support, \n\nWe are ...",Product Support
4,"Greetings IT Services Customer Support,\n\nI a...",Technical Support


In [6]:
label_encoder = LabelEncoder()
df['labels'] = label_encoder.fit_transform(df['labels'])  # encode the string class names as integer ids

In [7]:
# Check the mapping from integer ids back to class names
label_encoder.inverse_transform([0, 1])

array(['Billing and Payments', 'Customer Service'], dtype=object)

In [8]:
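# Convert to a Hugging Face Dataset and hold out 14.5% of the rows as a test set.
# dropna() left a non-default index, so from_pandas stores it as the extra
# '__index_level_0__' column seen below; pass preserve_index=False to omit it.
dataset = Dataset.from_pandas(df[['text', 'labels']])
hf_dataset = dataset.train_test_split(test_size=0.145)  # random split; no seed is set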
print(hf_dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'labels', '__index_level_0__'],
        num_rows: 288
    })
    test: Dataset({
        features: ['text', 'labels', '__index_level_0__'],
        num_rows: 50
    })
})


In [9]:
label_encoder.classes_

array(['Billing and Payments', 'Customer Service', 'General Inquiry',
       'Human Resources', 'IT Support', 'Product Support',
       'Returns and Exchanges', 'Sales and Pre-Sales',
       'Service Outages and Maintenance', 'Technical Support'],
      dtype=object)

In [10]:
label_encoder.inverse_transform([1])

array(['Customer Service'], dtype=object)
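
The class order printed above is what turns an integer id back into a tag name. A minimal sketch, assuming one wants the model config to carry these names so that predictions come back as readable tags rather than generic `LABEL_n` strings, builds the two lookup dictionaries from that list. The names `classes`, `id2label`, and `label2id` are illustrative and not part of the original notebook; in the notebook the list would come from `label_encoder.classes_`, and the dictionaries could then be passed to `from_pretrained` through its `id2label` and `label2id` arguments.

```python
# Class names copied from the label_encoder.classes_ output above, in encoder order.
classes = [
    "Billing and Payments", "Customer Service", "General Inquiry",
    "Human Resources", "IT Support", "Product Support",
    "Returns and Exchanges", "Sales and Pre-Sales",
    "Service Outages and Maintenance", "Technical Support",
]

# LabelEncoder assigns ids by sorted class order, so list position == id.
id2label = {i: name for i, name in enumerate(classes)}
label2id = {name: i for i, name in id2label.items()}

print(id2label[1])
print(label2id["Technical Support"])
```

With the names stored in the config, the manual `inverse_transform` step would no longer be needed at inference time, though the notebook as written keeps the default `LABEL_n` names.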

In [11]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-v3-base"  # DeBERTa-v3 base checkpoint
# Tokenizers run on the CPU, so device_map is passed only to the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(label_encoder.classes_), device_map="cuda")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
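def preprocess_function(examples):
    # Tokenize a batch of tickets and pad them to the longest one in the batch.
    # truncation=True has no effect without an explicit max_length (see the warning below).
    return tokenizer(examples['text'], truncation=True, padding=True)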

tokenized_datasets = hf_dataset.map(preprocess_function, batched=True)

Map:   0%|          | 0/288 [00:00<?, ? examples/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Map:   0%|          | 0/50 [00:00<?, ? examples/s]

In [13]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    weight_decay=0.05,
    logging_dir='./logs',
    logging_steps=30,
    report_to='none',
    use_cpu=False
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    tokenizer=tokenizer
)


In [14]:
trainer.train()

Step,Training Loss,Validation Loss
30,2.0771,1.924691
60,1.8123,1.824528
90,1.7348,1.634632
120,1.6266,1.542101
150,1.4437,1.370551
180,1.3359,1.44752
210,1.2749,1.310222
240,1.1249,1.2635
270,1.0068,1.237423
300,0.9688,1.239888


TrainOutput(global_step=360, training_loss=1.3465450922648112, metrics={'train_runtime': 309.0366, 'train_samples_per_second': 9.319, 'train_steps_per_second': 1.165, 'total_flos': 506205326983680.0, 'train_loss': 1.3465450922648112, 'epoch': 10.0})
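
The `global_step=360` and throughput figures in the `TrainOutput` above can be reproduced from the settings in the notebook. The sketch below is only a sanity check, assuming a single device and no gradient accumulation (neither is set in `TrainingArguments`); the variable names are illustrative.

```python
import math

train_rows = 288        # num_rows of the train split printed earlier
batch_size = 8          # per_device_train_batch_size
epochs = 10             # num_train_epochs
runtime_s = 309.0366    # train_runtime reported in TrainOutput

# One optimizer step per batch; a final partial batch would still count as a step.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs

# Throughput as the Trainer reports it: work done divided by wall-clock time.
samples_per_s = train_rows * epochs / runtime_s
steps_per_s = total_steps / runtime_s

print(steps_per_epoch, total_steps)
print(round(samples_per_s, 3), round(steps_per_s, 3))
```

Both rounded rates agree with the `train_samples_per_second` and `train_steps_per_second` values in the output.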

In [15]:
trainer.evaluate()

{'eval_loss': 1.1868816614151,
 'eval_runtime': 1.6478,
 'eval_samples_per_second': 30.343,
 'eval_steps_per_second': 4.248,
 'epoch': 10.0}
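
The evaluation output above contains only the loss, because no metric function was given to the `Trainer`. A minimal sketch of an accuracy metric follows, assuming it would be passed as `Trainer(..., compute_metrics=compute_metrics)`; that wiring is not in the original notebook. It is written in plain Python so it runs without extra dependencies; with NumPy the argmax would normally be `np.argmax(logits, axis=-1)`.

```python
def compute_metrics(eval_pred):
    """Return plain accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Index of the largest logit in each row, i.e. the predicted class id.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


# Toy check with made-up logits: three examples, three classes.
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
toy_labels = [1, 0, 1]
print(compute_metrics((toy_logits, toy_labels)))
```

With ten classes and only 50 test tickets, accuracy will be a coarse measure, so a per-class report may be more informative.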

In [16]:
trainer.save_model("./text-classification-model")
tokenizer.save_pretrained("./text-classification-model")

('./text-classification-model/tokenizer_config.json',
 './text-classification-model/special_tokens_map.json',
 './text-classification-model/spm.model',
 './text-classification-model/added_tokens.json',
 './text-classification-model/tokenizer.json')

In [17]:
from transformers import pipeline
# Load the fine-tuned model saved in the previous cell
classifier = pipeline("text-classification", model="./text-classification-model", tokenizer=tokenizer, device_map="cuda")

Device set to use cuda


In [18]:
# Classify one ticket and print its true and predicted labels side by side
def predictor(input_ticket, org_label):
    print(f"Input Ticket: {input_ticket}")
    result = classifier(input_ticket)
    print("\n")
    # The pipeline returns names like "LABEL_9"; the trailing integer is the class id
    pred_label = int(result[0]['label'].split("_")[-1])
    org_label_decoded = label_encoder.inverse_transform([int(org_label)])[0]
    decoded_label = label_encoder.inverse_transform([pred_label])[0]
    print("Original Label: ",org_label,",Original Label Decoded: ",org_label_decoded)
    print(f"Predicted Label: {pred_label} ,Predicted label Decoded: {decoded_label}")
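
The function above relies on the default label names having the form `LABEL_<id>`. A small sketch of that parsing step in isolation is shown here, with a format check added so an unexpected name fails loudly instead of producing a wrong id. `parse_label_id` and `fake_result` are hypothetical names used only for illustration; `fake_result` mimics the list of dicts the pipeline returns and is not real model output.

```python
def parse_label_id(label):
    """Extract the integer id from a default label name such as 'LABEL_9'."""
    prefix, _, suffix = label.rpartition("_")
    if prefix != "LABEL" or not suffix.isdigit():
        raise ValueError(f"unexpected label format: {label!r}")
    return int(suffix)


# Hypothetical pipeline output, shaped like [{'label': ..., 'score': ...}]
fake_result = [{"label": "LABEL_9", "score": 0.87}]
print(parse_label_id(fake_result[0]["label"]))
```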

In [19]:
predictor(df['text'][319],df['labels'][319])

Input Ticket: I am unable to connect to the Wi-Fi.


Original Label:  1 ,Original Label Decoded:  Customer Service
Predicted Label: 1 ,Predicted label Decoded: Customer Service


In [20]:
predictor(df['text'][338],df['labels'][338])

Input Ticket: Dear Customer Support Team,

I am contacting you to seek prompt professional help regarding our IT Consulting Service. We are facing an urgent requirement for server setup and network enhancement. Our systems are presently experiencing difficulties that may negatively affect our business activities. It is imperative that we address these issues swiftly to avoid any interruptions.

Could you kindly prioritize our request and allocate an expert to help us with these concerns? We need someone with specialized expertise in server setups and optimization methods. Please inform us at your earliest convenience about the availability of your support personnel.

We are ready for a consultation call whenever it suits you to provide any additional information needed. You can reach me at <tel_num>.

Thank you for your prompt attention to this issue. We anticipate your swift reply.

Best regards,

<name>


Original Label:  9 ,Original Label Decoded:  Technical Support
Predicted Label:

In [21]:
predictor(df['text'][317],df['labels'][317])

Input Ticket: Dear Customer Support,

I hope this message finds you well. I am writing to bring to your attention an issue concerning the recent billing related to our AWS cloud usage. Upon reviewing our most recent statement, it appears that there are discrepancies that have significantly impacted our cost estimates. It seems the charges associated with the AWS Management Service do not align with the actual usage recorded on our account <acc_num>.

The incorrect billing has resulted in unexpected costs that differ noticeably from our budget forecasts, making it difficult for us to manage our financial resources efficiently. This discrepancy was first noticed by <name> from our finance department, prompting an urgent need for your review of the billing details.

Could you please conduct a thorough review of our account to determine the cause of this miscalculation? We believe that an error in recording or processing has occurred that needs rectification. We would appreciate it if you 

In [22]:
predictor(df['text'][210],df['labels'][210])

Input Ticket: Dear Customer Support,

I hope this message finds you well. I am writing to report a high-priority incident involving unstable connectivity issues with our Cisco Router ISR4331, which is currently impacting the performance of our enterprise network. Our entire network operations depend heavily on this router, and any disruptions can lead to significant operational setbacks.

The connectivity issues started occurring approximately 48 hours ago and have progressively worsened. Our IT team has conducted preliminary troubleshooting, which includes checking the physical connections, updating the firmware, and resetting the device multiple times; however, these actions have not resolved the issue. The router still exhibits sporadic connectivity drop-offs, causing disruptions in our daily workflows and negatively affecting the user experience within our enterprise.

We are requesting immediate technical assistance from your team to diagnose and resolve this matter as quickly as 

In [23]:
predictor(df['text'][50],df['labels'][50])

Input Ticket: Hi, I've noticed performance issues with my Dell XPS 13 9310 after the latest update. Please assist.


Original Label:  9 ,Original Label Decoded:  Technical Support
Predicted Label: 9 ,Predicted label Decoded: Technical Support


In [24]:
predictor(df['text'][175],df['labels'][175])

Input Ticket: Dear Customer Support Team,

I am writing to express my concerns regarding the Epson EcoTank ET-4760 printer that I purchased from your Tech Online Store. Despite being quite enthusiastic about its features, I have encountered frequent paper jams during printing operations, which significantly disrupt my workflow. This issue is hindering my productivity, and I would greatly appreciate your guidance on how to resolve it.

Could you please provide troubleshooting advice or recommend any steps I should take to remedy this situation? Additionally, if this is a known issue, kindly let me know if there is any update or technical support available to address it.

I am relying on your expertise to help find a suitable solution at your earliest convenience. Please feel free to contact me with any further instructions or if additional information is required for diagnostics on my end.

Thank you for your attention and assistance.

Best regards,

<name>


Original Label:  5 ,Origina

In [25]:
predictor(df['text'][15],df['labels'][15])

Input Ticket: Hello Customer Support,

I am experiencing a problem with my HP DeskJet 3755 printer. It fails to connect to the wireless network despite adhering to the setup guidelines. Could you provide troubleshooting support to resolve this issue?

Thank you, 
<name>


Original Label:  1 ,Original Label Decoded:  Customer Service
Predicted Label: 1 ,Predicted label Decoded: Customer Service
