## **Problem Statement**

### Business Context

In today's dynamic business landscape, organizations are increasingly recognizing the pivotal role customer feedback plays in shaping the trajectory of their products and services. The ability to swiftly and effectively respond to customer input not only fosters enhanced customer experiences but also serves as a catalyst for growth, prolonged customer engagement, and the nurturing of lifetime value relationships. As a dedicated Product Manager or Product Analyst, staying attuned to the voice of your customers is not just a best practice; it's a strategic imperative.

While your organization may be inundated with a wealth of customer-generated feedback and support tickets, your role entails much more than just processing these inputs. To make your efforts in managing customer experience and expectations truly impactful, you need a structured approach – a method that allows you to discern the most pressing issues, set priorities, and allocate resources judiciously. One of the most effective strategies at your disposal is to harness the power of Support Ticket Categorization.


### Objective

Develop an advanced support ticket categorization system that accurately classifies incoming tickets, assigns relevant tags based on their content, implements mechanisms and generate the first response based on the sentiment for prioritizing tickets for prompt resolution.


## **Installing and Importing Necessary Libraries and Dependencies**

In [1]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.8.3 requires cubinlinker, which is not installed.
cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.8.3 requires ptxcompiler, which is not installed.
cuml 24.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
ucxx 0.39.1 requires libucx>=1.15.0, which is not installed.
apache-beam 2.46.0 requires cloudpickle~=2.2.1, but you have cloudpickle 3.0.0 which is incompatible.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.8 which is incompatible.
apache-beam 2.46.0 requires numpy<1.25.0,>=1.14.3, but you have numpy 2.1.2 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 16.1.0 which is incompatible.
bigframes 0.22.0 requires google-clo

In [2]:
# For downloading the models from HF Hub
!pip install huggingface_hub==0.20.3 pandas==1.5.3 -q

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.8.3 requires cubinlinker, which is not installed.
cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.8.3 requires ptxcompiler, which is not installed.
cuml 24.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
accelerate 0.34.2 requires huggingface-hub>=0.21.0, but you have huggingface-hub 0.20.3 which is incompatible.
beatrix-jupyterlab 2024.66.154055 requires jupyterlab~=3.6.0, but you have jupyterlab 4.2.5 which is incompatible.
bigframes 0.22.0 requires google-cloud-bigquery[bqstorage,pandas]>=3.10.0, but you have google-cloud-bigquery 2.34.4 which is incompatible.
bigframes 0.22.0 requires google-cloud-storage>=2.0.0, but you have google-cloud-storage 1.44.0 which is incompatible.
catboost 1.2.

In [3]:
# Function to download the model from the Hugging Face model hub
from huggingface_hub import hf_hub_download

# Importing the Llama class from the llama_cpp module
from llama_cpp import Llama

# Importing the json module
import json

# for loading and manipulating data
import pandas as pd

# for time computations
import time

ggml_init_cublas: found 2 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5
  Device 1: Tesla T4, compute capability 7.5


## **Loading the Data**

In [4]:
# reading the CSV file.
data = pd.read_csv("/kaggle/input/customer-support-nlp-dataset/support_ticket_data.csv")

## **Data Overview**

### Checking the first 5 rows of the data

In [5]:
data.head()

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


### Checking the shape of the data

In [6]:
data.shape

(21, 2)

### Checking the missing values in the data

In [7]:
data.isnull().sum()

support_tick_id        0
support_ticket_text    0
dtype: int64

## **Model Building**

### Loading the model

In [8]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [9]:
model_path = hf_hub_download(
    repo_id=model_name_or_path, 
    filename=model_basename  
)

mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [10]:
llm = Llama(
    model_path=model_path,
    n_ctx=1024, # Context window
)

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version unknown)
llama_model_loader: - tensor    0:                token_embd.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q6_K     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q6_K     [  4096, 143

### Utility functions

In [11]:
# defining a function to parse the JSON output from the model
def extract_json_data(json_str):
    try:
        # Find the indices of the opening and closing curly braces
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')

        if json_start != -1 and json_end != -1:
            extracted_category = json_str[json_start:json_end + 1]  # Extract the JSON object
            data_dict = json.loads(extracted_category)
            return data_dict
        else:
            print(f"Warning: JSON object not found in response: {json_str}")
            return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}

## **Task 1: Ticket Categorization and Returning Structured Output**

In [12]:
data_1 = data.copy()

In [13]:
#Defining the response funciton for Task 1.
def response_1(prompt,ticket):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      A:
      """,
      max_tokens=1024, #seting the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01, #setting the value for temperature.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output

In [14]:
prompt_1 = """
You're purpose is to classify the give text and provide a general label based on the query from the customer

You will be given a customer feedback of our product try to understand what difficulty the customer has faced and summarize it to one just word

The response should be in the form of JSON format for example
{"Category":"classified_label"}

Ensure that the curly braces are closed properly and there are no additional characters in the output
"""

In [15]:
start = time.time()
data_1['model_response'] = data_1['support_ticket_text'].apply(lambda x: response_1(prompt_1, x))
end = time.time()


llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =     6.71 ms /     7 runs   (    0.96 ms per token,  1042.60 tokens per second)
llama_print_timings: prompt eval time =  2060.10 ms /   164 tokens (   12.56 ms per token,    79.61 tokens per second)
llama_print_timings:        eval time =  2579.97 ms /     6 runs   (  429.99 ms per token,     2.33 tokens per second)
llama_print_timings:       total time =  4671.66 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =     8.60 ms /     9 runs   (    0.96 ms per token,  1046.88 tokens per second)
llama_print_timings: prompt eval time =  1604.75 ms /    51 tokens (   31.47 ms per token,    31.78 tokens per second)
llama_print_timings:        eval time =  3424.49 ms /     8 runs   (  428.06 ms per token,     2.34 tokens per second)
llama_print_timings:       total time =  5069.76 ms
Llama.generate: prefix-match hit

llama_pri

In [16]:
print("Time taken ",(end-start))

Time taken  149.3834719657898


In [17]:
data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response
0,ST2023-006,My internet connection has significantly slowe...,"{""Category"":""Internet""}"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Category"":""Hardware_issue""}"
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Category"":""Data Loss""}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Category"":""WiFi_Signal""}"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Category"":""Battery""}"


In [18]:
i = 2
print(data_1.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [19]:
print(data_1.loc[i, 'model_response'])

{"Category":"Data Loss"}


In [21]:
# applying the function to the model response
data_1['model_response_parsed'] = data_1['model_response'].apply(extract_json_data)
data_1['model_response_parsed'].head()

0          {'Category': 'Internet'}
1    {'Category': 'Hardware_issue'}
2         {'Category': 'Data Loss'}
3       {'Category': 'WiFi_Signal'}
4           {'Category': 'Battery'}
Name: model_response_parsed, dtype: object

In [22]:
data_1['model_response_parsed'].value_counts()

{'Category': 'Hardware'}                                     5
{'Category': 'Data Recovery'}                                3
{'Category': 'Connectivity'}                                 2
{'Category': 'Data Loss'}                                    1
{'Category': 'Internet'}                                     1
{'Category': 'Hardware_issue'}                               1
{'Category': 'Account_Access'}                               1
{'Category': 'Battery'}                                      1
{'Category': 'WiFi_Signal'}                                  1
{'Category': 'Technical_Support'}                            1
{'Category': 'Performance'}                                  1
{'Category': 'Technical_Issue', 'SubCategory': 'Display'}    1
{'Category': 'Internet_Issue'}                               1
{'Category': 'Software_Issue'}                               1
Name: model_response_parsed, dtype: int64

In [23]:
# Normalizing the model_response_parsed column
model_response_parsed_df_1 = pd.json_normalize(data_1['model_response_parsed'])
model_response_parsed_df_1.head()

Unnamed: 0,Category,SubCategory
0,Internet,
1,Hardware_issue,
2,Data Loss,
3,WiFi_Signal,
4,Battery,


In [24]:
# Concatinating two dataframes
data_with_parsed_model_output_1 = pd.concat([data_1, model_response_parsed_df_1], axis=1)
data_with_parsed_model_output_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Category,SubCategory
0,ST2023-006,My internet connection has significantly slowe...,"{""Category"":""Internet""}",{'Category': 'Internet'},Internet,
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Category"":""Hardware_issue""}",{'Category': 'Hardware_issue'},Hardware_issue,
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Category"":""Data Loss""}",{'Category': 'Data Loss'},Data Loss,
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Category"":""WiFi_Signal""}",{'Category': 'WiFi_Signal'},WiFi_Signal,
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Category"":""Battery""}",{'Category': 'Battery'},Battery,


In [25]:
# Dropping model_response and model_response_parsed columns
final_data_1 = data_with_parsed_model_output_1.drop(['model_response','model_response_parsed','SubCategory'], axis=1)
final_data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Internet
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware_issue
2,ST2023-008,I've accidentally deleted essential work docum...,Data Loss
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,WiFi_Signal
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery


## **Task 2: Creating Tags**

In [26]:
data_2 = data.copy()

In [27]:
def response_2(prompt,ticket,category):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      A:
      """,
      max_tokens=1024, # setting the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01, # setting the value for temperature.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output

In [29]:
prompt_2 = """
You're purpose is to classify the tags based on the support ticket and their respective categories the tagging is for the IT department 
you will be given support tickets and their category understand the most relevant words from both of them and create appropriate tags 

Make sure that you generate tags to each and every support ticket it is very important to generate tags for every support ticket and category
also the output tags that you provide should be in JSON format for example

{"Tags" : ["WiFi", "data_loss", "connection_issue"]} (Note these are just Example Tags, you should determine what the tags are relevant to the tickets)
Also make sure that all the curly brackes are properly closed and there are no additional characters from the output 
"""

In [30]:
start = time.time()
data_2["model_response"]=final_data_1[['support_ticket_text','Category']].apply(lambda x: response_2(prompt_2, x[0],x[1]),axis =1)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =    21.46 ms /    22 runs   (    0.98 ms per token,  1025.21 tokens per second)
llama_print_timings: prompt eval time =  2253.30 ms /   218 tokens (   10.34 ms per token,    96.75 tokens per second)
llama_print_timings:        eval time =  9148.42 ms /    21 runs   (  435.64 ms per token,     2.30 tokens per second)
llama_print_timings:       total time = 11504.05 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =    18.55 ms /    19 runs   (    0.98 ms per token,  1024.31 tokens per second)
llama_print_timings: prompt eval time =  1623.71 ms /    59 tokens (   27.52 ms per token,    36.34 tokens per second)
llama_print_timings:        eval time =  7837.63 ms /    18 runs   (  435.42 ms per token,     2.30 tokens per second)
llama_print_timings:       total time =  9548.40 ms
Llama.gene

In [32]:
print("Time taken ",end-start)

Time taken  224.32707834243774


In [33]:
data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response
0,ST2023-006,My internet connection has significantly slowe...,"{""Tags"" : [""connection_issue"", ""internet"", ""sl..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Tags"" : [""Hardware"", ""laptop"", ""startup""]}"
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Tags"" : [""data_loss""]}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Tags"" : [""WiFi"", ""weak_signal""]}"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Tags"" : [""smartphone"", ""battery_drain""]}"


In [35]:
i = 2
print(data_2.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [36]:
print(data_2.loc[i, 'model_response'])

{"Tags" : ["data_loss"]}


In [37]:
# Applying the function to the model response
data_2['model_response_parsed'] = data_2['model_response'].apply(extract_json_data)

In [38]:
data_2["model_response_parsed"]

0     {'Tags': ['connection_issue', 'internet', 'slo...
1           {'Tags': ['Hardware', 'laptop', 'startup']}
2                               {'Tags': ['data_loss']}
3                     {'Tags': ['WiFi', 'weak_signal']}
4             {'Tags': ['smartphone', 'battery_drain']}
5        {'Tags': ['Account_Access', 'Password_Reset']}
6       {'Tags': ['performance_issue', 'productivity']}
7            {'Tags': ['BlueScreen', 'Hardware_Issue']}
8             {'Tags': ['Hard_drive', 'Data_recovery']}
9     {'Tags': ['Graphics_card', 'gaming_laptop', 'h...
10                  {'Tags': ['data_loss', 'recovery']}
11        {'Tags': ['Screen_issue', 'Technical_Issue']}
12     {'Tags': ['laptop', 'water_damage', 'hardware']}
13           {'Tags': ['data_recovery', 'flash_drive']}
14             {'Tags': ['touchpad', 'hardware_issue']}
15    {'Tags': ['connection_issue', 'internet', 'dro...
16               {'Tags': ['WiFi', 'connection_issue']}
17             {'Tags': ['data_loss', 'file_reco

In [39]:
# Normalizing the model_response_parsed column
model_response_parsed_df_2 = pd.json_normalize(data_2['model_response_parsed'])
model_response_parsed_df_2.head()

Unnamed: 0,Tags
0,"[connection_issue, internet, slow_connectivity]"
1,"[Hardware, laptop, startup]"
2,[data_loss]
3,"[WiFi, weak_signal]"
4,"[smartphone, battery_drain]"


In [40]:
# Concatinating two dataframes
data_with_parsed_model_output_2 = pd.concat([data_2, model_response_parsed_df_2], axis=1)
data_with_parsed_model_output_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Tags
0,ST2023-006,My internet connection has significantly slowe...,"{""Tags"" : [""connection_issue"", ""internet"", ""sl...","{'Tags': ['connection_issue', 'internet', 'slo...","[connection_issue, internet, slow_connectivity]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Tags"" : [""Hardware"", ""laptop"", ""startup""]}","{'Tags': ['Hardware', 'laptop', 'startup']}","[Hardware, laptop, startup]"
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Tags"" : [""data_loss""]}",{'Tags': ['data_loss']},[data_loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Tags"" : [""WiFi"", ""weak_signal""]}","{'Tags': ['WiFi', 'weak_signal']}","[WiFi, weak_signal]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Tags"" : [""smartphone"", ""battery_drain""]}","{'Tags': ['smartphone', 'battery_drain']}","[smartphone, battery_drain]"


In [41]:
# Dropping model_response and model_response_parsed columns
final_data_2 = data_with_parsed_model_output_2.drop(['model_response','model_response_parsed'], axis=1)
final_data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,Tags
0,ST2023-006,My internet connection has significantly slowe...,"[connection_issue, internet, slow_connectivity]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"[Hardware, laptop, startup]"
2,ST2023-008,I've accidentally deleted essential work docum...,[data_loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"[WiFi, weak_signal]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","[smartphone, battery_drain]"


In [42]:
# Checking the value counts of Category column
final_data_2['Tags'].value_counts()

[Hard_drive, Data_recovery]                                             2
[connection_issue, internet, slow_connectivity]                         1
[data_loss]                                                             1
[Hardware, laptop, startup]                                             1
[WiFi, weak_signal]                                                     1
[smartphone, battery_drain]                                             1
[performance_issue, productivity]                                       1
[Account_Access, Password_Reset]                                        1
[BlueScreen, Hardware_Issue]                                            1
[Graphics_card, gaming_laptop, hardware_issue]                          1
[data_loss, recovery]                                                   1
[Screen_issue, Technical_Issue]                                         1
[laptop, water_damage, hardware]                                        1
[data_recovery, flash_drive]          

In [43]:
final_data_2 = pd.concat([final_data_2,final_data_1["Category"]],axis=1)

In [44]:
final_data_2 = final_data_2[["support_tick_id","support_ticket_text","Category","Tags"]]
final_data_2

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags
0,ST2023-006,My internet connection has significantly slowe...,Internet,"[connection_issue, internet, slow_connectivity]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware_issue,"[Hardware, laptop, startup]"
2,ST2023-008,I've accidentally deleted essential work docum...,Data Loss,[data_loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,WiFi_Signal,"[WiFi, weak_signal]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery,"[smartphone, battery_drain]"
5,ST2023-011,I'm locked out of my online banking account an...,Account_Access,"[Account_Access, Password_Reset]"
6,ST2023-012,"My computer's performance is sluggish, severel...",Performance,"[performance_issue, productivity]"
7,ST2023-013,I'm experiencing a recurring blue screen error...,Technical_Support,"[BlueScreen, Hardware_Issue]"
8,ST2023-014,My external hard drive isn't being recognized ...,Hardware,"[Hard_drive, Data_recovery]"
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware,"[Graphics_card, gaming_laptop, hardware_issue]"


## **Task 3: Assigning Priority and ETA**

In [45]:
data_3 = data.copy()

In [46]:
def response_3(prompt,ticket,category,tags):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      Tags: {tags}
      A:
      """,
      max_tokens=1024,  # setting the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01, # setting the value for temperature.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output

In [48]:
prompt_3 = """
   You're task it to classify the Priority and ETA for the support tickets based on the severity of the issue and the time that it would take to fix the .
   You'll be given support ticket their category and tags understand the severity of the issue and assign the Priority and ETA.
   
   The output should be in JSON Format for example
   {"Priority" : "High", "ETA" : "Immediate"}
   {"Priority" : "Medium", "ETA" : "2 Days"}
   {"Priority" : "Low", "ETA" : "2 - 3 Business Days"}
   (Note: this is just an example you've to understand the support ticket its category and tags, based on it you should assign Priority and ETA to it).
   
   Make sure that Priority and ETA is generated for each support ticket its important that none gets missed out.
   Also make sure that all the curly braces are closed and there are no additional characters in the output.
"""

In [49]:
# Applying generate_llama_response function on support_ticket_text column
start = time.time()
data_3['model_response'] = final_data_2[['support_ticket_text','Category','Tags']].apply(lambda x: response_3(prompt_3, x[0],x[1],x[2]),axis=1)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =    16.49 ms /    17 runs   (    0.97 ms per token,  1031.05 tokens per second)
llama_print_timings: prompt eval time =  2660.39 ms /   311 tokens (    8.55 ms per token,   116.90 tokens per second)
llama_print_timings:        eval time =  6979.88 ms /    16 runs   (  436.24 ms per token,     2.29 tokens per second)
llama_print_timings:       total time =  9719.73 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =    16.58 ms /    17 runs   (    0.98 ms per token,  1025.52 tokens per second)
llama_print_timings: prompt eval time =  1784.24 ms /    77 tokens (   23.17 ms per token,    43.16 tokens per second)
llama_print_timings:        eval time =  7346.11 ms /    16 runs   (  459.13 ms per token,     2.18 tokens per second)
llama_print_timings:       total time =  9209.56 ms
Llama.gene

In [50]:
print("Time taken ",(end-start))

Time taken  193.0741159915924


In [51]:
data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response
0,ST2023-006,My internet connection has significantly slowe...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}"
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Priority"" : ""Medium"", ""ETA"" : ""2 Days""}"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Priority"" : ""Medium"", ""ETA"" : ""1 - 2 Busines..."


In [53]:
i = 2
print(data_3.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [54]:
print(data_3.loc[i, 'model_response'])

{"Priority" : "High", "ETA" : "Immediate"}


In [55]:
# Applying the function to the model response
data_3['model_response_parsed'] = data_3['model_response'].apply(extract_json_data)
data_3['model_response_parsed'].head()

0             {'Priority': 'High', 'ETA': 'Immediate'}
1             {'Priority': 'High', 'ETA': 'Immediate'}
2             {'Priority': 'High', 'ETA': 'Immediate'}
3              {'Priority': 'Medium', 'ETA': '2 Days'}
4    {'Priority': 'Medium', 'ETA': '1 - 2 Business ...
Name: model_response_parsed, dtype: object

In [56]:
# Normalizing the model_response_parsed column
model_response_parsed_df_3 = pd.json_normalize(data_3['model_response_parsed'])
model_response_parsed_df_3.head(21)

Unnamed: 0,Priority,ETA
0,High,Immediate
1,High,Immediate
2,High,Immediate
3,Medium,2 Days
4,Medium,1 - 2 Business Days
5,High,Immediate
6,High,Immediate
7,High,Immediate
8,High,Immediate
9,High,Immediate


In [57]:
# Concatinating two dataframes
data_with_parsed_model_output_3 = pd.concat([data_3, model_response_parsed_df_3], axis=1)
data_with_parsed_model_output_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}","{'Priority': 'High', 'ETA': 'Immediate'}",High,Immediate
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}","{'Priority': 'High', 'ETA': 'Immediate'}",High,Immediate
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Priority"" : ""High"", ""ETA"" : ""Immediate""}","{'Priority': 'High', 'ETA': 'Immediate'}",High,Immediate
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Priority"" : ""Medium"", ""ETA"" : ""2 Days""}","{'Priority': 'Medium', 'ETA': '2 Days'}",Medium,2 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Priority"" : ""Medium"", ""ETA"" : ""1 - 2 Busines...","{'Priority': 'Medium', 'ETA': '1 - 2 Business ...",Medium,1 - 2 Business Days


In [58]:
# Dropping model_response and model_response_parsed columns
final_data_3 = data_with_parsed_model_output_3.drop(['model_response','model_response_parsed'], axis=1)
final_data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,High,Immediate
1,ST2023-007,Urgent help required! My laptop refuses to sta...,High,Immediate
2,ST2023-008,I've accidentally deleted essential work docum...,High,Immediate
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Medium,2 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Medium,1 - 2 Business Days


In [59]:
final_data_3 = pd.concat([final_data_3,final_data_2[["Category","Tags"]]],axis=1)

In [60]:
final_data_3 = final_data_3[["support_tick_id","support_ticket_text","Category","Tags","Priority","ETA"]]

In [61]:
final_data_3

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,Internet,"[connection_issue, internet, slow_connectivity]",High,Immediate
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware_issue,"[Hardware, laptop, startup]",High,Immediate
2,ST2023-008,I've accidentally deleted essential work docum...,Data Loss,[data_loss],High,Immediate
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,WiFi_Signal,"[WiFi, weak_signal]",Medium,2 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery,"[smartphone, battery_drain]",Medium,1 - 2 Business Days
5,ST2023-011,I'm locked out of my online banking account an...,Account_Access,"[Account_Access, Password_Reset]",High,Immediate
6,ST2023-012,"My computer's performance is sluggish, severel...",Performance,"[performance_issue, productivity]",High,Immediate
7,ST2023-013,I'm experiencing a recurring blue screen error...,Technical_Support,"[BlueScreen, Hardware_Issue]",High,Immediate
8,ST2023-014,My external hard drive isn't being recognized ...,Hardware,"[Hard_drive, Data_recovery]",High,Immediate
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware,"[Graphics_card, gaming_laptop, hardware_issue]",High,Immediate


## **Task 4 - Creating a Draft Response**

In [62]:
data_4 = data.copy()

In [144]:
def response_4(prompt,ticket,category,tags,priority,eta):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category : {category}
      Tags : {tags}
      Priority: {priority}
      ETA: {eta}
      A:
      """,
      max_tokens=1024, # setting the maximum number of tokens the model should generate for this task.
      stop=None,#["Q:", "\n"], # removing the stop so that the model would generate a complete response to the customers
      temperature=0.01, # setting the value for temperature.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]


    return temp_output

In [147]:
prompt_4 = """
   You are an AI tasked with creating a draft response for support tickets.
   Please create a draft response based on the following criteria: Customer satisfaction, severity of issue, how much responsibility the company has for the issue.
   Please also be understanding, professional, helpful and to the point.
   Also please keep it under 200 words
"""

In [148]:
#Applying generate_llama_response function on support_ticket_text column
start = time.time()
data_4['model_response'] = final_data_3[['support_ticket_text','Category','Tags','Priority','ETA']].apply(lambda x: response_4(prompt_4, x[0],x[1],x[2],x[3],x[4]),axis=1)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =   250.19 ms /   234 runs   (    1.07 ms per token,   935.28 tokens per second)
llama_print_timings: prompt eval time =  1864.35 ms /   109 tokens (   17.10 ms per token,    58.47 tokens per second)
llama_print_timings:        eval time = 108514.01 ms /   233 runs   (  465.73 ms per token,     2.15 tokens per second)
llama_print_timings:       total time = 111667.01 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =  2060.21 ms
llama_print_timings:      sample time =   219.81 ms /   211 runs   (    1.04 ms per token,   959.93 tokens per second)
llama_print_timings: prompt eval time =  1784.89 ms /    90 tokens (   19.83 ms per token,    50.42 tokens per second)
llama_print_timings:        eval time = 97879.09 ms /   210 runs   (  466.09 ms per token,     2.15 tokens per second)
llama_print_timings:       total time = 100808.48 ms
Llama.g

In [149]:
print("Time taken",(end-start))

Time taken 2526.6263999938965


In [150]:
data_4.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response
0,ST2023-006,My internet connection has significantly slowe...,"Dear Valued Customer,\n \n We apo..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"Dear Valued Customer,\n We understand ..."
2,ST2023-008,I've accidentally deleted essential work docum...,"Dear Valued Customer,\n \n We are..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"Dear Valued Customer,\n \n We apo..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","Dear Valued Customer,\n \n We're ..."


In [152]:
i = 2
print(data_4.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [153]:
print(data_4.loc[i, 'model_response'])

 Dear Valued Customer,
       
       We are deeply sorry for the inconvenience you have experienced due to the accidental deletion of essential work documents. We understand the severity of this issue and the impact it may have on your productivity. Our team is committed to prioritizing the data recovery process and will work diligently to help you regain access to your lost files as soon as possible.
       
       In order to facilitate the data recovery process, please follow these steps:
         1. Do not use your device for any further actions that may overwrite the deleted files.
         2. Contact our technical support team at [support_email] or call us at [support_phone] to initiate the data recovery process.
         3. Provide us with the necessary details about your device and the type of files that were lost.
       
       We take full responsibility for this issue and will do everything in our power to help you recover your data as quickly and efficiently as possible. 

In [154]:
final_data_4 = pd.concat([final_data_3,data_4["model_response"]],axis=1)

In [155]:
final_data_4.rename(columns={"model_response":"Response"},inplace=True)

In [156]:
final_data_4

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,Internet,"[connection_issue, internet, slow_connectivity]",High,Immediate,"Dear Valued Customer,\n \n We apo..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware_issue,"[Hardware, laptop, startup]",High,Immediate,"Dear Valued Customer,\n We understand ..."
2,ST2023-008,I've accidentally deleted essential work docum...,Data Loss,[data_loss],High,Immediate,"Dear Valued Customer,\n \n We are..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,WiFi_Signal,"[WiFi, weak_signal]",Medium,2 Days,"Dear Valued Customer,\n \n We apo..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery,"[smartphone, battery_drain]",Medium,1 - 2 Business Days,"Dear Valued Customer,\n \n We're ..."
5,ST2023-011,I'm locked out of my online banking account an...,Account_Access,"[Account_Access, Password_Reset]",High,Immediate,"Dear Valued Customer,\n \n We're ..."
6,ST2023-012,"My computer's performance is sluggish, severel...",Performance,"[performance_issue, productivity]",High,Immediate,"Dear Valued Customer,\n \n We apo..."
7,ST2023-013,I'm experiencing a recurring blue screen error...,Technical_Support,"[BlueScreen, Hardware_Issue]",High,Immediate,"Dear Valued Customer,\n \n We're ..."
8,ST2023-014,My external hard drive isn't being recognized ...,Hardware,"[Hard_drive, Data_recovery]",High,Immediate,"Dear Valued Customer,\n \n We are..."
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware,"[Graphics_card, gaming_laptop, hardware_issue]",High,Immediate,"Dear Valued Customer,\n \n We're ..."


## **Model Output Analysis**

In [157]:
final_data = final_data_4.copy()

In [158]:
final_data['Category'].value_counts()

Hardware             5
Data Recovery        3
Connectivity         2
Data Loss            1
Internet             1
Hardware_issue       1
Account_Access       1
Battery              1
WiFi_Signal          1
Technical_Support    1
Performance          1
Technical_Issue      1
Internet_Issue       1
Software_Issue       1
Name: Category, dtype: int64

In [159]:
final_data["Priority"].value_counts() 

High      19
Medium     2
Name: Priority, dtype: int64

In [160]:
final_data["ETA"].value_counts()

Immediate              19
2 Days                  1
1 - 2 Business Days     1
Name: ETA, dtype: int64

Let's dive in a bit deeper here.

In [161]:
final_data.groupby(['Category', 'ETA']).support_tick_id.count()

Category           ETA                
Account_Access     Immediate              1
Battery            1 - 2 Business Days    1
Connectivity       Immediate              2
Data Loss          Immediate              1
Data Recovery      Immediate              3
Hardware           Immediate              5
Hardware_issue     Immediate              1
Internet           Immediate              1
Internet_Issue     Immediate              1
Performance        Immediate              1
Software_Issue     Immediate              1
Technical_Issue    Immediate              1
Technical_Support  Immediate              1
WiFi_Signal        2 Days                 1
Name: support_tick_id, dtype: int64

## **Actionable Insights and Recommendations**

- Leveraging LLM's for support ticket categorizing seem a brilliant use case especially since these techniques think and work on par with human.
- But we do have to careful when using such techniques as when we automate this process everything depends upon the LLM's response as well.
- In this particular task it was not easy to make the model understand that I am expecting the output in JSON format even when we've explicitly provided
    examples the model would generate category/tags/priority/eta for inital amount of queries but would skip the last 2 queries for some reason .
- And we overcame such scenarios by providing more examples it was also important to keep the prompt consistent, if we mention support ticket / customer's comment the model didn't think of it as the same thing it gets confused and didn't output a response.
- I would recommend using Llama 2.1 Instruct models for such tasks especially 13B Parameter model if you've the resource as it would provide more precise answers 
