# Ticket Processing System

This notebook demonstrates an NLP-based ticket processing workflow, including data preprocessing, feature engineering, model training, evaluation, entity extraction, and deployment via a Gradio interface.

## 1. Importing Libraries and Dataset

In [3]:
# Importing the dataset
import pandas as pd

data= pd.read_excel("E:\\assignment\\ai_dev_assignment_tickets_complex_1000.xls")
data.info()
print(data.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   ticket_id      1000 non-null   int64 
 1   ticket_text    945 non-null    object
 2   issue_type     924 non-null    object
 3   urgency_level  948 non-null    object
 4   product        1000 non-null   object
dtypes: int64(1), object(4)
memory usage: 39.2+ KB
   ticket_id                                        ticket_text  \
0          1  Payment issue for my SmartWatch V2. I was unde...   
1          2  Can you tell me more about the UltraClean Vacu...   
2          3  I ordered SoundWave 300 but got EcoBreeze AC i...   
3          4  Facing installation issue with PhotoSnap Cam. ...   
4          5  Order #30903 for Vision LED TV is 13 days late...   

           issue_type urgency_level            product  
0     Billing Problem        Medium      SmartWatch V2  
1     General Inquiry     

In [4]:
data.dropna(subset=['issue_type', 'urgency_level'], inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 876 entries, 0 to 999
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   ticket_id      876 non-null    int64 
 1   ticket_text    826 non-null    object
 2   issue_type     876 non-null    object
 3   urgency_level  876 non-null    object
 4   product        876 non-null    object
dtypes: int64(1), object(4)
memory usage: 41.1+ KB


## 2. Importing NLP and ML Modules

In [1]:
# Importing modules
import re
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from textblob import TextBlob
from scipy.sparse import hstack

In [5]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('vader_lexicon')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\USER\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

## 3. Data Preprocessing

We lowercase the text, strip non-alphabetic characters, tokenize on whitespace, remove English stopwords, and lemmatize each token. Missing ticket text is mapped to an empty string.

In [6]:
# Data preprocessing: text normalization, missing-value handling, tokenization,
# stopword removal, and lemmatization

# Build the stopword set and lemmatizer once instead of on every call
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Missing (NaN) or non-string ticket text becomes an empty string
    if not isinstance(text, str):
        return ''

    # Convert to lowercase
    text = text.lower()

    # Remove special characters and numbers
    text = re.sub(r'[^a-z\s]', '', text)

    # Tokenization (whitespace split)
    tokens = text.split()

    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words]

    # Lemmatization
    tokens = [lemmatizer.lemmatize(word) for word in tokens]

    return ' '.join(tokens)

data['cleaned_text'] = data['ticket_text'].apply(preprocess_text)
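The normalization steps can be traced by hand on tickets from the dataset. The following is a minimal, stdlib-only sketch that assumes a small illustrative stopword subset (`STOPWORDS` is hypothetical; NLTK's English list is much longer) and skips lemmatization. It shows one side effect worth knowing: digits are stripped, so a product name such as "SmartWatch V2" reaches the vectorizer as "smartwatch v".

```python
import re

# Hypothetical, illustrative subset of English stopwords (NLTK's list is longer)
STOPWORDS = {"for", "is", "i", "my", "the", "a", "an", "of", "but", "was"}

def normalize(text):
    """Lowercase, strip non-letters, split on whitespace, drop stopwords.

    Mirrors the first steps of preprocess_text; lemmatization is omitted.
    """
    if not isinstance(text, str):
        return ""
    text = re.sub(r"[^a-z\s]", "", text.lower())
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(normalize("Order #30903 for Vision LED TV is 13 days late"))
# order vision led tv days late
print(normalize("Payment issue for my SmartWatch V2"))
# payment issue smartwatch v
print(repr(normalize(float("nan"))))
# ''
```

With WordNet lemmatization applied as in `preprocess_text`, "days" would additionally become "day".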



## 4. Feature Engineering

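We compute TF-IDF vectors of the cleaned text plus two extra features: ticket length (word count of the raw text) and a VADER compound sentiment score. These are stacked into `X_combined`, but the train/test split and the models below use only the TF-IDF matrix `X`.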

In [7]:

# Feature Engineering (TF-IDF Vectorization)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(data['cleaned_text'])

# Additional features
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Ticket length: word count of the raw text (0 for missing text)
data['ticket_length'] = data['ticket_text'].apply(lambda x: len(x.split()) if isinstance(x, str) else 0)

# Sentiment score: VADER compound score in [-1, 1] (0 for missing text)
sia = SentimentIntensityAnalyzer()
data['sentiment_score'] = data['ticket_text'].apply(lambda x: sia.polarity_scores(x)['compound'] if isinstance(x, str) else 0)

# Combine TF-IDF features with the additional features (CSR format for scikit-learn)
additional_features = data[['ticket_length', 'sentiment_score']]
X_combined = hstack([X, additional_features.values], format='csr')
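The TF-IDF weights can be reproduced by hand, which helps when checking what the vectorizer produces. Below is a minimal stdlib sketch assuming `TfidfVectorizer`'s default settings (lowercasing, raw term counts, `smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`, token pattern of two or more word characters). It also shows an extra column such as `ticket_length` being appended after the TF-IDF columns, mirroring what `hstack` does. The toy corpus is illustrative, not taken from the dataset.

```python
import math
import re
from collections import Counter

# scikit-learn's default token_pattern: runs of two or more word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tfidf(corpus):
    """Return one {term: weight} dict per document.

    Assumes TfidfVectorizer defaults: lowercase=True, raw term counts,
    smooth_idf=True, sublinear_tf=False, norm='l2'.
    """
    docs = [Counter(TOKEN_RE.findall(doc.lower())) for doc in corpus]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for doc in docs:
        weights = {t: count * idf[t] for t, count in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows

corpus = ["late delivery", "late payment"]
rows = tfidf(corpus)
print({t: round(w, 4) for t, w in rows[0].items()})
# {'late': 0.5797, 'delivery': 0.8148}

# Dense row over a sorted vocabulary (scikit-learn also sorts its columns),
# with a ticket_length-style word count appended as the last column
vocab = sorted({t for row in rows for t in row})
combined = [rows[0].get(t, 0.0) for t in vocab] + [len(corpus[0].split())]
print(vocab, [round(v, 4) for v in combined])
# ['delivery', 'late', 'payment'] [0.8148, 0.5797, 0.0, 2]
```

Because a random forest splits on per-feature thresholds, the raw word count does not need rescaling to the [0, 1] range of the TF-IDF weights; a linear model would usually need the extra columns scaled.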


## 5. Preparing Target Variables and Splitting Dataset

In [8]:

# Target variables
y_issue = data['issue_type']
y_urgency = data['urgency_level']

# Split into training and test sets (80/20). Both calls use the same X,
# test_size and random_state, so they select exactly the same rows.
X_train, X_test, y_train_issue, y_test_issue = train_test_split(X, y_issue, test_size=0.2, random_state=42)
X_train_urgency, X_test_urgency, y_train_urgency, y_test_urgency = train_test_split(X, y_urgency, test_size=0.2, random_state=42)

## 6. Model Training and Evaluation

We use a Random Forest Classifier for both issue type and urgency prediction.

In [18]:
#Model Training and Evaluation

# Random Forest Classifier for issue type prediction
issue_classifier = RandomForestClassifier()
issue_classifier.fit(X_train, y_train_issue)
# Random Forest Classifier for urgency level prediction
urgency_classifier = RandomForestClassifier()
urgency_classifier.fit(X_train_urgency, y_train_urgency)

# Model Evaluation for issue type
y_pred_issue = issue_classifier.predict(X_test)
# Model Evaluation for urgency level
y_pred_urgency = urgency_classifier.predict(X_test_urgency)

print("Issue Type Classification Report: ")
print(classification_report(y_test_issue, y_pred_issue))
print("Urgency Level Classification Report: ")
print(classification_report(y_test_urgency, y_pred_urgency))

Issue Type Classification Report: 
                    precision    recall  f1-score   support

    Account Access       1.00      0.97      0.99        35
   Billing Problem       1.00      0.91      0.95        32
   General Inquiry       1.00      0.93      0.96        29
Installation Issue       1.00      0.92      0.96        24
     Late Delivery       0.67      1.00      0.80        20
    Product Defect       1.00      0.95      0.97        19
        Wrong Item       1.00      0.94      0.97        17

          accuracy                           0.94       176
         macro avg       0.95      0.94      0.94       176
      weighted avg       0.96      0.94      0.95       176

Urgency Level Classification Report: 
              precision    recall  f1-score   support

        High       0.32      0.33      0.33        60
         Low       0.24      0.22      0.23        54
      Medium       0.39      0.39      0.39        62

    accuracy                           0.32   
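To put the urgency accuracy of 0.32 in context, it can be compared with trivial baselines computed from the support column of the report above. This small sketch uses the test-set class counts; a fitted baseline such as scikit-learn's `DummyClassifier(strategy="most_frequent")` would instead use the training-set majority class.

```python
# Support counts copied from the urgency report above
support = {"High": 60, "Low": 54, "Medium": 62}

total = sum(support.values())
# Always predicting the most frequent class in the test set
majority_label = max(support, key=support.get)
majority_acc = support[majority_label] / total
# Guessing uniformly at random among the classes (expected accuracy)
uniform_acc = 1 / len(support)

print(total, majority_label, round(majority_acc, 3), round(uniform_acc, 3))
# 176 Medium 0.352 0.333
```

Both baselines are at or above 0.32, which suggests the TF-IDF features carry little signal about urgency in this dataset.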

## 7. Entity Extraction

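We extract product names (matched against the product list in the dataset), dates, and complaint keywords ("broken", "late", "error") from the raw ticket text using regular expressions and keyword matching.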

In [10]:
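# Entity Extraction Function
product_names = data['product'].unique().tolist()
# Match any known product name as a whole phrase, case-insensitively
product_pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, product_names)) + r')\b', flags=re.IGNORECASE)
# Day, word, four-digit year, e.g. "12 March 2024"; dates without a year are not matched
date_pattern = re.compile(r'\b\d{1,2} \w+ \d{4}\b')
complaint_keywords = ['broken', 'late', 'error']

def extract_entities(ticket_text):
    products = product_pattern.findall(ticket_text)
    dates = date_pattern.findall(ticket_text)
    # Whitespace split keeps punctuation attached, so "late." does not match "late"
    complaints = [word for word in ticket_text.split() if word.lower() in complaint_keywords]
    return {
        "product_names": products,
        "dates": dates,
        "complaints": complaints,
    }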
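The extraction rules above miss dates written without a year (such as "12 March" in the third example below) and keywords followed by punctuation (such as "late." in the second example). One possible variant, sketched here under the assumption that dates use full English month names in day-month order, makes the year optional and strips punctuation before keyword matching. `DATE_RE`, `extract_dates`, and `extract_complaints` are hypothetical names, not part of the notebook; using them would change the example outputs shown below.

```python
import re
import string

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Day followed by a full month name, with an optional four-digit year
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS})(?: \d{{4}})?\b", re.IGNORECASE)
COMPLAINT_KEYWORDS = {"broken", "late", "error"}

def extract_dates(text):
    # No capturing groups, so findall returns the full matched strings
    return DATE_RE.findall(text)

def extract_complaints(text):
    # Strip surrounding punctuation so "late." still matches "late"
    words = (w.strip(string.punctuation) for w in text.split())
    return [w for w in words if w.lower() in COMPLAINT_KEYWORDS]

print(extract_dates("I contacted support on 12 March but got no response."))
# ['12 March']
print(extract_complaints("I ordered the SmartWatch V2, but it is broken and the delivery was late."))
# ['broken', 'late']
```

Abbreviated months ("Mar") and numeric formats ("12/03/2024") are still not covered, and with case-insensitive matching a phrase like "2 may" would be read as a date.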

## 8. Integration Function

The function below chains preprocessing, both classifiers, and entity extraction into a single call for processing a new ticket.

In [11]:
# Integration Function
def process_ticket(ticket_text):
    # Preprocess the ticket text and vectorize it once for both models
    cleaned_text = preprocess_text(ticket_text)
    features = vectorizer.transform([cleaned_text])

    # Predict issue_type and urgency_level using the trained models
    issue_type = issue_classifier.predict(features)[0]
    urgency_level = urgency_classifier.predict(features)[0]

    # Extract entities from the raw (unpreprocessed) text
    extracted_entities = extract_entities(ticket_text)

    return {
        "predicted_issue_type": issue_type,
        "predicted_urgency_level": urgency_level,
        "extracted_entities": extracted_entities,
    }
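One way to make the integration step testable without trained models is to pass each component in as a function. This is a minimal sketch of that design, not the notebook's implementation; `process_ticket_with` and the stub components are hypothetical. It also highlights why entities are extracted from the raw text: preprocessing strips digits, so "V2" would not survive into the cleaned text.

```python
from typing import Any, Callable, Dict

def process_ticket_with(
    ticket_text: str,
    preprocess: Callable[[str], str],
    predict_issue: Callable[[str], str],
    predict_urgency: Callable[[str], str],
    extract: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    cleaned = preprocess(ticket_text)
    return {
        "predicted_issue_type": predict_issue(cleaned),
        "predicted_urgency_level": predict_urgency(cleaned),
        # Entities come from the raw text, which keeps case, digits and punctuation
        "extracted_entities": extract(ticket_text),
    }

# Hypothetical stub components for a quick check without trained models
result = process_ticket_with(
    "My SmartWatch V2 arrived broken",
    preprocess=str.lower,
    predict_issue=lambda t: "Product Defect" if "broken" in t else "General Inquiry",
    predict_urgency=lambda t: "High",
    extract=lambda t: {"product_names": ["SmartWatch V2"] if "SmartWatch V2" in t else []},
)
print(result["predicted_issue_type"], result["extracted_entities"])
# Product Defect {'product_names': ['SmartWatch V2']}
```

In the notebook, `preprocess_text`, a wrapper around `vectorizer.transform` plus each classifier's `predict`, and `extract_entities` would fill these slots.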

## 9. Example Usage

Test the system on example tickets.

In [12]:
# Example usage
example_ticket = "I ordered SmartWatch V2 but got PowerMax Battery instead. My order number is #65084."
result = process_ticket(example_ticket)
print(json.dumps(result, indent=4))

{
    "predicted_issue_type": "Wrong Item",
    "predicted_urgency_level": "Medium",
    "extracted_entities": {
        "product_names": [
            "SmartWatch V2",
            "PowerMax Battery"
        ],
        "dates": [],
        "complaints": []
    }
}


In [13]:
# Example usage
example_ticket = "I ordered the SmartWatch V2, but it is broken and the delivery was late."
result = process_ticket(example_ticket)
print(json.dumps(result, indent=4))

{
    "predicted_issue_type": "Late Delivery",
    "predicted_urgency_level": "Medium",
    "extracted_entities": {
        "product_names": [
            "SmartWatch V2"
        ],
        "dates": [],
        "complaints": [
            "broken"
        ]
    }
}


In [14]:
# Example usage
example_ticket = "Both my PowerMax Battery and EcoBreeze AC are lost. Both giving issues. Also, I contacted support on 12 March but got no response."
result = process_ticket(example_ticket)
print(json.dumps(result, indent=4))

{
    "predicted_issue_type": "Product Defect",
    "predicted_urgency_level": "High",
    "extracted_entities": {
        "product_names": [
            "PowerMax Battery",
            "EcoBreeze AC"
        ],
        "dates": [],
        "complaints": []
    }
}


## 10. Gradio Interface

We use Gradio to deploy the ticket processing system as a web app.

In [16]:
# Gradio Interface
import gradio as gr
def gradio_interface(ticket_text):
    result = process_ticket(ticket_text)
    return json.dumps(result, indent=4)
# Create Gradio app
iface = gr.Interface(fn=gradio_interface, 
                     inputs="text", 
                     outputs="json", 
                     title="Ticket Processing System",
                     description="Input raw ticket text to see the predicted issue type, urgency, and extracted entities.")
# Launch the Gradio app
iface.launch()

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.




In [24]:
# Install Gradio if it is not already available (this must run before the Gradio cell above)
%pip install gradio

