# AI for IT Support: Automated Ticket Classification

**Copyright (c) 2026 Shrikara Kaudambady**

This notebook implements a machine learning model to automatically classify IT support tickets based on their descriptions. By automating this process, IT departments can improve efficiency, reduce manual effort, and ensure tickets are routed to the correct teams faster.

**Problem:** Manually sorting through hundreds of IT tickets is time-consuming and prone to error.

**Solution:** A Natural Language Processing (NLP) model that reads a ticket's description and assigns it to a predefined category (e.g., `Hardware`, `Software`, `Network`).

## 1. Setup and Library Imports

First, we import the necessary libraries for data manipulation, text processing, model training, and visualization.

In [None]:
import pandas as pd
import numpy as np
import re
import warnings  # needed for warnings.filterwarnings() in the settings below
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Settings
sns.set_style('whitegrid')
warnings.filterwarnings('ignore')

## 2. Data Loading and Exploratory Data Analysis (EDA)

We will load our synthetic dataset of IT support tickets and perform a brief analysis to understand its structure and the distribution of categories.

In [None]:
df = pd.read_csv('it_support_tickets.csv')
print("Dataset shape:", df.shape)
df.head()

In [None]:
print("Category Distribution:")
print(df['category'].value_counts())
plt.figure(figsize=(8, 5))
sns.countplot(x='category', data=df, palette='viridis')
plt.title('Distribution of Ticket Categories')
plt.xlabel('Category')
plt.ylabel('Number of Tickets')
plt.show()

## 3. Text Preprocessing

This is a critical step in any NLP task. We will clean the ticket descriptions by:
1. Converting text to lowercase.
2. Removing punctuation and numbers.
3. Removing common English 'stopwords' (e.g., 'the', 'a', 'is').
4. Applying stemming to reduce words to their root form (e.g., 'running' -> 'run').

In [None]:
# Download stopwords from NLTK
# This only needs to be done once
try:
    stopwords.words('english')
except LookupError:
    nltk.download('stopwords')

In [None]:
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def clean_text(text):
    # Lowercase
    text = text.lower()
    # Remove punctuation and numbers
    text = re.sub(r'[^a-z\s]', '', text)
    # Tokenize and remove stopwords
    tokens = [word for word in text.split() if word not in stop_words]
    # Stemming
    stemmed_tokens = [stemmer.stem(word) for word in tokens]
    return ' '.join(stemmed_tokens)

In [None]:
# Apply the cleaning function
df['cleaned_description'] = df['description'].apply(clean_text)
print("Original vs. Cleaned Text:")
# .iloc[0] selects the first row by position, whatever the index labels are
print("Original:", df['description'].iloc[0])
print("Cleaned:", df['cleaned_description'].iloc[0])
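
To make the effect of the first three cleaning steps easier to see, here is a minimal standalone sketch. It reuses the same regular expression as `clean_text`, but it is only an illustration: `DEMO_STOPWORDS` and `clean_without_stemming` are hypothetical names, the stopword set is a tiny hand-picked stand-in for NLTK's English list, and stemming is left out so the example runs without NLTK.

```python
import re

# Tiny hand-picked stopword set, for illustration only (an assumption);
# the notebook itself uses NLTK's much larger English list.
DEMO_STOPWORDS = {"i", "my", "the", "a", "is", "to", "on", "in", "not"}


def clean_without_stemming(text):
    """Lowercase, delete non-letter characters, and drop stopwords (no stemming)."""
    text = text.lower()
    # Same pattern as clean_text: anything that is not a-z or whitespace is deleted
    text = re.sub(r"[^a-z\s]", "", text)
    return [word for word in text.split() if word not in DEMO_STOPWORDS]


sample = "I can't log in to Office365 on my laptop!"
tokens = clean_without_stemming(sample)
print(tokens)  # ['cant', 'log', 'office', 'laptop']
```

Because the pattern deletes characters rather than replacing them with a space, "can't" collapses to "cant" and "Office365" to "office". If digits or error codes matter for routing in your tickets, it may be worth adjusting the pattern.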

## 4. Feature Engineering (TF-IDF Vectorization)

Machine learning models can't understand raw text. We need to convert the cleaned text into numerical vectors. We'll use the **TF-IDF (Term Frequency-Inverse Document Frequency)** method. TF-IDF gives higher weight to words that are frequent in a specific document but rare across all documents, making them good indicators of the topic.

In [None]:
vectorizer = TfidfVectorizer(max_features=1000)

X = vectorizer.fit_transform(df['cleaned_description'])
y = df['category']

print("Shape of TF-IDF matrix:", X.shape)
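
To make the weighting concrete, the sketch below computes TF-IDF by hand for three made-up, already-stemmed tickets. It uses the smoothed formula `idf = ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which as far as I know matches the default settings of `TfidfVectorizer`. The toy documents and the `tfidf` helper are assumptions for illustration, not part of the notebook's dataset.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned ticket texts
docs = [
    "printer not print",
    "wifi not connect",
    "printer paper jam",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents each term appears in
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tfidf(tokens):
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    counts = Counter(tokens)
    # Term count multiplied by the smoothed inverse document frequency
    raw = {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}


weights = tfidf(tokenized[1])  # "wifi not connect"
for term, weight in sorted(weights.items()):
    print(f"{term}: {weight:.3f}")
```

In the second ticket, "wifi" and "connect" each occur in only one document, so they receive a higher weight than "not", which occurs in two.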

## 5. Model Training

We will split our data into training and testing sets and train a `LogisticRegression` model. This model is a strong and interpretable baseline for text classification tasks.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
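
One caveat about the cells above: the vectorizer is fitted on the full dataset before the split, so the vocabulary and IDF weights are influenced by the test tickets. A possible alternative, sketched below under the assumption that you split the text first, is to chain the vectorizer and classifier in a scikit-learn `Pipeline` so that both are fitted on the training split only. The toy `texts` and `labels` are hypothetical stand-ins for `df['cleaned_description']` and `df['category']`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical, already-cleaned ticket texts and their categories
texts = [
    "printer paper jam", "keyboard key stuck",
    "monitor screen flicker", "laptop batteri dead",
    "wifi connect drop", "vpn connect fail",
    "router no internet", "network cabl unplug",
    "excel crash open", "outlook not sync",
    "instal softwar error", "app updat fail",
]
labels = ["Hardware"] * 4 + ["Network"] * 4 + ["Software"] * 4

# Split the raw text first, keeping class proportions similar in both splits
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=1000)),
    ("clf", LogisticRegression(random_state=42)),
])

# fit() learns the vocabulary and IDF weights from the training text only
pipeline.fit(train_texts, y_train)
predictions = pipeline.predict(test_texts)

for text, category in zip(test_texts, predictions):
    print(f"{text!r} -> {category}")
```

A fitted pipeline also accepts cleaned text directly in `predict`, so new tickets do not need a separate `vectorizer.transform` call.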

## 6. Model Evaluation

We'll now evaluate our model's performance on the unseen test data using accuracy, a classification report, and a confusion matrix.

In [None]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

In [None]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

In [None]:
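# Pass labels explicitly so the matrix rows/columns match the tick labels below
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)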
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_, yticklabels=model.classes_)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Category')
plt.ylabel('Actual Category')
plt.show()
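
The per-class numbers in the classification report can be read directly off the confusion matrix. The sketch below shows the arithmetic in plain Python, assuming the scikit-learn convention that rows are actual categories and columns are predicted ones. The matrix values are invented for illustration and are not results from this notebook; `per_class_metrics` is a hypothetical helper.

```python
# Hypothetical confusion matrix: rows = actual, columns = predicted
class_labels = ["Hardware", "Network", "Software"]
demo_cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]


def per_class_metrics(matrix, names):
    """Compute (precision, recall, F1) for each class from a confusion matrix."""
    metrics = {}
    for i, name in enumerate(names):
        true_pos = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


demo_metrics = per_class_metrics(demo_cm, class_labels)
for name, (p, r, f1) in demo_metrics.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision answers "of the tickets routed to this team, how many belonged there?", while recall answers "of the tickets that belonged to this team, how many reached it?".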

## 7. Testing with New Tickets

Let's see how our model performs on a few new, unseen ticket descriptions.

In [None]:
new_tickets = [
    "My email client is not syncing new emails from the server.",
    "The power button on my desktop computer is stuck.",
    "I can't access the internet from my laptop, the wifi icon has a yellow triangle."
]

cleaned_new_tickets = [clean_text(ticket) for ticket in new_tickets]
new_tickets_tfidf = vectorizer.transform(cleaned_new_tickets)

predictions = model.predict(new_tickets_tfidf)

for ticket, category in zip(new_tickets, predictions):
    print(f'Ticket: "{ticket}" -> Predicted Category: {category}')

## 8. Conclusion

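We have built a simple baseline model that classifies IT support tickets from their descriptions using TF-IDF features and logistic regression. Because the dataset is synthetic, the test-set accuracy reported in Section 6 should be confirmed on real tickets before the model is used as the foundation for an automated ticket routing system.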

### Next Steps
- **Use more data:** A real-world system would require thousands of tickets to be robust.
- **Advanced models:** Explore more complex models such as Support Vector Machines (SVMs) or deep learning models (LSTMs, or Transformers such as BERT) for potentially higher accuracy.
- **Deployment:** Wrap the model in a REST API (using a framework like Flask or FastAPI) so that it can be integrated with a real IT ticketing system.