# Email Triage Inference Pipeline

This Jupyter Notebook tests the trained XLNet model for email triage by predicting which service should handle a given ticket description. It walks through loading the trained model and tokenizer, preprocessing the input text, and making predictions.

## Setup

Before starting, ensure all necessary libraries are installed.

In [1]:
!pip install transformers torch pandas matplotlib seaborn tqdm sentencepiece
!pip install --upgrade jupyter ipywidgets


Collecting transformers
  Using cached transformers-4.39.3-py3-none-any.whl.metadata (134 kB)
Collecting torch
  Using cached torch-2.2.2-cp311-cp311-manylinux1_x86_64.whl.metadata (25 kB)
Collecting pandas
  Using cached pandas-2.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting matplotlib
  Using cached matplotlib-3.8.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.8 kB)
Collecting seaborn
  Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
Collecting sentencepiece
  Using cached sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting filelock (from transformers)
  Using cached filelock-3.13.4-py3-none-any.whl.metadata (2.8 kB)
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers)
  Using cached huggingface_hub-0.22.2-py3-none-any.whl.metadata (12 kB)
Collecting numpy>=1.17 (from transformers)
  Using cached numpy-1.26.4-cp311-cp311-manylinux_2_17_x
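
Rather than reading through the pip log, you can ask the standard library's `importlib.metadata` (Python 3.8+) which versions are installed, without importing the heavy packages themselves. The minimal sketch below assumes the packages to check are the ones this notebook relies on (`sentencepiece` is needed by `XLNetTokenizer`). `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed subset of the pip-installed packages that this notebook depends on
REQUIRED = ["transformers", "torch", "pandas", "sentencepiece"]

def installed_versions(packages):
    """Return {package: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(REQUIRED)
missing = [name for name, v in versions.items() if v is None]
print(versions)
if missing:
    print("Missing packages:", ", ".join(missing))
```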

## Step 1: Import Required Libraries

Import all necessary libraries for the inference process.

In [2]:
import torch
import pandas as pd
from transformers import XLNetTokenizer, XLNetForSequenceClassification

## Step 2: Load the Trained Model and Tokenizer

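Load the processed dataset to rebuild the mapping between service names and numeric labels, then load the trained XLNet model and tokenizer from their saved paths. The mapping is built with `unique()`, which returns services in order of first appearance, so it matches the labels used in training only if `enhanced.csv` is unchanged and the training code built its mapping the same way.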

In [3]:
# Load the dataset
df = pd.read_csv('./processed_data/enhanced.csv')
print("Dataset loaded successfully.")

# Map 'Actual Service' to numeric labels
unique_services = df['Actual Service'].unique()
service_to_label = {service: i for i, service in enumerate(unique_services)}
print("Service to Label Mapping:", service_to_label)

# Load the tokenizer and model
model_path = './model/xlnet_email_classifier_model'
tokenizer_path = './model/xlnet_email_classifier_tokenizer'
tokenizer = XLNetTokenizer.from_pretrained(tokenizer_path)
model = XLNetForSequenceClassification.from_pretrained(model_path)
model.eval()  # disable dropout; from_pretrained already does this, but being explicit is clearer

# Define the labels for inference
label_to_service = {v: k for k, v in service_to_label.items()}

print("Model and tokenizer loaded successfully.")

Dataset loaded successfully.
Service to Label Mapping: {'UWE Devices and Hardware Support': 0, 'Software Delivery': 1, 'IT Service Desk and Customer Support': 2, 'Lecture and AV Technologies': 3, 'Virtual Environments': 4, 'Student Application Experience': 5, 'Digital Learning': 6, 'Facilities - Business Systems': 7, 'Password and Identity Management': 8, 'Staff Printing': 9, 'Facilities - Operations': 10, 'Collaboration Tools': 11, 'Remote Connectivity': 12, 'Student Records/Administration': 13, 'Web and intranet systems': 14, 'Student Printing': 15, 'UWE Device Management': 16, 'PC, Mobile Device, and Software Delivery': 17, 'Email and Calendaring': 18, 'WiFi Networks': 19, 'Web Services': 20, 'Telephony and Video Conferencing': 21, 'Service Desk and Customer Support': 22, 'Software Usage and Availability': 23, 'Authentication and Identity Management': 24, 'Desktop Software Deployment': 25, 'Virtual Learning Environments': 26, 'Student Journey Systems': 27, 'Networking Service': 28, 
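
Because the label mapping is rebuilt from the CSV at inference time, one way to make it independent of row order is to save the mapping once when training finishes and read it back here. The sketch below assumes a hypothetical file `./model/service_to_label.json` and hypothetical helpers `save_mapping` and `load_label_to_service`; the notebook itself does not create such a file.

```python
import json
from pathlib import Path

# Hypothetical location; adjust to wherever the training run saves its artefacts
MAPPING_PATH = Path("./model/service_to_label.json")

def save_mapping(service_to_label, path=MAPPING_PATH):
    """Write the service -> label mapping produced at training time."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(service_to_label, indent=2), encoding="utf-8")

def load_label_to_service(path=MAPPING_PATH):
    """Read the mapping back and invert it to label -> service for inference."""
    service_to_label = json.loads(path.read_text(encoding="utf-8"))
    # Service names are the JSON keys and labels the values, so labels come back
    # as ints; int() is a defensive cast so keys match torch.argmax(...).item()
    return {int(label): service for service, label in service_to_label.items()}
```

Under this assumption, the training notebook would call `save_mapping(service_to_label)` and this notebook would use `label_to_service = load_label_to_service()` instead of `unique()`. Alternatively, Transformers can keep the mapping in the saved `config.json`: passing `id2label` and `label2id` when the classifier is created for training makes `model.config.id2label` available after loading.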

## Step 3: Preprocess the Input Data

Define a function that tokenizes a ticket description into the dictionary of PyTorch tensors the model expects, including input IDs and an attention mask, truncated or padded to exactly 512 tokens.

In [4]:
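def preprocess_input(ticket_description):
    # Calling the tokenizer directly is the current equivalent of encode_plus
    inputs = tokenizer(
        ticket_description,
        add_special_tokens=True,     # append XLNet's <sep> and <cls> tokens
        max_length=512,
        padding='max_length',        # pad every input to exactly 512 tokens
        truncation=True,             # drop tokens beyond max_length
        return_attention_mask=True,  # 1 for real tokens, 0 for padding
        return_tensors='pt'          # return PyTorch tensors
    )
    return inputs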
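
To make `padding='max_length'`, `truncation=True`, and the attention mask concrete, here is a simplified pure-Python illustration on toy token IDs. It is not how the tokenizer is implemented: the real XLNet tokenizer also appends its special tokens and reserves room for them when truncating. One detail it does mirror is that XLNet's tokenizer pads on the left by default (`padding_side='left'`), so padding comes before the real tokens. `pad_and_mask` is a hypothetical helper.

```python
def pad_and_mask(token_ids, max_length, pad_id, padding_side="left"):
    """Truncate/pad a list of token IDs and build the matching attention mask.

    Simplified illustration only; special tokens are not handled here.
    """
    ids = list(token_ids[:max_length])          # truncation: keep the first max_length IDs
    n_pad = max_length - len(ids)               # padding: fill up to max_length
    mask = [1] * len(ids)                       # 1 = real token, the model attends to it
    pad, pad_mask = [pad_id] * n_pad, [0] * n_pad  # 0 = padding, ignored by attention
    if padding_side == "left":
        return pad + ids, pad_mask + mask
    return ids + pad, mask + pad_mask

# pad_and_mask([11, 12, 13], max_length=5, pad_id=0)
# returns ([0, 0, 11, 12, 13], [0, 0, 1, 1, 1])
```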

## Step 4: Make Predictions

Define a function that runs the model without gradient tracking, picks the label with the highest logit, and maps it back to its service name.

In [5]:
def predict_service(ticket_description):
    inputs = preprocess_input(ticket_description)
    # Inference only: disable gradient tracking to save memory and time
    with torch.no_grad():
        outputs = model(**inputs)
    # Index of the highest-scoring class for the single input in the batch
    predicted_label = torch.argmax(outputs.logits, dim=1).item()
    # label_to_service was built in Step 2
    return label_to_service[predicted_label]
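
`predict_service` returns only the single best label. For triage it can help to see the runner-up services and their relative scores. The sketch below converts one row of logits into softmax probabilities and returns the top `k` services; `top_k_services` is a hypothetical helper, and it takes a plain list so it can be tried without loading the model.

```python
import math

def top_k_services(logits, label_to_service, k=3):
    """Return the k most likely (service, probability) pairs for one ticket.

    `logits` is a plain list of floats, e.g. outputs.logits[0].tolist().
    """
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]            # softmax: non-negative, sums to 1
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(label_to_service[i], probs[i]) for i in ranked[:k]]
```

Inside `predict_service`, `outputs.logits[0].tolist()` gives the list for the single input; the same ranking can be computed in PyTorch with `torch.softmax(outputs.logits, dim=-1)` followed by `torch.topk`. Softmax outputs are best read as relative scores, since they are not necessarily calibrated probabilities.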

## Step 5: Test the Model

Provide a sample ticket description and test the model's prediction.

In [10]:
sample_ticket_description = "I'm trying to fix my server and lost my password and login id. Can you please help?"

predicted_service = predict_service(sample_ticket_description)
print(f"Predicted Service: {predicted_service}")

Predicted Service: Line of Business Applications
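
A single sample shows that the pipeline runs end to end but says little about accuracy. The minimal sketch below scores any predictor, such as `predict_service`, on a list of labelled tickets; `evaluate` is a hypothetical helper. Use tickets the model did not see during training, otherwise the accuracy will be optimistic.

```python
from collections import Counter

def evaluate(predict, examples):
    """Score a predictor on (description, expected_service) pairs.

    Returns overall accuracy and a Counter of (expected, predicted) mistakes.
    """
    mistakes = Counter()
    correct = 0
    for description, expected in examples:
        predicted = predict(description)
        if predicted == expected:
            correct += 1
        else:
            mistakes[(expected, predicted)] += 1
    accuracy = correct / len(examples) if examples else 0.0
    return accuracy, mistakes
```

For example, `evaluate(predict_service, list(zip(test_df['Description'], test_df['Actual Service'])))`, where `test_df` is a held-out split and `'Description'` is a placeholder: the name of the description column in `enhanced.csv` is not shown in this notebook. `mistakes.most_common(5)` then lists the most frequent confusions between services.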
