# Faker Data Enrichment Experiment

This notebook demonstrates how to prepend synthetic, contextually relevant text to the "Ticket Description" column using the Faker library. The goal is to enrich each existing ticket description based on its "Actual Service" category.

#### 0. Install Dependencies (temporary workaround for "Module not found" errors)

In [1]:
# Install Faker and pandas from within the notebook; %pip installs into the
# environment of the running kernel, which !pip does not guarantee
%pip install Faker pandas


Collecting Faker
  Using cached Faker-24.2.0-py3-none-any.whl.metadata (15 kB)
Using cached Faker-24.2.0-py3-none-any.whl (1.8 MB)
Installing collected packages: Faker
Successfully installed Faker-24.2.0


#### 1. Import Necessary Libraries

First, we import the Python libraries needed for this experiment: `Faker` for generating synthetic data and `pandas` for data manipulation.


In [2]:
from faker import Faker
import pandas as pd
# Shared Faker instance used to pick keywords and generate filler sentences
fake = Faker()


#### 2. Define Service Keywords

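For each "Actual Service" category, we define a set of contextually relevant keywords. For every ticket, one of these keywords is chosen at random and used as the label at the start of the generated text (for example, "Printing issue: ..."), which ties the synthetic content to the ticket's service. Services that are not listed here fall back to a generic "General issue" label.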


In [3]:
# Define keywords and phrases for each "Actual Service"

service_keywords = {
    "UWE Devices and Hardware Support": ["device", "hardware", "malfunction", "repair", "support"],
    "Software Delivery": ["software", "delivery", "installation", "update", "license"],
    "IT Service Desk and Customer Support": ["support", "ticket", "query", "resolution", "helpdesk"],
    "Lecture and AV Technologies": ["lecture", "audio", "video", "technology", "presentation"],
    "Virtual Environments": ["virtual", "environment", "simulation", "cloud", "hosting"],
    "Student Application Experience": ["application", "experience", "user", "interface", "navigation"],
    "Digital Learning": ["digital", "learning", "elearning", "course", "online"],
    "Facilities - Business Systems": ["facility", "business", "system", "management", "operation"],
    "Password and Identity Management": ["password", "identity", "security", "authentication", "access"],
    "Staff Printing": ["printing", "document", "print", "paper", "ink"],
    "Facilities - Operations": ["facility", "operation", "maintenance", "service", "infrastructure"],
    "Collaboration Tools": ["collaboration", "tool", "software", "teamwork", "communication"],
    "Remote Connectivity": ["remote", "connectivity", "VPN", "access", "network"],
    "Student Records/Administration": ["record", "administration", "student", "data", "management"],
    "Web and intranet systems": ["web", "intranet", "portal", "site", "content"],
    "Student Printing": ["student", "printing", "print", "document", "paper"],
    "UWE Device Management": ["device", "management", "inventory", "tracking", "asset"],
    "PC, Mobile Device, and Software Delivery": ["PC", "mobile", "software", "device", "delivery"],
    "Email and Calendaring": ["email", "calendar", "scheduling", "communication", "appointment"],
    "WiFi Networks": ["WiFi", "network", "connectivity", "wireless", "internet"],
    "Web Services": ["web", "service", "API", "online", "integration"],
    "Telephony and Video Conferencing": ["telephony", "video", "conference", "call", "communication"],
    "Service Desk and Customer Support": ["service", "desk", "support", "helpdesk", "customer"],
    "Software Usage and Availability": ["software", "usage", "availability", "license", "access"],
    "Authentication and Identity Management": ["authentication", "identity", "security", "login", "access"],
    "Desktop Software Deployment": ["desktop", "software", "deployment", "installation", "update"],
    "Virtual Learning Environments": ["virtual", "learning", "environment", "elearning", "platform"],
    "Student Journey Systems": ["student", "journey", "system", "management", "tracking"],
    "Networking Service": ["network", "service", "connectivity", "infrastructure", "internet"],
    "Teaching and Audio Visual Support": ["teaching", "audio", "visual", "support", "classroom"],
    "Monitoring and Reporting Services": ["monitoring", "reporting", "service", "analytics", "data"],
    "Finance Systems": ["finance", "system", "accounting", "budget", "transaction"],
    "Email and Calendars": ["email", "calendar", "scheduling", "communication", "appointment"],
    "File Storage": ["file", "storage", "data", "cloud", "archive"],
    "Campus phone handsets/headsets": ["campus", "phone", "handset", "headset", "telephony"],
    "Timetabling Systems": ["timetable", "system", "scheduling", "planning", "calendar"],
    "Data/Application Integration": ["data", "application", "integration", "system", "software"],
    "Telephone Switchboard": ["telephone", "switchboard", "call", "communication", "routing"],
    "HR Systems": ["HR", "human", "resource", "management", "employee"],
    "Database Services": ["database", "service", "data", "storage", "management"],
    "Library Systems": ["library", "system", "catalog", "borrow", "book"],
    "Application Testing": ["application", "testing", "software", "quality", "test"],
    "Mobile phones": ["mobile", "phone", "device", "call", "connectivity"],
    "IT Architecture": ["architecture", "IT", "system", "design", "infrastructure"],
    "Wired Networks": ["wired", "network", "connectivity", "cable", "LAN"],
    "Staff meeting technologies": ["meeting", "technology", "conference", "collaboration", "communication"],
    "Line of Business Applications": ["business", "application", "software", "solution", "service"],
    "IT Online": ["online", "IT", "service", "support", "technology"],
    "Purchasing IT Equipment and Software": ["purchasing", "IT", "equipment", "software", "acquisition"],
    "Data Storage": ["data", "storage", "cloud", "archive", "backup"],
    "Customer Relationship Management": ["customer", "relationship", "management", "CRM", "client"],
    "Virtual Environments and Systems Management": ["virtual", "environment", "system", "management", "infrastructure"],
    "Lecture and AV Support": ["lecture", "AV", "support", "audio", "visual"],
    "Backup and Service Continuity": ["backup", "service", "continuity", "recovery", "data"],
    "IT Asset Management and Licensing": ["asset", "management", "licensing", "IT", "inventory"],
}
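Any "Actual Service" value that is missing from `service_keywords` silently falls back to the "General issue" label, so it can be worth listing those values before generating descriptions. The following is a minimal sketch, assuming the service names are available as an iterable of strings; `find_unmapped_services` and the example data are hypothetical and not part of the original notebook.

```python
from collections.abc import Iterable, Mapping


def find_unmapped_services(services: Iterable[str],
                           keyword_map: Mapping[str, list[str]]) -> list[str]:
    """Return the distinct service names that have no entry in keyword_map, sorted."""
    # Services in this result will receive the generic "General issue" label
    return sorted(set(services) - set(keyword_map))


# Hypothetical example data; in the notebook the services would come from
# df_emails['Actual Service'].dropna().unique()
example_map = {"Staff Printing": ["printing"], "WiFi Networks": ["WiFi"]}
example_services = ["Staff Printing", "Parking", "WiFi Networks", "Parking", "Catering"]
print(find_unmapped_services(example_services, example_map))  # ['Catering', 'Parking']
```

Because the lookup is an exact string match, near-duplicate names such as "Email and Calendaring" and "Email and Calendars" each need their own entry in the dictionary.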


#### 3. Generate Contextually Relevant Descriptions

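We define a function, `generate_description`, that prepends a short synthetic prefix to each existing ticket description. The prefix consists of a keyword chosen at random from the ticket's "Actual Service" list, the word "issue:", and a random sentence from Faker's `sentence()`. Only the keyword reflects the service; the Faker sentence itself is generic filler text.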
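The prefix logic can also be sketched independently of pandas and Faker, which makes it easy to test and to seed for reproducible output. This is a hypothetical variant rather than the notebook's function: `build_prefixed_description`, its `rng` and `make_sentence` parameters, and the `upper_first` helper are illustrative assumptions. It upper-cases only the first character of the keyword, because `str.capitalize()` lower-cases the rest of the string and would turn keywords such as "VPN" or "HR" into "Vpn" and "Hr".

```python
import random
from typing import Callable, Mapping, Sequence


def upper_first(word: str) -> str:
    """Upper-case only the first character, leaving acronyms like 'VPN' intact."""
    return word[:1].upper() + word[1:]


def build_prefixed_description(
    service: str,
    original: str,
    keyword_map: Mapping[str, Sequence[str]],
    rng: random.Random,
    make_sentence: Callable[[], str],
) -> str:
    """Prepend '<Keyword> issue: <sentence>' (or 'General issue: ...') to original."""
    keywords = keyword_map.get(service)
    # Fall back to a generic label when the service has no keywords
    label = upper_first(rng.choice(keywords)) if keywords else "General"
    return f"{label} issue: {make_sentence()} {original}"


# Seeded RNG and a stub sentence generator for a reproducible demo
rng = random.Random(42)
demo_map = {"Remote Connectivity": ["VPN"]}
print(build_prefixed_description("Remote Connectivity", "Cannot connect from home.",
                                 demo_map, rng, lambda: "Filler sentence."))
# VPN issue: Filler sentence. Cannot connect from home.
```

To use it with the notebook's objects, `fake.sentence` could be passed as `make_sentence`; Faker's own randomness would then need to be seeded separately (for example with `fake.seed_instance(...)`) for fully repeatable output.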


#### 4. Process the Dataset

Load the input CSV file, apply the description generation function to update the ticket descriptions, and display a random sample of 10 rows of the updated DataFrame to verify the changes.


In [1]:
# Prepend a service-specific prefix and a random Faker sentence to each ticket description
def generate_description(row):
    service = row['Actual Service']
    keywords = service_keywords.get(service, [])

    if keywords:
        # Use Faker's random_element method to pick one of the service's keywords
        key_phrase = fake.random_element(elements=keywords)
        # Upper-case only the first letter; str.capitalize() would turn "VPN" into "Vpn"
        label = key_phrase[0].upper() + key_phrase[1:]
        # Prepend the new text to the existing 'Ticket Description'
        description = f"{label} issue: {fake.sentence()} {row['Ticket Description']}"
    else:
        # Services without keywords get a generic label
        description = f"General issue: {fake.sentence()} {row['Ticket Description']}"
    return description


# Load the raw ticket data
df_emails = pd.read_csv('./raw_data/1.csv')

# Treat missing descriptions as empty strings so "nan" is not written into the text
df_emails['Ticket Description'] = df_emails['Ticket Description'].fillna('')

# Generate ticket descriptions
df_emails['Ticket Description'] = df_emails.apply(generate_description, axis=1)

# Display a random sample of 10 rows to verify the changes
print(df_emails.sample(n=10))


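NameError: name 'pd' is not defined

This error comes from running the cell in a fresh kernel (note the `In [1]` counter). Run the cells from step 1 onward first so that `pd`, `fake`, and `service_keywords` are defined.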
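Printing a random sample only spot-checks the result. A stricter check, sketched here under the assumption that a copy of the original descriptions is kept before the rewrite, confirms that every new description contains an "issue:" prefix and still ends with its original text. The column name matches the notebook; the `verify_enrichment` helper and the toy DataFrame are hypothetical.

```python
import pandas as pd


def verify_enrichment(original: pd.Series, enriched: pd.Series) -> pd.Series:
    """Flag rows whose enriched text has an '... issue: ' prefix and ends with the original text."""
    # Align both Series on their index so rows are compared pairwise
    pairs = pd.concat({"old": original.astype(str), "new": enriched.astype(str)}, axis=1)
    return pairs.apply(
        lambda r: " issue: " in r["new"] and r["new"].endswith(r["old"]),
        axis=1,
    )


# Toy data standing in for df_emails
df = pd.DataFrame({"Ticket Description": ["Printer jammed.", "Cannot log in."]})
original = df["Ticket Description"].copy()
df["Ticket Description"] = "Printing issue: Some sentence. " + df["Ticket Description"]
checks = verify_enrichment(original, df["Ticket Description"])
print(checks.all())  # True
```

In the notebook, this would mean saving `original = df_emails['Ticket Description'].copy()` before the `apply` call and then checking `verify_enrichment(original, df_emails['Ticket Description']).all()`.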

#### 5. Save the Enhanced Dataset

Finally, we save the updated DataFrame to a new CSV file, completing the data enrichment process.


In [5]:

# Save the updated DataFrame to a new CSV file
df_emails.to_csv('./processed_data/enhanced.csv', index=False)

print("Updated dataset saved to 'enhanced.csv'.")

Updated dataset saved to 'enhanced.csv'.
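`DataFrame.to_csv` does not create missing parent directories, so writing to `./processed_data/enhanced.csv` fails if that folder does not exist yet. A minimal sketch, using a temporary directory as a stand-in for `./processed_data` and a hypothetical `save_csv` helper, that creates the folder first and reads the file back to confirm the round trip:

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_csv(df: pd.DataFrame, path: Path) -> Path:
    """Write df to path as CSV (without the index), creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = save_csv(
        pd.DataFrame({"Actual Service": ["Staff Printing"],
                      "Ticket Description": ["Printing issue: Filler. Printer jammed."]}),
        Path(tmp) / "processed_data" / "enhanced.csv",
    )
    # Read the file back to confirm that what was written can be loaded again
    round_trip = pd.read_csv(out)
    print(round_trip.shape)  # (1, 2)
```

In the notebook, the equivalent call would be `save_csv(df_emails, Path('./processed_data/enhanced.csv'))`.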
