# Customer Support Ticket Dataset

Here we load a dataset of customer support tickets. It contains roughly 8,500 tickets, and each ticket has values such as customer details, the purchased product, a description, the ticket type, and various ticket labels. The dataset is available [here](https://www.kaggle.com/datasets/suraj520/customer-support-ticket-dataset).


## Prerequisites

Prior to using this notebook the following steps need to be completed:
1. [Configure the AI-Lab](../main_config.ipynb).

## Setup

### Open Secure Configuration Storage

In [8]:
%run ../utils/access_store_ui.ipynb
display(get_access_store_ui('../'))


## Download data

We access the dataset on [Kaggle](https://www.kaggle.com/) using [kagglehub](https://github.com/Kaggle/kagglehub) and load it into a pandas DataFrame.

In [44]:
!pip install "kagglehub[pandas-datasets]"
import re
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Load the latest version of the CSV file into a pandas DataFrame
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS, 
    "suraj520/customer-support-ticket-dataset", 
    "customer_support_tickets.csv"
)


In [48]:

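# Replace placeholders such as {product_purchased} in the ticket description
# with the name of the purchased product.
pattern = re.compile(
    r'\{\w*(product|purchase|name|item)\w*\}',
    re.IGNORECASE
)
df['Ticket Description'] = df.apply(
    # A function replacement inserts the product name literally,
    # so backslashes in a name are not interpreted as escape sequences.
    lambda row: pattern.sub(lambda _: row['Product Purchased'], row['Ticket Description']),
    axis=1
)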
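
To see what the substitution does, here is a minimal sketch on a toy DataFrame. The two sample rows are hypothetical and not taken from the real dataset; the regular expression is the one used above. The replacement is passed as a function so that the product name is inserted literally, even if it contains a backslash.

```python
import re

import pandas as pd

# Same pattern as in the cell above: matches placeholders such as {product_purchased}
pattern = re.compile(r'\{\w*(product|purchase|name|item)\w*\}', re.IGNORECASE)

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'Product Purchased': ['GoPro Hero', 'LG Smart TV'],
    'Ticket Description': [
        "I'm having an issue with the {product_purchased}. Please assist.",
        "My {Product_Name} does not turn on.",
    ],
})

sample['Ticket Description'] = sample.apply(
    # The inner lambda ignores the match object and returns the product name as-is
    lambda row: pattern.sub(lambda _: row['Product Purchased'], row['Ticket Description']),
    axis=1,
)

print(sample['Ticket Description'].tolist())
```

Both placeholders are matched regardless of case, because the pattern is compiled with `re.IGNORECASE`.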



Now we are ready to upload the data to the Exasol Database:

In [49]:
from exasol.nb_connector.connections import open_pyexasol_connection

with open_pyexasol_connection(ai_lab_config, compression=True) as conn:
    conn.execute(f"""
    CREATE OR REPLACE TABLE "{ai_lab_config.db_schema}"."CUSTOMER_SUPPORT_TICKETS" (
        "TICKET_ID" INTEGER,
        "CUSTOMER_NAME" VARCHAR(2000000),
        "CUSTOMER_EMAIL" VARCHAR(2000000),
        "CUSTOMER_AGE" INTEGER,
        "CUSTOMER_GENDER" VARCHAR(2000000),
        "PRODUCT_PURCHASED" VARCHAR(2000000),
        "DATE_OF_PURCHASE" VARCHAR(2000000),
        "TICKET_TYPE" VARCHAR(2000000),
        "TICKET_SUBJECT" VARCHAR(2000000),
        "TICKET_DESCRIPTION" VARCHAR(2000000),
        "TICKET_STATUS" VARCHAR(2000000),
        "RESOLUTION" VARCHAR(2000000),
        "TICKET_PRIORITY" VARCHAR(2000000),
        "TICKET_CHANNEL" VARCHAR(2000000),
        "FIRST_RESPONSE_TIME" VARCHAR(2000000),
        "TIME_TO_RESOLUTION" VARCHAR(2000000),
        "CUSTOMER_SATISFACTION_RATING" VARCHAR(2000000)
    );
    """)
    conn.import_from_pandas(df, table=(ai_lab_config.db_schema,"CUSTOMER_SUPPORT_TICKETS"))
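
Before running the import, it can help to confirm that the DataFrame columns line up with the table definition. The sketch below assumes that `import_from_pandas` assigns DataFrame columns to table columns by position, and that the CSV header uses names like `Ticket ID`; both are assumptions to verify against your data. The helpers `to_table_column` and `column_mismatches` are hypothetical names introduced for this illustration.

```python
# Column names from the CREATE TABLE statement above, in order
EXPECTED_TABLE_COLUMNS = [
    "TICKET_ID", "CUSTOMER_NAME", "CUSTOMER_EMAIL", "CUSTOMER_AGE",
    "CUSTOMER_GENDER", "PRODUCT_PURCHASED", "DATE_OF_PURCHASE",
    "TICKET_TYPE", "TICKET_SUBJECT", "TICKET_DESCRIPTION", "TICKET_STATUS",
    "RESOLUTION", "TICKET_PRIORITY", "TICKET_CHANNEL", "FIRST_RESPONSE_TIME",
    "TIME_TO_RESOLUTION", "CUSTOMER_SATISFACTION_RATING",
]


def to_table_column(name: str) -> str:
    """Normalize a DataFrame column name, e.g. 'Ticket ID' -> 'TICKET_ID'."""
    return name.strip().upper().replace(" ", "_")


def column_mismatches(df_columns, table_columns=EXPECTED_TABLE_COLUMNS):
    """Return (position, normalized DataFrame name, table name) for each mismatch."""
    normalized = [to_table_column(c) for c in df_columns]
    if len(normalized) != len(table_columns):
        raise ValueError(
            f"expected {len(table_columns)} columns, got {len(normalized)}"
        )
    return [
        (pos, got, want)
        for pos, (got, want) in enumerate(zip(normalized, table_columns))
        if got != want
    ]


# Assumed CSV header, used here only to demonstrate the check
csv_columns = [
    "Ticket ID", "Customer Name", "Customer Email", "Customer Age",
    "Customer Gender", "Product Purchased", "Date of Purchase",
    "Ticket Type", "Ticket Subject", "Ticket Description", "Ticket Status",
    "Resolution", "Ticket Priority", "Ticket Channel", "First Response Time",
    "Time to Resolution", "Customer Satisfaction Rating",
]
print(column_mismatches(csv_columns))
```

In the notebook, calling `column_mismatches(df.columns)` before the import should return an empty list if the assumed header matches the downloaded file.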
