## Exploratory Data Analysis

### Load Libraries

In [2]:
import pandas as pd

### Dataset

In [3]:
tickets_df = pd.read_csv('../data/dataset-tickets-multi-lang-4-20k.csv')
tickets_df.head()

|  | subject | body | answer | type | queue | priority | language | tag_1 | tag_2 | tag_3 | tag_4 | tag_5 | tag_6 | tag_7 | tag_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unvorhergesehener Absturz der Datenanalyse-Pla... | Die Datenanalyse-Plattform brach unerwartet ab... | Ich werde Ihnen bei der Lösung des Problems he... | Incident | General Inquiry | low | de | Crash | Technical | Bug | Hardware | Resolution | Outage | Documentation |  |
| 1 | Customer Support Inquiry | Seeking information on digital strategies that... | We offer a variety of digital strategies and s... | Request | Customer Service | medium | en | Feedback | Sales | IT | Tech Support |  |  |  |  |
| 2 | Data Analytics for Investment | I am contacting you to request information on ... | I am here to assist you with data analytics to... | Request | Customer Service | medium | en | Technical | Product | Guidance | Documentation | Performance | Feature |  |  |
| 3 | Krankenhaus-Dienstleistung-Problem | Ein Medien-Daten-Sperrverhalten trat aufgrund ... | Zurück zur E-Mail-Beschwerde über den Sperrver... | Incident | Customer Service | high | de | Security | Breach | Login | Maintenance | Incident | Resolution | Feedback |  |
| 4 | Security | Dear Customer Support, I am reaching out to in... | Dear [name], we take the security of medical d... | Request | Customer Service | medium | en | Security | Customer | Compliance | Breach | Documentation | Guidance |  |  |


The dataset contains the following variables:
- `subject`: Subject of the customer's email.
- `body`: Body of the customer's email.
- `answer`: The response provided by the helpdesk agent.
- `type`: The type of ticket as picked by the agent (Incident, Request, Problem, Change).
- `queue`: Specifies the department to which the email ticket is routed (General Inquiry, Customer Service, Technical Support, IT Support, Product Support, Billing and Payments, Service Outages and Maintenance, Human Resources, Returns and Exchanges, Sales and Pre-Sales). 
- `priority`: Indicates the urgency and importance of the issue (low, medium, high).
- `language`: Indicates the language in which the email is written (de, en).
- `tag_1` to `tag_8`: Tags assigned to the ticket to further classify it and identify common issues or topics, spread across eight columns (examples: "Security", "Bug", "Tech Support"). Every ticket has a `tag_1`; tickets with fewer than eight tags have missing values in the unused columns.
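`type`, `queue`, `priority`, and `language` are categorical. A natural first check is how tickets are spread across their levels, for example whether one queue or priority dominates. The minimal sketch below uses `value_counts(normalize=True)` on a hand-built frame that holds the five rows from the `head()` output, so it is self-contained. The proportions it prints describe only those five rows, not the dataset. Swapping in `tickets_df` would give the real distribution.

```python
import pandas as pd

# Illustrative frame: the categorical columns of the five rows printed by
# tickets_df.head() above. Swap in tickets_df to run this on the full data.
sample = pd.DataFrame({
    "type": ["Incident", "Request", "Request", "Incident", "Request"],
    "queue": ["General Inquiry", "Customer Service", "Customer Service",
              "Customer Service", "Customer Service"],
    "priority": ["low", "medium", "medium", "high", "medium"],
    "language": ["de", "en", "en", "de", "en"],
})

categorical_cols = ["type", "queue", "priority", "language"]

# Proportion of tickets at each level of every categorical column.
shares = {col: sample[col].value_counts(normalize=True) for col in categorical_cols}

for col, share in shares.items():
    print(share, end="\n\n")
```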

In [10]:
tickets_df['tag_2'].unique()

array(['Technical', 'Sales', 'Product', 'Breach', 'Customer', 'Security',
       'Integration', 'Bug', 'Performance', 'Outage', 'Strategy',
       'Network', 'IT', 'Crash', 'Documentation', 'Compliance', 'Virus',
       'Payment', 'Disruption', 'Hardware', 'Tech Support', 'Software',
       'Feedback', 'Campaign', 'Feature', 'Account', 'Refund',
       'Specifications', 'Training', 'Marketing', 'Website', 'Inquiry',
       'Medical', 'Incident', 'Tool', 'Docker', 'Encryption', 'Dashboard',
       nan, 'Billing', 'Analytics', 'Backup', 'Guidance', 'Financial',
       'Sync', 'Maintenance', 'Collaboration', 'Investment',
       'Notification', 'System', 'CRM', 'Branding', 'Issue', 'Pricing',
       'Plugin', 'Alert', 'Connectivity', 'Tech', 'Hospital', 'Ecommerce',
       'Database', 'Data', 'Platform', 'Update', 'Access', 'Campaigns',
       'Healthcare', 'SocialMedia', 'Communication', 'Server', 'Project',
       'Engagement', 'Recovery', 'Employee', 'Upgrade', 'Patient',
       'Resol
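The `nan` among the `tag_2` values shows that not every ticket has a second tag. Inspecting one column at a time also gives only a partial view of the tag vocabulary. The sketch below takes a different approach. It flattens all eight tag columns to count tag frequencies regardless of position, and it collapses each ticket's tags into a list, a form that suits multi-label work. It uses a small hypothetical frame built from three rows of the `head()` output. On the real data, `tickets_df[tag_cols]` would take its place.

```python
import pandas as pd

# Hypothetical excerpt: the tags of rows 1, 2 and 4 from the head() output.
# Unused tag slots are None here, mirroring the NaN values in tickets_df.
tag_cols = [f"tag_{i}" for i in range(1, 9)]
tags = pd.DataFrame(
    [
        ["Feedback", "Sales", "IT", "Tech Support", None, None, None, None],
        ["Technical", "Product", "Guidance", "Documentation",
         "Performance", "Feature", None, None],
        ["Security", "Customer", "Compliance", "Breach",
         "Documentation", "Guidance", None, None],
    ],
    columns=tag_cols,
    index=[1, 2, 4],
)

# Flatten all eight columns into one Series and drop the empty slots.
all_tags = pd.Series(tags[tag_cols].to_numpy().ravel()).dropna()

# How often each tag occurs, regardless of the column it sits in.
tag_counts = all_tags.value_counts()

# Number of tags per ticket, and the tags themselves as a list per ticket.
n_tags = tags[tag_cols].notna().sum(axis=1)
tag_lists = tags[tag_cols].apply(lambda row: row.dropna().tolist(), axis=1)

print(tag_counts.head())
print(n_tags)
print(tag_lists)
```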

In [7]:
tickets_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 15 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   subject   18539 non-null  object
 1   body      19998 non-null  object
 2   answer    19996 non-null  object
 3   type      20000 non-null  object
 4   queue     20000 non-null  object
 5   priority  20000 non-null  object
 6   language  20000 non-null  object
 7   tag_1     20000 non-null  object
 8   tag_2     19954 non-null  object
 9   tag_3     19905 non-null  object
 10  tag_4     18461 non-null  object
 11  tag_5     13091 non-null  object
 12  tag_6     7351 non-null   object
 13  tag_7     3928 non-null   object
 14  tag_8     1907 non-null   object
dtypes: object(15)
memory usage: 2.3+ MB
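The `Non-Null Count` column of `info()` is easier to act on when it is expressed as missing counts and percentages. The sketch below derives them from the counts printed above, so it runs without the CSV. On the loaded frame, `tickets_df.isna().sum()` and `tickets_df.isna().mean() * 100` give the same figures directly. By this arithmetic, `subject` is missing for 1,461 tickets (about 7%). `tag_8` is empty for about 90% of tickets, so the later tag columns are sparse.

```python
import pandas as pd

# Non-null counts copied from the tickets_df.info() output above.
# On the loaded data, tickets_df.isna().sum() returns the missing counts directly.
N_ROWS = 20_000
non_null = pd.Series({
    "subject": 18539, "body": 19998, "answer": 19996,
    "type": 20000, "queue": 20000, "priority": 20000, "language": 20000,
    "tag_1": 20000, "tag_2": 19954, "tag_3": 19905, "tag_4": 18461,
    "tag_5": 13091, "tag_6": 7351, "tag_7": 3928, "tag_8": 1907,
})

missing = N_ROWS - non_null
summary = pd.DataFrame({
    "missing": missing,
    "missing_pct": (missing / N_ROWS * 100).round(2),
})

# Only the columns that actually have gaps, most incomplete first.
print(summary[summary["missing"] > 0].sort_values("missing", ascending=False))
```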


In [6]:
tickets_df[tickets_df['subject'].isnull()]

|  | subject | body | answer | type | queue | priority | language | tag_1 | tag_2 | tag_3 | tag_4 | tag_5 | tag_6 | tag_7 | tag_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 |  | To Whom It May Concern, I am contacting you to... | Dear \<name>, We acknowledge your email regardi... | Incident | Product Support | high | en | Security | Breach | Technical | Outdated Software | Follow-Up | Guidance | Documentation | Reference Number |
| 42 |  | During the last deployment, users encountered ... | We are aware of the issue and the team is acti... | Problem | IT Support | high | en | Performance | Bug | IT | Tech Support |  |  |  |  |
| 56 |  | Dear Customer Support, the critical data analy... | We will assist with resolving the software com... | Incident | Technical Support | high | en | Bug | Performance | IT | Tech Support |  |  |  |  |
| 82 |  | Dear Customer Support, I am in need of your as... | I will assist you in securing your network and... | Problem | Billing and Payments | high | en | Security | Breach | Network | Guidance | Documentation | Incident |  |  |
| 97 |  | Assistance needed for data breach in hospital ... | Please acknowledge the data breach issue with ... | Incident | Technical Support | high | en | Security | IT | Tech Support |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19954 |  | Looking for assistance in securing medical dat... | Can provide guidance on securing medical data ... | Request | Customer Service | medium | en | Security | Documentation | IT | Tech Support |  |  |  |  |
| 19961 |  | Ein kritischer Fehler ist während der Projektm... | Wir haben eine kritische Fehlermeldung für die... | Problem | Product Support | medium | de | Bug | Disruption | Performance | IT | Tech Support |  |  |  |
| 19986 |  | Dear Customer Support, our agency is facing co... | \<name>, we are here to assist with the connect... | Incident | Technical Support | medium | en | Network | Disruption | Hardware | Performance | IT | Tech Support |  |  |
| 19993 |  | Can you provide information on digital strateg... | I would be happy to discuss digital strategies... | Request | Billing and Payments | medium | en | Feedback | Sales | Lead |  |  |  |  |  |
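All of the rows listed above still have a `body`, and they span both languages and several ticket types. To check whether missing subjects concentrate in a particular group, the Boolean mask from `isna()` can be grouped by a categorical column. Its mean is then the share of tickets without a subject. The frame below is a hypothetical excerpt that mixes rows from `head()` with rows from the listing above, so its proportions illustrate the mechanics only and are not findings. On the full data, `tickets_df['subject'].isna().groupby(tickets_df['language']).mean()` would give the real figures.

```python
import pandas as pd

# Hypothetical excerpt: rows 0-4 from head() (subject present) plus rows
# 19, 42, 19961 and 19993 from the missing-subject listing (subject is None).
sample = pd.DataFrame(
    {
        "subject": ["Unvorhergesehener Absturz der Datenanalyse-Pla...",
                    "Customer Support Inquiry",
                    "Data Analytics for Investment",
                    "Krankenhaus-Dienstleistung-Problem",
                    "Security",
                    None, None, None, None],
        "type": ["Incident", "Request", "Request", "Incident", "Request",
                 "Incident", "Problem", "Problem", "Request"],
        "language": ["de", "en", "en", "de", "en", "en", "en", "de", "en"],
    },
    index=[0, 1, 2, 3, 4, 19, 42, 19961, 19993],
)

# True where the ticket has no subject.
no_subject = sample["subject"].isna()

# Share of tickets without a subject within each language and ticket type.
by_language = no_subject.groupby(sample["language"]).mean()
by_type = no_subject.groupby(sample["type"]).mean()

print(by_language)
print(by_type)
```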
