# EDA PROCESS

## Preparing environment 

In [35]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [37]:
%reload_ext sql #TODO Delete

### Getting data from .env file

`To reproduce this analysis, create your own database and import the CSV file from the raw_data folder.`

In [44]:
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine

dotenv_path = ".env" #TODO Change to .env.example
load_dotenv(dotenv_path)  # pass the path explicitly so the variable above is actually used

DATABASE_URL = os.getenv("DATABASE_URL", " ")

engine = create_engine(DATABASE_URL)
print(repr(engine.url))  # repr() masks the password; avoid printing the raw URL

postgresql://postgres:***@localhost/call_center_project
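
The connection step can fail in confusing ways when `DATABASE_URL` is missing, and printing the raw URL exposes the password. The sketch below is one possible way to guard against both, using only the standard library. The helper names `require_database_url` and `mask_password` and the example credentials are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def require_database_url(env=os.environ):
    """Return DATABASE_URL from env, or raise a clear error if it is missing or blank."""
    url = env.get("DATABASE_URL", "").strip()
    if not url:
        raise RuntimeError("DATABASE_URL is not set; add it to your .env file")
    return url


def mask_password(url):
    """Replace the password part of a URL with *** so it is safe to print."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder credentials for illustration only.
example_env = {"DATABASE_URL": "postgresql://user:secret@localhost/example_db"}
print(mask_password(require_database_url(example_env)))
```

In the notebook itself, `require_database_url()` would be called after `load_dotenv(...)` so that it reads the real environment.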


### Checking the dataset

In [45]:
%%sql

SELECT *
FROM Calls
LIMIT 5

 * postgresql://postgres:***@localhost/call_center_project
5 rows affected.


id,customer_name,sentiment,csat_score,reason,city,state,channel,response_time,duration_minutes,call_center,call_date
DKK-57076809-w-055481-fU,Analise Gairdner,Neutral,7.0,Billing Question,Detroit,Michigan,Call-Center,Within SLA,17,Los Angeles/CA,2020-10-29
QGK-72219678-w-102139-KY,Crichton Kidsley,Very Positive,,Service Outage,Spartanburg,South Carolina,Chatbot,Within SLA,23,Baltimore/MD,2020-10-05
GYJ-30025932-A-023015-LD,Averill Brundrett,Negative,,Billing Question,Gainesville,Florida,Call-Center,Above SLA,45,Los Angeles/CA,2020-10-04
ZJI-96807559-i-620008-m7,Noreen Lafflina,Very Negative,1.0,Billing Question,Portland,Oregon,Chatbot,Within SLA,12,Los Angeles/CA,2020-10-17
DDU-69451719-O-176482-Fm,Toma Van der Beken,Very Positive,,Payments,Fort Wayne,Indiana,Call-Center,Within SLA,23,Los Angeles/CA,2020-10-17


## EDA

### Table Shape

In [46]:
%%sql

SELECT 'Rows' AS category, COUNT(*) AS count
    FROM Calls

UNION ALL

SELECT 'Columns' AS category, COUNT(*) AS count
    FROM information_schema.columns
    WHERE table_name = 'calls';

 * postgresql://postgres:***@localhost/call_center_project
2 rows affected.


category,count
Rows,32941
Columns,12


### Distinct values

1. Only 4 call centers

In [49]:
%%sql

SELECT DISTINCT call_center FROM calls;

 * postgresql://postgres:***@localhost/call_center_project
4 rows affected.


call_center
Los Angeles/CA
Chicago/IL
Denver/CO
Baltimore/MD


2. There are 5 distinct sentiments

In [48]:
%%sql

SELECT DISTINCT sentiment FROM calls;

 * postgresql://postgres:***@localhost/call_center_project
5 rows affected.


sentiment
Negative
Positive
Very Negative
Neutral
Very Positive


3. Only 3 different reasons

In [51]:
%%sql

SELECT DISTINCT reason FROM calls;

 * postgresql://postgres:***@localhost/call_center_project
3 rows affected.


reason
Service Outage
Payments
Billing Question


4. And 4 distinct channels

In [52]:
%%sql

SELECT DISTINCT channel FROM calls;

 * postgresql://postgres:***@localhost/call_center_project
4 rows affected.


channel
Chatbot
Web
Email
Call-Center
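
The four checks above follow the same pattern, so they could be collapsed into one query that returns the number of distinct values per column. This is a minimal sketch that only builds the SQL text. The helper name `build_distinct_count_query` is hypothetical, and the column names are taken from the table preview earlier in the notebook.

```python
from typing import Sequence

# Categorical columns inspected in this section.
CATEGORICAL_COLUMNS = ["call_center", "sentiment", "reason", "channel"]


def build_distinct_count_query(table: str, columns: Sequence[str]) -> str:
    """Build one SELECT that counts distinct values for each column.

    Identifiers are interpolated directly, so this is only suitable for
    trusted, hard-coded names, never for user input.
    """
    select_list = ",\n    ".join(f"COUNT(DISTINCT {c}) AS {c}" for c in columns)
    return f"SELECT\n    {select_list}\nFROM {table};"


query = build_distinct_count_query("calls", CATEGORICAL_COLUMNS)
print(query)
```

Note that `COUNT(DISTINCT col)` ignores NULLs, whereas `SELECT DISTINCT col` would list NULL as its own row.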


### Percentages

`What is the percentage of each sentiment per channel?`

The distribution is similar for every channel: Negative is the most common sentiment, followed by Neutral and then Very Negative.

In [71]:
%%sql

SELECT
    channel,
    sentiment,
    ROUND((COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY channel)), 2) AS pct
FROM
    calls
GROUP BY
    channel, sentiment
ORDER BY
    channel, pct DESC;

 * postgresql://postgres:***@localhost/call_center_project
20 rows affected.


channel,sentiment,pct
Call-Center,Negative,33.56
Call-Center,Neutral,26.18
Call-Center,Very Negative,18.64
Call-Center,Positive,11.7
Call-Center,Very Positive,9.93
Chatbot,Negative,33.15
Chatbot,Neutral,26.66
Chatbot,Very Negative,18.48
Chatbot,Positive,11.91
Chatbot,Very Positive,9.8
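
The `SUM(COUNT(*)) OVER (PARTITION BY channel)` expression in the query divides each group count by the total of its channel. The same arithmetic can be sketched in plain Python. The rows below are hypothetical toy data, not values from the dataset.

```python
from collections import Counter

# Hypothetical toy rows (channel, sentiment); not taken from the dataset.
rows = [
    ("Web", "Negative"), ("Web", "Negative"), ("Web", "Neutral"), ("Web", "Positive"),
    ("Email", "Neutral"), ("Email", "Negative"),
]

# COUNT(*) ... GROUP BY channel, sentiment
pair_counts = Counter(rows)
# SUM(COUNT(*)) OVER (PARTITION BY channel)
channel_totals = Counter(channel for channel, _ in rows)

pct = {
    (channel, sentiment): round(n * 100.0 / channel_totals[channel], 2)
    for (channel, sentiment), n in pair_counts.items()
}

# ORDER BY channel, pct DESC
for (channel, sentiment), p in sorted(pct.items(), key=lambda kv: (kv[0][0], -kv[1])):
    print(channel, sentiment, p)
```

Because each percentage is relative to its own channel, the values within one channel add up to 100.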


`Do channels differ in the reasons clients contact the center?`

Yes. Payments questions appear only in the Call-Center channel, and Service Outage issues never appear in the Call-Center channel.

In [72]:
%%sql

SELECT
    channel,
    reason,
    ROUND((COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY channel)), 2) AS pct
FROM
    calls
GROUP BY
    channel, reason
ORDER BY
    channel, pct DESC;

 * postgresql://postgres:***@localhost/call_center_project
8 rows affected.


channel,reason,pct
Call-Center,Billing Question,55.36
Call-Center,Payments,44.64
Chatbot,Billing Question,71.48
Chatbot,Service Outage,28.52
Email,Billing Question,79.0
Email,Service Outage,21.0
Web,Billing Question,87.74
Web,Service Outage,12.26
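
The "which reasons never occur in a channel" reading can be checked mechanically with a set difference. A small sketch, using the (channel, reason) pairs copied from the output above; the variable names are illustrative only.

```python
# (channel, reason) pairs copied from the query output above.
observed = [
    ("Call-Center", "Billing Question"), ("Call-Center", "Payments"),
    ("Chatbot", "Billing Question"), ("Chatbot", "Service Outage"),
    ("Email", "Billing Question"), ("Email", "Service Outage"),
    ("Web", "Billing Question"), ("Web", "Service Outage"),
]

all_reasons = {reason for _, reason in observed}

# Collect the reasons seen in each channel.
seen = {}
for channel, reason in observed:
    seen.setdefault(channel, set()).add(reason)

# Reasons that exist in the data but never occur in a given channel.
missing = {channel: all_reasons - reasons for channel, reasons in seen.items()}

for channel in sorted(missing):
    print(channel, sorted(missing[channel]))
```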


`Which day of the week is the busiest?`

Friday and Thursday are the busiest days; Sunday is the least busy.

In [73]:
%%sql

select to_char(call_date, 'FMDay') as day_of_call,  -- FM drops the blank padding that 'Day' adds
	round((count(*)*100.0)/(select count(*) from calls),2) as percentage
from calls
group by 1
order by 2 desc;

 * postgresql://postgres:***@localhost/call_center_project
7 rows affected.


day_of_call,percentage
Friday,16.91
Thursday,16.64
Wednesday,13.51
Tuesday,13.38
Saturday,13.37
Monday,13.16
Sunday,13.04
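
One caveat about these shares: a single month does not contain every weekday equally often. The sketch below assumes the data cover exactly 1 to 31 October 2020 (the range reported by the min/max `call_date` query later in this notebook) and divides each share, copied from the output above, by the number of times that weekday occurs. It is a rough normalization, not a replacement for the query.

```python
from datetime import date, timedelta

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Shares (%) copied from the query output above.
share_pct = {
    "Friday": 16.91, "Thursday": 16.64, "Wednesday": 13.51, "Tuesday": 13.38,
    "Saturday": 13.37, "Monday": 13.16, "Sunday": 13.04,
}

# Assumed date range of the dataset.
start, end = date(2020, 10, 1), date(2020, 10, 31)

# Count how many times each weekday occurs in the range.
occurrences = {name: 0 for name in DAY_NAMES}
day = start
while day <= end:
    occurrences[DAY_NAMES[day.weekday()]] += 1
    day += timedelta(days=1)

# Average share of all calls per single calendar day of that weekday.
per_day_pct = {name: share_pct[name] / occurrences[name] for name in DAY_NAMES}

for name in sorted(per_day_pct, key=per_day_pct.get, reverse=True):
    print(name, occurrences[name], round(per_day_pct[name], 3))
```

Under that assumption, Thursday, Friday and Saturday each occur five times and the other days four times, so part of the Thursday and Friday lead comes from the calendar, and Saturday has the lowest share per calendar day.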


### Aggregations

1. Score

In [46]:
%%sql

select min(csat_score) as min_score,
    max(csat_score) as max_score,
    round(avg(csat_score),1) as avg_score
from calls

 * postgresql://postgres:***@localhost/call_center_project
1 rows affected.


min_score,max_score,avg_score
1,10,5.5
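
The table preview shows that `csat_score` is empty for some calls. SQL aggregates such as `AVG`, `MIN` and `MAX` skip NULLs, so the figures above describe only the calls that have a score. The sketch below illustrates this with the five preview rows, using an in-memory SQLite database purely as a stand-in for PostgreSQL.

```python
import sqlite3

# csat_score values from the five preview rows; None stands for a missing score.
preview_scores = [7.0, None, None, 1.0, None]

conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for PostgreSQL here
conn.execute("CREATE TABLE calls (csat_score REAL)")
conn.executemany(
    "INSERT INTO calls (csat_score) VALUES (?)",
    [(score,) for score in preview_scores],
)

# COUNT(*) counts all rows; COUNT(col) and AVG(col) ignore NULLs.
total_rows, scored_rows, avg_score = conn.execute(
    "SELECT COUNT(*), COUNT(csat_score), AVG(csat_score) FROM calls"
).fetchone()
conn.close()

print(total_rows, scored_rows, avg_score)
```

Running `COUNT(*)` next to `COUNT(csat_score)` on the real table would show how many calls the average is actually based on.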


In [47]:
%%sql 

select min(call_date) as earliest_date,
    max(call_date) as most_recent_date
from calls

 * postgresql://postgres:***@localhost/call_center_project
1 rows affected.


earliest_date,most_recent_date
2020-10-01,2020-10-31


In [48]:
%%sql

select min(duration_minutes) as min_call_duration,
    max(duration_minutes) as max_call_duration,
    round(avg(duration_minutes),1) as avg_call_duration
from calls

 * postgresql://postgres:***@localhost/call_center_project
1 rows affected.


min_call_duration,max_call_duration,avg_call_duration
5,45,25.0


In [49]:
%%sql

select call_center, response_time, count(*) as count 
from calls
group by call_center, response_time
order by call_center asc, count desc

 * postgresql://postgres:***@localhost/call_center_project
12 rows affected.


call_center,response_time,count
Baltimore/MD,Within SLA,6855
Baltimore/MD,Below SLA,2768
Baltimore/MD,Above SLA,1389
Chicago/IL,Within SLA,3361
Chicago/IL,Below SLA,1361
Chicago/IL,Above SLA,697
Denver/CO,Within SLA,1741
Denver/CO,Below SLA,692
Denver/CO,Above SLA,343
Los Angeles/CA,Within SLA,8668
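
Raw counts are hard to compare because the call centers differ a lot in volume. A quick sketch that turns the counts shown above into shares per call center; Los Angeles/CA is left out because the displayed output ends after its first row.

```python
# Counts copied from the query output above.
counts = {
    "Baltimore/MD": {"Within SLA": 6855, "Below SLA": 2768, "Above SLA": 1389},
    "Chicago/IL": {"Within SLA": 3361, "Below SLA": 1361, "Above SLA": 697},
    "Denver/CO": {"Within SLA": 1741, "Below SLA": 692, "Above SLA": 343},
}

# Share (%) of each response_time category within its call center.
shares = {}
for center, by_status in counts.items():
    total = sum(by_status.values())
    shares[center] = {
        status: round(n * 100.0 / total, 2) for status, n in by_status.items()
    }

for center, by_status in shares.items():
    print(center, by_status)
```

For the three centers shown, the Within SLA share lands between 62% and 63%, which suggests the SLA mix is nearly identical across them.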


In [51]:
%%sql

select call_center, avg(duration_minutes) as avg_duration
from calls
group by call_center
order by avg_duration desc

 * postgresql://postgres:***@localhost/call_center_project
4 rows affected.


call_center,avg_duration
Chicago/IL,25.06255766746632
Los Angeles/CA,25.053225571574195
Denver/CO,25.01657060518732
Baltimore/MD,24.961950599346167


In [56]:
%%sql

select channel, avg(duration_minutes) as avg_duration
from calls
group by channel
order by avg_duration desc

 * postgresql://postgres:***@localhost/call_center_project
4 rows affected.


channel,avg_duration
Email,25.098795180722888
Call-Center,25.046150954037035
Web,25.02235401459854
Chatbot,24.917756782945737


In [58]:
%%sql

select state, round(avg(duration_minutes), 2) as avg_duration
from calls
group by state
order by avg_duration desc

 * postgresql://postgres:***@localhost/call_center_project
51 rows affected.


state,avg_duration
Rhode Island,27.66
Delaware,26.59
Hawaii,26.21
Montana,26.09
South Dakota,26.08
Idaho,26.07
Illinois,26.04
Kansas,25.88
Minnesota,25.84
Michigan,25.73


In [59]:
%%sql

select state, count(*) as count
from calls
group by state
order by count desc

 * postgresql://postgres:***@localhost/call_center_project
51 rows affected.


state,count
California,3631
Texas,3572
Florida,2834
New York,1786
Virginia,1164
Ohio,1160
District of Columbia,1110
Pennsylvania,1017
Georgia,926
Illinois,848


In [60]:
%%sql

select state, reason, count(*) as count
from calls
group by state, reason
order by count, reason

 * postgresql://postgres:***@localhost/call_center_project
153 rows affected.


state,reason,count
Vermont,Service Outage,1
Wyoming,Service Outage,1
Maine,Service Outage,1
Rhode Island,Service Outage,2
Vermont,Payments,3
Maine,Payments,3
Wyoming,Payments,4
New Hampshire,Payments,5
Wyoming,Billing Question,6
Rhode Island,Payments,6


In [63]:
%%sql

select state, sentiment, count(*) as count
from calls
group by state, sentiment
order by count

 * postgresql://postgres:***@localhost/call_center_project
250 rows affected.


state,sentiment,count
Rhode Island,Very Positive,1
Maine,Very Positive,1
Vermont,Very Negative,2
Wyoming,Neutral,3
Maine,Very Negative,3
New Hampshire,Very Positive,3
Vermont,Positive,3
Vermont,Negative,4
Maine,Negative,4
North Dakota,Very Positive,4


In [65]:
%%sql

select state, round(avg(csat_score),2) as avg_score
from calls
group by state
order by avg_score desc

 * postgresql://postgres:***@localhost/call_center_project
51 rows affected.


state,avg_score
Vermont,6.5
North Dakota,6.37
Wyoming,6.0
Massachusetts,5.91
Rhode Island,5.89
Hawaii,5.87
Mississippi,5.83
Idaho,5.82
Washington,5.76
Louisiana,5.74


In [66]:
%%sql

select sentiment, round(avg(duration_minutes),2) as avg_duration
from calls
group by sentiment
order by avg_duration desc

 * postgresql://postgres:***@localhost/call_center_project
5 rows affected.


sentiment,avg_duration
Negative,25.26
Very Negative,24.94
Neutral,24.94
Positive,24.86
Very Positive,24.76
