# Exercise: Reading TTL Data

## Read in __`agg_application_pod_hourly.csv`__ 
* As with the previous TTL data we examined, this dataset has no column headers
* If you get a __`DtypeWarning`__ about mixed types, work out what is causing it and how to fix it

In [1]:
import pandas as pd
# Without na_values, pandas raises a DtypeWarning here: some columns
# mix numeric values with the string '\N', which this dataset uses
# to mark missing values.
data = pd.read_csv('data/agg_application_pod_hourly.csv', header=None, na_values=[r'\N'])
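
To see what `na_values` changes, here is a minimal sketch on a made-up in-memory sample (the rows and the `sample` string are hypothetical, not taken from the real file). A sample this small will not trigger the `DtypeWarning` itself, but it shows the underlying effect on the column dtypes:

```python
import io
import pandas as pd

# Hypothetical headerless sample; '\N' stands in for a missing value.
# (In a normal Python string the backslash must be escaped or the string
# made raw, because '\N' on its own is an invalid escape sequence.)
sample = "7,20170121,LON,37.98,\\N\n7,20170121,WAS,\\N,3\n8,20170121,CHI,33.34,5\n"

# Without na_values, '\N' stays a string, so columns 3 and 4 are not numeric.
raw = pd.read_csv(io.StringIO(sample), header=None)

# With na_values, '\N' is parsed as NaN and the same columns become floats.
clean = pd.read_csv(io.StringIO(sample), header=None, na_values=[r'\N'])

print(raw.dtypes)
print(clean.dtypes)
```

Comparing the two `dtypes` listings makes it easy to spot which columns contained the `\N` marker.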

## Inspect the data...

In [65]:
data.head()

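| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 20170121 | LON | SP9 | cs86 | 37.9803 | 7.938677 | 5.929535 | 0.799236 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 1 | 7 | 20170121 | WAS | SP2 | na4 | 69.257324 | 14.353811 | 11.269566 | 0.910468 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 2 | 7 | 20170121 | CHI | SP1 | gs0 | 33.34111 | 17.466251 | 10.286361 | 1.25974 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 3 | 7 | 20170121 | CHI | SP3 | cs23 | 62.84957 | 13.417571 | 9.052662 | 0.980685 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 4 | 7 | 20170121 | LON | SP9 | cs87 | 34.985798 | 7.71101 | 5.902223 | 0.510657 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |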


## Set the names of the columns to 
__`hour_key, date_key, datacenter, superpod, pod, mem_utilization, max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, app_host_count_active, app_transacting_host_count`__
* Remember your Python: you can use __`split()`__ to make this easier

In [66]:
data.columns = 'hour_key, date_key, datacenter, superpod, pod, mem_utilization, max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, app_host_count_active, app_transacting_host_count'.split(', ')
data.head()

| | hour_key | date_key | datacenter | superpod | pod | mem_utilization | max_app_cpu | avg_app_cpu | gc_perc | p95_app_cpu | last_modified | app_host_count_active | app_transacting_host_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 20170121 | LON | SP9 | cs86 | 37.9803 | 7.938677 | 5.929535 | 0.799236 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 1 | 7 | 20170121 | WAS | SP2 | na4 | 69.257324 | 14.353811 | 11.269566 | 0.910468 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 2 | 7 | 20170121 | CHI | SP1 | gs0 | 33.34111 | 17.466251 | 10.286361 | 1.25974 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 3 | 7 | 20170121 | CHI | SP3 | cs23 | 62.84957 | 13.417571 | 9.052662 | 0.980685 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
| 4 | 7 | 20170121 | LON | SP9 | cs87 | 34.985798 | 7.71101 | 5.902223 | 0.510657 | NaN | 2017-02-27 20:38:10.59641 | NaN | NaN |
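
As an alternative to assigning `data.columns` after the fact, the names can be supplied while reading, via the `names` parameter of `read_csv`. The sketch below uses a single in-memory row modelled on the first row of the output above; writing its missing fields as `\N` is an assumption about how the raw file looks:

```python
import io
import pandas as pd

# Same column list as in the exercise, built with split().
cols = ('hour_key, date_key, datacenter, superpod, pod, mem_utilization, '
        'max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, '
        'app_host_count_active, app_transacting_host_count').split(', ')

# Hypothetical one-row stand-in for the real CSV file.
sample = ("7,20170121,LON,SP9,cs86,37.9803,7.938677,5.929535,0.799236,"
          "\\N,2017-02-27 20:38:10.59641,\\N,\\N\n")

# names= labels the columns at read time; header=None says the file has no header row.
df = pd.read_csv(io.StringIO(sample), header=None, names=cols, na_values=[r'\N'])
print(df.columns.tolist())
```

Both approaches give the same result; naming at read time simply avoids a brief window in which the columns are only numbered.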


## Inspect the column __`max_app_cpu`__

In [75]:
data['max_app_cpu']

0           7.938677
1          14.353811
2          17.466251
3          13.417571
4           7.711010
5          13.307458
6           8.655704
7           5.299644
8          15.303969
9           8.088383
10         13.510607
11          8.908620
12         11.102876
13          8.172059
14         12.451587
15         17.873167
16         12.917731
17         11.783527
18          8.032459
19         14.672307
20         10.865040
21          9.293353
22          8.562451
23          3.542215
24         14.666316
25         11.689043
26         10.150902
27          8.559204
28         21.759249
29          6.889261
             ...    
1265886     3.554025
1265887     4.670567
1265888     6.330143
1265889    45.573600
1265890    23.473309
1265891    12.034917
1265892    22.707987
1265893     8.201299
1265894    49.094920
1265895     5.206431
1265896     7.817691
1265897     4.093576
1265898     6.050304
1265899    29.513842
1265900     8.610672
1265901     7.082071
1265902     8
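
Printing a column with over a million rows shows only its head and tail, so missing values in the middle are easy to overlook. A rough sketch of two quicker checks, using a small hypothetical Series in place of the real column:

```python
import pandas as pd

# Hypothetical stand-in for data['max_app_cpu'], with one missing value.
s = pd.Series([7.938677, None, 17.466251, 13.417571], name='max_app_cpu')

# Count the missing entries directly.
n_missing = int(s.isna().sum())

# describe() reports the non-missing count alongside basic statistics.
summary = s.describe()

print(n_missing)
print(summary)
```

On the real data, the same two calls would be made on `data['max_app_cpu']`.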

## Drop the missing data

In [76]:
# dropna() returns a new Series without the NaN entries; data itself is unchanged
data['max_app_cpu'].dropna()

0           7.938677
1          14.353811
2          17.466251
3          13.417571
4           7.711010
5          13.307458
6           8.655704
7           5.299644
8          15.303969
9           8.088383
10         13.510607
11          8.908620
12         11.102876
13          8.172059
14         12.451587
15         17.873167
16         12.917731
17         11.783527
18          8.032459
19         14.672307
20         10.865040
21          9.293353
22          8.562451
23          3.542215
24         14.666316
25         11.689043
26         10.150902
27          8.559204
28         21.759249
29          6.889261
             ...    
1265886     3.554025
1265887     4.670567
1265888     6.330143
1265889    45.573600
1265890    23.473309
1265891    12.034917
1265892    22.707987
1265893     8.201299
1265894    49.094920
1265895     5.206431
1265896     7.817691
1265897     4.093576
1265898     6.050304
1265899    29.513842
1265900     8.610672
1265901     7.082071
1265902     8
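
Note that `dropna()` does not modify the original object, and that dropping from a single column differs from dropping whole rows. A minimal sketch on a hypothetical three-row frame (the values are illustrative only):

```python
import pandas as pd

# Hypothetical miniature version of the dataset with one missing CPU value.
df = pd.DataFrame({
    'pod': ['cs86', 'na4', 'gs0'],
    'max_app_cpu': [7.938677, None, 17.466251],
})

# Series.dropna() returns a new Series; df still holds the NaN.
col = df['max_app_cpu'].dropna()

# DataFrame.dropna(subset=...) returns a new frame without the rows
# whose max_app_cpu is missing, keeping the other columns aligned.
rows = df.dropna(subset=['max_app_cpu'])

print(len(df), len(col), len(rows))
```

To keep the cleaned result, assign it to a name as done here; calling `dropna()` without assignment only displays it.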