# Exercise: Reading TTL Data

## Step 1: Read in __`agg_application_pod_hourly.csv`__ 
* As with the previous TTL data we examined, there are no column headers in the dataset
* If you get a __`DtypeWarning`__ about mixed types, figure out what's going on and how to fix it
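
One way to approach the warning, sketched on a small in-memory sample rather than the real file: `DtypeWarning` typically shows up when pandas reads a large file in chunks and infers different types for the same column in different chunks. In this data a likely culprit is the `\N` null marker, which keeps otherwise numeric columns from parsing as numbers. The rows and the names `sample`, `raw`, and `clean` below are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same spirit as the real file:
# no header row, and \N standing in for missing values.
sample = io.StringIO(
    "7,20170121,LON,SP9,cs86,37.9803,7.938677,\\N\n"
    "7,20170121,WAS,SP2,na4,69.257324,\\N,\\N\n"
    "7,20170121,CHI,SP1,gs0,33.34111,17.466251,12.5\n"
)

# Without na_values, any column containing \N is read as strings.
raw = pd.read_csv(sample, header=None)
print(raw.dtypes)

# Declaring \N as a missing-value marker lets pandas parse those columns as floats.
# Note the raw string: a plain "\N" is a syntax error in Python 3.
sample.seek(0)
clean = pd.read_csv(sample, header=None, na_values=r"\N")
print(clean.dtypes)
```

Passing `low_memory=False` or an explicit `dtype=` to `read_csv` is another common way to silence the warning, though neither turns `\N` into a missing value on its own.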

In [5]:
import pandas as pd
# Raw string: a plain "\N" is an invalid escape sequence in Python 3
data = pd.read_csv('data/agg_application_pod_hourly.csv', header=None, na_values=r"\N")

## Step 2: Inspect the data...

In [6]:
data.head(n=10)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 20170121 | LON | SP9 | cs86 | 37.9803 | 7.938677 | 5.929535 | 0.79923624 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 1 | 7 | 20170121 | WAS | SP2 | na4 | 69.257324 | 14.353811 | 11.269566 | 0.910468 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 2 | 7 | 20170121 | CHI | SP1 | gs0 | 33.34111 | 17.466251 | 10.286361 | 1.25974 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 3 | 7 | 20170121 | CHI | SP3 | cs23 | 62.84957 | 13.417571 | 9.052662 | 0.9806848 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 4 | 7 | 20170121 | LON | SP9 | cs87 | 34.985798 | 7.71101 | 5.9022226 | 0.5106573 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 5 | 7 | 20170121 | CHI | SP3 | na5 | 66.87769 | 13.307458 | 9.693862 | 0.75365007 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 6 | 7 | 20170121 | FRF | SP1 | cs85 | 37.122364 | 8.655704 | 5.9495234 | 0.40772247 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 7 | 7 | 20170121 | DFW | SP1 | gs1 | 29.801567 | 5.2996435 | 1.6142554 | 0.19939896 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 8 | 7 | 20170121 | CHI | SP3 | na20 | 66.31277 | 15.303969 | 11.806783 | 0.80389035 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 9 | 7 | 20170121 | CHI | SP4 | na29 | 35.65827 | 8.088383 | 5.58664 | 0.46791855 | \N | 2017-02-27 20:38:10.59641 | \N | \N |


## Step 3: Set the names of the columns to 
__`hour_key, date_key, datacenter, superpod, pod, mem_utilization, max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, app_host_count_active, app_transacting_host_count`__
* Remember your Python: you can use __`split()`__ to make this easier

In [11]:
spltd = "hour_key, date_key, datacenter, superpod, pod, mem_utilization, max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, app_host_count_active, app_transacting_host_count".split(", ")
data.columns = spltd
data.head(n=10)

| | hour_key | date_key | datacenter | superpod | pod | mem_utilization | max_app_cpu | avg_app_cpu | gc_perc | p95_app_cpu | last_modified | app_host_count_active | app_transacting_host_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 20170121 | LON | SP9 | cs86 | 37.9803 | 7.938677 | 5.929535 | 0.79923624 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 1 | 7 | 20170121 | WAS | SP2 | na4 | 69.257324 | 14.353811 | 11.269566 | 0.910468 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 2 | 7 | 20170121 | CHI | SP1 | gs0 | 33.34111 | 17.466251 | 10.286361 | 1.25974 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 3 | 7 | 20170121 | CHI | SP3 | cs23 | 62.84957 | 13.417571 | 9.052662 | 0.9806848 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 4 | 7 | 20170121 | LON | SP9 | cs87 | 34.985798 | 7.71101 | 5.9022226 | 0.5106573 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 5 | 7 | 20170121 | CHI | SP3 | na5 | 66.87769 | 13.307458 | 9.693862 | 0.75365007 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 6 | 7 | 20170121 | FRF | SP1 | cs85 | 37.122364 | 8.655704 | 5.9495234 | 0.40772247 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 7 | 7 | 20170121 | DFW | SP1 | gs1 | 29.801567 | 5.2996435 | 1.6142554 | 0.19939896 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 8 | 7 | 20170121 | CHI | SP3 | na20 | 66.31277 | 15.303969 | 11.806783 | 0.80389035 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
| 9 | 7 | 20170121 | CHI | SP4 | na29 | 35.65827 | 8.088383 | 5.58664 | 0.46791855 | \N | 2017-02-27 20:38:10.59641 | \N | \N |
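
As an alternative to assigning `data.columns` after the fact, the names can be handed to `read_csv` through its `names` parameter. The sketch below uses a two-row in-memory sample (`sample`, with rows copied from the output above) instead of the real file, so treat it as an illustration rather than the exercise's intended solution; `col_names` and `named` are names chosen here.

```python
import io
import pandas as pd

col_names = ("hour_key, date_key, datacenter, superpod, pod, mem_utilization, "
             "max_app_cpu, avg_app_cpu, gc_perc, p95_app_cpu, last_modified, "
             "app_host_count_active, app_transacting_host_count").split(", ")

# Hypothetical stand-in for the CSV file: two rows taken from the output above.
sample = io.StringIO(
    "7,20170121,LON,SP9,cs86,37.9803,7.938677,5.929535,0.79923624,\\N,"
    "2017-02-27 20:38:10.59641,\\N,\\N\n"
    "7,20170121,WAS,SP2,na4,69.257324,14.353811,11.269566,0.910468,\\N,"
    "2017-02-27 20:38:10.59641,\\N,\\N\n"
)

# When names= is given and header is left at its default,
# pandas treats the first line of the file as data, not as a header.
named = pd.read_csv(sample, names=col_names, na_values=r"\N")
print(named.columns.tolist())
```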


## Step 4: Inspect the column __`max_app_cpu`__

In [12]:
data['max_app_cpu'].dropna()

0           7.938677
1          14.353811
2          17.466251
3          13.417571
4            7.71101
5          13.307458
6           8.655704
7          5.2996435
8          15.303969
9           8.088383
10         13.510607
11           8.90862
12         11.102876
13          8.172059
14         12.451587
15         17.873167
16         12.917731
17         11.783527
18          8.032459
19         14.672307
20          10.86504
21          9.293353
22          8.562451
23         3.5422146
24         14.666316
25         11.689043
26         10.150902
27          8.559204
28         21.759249
29         6.8892612
             ...    
1265886      3.55403
1265887      4.67057
1265888      6.33014
1265889      45.5736
1265890      23.4733
1265891      12.0349
1265892       22.708
1265893       8.2013
1265894      49.0949
1265895      5.20643
1265896      7.81769
1265897      4.09358
1265898       6.0503
1265899      29.5138
1265900      8.61067
1265901      7.08207
1265902      
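
The output above prints values with different precision (`7.938677` near the top, `3.55403` near the bottom), which hints that the column may hold a mix of strings and floats. A sketch of how one might check, using a small hypothetical Series `col` in place of `data['max_app_cpu']`:

```python
import pandas as pd

# Hypothetical stand-in for data['max_app_cpu']: strings and floats mixed together.
col = pd.Series(["7.938677", "14.353811", r"\N", 3.55403, 4.67057], dtype=object)

# Count how many values of each Python type the column holds.
type_counts = col.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Coerce everything to numbers; anything unparseable (such as \N) becomes NaN.
as_numbers = pd.to_numeric(col, errors="coerce")
print(as_numbers)
```

If the type counts on the real column show more than one type, fixing the missing-value marker at read time is usually cleaner than coercing afterwards.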

## Step 5: Drop the missing data

In [None]:
data = data.dropna()
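
What "drop the missing data" should mean here is a judgment call. In the rows shown earlier, `p95_app_cpu` and the two host-count columns are `\N` throughout, so once those are parsed as missing, a plain `dropna()` would discard every one of those rows. A sketch of a few options on a hypothetical miniature frame `df` (the values and the names `strict`, `trimmed`, and `by_subset` are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: one column is missing everywhere.
df = pd.DataFrame({
    "pod": ["cs86", "na4", "gs0"],
    "max_app_cpu": [7.938677, np.nan, 17.466251],
    "p95_app_cpu": [np.nan, np.nan, np.nan],
})

# Drop any row with at least one missing value: nothing survives here.
strict = df.dropna()

# Drop columns that are missing everywhere, then rows with remaining gaps.
trimmed = df.dropna(axis=1, how="all").dropna()

# Only require values in the columns you plan to analyse.
by_subset = df.dropna(subset=["max_app_cpu"])

print(len(strict), len(trimmed), len(by_subset))
```

Which variant is appropriate depends on which columns the later analysis needs.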