### Notebook Purpose 

As of 03-22-18, the goal of this notebook is to turn this time series dataset into a 3D array ready for CNNs or RNNs.
First, let's try to aggregate some time intervals into matrices. For example, the data is hourly, so we can collect the 168 hourly rows of each week into one matrix and then stack those matrices to create a 3D array.
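
As a minimal sketch of that idea, the snippet below reshapes a synthetic hourly array into a `(weeks, hours, features)` array. The helper name `stack_weeks` and the toy data are assumptions for illustration only, not part of the original notebook.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7  # 168 hourly rows per week


def stack_weeks(hourly, hours_per_week=HOURS_PER_WEEK):
    """Reshape (n_hours, n_features) into (n_weeks, hours_per_week, n_features).

    Trailing rows that do not fill a complete week are dropped.
    """
    hourly = np.asarray(hourly)
    n_weeks = hourly.shape[0] // hours_per_week
    trimmed = hourly[: n_weeks * hours_per_week]
    return trimmed.reshape(n_weeks, hours_per_week, hourly.shape[1])


# Synthetic stand-in: 3 full weeks plus 10 extra hours, 4 features.
toy = np.arange((3 * HOURS_PER_WEEK + 10) * 4, dtype=float).reshape(-1, 4)
cube = stack_weeks(toy)
print(cube.shape)  # (3, 168, 4)
```

Each slice `cube[i]` is one week-by-feature matrix, which is the layout that most CNN and RNN libraries expect as `(samples, timesteps, features)`.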

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline

In [2]:
data_raw = pd.read_csv("train.csv")

In [3]:
data_raw.shape

(26496, 8)

In [4]:
data_raw.head()

Unnamed: 0,ID,datetime,temperature,var1,pressure,windspeed,var2,electricity_consumption
0,0,2013-07-01 00:00:00,-11.4,-17.1,1003.0,571.91,A,216.0
1,1,2013-07-01 01:00:00,-12.1,-19.3,996.0,575.04,A,210.0
2,2,2013-07-01 02:00:00,-12.9,-20.0,1000.0,578.435,A,225.0
3,3,2013-07-01 03:00:00,-11.4,-17.1,995.0,582.58,A,216.0
4,4,2013-07-01 04:00:00,-11.4,-19.3,1005.0,586.6,A,222.0


In [5]:
# np.int was removed in NumPy 1.24; cast the indicator columns with the builtin int instead
temp_dummies = pd.get_dummies(data_raw.var2).astype(int)

In [6]:
data = pd.concat([data_raw, temp_dummies], axis = 1)
data = data.drop(["var2"], axis = 1)
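
The encode, concatenate, and drop steps above can also be written as one `get_dummies` call. This is a sketch on a made-up toy frame (the name `toy` and its values are assumptions), not a replacement for the cells above.

```python
import pandas as pd

# Toy frame standing in for the notebook's `data_raw`; values are made up.
toy = pd.DataFrame({"temperature": [-11.4, -12.1, -12.9], "var2": ["A", "B", "C"]})

# Encode `var2` as integer indicator columns and drop the original column in one call.
# An empty prefix and separator keep the bare category names as column names.
encoded = pd.get_dummies(toy, columns=["var2"], prefix="", prefix_sep="", dtype=int)
print(list(encoded.columns))
```

Passing `dtype=int` avoids a separate cast, since recent pandas versions return boolean indicator columns by default.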

In [7]:
data.shape

(26496, 10)

In [8]:
data.head()

Unnamed: 0,ID,datetime,temperature,var1,pressure,windspeed,electricity_consumption,A,B,C
0,0,2013-07-01 00:00:00,-11.4,-17.1,1003.0,571.91,216.0,1,0,0
1,1,2013-07-01 01:00:00,-12.1,-19.3,996.0,575.04,210.0,1,0,0
2,2,2013-07-01 02:00:00,-12.9,-20.0,1000.0,578.435,225.0,1,0,0
3,3,2013-07-01 03:00:00,-11.4,-17.1,995.0,582.58,216.0,1,0,0
4,4,2013-07-01 04:00:00,-11.4,-19.3,1005.0,586.6,222.0,1,0,0


For more details, see the previous notebook: Analytics Vidhya 01-12-17

### Begin aggregations

In [9]:
data.index = pd.to_datetime(data.datetime)

In [10]:
# drop the datetime column, since it now serves as the index
data_agg_prep = data.drop(["datetime"], axis=1)

In [11]:
weekly_data_gb = data.groupby(pd.Grouper(freq = 'W'))

In [13]:
len(weekly_data_gb)

208

In [26]:
weekly_data_gb.apply(np.mean).head(20)

datetime                           
2013-07-07  ID                           83.500000
            temperature                  -7.747024
            var1                        -15.971429
            pressure                   1002.321429
            windspeed                   114.627440
            electricity_consumption     252.875000
            A                             1.000000
            B                             0.000000
            C                             0.000000
2013-07-14  ID                          251.500000
            temperature                  -7.352976
            var1                        -17.711905
            pressure                    997.857143
            windspeed                    85.427292
            electricity_consumption     245.160714
            A                             1.000000
            B                             0.000000
            C                             0.000000
2013-07-21  ID                          419.50

In [27]:
weekly_data_gb.size().head()

datetime
2013-07-07    168
2013-07-14    168
2013-07-21    168
2013-07-28     48
2013-08-04     96
Freq: W-SUN, dtype: int64
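
The group sizes above show that some weekly bins hold fewer than 168 rows (48 and 96 here), so not every calendar week is complete. One possible way to build the 3D array from calendar weeks is to keep only the complete groups and stack them. The sketch below uses a synthetic frame whose two 23-day runs are an assumption chosen to produce similar partial bins; it is not the notebook's data.

```python
import numpy as np
import pandas as pd

HOURS_PER_WEEK = 168
one_hour = pd.Timedelta(hours=1)

# Synthetic hourly index with a gap: two runs of 23 days each (assumed for illustration).
n_hours = 23 * 24
idx = pd.date_range("2013-07-01", periods=n_hours, freq=one_hour).append(
    pd.date_range("2013-08-01", periods=n_hours, freq=one_hour)
)
toy = pd.DataFrame(
    {"temperature": np.arange(len(idx), dtype=float), "pressure": 1000.0},
    index=idx,
)

# Keep only weekly groups that contain every hour of the week, then stack them.
full_weeks = [
    group.to_numpy()
    for _, group in toy.groupby(pd.Grouper(freq="W"))
    if len(group) == HOURS_PER_WEEK
]
cube = np.stack(full_weeks)
print(cube.shape)  # (5, 168, 2)
```

Unlike a plain reshape, every slice here starts on a Monday and ends on a Sunday, at the cost of discarding the partial weeks.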

In [29]:
name_group_dict = {}
for name, group in weekly_data_gb:
    name_group_dict[name] = group

In [39]:
[type(key) for key in name_group_dict.keys()][0]

pandas._libs.tslib.Timestamp

In [60]:
print("Extra datapoints: ", len(data) % 168)
print("Number of full weeks: ", len(data) // 168)

Extra datapoints:  120
Number of full weeks:  157


In [61]:
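# trim 60 rows from each end (120 extra rows in total) so the length is a multiple of 168;
# DataFrame.as_matrix() was removed in pandas 1.0, so use to_numpy()
data2 = data[60:-60].to_numpy()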

In [62]:
print("Original shape: ",data.shape)
print("New shape: ", data2.shape)

Original shape:  (26496, 10)
New shape:  (26376, 10)
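
The hard-coded `60` comes from splitting the 120 extra rows evenly between both ends. A small helper can compute the split for any length; the name `trim_to_multiple` is hypothetical, and the zero-filled array only mirrors the shape reported above.

```python
import numpy as np


def trim_to_multiple(arr, block=168):
    """Drop rows from both ends so that len(arr) becomes a multiple of `block`."""
    extra = len(arr) % block
    head = extra // 2       # rows removed from the start
    tail = extra - head     # rows removed from the end
    return arr[head : len(arr) - tail]


# Same shape as the notebook's `data`, filled with zeros for illustration.
toy = np.zeros((26496, 10))
trimmed = trim_to_multiple(toy)
print(trimmed.shape)  # (26376, 10)
```

Slicing with an explicit end position also handles the case where no rows need trimming, whereas `arr[0:-0]` would return an empty array.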


In [63]:
data2.shape[0]%168

0

In [65]:
data2.shape[0]/168

157.0

In [67]:
agg = data2.reshape(157,168,10)

In [69]:
len(agg)

157

In [70]:
agg[0]

array([[60, '2013-07-03 12:00:00', -5.0, ..., 1, 0, 0],
       [61, '2013-07-03 13:00:00', -5.7, ..., 1, 0, 0],
       [62, '2013-07-03 14:00:00', -3.6, ..., 1, 0, 0],
       ..., 
       [225, '2013-07-10 09:00:00', -10.0, ..., 1, 0, 0],
       [226, '2013-07-10 10:00:00', -7.9, ..., 1, 0, 0],
       [227, '2013-07-10 11:00:00', -5.7, ..., 1, 0, 0]], dtype=object)
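
Note that the array above has `dtype=object` because it still carries the `datetime` strings next to the numeric columns. A possible next step, sketched here on a synthetic frame with assumed column values, is to drop the non-feature columns and cast to a float type before reshaping, since neural network libraries generally expect numeric arrays.

```python
import numpy as np
import pandas as pd

# Synthetic two-week frame mimicking a few of the notebook's columns (values are made up).
n_rows = 2 * 168
toy = pd.DataFrame({
    "ID": np.arange(n_rows),
    "datetime": pd.date_range(
        "2013-07-01", periods=n_rows, freq=pd.Timedelta(hours=1)
    ).astype(str),
    "temperature": np.linspace(-12.0, 5.0, n_rows),
    "A": 1,
})

# Drop the identifier and timestamp columns, then cast the remaining features to float32.
features = toy.drop(columns=["ID", "datetime"]).to_numpy(dtype="float32")
cube = features.reshape(-1, 168, features.shape[1])
print(cube.dtype, cube.shape)  # float32 (2, 168, 2)
```

Whether `ID` should be kept as a feature is a modelling choice; it is dropped here only as an assumption.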