# Time Series Data

Wikipedia definition of **Time Series**:

> A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.
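Here is a minimal sketch of the "equally spaced" part of the definition. It builds a small hourly series with `pd.date_range` and checks that every gap between consecutive timestamps is the same. The five step counts are borrowed from the first rows of the data shown below. The series itself is only illustrative.

```python
import pandas as pd

# Five hourly timestamps starting at the first reading in the data
idx = pd.date_range(start="2014-12-25 11:00", periods=5, freq="h")
steps = pd.Series([26.0, 11.0, 52.0, 69.0, 7.0], index=idx, name="Steps (count)")

# Differences between consecutive timestamps (the first diff is NaT, so drop it)
gaps = steps.index.to_series().diff().dropna()

print(gaps.nunique())   # 1: every gap is identical
print(gaps.iloc[0])     # 0 days 01:00:00
```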

In [1]:
import numpy as np
import pandas as pd

For this example we will use my Apple Health data…

In [3]:
df = pd.read_csv('HealthData.csv')

df.head()

|   | timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|---|
| 0 | 25-Dec-2014 11:00 | 0.012564 | 0.0 | 26.0 |
| 1 | 25-Dec-2014 12:00 | 0.005201 | 0.0 | 11.0 |
| 2 | 25-Dec-2014 13:00 | 0.023413 | 0.0 | 52.0 |
| 3 | 25-Dec-2014 14:00 | 0.032492 | 0.0 | 69.0 |
| 4 | 25-Dec-2014 15:00 | 0.002653 | 0.0 | 7.0 |


First, look at the data type of the columns…

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27435 entries, 0 to 27434
Data columns (total 4 columns):
timestamp                  27435 non-null object
Distance (mi)              27435 non-null float64
Flights Climbed (count)    27435 non-null float64
Steps (count)              27435 non-null float64
dtypes: float64(3), object(1)
memory usage: 857.4+ KB


### Create your DateTime Object

Notice that the data type of the 'timestamp' column is object, meaning the values are stored as plain strings. We have to convert the column to the datetime data type before we can use pandas' datetime-specific functionality…

In [5]:
# Parse strings such as '25-Dec-2014 11:00' into datetime64 values
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d-%b-%Y %H:%M')

# info() prints its summary directly (it returns None)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27435 entries, 0 to 27434
Data columns (total 4 columns):
timestamp                  27435 non-null datetime64[ns]
Distance (mi)              27435 non-null float64
Flights Climbed (count)    27435 non-null float64
Steps (count)              27435 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 857.4 KB


|   | timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|---|
| 0 | 2014-12-25 11:00:00 | 0.012564 | 0.0 | 26.0 |
| 1 | 2014-12-25 12:00:00 | 0.005201 | 0.0 | 11.0 |
| 2 | 2014-12-25 13:00:00 | 0.023413 | 0.0 | 52.0 |
| 3 | 2014-12-25 14:00:00 | 0.032492 | 0.0 | 69.0 |
| 4 | 2014-12-25 15:00:00 | 0.002653 | 0.0 | 7.0 |


Every value in the 'timestamp' column now has the data type **datetime64**. These values are stored as points in time rather than as text, and pandas displays them in the ISO 8601 format **yyyy-mm-dd hh:mm:ss** by default.
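A datetime64 value is not a string, so the layout used for parsing and the layout used for display are independent. The sketch below uses two sample strings copied from the raw column. It parses them with an explicit `format` (the `%b` code assumes English month abbreviations) and then renders them back out with `.dt.strftime`. The output layout `%d/%m/%Y %H:%M` is an arbitrary choice for illustration.

```python
import pandas as pd

# Raw strings in the same layout as the original 'timestamp' column
raw = pd.Series(["25-Dec-2014 11:00", "25-Dec-2014 12:00"])

# %d = day, %b = abbreviated month name, %Y = 4-digit year, %H:%M = 24-hour time
parsed = pd.to_datetime(raw, format="%d-%b-%Y %H:%M")

# Printing a single value shows the default ISO 8601 rendering
print(parsed.iloc[0])  # 2014-12-25 11:00:00

# strftime turns the datetimes back into strings in any layout you choose
print(parsed.dt.strftime("%d/%m/%Y %H:%M").tolist())  # ['25/12/2014 11:00', '25/12/2014 12:00']
```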

### Make a DateTime Index

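It is often useful (and common practice) to organize time series data around a datetime index. With that index, rows can be selected by date ranges, grouped by parts of the date, and resampled to a coarser frequency.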

In [6]:
# Use the parsed timestamps as the row index
df = df.set_index('timestamp')

df.head()

| timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|
| 2014-12-25 11:00:00 | 0.012564 | 0.0 | 26.0 |
| 2014-12-25 12:00:00 | 0.005201 | 0.0 | 11.0 |
| 2014-12-25 13:00:00 | 0.023413 | 0.0 | 52.0 |
| 2014-12-25 14:00:00 | 0.032492 | 0.0 | 69.0 |
| 2014-12-25 15:00:00 | 0.002653 | 0.0 | 7.0 |
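The parsing and indexing steps can also be folded into the load itself, and the resulting index can then be used for date-based selection. The following is a minimal sketch, not the author's code. It feeds `read_csv` the five sample rows shown earlier through `io.StringIO` in place of the full `HealthData.csv`. It then slices by a date-string range and resamples into 2-hour totals. The 2-hour bin size is an arbitrary illustration.

```python
import io

import pandas as pd

# The first rows of HealthData.csv, in the raw layout seen earlier
csv_text = """timestamp,Distance (mi),Flights Climbed (count),Steps (count)
25-Dec-2014 11:00,0.012564,0.0,26.0
25-Dec-2014 12:00,0.005201,0.0,11.0
25-Dec-2014 13:00,0.023413,0.0,52.0
25-Dec-2014 14:00,0.032492,0.0,69.0
25-Dec-2014 15:00,0.002653,0.0,7.0
"""

# Parse 'timestamp' as datetimes and use it as the index in one call
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"], index_col="timestamp")

# With a DatetimeIndex, rows can be sliced by date strings (both ends inclusive)
midday = df.loc["2014-12-25 12:00":"2014-12-25 14:00"]
print(midday["Steps (count)"].tolist())  # [11.0, 52.0, 69.0]

# Resample into 2-hour bins aligned to midnight and total the steps in each bin
two_hourly = df["Steps (count)"].resample("2h").sum()
print(two_hourly.tolist())  # [26.0, 63.0, 76.0]
```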


## DateTime Index Properties

### Extract the Year, Month, Day, Hour, Minute, Second, Day of the Week…

In [7]:
# Years
df.index.year

Int64Index([2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
            ...
            2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
           dtype='int64', name='timestamp', length=27435)

In [8]:
# Months
df.index.month

Int64Index([12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
            ...
             2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
           dtype='int64', name='timestamp', length=27435)

In [9]:
# Days
df.index.day

Int64Index([25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
            ...
            10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
           dtype='int64', name='timestamp', length=27435)
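The remaining properties in the heading work the same way. This sketch uses a hypothetical three-hour index that starts at the first timestamp in the data. It shows the hour, minute, and second properties, along with `dayofweek` (Monday=0, Sunday=6) and `day_name()`.

```python
import pandas as pd

# A hypothetical three-hour index starting at the first timestamp in the data
idx = pd.date_range("2014-12-25 11:00", periods=3, freq="h")

print(idx.hour.tolist())        # [11, 12, 13]
print(idx.minute.tolist())      # [0, 0, 0]
print(idx.second.tolist())      # [0, 0, 0]
print(idx.dayofweek.tolist())   # [3, 3, 3]  (Monday=0, Sunday=6)
print(idx.day_name().tolist())  # ['Thursday', 'Thursday', 'Thursday']
```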

### Groupby Year, Month, Day, Hour, Minute, Second, Day of the Week…

In [10]:
# by year
df.groupby(df.index.year).mean()

| timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|
| 2014 | 0.075968 | 0.242038 | 168.923567 |
| 2015 | 0.076026 | 0.220434 | 170.291096 |
| 2016 | 0.129324 | 0.514117 | 304.229819 |
| 2017 | 0.129073 | 0.802854 | 291.028797 |
| 2018 | 0.077536 | 0.645791 | 179.867561 |
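`.mean()` averages over rows. If each row covers one hour, as in the `head()` output, the figures above are average values per hourly row within each year, not yearly totals. One way to see both side by side is to aggregate with several functions at once. The sketch below does this on a few hypothetical hourly readings that straddle the 2014/2015 boundary.

```python
import pandas as pd

# Hypothetical hourly step counts straddling the 2014/2015 boundary
idx = pd.date_range("2014-12-31 22:00", periods=4, freq="h")
df = pd.DataFrame({"Steps (count)": [10.0, 30.0, 20.0, 40.0]}, index=idx)

# Average per hourly row and total per year, side by side
summary = df.groupby(df.index.year)["Steps (count)"].agg(["mean", "sum"])
print(summary)
```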


In [11]:
# by month
df.groupby(df.index.month).mean()

| timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|
| 1 | 0.086009 | 0.424059 | 196.025459 |
| 2 | 0.111805 | 0.477974 | 251.185685 |
| 3 | 0.103234 | 0.397936 | 232.494977 |
| 4 | 0.126937 | 0.396759 | 292.187935 |
| 5 | 0.103049 | 0.432348 | 237.480177 |
| 6 | 0.101325 | 0.525463 | 232.674537 |
| 7 | 0.077562 | 0.396057 | 177.461022 |
| 8 | 0.110616 | 0.59095 | 259.130824 |
| 9 | 0.15542 | 0.774537 | 358.914896 |
| 10 | 0.134065 | 0.728495 | 305.605527 |
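Grouping by `df.index.month` alone pools the same calendar month across every year in the data. To keep each month of each year separate, you can pass several keys to `groupby`. This is a minimal sketch on hypothetical readings that cross from January into February 2015.

```python
import pandas as pd

# Hypothetical hourly readings crossing from January into February 2015
idx = pd.date_range("2015-01-31 23:00", periods=3, freq="h")
df = pd.DataFrame({"Steps (count)": [5.0, 15.0, 25.0]}, index=idx)

# Grouping on two keys gives one row per (year, month) pair, so the same
# calendar month in different years is not pooled together
monthly = df.groupby([df.index.year, df.index.month]).mean()
print(monthly)
```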


In [12]:
# by day of the week (Monday=0, ..., Sunday=6)
df.groupby(df.index.dayofweek).mean()

| timestamp | Distance (mi) | Flights Climbed (count) | Steps (count) |
|---|---|---|---|
| 0 | 0.107608 | 0.513548 | 249.043526 |
| 1 | 0.109002 | 0.506135 | 250.285089 |
| 2 | 0.1136 | 0.543967 | 259.312777 |
| 3 | 0.109513 | 0.510828 | 250.990825 |
| 4 | 0.122911 | 0.538618 | 279.750694 |
| 5 | 0.118371 | 0.523434 | 270.554661 |
| 6 | 0.089462 | 0.472904 | 204.240428 |
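In this table, pandas numbers the days Monday=0 through Sunday=6, so row 6 is Sunday. For a more readable result, you can group by `day_name()` instead. Because `groupby` sorts the names alphabetically, they then need to be put back into calendar order. This is a minimal sketch on 48 hypothetical hourly readings running from Thursday 11:00 to Saturday 10:00.

```python
import pandas as pd

# 48 hypothetical hourly readings: Thursday 11:00 through Saturday 10:00
idx = pd.date_range("2014-12-25 11:00", periods=48, freq="h")
df = pd.DataFrame({"Steps (count)": [float(i) for i in range(48)]}, index=idx)

# Group by weekday name instead of number; groupby sorts the names alphabetically
by_day = df.groupby(df.index.day_name())["Steps (count)"].mean()

# Put the days back into calendar order, keeping only days present in the data
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
by_day = by_day.reindex([d for d in week if d in by_day.index])
print(by_day)
```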
