## Loading Data from a CSV File

- The Pandas library in Python provides excellent built-in support for time series data.
- Pandas represents a univariate time series as a Series.
- A Series is a one-dimensional labeled array; for time series data, each row is labeled with a timestamp.
- A DataFrame is a collection of Series that share the same index.
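To make the Series/DataFrame distinction concrete, here is a minimal sketch built by hand rather than from the CSV file. The five values are the first H1 readings shown later in this section; the `H1_doubled` column is hypothetical and exists only to give the DataFrame a second Series.

```python
import pandas as pd

# Five daily readings copied from the H1 column shown later in this section.
dates = pd.to_datetime(
    ["2019-11-01", "2019-11-02", "2019-11-03", "2019-11-04", "2019-11-05"]
)
series = pd.Series([16.0, 21.0, 17.0, 12.0, 26.0], index=dates, name="H1")

# A DataFrame groups one or more Series that share the same index.
# "H1_doubled" is a hypothetical column added only for illustration.
frame = pd.DataFrame({"H1": series, "H1_doubled": series * 2})

print(type(series.index).__name__)  # DatetimeIndex
print(frame.shape)                  # (5, 2)
print(type(frame["H1"]).__name__)   # Series
```

Selecting a single column from the DataFrame gives back a Series with the same timestamp index.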

In [2]:
import pandas as pd

In [3]:
# Load the CSV file; header=0 uses the first row as the column names
dataframe = pd.read_csv('/Users/mac/Documents/dataframe1.csv', header=0)

### First five records

In [4]:
dataframe.head(5)

| | DATE | H1 |
|---|---|---|
| 0 | 11/1/19 | 16.0 |
| 1 | 11/2/19 | 21.0 |
| 2 | 11/3/19 | 17.0 |
| 3 | 11/4/19 | 12.0 |
| 4 | 11/5/19 | 26.0 |


### Data Type

In [5]:
# Check the data type of the DATE column.
# dtype('O') means object: the dates were read as plain strings, not as datetimes.
dataframe['DATE'].dtype

dtype('O')
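If the file has already been loaded with string dates, one option is to convert the column afterwards with `pd.to_datetime`. The sketch below uses a small inline stand-in for the CSV (the first three rows shown above) and assumes the dates follow a month/day/two-digit-year pattern, which is what the sample rows suggest.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file, using the first rows shown above.
csv_text = "DATE,H1\n11/1/19,16.0\n11/2/19,21.0\n11/3/19,17.0\n"
sample = pd.read_csv(io.StringIO(csv_text), header=0)

# Convert the string column in place. The explicit format is an assumption
# based on the sample rows (month/day/two-digit year).
sample["DATE"] = pd.to_datetime(sample["DATE"], format="%m/%d/%y")

print(sample["DATE"].dtype)
print(sample["DATE"].iloc[0])
```

Giving an explicit `format` avoids any day/month ambiguity in values such as `11/1/19`.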

### Loading Data with parse_dates

- Pass parse_dates=[col_index] to read_csv to parse that column as dates while loading, e.g. parse_dates=[0] for the first column.

In [21]:
df = pd.read_csv('/Users/mac/Documents/dataframe1.csv',header=0, parse_dates=[0])

### First ten records

In [22]:
df.head(10)

| | DATE | H1 |
|---|---|---|
| 0 | 2019-11-01 | 16.0 |
| 1 | 2019-11-02 | 21.0 |
| 2 | 2019-11-03 | 17.0 |
| 3 | 2019-11-04 | 12.0 |
| 4 | 2019-11-05 | 26.0 |
| 5 | 2019-11-06 | 24.0 |
| 6 | 2019-11-07 | 25.0 |
| 7 | 2019-11-08 | 25.0 |
| 8 | 2019-11-09 | 0.0 |
| 9 | 2019-11-10 | 26.0 |


### Data Type

In [23]:
# '<M8[ns]' is NumPy's shorthand for datetime64[ns]: the column now holds real datetimes
df['DATE'].dtype

dtype('<M8[ns]')
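`parse_dates` also accepts column names, which can be easier to read than a positional index. The sketch below is an illustration on an inline stand-in for the file; the sample uses ISO-formatted dates (an assumption made here to keep parsing unambiguous) and the H1 values from the first rows above.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; ISO dates are used here for simplicity.
csv_text = "DATE,H1\n2019-11-01,16.0\n2019-11-02,21.0\n2019-11-03,17.0\n"
by_name = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=["DATE"])

# Once the column is a datetime, the .dt accessor exposes calendar fields.
print(by_name["DATE"].dt.month.tolist())       # [11, 11, 11]
print(by_name["DATE"].dt.day_name().tolist())  # ['Friday', 'Saturday', 'Sunday']
```

These accessors are not available on an object (string) column, which is the practical reason to parse dates at load time.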

### Loading Data as Series

In [9]:
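# index_col=0 makes DATE the index; squeezing the single remaining column yields a Series.
# read_csv's squeeze=True argument was removed in pandas 2.0, so squeeze the loaded frame instead.
ds = pd.read_csv('/Users/mac/Documents/dataframe1.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')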

In [10]:
ds.head(10)

DATE
2019-11-01    16.0
2019-11-02    21.0
2019-11-03    17.0
2019-11-04    12.0
2019-11-05    26.0
2019-11-06    24.0
2019-11-07    25.0
2019-11-08    25.0
2019-11-09     0.0
2019-11-10    26.0
Name: H1, dtype: float64
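Another way to get the same Series is to load a DataFrame indexed by date and then select the single value column by name. This is a sketch on an inline stand-in for the file (ISO dates assumed for simplicity), not the notebook's own data.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; ISO dates are used here for simplicity.
csv_text = "DATE,H1\n2019-11-01,16.0\n2019-11-02,21.0\n2019-11-03,17.0\n"
frame = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=[0], index_col=0)

# Two equivalent ways to turn the one-column frame into a Series.
by_squeeze = frame.squeeze("columns")
by_column = frame["H1"]

print(by_squeeze.equals(by_column))  # True
print(by_column.index.name)          # DATE
```

Selecting by name is more explicit and keeps working if the file later gains extra columns, whereas `squeeze` only returns a Series when exactly one column remains.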

## Exploring Time Series Data

### Size

In [11]:
ds.shape

(121,)

In [12]:
df.shape

(121, 1)
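The two shapes differ because a Series has one dimension and a DataFrame has two. The sketch below rebuilds a five-row sample by hand and assumes the DataFrame keeps DATE as its index with a single H1 column, which would be consistent with a shape of the form (n, 1).

```python
import pandas as pd

# Hand-built sample using the first five H1 readings from this section.
index = pd.date_range("2019-11-01", periods=5, freq="D")
series = pd.Series([16.0, 21.0, 17.0, 12.0, 26.0], index=index, name="H1")
frame = series.to_frame()  # one-column DataFrame with the same index

print(series.shape)  # (5,)
print(frame.shape)   # (5, 1)
print(len(series))   # 5

# The first and last timestamps give the period the data covers.
print(series.index.min(), series.index.max())
```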

### Querying by time

In [14]:
# Partial string indexing: select every row in November 2019
print(ds.loc['2019-11'])

DATE
2019-11-01    16.0
2019-11-02    21.0
2019-11-03    17.0
2019-11-04    12.0
2019-11-05    26.0
2019-11-06    24.0
2019-11-07    25.0
2019-11-08    25.0
2019-11-09     0.0
2019-11-10    26.0
2019-11-11    20.0
2019-11-12    27.0
2019-11-13    29.0
2019-11-14    27.0
2019-11-15    29.0
2019-11-16    31.0
2019-11-17    25.0
2019-11-18    26.7
2019-11-19    15.0
2019-11-20    18.0
2019-11-21    23.0
2019-11-22    29.0
2019-11-23    24.0
2019-11-24    17.0
2019-11-25    25.0
2019-11-26    35.0
2019-11-27    25.0
2019-11-28    27.0
2019-11-29    25.5
2019-11-30    17.5
Name: H1, dtype: float64
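The same idea extends to date ranges. Here is a small sketch on a hand-built Series: the November values are taken from the output above, while the December values are placeholders invented for illustration, since they are not shown in this section.

```python
import pandas as pd

index = pd.date_range("2019-11-28", periods=6, freq="D")
# First three values are from the output above; the last three are
# hypothetical placeholders for December.
sample = pd.Series([27.0, 25.5, 17.5, 1.0, 2.0, 3.0], index=index, name="H1")

# A year-month string selects the whole month.
november = sample.loc["2019-11"]

# A slice of date strings selects a range; both endpoints are included.
window = sample.loc["2019-11-29":"2019-12-01"]

print(len(november))     # 3
print(len(window))       # 3
print(window.index[-1])  # 2019-12-01 00:00:00
```

Unlike ordinary Python slices, label-based slices on a DatetimeIndex include the end label.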


In [24]:
# Boolean mask on the DATE column: rows after 2019-11-15, up to and including 2020-02-01
df[(df['DATE'] > '2019-11-15') & (df['DATE'] <= '2020-02-01')]

| | DATE | H1 |
|---|---|---|
| 15 | 2019-11-16 | 31.0 |
| 16 | 2019-11-17 | 25.0 |
| 17 | 2019-11-18 | 26.7 |
| 18 | 2019-11-19 | 15.0 |
| 19 | 2019-11-20 | 18.0 |
| ... | ... | ... |
| 88 | 2020-01-28 | 26.0 |
| 89 | 2020-01-29 | 31.0 |
| 90 | 2020-01-30 | 22.6 |
| 91 | 2020-01-31 | 27.0 |
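The mask-based filter can also be written as a label slice once DATE is the index. The sketch below compares the two on a hand-built frame that reuses the values for 14 to 18 November from the output above; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "DATE": pd.date_range("2019-11-14", periods=5, freq="D"),
        "H1": [27.0, 29.0, 31.0, 25.0, 26.7],
    }
)

# Boolean mask: strictly after the start date, up to and including the end date.
start, end = pd.Timestamp("2019-11-15"), pd.Timestamp("2019-11-17")
by_mask = frame[(frame["DATE"] > start) & (frame["DATE"] <= end)]

# Equivalent selection with DATE as the index. Label slices include both
# endpoints, so the slice starts one day after the excluded start date.
by_index = frame.set_index("DATE").loc["2019-11-16":"2019-11-17"]

print(by_mask["H1"].tolist())   # [31.0, 25.0]
print(by_index["H1"].tolist())  # [31.0, 25.0]
```

The mask keeps the original integer index, while the slice returns rows indexed by date.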


## Descriptive Statistics

In [26]:
ds.describe()

count    121.000000
mean      23.019835
std        7.118481
min        0.000000
25%       19.000000
50%       24.000000
75%       27.000000
max       36.000000
Name: H1, dtype: float64

In [27]:
df.describe()

| | H1 |
|---|---|
| count | 121.0 |
| mean | 23.019835 |
| std | 7.118481 |
| min | 0.0 |
| 25% | 19.0 |
| 50% | 24.0 |
| 75% | 27.0 |
| max | 36.0 |
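`describe()` reports the quartiles by default, but the `percentiles` argument lets you ask for others, and each statistic is also available as its own method. A minimal sketch on the first five H1 readings from this section:

```python
import pandas as pd

# First five H1 readings from this section.
sample = pd.Series([16.0, 21.0, 17.0, 12.0, 26.0], name="H1")

# Request the 10th and 90th percentiles; the median (50%) is always included.
summary = sample.describe(percentiles=[0.1, 0.9])
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']

# Individual statistics are available directly as well.
print(sample.mean())    # 18.4
print(sample.median())  # 17.0
```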
