# Loading time series data into Pandas

This notebook demonstrates how to load a time series into a `pandas.DataFrame`. There are a few extra steps we need to take when dealing with time series:

* Parse the dates, watching out for US (month first) versus UK (day first) format differences
* Set the index of the `pandas.DataFrame` to be a `DatetimeIndex`
* Set the frequency of the `DatetimeIndex`

# Imports

In [5]:
import pandas as pd
pd.__version__

'1.0.3'

# Alcohol sales dataset

Let's download and open an example time series. This particular one records alcohol sales and looks like the figure below. It's an interesting time series because its variation increases over time.

![image](images/alcohol_ts.png)


# Reading a CSV

To load a time series into `pandas`, use the `read_csv()` function. Some additional steps are needed:

* Set `index_col` to the name of the date column. In this example the column is named `DATE`.
* Set `parse_dates=True`. Combined with `index_col`, this tells `pandas` to parse the index as dates.
* Before you load the dataset, check the date format. If it is in UK day-first format, set `dayfirst=True`.
* After you have loaded the data, set the frequency of the datetime index:
    * Daily frequency = 'D'
    * Monthly frequency = 'MS' (month start)
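
To see why the date format check matters, here is a minimal sketch using a small made-up CSV held in memory (the `csv_text` string and its two rows are illustrative only, not part of the alcohol sales file). The same text is read twice, once with the default month-first interpretation and once with `dayfirst=True`.

```python
import io
import pandas as pd

# Hypothetical inline CSV with ambiguous dd/mm/yyyy style dates.
csv_text = "DATE,sales\n01/02/1992,3458\n01/03/1992,4002\n"

# Default behaviour: the ambiguous date 01/02/1992 is read month first (2 January).
us = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='DATE')

# dayfirst=True: the same text is read day first (1 February).
uk = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='DATE',
                 dayfirst=True)

print(us.index[0])
print(uk.index[0])
```

Both calls succeed without any warning, so an incorrect choice can go unnoticed unless you compare the parsed index against the raw file.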


In [2]:
# read the CSV, parsing the DATE column as a datetime index
url = 'https://raw.githubusercontent.com/hsma-master/hsma/master/12_forecasting/data/Alcohol_Sales.csv'
ts = pd.read_csv(url,
                 parse_dates=True,
                 index_col='DATE')

# set the frequency of the datetime index (monthly data, month start)
ts.index.freq = 'MS'
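
Assigning a frequency directly only works when the dates already match it; if a period is missing, `pandas` should raise a `ValueError`. The sketch below uses a small made-up frame (`gappy`, with March deliberately left out) to show one possible way of checking and repairing this with `pd.infer_freq()` and `DataFrame.asfreq()`. It is an illustration under those assumptions, not a step required for the alcohol sales data.

```python
import pandas as pd

# Hypothetical monthly data with March 1992 missing.
idx = pd.to_datetime(['1992-01-01', '1992-02-01', '1992-04-01'])
gappy = pd.DataFrame({'sales': [3459, 3458, 4564]}, index=idx)

# infer_freq returns None when the dates are not regularly spaced.
inferred = pd.infer_freq(gappy.index)
print(inferred)

# asfreq reindexes onto a regular month-start index;
# the missing month appears as a row of NaN.
regular = gappy.asfreq('MS')
print(regular.index.freq)
print(regular)
```

After `asfreq('MS')` the index carries the frequency, and the gap is visible as a missing value that you can then decide how to handle.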

In [3]:
ts.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 336 entries, 1992-01-01 to 2019-12-01
Freq: MS
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   sales   336 non-null    int64
dtypes: int64(1)
memory usage: 5.2 KB


In [4]:
ts.head(10)

DATE,sales
1992-01-01,3459
1992-02-01,3458
1992-03-01,4002
1992-04-01,4564
1992-05-01,4221
1992-06-01,4529
1992-07-01,4466
1992-08-01,4137
1992-09-01,4126
1992-10-01,4259
