# Loading time series data into Pandas

This notebook demonstrates how to load a time series into a `pandas.DataFrame`. There are a few extra steps to take when dealing with time series:

* parse the dates, watching out for US (month-first) and UK (day-first) date format issues
* set the index of the `pandas.DataFrame` to be a `DatetimeIndex`
* set the frequency of the `DatetimeIndex`
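The date format issue is easy to hit because a string such as `01/02/2020` is valid in both conventions. The sketch below uses an illustrative date string to show how `pd.to_datetime` reads it month-first by default and day-first when `dayfirst=True` is passed:

```python
import pandas as pd

ambiguous = '01/02/2020'  # illustrative date string, valid in US and UK formats

# Default behaviour: month first (US style) -> 2 January 2020
us_style = pd.to_datetime(ambiguous)

# dayfirst=True: day first (UK style) -> 1 February 2020
uk_style = pd.to_datetime(ambiguous, dayfirst=True)

print(us_style)  # 2020-01-02 00:00:00
print(uk_style)  # 2020-02-01 00:00:00
```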

# Imports

Ideally, you should use a pandas version around 1.2 for this module.

In [None]:
import pandas as pd
pd.__version__

# Alcohol sales dataset

Let's download and open an example time series. This one is monthly alcohol sales, shown in the figure below. It's an interesting time series because its variation increases over time!

![image](images/alcohol_ts.png)


# Reading a CSV

To load a time series from a CSV file into `pandas`, use the `read_csv()` function with some additional steps:

* set `parse_dates=True` so that pandas parses the index column as dates
* set `index_col` to the name of the date column. In this example the column is named `DATE`
* Before you load the dataset, check the date format. If it is in UK day-first format (e.g. `31/01/1992`), set `dayfirst=True`
* After you have loaded the data, set the frequency of the `DatetimeIndex`
    * Daily frequency = 'D'
    * Monthly frequency = 'MS' (month start)
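These steps can be tried without downloading anything. This is a minimal sketch that assumes a small, made-up daily dataset stored in UK day-first format; `io.StringIO` stands in for a file or URL, and the `sales` column, its values and the `demo_ts` name are hypothetical:

```python
import io

import pandas as pd

# Hypothetical daily data in UK day-first format (1 to 3 February 2020)
csv_text = """DATE,sales
01/02/2020,10
02/02/2020,12
03/02/2020,11
"""

demo_ts = pd.read_csv(io.StringIO(csv_text),
                      parse_dates=True,   # parse the index column as dates
                      index_col='DATE',   # use the DATE column as the index
                      dayfirst=True)      # read 01/02/2020 as 1 February

# set the frequency of the DatetimeIndex (daily data)
demo_ts.index.freq = 'D'

print(demo_ts.index)
```

Without `dayfirst=True` the same strings would be read month-first as 2 January, 2 February and 2 March, and setting a daily frequency would then fail because the dates are not one day apart.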


In [None]:
# read in data
url = 'https://raw.githubusercontent.com/hsma4/module_9_a/main/' \
        + 'data/Alcohol_Sales.csv'
ts = pd.read_csv(url, 
                 parse_dates=True, 
                 index_col='DATE')

# set the frequency of the datetime index (monthly data)
ts.index.freq = 'MS'
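Setting `index.freq` only works if the dates are regularly spaced; if they do not conform to the requested frequency, pandas raises a `ValueError`. The sketch below assumes a small hypothetical monthly index (the `demo`, `gappy` and `filled` names and the values are illustrative) and shows one way to check the frequency first with `pd.infer_freq`, and one way to handle a missing month with `asfreq`:

```python
import pandas as pd

# Hypothetical month-start dates; to_datetime attaches no frequency
idx = pd.to_datetime(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01'])
demo = pd.DataFrame({'sales': [1, 2, 3, 4]}, index=idx)

print(demo.index.freq)            # None
print(pd.infer_freq(demo.index))  # MS

# safe to set because the inferred frequency matches
demo.index.freq = 'MS'

# drop a month to create a gap in the series
gappy = demo.drop(pd.Timestamp('1992-03-01'))
try:
    gappy.index.freq = 'MS'
    raised = False
except ValueError as err:
    raised = True
    print('Cannot set MS frequency:', err)

# asfreq reindexes to a regular monthly index, filling the gap with NaN
filled = gappy.asfreq('MS')
print(filled)
```

Using `asfreq` makes missing periods explicit as `NaN` rather than hiding them, which you can then decide how to fill.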

In [None]:
ts.info()

In [None]:
ts.head(10)