# Time labels

Several services, such as Yahoo Finance, provide time series in which the time labels are already in the index of the dataframe. When this is not the case, the time labels have to be converted to the right format and moved into the index manually.

## Reading time-related information from a string

Time-related information stored as a string (text) has to be converted into time labels (datetime values) that Python can understand.

**Method 1**:

- Open the data and check the format of the time-related data.
- Convert the time-related information using the function **to_datetime** and set the result as the index.
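As a minimal sketch of Method 1 that runs without a network connection, the snippet below builds a small dataframe in memory. The values and the zero-padded month labels are illustrative assumptions, not the contents of any data file.

```python
import pandas as pd

# Illustrative data: month labels stored as plain strings
df = pd.DataFrame({'Month': ['1999-12', '2000-01', '2000-02'],
                   'CO2': [368.04, 369.25, 369.5]})

# Convert the strings to datetime values and use them as the index
df.index = pd.to_datetime(df['Month'], format='%Y-%m')

# The original string column is no longer needed
df = df.drop('Month', axis=1)

print(df.index)
```

Afterwards the index is a `DatetimeIndex`, so each month is represented by its first day.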

**Method 2**:

- Write a function that converts the time-related information using the function **strptime**.
- Set the time-related information as the index at the moment you open the data.
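In recent pandas versions the same idea can be written without a separate parser function. The sketch below assumes pandas 2.0 or newer, where the parameter `date_format` accepts a strptime-style format string. The two data rows are typed in by hand in the layout of the electricity production file, so the example does not download anything.

```python
import io
import pandas as pd

# Two rows in the same layout as the electricity production file
csv_text = "DATE,IPG2211A2N\n1/1/1985,72.5052\n2/1/1985,70.672\n"

# index_col=0 makes the first column the index, parse_dates=True asks
# pandas to parse it, and date_format gives the strptime-style format.
# Assumption: date_format requires pandas 2.0 or newer.
df = pd.read_csv(io.StringIO(csv_text), index_col=0,
                 parse_dates=True, date_format='%m/%d/%Y')

print(df.index)
```

The index is converted while the file is being read, so no separate conversion step is needed afterwards.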

Converting time-related information requires format codes. They are listed, for example, at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
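To illustrate how the format codes work, here is a small sketch using only the standard library. The date strings follow the two formats that appear in the example files, and the output format `'%d.%m.%Y'` is chosen only for illustration.

```python
from datetime import datetime

# %Y = four-digit year, %m = month number, %d = day of the month
monthly = datetime.strptime('2000-1', '%Y-%m')
daily = datetime.strptime('1/1/1985', '%m/%d/%Y')

print(monthly)  # 2000-01-01 00:00:00
print(daily)    # 1985-01-01 00:00:00

# strftime goes the other way: from a datetime to a formatted string
print(daily.strftime('%d.%m.%Y'))  # 01.01.1985
```

When the string contains no day, `strptime` fills in the first day of the month.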

In [6]:
### Import the libraries

import pandas as pd
import numpy as np

### Example 1 of method 1

In [7]:
### Open the data with time related information and view the form of time related data.

df1 = pd.read_excel('http://myy.haaga-helia.fi/~menetelmat/Data-analytiikka/Teaching/CO2_en.xlsx')
df1.head()

| | Month | CO2 |
|---|---|---|
| 0 | 1999-12 | 368.04 |
| 1 | 2000-1 | 369.25 |
| 2 | 2000-2 | 369.5 |
| 3 | 2000-3 | 370.56 |
| 4 | 2000-4 | 371.82 |


In [8]:
### Place the time information to be the index
### The function to_datetime transforms strings to datetime information

df1.index = pd.to_datetime(df1['Month'], format = '%Y-%m')
df1 = df1.drop('Month', axis = 1)
df1.head()

| Month | CO2 |
|---|---|
| 1999-12-01 | 368.04 |
| 2000-01-01 | 369.25 |
| 2000-02-01 | 369.5 |
| 2000-03-01 | 370.56 |
| 2000-04-01 | 371.82 |


### Example 1 of method 2

In [9]:
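### A function that transforms a string into datetime information
### pd.datetime has been removed from pandas, so datetime is imported from the standard library

from datetime import datetime

def parser(x):
    return datetime.strptime(x, '%Y-%m')

### Datetime information is placed in the index when the data is opened.
### index_col = 0 means that the first column of the data is used as the index
### parse_dates = True is needed so that the parser is applied to the index
### Note: date_parser is deprecated since pandas 2.0, where the parameter date_format replaces it

df2 = pd.read_excel('http://myy.haaga-helia.fi/~menetelmat/Data-analytiikka/Teaching/CO2_en.xlsx', index_col = 0, parse_dates = True, date_parser = parser)

df2.head()

| Month | CO2 |
|---|---|
| 1999-12-01 | 368.04 |
| 2000-01-01 | 369.25 |
| 2000-02-01 | 369.5 |
| 2000-03-01 | 370.56 |
| 2000-04-01 | 371.82 |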


### Example 2 of method 1

In [16]:
### Open the file with time-related information and check the format of the time labels

df3 = pd.read_csv('http://myy.haaga-helia.fi/~menetelmat/Data-analytiikka/Teaching/Electric_Production.csv')
df3.head()

| | DATE | IPG2211A2N |
|---|---|---|
| 0 | 1/1/1985 | 72.5052 |
| 1 | 2/1/1985 | 70.672 |
| 2 | 3/1/1985 | 62.4502 |
| 3 | 4/1/1985 | 57.4714 |
| 4 | 5/1/1985 | 55.3151 |


In [17]:
### Set the time-related information as the index
### The function to_datetime transforms strings into datetime information

df3.index = pd.to_datetime(df3['DATE'], format = '%m/%d/%Y')
df3 = df3.drop('DATE', axis = 1)
df3.head()

| DATE | IPG2211A2N |
|---|---|
| 1985-01-01 | 72.5052 |
| 1985-02-01 | 70.672 |
| 1985-03-01 | 62.4502 |
| 1985-04-01 | 57.4714 |
| 1985-05-01 | 55.3151 |


### Example 2 of method 2

In [19]:
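### A function that transforms a string into datetime information
### pd.datetime has been removed from pandas, so datetime is imported from the standard library

from datetime import datetime

def parser(x):
    return datetime.strptime(x, '%m/%d/%Y')

### Datetime information is placed in the index when the data is opened.
### parse_dates = True makes sure that the parser is applied to the index
### Note: date_parser is deprecated since pandas 2.0, where the parameter date_format replaces it

df4 = pd.read_csv('http://myy.haaga-helia.fi/~menetelmat/Data-analytiikka/Teaching/Electric_Production.csv', index_col = 0, parse_dates = True, date_parser = parser)

df4.head()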


| DATE | IPG2211A2N |
|---|---|
| 1985-01-01 | 72.5052 |
| 1985-02-01 | 70.672 |
| 1985-03-01 | 62.4502 |
| 1985-04-01 | 57.4714 |
| 1985-05-01 | 55.3151 |


## Creating time labels

A series of time labels can be created with the function **date_range**. Exactly three of the four parameters start, end, periods and freq must be given. More details are available at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html.

Possible values for the parameter **freq** can be found at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.
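The rule of three parameters can be tried out with a short sketch. The dates and the daily frequency `'D'` are chosen only for illustration. Each call below leaves out a different one of the four parameters and produces the same four days.

```python
import pandas as pd

# start + periods + freq (end is left out)
a = pd.date_range(start='2013-01-01', periods=4, freq='D')

# start + end + freq (periods is left out)
b = pd.date_range(start='2013-01-01', end='2013-01-04', freq='D')

# start + end + periods (freq is left out): evenly spaced labels
c = pd.date_range(start='2013-01-01', end='2013-01-04', periods=4)

# end + periods + freq (start is left out)
d = pd.date_range(end='2013-01-04', periods=4, freq='D')

print(a)
```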

In [20]:
### Numbers in a time series

series = [500, 350, 250, 400, 450, 350, 200, 300, 350, 200, 150, 400, 550, 350, 250, 550, 400, 350, 600, 750, 500, 400, 650, 850]

### Create quarterly time labels (Q = end of quarter)
### Note: pandas 2.2 and later prefer the alias 'QE' for the same frequency

index = pd.date_range(start = '2013-03-31', periods = len(series), freq = 'Q')

### Create a dataframe

df5 = pd.DataFrame(series, index = index)

### Header for numbers in time series

df5.columns = ['Demand']

df5

| | Demand |
|---|---|
| 2013-03-31 | 500 |
| 2013-06-30 | 350 |
| 2013-09-30 | 250 |
| 2013-12-31 | 400 |
| 2014-03-31 | 450 |
| 2014-06-30 | 350 |
| 2014-09-30 | 200 |
| 2014-12-31 | 300 |
| 2015-03-31 | 350 |
| 2015-06-30 | 200 |


Source and origin of inspiration:<br /> 
Aki Taanila: Data-analytiikka Pythonilla: <a href="https://tilastoapu.wordpress.com/python/">https://tilastoapu.wordpress.com/python/</a>