# Time series in pandas

In [None]:
import pandas as pd
%matplotlib inline

## Dates and times in Python (and Pandas)

Python's `datetime` module provides these classes:
- **datetime**  – allows us to manipulate dates and times together (year, month, day, hour, minute, second, microsecond)
- **date**      – allows us to manipulate dates independent of time (year, month, day)
- **time**      – allows us to manipulate times independent of date (hour, minute, second, microsecond)

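The following minimal sketch shows how the three classes relate. It uses an arbitrary fixed date and time instead of the current one, so the output is reproducible:

```python
from datetime import date, time, datetime

d = date(2019, 9, 2)          # calendar date only: year, month, day
t = time(22, 50, 0)           # time of day only: hour, minute, second
dt = datetime.combine(d, t)   # date and time of day in one object

print(d)   # 2019-09-02
print(t)   # 22:50:00
print(dt)  # 2019-09-02 22:50:00

# a datetime can be split back into its date and time parts
print(dt.date() == d, dt.time() == t)  # True True
```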
Get the current date and time:

In [None]:
from datetime import datetime
from datetime import date

today = date.today()
print(today,type(today))

now = datetime.now()
print(now, type(now))

Access individual components of a date or datetime:

In [None]:
# datetime.date class
print('year:', today.year)
print('month:', today.month)
print('day:', today.day)

# datetime.datetime class
print('hour:', now.hour)
print('minute:', now.minute)
print('seconds:', now.second)
# .weekday() returns integers from 0 (Monday) to 6 (Sunday)
print('weekday:', now.weekday())

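You can check the 0 to 6 numbering of `.weekday()` against a known date. This sketch uses 2 September 2019, which was a Monday, and 8 September 2019, a Sunday. It also uses the standard-library `calendar.day_name` sequence to turn the number into a name. The language of that name depends on the active locale.

```python
import calendar
from datetime import date

d = date(2019, 9, 2)             # 2 September 2019
n = d.weekday()                  # 0 = Monday through 6 = Sunday
print(n)                         # 0
# calendar.day_name maps 0..6 to weekday names (language follows the locale)
print(calendar.day_name[n])

sunday = date(2019, 9, 8)
print(sunday.weekday())          # 6
```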
Alternatively, we can use the `strftime()` method. It formats a date or datetime object as a string, according to the format codes we pass (e.g. `%Y` for the four-digit year):

In [None]:
print('year:', now.strftime("%Y"))
print('month:', now.strftime("%m"))
print('day:', now.strftime("%d"))
print('hour:', now.strftime("%H"))
print('minute:', now.strftime("%M"))
print('seconds:', now.strftime("%S"))
print('weekday:', now.strftime("%A"))
print(now.strftime("Today is %m/%d/%Y, %H:%M:%S"))

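Because `now` changes every time the cell runs, its output is hard to compare with the format codes. This sketch applies the same codes to a fixed, arbitrarily chosen moment, 31 October 2019 at 14:05:09 (a Thursday), so you can check each code against a known result:

```python
from datetime import datetime

dt = datetime(2019, 10, 31, 14, 5, 9)  # a fixed example moment

iso = dt.strftime("%Y-%m-%d %H:%M:%S")
dotted = dt.strftime("%d.%m.%Y")
print(iso)     # 2019-10-31 14:05:09
print(dotted)  # 31.10.2019

# %A gives the full weekday name; its language follows the locale
print(dt.strftime("%A"))
```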
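We can also use pandas to create a datetime object. `pd.to_datetime()` parses many common string formats and returns a pandas `Timestamp`, the pandas counterpart of Python's `datetime`: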

In [None]:
pd.to_datetime("31st of October, 2019")

In [None]:
# day-first date: dayfirst=True makes the intended order explicit
pd.to_datetime('31/10/2019', dayfirst=True)

In [None]:
# an explicit format avoids guessing for non-standard strings
pd.to_datetime('19.11.2019,12:01:34', format='%d.%m.%Y,%H:%M:%S')

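Day-first and month-first strings look the same when both numbers are 12 or less. This sketch uses an arbitrary date, `'02/09/2019'`, to show why `dayfirst` matters. It also shows that the returned `Timestamp` is a subclass of Python's `datetime`:

```python
from datetime import datetime

import pandas as pd

ts = pd.to_datetime('31/10/2019', dayfirst=True)
print(type(ts).__name__)          # Timestamp
print(ts.year, ts.month, ts.day)  # 2019 10 31
print(isinstance(ts, datetime))   # True: Timestamp subclasses datetime

# an ambiguous string: month first by default, day first on request
month_first = pd.to_datetime('02/09/2019')               # 9 February 2019
day_first = pd.to_datetime('02/09/2019', dayfirst=True)  # 2 September 2019
print(month_first.date(), day_first.date())  # 2019-02-09 2019-09-02
```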
#### Frequencies
When the data points of a time series are uniformly spaced in time (e.g., hourly, daily, monthly, etc.), the time series can be associated with a frequency in pandas. For example, let’s use the `date_range()` function to create a sequence of uniformly spaced dates from 1998-03-10 through 1998-03-15 at daily frequency.

In [None]:
pd.date_range('1998-03-10', '1998-03-15', freq='D')

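The resulting `DatetimeIndex` has an attribute `freq` with a value of `'D'`, indicating daily frequency. Available frequencies in pandas include per minute (`'min'`, older alias `'T'`), hourly (`'h'`, older alias `'H'`), calendar daily (`'D'`), business daily (`'B'`), weekly (`'W'`), month end (`'ME'`, older alias `'M'`), quarter end (`'QE'`, older alias `'Q'`) and year end (`'YE'`, older alias `'A'`). Recent pandas versions (2.2 and later) deprecate the older aliases, while older pandas versions do not know the newer month, quarter and year names. Frequencies can also be specified as multiples of any of the base frequencies, for example `'5D'` for every five days.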

In [None]:
pd.date_range('2004-09-20', periods=8, freq='h')

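Here is a short sketch of frequency multiples and business days. The dates are arbitrary and chosen so the result is easy to check by hand: 6 September 2019 was a Friday.

```python
import pandas as pd

every5 = pd.date_range('1998-03-10', periods=4, freq='5D')
print(every5.strftime('%Y-%m-%d').tolist())
# ['1998-03-10', '1998-03-15', '1998-03-20', '1998-03-25']
print(every5.freqstr)  # 5D

# business days ('B') skip the weekend: Friday 6 Sep 2019 is followed by Monday 9 Sep
bdays = pd.date_range('2019-09-06', periods=3, freq='B')
print(bdays.strftime('%Y-%m-%d').tolist())
# ['2019-09-06', '2019-09-09', '2019-09-10']
```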
## Exercise 1
Use the `datetime` class to find out the weekday of your birth. Print a complete sentence, formatted with the `strftime()` method:

e.g. `I was born in Rüdersdorf on January 15th, 1991. This was a Tuesday.`

In [None]:
# put your code here

## Exercise 2
Print out all leap years from your birth year until today. You can use the following rules:

* A year is a leap year if it is divisible by 400.
* A year is also a leap year if it is divisible by 4 but not by 100.

In [None]:
# put your code here

## Data from the weather station in Oetztal
Reading the file:

In [None]:
import pandas as pd
df = pd.read_csv('../data/Oetztal.dat')
df.head(5)

The table looks a little bit weird, right? Can you access the column "RECORD"? No? What could be the problem?

Try out: 

In [None]:
pd.read_csv('../data/Oetztal.dat', header=1).head(5)

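Looks better, but not perfect: the two lines below the column names still end up as data rows. Skip them, together with the first line: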

In [None]:
# skip lines 0, 2 and 3 (zero-based), so that line 1 becomes the header
df = pd.read_csv('../data/Oetztal.dat', skiprows=[0, 2, 3])
df.head(5)

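The layout of `Oetztal.dat` is not reproduced here. This sketch therefore uses a hypothetical miniature file with the header structure that `skiprows=[0, 2, 3]` implies: one line before the column names and two non-numeric lines after them. The extra lines and all values are made up. The sketch shows why `header=1` alone leaves the temperature column as text (object dtype), while skipping lines 2 and 3 as well yields numbers:

```python
import io

import pandas as pd

# hypothetical miniature of the file layout; all values are made up
raw = '''"metadata line"
"TIMESTAMP","RECORD","AirTC_Avg"
"TS","RN","Deg C"
"","","Avg"
"2019-09-02 22:40:00",0,8.1
"2019-09-02 22:50:00",1,7.9
'''

# header=1 finds the column names, but the two extra lines end up as data
bad = pd.read_csv(io.StringIO(raw), header=1)
print(bad['AirTC_Avg'].dtype)   # object (strings mixed with numbers)

# skipping lines 0, 2 and 3 (zero-based) leaves only the header and numbers
good = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])
print(good['AirTC_Avg'].dtype)  # float64
print(good.columns.tolist())    # ['TIMESTAMP', 'RECORD', 'AirTC_Avg']
```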
In [None]:
print(df.columns)

In [None]:
# tell python which column is the index
df = df.set_index('TIMESTAMP')
df.head(5)

In [None]:
df.AirTC_Avg.plot()

In [None]:
print(df.index)
# the index still holds strings, so this raises an AttributeError
print(df.index.year)

To make use of the datetime attributes, we need to convert the `Index` to a `DatetimeIndex`:

In [None]:
df.index = pd.DatetimeIndex(df.index)
df.index

We can see that it has no frequency `(freq=None)`. This makes sense, since the index was created from a sequence of dates in our CSV file, without explicitly specifying any frequency for the time series.

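The conversion can be reproduced without the station file. This sketch uses a few made-up temperatures whose timestamps are plain strings, as they are after `read_csv`. After conversion to a `DatetimeIndex`, attributes such as `.year` and `.hour` become available, and `freq` is still `None`:

```python
import pandas as pd

# made-up values; the timestamps are plain strings, as after read_csv
df = pd.DataFrame(
    {'AirTC_Avg': [8.1, 7.9, 7.6]},
    index=['2019-09-02 22:40:00', '2019-09-02 22:50:00', '2019-09-02 23:00:00'],
)
print(type(df.index).__name__)   # Index: no .year or .hour attributes yet

df.index = pd.DatetimeIndex(df.index)
print(type(df.index).__name__)   # DatetimeIndex
print(df.index.year.tolist())    # [2019, 2019, 2019]
print(df.index.hour.tolist())    # [22, 22, 23]
print(df.index.freq)             # None: no frequency was specified
```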
If we know that our data should be at a specific frequency, we can use the DataFrame’s `.asfreq()` method to assign a frequency. If any date/times are missing in the data, new rows are added for those date/times. By default these rows are empty (NaN). Alternatively, they can be filled with the `method` argument (e.g. `method='ffill'` for forward filling) or with a constant `fill_value`. Interpolation is not an option of `.asfreq()` itself, but it can be applied afterwards with `.interpolate()`.

In [None]:
df = df.asfreq('10min')
df.head()

In [None]:
# Now, we repeat the plotting command
df.AirTC_Avg.plot()

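You can see what `.asfreq()` does with a gap without the station file. This sketch uses a made-up temperature series in which the 22:50 value is missing. It compares the default (NaN), `method='ffill'`, and linear interpolation with `.interpolate()` applied afterwards:

```python
import pandas as pd

# made-up 10-minute series with the 22:50 value missing
idx = pd.DatetimeIndex(['2019-09-02 22:30', '2019-09-02 22:40', '2019-09-02 23:00'])
s = pd.Series([8.0, 7.8, 7.4], index=idx)

gaps = s.asfreq('10min')                     # 22:50 inserted as NaN
ffilled = s.asfreq('10min', method='ffill')  # 22:50 repeats the 22:40 value
interp = gaps.interpolate()                  # 22:50 becomes the linear midpoint
print(pd.DataFrame({'gaps': gaps, 'ffill': ffilled, 'interpolated': interp}))
```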
### Access data by index

#### 1. Access a group of rows and columns by integer position(s) with `.iloc`

In [None]:
# get the first row
df.iloc[0]

In [None]:
# get the last row
df.iloc[-1]

In [None]:
# rows at positions 10 to 24 (the end position 25 is excluded)
df.iloc[10:25]

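Here are the same `.iloc` calls on a small made-up frame. The column `RH_Avg` is hypothetical and is there only to show column positions. Integer slices exclude their end position:

```python
import pandas as pd

# made-up values; 'RH_Avg' is a hypothetical second column
idx = pd.date_range('2019-09-02 22:00', periods=6, freq='10min')
df = pd.DataFrame({'AirTC_Avg': [8.3, 8.1, 7.9, 7.8, 7.6, 7.5],
                   'RH_Avg': [81, 82, 84, 85, 86, 88]}, index=idx)

first = df.iloc[0]      # first row, returned as a Series
last = df.iloc[-1]      # last row
block = df.iloc[1:3]    # positions 1 and 2: the end position is excluded
value = df.iloc[0, 1]   # row position 0, column position 1 (RH_Avg)
print(block)
print(value)            # 81
```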
#### 2. Access a group of rows and columns by label(s) with `.loc`

In [None]:
# the row at a specific time
df.loc['2019-09-02 22:50:00']

In [None]:
# from date1 to date2 (with .loc, both end points are included)
df.loc['2019-09-02 12:00:00' : '2019-09-02 14:00:00']


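Label slices with `.loc` include both end points, while integer slices with `.iloc` exclude the end. The following sketch shows the difference on made-up half-hourly values:

```python
import pandas as pd

# made-up half-hourly values from 12:00 to 14:00
idx = pd.date_range('2019-09-02 12:00', periods=5, freq='30min')
df = pd.DataFrame({'AirTC_Avg': [15.2, 15.8, 16.0, 16.4, 16.1]}, index=idx)

# label slice: both end points are included
by_label = df.loc['2019-09-02 12:00:00':'2019-09-02 13:00:00']
print(len(by_label))     # 3 (12:00, 12:30 and 13:00)

# position slice: the end position is excluded
by_position = df.iloc[0:2]
print(len(by_position))  # 2
```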
In [None]:
# all data from a specific date (print() is needed, since a cell only displays its last expression)
print(df.loc['2019-09-02'])

# or from a specific month
df.loc['2019-09'].head()

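Partial date strings select every row that falls within the given day or month. This sketch uses made-up 10-minute values that cross from 31 August into September, so each selection is easy to count:

```python
import pandas as pd

# made-up 10-minute values that cross from August into September
idx = pd.date_range('2019-08-31 23:40', periods=4, freq='10min')
df = pd.DataFrame({'AirTC_Avg': [6.2, 6.0, 5.9, 5.7]}, index=idx)

aug31 = df.loc['2019-08-31']       # every row on that day
september = df.loc['2019-09']      # every row in that month
print(len(aug31), len(september))  # 2 2
print(september.index[0])          # 2019-09-01 00:00:00
```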
In [None]:
# all rows, but only one specific column
df.loc[:,'AirTC_Avg']

# the same can be done by
df.AirTC_Avg

In [None]:
# all temperature values from 1st Sep to 4th Sep
df.loc['2019-09-01':'2019-09-04','AirTC_Avg']

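This sketch on made-up daily values shows that a row range combined with a single column name returns a `Series`. It also shows that `df.loc[:, 'AirTC_Avg']`, `df['AirTC_Avg']` and attribute access give the same column. `RH_Avg` is a hypothetical second column. Attribute access only works for column names that are valid Python identifiers and do not clash with DataFrame methods:

```python
import pandas as pd

# made-up daily values; 'RH_Avg' is a hypothetical second column
idx = pd.date_range('2019-08-31', periods=6, freq='D')
df = pd.DataFrame({'AirTC_Avg': [12.1, 11.4, 10.8, 12.9, 13.3, 11.7],
                   'RH_Avg': [70, 75, 81, 68, 64, 72]}, index=idx)

# rows selected by a date range, one column selected by name: a Series
temps = df.loc['2019-09-01':'2019-09-04', 'AirTC_Avg']
print(type(temps).__name__, len(temps))  # Series 4

# three equivalent ways to get a whole column
by_loc = df.loc[:, 'AirTC_Avg']
by_key = df['AirTC_Avg']
by_attr = df.AirTC_Avg
print(by_loc.equals(by_key) and by_key.equals(by_attr))  # True
```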
In [None]:
# all rows that fulfill a specific condition (here all Mondays)
df.loc[df.index.weekday==0].head()
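
Boolean conditions on the index can be combined with `&` (and) or `|` (or), with each condition in parentheses. This sketch uses made-up hourly values from Sunday 1 to Tuesday 3 September 2019. It selects all Monday rows, then only Monday afternoons:

```python
import pandas as pd

# made-up hourly values from Sunday 1 Sep to Tuesday 3 Sep 2019
idx = pd.date_range('2019-09-01 00:00', '2019-09-03 23:00', freq='60min')
df = pd.DataFrame({'AirTC_Avg': [float(i) for i in range(len(idx))]}, index=idx)

mondays = df.loc[df.index.weekday == 0]
print(len(mondays))                         # 24
print(mondays.index.day.unique().tolist())  # [2]

# combine conditions with & (and) or | (or); each condition needs parentheses
monday_afternoons = df.loc[(df.index.weekday == 0) & (df.index.hour >= 12)]
print(len(monday_afternoons))               # 12
```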