<!--NAVIGATION-->
< [Group and Pivot Functions](05-group-and-pivot-functions.ipynb) | [Contents](Index.ipynb) | [Visualisation](07-visualisation.ipynb) >

## Apply and TimeStamp

Here you will learn about `apply` and datetime operations in pandas.

To test the functions, let's import a DataFrame from the dwh.

In [None]:
import pandas as pd

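# parse_dates converts TIME_HOUR to datetimes so the Timestamp cells further down work
df = pd.read_csv('../Data/airlineDT.csv', sep=',', parse_dates=['TIME_HOUR'])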
df.head(5)

### Apply

You can apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
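
To make the axis behaviour concrete, here is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are hypothetical, not part of the airline data). It only illustrates which Series the function receives for each axis.

```python
import pandas as pd

# Small hypothetical frame (not the airline data) to show what apply passes in.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives one column at a time,
# so each Series is indexed by the DataFrame's row index.
col_sums = toy.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives one row at a time,
# so each Series is indexed by the DataFrame's column labels.
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_sums)  # one result per column
print(row_sums)  # one result per row
```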

In [None]:
df.head(5)

By selecting a column and using **apply**, you can define what to apply. Let's say we want to apply a square root function to every value, or a more complicated numerical formula. NumPy provides a rich set of mathematical functions that operate on values, and we are going to use it here.

In [None]:
df['ARR_TIME']

In [None]:
import numpy as np
df['ARR_TIME_SQRT'] = df['ARR_TIME'].apply(np.sqrt)

df[['ARR_TIME','ARR_TIME_SQRT']].head(10)

Notice that the original column stays the same; the new column holds the square root of every value.

A lambda is a handy way to define a custom function inline and apply it to every value.

In [None]:
df['ARR_TIME_LAMBDA'] = df['ARR_TIME'].apply(lambda x: x ** 2)

df[['ARR_TIME_LAMBDA','ARR_TIME']].head(10)

In [None]:
df['ARR_TIME_LAMBDA_2'] = df['ARR_TIME'].apply(lambda x: (x + 1)**2)

df[['ARR_TIME_LAMBDA_2','ARR_TIME']].head(10)

Notice that the lambda's expression was applied to every value `x`. Reference:
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
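
As a rough sketch of the alternatives to a lambda, the example below uses a made-up Series in place of `df['ARR_TIME']` (the values and the helper `shifted_square` are hypothetical). It shows the same formula written with `apply`, with plain Series arithmetic, and with a named function that takes an extra keyword argument.

```python
import pandas as pd

# Hypothetical stand-in for df['ARR_TIME']; the values are made up.
arr_time = pd.Series([4.0, 9.0, 16.0, 25.0])

# Element-wise apply with a lambda, as in the cells above.
via_apply = arr_time.apply(lambda x: (x + 1) ** 2)

# The same arithmetic written directly on the Series (vectorized).
vectorized = (arr_time + 1) ** 2

# A named function can be used when the logic is too long for a lambda.
def shifted_square(x, shift=1):
    """Return (x + shift) squared for a single value."""
    return (x + shift) ** 2

# Extra keyword arguments given to apply are forwarded to the function.
via_named = arr_time.apply(shifted_square, shift=1)

print(via_apply.equals(vectorized))
print(via_apply.equals(via_named))
```

For simple arithmetic like this, the vectorized form is usually the shorter and faster choice; `apply` is most useful when the logic cannot be expressed with Series operations.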

### TimeStamp

The `TIME_HOUR` column holds datetime values, stored as pandas `Timestamp` objects, which we can explore further.

In [None]:
df['TIME_HOUR'].head(10)

We selected a Series, so the exact type of each value may not be obvious. By selecting a single value, we can see its data type.

In [None]:
df['TIME_HOUR'][0]

We can inspect the attributes of a single Timestamp value separately from the Series.

In [None]:
df['TIME_HOUR'][0].dayofweek

But if we want to explore the whole Series, we cannot call the same attributes directly on it as we would on a single Timestamp value.

Within a DataFrame, a Series of Timestamps works in the same way as a regular Series. Let's explore timestamps in the DataFrame further.
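
The difference between a single value and the whole Series can be sketched as follows. The timestamps are made up and stand in for `df['TIME_HOUR']`; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps standing in for df['TIME_HOUR'].
times = pd.Series(pd.to_datetime(["2013-06-03 21:00", "2013-07-14 08:00"]))

# A single element is a pandas Timestamp and exposes attributes directly.
first = times.iloc[0]
print(type(first).__name__, first.dayofweek)

# The Series itself has no such attribute ...
try:
    times.dayofweek
except AttributeError as err:
    print("AttributeError:", err)

# ... but the .dt accessor exposes the same property for every element.
print(times.dt.dayofweek.tolist())
```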

In [None]:
df['TIME_HOUR'].min()

In [None]:
df['TIME_HOUR'].max()

But it also supports additional datetime operations through the `.dt` accessor:

In [None]:
df['TIME_HOUR'].dt.month

In [None]:
df['TIME_HOUR'].dt.weekday

These operations help us extract specific values from the time series and apply basic math operations. Let's say we want to select the values that fall between two datetimes.

In [None]:
june = pd.Timestamp('2013-06-01') # June
september = pd.Timestamp('2013-09-01') # September

In [None]:
june

In [None]:
september

In [None]:
df.loc[ (df['TIME_HOUR'] > june) & (df['TIME_HOUR'] < september), 'TIME_HOUR']
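
The range filter can also be sketched on a small made-up Series to show how the boundary values are treated. The data below is hypothetical, and the string form of the `inclusive` argument of `Series.between` is assumed to be available (pandas 1.3 or later).

```python
import pandas as pd

# Hypothetical timestamps standing in for df['TIME_HOUR'].
times = pd.Series(pd.to_datetime(
    ["2013-05-31 23:00", "2013-06-01 00:00", "2013-07-15 12:00", "2013-09-01 00:00"]
))
june = pd.Timestamp("2013-06-01")
september = pd.Timestamp("2013-09-01")

# Boolean mask with strict comparisons, as in the cell above:
# both boundary values are excluded.
mask = (times > june) & (times < september)

# Series.between expresses the same range test; inclusive="neither"
# (string form, assumed pandas >= 1.3) keeps the bounds exclusive.
mask_between = times.between(june, september, inclusive="neither")

print(times[mask].tolist())
print(mask.equals(mask_between))
```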

Or to select the rows where the hour equals 21:

In [None]:
df.loc[df['TIME_HOUR'].dt.hour == 21,'TIME_HOUR']
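
The basic math operations on timestamps mentioned earlier can be sketched with `pd.Timedelta`. The Series below is made up and only stands in for `df['TIME_HOUR']`; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps standing in for df['TIME_HOUR'].
times = pd.Series(pd.to_datetime(["2013-06-03 21:00", "2013-06-04 21:30"]))

# Subtracting two Timestamps gives a Timedelta.
span = times.max() - times.min()
print(span)

# Adding a Timedelta shifts every value in the Series.
shifted = times + pd.Timedelta(hours=3)
print(shifted.dt.hour.tolist())

# Subtracting a Timestamp from a Series gives a Series of Timedeltas,
# whose components are also reachable per element via .dt.
since_start = times - times.min()
print(since_start.dt.total_seconds().tolist())
```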

For more advanced material, see the following references:
- https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
- https://pandas.pydata.org/docs/user_guide/timeseries.html
