## Comprehensive Guide on Pandas Datetime

## Import time-series data

### parse_dates parameter in read_csv() function

###### We use the parse_dates parameter of read_csv() to parse the date columns in the CSV file and convert them to the datetime64 type

In [2]:
import pandas as pd
import numpy as np

df=pd.read_csv('./electric_production.csv',parse_dates=['DATE'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   DATE        397 non-null    datetime64[ns]
 1   IPG2211A2N  397 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 6.3 KB
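
To make the effect of parse_dates visible without the CSV file, here is a minimal sketch that reads the same data twice from an in-memory string. The two-row CSV text is a hypothetical stand-in for electric_production.csv, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for electric_production.csv
csv_text = "DATE,IPG2211A2N\n1985-01-01,72.5052\n1985-02-01,70.6720\n"

# Without parse_dates the DATE column is read as plain text
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the DATE column is converted to a datetime64 dtype
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"])

print(plain["DATE"].dtype)
print(parsed["DATE"].dtype)
```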


### Pandas to_datetime

###### Alternatively, you can read the file without parse_dates and use to_datetime() to convert any column to datetime afterwards

In [3]:
df=pd.read_csv('./electric_production.csv')
df['DATE']=pd.to_datetime(df['DATE'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   DATE        397 non-null    datetime64[ns]
 1   IPG2211A2N  397 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 6.3 KB
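
When the dates are not in ISO format, to_datetime() can be told how to read them. The sketch below assumes a hypothetical day-first text column; format and errors are real to_datetime() parameters, but the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical day/month/year strings, plus one value that is not a date
raw = pd.Series(["01/02/1985", "15/02/1985", "not a date"])

# format= states the layout explicitly; errors="coerce" turns bad values into NaT
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
```

Stating the format avoids the day/month ambiguity of a string such as 01/02/1985, which is read here as 1 February 1985.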


### Extract Year, Month and Day from datetime using the dt accessor

#### We create three new columns, Year, month and day, by extracting each component from the DATE column

In [4]:
df['Year']=df['DATE'].dt.year
df['month']=df['DATE'].dt.month
df['day']=df['DATE'].dt.day
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day
0,1985-01-01,72.5052,1985,1,1
1,1985-02-01,70.672,1985,2,1
2,1985-03-01,62.4502,1985,3,1
3,1985-04-01,57.4714,1985,4,1
4,1985-05-01,55.3151,1985,5,1


In [5]:
# OR, equivalently but usually slower on large frames, apply a function to each element

df['Year']=df['DATE'].apply(lambda x: x.year)
df['month']=df['DATE'].apply(lambda x: x.month)
df['day']=df['DATE'].apply(lambda x: x.day)
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day
0,1985-01-01,72.5052,1985,1,1
1,1985-02-01,70.672,1985,2,1
2,1985-03-01,62.4502,1985,3,1
3,1985-04-01,57.4714,1985,4,1
4,1985-05-01,55.3151,1985,5,1
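
The dt accessor exposes more than year, month and day. A small sketch on three of the dates shown above (the column names in parts are chosen only for illustration):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1985-01-01", "1985-02-01", "1985-03-01"]))

# A few other components available through the dt accessor
parts = pd.DataFrame({
    "quarter": dates.dt.quarter,
    "weekday": dates.dt.day_name(),
    "days_in_month": dates.dt.days_in_month,
    "is_month_start": dates.dt.is_month_start,
})

print(parts)
```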


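## Time Series Aggregation
### resample to find the sum over the date index

###### resample() is a method in pandas that can be used to summarize data by date or time.

###### Let's find the yearly sum of electricity consumption. Note that the Year, month and day helper columns are summed as well, so only the IPG2211A2N column is meaningful in the output. In pandas 2.2 and later the 'Y' alias is deprecated in favour of 'YE'.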

In [6]:
df.set_index('DATE').resample('1Y').sum().head()

DATE,IPG2211A2N,Year,month,day
1985-12-31,745.988,23820,78,12
1986-12-31,752.5187,23832,78,12
1987-12-31,788.8833,23844,78,12
1988-12-31,836.5963,23856,78,12
1989-12-31,862.742,23868,78,12


### resample to find the mean over the date index

##### Let's find the mean electricity consumption for each year

In [7]:
df.set_index('DATE').resample('1Y').mean().head()

DATE,IPG2211A2N,Year,month,day
1985-12-31,62.165667,1985.0,6.5,1.0
1986-12-31,62.709892,1986.0,6.5,1.0
1987-12-31,65.740275,1987.0,6.5,1.0
1988-12-31,69.716358,1988.0,6.5,1.0
1989-12-31,71.895167,1989.0,6.5,1.0
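
In the outputs above the helper columns are aggregated together with the measurement. One way to avoid that is to select the value column before resampling. This is a sketch on synthetic monthly data (the frame and the value column are made up); it uses the 'YS' year-start alias, so the bins are labelled with 1 January rather than 31 December.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data covering two full years, not the real CSV
frame = pd.DataFrame({
    "DATE": pd.date_range("1985-01-01", periods=24, freq="MS"),
    "value": np.arange(24, dtype=float),
})
frame["Year"] = frame["DATE"].dt.year  # helper column we do not want aggregated

yearly = (
    frame.set_index("DATE")["value"]  # keep only the measurement column
         .resample("YS")              # one bin per year, labelled by year start
         .agg(["sum", "mean"])        # both aggregations in one pass
)

print(yearly)
```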


## Datetime index and slice

#### The datetime column must be set as the index of the dataframe first. Here set_index() is used to do that before indexing and slicing

#### Filter using the date

In [8]:
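# .loc selects rows by partial date string; recent pandas versions no longer accept plain [] for this
df.set_index('DATE').loc['1987'].head(2)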

DATE,IPG2211A2N,Year,month,day
1987-01-01,73.8152,1987,1,1
1987-02-01,70.062,1987,2,1


#### Filter all rows between two dates, i.e. January 1989 and April 1995 here (both ends are included)

In [9]:
df.set_index('DATE')['1989-01':'1995-04'].head()

DATE,IPG2211A2N,Year,month,day
1989-01-01,77.9188,1989,1,1
1989-02-01,76.6822,1989,2,1
1989-03-01,73.3523,1989,3,1
1989-04-01,65.1081,1989,4,1
1989-05-01,63.6892,1989,5,1
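
The same selections can be written with .loc, which accepts partial date strings on a DatetimeIndex. The sketch below uses a synthetic monthly series of the same length as the dataset (397 rows starting in January 1985); the values are placeholders.

```python
import pandas as pd

# Synthetic monthly series: 397 month starts beginning 1985-01-01
idx = pd.date_range("1985-01-01", periods=397, freq="MS")
ts = pd.Series(range(397), index=idx)

one_year = ts.loc["1987"]               # every row in 1987
window = ts.loc["1989-01":"1995-04"]    # slice is inclusive at both ends
after_2000 = ts[ts.index >= pd.Timestamp("2000-01-01")]  # boolean mask on the index

print(len(one_year), len(window), len(after_2000))
```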


### Date Offset

#### It's a kind of date increment used for a date range. As per the documentation:

#### DateOffsets work as follows. Each offset specifies a set of dates that conform to the DateOffset. For example, BDay defines this set to be the set of dates that are weekdays (M-F). To test if a date is in the set of a DateOffset dateOffset we can use the onOffset method: dateOffset.onOffset(date).

#### If a date is not on a valid date, the rollback and rollforward methods can be used to roll the date to the nearest valid date before/after the date.

#### DateOffsets can be created to move dates forward a given number of valid dates. For example, BDay(2) can be added to a date to move it two business days forward. If the date does not start on a valid date, first it is moved to a valid date.

#### Note: in current pandas versions the onOffset method is named is_on_offset.
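
The behaviour described in the quoted documentation can be sketched with the BDay offset and a Saturday. The date 2019-10-26 is chosen only for illustration.

```python
import pandas as pd

bday = pd.offsets.BDay()
saturday = pd.Timestamp("2019-10-26")  # a Saturday, so not a business day

on_offset = bday.is_on_offset(saturday)    # is the date in the set of business days?
forward = bday.rollforward(saturday)       # nearest business day on or after the date
back = bday.rollback(saturday)             # nearest business day on or before the date
plus_two = saturday + pd.offsets.BDay(2)   # move two business days forward

print(on_offset, forward.date(), back.date(), plus_two.date())
```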

#### Add one day

#### Here we add one day (a Timedelta of 1 day) to the DATE column in the dataframe and store the result in a new column called next_day

In [10]:
df['next_day']=df['DATE']+pd.Timedelta('1 day')
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day,next_day
0,1985-01-01,72.5052,1985,1,1,1985-01-02
1,1985-02-01,70.672,1985,2,1,1985-02-02
2,1985-03-01,62.4502,1985,3,1,1985-03-02
3,1985-04-01,57.4714,1985,4,1,1985-04-02
4,1985-05-01,55.3151,1985,5,1,1985-05-02


### Adding a Business day

#### Here we add one business day using the BDay offset, which only lands on Monday to Friday. If a date falls on a Friday or Saturday, adding one BDay returns the next Monday, i.e. a business day, instead of a Saturday or Sunday

In [11]:
df['next_day']=df['DATE'].apply(lambda x: x+pd.offsets.BDay(1))
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day,next_day
0,1985-01-01,72.5052,1985,1,1,1985-01-02
1,1985-02-01,70.672,1985,2,1,1985-02-04
2,1985-03-01,62.4502,1985,3,1,1985-03-04
3,1985-04-01,57.4714,1985,4,1,1985-04-02
4,1985-05-01,55.3151,1985,5,1,1985-05-02


### Add 2 days

#### Adding two calendar days to the current DATE column using the days parameter of DateOffset and creating a new column day_after

In [12]:
df['day_after']=df['DATE'].apply(lambda x: x+pd.DateOffset(days=2))
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day,next_day,day_after
0,1985-01-01,72.5052,1985,1,1,1985-01-02,1985-01-03
1,1985-02-01,70.672,1985,2,1,1985-02-04,1985-02-03
2,1985-03-01,62.4502,1985,3,1,1985-03-04,1985-03-03
3,1985-04-01,57.4714,1985,4,1,1985-04-02,1985-04-03
4,1985-05-01,55.3151,1985,5,1,1985-05-02,1985-05-03


### Add next month date

#### Adding a month to the DATE column using months parameter

In [13]:
df['next_month_day']=df['DATE'].apply(lambda x: x+pd.DateOffset(months=1))
df.head()

Unnamed: 0,DATE,IPG2211A2N,Year,month,day,next_day,day_after,next_month_day
0,1985-01-01,72.5052,1985,1,1,1985-01-02,1985-01-03,1985-02-01
1,1985-02-01,70.672,1985,2,1,1985-02-04,1985-02-03,1985-03-01
2,1985-03-01,62.4502,1985,3,1,1985-03-04,1985-03-03,1985-04-01
3,1985-04-01,57.4714,1985,4,1,1985-04-02,1985-04-03,1985-05-01
4,1985-05-01,55.3151,1985,5,1,1985-05-02,1985-05-03,1985-06-01
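
The apply() calls above work element by element. As a sketch of an alternative, an offset can also be added directly to a datetime column. The three dates here are hypothetical and were picked to show a Saturday and a month end.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1985-01-31", "1985-02-01", "1985-03-02"]))

# Add the offset to the whole column at once instead of using apply()
next_bday = dates + pd.offsets.BDay(1)
next_month = dates + pd.DateOffset(months=1)

# Same result as the element-wise version used above
via_apply = dates.apply(lambda x: x + pd.offsets.BDay(1))

print(pd.DataFrame({"date": dates, "next_bday": next_bday, "next_month": next_month}))
```

Note that 1985-01-31 plus one month gives 1985-02-28, because DateOffset(months=1) clips to the last valid day of the target month.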


#### For the complete list of parameters check this link
#### https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html

## Using date_range to create datetime index

#### date_range() returns a DatetimeIndex, an immutable array of datetime64 values. We will see how to create a datetime index and then build a dataframe from it

### Datetime index with Hourly frequency

#### It gives an array of dates and times starting from '2018-01-01' with an hourly frequency; periods=3 means three elements in total. In pandas 2.2 and later the 'H' alias is deprecated in favour of lowercase 'h'

In [14]:
import pandas as pd
dti = pd.date_range('2018-01-01', periods=3, freq='H')
dti

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

### Monthly Frequency

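#### Now change the frequency to monthly and create an array of 10 dates. The 'M' alias means month end ('ME' in pandas 2.2 and later), so each date is the last day of its month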

In [15]:
index = pd.date_range('2018-01-01',periods=10, freq='M')
index

DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
               '2018-09-30', '2018-10-31'],
              dtype='datetime64[ns]', freq='M')
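
Monthly frequencies come in a month-end and a month-start flavour. The sketch below passes an offset object for month end, which avoids depending on the alias spelling of a particular pandas version, and uses the 'MS' alias for month start.

```python
import pandas as pd

# Month-end dates, using an offset object instead of a string alias
month_ends = pd.date_range("2018-01-01", periods=3, freq=pd.offsets.MonthEnd())

# Month-start dates, using the 'MS' alias
month_starts = pd.date_range("2018-01-01", periods=3, freq="MS")

print(month_ends)
print(month_starts)
```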

### Weekly Frequency with start and end

#### Change the frequency to weekly and create dates between two dates using start and end. The default weekly anchor is Sunday (W-SUN), so every date returned is a Sunday

In [16]:
pd.date_range(start='2019-01-01', end='2019-04-30', freq='W')

DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',
               '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',
               '2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24',
               '2019-03-31', '2019-04-07', '2019-04-14', '2019-04-21',
               '2019-04-28'],
              dtype='datetime64[ns]', freq='W-SUN')
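
The weekly anchor can be changed, and a business-day frequency skips weekends. A short sketch, with the date ranges chosen only for illustration:

```python
import pandas as pd

# Weekly dates anchored on Monday instead of the default Sunday
mondays = pd.date_range(start="2019-01-01", end="2019-01-31", freq="W-MON")

# Business days only (Monday to Friday)
business_days = pd.date_range(start="2019-01-01", end="2019-01-10", freq="B")

print(mondays)
print(business_days)
```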

### Datetime index with start and end

In [17]:
import datetime
start = datetime.datetime(2011, 1, 1)

end = datetime.datetime(2011, 2, 1)

index = pd.date_range(start, end)
index

DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14', '2011-01-15', '2011-01-16',
               '2011-01-17', '2011-01-18', '2011-01-19', '2011-01-20',
               '2011-01-21', '2011-01-22', '2011-01-23', '2011-01-24',
               '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28',
               '2011-01-29', '2011-01-30', '2011-01-31', '2011-02-01'],
              dtype='datetime64[ns]', freq='D')

### Create dataframe using date time index

#### Create dataframe with datetime as index

#### Here the index, dti, is the date_range created above with hourly frequency

In [37]:
import numpy as np
df= pd.DataFrame({'price':np.random.uniform(0,20,size=3)},index=dti)
df

Unnamed: 0,price
2018-01-01 00:00:00,12.330101
2018-01-01 01:00:00,1.623283
2018-01-01 02:00:00,9.879159


#### Create dataframe with datetime as a column

In [38]:
import numpy as np
df= pd.DataFrame({'price':np.random.uniform(0,20,size=3),'date':dti})
df

Unnamed: 0,price,date
0,6.398526,2018-01-01 00:00:00
1,7.160035,2018-01-01 01:00:00
2,2.347623,2018-01-01 02:00:00


### Datetime Index using Origin Parameter

#### You can pass an origin date together with a list of numbers; each number is interpreted in the given unit (days here) and added to the origin. Here the origin is 2019-10-25,
#### so adding 1 day gives 2019-10-26, and adding 2 and 3 days gives 2019-10-27 and 2019-10-28 respectively

In [39]:
pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('2019-10-25'))

DatetimeIndex(['2019-10-26', '2019-10-27', '2019-10-28'], dtype='datetime64[ns]', freq=None)
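
One way to read the origin parameter is as "origin plus a timedelta". The sketch below checks that reading by building the same dates with to_timedelta(); the variable names are illustrative.

```python
import pandas as pd

origin = pd.Timestamp("2019-10-25")

# Numbers interpreted as days counted from the origin
via_origin = pd.to_datetime([1, 2, 3], unit="D", origin=origin)

# The same dates built explicitly as origin + timedelta
via_timedelta = origin + pd.to_timedelta([1, 2, 3], unit="D")

print(via_origin)
print((via_origin == via_timedelta).all())
```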

### Using Truncate

#### truncate() takes two parameters, before and after, and drops the rows whose index values fall before or after them

In [29]:
df.truncate(after='2018-10')

Unnamed: 0,price,date
0,3.781862,2018-01-01 00:00:00
1,4.087518,2018-01-01 01:00:00
2,7.985541,2018-01-01 02:00:00
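
truncate() works on the index, so it is easiest to follow on a dataframe with a sorted DatetimeIndex. A minimal sketch with made-up daily prices:

```python
import pandas as pd

# Hypothetical daily prices on a sorted DatetimeIndex
idx = pd.date_range("2018-01-01", periods=6, freq="D")
prices = pd.DataFrame({"price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}, index=idx)

# Keep rows from 2018-01-02 through 2018-01-04 (both bounds are kept)
middle = prices.truncate(before="2018-01-02", after="2018-01-04")

# Keep everything up to and including 2018-01-03
early = prices.truncate(after="2018-01-03")

print(middle)
print(early)
```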


### Difference between two date columns

#### Let's see how to find the difference between two datetime columns in a dataframe in terms of days, seconds, etc.

In [40]:
import pandas as pd
from datetime import datetime
import numpy as np

# create dataframe
df = pd.DataFrame(data=[['A', '2019-10-06T12:25:53', '2019-10-04T10:10:53'],
                        ['A', '2019-10-04T10:10:53', '2019-10-01T08:10:53'],
                        ['B', '2019-10-01T08:10:53', '2019-09-23T01:24:53'],
                        ['B', '2019-09-23T01:24:53', '2019-09-23T15:58:17']],
                  columns=['Letter', 'First_Day', 'Last_Day'])

df['First_Day']=pd.to_datetime(df['First_Day'])
df['Last_Day']=pd.to_datetime(df['Last_Day'])
df

Unnamed: 0,Letter,First_Day,Last_Day
0,A,2019-10-06 12:25:53,2019-10-04 10:10:53
1,A,2019-10-04 10:10:53,2019-10-01 08:10:53
2,B,2019-10-01 08:10:53,2019-09-23 01:24:53
3,B,2019-09-23 01:24:53,2019-09-23 15:58:17


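### Difference between two dates in days and hours

#### diff holds the whole days, diff_time_delta the full Timedelta, and diff-simple_subtract the number of whole hours (total seconds floor-divided by 3600)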

In [41]:
df['diff']=(pd.to_datetime(df['First_Day']) - pd.to_datetime(df['Last_Day'])).dt.days
df['diff_time_delta']=df['First_Day']-df['Last_Day']
df['diff-simple_subtract']=((df['First_Day']-df['Last_Day']).dt.total_seconds())//3600

In [42]:
df.head()

Unnamed: 0,Letter,First_Day,Last_Day,diff,diff_time_delta,diff-simple_subtract
0,A,2019-10-06 12:25:53,2019-10-04 10:10:53,2,2 days 02:15:00,50.0
1,A,2019-10-04 10:10:53,2019-10-01 08:10:53,3,3 days 02:00:00,74.0
2,B,2019-10-01 08:10:53,2019-09-23 01:24:53,8,8 days 06:46:00,198.0
3,B,2019-09-23 01:24:53,2019-09-23 15:58:17,-1,-1 days +09:26:36,-15.0
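
The last row shows that negative differences can be surprising: the days part is -1 and floor division gives -15 hours, although the gap is about 14.5 hours. As a sketch of an alternative, dividing by a Timedelta gives fractional hours. The two rows below are copied from the first and last rows of the dataframe above.

```python
import pandas as pd

first = pd.to_datetime(pd.Series(["2019-10-06T12:25:53", "2019-09-23T01:24:53"]))
last = pd.to_datetime(pd.Series(["2019-10-04T10:10:53", "2019-09-23T15:58:17"]))

delta = first - last

hours = delta / pd.Timedelta(hours=1)        # fractional hours, sign preserved
whole_days = delta.dt.days                   # days part only (floors negative values)
abs_components = delta.abs().dt.components   # days/hours/minutes/... of the absolute gap

print(hours)
print(whole_days)
print(abs_components[["days", "hours", "minutes", "seconds"]])
```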
