# Working with Dates in Pandas


In [25]:
import pandas as pd
from random import randint
from datetime import timedelta, date

## Date Ranges 
- `pandas.date_range(start, end)` is the simplest use which produces a `DatetimeIndex` which can be converted to a Series object
    - see [the official documentation](https://pandas.pydata.org/docs/reference/api/pandas.date_range.html)

### Date Range Examples

#### Date Range from Date Strings

In [24]:
# Daily Range
dr = pd.date_range('2000-01-01', '2000-01-03')
dr

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]', freq='D')

#### From Date Objects

In [26]:
# Daily Range from Date types
dr = pd.date_range(date(2000, 1, 1), date(2000, 1, 3))
dr

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]', freq='D')

#### With a Monthly Frequency
See [frequency string](https://pandas.pydata.org/pandas-docs/dev/user_guide/timeseries.html#timeseries-offset-aliases) documentation for more information.

In [29]:
# Month end Frequency
dr = pd.date_range(start='2000-01-01', end='2000-04-01', freq='M')
dr

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='M')

In [31]:
# Month end Frequency with a number of periods
dr = pd.date_range(start='2000-01-01', periods=3, freq='M')
dr

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='M')

In [32]:
# Month begin Frequency with a number of periods
dr = pd.date_range(start='2000-01-01', periods=3, freq='MS')
dr

DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01'], dtype='datetime64[ns]', freq='MS')

#### With a Weekly Frequency
- Always note which day weeks start on.

In [40]:
# Weekly 
dr = pd.date_range(start='2000-01-01', periods=3, freq='W')
dr

DatetimeIndex(['2000-01-02', '2000-01-09', '2000-01-16'], dtype='datetime64[ns]', freq='W-SUN')

In [41]:
# DateTimeIndex objects have a day_name() method (locale specific)
dr.day_name()

Index(['Sunday', 'Sunday', 'Sunday'], dtype='object')

In [42]:
# Weekly with a Monday start date
dr = pd.date_range(start='2000-01-01', periods=3, freq='W-MON')
dr

DatetimeIndex(['2000-01-03', '2000-01-10', '2000-01-17'], dtype='datetime64[ns]', freq='W-MON')

## Dates in Dataframe Columns
Here, we create a simple dataframe for the purpose of demonstrating date methods on dataframe columns.

The official documentation extensive examples for:
- [datetime-like attributes](https://pandas.pydata.org/docs/reference/series.html#datetime-properties)
- [datetime-like methods](https://pandas.pydata.org/docs/reference/series.html#datetime-methods)

In [45]:
# DatetimeIndex
start_date = pd.date_range('2000-01-01', '2000-01-05')

# Some random timedeltas (in days)
days = pd.Series([timedelta(days=randint(0, 4)) for _ in range(5)])

# Add the timedeltas to the index.  This produces a pd.Series
end_date = d_start + days

# Create a pd.DataFrame adding column names and a labeled 'event' column
df = pd.DataFrame(
    {
        'event': list('ABCDE'),
        'start_date': d_start,
        'end_date': d_end,
    }
)
df

Unnamed: 0,event,start_date,end_date
0,A,2000-01-01,2000-01-02
1,B,2000-01-02,2000-01-06
2,C,2000-01-03,2000-01-07
3,D,2000-01-04,2000-01-04
4,E,2000-01-05,2000-01-08


### Add Day Name Columns

In [48]:
df2 = (
    df
    .assign(start_day=df.start_date.dt.day_name())
    .assign(end_day=df.end_date.dt.day_name())
)
df2

Unnamed: 0,event,start_date,end_date,start_day,end_day
0,A,2000-01-01,2000-01-02,Saturday,Sunday
1,B,2000-01-02,2000-01-06,Sunday,Thursday
2,C,2000-01-03,2000-01-07,Monday,Friday
3,D,2000-01-04,2000-01-04,Tuesday,Tuesday
4,E,2000-01-05,2000-01-08,Wednesday,Saturday


In [49]:
# re-order columns
df2 = df2[['event', 'start_day', 'start_date',
           'end_day', 'end_date',          
]]
df2

Unnamed: 0,event,start_day,start_date,end_day,end_date
0,A,Saturday,2000-01-01,Sunday,2000-01-02
1,B,Sunday,2000-01-02,Thursday,2000-01-06
2,C,Monday,2000-01-03,Friday,2000-01-07
3,D,Tuesday,2000-01-04,Tuesday,2000-01-04
4,E,Wednesday,2000-01-05,Saturday,2000-01-08


### Add an event length column
Determine the duration of an event in days and add it to the dataframe.

In [54]:
(
    df2
    .assign(event_duration=(df2.end_date - df2.start_date).dt.days)
)
df2.end_date

0   2000-01-02
1   2000-01-06
2   2000-01-07
3   2000-01-04
4   2000-01-08
Name: end_date, dtype: datetime64[ns]

#### How it works
- `df2.end_date` is a `pd.Series` with `dtype` `datetime`. 
- `df2.end_date - df2.start_date` is a `pd.Series` with `dtype` `timedelta`.
- `timedelta` [attributes](https://pandas.pydata.org/docs/reference/series.html#timedelta-properties) provide access to the time difference in days. 

We can see the build up in the next few cells.

In [55]:
df2.end_date - df2.start_date

0   1 days
1   4 days
2   4 days
3   0 days
4   3 days
dtype: timedelta64[ns]

In [57]:
event_timedelta = df2.end_date - df2.start_date
event_timedelta

0   1 days
1   4 days
2   4 days
3   0 days
4   3 days
dtype: timedelta64[ns]

In [60]:
# use dt methods on Series with timedelta types.
event_timedelta.dt.days

0    1
1    4
2    4
3    0
4    3
dtype: int64

### Combine all operations using Chained Methods
On the original dataframe...

In [65]:
columns = ['event', 'start_day', 'start_date',
           'end_day', 'end_date', 'event_duration']

(
    df
    .assign(start_day=df.start_date.dt.day_name())
    .assign(end_day=df.end_date.dt.day_name())
    .assign(event_duration=(df.end_date - df.start_date).dt.days)
    .reindex(columns=columns) # Change the order of the columns
)

Unnamed: 0,event,start_day,start_date,end_day,end_date,event_duration
0,A,Saturday,2000-01-01,Sunday,2000-01-02,1
1,B,Sunday,2000-01-02,Thursday,2000-01-06,4
2,C,Monday,2000-01-03,Friday,2000-01-07,4
3,D,Tuesday,2000-01-04,Tuesday,2000-01-04,0
4,E,Wednesday,2000-01-05,Saturday,2000-01-08,3
