# Review of Python's datetime objects

In [3]:
import pandas as pd
import datetime as dt

## Creating a date object

In [4]:
someday = dt.date(2018, 12, 1)
someday

datetime.date(2018, 12, 1)

In [5]:
print(someday.year)
print(someday.month)
print(someday.day)

2018
12
1


---
## Creating a datetime object

If we specify only the date, the time defaults to midnight, that is, 00:00.

In [6]:
dt.datetime(2018,12,1)

datetime.datetime(2018, 12, 1, 0, 0)

<br/>
Creating another `datetime` object, specifying the date and the hour.

In [8]:
dt.datetime(2018,12, 1, 8)

datetime.datetime(2018, 12, 1, 8, 0)

<br/>
Creating another `datetime` object, this time with the hour, minute, and second.

In [9]:
dt.datetime(2018,12,1,17,30,45)

datetime.datetime(2018, 12, 1, 17, 30, 45)

<br/>
Converting a `datetime` to a string gives a readable `YYYY-MM-DD HH:MM:SS` representation.

In [10]:
str(dt.datetime(2018,12,1,9,12,23))

'2018-12-01 09:12:23'
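When a different layout is needed, the standard library's `isoformat` and `strftime` give finer control than `str`. A minimal sketch; the variable names are only illustrative:

```python
import datetime as dt

moment = dt.datetime(2018, 12, 1, 9, 12, 23)

# ISO 8601 layout, with a "T" between the date and the time
iso_text = moment.isoformat()

# Custom layout built from strftime format codes
custom_text = moment.strftime("%d/%m/%Y %H:%M")

print(iso_text)     # 2018-12-01T09:12:23
print(custom_text)  # 01/12/2018 09:12
```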

---

# Transitioning to Pandas

In [11]:
import pandas as pd

## The Pandas `Timestamp` object
`Timestamp` can parse strings from a variety of formats

In [12]:
pd.Timestamp("2018-09-07")

Timestamp('2018-09-07 00:00:00')

In [13]:
pd.Timestamp("2018, 09, 07")

Timestamp('2018-09-07 00:00:00')

In [14]:
pd.Timestamp("2018/09/07")

Timestamp('2018-09-07 00:00:00')

In [15]:
pd.Timestamp("September 7, 2018")

Timestamp('2018-09-07 00:00:00')

In [16]:
pd.Timestamp("7 Sep 2018")

Timestamp('2018-09-07 00:00:00')

In [17]:
pd.Timestamp("2018 Sept 7")

Timestamp('2018-09-07 00:00:00')

In [18]:
pd.Timestamp("09/07/2018") #MM/DD/YYYY

Timestamp('2018-09-07 00:00:00')

In [19]:
pd.Timestamp("2018-09-07 8:35:12")

Timestamp('2018-09-07 08:35:12')

In [20]:
pd.Timestamp("2018-09-07 8:35:12 PM")

Timestamp('2018-09-07 20:35:12')

`Timestamp` also accepts Python `date` and `datetime` objects.

In [22]:
pd.Timestamp(dt.date(2018,9,7))

Timestamp('2018-09-07 00:00:00')
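A `Timestamp` can be used much like a standard `datetime`, since it subclasses `datetime.datetime`. The sketch below assumes nothing beyond the two imports already used in this notebook; the variable names are illustrative:

```python
import datetime as dt
import pandas as pd

ts = pd.Timestamp(dt.datetime(2018, 9, 7, 8, 35, 12))

# Timestamp subclasses datetime.datetime, so the familiar attributes work
is_datetime = isinstance(ts, dt.datetime)
parts = (ts.year, ts.month, ts.day, ts.hour)

# pandas adds conveniences such as day_name()
weekday = ts.day_name()

print(is_datetime, parts, weekday)
```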

---
## The `pandas` `DatetimeIndex` object

Creating a `DatetimeIndex` object from a list of date strings

In [25]:
dates = ["2018-09-07", "2018-07-09", "2018-10-08"]
pd.DatetimeIndex(dates)

DatetimeIndex(['2018-09-07', '2018-07-09', '2018-10-08'], dtype='datetime64[ns]', freq=None)

Creating a `DatetimeIndex` object from a list of Python `date` objects

In [21]:
dates = [dt.date(2018,9,7), dt.date(2018,7,9), dt.date(2018,10,8)]
dtIndex = pd.DatetimeIndex(dates)
dtIndex

DatetimeIndex(['2018-09-07', '2018-07-09', '2018-10-08'], dtype='datetime64[ns]', freq=None)




A `DatetimeIndex` is typically used as the index of a `pandas` `Series` or `DataFrame`.

In [22]:
values = [100, 200, 300]
pd.Series(data = values, index= dtIndex)

2018-09-07    100
2018-07-09    200
2018-10-08    300
dtype: int64
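One reason to put dates in the index is that values can then be looked up and sliced by date. A small sketch reusing the same three dates; the names `sales` and `autumn` are made up for illustration:

```python
import pandas as pd

dt_index = pd.DatetimeIndex(["2018-09-07", "2018-07-09", "2018-10-08"])
sales = pd.Series([100, 200, 300], index=dt_index)

# Exact lookup with a Timestamp label
single = sales.loc[pd.Timestamp("2018-07-09")]

# Sort the index first, then slice a date range with plain strings
autumn = sales.sort_index().loc["2018-08-01":"2018-12-31"]

print(single)
print(autumn)
```

Sorting before slicing matters here because the dates were not entered in chronological order.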

---

## Pandas `pd.to_datetime(..)` method
`pd.to_datetime(..)` is extremely versatile and can convert from a variety of date and time formats.

For example,

In [23]:
pd.to_datetime('2001-09-07')

Timestamp('2001-09-07 00:00:00')

In [24]:
pd.to_datetime(dt.date(2004,9,7))

Timestamp('2004-09-07 00:00:00')

In [25]:
pd.to_datetime(dt.datetime(2005,9,7,14,23))

Timestamp('2005-09-07 14:23:00')

In [27]:
pd.to_datetime(['2013-09-07', '2016/09/07', '7 September 2018', 'Oct 5, 2017', '2016 Nov 19th'])

DatetimeIndex(['2013-09-07', '2016-09-07', '2018-09-07', '2017-10-05',
               '2016-11-19'],
              dtype='datetime64[ns]', freq=None)

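Note how the `pd.to_datetime(..)` method was able to parse dates from several different string formats. In pandas 2.0 and later, a list that mixes formats like this one may need `format='mixed'` to be parsed element by element.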

If we provide a list with one or more elements that cannot possibly be converted to dates, then `pd.to_datetime(..)` either raises an error or returns `NaT` for those elements, depending on the additional parameters passed to it.

**ERROR**
```
dates = pd.Series(['2013-09-07', '2016/09/07', '30-Feb-2019', 'Hello'])
pd.to_datetime(dates)
```

This is because the `errors` parameter of `pd.to_datetime(..)` defaults to `'raise'`.
If it is set to `errors='coerce'`, then `NaT` is returned for the values that cannot be parsed.

In [33]:
dates = pd.Series(['2013-09-07', '2016/09/07', '30-Feb-2019', 'Hello'])
pd.to_datetime(dates, errors='coerce')

0   2013-09-07
1   2016-09-07
2          NaT
3          NaT
dtype: datetime64[ns]
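Coercion combines well with an explicit `format`, which makes the parsing rule unambiguous. The sketch below is an illustration rather than part of the original notebook: it assumes every valid entry follows `%Y-%m-%d`, and the names `raw`, `parsed` and `failed` are hypothetical.

```python
import pandas as pd

raw = pd.Series(["2013-09-07", "2016-09-07", "30-Feb-2019", "Hello"])

# With an explicit format, anything that does not match becomes NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Boolean mask of the rows that failed to parse
failed = parsed.isna()

print(raw[failed].tolist())  # ['30-Feb-2019', 'Hello']
```

Inspecting `raw[failed]` is a quick way to see which inputs were silently turned into `NaT`.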

### Using the Unix format
In the Unix format, time is stored as the number of seconds elapsed since January 1, 1970 (00:00 UTC).

In [53]:
pd.to_datetime([1235433535, 1549710100], unit = 's')

DatetimeIndex(['2009-02-23 23:58:55', '2019-02-09 11:01:40'], dtype='datetime64[ns]', freq=None)
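The `unit` parameter also accepts other resolutions such as `'ms'`, and the conversion can be reversed by dividing the distance from the epoch by one second. A minimal sketch of that round trip, with illustrative variable names:

```python
import pandas as pd

# Same instant as above, expressed in milliseconds instead of seconds
ts = pd.to_datetime(1235433535000, unit="ms")

# Back to whole seconds since the epoch
epoch = pd.Timestamp("1970-01-01")
seconds = (ts - epoch) // pd.Timedelta("1s")

print(ts)       # 2009-02-23 23:58:55
print(seconds)  # 1235433535
```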

---
## The `pd.date_range(..)` method
**Creates a range of dates**

### Creating date ranges by specifying start and end date

In [56]:
pd.date_range(start='2018-01-01', end='2018-01-10', freq='D')

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10'],
              dtype='datetime64[ns]', freq='D')

This creates a `DatetimeIndex` that runs from the start date to the end date in steps of one day, because we passed `'D'` (short for days) to the `freq` parameter.

We can play around with the `freq` parameter to explore different options available to us.

In [57]:
# Explicitly passing an interval of 1 day.
pd.date_range(start='2018-01-01', end='2018-01-10', freq='1D')

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10'],
              dtype='datetime64[ns]', freq='D')

In [58]:
# An interval of 2 days. NOTE that the end date is not included, since it does not fall on a 2-day step from the start.
pd.date_range(start='2018-01-01', end='2018-01-10', freq='2D')

DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05', '2018-01-07',
               '2018-01-09'],
              dtype='datetime64[ns]', freq='2D')

In [59]:
# Only the Business days instead of all days
pd.date_range(start='2018-01-01', end='2018-01-10', freq='B')

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-08', '2018-01-09', '2018-01-10'],
              dtype='datetime64[ns]', freq='B')

**Note:** 2018-01-06 and 2018-01-07 were weekend days, and hence excluded from the returned `DatetimeIndex`.

In [64]:
# One date per week; the plain 'W' alias anchors on Sundays ('W-SUN')
pd.date_range(start='2018-01-01', end='2018-01-20', freq='W')

DatetimeIndex(['2018-01-07', '2018-01-14'], dtype='datetime64[ns]', freq='W-SUN')

**Note:** Both 2018-01-07 and 2018-01-14 were Sundays, because `'W'` is shorthand for `'W-SUN'`. The start date, 2018-01-01, was a Monday and is therefore not included.

In [69]:
# Only the week start dates, explicitly specifying Monday as the week start day
pd.date_range(start='2018-01-01', end='2018-01-20', freq='W-MON')

DatetimeIndex(['2018-01-01', '2018-01-08', '2018-01-15'], dtype='datetime64[ns]', freq='W-MON')
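One way to check which weekday a weekly alias anchors on is to ask the resulting index for its day names. A small sketch using the same ranges as above; the variable names are illustrative:

```python
import pandas as pd

default_weeks = pd.date_range(start="2018-01-01", end="2018-01-20", freq="W")
monday_weeks = pd.date_range(start="2018-01-01", end="2018-01-20", freq="W-MON")

# Collect the distinct weekday names in each index
default_days = sorted(set(default_weeks.day_name()))
monday_days = sorted(set(monday_weeks.day_name()))

print(default_days)  # ['Sunday']
print(monday_days)   # ['Monday']
```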

In [65]:
# Can also specify the frequency in hours (recent pandas versions prefer the lowercase alias '6h')
pd.date_range(start='2019-01-09', end='2019-01-10', freq='6H')

DatetimeIndex(['2019-01-09 00:00:00', '2019-01-09 06:00:00',
               '2019-01-09 12:00:00', '2019-01-09 18:00:00',
               '2019-01-10 00:00:00'],
              dtype='datetime64[ns]', freq='6H')

In [72]:
# Last date of every month (recent pandas versions spell this alias 'ME')
pd.date_range(start='2019-01-12', end='2019-09-07',freq='M')

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31'],
              dtype='datetime64[ns]', freq='M')

In [73]:
# Month start date for every month in range
pd.date_range(start='2019-01-12', end='2019-09-07',freq='MS')

DatetimeIndex(['2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01',
               '2019-06-01', '2019-07-01', '2019-08-01', '2019-09-01'],
              dtype='datetime64[ns]', freq='MS')

In [74]:
# Last date for each year in range (recent pandas versions spell this alias 'YE')
pd.date_range(start='2019-01-12', end='2030-09-07',freq='A')

DatetimeIndex(['2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31',
               '2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31',
               '2027-12-31', '2028-12-31', '2029-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [76]:
# First date for each year in range (recent pandas versions spell this alias 'YS')
pd.date_range(start='2019-01-12', end='2030-09-07',freq='AS')

DatetimeIndex(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01',
               '2024-01-01', '2025-01-01', '2026-01-01', '2027-01-01',
               '2028-01-01', '2029-01-01', '2030-01-01'],
              dtype='datetime64[ns]', freq='AS-JAN')

In [77]:
# First date for each year in range, treating April as the start of the year
pd.date_range(start='2019-01-12', end='2030-09-07',freq='AS-APR')

DatetimeIndex(['2019-04-01', '2020-04-01', '2021-04-01', '2022-04-01',
               '2023-04-01', '2024-04-01', '2025-04-01', '2026-04-01',
               '2027-04-01', '2028-04-01', '2029-04-01', '2030-04-01'],
              dtype='datetime64[ns]', freq='AS-APR')

### Creating date ranges by specifying the start date and periods
The `periods` parameter specifies the number of dates that we want to get.

In [78]:
pd.date_range(start='2018-09-07', periods = 10, freq='1W')

DatetimeIndex(['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-30',
               '2018-10-07', '2018-10-14', '2018-10-21', '2018-10-28',
               '2018-11-04', '2018-11-11'],
              dtype='datetime64[ns]', freq='W-SUN')

This created a `DatetimeIndex` with 10 dates, all Sundays, beginning with the first Sunday on or after the start date.


Now, let's create a `DatetimeIndex` with 30 business days (that is, weekends won't appear in the result).

In [79]:
pd.date_range(start='2019-01-9', periods = 30, freq='B')

DatetimeIndex(['2019-01-09', '2019-01-10', '2019-01-11', '2019-01-14',
               '2019-01-15', '2019-01-16', '2019-01-17', '2019-01-18',
               '2019-01-21', '2019-01-22', '2019-01-23', '2019-01-24',
               '2019-01-25', '2019-01-28', '2019-01-29', '2019-01-30',
               '2019-01-31', '2019-02-01', '2019-02-04', '2019-02-05',
               '2019-02-06', '2019-02-07', '2019-02-08', '2019-02-11',
               '2019-02-12', '2019-02-13', '2019-02-14', '2019-02-15',
               '2019-02-18', '2019-02-19'],
              dtype='datetime64[ns]', freq='B')

**Note:** We can use any of the `freq` parameter value options mentioned above.

---

### Other parameters to `pd.date_range(..)` method

Passing `end` together with `periods` counts backwards: here we get the 20 days ending on December 31, 1999.

In [82]:
pd.date_range(end='1999-12-31', periods=20, freq='D')

DatetimeIndex(['1999-12-12', '1999-12-13', '1999-12-14', '1999-12-15',
               '1999-12-16', '1999-12-17', '1999-12-18', '1999-12-19',
               '1999-12-20', '1999-12-21', '1999-12-22', '1999-12-23',
               '1999-12-24', '1999-12-25', '1999-12-26', '1999-12-27',
               '1999-12-28', '1999-12-29', '1999-12-30', '1999-12-31'],
              dtype='datetime64[ns]', freq='D')

**NOTE:**  All the parameter value options to the `freq` parameter mentioned above are valid here as well.
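`pd.date_range(..)` also accepts `start`, `end` and `periods` together with `freq` left out, in which case the dates are spaced evenly between the two endpoints. A minimal sketch with illustrative dates:

```python
import pandas as pd

# Three evenly spaced dates between the two endpoints (both included)
evenly_spaced = pd.date_range(start="2018-01-01", end="2018-01-05", periods=3)

labels = evenly_spaced.strftime("%Y-%m-%d").tolist()
print(labels)  # ['2018-01-01', '2018-01-03', '2018-01-05']
```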

---

## The `.dt` accessor
Just as the `.str` accessor is needed to call string methods on a `Series`, the `.dt` accessor is needed to reach date and time attributes and methods on a `Series` of datetimes.

Creating a `DatetimeIndex` with the `pd.date_range(..)` method

In [87]:
bunch_of_dates = pd.date_range(start='2019-01-01', end='2025-09-07', freq='24D')
bunch_of_dates

DatetimeIndex(['2019-01-01', '2019-01-25', '2019-02-18', '2019-03-14',
               '2019-04-07', '2019-05-01', '2019-05-25', '2019-06-18',
               '2019-07-12', '2019-08-05',
               ...
               '2025-01-17', '2025-02-10', '2025-03-06', '2025-03-30',
               '2025-04-23', '2025-05-17', '2025-06-10', '2025-07-04',
               '2025-07-28', '2025-08-21'],
              dtype='datetime64[ns]', length=102, freq='24D')

In [89]:
s = pd.Series(bunch_of_dates)
s.head(3)

0   2019-01-01
1   2019-01-25
2   2019-02-18
dtype: datetime64[ns]

In [91]:
s.dt.day.head(3)

0     1
1    25
2    18
dtype: int64

In [92]:
s.dt.month.head(3)

0    1
1    1
2    2
dtype: int64

In [93]:
s.dt.day_name().head(3)

0    Tuesday
1     Friday
2     Monday
dtype: object

In [95]:
s[s.dt.is_quarter_start]

0    2019-01-01
19   2020-04-01
38   2021-07-01
dtype: datetime64[ns]

In [96]:
s[s.dt.is_month_end]

57   2022-09-30
90   2024-11-30
dtype: datetime64[ns]
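The `.dt` attributes can also be combined into ordinary boolean filters and summaries. The sketch below rebuilds the same series and is only one possible illustration; the names `in_2019` and `weekday_counts` are hypothetical:

```python
import pandas as pd

s = pd.Series(pd.date_range(start="2019-01-01", end="2025-09-07", freq="24D"))

# Keep only the dates that fall in 2019
in_2019 = s[s.dt.year == 2019]

# Count how many of those dates land on each weekday
weekday_counts = in_2019.dt.day_name().value_counts()

print(len(in_2019))  # 16
print(weekday_counts)
```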