# Playing with Time Series data (datetime) in Pandas

## Imports

In [14]:
from faker import Faker
import numpy as np
import pandas as pd

## Creating Sample Time Series Data

This approach uses the `faker` library to generate random `datetime` instances. For a broader discussion of generating random dates between two bounds, see this Stack Overflow question:

https://stackoverflow.com/questions/553303/generate-a-random-date-between-two-other-dates

In [83]:
# Instantiate a faker object for sample datetime generation
fake = Faker()

Let's test out creating one sample `datetime` first:

In [86]:
fake.date_time_between(start_date='-10y', end_date='now')

datetime.datetime(2014, 10, 1, 6, 58, 47)

We can also generate a plain `date` object, here one that falls in the future:

In [87]:
fake.date_between(start_date='today', end_date='+10y')

datetime.date(2023, 2, 22)

Now we can use a Python list comprehension to create a batch of samples across a range:

In [91]:
datetimes = [fake.date_time_between(start_date='-30y', end_date='now') for i in range(10)]; datetimes

[datetime.datetime(2008, 12, 2, 15, 0, 5),
 datetime.datetime(2006, 4, 14, 12, 44, 6),
 datetime.datetime(2019, 10, 1, 17, 45, 42),
 datetime.datetime(2009, 9, 25, 15, 22, 49),
 datetime.datetime(2001, 3, 6, 8, 10, 3),
 datetime.datetime(1996, 5, 31, 9, 34, 49),
 datetime.datetime(1992, 9, 25, 1, 45, 31),
 datetime.datetime(2015, 5, 28, 0, 27, 56),
 datetime.datetime(1990, 11, 26, 15, 29, 14),
 datetime.datetime(2008, 5, 4, 11, 11, 42)]
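
For comparison, the same kind of sample can be produced with only the standard library, which is roughly the idea discussed in the Stack Overflow thread linked earlier. This is a minimal sketch, not how `faker` works internally: the helper name `random_datetime`, the fixed bounds, and the seed are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta


def random_datetime(start, end, rng=random):
    """Return a random datetime between start and end (inclusive), in whole seconds."""
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span_seconds))


# A seeded generator makes the sample reproducible between runs (hypothetical seed)
rng = random.Random(42)
start = datetime(2000, 1, 1)
end = datetime(2020, 1, 1)

samples = [random_datetime(start, end, rng) for _ in range(5)]
print(samples)
```

Picking a random offset in seconds keeps the helper simple; sub-second precision would need a finer unit in the `timedelta`.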

## Create a pandas dataframe

Next, we can put our sample dates into a pandas dataframe for processing.

In [94]:
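# Wrap the list in a Series, then promote it to a DataFrame:
# reset_index moves the positional index into an 'index' column
# and names the values column 'datetime'
df = pd.Series(datetimes)
df = df.reset_index(name="datetime")
df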

| | index | datetime |
|---|---|---|
| 0 | 0 | 2008-12-02 15:00:05 |
| 1 | 1 | 2006-04-14 12:44:06 |
| 2 | 2 | 2019-10-01 17:45:42 |
| 3 | 3 | 2009-09-25 15:22:49 |
| 4 | 4 | 2001-03-06 08:10:03 |
| 5 | 5 | 1996-05-31 09:34:49 |
| 6 | 6 | 1992-09-25 01:45:31 |
| 7 | 7 | 2015-05-28 00:27:56 |
| 8 | 8 | 1990-11-26 15:29:14 |
| 9 | 9 | 2008-05-04 11:11:42 |
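
If the extra `index` column is not needed, the list can also be passed straight to the `DataFrame` constructor. The sketch below uses three hard-coded values copied from the sample above rather than fresh `faker` output, and checks that pandas stored the column with a datetime dtype instead of generic Python objects.

```python
from datetime import datetime

import pandas as pd

# Three values copied from the sample list above, hard-coded for reproducibility
datetimes = [
    datetime(2008, 12, 2, 15, 0, 5),
    datetime(2006, 4, 14, 12, 44, 6),
    datetime(2019, 10, 1, 17, 45, 42),
]

# Build the frame in one step; the dict key becomes the column name
df = pd.DataFrame({"datetime": datetimes})

# The column should be a pandas datetime dtype, which enables the .dt accessor
print(df.dtypes)
print(pd.api.types.is_datetime64_any_dtype(df["datetime"]))
```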


## Adding Additional Columns Based on Existing Columns

Once the data is in a dataframe, we can derive additional columns from the original `datetime` values. Because `faker` generates new random values on every run, the outputs in this section show a different sample from the one above.

### Add a 'date' column based on 'datetime'

Converting `datetime` to `date` is as simple as calling `.date()` on the `datetime` instance.

In [75]:
df['datetime'][0], df['datetime'][0].date()

(Timestamp('2011-12-18 04:04:35'), datetime.date(2011, 12, 18))

If we want a whole new column with just the `date` info, we can use the dataframe's `apply` method with `axis = 1`, which calls the function once per row.

In [77]:
df['date'] = df.apply(lambda row: row['datetime'].date(), axis = 1); df

| | index | datetime | date |
|---|---|---|---|
| 0 | 0 | 2011-12-18 04:04:35 | 2011-12-18 |
| 1 | 1 | 1992-08-23 17:28:41 | 1992-08-23 |
| 2 | 2 | 2011-08-07 08:38:46 | 2011-08-07 |
| 3 | 3 | 1996-07-12 22:42:29 | 1996-07-12 |
| 4 | 4 | 1993-03-04 16:43:05 | 1993-03-04 |
| 5 | 5 | 2006-06-30 22:12:21 | 2006-06-30 |
| 6 | 6 | 2016-12-12 19:06:07 | 2016-12-12 |
| 7 | 7 | 2010-12-17 11:50:00 | 2010-12-17 |
| 8 | 8 | 1992-12-14 16:22:06 | 1992-12-14 |
| 9 | 9 | 2012-11-02 11:51:09 | 2012-11-02 |
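
A row-wise `apply` works, but for datetime columns pandas also offers the vectorized `.dt` accessor, which is usually shorter and faster. The sketch below uses two hard-coded rows from the output above; the `day` column name is an assumption added for illustration.

```python
import pandas as pd

# Two rows copied from the output above
df = pd.DataFrame(
    {"datetime": pd.to_datetime(["2011-12-18 04:04:35", "1992-08-23 17:28:41"])}
)

# .dt.date yields plain Python date objects, like the apply() version
df["date"] = df["datetime"].dt.date

# .dt.normalize() sets the time to midnight but keeps the datetime dtype,
# so later date arithmetic and .dt calls still work on the column
df["day"] = df["datetime"].dt.normalize()

print(df)
```

Which one to prefer depends on what comes next: `date` objects are convenient for display, while the normalized column stays usable for resampling and comparisons.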


### Add 'datetime_to_int' column based on type conversion

In [82]:
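# Cast to 64-bit integers: nanoseconds since the Unix epoch (1970-01-01 00:00:00).
# 'int64' is spelled out because plain int can map to 32 bits on some platforms.
df['datetime_to_int'] = df['datetime'].astype('int64'); df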

| | index | datetime | date | datetime_to_int |
|---|---|---|---|---|
| 0 | 0 | 2011-12-18 04:04:35 | 2011-12-18 | 1324181075000000000 |
| 1 | 1 | 1992-08-23 17:28:41 | 1992-08-23 | 714590921000000000 |
| 2 | 2 | 2011-08-07 08:38:46 | 2011-08-07 | 1312706326000000000 |
| 3 | 3 | 1996-07-12 22:42:29 | 1996-07-12 | 837211349000000000 |
| 4 | 4 | 1993-03-04 16:43:05 | 1993-03-04 | 731263385000000000 |
| 5 | 5 | 2006-06-30 22:12:21 | 2006-06-30 | 1151705541000000000 |
| 6 | 6 | 2016-12-12 19:06:07 | 2016-12-12 | 1481569567000000000 |
| 7 | 7 | 2010-12-17 11:50:00 | 2010-12-17 | 1292586600000000000 |
| 8 | 8 | 1992-12-14 16:22:06 | 1992-12-14 | 724350126000000000 |
| 9 | 9 | 2012-11-02 11:51:09 | 2012-11-02 | 1351857069000000000 |
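
The 18- and 19-digit integers above are nanoseconds since the Unix epoch. The sketch below shows how they might be scaled down to the more familiar seconds and converted back to timestamps. It assumes the column has nanosecond resolution, which the explicit `datetime64[ns]` cast enforces; the variable names are illustrative.

```python
import pandas as pd

# Two rows copied from the output above, forced to nanosecond resolution
ts = pd.Series(
    pd.to_datetime(["2011-12-18 04:04:35", "1992-08-23 17:28:41"])
).astype("datetime64[ns]")

# Nanoseconds since 1970-01-01 00:00:00
ns = ts.astype("int64")
print(ns.iloc[0])  # 1324181075000000000

# Integer division by 10**9 gives classic Unix timestamps in seconds
seconds = ns // 10**9
print(seconds.iloc[0])  # 1324181075

# Round trip: tell to_datetime which unit the integers are in
restored = pd.to_datetime(ns, unit="ns")
print((restored == ts).all())  # True
```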
