## More Data Processing with Pandas

### Date/Time Functionality

Pandas offers flexible tools for manipulating dates and times, which makes analyses such as time series analysis possible.

Pandas has four main time-related classes:

* `Timestamp`
* `DatetimeIndex`
* `Period`
* `PeriodIndex`

In [1]:
import pandas as pd
import numpy as np

**Timestamp**:

`Timestamp` represents a single point in time and lets us associate values with it. For example, the string `'9/1/2019 10:05AM'` can be parsed into a timestamp. `Timestamp` is also interchangeable with Python's `datetime` in most cases.

In [2]:
# Let's create a timestamp
pd.Timestamp('9/1/2019 10:05AM')

Timestamp('2019-09-01 10:05:00')

In [4]:
# We can also pass in the year, month, day, etc.
pd.Timestamp(2019, 12, 20, 0, 0)

Timestamp('2019-12-20 00:00:00')

The `isoweekday()` method returns the weekday of the timestamp (1 represents Monday, 7 represents Sunday).

In [5]:
pd.Timestamp(2019, 12, 20, 0, 0).isoweekday() # Returns a weekday: 5 -> Friday

5

In [6]:
# Extract specific year, month, day etc.
pd.Timestamp(2019, 12, 20, 5, 2, 23).second

23
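
As a small sketch of the interchangeability with Python's `datetime` mentioned above, the snippet below compares a `Timestamp` with a `datetime` object and contrasts `weekday()` (Monday is 0) with `isoweekday()` (Monday is 1). The variable names `ts` and `dt` are only illustrative.

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp(2019, 12, 20)
dt = datetime(2019, 12, 20)

# Timestamp is a subclass of datetime, so the two compare equal
print(isinstance(ts, datetime))  # True
print(ts == dt)                  # True

# weekday() counts from Monday = 0, isoweekday() from Monday = 1
print(ts.weekday(), ts.isoweekday(), ts.day_name())  # 4 5 Friday
```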

**Period**:

`Period` represents a single time span, such as a specific day or month, rather than a single point in time.

In [7]:
pd.Period('1/2016') # No day is given, so the frequency is inferred as M (month)

Period('2016-01', 'M')

In [8]:
pd.Period('3/5/2016') # D for day

Period('2016-03-05', 'D')

In [9]:
# Arithmetic works on Period: for example, 5 months after January 2016 -> June 2016
pd.Period('1/2016') + 5

Period('2016-06', 'M')

In [10]:
# 2 days before March 5th, 2016: March 3rd, 2016
pd.Period('3/5/2016') - 2

Period('2016-03-03', 'D')
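
To make the idea of a span more concrete, here is a minimal sketch that looks at where a monthly `Period` starts and ends. The variable name `p` is an assumption for illustration.

```python
import pandas as pd

p = pd.Period('1/2016')  # monthly period: January 2016

# A Period covers a span, so it has a start and an end
print(p.start_time)       # 2016-01-01 00:00:00
print(p.end_time.date())  # 2016-01-31

# Arithmetic moves in units of the period's frequency
print(p + 5)  # 2016-06
```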

**DatetimeIndex and PeriodIndex**:

`DatetimeIndex` is an index made up of `Timestamp` values.

Let's create a Series `t1` whose index holds the timestamps for September 1st to 3rd, 2016, labelling the values 'a', 'b' and 'c'.

`PeriodIndex` is an index made up of `Period` values.

We'll then do something similar to `t1`: create `t2` using a period-based index.

In [11]:
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), pd.Timestamp('2016-09-03')])
t1

2016-09-01    a
2016-09-02    b
2016-09-03    c
dtype: object

In [12]:
# If we check the index of t1, we find that it's DatetimeIndex
type(t1.index)

pandas.core.indexes.datetimes.DatetimeIndex

In [13]:
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), pd.Period('2016-11')])
t2

2016-09    d
2016-10    e
2016-11    f
Freq: M, dtype: object

In [14]:
type(t2.index)

pandas.core.indexes.period.PeriodIndex
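
The two index types can also be converted into each other. The following is a minimal sketch using `to_period()` and `to_timestamp()`; the names `dti`, `pi` and `back` are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical example index: three consecutive days
dti = pd.date_range('2016-09-01', periods=3, freq='D')
pi = dti.to_period('M')   # DatetimeIndex -> PeriodIndex (monthly)
back = pi.to_timestamp()  # PeriodIndex -> DatetimeIndex (start of each period)

print(type(dti).__name__)  # DatetimeIndex
print(type(pi).__name__)   # PeriodIndex
print(back[0])             # 2016-09-01 00:00:00
```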

**Converting to Datetime**:

It's common to have dates stored as strings in a variety of formats. We can convert them to datetimes with `to_datetime()`.

In [15]:
# Suppose the following list of dates in different formats
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
d1

['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

In [16]:
# Now we create some random data and assign d1 as index
ts3 = pd.DataFrame(np.random.randint(10, 100, (4, 2)), index=d1, columns=list('ab'))
ts3

               a   b
2 June 2013   22  29
Aug 29, 2014  23  58
2015-06-26    29  98
7/12/16       20  27


In [17]:
type(ts3.index) # Index

pandas.core.indexes.base.Index

In [18]:
# Use to_datetime to convert these to datetimes in a standard form
# (on pandas 2.0 and later, mixed formats like these may need format='mixed')
ts3.index = pd.to_datetime(ts3.index)
ts3

             a   b
2013-06-02  22  29
2014-08-29  23  58
2015-06-26  29  98
2016-07-12  20  27


In [19]:
type(ts3.index) # DatetimeIndex

pandas.core.indexes.datetimes.DatetimeIndex
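
When the format is ambiguous or some entries cannot be parsed, `to_datetime()` accepts extra arguments. The sketch below uses the `format` and `errors` parameters; the string `'not a date'` and the names `us`, `eu` and `mixed` are made up for illustration.

```python
import pandas as pd

# An explicit format removes ambiguity between day-first and month-first
us = pd.to_datetime('7/12/16', format='%m/%d/%y')  # July 12th, 2016
eu = pd.to_datetime('7/12/16', format='%d/%m/%y')  # December 7th, 2016
print(us.date(), eu.date())  # 2016-07-12 2016-12-07

# errors='coerce' turns unparseable entries into NaT instead of raising
mixed = pd.to_datetime(['2015-06-26', 'not a date'], errors='coerce')
print(mixed.isna().tolist())  # [False, True]
```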

**Timedelta**:

A `Timedelta` is the difference between two points in time. For example:

In [20]:
pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')

Timedelta('2 days 00:00:00')

In [21]:
# A Timedelta can also be added to a Timestamp: here, 12 days and 3 hours later
pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3h')

Timestamp('2016-09-14 11:10:00')
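
A `Timedelta` can also be built from keyword arguments and inspected afterwards. This is a small sketch; the names `delta` and `diff` are illustrative.

```python
import pandas as pd

delta = pd.Timedelta(days=12, hours=3)
print(delta)                   # 12 days 03:00:00
print(delta.total_seconds())   # 1047600.0
print(delta.components.hours)  # 3

# Subtracting two timestamps yields a Timedelta
diff = pd.Timestamp('2016-09-03') - pd.Timestamp('2016-09-01')
print(diff.days)  # 2
```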

**Offset**:

An offset is similar to a timedelta, but it follows calendar rules rather than a fixed duration. This allows for intervals such as a business day, a month end, or the start of a semi-month.

In [22]:
pd.Timestamp('9/4/2016').weekday() # weekday() counts from Monday = 0, so 6 -> Sunday

6

In [23]:
# Let's add an offset of one week
pd.Timestamp('9/4/2016') + pd.offsets.Week()

Timestamp('2016-09-11 00:00:00')

In [24]:
# Let's offset to the month end
pd.Timestamp('9/4/2016') + pd.offsets.MonthEnd()

Timestamp('2016-09-30 00:00:00')
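
To illustrate how an offset differs from a fixed `Timedelta`, the following sketch adds one day and one business day to a Friday. The variable name `friday` is an assumption for this example.

```python
import pandas as pd

friday = pd.Timestamp('2016-09-02')
print(friday.day_name())  # Friday

# A Timedelta always adds a fixed duration
print((friday + pd.Timedelta(days=1)).day_name())  # Saturday

# A business-day offset skips the weekend
print((friday + pd.offsets.BDay()).day_name())  # Monday
```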

**Working with Dates in DataFrames**:

* `date_range()`:

Suppose we want nine measurement dates: every other Sunday, starting in October 2016. For this, we use `date_range()` and specify the start or end date, the number of periods and a frequency (bi-weekly on Sundays = `'2W-SUN'`).

In [25]:
# Bi-weekly
dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN') # 2 weeks, starting on Sunday
dates

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [26]:
# Business days
pd.date_range('10-01-2016', periods=9, freq='B')

DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
               '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
               '2016-10-13'],
              dtype='datetime64[ns]', freq='B')

In [27]:
# Quarterly, with quarter starts anchored on June
pd.date_range('04-01-2016', periods=12, freq='QS-JUN')

DatetimeIndex(['2016-06-01', '2016-09-01', '2016-12-01', '2017-03-01',
               '2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01',
               '2018-06-01', '2018-09-01', '2018-12-01', '2019-03-01'],
              dtype='datetime64[ns]', freq='QS-JUN')
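
Instead of a number of periods, `date_range()` can also take both a start and an end date. The sketch below lists every Sunday in October 2016; the name `sundays` is illustrative.

```python
import pandas as pd

# All Sundays between the two dates (inclusive)
sundays = pd.date_range(start='2016-10-01', end='2016-10-31', freq='W-SUN')
print(len(sundays))  # 5
print(sundays[0].date(), sundays[-1].date())  # 2016-10-02 2016-10-30
```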

We can also check the day of the week for each date using `day_name()`. First, let's build a DataFrame indexed by these dates.

In [28]:
# New example with random data
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
df

            Count 1  Count 2
2016-10-02      103      116
2016-10-16      108      118
2016-10-30      103      119
2016-11-13      102      117
2016-11-27       98      118
2016-12-11      104      128
2016-12-25      110      123
2017-01-08      115      119
2017-01-22      119      129


In [32]:
# Check weekday name
df.index.day_name()

Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday'],
      dtype='object')

**`resample()`**:

Suppose we want to know the mean count for each month in our DataFrame. We can do this using `resample()`. Converting from a *higher* frequency (here, bi-weekly) to a *lower* frequency (here, monthly) is called **downsampling**.

In [33]:
df.resample('M').mean() # 'M' = month end; pandas 2.2 and later prefer the alias 'ME'

               Count 1     Count 2
2016-10-31  104.666667  117.666667
2016-11-30       100.0       117.5
2016-12-31       107.0       125.5
2017-01-31       117.0       124.0
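
Because the DataFrame above contains random data, here is a minimal, reproducible sketch of the same downsampling idea on a small Series with fixed values. The names `idx`, `counts` and `monthly` are made up, and the month-start alias `'MS'` is used as an assumption so that each group is labelled by the first day of its month.

```python
import pandas as pd

idx = pd.to_datetime(['2016-10-02', '2016-10-16', '2016-11-13', '2016-11-27'])
counts = pd.Series([1, 2, 3, 4], index=idx)

# Group the bi-weekly values by month and average each group
monthly = counts.resample('MS').mean()
print(monthly.tolist())         # [1.5, 3.5]
print(monthly.index[0].date())  # 2016-10-01
```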


**Indexing and Slicing dates**:

We can select rows by year or month using partial string indexing:

In [35]:
df.loc['2017'] # 2017

            Count 1  Count 2
2017-01-08      115      119
2017-01-22      119      129


In [36]:
df.loc['2016-12'] # specific month

            Count 1  Count 2
2016-12-11      104      128
2016-12-25      110      123


In [37]:
df.loc['2016-12':] # slicing: 2016-12 onwards

            Count 1  Count 2
2016-12-11      104      128
2016-12-25      110      123
2017-01-08      115      119
2017-01-22      119      129
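
A slice can also have both a start and an end. As a reproducible sketch, the snippet below rebuilds the same nine bi-weekly dates and attaches the values 0 to 8 (a hypothetical Series `s`, not the random DataFrame above). With partial strings, both ends of the slice are inclusive.

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
s = pd.Series(range(9), index=dates)

# Everything from November 2016 through December 2016
print(s.loc['2016-11':'2016-12'].tolist())  # [3, 4, 5, 6]

# Everything in 2017
print(len(s.loc['2017']))  # 2
```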


In [38]:
# The month end after November 29th, 2019 is November 30th, a Saturday (weekday 5)
(pd.Timestamp('11/29/2019') + pd.offsets.MonthEnd()).weekday()

5