# More Time Series Functionality

In this chapter, we cover several more important pieces of time series data functionality in pandas.

In [None]:
import pandas as pd
df = pd.read_csv('../data/stocks/stocks10.csv', parse_dates=['date'], 
                 index_col='date')
df.head(3)

## Selecting multiple rows at specific frequencies

Thus far, we covered how to select **entire time periods** of data using partial date matching with the `loc` indexer. We then selected **single datetimes** at fixed frequencies with `asfreq`. Now, we will cover how to select entire time periods at fixed frequencies. For example, let's say we are interested in selecting the first three days of each month. We can select the first three days of one particular month with `loc` and the first day of every month with `asfreq`, but we cannot select the first three days of every month. For that we'll need a different approach. Below we review the methods previously covered, beginning by selecting the first three days of one particular month.

In [None]:
df.loc['2010-6-1':'2010-6-3']

The `asfreq` method can select the start of every month.

In [None]:
df.asfreq('MS').head(3)
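`asfreq` reindexes to the exact month-start timestamps. If the first day of a month is not in the index, for example because it fell on a weekend, that month comes back as a row of missing values. The sketch below shows this on a small hypothetical DataFrame, `prices`, built with `pd.bdate_range` in place of the stocks data. May 1, 2021 fell on a Saturday.

```python
import pandas as pd

# Hypothetical weekday-only data standing in for the stocks dataset.
idx = pd.bdate_range('2021-04-01', '2021-06-30')
prices = pd.DataFrame({'price': range(len(idx))}, index=idx)

# asfreq keeps only the exact month-start timestamps; May 1, 2021 was a
# Saturday, so it has no matching row and is filled with NaN.
monthly = prices.asfreq('MS')
print(monthly)
```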

Perhaps the most effective way to select the first three rows of every month is with the `groupby` method, which has more methods available to chain from it than `resample`. The first `head` method below is performed on each of the 241 groups and returns the first three rows of each group. The second `head` method is performed on the resulting DataFrame (723 rows) to show that each of the first four months returned three rows of data. Note that the first three rows are not necessarily the first three calendar days of the month, but the first three days that have values.

In [None]:
df.groupby(pd.Grouper(freq='MS')).head(3).head(12)
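The following minimal sketch shows why the first three rows are not always the first three calendar days. It uses hypothetical weekday-only data (`prices`, built with `pd.bdate_range`). May 2021 began on a Saturday, so its first three rows are May 3, 4, and 5.

```python
import pandas as pd

# Hypothetical weekday-only data covering three months.
idx = pd.bdate_range('2021-04-01', '2021-06-30')
prices = pd.DataFrame({'price': range(len(idx))}, index=idx)

# Group by calendar month (labelled by month start) and keep the first
# three rows of each group; head preserves the original index.
first3 = prices.groupby(pd.Grouper(freq='MS')).head(3)
print(first3)
```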

### Using the DatetimeIndex to select specific frequencies

Sometimes, you need the attributes of the DatetimeIndex itself to select specific frequencies. The DatetimeIndex has nearly all of the same attributes and methods that a datetime column exposes through its `dt` accessor. For instance, to select every June across all the years, use the `month` attribute (or the `month_name` method) and test equality against the integer 6 (or the string `'June'`).

In [None]:
filt = df.index.month == 6
filt

Using this filter for boolean selection returns all the rows where the month is June.

In [None]:
df[filt].head(3)

In [None]:
df[filt].tail(3)
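The integer test and the month-name test produce the same boolean array. The sketch below checks this on a small hypothetical daily DataFrame rather than on the stocks data. It assumes the default (English) locale for `month_name`.

```python
import pandas as pd

# Hypothetical daily data spanning parts of three Junes.
idx = pd.date_range('2019-05-30', '2021-06-02', freq='D')
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

by_number = df.index.month == 6              # integer month
by_name = df.index.month_name() == 'June'    # month name, default locale

june = df[by_number]
print(june.index.year.value_counts().sort_index())
```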

Many other types of selections like this are possible. Let's select the 200th day of the year for each year.

In [None]:
filt = df.index.dayofyear == 200
df[filt].head()
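Day-of-year numbers shift by one in leap years. Day 200 is July 19 in ordinary years but July 18 in leap years, and in trading data it may fall on a weekend with no row at all. Here is a small check on a hypothetical calendar-day index:

```python
import pandas as pd

# Hypothetical calendar-day index that includes a leap year (2020).
idx = pd.date_range('2019-01-01', '2021-12-31', freq='D')

day200 = idx[idx.dayofyear == 200]
print(day200)
```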

Here, we select the 11th day of each month.

In [None]:
filt = df.index.day == 11
df[filt].head()

Selecting every day of a certain week is a bit different. You must call the `isocalendar` method, which returns the ISO year, week, and day number as a DataFrame.

In [None]:
df.index.isocalendar().head()

Using this DataFrame, select the week as a Series and test equality against a particular week number. Here, all rows within week 22 for all years are returned.

In [None]:
filt = df.index.isocalendar()['week'] == 22
df[filt].head(10)
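Two details about `isocalendar` are easy to miss. First, it returns a DataFrame indexed by the original dates, which is why the boolean Series above lines up with `df`. Second, it uses ISO 8601 weeks, so the first days of January can belong to week 52 or 53 of the previous year. The sketch below demonstrates both with a few hand-picked dates:

```python
import pandas as pd

# Dates around the 2020/2021 new year.
idx = pd.DatetimeIndex(['2020-12-31', '2021-01-01', '2021-01-03', '2021-01-04'])
iso = idx.isocalendar()
print(iso)
```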

You can even test for multiple values with the `isin` method. Here, we select the 5th, 10th, and 15th days of each month.

In [None]:
filt = df.index.day.isin([5, 10, 15])
df[filt].head(10)
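As with the earlier filters, `isin` only matches rows that exist. In this hypothetical weekday-only sketch, May 15 and June 5, 2021 fell on Saturdays, so only four of the six requested dates are returned.

```python
import pandas as pd

# Hypothetical weekday-only data for May and June 2021.
idx = pd.bdate_range('2021-05-01', '2021-06-30')
prices = pd.DataFrame({'price': range(len(idx))}, index=idx)

filt = prices.index.day.isin([5, 10, 15])
print(prices[filt])
```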

## Shifting the data

It's possible to shift the data up or down with the `shift` method, either by any number of rows or by a specific frequency given as an offset alias. Pass an **integer** to `shift` to move the data up or down by that many rows while keeping the index the same. Here, we shift the data down three rows. The first three rows are filled with missing values, and the original last three rows are removed from the DataFrame.

In [None]:
df.shift(3).head()

Shifting using negative integers is possible. Here, we shift the data up one row. The original first row is no longer part of the DataFrame.

In [None]:
df.shift(-1).head(3)

The last row now contains only missing values.

In [None]:
df.shift(-1).tail(3)
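The following minimal sketch makes the mechanics explicit on a hypothetical five-row DataFrame. The index never changes, and the vacated rows become `NaN`, which also turns the integer column into floats. The `fill_value` parameter can supply a different placeholder.

```python
import pandas as pd

idx = pd.date_range('2021-01-04', periods=5, freq='D')
df = pd.DataFrame({'price': [10, 11, 12, 13, 14]}, index=idx)

down = df.shift(2)                  # values move down two rows; first two become NaN
up = df.shift(-1)                   # values move up one row; last becomes NaN
filled = df.shift(2, fill_value=0)  # vacated rows get 0 instead of NaN

print(pd.concat({'original': df, 'shift(2)': down, 'shift(-1)': up}, axis=1))
```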

### Shifting the dates in the index

The DatetimeIndex can be shifted up or down by a specific frequency using an offset alias, which is passed as a string as the second argument in the `shift` method. This will NOT cause the data itself to shift, just the index. To help better understand what is happening, a new column containing the original date will be inserted as the first column in the DataFrame.

In [None]:
df.insert(0, 'original date', df.index)
df.head(3)

Let's begin by shifting the index up five hours.

In [None]:
df.shift(5, 'h').head(3)

Here we shift the index back ten days.

In [None]:
df.shift(-10, 'D').head(3)
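Passing a frequency moves the labels, not the values. In this sketch on a hypothetical three-row DataFrame, the prices stay in their original order while every timestamp moves. The hour alias is written `'h'`, the spelling newer pandas versions prefer.

```python
import pandas as pd

idx = pd.date_range('2021-01-04', periods=3, freq='D')
df = pd.DataFrame({'price': [10, 11, 12]}, index=idx)

later = df.shift(5, freq='h')      # every timestamp moves 5 hours later
earlier = df.shift(-10, freq='D')  # every timestamp moves 10 days earlier

print(later)
print(earlier)
```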

### Caution when shifting by week, month, quarter, or year

It's possible to shift by any number of weeks, months, quarters, or years, but take care with dates that fall on the start or end of the time period being shifted. Below, we shift up by one week anchored to Friday. Instead of every date moving forward by 7 days, each date that is not already a Friday is rolled forward to the next Friday.

Notice that October 29, 1999 was a Friday. The four trading days before this date are shifted up to October 29, 1999, but October 29, 1999 itself is moved up to the following Friday, November 5, 1999. This is counterintuitive to me, as I would expect each Friday to stay in the same week as the other days. November 5, 1999 is also a Friday and is likewise shifted up to the next Friday.

In [None]:
df.shift(1, 'W-FRI').head(10)

The same thing happens when shifting by months, quarters, or years. Here, we shift up one month with the month-end alias `'M'`. Every day in November is shifted up to the end of the month, November 30th, except November 30th itself, which is shifted up to December 31st.

In [None]:
df.loc['1999-11-24':'1999-12-2'].shift(1, 'M')
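This behavior comes from how anchored offsets are added to a timestamp. A date that is not already on the anchor rolls forward to it, while a date that is already on the anchor moves a full period. The sketch below reproduces this with individual timestamps. It uses `pd.offsets.Week(weekday=4)`, equivalent to `'W-FRI'`, and `pd.offsets.MonthEnd()`, the month-end offset behind `'M'` (spelled `'ME'` in pandas 2.2 and later).

```python
import pandas as pd

week_fri = pd.offsets.Week(weekday=4)  # weekday=4 is Friday, same as 'W-FRI'
month_end = pd.offsets.MonthEnd()      # month-end anchor

# A Wednesday rolls forward to that week's Friday...
print(pd.Timestamp('1999-10-27') + week_fri)
# ...but a Friday jumps to the following Friday.
print(pd.Timestamp('1999-10-29') + week_fri)

# A mid-month date rolls to the end of the same month...
print(pd.Timestamp('1999-11-24') + month_end)
# ...while a month-end date jumps to the next month end.
print(pd.Timestamp('1999-11-30') + month_end)
```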

### Converting to a period and then shifting

It may be more intuitive to convert the dates in the index to period objects first when shifting by weeks, months, quarters, or years. The `to_period` method is available directly from a DataFrame with a DatetimeIndex. Below, we pass it the offset alias `'W-FRI'` to convert each date to a weekly period ending on a Friday. The two rows at the end of each week (October 29th and November 5th) stay in their own weeks instead of moving to the next week as they did with `shift`.

In [None]:
df2 = df.to_period('W-FRI')
df2.head(10)

The index can now be shifted up 1 week with the `shift` method. I find this procedure more intuitive than using `shift` directly.

In [None]:
df2.shift(1, 'W-FRI').head(10)
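Here is a minimal sketch of the period approach on four hypothetical dates from two consecutive weeks. A Wednesday and a Friday land in the same `'W-FRI'` period, and shifting by one week moves each period label (not the data) to the following week.

```python
import pandas as pd

# Wed, Fri, Mon, Fri across two Saturday-to-Friday weeks.
idx = pd.DatetimeIndex(['1999-10-27', '1999-10-29', '1999-11-01', '1999-11-05'])
df = pd.DataFrame({'price': [1.0, 2.0, 3.0, 4.0]}, index=idx)

weekly = df.to_period('W-FRI')           # label each row with its week
shifted = weekly.shift(1, freq='W-FRI')  # move every week label one week later

print(weekly.index)
print(shifted.index)
```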

### The PeriodIndex

The index object is now a PeriodIndex, which has attributes and methods similar to those of a DatetimeIndex.

In [None]:
df2.index[:5]

Convert the DataFrame back to one with a DatetimeIndex by calling the `to_timestamp` method.

In [None]:
df2.shift(1, 'W-FRI').to_timestamp().head()
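By default, `to_timestamp` labels each period with its start, which for a `'W-FRI'` week is the Saturday that opens it. Passing `how='end'` labels each period with its end instead. The sketch below shows both on two hypothetical dates:

```python
import pandas as pd

idx = pd.DatetimeIndex(['1999-10-27', '1999-11-03'])  # two Wednesdays
weekly = pd.DataFrame({'price': [1.0, 2.0]}, index=idx).to_period('W-FRI')

starts = weekly.to_timestamp()         # default how='start': Saturday of each week
ends = weekly.to_timestamp(how='end')  # end of each week, which falls on the Friday

print(starts.index)
print(ends.index)
```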

## Creating date ranges

The `pd.date_range` function (not a method) creates sequences of datetimes. The resulting object is always a DatetimeIndex. Pass it a start and end date as strings to create every day between the two values, inclusive. Here, we create seven datetimes beginning on June 10, 2021 and ending on June 16, 2021.

In [None]:
pd.date_range('2021-6-10', '2021-6-16')

Set the `freq` parameter to an offset alias to create equally spaced timestamps between the start and end date. By default, `freq` is set to one day.

In [None]:
pd.date_range('2021-6-10', '2021-6-12', freq='6h')

Set the `periods` parameter to an integer to control the number of timestamps produced. When `periods` is combined with a frequency (one day by default), supply either a start or an end date, not both.

In [None]:
pd.date_range('2021-6-10', periods=4)

Here, we use June 10, 2021 as the end date and create four timestamps that are 18 seconds apart.

In [None]:
pd.date_range(end='2021-6-10', periods=4, freq='18s')
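If you do want both endpoints along with `periods`, drop `freq` instead. `pd.date_range` accepts exactly three of `start`, `end`, `periods`, and `freq`. Both dates plus `periods` therefore give evenly spaced timestamps, while passing all four raises a `ValueError`. A quick sketch:

```python
import pandas as pd

# start, end, and periods with no freq: 5 evenly spaced timestamps
even = pd.date_range('2021-06-10', '2021-06-11', periods=5)
print(even)

# Supplying all four of start, end, periods, and freq is rejected.
try:
    pd.date_range('2021-06-10', '2021-06-11', periods=5, freq='D')
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```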

### Timedelta ranges

The `pd.timedelta_range` function works similarly, creating sequences of timedeltas that form a TimedeltaIndex. Here, we create timedeltas every 30 minutes, beginning at 8 hours and ending at 17 and a half hours.

In [None]:
pd.timedelta_range('8:00:00', '17:30:00', freq='30min')

For frequencies that are awkward to express as offset aliases, pass a Timedelta object to `freq`. Here, we create 10 timedeltas, starting at 8 hours and separated by 15.6 seconds.

In [None]:
pd.timedelta_range('8:00:00', freq=pd.Timedelta('15.6s'), periods=10)
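The following sketch checks both calls. From 8:00 through 17:30 in 30-minute steps gives 20 timedeltas, with both endpoints included. The 15.6-second spacing can also be written with the millisecond alias `'15600ms'`.

```python
import pandas as pd

half_hours = pd.timedelta_range('8:00:00', '17:30:00', freq='30min')
print(len(half_hours), half_hours[0], half_hours[-1])

# The same 15.6-second spacing written two ways.
a = pd.timedelta_range('8:00:00', freq=pd.Timedelta('15.6s'), periods=10)
b = pd.timedelta_range('8:00:00', freq='15600ms', periods=10)
print(list(a) == list(b))
```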

### Period ranges

As you might expect, a `period_range` function exists to create sequences of periods in the same manner.

In [None]:
pd.period_range('2010-1-1', periods=20, freq='M')
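Unlike a timestamp, each element of the result is a `Period` that covers a whole month. Its `start_time` and `end_time` attributes mark the boundaries. A short sketch using the same call:

```python
import pandas as pd

months = pd.period_range('2010-1-1', periods=20, freq='M')
print(months[0], months[-1])  # first and last monthly periods

jan = months[0]
print(jan.start_time)  # beginning of January 2010
print(jan.end_time)    # last instant of January 2010
```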

## Exercises

Continue using the stocks dataset for the following exercises.

### Exercise 1

<span style="color:green; font-size:16px">Select the first three trading days of every sixth year.</span>

### Exercise 2

<span style="color:green; font-size:16px">Select the Wednesday in the 19th week of all the leap years.</span>

### Exercise 3

<span style="color:green; font-size:16px">Shift all dates in the index up three weeks ending on Sunday.</span>

### Exercise 4

<span style="color:green; font-size:16px">Create a DatetimeIndex object containing every Friday in the year 2021.</span>

### Exercise 5

<span style="color:green; font-size:16px">Starting from January 1, 1900, create 20 timestamps separated by 57 days each.</span>

### Exercise 6

<span style="color:green; font-size:16px">Using January 1, 2021 as the end date, find the previous 15 timestamps separated by 23 month starts.</span>

### Exercise 7

<span style="color:green; font-size:16px">You are running an experiment beginning at 8:32 a.m. and concluding at 4 p.m. You need to check in on it every 19 minutes. Create a TimedeltaIndex of the check-in times.</span>