# Calculate Shifts, Percent Change, and Windows on Time Series

Use this template to perform foundational manipulations on your time-series data. It covers shifting your data backward or forward in time (i.e., lags and leads), calculating the percent change between periods, and window aggregations. These manipulations have many applications, including tracking financial assets, sales forecasting, and analyzing marketing data.

To swap in your dataset in this template, the following is required:
- You must have a dataset with a date column that can be parsed by pandas. This is checked in the code, and if you encounter difficulties, you can consult the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) for further assistance.
- You must have at least one variable that you are interested in analyzing (e.g., price, sales, etc.).

The placeholder dataset in this template is Google stock price data, containing the closing price on each trading day.

In [3]:
# Load packages
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype as is_datetime

## Setting Up Your Data

Before you begin, you will want to set up your data correctly. The [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function from pandas offers several arguments that make working with time series easier from the start:
- `index_col` will allow you to immediately set the date as the index, allowing easier manipulations afterward.
- `parse_dates` instructs pandas to parse the index as a date if possible.

You can then slice the DataFrame to select the dates/times you are most interested in using. You can read more about time series and date functionality within pandas [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#slice-vs-exact-match). The string you use to slice the DataFrame can be a partial match or a full match.
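As a rough illustration of partial versus full string matches, the sketch below slices a made-up daily series. The DataFrame `demo` and its values are hypothetical stand-ins, not the template's dataset.

```python
import pandas as pd

# Hypothetical daily data standing in for your own dataset
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
demo = pd.DataFrame({"close": range(len(idx))}, index=idx)

# Partial match: "2020-02" selects every row in February 2020
february = demo.loc["2020-02"]

# Full match on both ends: the end date is included in the slice
first_week = demo.loc["2020-01-01":"2020-01-07"]

print(len(february), len(first_week))  # 29 7
```

Unlike positional slicing, label-based date slices include the end point, which is why the first week contains seven rows.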

In [4]:
# Replace this with the name of the column that contains your date information
date_col = "date"

# Replace with the file you want to use and load your dataset into a DataFrame
df = pd.read_csv("google.csv", index_col=date_col, parse_dates=True)

# Check that the index is correctly converted to a date
print("The index has been parsed as a date: " + str(is_datetime(df.index)))

The index has been parsed as a date: True


If the code above prints False, you will need to use pandas' [to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function to convert the index to a datetime type.
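A minimal sketch of that conversion is shown here, assuming the index arrived as plain strings. The DataFrame `raw` and its values are made up for illustration; in the template you would apply the same assignment to `df.index`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype as is_datetime

# Hypothetical data whose index was read in as plain strings
raw = pd.DataFrame(
    {"close": [1367.37, 1360.66, 1394.21]},
    index=["2020-01-02", "2020-01-03", "2020-01-06"],
)
print(is_datetime(raw.index))  # False

# Convert the index; values that cannot be parsed raise an error by default
raw.index = pd.to_datetime(raw.index)
index_is_date = is_datetime(raw.index)
print(index_is_date)  # True
```

If your dates use an unusual layout, `to_datetime()` also accepts a `format` argument describing it.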

Next, you can specify the variable of interest and choose a date range.

In [5]:
# Specify the variable of interest
var_col = "close"  # Replace this with the name of the column you want to analyze

# Reduce the DataFrame down to the relevant columns
df_subset = df[[var_col]]  # Pass in the column(s) of interest as a list here

# Select the date range you want to explore
df_time = df_subset[
    "2020-1-1":"2021-1-1"  # Pass in the date ranges you are interested in here
].copy()

# Reset the index
df_time.reset_index(inplace=True)

# Preview the DataFrame
df_time

| | date | close |
|---|---|---|
| 0 | 2020-01-02 00:00:00+00:00 | 1367.37 |
| 1 | 2020-01-03 00:00:00+00:00 | 1360.66 |
| 2 | 2020-01-06 00:00:00+00:00 | 1394.21 |
| 3 | 2020-01-07 00:00:00+00:00 | 1393.34 |
| 4 | 2020-01-08 00:00:00+00:00 | 1404.32 |
| ... | ... | ... |
| 248 | 2020-12-24 00:00:00+00:00 | 1738.85 |
| 249 | 2020-12-28 00:00:00+00:00 | 1776.09 |
| 250 | 2020-12-29 00:00:00+00:00 | 1758.72 |
| 251 | 2020-12-30 00:00:00+00:00 | 1739.52 |


## Shifting Data

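The [.shift()](https://pandas.pydata.org/docs/reference/api/pandas.Series.shift.html) method allows you to shift data by a given number of periods. A positive number moves each value down to a later row, so every date shows the value from that many periods earlier. A negative number moves each value up to an earlier row, so every date shows the value from that many periods later. This template names its columns after the direction the plotted line moves: `lag_data` (negative shift) sits earlier on the time axis, and `lead_data` (positive shift) sits later. Be aware that many time-series texts use the opposite naming and call a positive shift a lag.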
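To make the direction of a shift concrete, here is a small sketch on a made-up four-day series. The values and the names `s`, `down`, and `up` are illustrative and not part of the template.

```python
import pandas as pd

# Hypothetical four-day series
s = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

down = s.shift(1)   # each date now holds the previous date's value
up = s.shift(-1)    # each date now holds the next date's value

# Side-by-side comparison; rows with no source value become NaN
print(pd.DataFrame({"original": s, "shift(1)": down, "shift(-1)": up}))
```

The rows that have no source value (the first row for `shift(1)`, the last for `shift(-1)`) are filled with `NaN`, which is why the template drops null values before plotting.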

Below, we use Workspace's Visualize feature (available in the DataFrame output) to plot the resulting DataFrame. To create the plot, we select "Line" as the type, assign "date" and "price" to the x- and y-axes, respectively, and group by "period". To add a title to the plot, click in the space where the title should appear and type your own.

In [6]:
# Use this line to specify the number of periods to shift
shift_periods = 60

# Create a column shifted backward by the number of periods specified above (lag)
df_time["lag_data"] = df_time[var_col].shift(-shift_periods)

# Create a column shifted forward by the number of periods specified above (lead)
df_time["lead_data"] = df_time[var_col].shift(shift_periods)

# Melt the DataFrame in preparation for visualization and drop null values
df_shift = df_time.melt(id_vars="date", var_name="period", value_name="price").dropna()

# Inspect the DataFrame
df_shift

| | date | period | price |
|---|---|---|---|
| 0 | 2020-01-02 00:00:00+00:00 | close | 1367.37 |
| 1 | 2020-01-03 00:00:00+00:00 | close | 1360.66 |
| 2 | 2020-01-06 00:00:00+00:00 | close | 1394.21 |
| 3 | 2020-01-07 00:00:00+00:00 | close | 1393.34 |
| 4 | 2020-01-08 00:00:00+00:00 | close | 1404.32 |
| ... | ... | ... | ... |
| 754 | 2020-12-24 00:00:00+00:00 | lead_data | 1469.60 |
| 755 | 2020-12-28 00:00:00+00:00 | lead_data | 1490.09 |
| 756 | 2020-12-29 00:00:00+00:00 | lead_data | 1458.42 |
| 757 | 2020-12-30 00:00:00+00:00 | lead_data | 1486.02 |


In [7]:
# This is a chart, switch to the DataCamp editor to view and configure it.



## Percent Change

The [.pct_change()](https://pandas.pydata.org/docs/reference/api/pandas.Series.pct_change.html) method allows you to calculate the percentage change between the current row and a previous row. There are two things to note about the code shown below.
- The `periods` parameter specifies how many rows back to look when calculating the percentage change. It defaults to 1, which means it uses the immediately preceding row. Here, 30 periods are used.
- By default, `.pct_change()` returns a decimal fraction (e.g., 0.05 for a 5% increase). `.mul(100)` multiplies the result by 100 so that it reads as a percentage.
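The sketch below checks the calculation by hand on a made-up price series, assuming `periods=2` to keep the example short. The names `prices`, `builtin`, and `manual` are illustrative only.

```python
import pandas as pd

# Hypothetical prices
prices = pd.Series([100.0, 110.0, 99.0, 120.0])

# Built-in: change relative to the value 2 rows earlier, as a percentage
builtin = prices.pct_change(periods=2).mul(100)

# Manual equivalent: (current / earlier - 1) * 100
manual = (prices / prices.shift(2) - 1).mul(100)

print(pd.DataFrame({"price": prices, "builtin": builtin, "manual": manual}))
```

The first two rows are `NaN` because there is no value two rows earlier to compare against. The third row is approximately -1.0, since 99 is 1% below 100.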

_Note: We again use the Workspace "Visualize" feature in the output of the DataFrame to easily create a line plot of our data._

In [8]:
# Use this line to specify the rate of change you want to calculate
pct_return_periods = 30

# Create a column with the percentage increase and multiply by 100
df_time["percent_change"] = (
    df_time[var_col].pct_change(periods=pct_return_periods).mul(100)
)

# Select relevant columns and drop null values
df_pct = df_time[["date", "percent_change"]].dropna()

# Preview the DataFrame
df_pct

| | date | percent_change |
|---|---|---|
| 30 | 2020-02-14 00:00:00+00:00 | 11.216423 |
| 31 | 2020-02-18 00:00:00+00:00 | 11.686241 |
| 32 | 2020-02-19 00:00:00+00:00 | 9.502155 |
| 33 | 2020-02-20 00:00:00+00:00 | 8.957613 |
| 34 | 2020-02-21 00:00:00+00:00 | 5.752962 |
| ... | ... | ... |
| 248 | 2020-12-24 00:00:00+00:00 | -0.790775 |
| 249 | 2020-12-28 00:00:00+00:00 | 1.500137 |
| 250 | 2020-12-29 00:00:00+00:00 | -1.029814 |
| 251 | 2020-12-30 00:00:00+00:00 | -2.349864 |


In [9]:
# This is a chart, switch to the DataCamp editor to view and configure it.



## Window Functions

You can use window functions to perform aggregations of data over time. A window function can be rolling, such as the average price over the past 30 days. A window function can also be expanding, such as the total sum of products sold over time. This example uses two pandas methods:
- [`.rolling()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html) performs a rolling window calculation over a specified `window` of rows. An aggregation function can then be added.
- [`.expanding()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html) performs an expanding window calculation (i.e., of all previous rows). Again, an aggregation function can then be added.

In this example, [`.mean()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html) is used. However, there are many different aggregation functions available, including [`.sum()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html), [`.median()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.median.html), [`.max()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.max.html), and [`.min()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.min.html).
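The difference between the two window types is easiest to see on a tiny series. The following sketch uses made-up values and a window of 3. It also shows the `min_periods` argument of `.rolling()`, which the template does not use, as one possible way to avoid the leading null values.

```python
import pandas as pd

# Hypothetical values
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: mean of the current row and the 2 rows before it
rolling_mean = values.rolling(3).mean()

# Expanding: mean of every row up to and including the current one
expanding_mean = values.expanding().mean()

# Rolling with min_periods=1: partial windows are allowed at the start
rolling_partial = values.rolling(3, min_periods=1).mean()

print(
    pd.DataFrame(
        {
            "value": values,
            "rolling_3": rolling_mean,
            "expanding": expanding_mean,
            "rolling_3_partial": rolling_partial,
        }
    )
)
```

With the default settings, the first two rolling values are `NaN` because a full window of three rows is not yet available, while the expanding mean is defined from the first row onward.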

_Note: We again use the Workspace "Visualize" feature to create a line plot, selecting "period" as the variable to group by._

In [10]:
# Use this line to specify the size of the window you wish to use
window = 30  # Specify your chosen window here

# Create a column with a moving average using the window specified above
df_time["moving_average"] = df_time[var_col].rolling(window).mean()

# Create a column with an expanding average over all previous rows
df_time["expanding_average"] = df_time[var_col].expanding().mean()

# Select relevant columns and melt the DataFrame
df_window = df_time[["date", var_col, "moving_average", "expanding_average"]].melt(
    id_vars="date", var_name="period", value_name="price"
)

# Drop null values
df_window.dropna(inplace=True)

# Preview the DataFrame
df_window

| | date | period | price |
|---|---|---|---|
| 0 | 2020-01-02 00:00:00+00:00 | close | 1367.370000 |
| 1 | 2020-01-03 00:00:00+00:00 | close | 1360.660000 |
| 2 | 2020-01-06 00:00:00+00:00 | close | 1394.210000 |
| 3 | 2020-01-07 00:00:00+00:00 | close | 1393.340000 |
| 4 | 2020-01-08 00:00:00+00:00 | close | 1404.320000 |
| ... | ... | ... | ... |
| 754 | 2020-12-24 00:00:00+00:00 | expanding_average | 1476.983735 |
| 755 | 2020-12-28 00:00:00+00:00 | expanding_average | 1478.180160 |
| 756 | 2020-12-29 00:00:00+00:00 | expanding_average | 1479.297849 |
| 757 | 2020-12-30 00:00:00+00:00 | expanding_average | 1480.330476 |


In [11]:
# This is a chart, switch to the DataCamp editor to view and configure it.



These techniques lay the foundation for drawing insights from time series data. The next step is analyzing your data. If you're interested, be sure to check out the [Time Series Analysis in Python](https://app.datacamp.com/learn/courses/time-series-analysis-in-python) course.

If you would like to learn more about time series manipulations, be sure to review the [Manipulating Time Series Data in Python](https://app.datacamp.com/learn/courses/manipulating-time-series-data-in-python) course.