# Decomposing Time Series Data into Trend and Seasonality

Time series decomposition involves thinking of a series as a combination of level, trend, seasonality, and noise components.

Decomposition provides a useful abstract model for thinking about time series generally and for better understanding problems during time series analysis and forecasting.

In this notebook we will learn about the different **time series components** and how to automatically **split a time series into its components using Python.**

It is always useful to break a time series down into its components before applying forecasting models. The components fall into two groups:

1). **Systematic Components**: As the name suggests, these are the parts of the time series that follow a system, in other words that recur and can therefore be modeled.

2). **Non-Systematic Components**: Components of the time series that cannot be modeled.

We are going to have a look at these components and how to model them as we move ahead!

**Do upvote the notebook if you liked it!**

## Importing the libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('/kaggle/input/time-series-forecasting-with-yahoo-stock-price/yahoo_stock.csv', parse_dates = ['Date'], index_col = 'Date')

In [None]:
df.head()

**There are six columns given:**

**High** -> Highest Price of the stock for that particular date.

**Low** -> Lowest Price of the stock for that particular date.

**Open** -> Opening Price of the stock.

**Close** -> Closing Price of the stock.

**Volume** -> Total amount of Trading Activity.

**AdjClose** -> Adjusted closing price, which factors in corporate actions such as dividends, stock splits, and new share issuance.

In [None]:
print(df.index.min())
print(df.index.max())

In [None]:
df.shape

The dataset gives the stock price for 1825 days, from 23rd November 2015 to 20th November 2020.
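The row count can be cross-checked against the calendar: 1825 is exactly the number of calendar days between the two dates, which suggests one row per day with weekends included. The sketch below is illustrative only; the helper name `missing_days` is an assumption and not part of the original notebook. A gap-free index matters because `seasonal_decompose()` needs either an explicit `period` or an index whose frequency it can infer.

```python
import pandas as pd

# Calendar days between the first and last dates reported above, both inclusive
expected_days = pd.date_range("2015-11-23", "2020-11-20", freq="D")
print(len(expected_days))  # 1825


def missing_days(index):
    """Return the calendar days between index.min() and index.max() that are absent."""
    full = pd.date_range(index.min(), index.max(), freq="D")
    return full.difference(index)


# Toy index with one deliberate gap (2020-01-03)
toy_index = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-04"])
print(missing_days(toy_index))
```

On the real data, `missing_days(df.index)` should come back empty if the file really holds one row per calendar day.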

In [None]:
df[['High', 'Low']].plot(figsize = (15, 5), alpha = 0.5)

In [None]:
df[['Open', 'Close']].plot(figsize = (15, 5), alpha = 0.5)

We can observe that the opening and closing prices track each other closely, as do the high and low prices: there are no huge gaps between them.

* There were two sharp dips in the stock price: one close to 2019 (due to Brexit) and one in March 2020 (owing to the pandemic).

* There was an overall increase in the stock price from 2017 to 2018.

* The stock price started to rise again in the latter half of 2020.

* The stock price fell sharply between the start of 2018 and 2019.

## Decomposition

A given time series is thought to consist of **three systematic components (level, trend, and seasonality)** and one non-systematic component called **noise.**

These components are defined as follows:

* **Level:** The average value in the series.

* **Trend**: The increasing or decreasing value in the series.

* **Seasonality:** The repeating short-term cycle in the series.

* **Noise:** The random variation in the series.

All series have a level and noise. The trend and seasonality components are optional. It is helpful to think of the components as combining either **additively or multiplicatively.**

An **additive model** suggests that the components are added together as follows:

> **y(t) = Level + Trend + Seasonality + Noise**

An additive model is linear: changes over time are consistently made by the same amount. A linear seasonality has the same frequency (width of cycles) and amplitude (height of cycles).

A **multiplicative model** suggests that the components are multiplied together as follows:

> **y(t) = Level * Trend * Seasonality * Noise**

A multiplicative model is nonlinear, such as quadratic or exponential. Changes increase or decrease over time. A non-linear seasonality has an increasing or decreasing frequency and/or amplitude over time.
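To make the difference concrete, here is a minimal sketch on synthetic data. All numbers are made up for illustration, and the noise term is left out so the effect is easy to see. It builds one additive and one multiplicative series and compares the height of the seasonal swing in the first and last cycle. It also shows that taking logs turns a multiplicative series into an additive one, which holds whenever every component is positive.

```python
import numpy as np

period = 12
t = np.arange(120)          # 10 full cycles
level = 100.0

# Additive: y = level + trend + seasonality
trend_add = 0.5 * t
season_add = 10.0 * np.sin(2 * np.pi * t / period)
y_add = level + trend_add + season_add

# Multiplicative: y = level * trend * seasonality (components are factors around 1)
trend_mul = 1.0 + 0.01 * t
season_mul = 1.0 + 0.1 * np.sin(2 * np.pi * t / period)
y_mul = level * trend_mul * season_mul

# Seasonal deviation from the level-and-trend baseline
dev_add = y_add - (level + trend_add)
dev_mul = y_mul - level * trend_mul

# Peak-to-peak swing in the first and the last cycle
add_first, add_last = np.ptp(dev_add[:period]), np.ptp(dev_add[-period:])
mul_first, mul_last = np.ptp(dev_mul[:period]), np.ptp(dev_mul[-period:])
print("additive swing:      ", add_first, add_last)
print("multiplicative swing:", mul_first, mul_last)

# log(y_mul) is the sum of the logged components, i.e. an additive model
log_sum = np.log(level) + np.log(trend_mul) + np.log(season_mul)
print(np.allclose(np.log(y_mul), log_sum))
```

The additive swing stays the same from the first cycle to the last, while the multiplicative swing grows with the trend.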

Decomposition provides a structured way of thinking about a time series forecasting problem, both generally in terms of modeling complexity and specifically in terms of how to best capture each of these components in a given model.

Each of these components is something you may need to think about and address during data preparation, model selection, and model tuning. You may address it explicitly, by modeling the trend and subtracting it from your data, or implicitly, by providing enough history for an algorithm to model a trend if one exists.
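As an illustration of the explicit route, the sketch below fits a straight line to a synthetic series with `np.polyfit` and subtracts it. A linear trend is an assumption made here for simplicity; real series may need a different trend model.

```python
import numpy as np

# Synthetic series: level 50, slope 0.3 per step, plus a cycle of length 20
t = np.arange(200)
y = 50.0 + 0.3 * t + 5.0 * np.sin(2 * np.pi * t / 20)

# Model the trend as a straight line (np.polyfit returns the highest degree first)
slope, intercept = np.polyfit(t, y, deg=1)
fitted_trend = intercept + slope * t

# Subtract the modeled trend from the data
detrended = y - fitted_trend

# Refit a line to the detrended series: the slope should now be essentially zero
residual_slope, _ = np.polyfit(t, detrended, deg=1)
print(slope, residual_slope)
```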

In order to implement the naive, or classical, decomposition method, we use the seasonal_decompose() function provided by the statsmodels library. It requires you to specify whether the model is additive or multiplicative.
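To give a feel for what a classical additive decomposition does, here is a rough hand-rolled sketch: a centred moving average for the trend, per-position averages for the seasonality, and whatever is left as the residual. This is not the statsmodels implementation. The function name `classical_additive` is hypothetical, the sketch only handles odd periods, and it leaves the edges of the trend as NaN.

```python
import numpy as np
import pandas as pd


def classical_additive(y, period):
    """Hand-rolled classical additive decomposition (odd periods only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # 1) Trend: centred moving average over one full cycle
    trend = y.rolling(window=period, center=True).mean()
    # 2) Seasonality: average the detrended values at each position in the cycle,
    #    then shift the averages so that they sum to zero
    detrended = y - trend
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=y.index)
    # 3) Residual: whatever trend and seasonality do not explain
    resid = y - trend - seasonal
    return trend, seasonal, resid


# Synthetic daily series: straight-line trend plus a weekly pattern that sums to zero
idx = pd.date_range("2020-01-01", periods=70, freq="D")
t = np.arange(len(idx))
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -2.0, 1.0])
y = pd.Series(10.0 + 0.5 * t + pattern[t % 7], index=idx)

trend, seasonal, resid = classical_additive(y, period=7)
print(seasonal.iloc[:7].to_numpy())
```

Because the synthetic series contains no noise, the recovered weekly pattern matches `pattern` and the residual is zero wherever the trend is defined.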

## Decomposition Implementation

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose


def decompose(df, column_name):
    """
    A function that returns the trend, seasonality and residual captured by applying both multiplicative and
    additive model.
    df -> DataFrame indexed by date
    column_name -> column_name for which trend, seasonality is to be captured
    """
    # extrapolate_trend='freq' extrapolates the trend at both ends of the series,
    # so the trend and residual components contain no NaN values
    result_mul = seasonal_decompose(df[column_name], model='multiplicative', extrapolate_trend='freq')
    result_add = seasonal_decompose(df[column_name], model='additive', extrapolate_trend='freq')

    # Draw each decomposition as its own figure
    plt.rcParams.update({'figure.figsize': (20, 10)})
    result_mul.plot().suptitle('Multiplicative Decompose', fontsize=30)
    result_add.plot().suptitle('Additive Decompose', fontsize=30)
    plt.show()

    return result_mul, result_add

The **seasonal_decompose() function returns a result object**. The result object contains arrays for four pieces of data from the decomposition: the observed series, trend, seasonality, and residual. We have plotted both the multiplicative and the additive model so that we can decide which of the two should be used.

## Open

In [None]:
result_mul, result_add = decompose(df, 'Open')

The plotted residuals should not follow any kind of pattern; they should be spread randomly.

We can see that the trend and seasonality information extracted from the series does seem reasonable. The residuals are also interesting, showing periods of high variability during the rapid falls and rises in the series.
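Judging randomness by eye can be backed up with a quick numeric check. One simple option, sketched below on synthetic residuals, is to look at the autocorrelation at a few lags with pandas `Series.autocorr`. The helper name `lag_autocorr` and the choice of lags 1 and 7 are assumptions for illustration. Values near zero are what random residuals would give, while a value near one at lag 7 points to a leftover weekly cycle.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 700

# Residuals that are pure noise: nothing left to model
random_resid = pd.Series(rng.normal(size=n))

# Residuals that still hide a weekly cycle the decomposition missed
patterned_resid = pd.Series(
    np.sin(2 * np.pi * np.arange(n) / 7) + 0.1 * rng.normal(size=n)
)


def lag_autocorr(resid, lags=(1, 7)):
    """Autocorrelation of the residuals at the given lags, ignoring NaNs."""
    clean = resid.dropna()
    return {lag: clean.autocorr(lag=lag) for lag in lags}


print(lag_autocorr(random_resid))
print(lag_autocorr(patterned_resid))
```

The same helper could be applied to `result_add.resid` or `result_mul.resid` from the cell above.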

The trend can be clearly observed in the plots above. We also said that decomposing with the additive model represents the series as the sum of seasonality, trend and residual. Let's check that:

In [None]:
df_reconstructed = pd.concat([result_add.seasonal, result_add.trend, result_add.resid, result_add.observed], axis = 1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed

**Indeed, the sum of the columns seas, trend and resid is equal to the actual values.**

Let's try it out for other columns too.

## Close

In [None]:
result_mul, result_add = decompose(df, 'Close')

In [None]:
df_reconstructed = pd.concat([result_add.seasonal, result_add.trend, result_add.resid, result_add.observed], axis = 1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed

## High

In [None]:
result_mul, result_add = decompose(df, 'High')

In [None]:
df_reconstructed = pd.concat([result_add.seasonal, result_add.trend, result_add.resid, result_add.observed], axis = 1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed

## Low

In [None]:
result_mul, result_add = decompose(df, 'Low')

In [None]:
df_reconstructed = pd.concat([result_add.seasonal, result_add.trend, result_add.resid, result_add.observed], axis = 1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed

**In this way, we are able to capture the trend, seasonality and residuals. We can then use the trend or seasonality as features, or use them to study the time series dataset.**

**Hope you enjoyed the notebook and most importantly learnt something new.**
