---
# Time Series in Pandas
---
## 1. Time Series - Definition:

- __Time series data is a sequence of data points that occur in successive order over a time period__. 
- In essence, a __time series__ of a given __variable__ is a collection of discrete observations of that variable, made in order over a continuous period of time.
- When data is contextualised within a time frame, we can do some distinctive kinds of analysis: spot trends, identify seasonal variations and relate the data to other events across the same time period.
- An example of time series data:
    <center>
        <div>
            <img src="time_series_graph.png" width="500"/>
        </div>
        The share price of a company between 2010 and 2018
    </center>

---
## 2. Time Series in Pandas:

In Pandas, a time series can be represented by a __pandas.Series object, indexed by time__. Typically time series data is numeric.
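Here is a minimal sketch, with prices and dates made up purely for illustration. A time series is simply a `Series` whose index is a `DatetimeIndex`, and that index lets us select observations by date:

```python
import pandas as pd

# Five made-up daily closing prices, indexed by date
dates = pd.date_range("2018-01-01", periods=5, freq="D")
prices = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates, name="close")

print(type(prices.index).__name__)  # DatetimeIndex
print(prices)

# Because the index holds timestamps, we can slice by date strings
# (with label-based slicing, both endpoints are included)
print(prices.loc["2018-01-02":"2018-01-04"])
```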

For the purpose of this lesson, we will be using [yahoofinancials](https://pypi.org/project/yahoofinancials/), a financial data package for pulling both fundamental and technical data from Yahoo Finance. Yahoo Finance has large datasets of historical financial data, such as stock prices and stock betas (a measure of a stock's volatility relative to the overall market).

Once the data is loaded into a DataFrame, let's do some simple cleaning and transformation steps:
- drop unnecessary columns
- convert the 'formatted_date' column into actual datetime objects
- set the index of the DataFrame to be the 'formatted_date' column
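The three steps could be sketched as follows. The raw DataFrame below is hypothetical and made up for illustration. Only the `'formatted_date'` column comes from the lesson. The other column names, and the choice of which columns count as "unnecessary", are assumptions:

```python
import pandas as pd

# Hypothetical raw price data standing in for the downloaded history.
# Only 'formatted_date' is taken from the lesson; other columns are assumed.
raw = pd.DataFrame({
    "date": [1514764800, 1514851200, 1514937600],  # Unix timestamps (assumed)
    "formatted_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "open": [10.0, 10.4, 10.1],
    "close": [10.5, 10.2, 10.8],
    "volume": [1000, 1200, 900],
})

# 1. Drop columns we do not need (which ones is a judgement call)
df = raw.drop(columns=["date", "volume"])

# 2. Convert the date strings into proper datetime64 values
df["formatted_date"] = pd.to_datetime(df["formatted_date"])

# 3. Use the datetime column as the index
df = df.set_index("formatted_date")

print(df.index)     # a DatetimeIndex named 'formatted_date'
print(df["close"])  # each remaining column is now a time series
```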


So far we have loaded our financial data into a DataFrame, cleaned it and set its index to be a __datetime__ column!

Recall that every column in a DataFrame is a Series - this, together with the __date__ index, allows us to interpret and manipulate the DataFrame as a __collection of time series variables__!

---
## 2.1 Time Series - Resampling:

In what follows we are going to explore the concept of __time series resampling__:
- __Downsampling__: the process of reducing the sampling rate of observations
    - e.g. downsampling from __daily__ to __monthly__ observations
- __Upsampling__: the process of increasing the sampling rate of observations
    - e.g. upsampling from __yearly__ to __monthly__ observations
     
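Syntax:
- `df.resample('frequency').aggregation_function()`, where `'frequency'` is a frequency alias such as `'D'` (daily) or `'M'` (month end, written `'ME'` in pandas 2.2 and later), and `aggregation_function` is e.g. `mean`, `sum` or `max`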
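Here is a minimal downsampling sketch on made-up daily data. One choice is ours: we pass the `pd.offsets.MonthEnd()` offset object instead of a string alias. It means the same thing as `'M'`/`'ME'`, but it avoids the alias rename between pandas versions:

```python
import numpy as np
import pandas as pd

# Made-up daily observations for January and February 2018 (59 days, values 0..58)
idx = pd.date_range("2018-01-01", "2018-02-28", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Downsample daily -> monthly; each month-end label summarises that month
monthly_mean = daily.resample(pd.offsets.MonthEnd()).mean()
monthly_max = daily.resample(pd.offsets.MonthEnd()).max()

print(monthly_mean)  # 2018-01-31 -> 15.0, 2018-02-28 -> 44.5
print(monthly_max)   # 2018-01-31 -> 30.0, 2018-02-28 -> 58.0
```

The aggregation function decides how many observations collapse into one value. In this example 59 daily values become 2 monthly ones.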

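A few notes:
- When upsampling (e.g. from monthly back to daily observations), the choice of aggregation function (`mean()`, `max()`, `first()`, ...) makes no difference, since each new bin contains at most one original observation. (`sum()` is an exception: it returns 0 rather than NaN for empty bins.)
- Downsampling discards the more granular data: only the aggregated value for each period survives.
- When we upsample again, pandas cannot recover the lost granular data, so it fills the new time points with NaNs.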
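The round trip below illustrates these notes on made-up data. It turns 90 daily values into 3 monthly means and then back into a daily series. Note that the re-upsampled series starts at the first month-end label, not on 1 January, and holds only 3 real values:

```python
import numpy as np
import pandas as pd

# Made-up daily data for Q1 2018: 90 observations with values 0..89
idx = pd.date_range("2018-01-01", "2018-03-31", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Downsample daily -> monthly: 90 values collapse into 3 month-end means
monthly = daily.resample(pd.offsets.MonthEnd()).mean()

# Upsample monthly -> daily: every day without a monthly value becomes NaN
daily_again = monthly.resample("D").mean()

print(len(daily), len(monthly), len(daily_again))  # 90 3 60
print(daily_again.notna().sum())                    # 3
print(daily_again.head())
```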

---
## 2.2 Time Series - Rolling Statistics:

We can calculate __rolling statistics__ with the `.rolling()` method. This takes a moving window of time and performs a specific calculation over the observations in that time window. 

One example of a rolling statistic is the __moving average__: at any data point, it takes the average of the last __X__ observations (say, the last 30 daily observations) and returns a 'rolling/moving average' value. When applied across a time series, a __moving average__ therefore returns a __time series of averages__.

Syntax:
- `series.rolling(X).aggregation_function()`, where `X` is the window size: a number of observations (e.g. `30`) or, on a Series with a `DatetimeIndex`, a time offset such as `'30D'`
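Here is a minimal sketch comparing the two kinds of window on made-up prices. A count-based window yields NaN until it is full. A time-based window (an offset string) defaults to `min_periods=1`, so it averages whatever observations fall inside it:

```python
import pandas as pd

# Six made-up daily prices
idx = pd.date_range("2018-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=idx)

# Window of 3 observations: mean of the current value and the two before it;
# the first two positions lack a full window, so they are NaN
ma3 = prices.rolling(3).mean()

# Window of 3 days: covers (t - 3 days, t], so at most 3 daily observations;
# min_periods defaults to 1 here, so early positions average what is available
ma3d = prices.rolling("3D").mean()

print(pd.DataFrame({"price": prices, "ma3": ma3, "ma3d": ma3d}))
```

On gap-free daily data the two windows agree once they are full. They differ when observations are missing, because only the time-based window accounts for the gaps.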

---
## 2.3 Time Series - Other Methods:
Time series in Pandas support many other useful methods, including:
- __Shifting__ a time series by a step size `x` (also known as __lagging__) - `.shift(x)`
- Calculating the __percentage change__ between consecutive values in a time series - `.pct_change()`
- __Differencing__ a time series: taking the difference between each value and the value `x` steps earlier - `.diff(x)`
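The three methods are illustrated below on made-up prices. The sketch also shows how `pct_change()` and `diff()` can be expressed in terms of `shift()`:

```python
import pandas as pd

# Four made-up daily prices
idx = pd.date_range("2018-01-01", periods=4, freq="D")
prices = pd.Series([100.0, 110.0, 99.0, 99.0], index=idx)

lagged = prices.shift(1)       # NaN, 100, 110, 99  (each value moves one step later)
returns = prices.pct_change()  # NaN, 0.10, -0.10, 0.0
changes = prices.diff(1)       # NaN, 10, -11, 0

# Equivalent formulations using shift():
#   pct_change() == prices / prices.shift(1) - 1
#   diff(1)      == prices - prices.shift(1)
print(pd.DataFrame({"price": prices, "lagged": lagged,
                    "pct_change": returns, "diff": changes}))
```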

---
## 3. Summary:
- A __time series__ is a sequence of data points that occur in successive order over a time period.
- In Pandas, any numeric Series with a `DatetimeIndex` can be treated as a time series.
- Time series support operations such as:
    - Resampling - `.resample()`
    - Calculating rolling statistics - `.rolling()`
    - Plotting timelines - `.plot()`
    - Lagging, taking percentage changes and differencing - `.shift()`, `.pct_change()`, `.diff()`

---
## 4. Concept Check:

1. What is a __time series__? Explain in simple words the difference between a Pandas Series object and a time series.
2. Suppose we have a time series `df` containing a full year's worth of observations on variable __X__ (that is, 365 daily observations):
   - What is the output produced by `df_m = df.resample('M').mean()`? Explain the concept of downsampling.
   - What is the output produced by `df_m_d = df_m.resample('D').mean()`? Explain the concept of upsampling.
   - After resampling back to the initial (daily) frequency, what can you say about the majority of values in the time series? How could you best tackle them? What is the main difference between the initial time series and the final one you have produced?