# **<center>Time Series Analysis & Forecasting Notes</center>**

### **What Is Time Series Analysis?**
- Time series analysis is a specific way of analyzing a sequence of data points collected over time. In TSA, analysts record data points at consistent intervals over a set period rather than just recording the data points intermittently or randomly.

**Time Series Analysis Objectives:**

1. **Understanding Changes Over Time:**
   - Figure out how time influences variables.
   - Identify factors affecting things at different times.

2. **Unveiling Dataset Insights:**
   - Analyze consequences and insights from changing dataset features.

3. **Predicting the Future:**
   - Help predict future values of time series variables.

4. **Assumption:**
   - Only one assumption: "Stationarity," meaning the statistical properties of the process do not depend on the point in time at which observation starts.

### **Analyzing Time Series:**

To analyze time series effectively, follow these steps:

1. **Collect and Clean Data:**
   - Gather your data and make sure it's tidy.

2. **Create Time vs Key Feature Visualizations:**
   - Make charts that show how your key feature changes over time.

3. **Check Series Stationarity:**
   - See if your series stays consistent over time.

4. **Understand the Nature with Charts:**
   - Develop charts to grasp the characteristics of your data.

5. **Build Models – AR, MA, ARMA, ARIMA, SARIMA, SARIMAX, FBProphet, LSTM, etc.:**
   - Use different models to analyze your time series.

6. **Extract Insights from Predictions:**
   - Gain valuable insights from the predictions made by your models.


### **Understanding Time Series Analysis Components:**

1. **Trend:** This is like the continuous flow of a storyline in the data. It could be going up (Positive), down (Negative), or staying constant (Null Trend).

2. **Seasonality:** Imagine regular shifts in the data happening at fixed intervals, creating a pattern like a bell curve or a sawtooth.

3. **Cyclical:** Think of this as a less predictable, uncertain movement in the data with no fixed intervals.

4. **Irregularity:** These are the unexpected surprises in the data—sudden spikes or unusual events happening in a short period.

![Alt Text](./Images/39815Components%20of%20Time%20Series%20Analysis.png) 


### **Understanding the Limits of Time Series Analysis:**

1. **No Room for Missing Values:**
   Time series analysis doesn't handle missing values well, just like some other models. So, when digging into our data, we need to be mindful of this.

2. **Linear Relationships Required:**
   Classical time series models such as AR, MA, and ARIMA assume a linear relationship between the current value and past values. Strongly non-linear dynamics are not captured well without extra transformations or different models.

3. **Cost of Data Transformations:**
   Transforming data is a must, but it comes at a price. It can be a bit costly, so we should be aware of the expenses involved.

4. **Primarily for Uni-variate Data:**
   Time series models are most effective with one variable at a time. They might not perform as well with multiple variables in the mix.

### **Data Types of Time Series:**

**Stationary:**
For a dataset to be stationary, it must follow these rules:
1. It shouldn't have Trend or Seasonality components.
2. The mean value should stay constant throughout the analysis.
3. The variance (how spread out the values are) should remain constant over time.
4. Autocovariance, which measures the relationship between values of the series at two points in time, should depend only on the lag between them and not on time itself.

**Non-Stationary:**
If the mean, the variance, or the covariance changes over time, the dataset is called non-stationary.

![Alt text](./Images/99388Stationary%20Vs%20Non-Stationary.png)

### **Methods to Check Stationarity:**

1. **Augmented Dickey-Fuller (ADF) Test:** This is a popular test with these hypotheses and decision rules:
   - Null Hypothesis (H0): The series is non-stationary.
   - Alternate Hypothesis (HA): The series is stationary.
   - If p-value > 0.05, we fail to reject H0.
   - If p-value <= 0.05, we reject H0 in favor of HA.

2. **Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:** 
   - This test takes as its Null Hypothesis (H0) that the time series is stationary around a trend (or a constant), against the alternative of a unit root. This is the reverse of the ADF setup, so here a p-value > 0.05 is consistent with stationarity. Since TSA needs stationary data, we want to make sure our dataset is stationary.

### **Converting Non-Stationary Into Stationary:**
Let's simplify the methods for converting non-stationary time series data into a stationary form for better modeling. There are three main techniques: detrending, differencing, and transformation.

**Detrending:**
Remove trend effects from the data, highlighting only differences from the trend. This helps identify cyclical patterns.

![](./Images/26672Detrending.png)
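
One simple way to detrend, sketched here under the assumption that the trend is linear, is to fit a straight line by least squares and subtract it. The synthetic series (trend plus a 12-step cycle plus noise) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic series (assumed): linear trend + 12-step cycle + noise.
t = np.arange(120)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

# Fit a straight line by least squares and subtract it from the series.
slope, intercept = np.polyfit(t, series, deg=1)
trend = slope * t + intercept
detrended = series - trend

print(f"estimated slope: {slope:.3f}")
print(f"mean of detrended series: {detrended.mean():.2e}")
```

After the subtraction the cyclical pattern remains, but the series no longer drifts upward.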

**Differencing:**
Transform the series into a new one, reducing its dependence on time. This stabilizes the mean, minimizing trend and seasonality.

- ${Y'_t = Y_t - Y_{t-1}}$
- ${Y_t}$ : value at time ${t}$; ${Y'_t}$ : differenced value at time ${t}$

![](./Images/98810Converting%20Non-%20Stationarity%20data%20into%20Stationarity.png)
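
A minimal sketch of differencing with pandas, using `Series.diff()`. The values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import pandas as pd

# Hypothetical values with an accelerating upward trend.
y = pd.Series([10, 12, 15, 19, 24, 30], name="y")

# First-order differencing: each value minus the previous one.
# The first element has no predecessor, so it becomes NaN and is dropped.
first_diff = y.diff().dropna()

# Differencing a second time removes the remaining growth in the differences.
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0, 6.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0, 1.0]
```

The first difference still trends upward, while the second difference is constant. How many times differencing is needed depends on the data.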

**Transformation:**
Use methods like Power Transform, Square Root, or Log Transform. The Log Transform is commonly employed.
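
To illustrate why the log is a common choice, the sketch below applies it to a hypothetical series that grows by 50% per step. On the raw scale the step sizes keep growing; on the log scale they are constant.

```python
import numpy as np

# Hypothetical series growing by 50% at every step.
y = 100 * 1.5 ** np.arange(6)

log_y = np.log(y)

raw_steps = np.diff(y)      # step sizes grow with the level of the series
log_steps = np.diff(log_y)  # step sizes are constant, equal to log(1.5)

print(raw_steps)
print(log_steps)
```

A log transform only applies to strictly positive values; other data may need a shift or a different transform.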

**Moving Average Methodology:**
- A popular technique is the Moving Average, effective for smoothing short-term variations.

**Types of Moving Averages:**
1. **Simple Moving Average (SMA):**
- Calculate the unweighted mean of the previous ${M}$ (or ${N}$) points. Adjust the window size for the desired smoothing; larger windows smooth more but react more slowly to recent changes.
   
   ![](./Images/27331SMA.png)

   ![](./Images/17974SMA%20over%20a%20period%20of%2010%20and%2020%20years.png)


2. **Cumulative Moving Average (CMA):**
- Unweighted mean of past values up to the current time.

   ![](./Images/16788CMA.png)

   ![](./Images/40978CMA-visi.png)

3. **Exponential Moving Average (EMA):**
- Identify trends and filter noise. It gives more weight to recent data, so it reacts to changes faster than SMA. The smoothing factor **${\alpha}$** determines the weight and ranges from 0 to 1. Larger **${\alpha}$** values put more weight on recent data; smaller values smooth more.
- Let's apply exponential moving averages with smoothing factors of 0.1 and 0.3 to our air temperature dataset for a practical understanding.

   ![](./Images/81657EMA.png)

   ![](./Images/41322EMA-visi.png)
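
The three moving averages above can be sketched with pandas as follows. The temperature values are hypothetical, and the window size of 3 is an arbitrary choice for illustration. The smoothing factors 0.1 and 0.3 follow the ones mentioned in the text.

```python
import pandas as pd

# Hypothetical temperature readings.
temps = pd.Series([20.0, 22.0, 21.0, 23.0, 25.0, 24.0], name="temp")

# Simple moving average: unweighted mean of the last 3 points.
# The first two entries are NaN because the window is not yet full.
sma = temps.rolling(window=3).mean()

# Cumulative moving average: unweighted mean of all points so far.
cma = temps.expanding().mean()

# Exponential moving average with the recursive form
# ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}  (adjust=False).
ema_slow = temps.ewm(alpha=0.1, adjust=False).mean()
ema_fast = temps.ewm(alpha=0.3, adjust=False).mean()

print(pd.DataFrame({"temp": temps, "sma": sma, "cma": cma,
                    "ema_0.1": ema_slow, "ema_0.3": ema_fast}))
```

With the larger smoothing factor the EMA moves toward each new reading more quickly, which is why `ema_fast` tracks the rise in temperature sooner than `ema_slow`.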



### **Time Series Analysis in Data Science and Machine Learning:**
- In the world of Data Science and Machine Learning, when we dive into Time Series Analysis (TSA), we encounter various model options. One notable model is the Autoregressive Integrated Moving Average (ARIMA) model, characterized by the parameters [p, d, and q].

- Breaking it down:
    - **p**: number of autoregressive lags
    - **q**: number of moving average lags
    - **d**: order of differencing

But, before we jump into ARIMA, let's grasp some key terms.

**Auto-Correlation Function (ACF):**
- ACF tells us how much a value in a time series resembles the previous one. It's like measuring the similarity between a current time series and its past version at different intervals.
- Python's Statsmodels library is handy here, calculating autocorrelation and revealing trends and the impact of past values on present ones.

**Partial Auto-Correlation Function (PACF):**
- PACF is a bit trickier but closely related to ACF. It shows the correlation between the series and a lagged copy of itself at a given lag, after removing the effects of the shorter lags in between. In other words, it focuses on the direct effect and strips out the intermediary effects.

![](./Images/41503Auto-correlation%20and%20Partial%20Auto-Correlation.png)

Observation:
- The previous temperature influences the current temperature. In the visualization above, the significance of that influence decreases as the lag grows and then slightly increases again at regular intervals.

So, in a nutshell, ACF and PACF help us unveil patterns and relationships within a time series, guiding our journey in Time Series Analysis. 📈🕰️

**Types of Auto-Correlation:**

![](./Images/16336Types%20of%20AR.png)

| ACF                     | PACF                    | Suggested Model       |
| ----------------------- | ----------------------- | --------------------- |
| Plot declines gradually | Plot drops instantly    | Auto-Regressive model |
| Plot drops instantly    | Plot declines gradually | Moving Average model  |
| Plot declines gradually | Plot declines gradually | ARMA                  |
| Plot drops instantly    | Plot drops instantly    | No specific model     |

Note: Both ACF and PACF analysis assumes a stationary time series for accurate interpretation.

### **Auto-Regressive Model:**
- An auto-regressive model is a simple model that predicts future values based on past values. It is mainly used for forecasting when there is some correlation between values in a given time series and the values that precede and follow them.
- An AR model is a linear regression model that uses lagged values of the series as input. Given those lagged inputs, it can be built with scikit-learn's linear regression. The Statsmodels library also provides autoregression-specific functionality in the `AutoReg` class, where you specify an appropriate lag value and train the model in a few simple steps:
    - Create the model with `AutoReg()`.
    - Call `fit()` to train it on the dataset.
    - This returns an `AutoRegResults` object.
    - Once fit, make a prediction by calling the `predict()` function.

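- The equation for the AR model (compare it with ${Y = mX + c}$):
    - ${Y_t = C + b_1 Y_{t-1} + b_2 Y_{t-2} + \dots + b_p Y_{t-p} + Er_t}$
- Key Parameters
    - ${p}$ = number of past values (lags) used
    - ${Y_t}$ = current value, a function of the past values
    - ${Er_t}$ = error at time ${t}$
    - ${C}$ = intercept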

- Let's check whether the given dataset or time series is random or not.

Code

https://archive.li/AkQU8