
---

# 🔹 Handling Missing Values in Time Series Data

Since **time order matters**, we can’t just blindly impute like normal tabular data. Here are the main techniques:

---

## 1. **Drop Missing Values**

```python
df.dropna(inplace=True)
```

* **Explanation**: Remove missing rows.
* ✅ Advantage: Simple, no assumption.
* ❌ Disadvantage: Loses data, risky if gaps are big.

---

## 2. **Forward Fill (ffill)**

```python
df.ffill(inplace=True)  # fillna(method='ffill') is deprecated in pandas 2.x
```

* **Explanation**: Replace missing value with **previous known value**.
* ✅ Advantage: Good for continuous signals (stock price, sensor).
* ❌ Disadvantage: Can extend old value too far, not good for rapid changes.
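
One way to soften the "extends old value too far" problem is to cap how many consecutive gaps get filled. A minimal sketch using the `limit` argument of `ffill`; the toy series is made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy series with a run of three missing values (illustrative data).
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Fill at most 1 consecutive missing value; the rest of the gap stays NaN.
capped = s.ffill(limit=1)
print(capped.tolist())  # [1.0, 1.0, nan, nan, 5.0]
```

The remaining NaNs can then be handled by a method better suited to long gaps.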

---

## 3. **Backward Fill (bfill)**

```python
df.bfill(inplace=True)  # fillna(method='bfill') is deprecated in pandas 2.x
```

* **Explanation**: Replace missing value with **next known value**.
* ✅ Advantage: Easy, sometimes realistic.
* ❌ Disadvantage: Uses “future” info → not valid for forecasting.

---

## 4. **Linear Interpolation**

```python
df.interpolate(method='linear', inplace=True)
```

* **Explanation**: Draw a straight line between neighboring points.
* ✅ Advantage: Smooth, good for gradual trends.
* ❌ Disadvantage: Wrong if data has sudden jumps.

---

## 5. **Time-based Interpolation**

```python
# method='time' requires a DatetimeIndex
df.interpolate(method='time', inplace=True)
```

* **Explanation**: Uses time index → better for irregular intervals.
* ✅ Advantage: More realistic for uneven time steps.
* ❌ Disadvantage: Still assumes smoothness.
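
The difference between linear and time-based interpolation only shows up when timestamps are unevenly spaced. A small sketch on a made-up three-point series, where the missing point sits 1 day after the first observation and 3 days before the last:

```python
import numpy as np
import pandas as pd

# Irregular timestamps: 1 day, then 3 days between observations.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

linear_fill = s.interpolate(method="linear")  # treats rows as equally spaced
time_fill = s.interpolate(method="time")      # weights by elapsed time

print(linear_fill.iloc[1])  # 2.0 (midpoint by row position)
print(time_fill.iloc[1])    # 1.0 (one quarter of the way through the 4-day span)
```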

---

## 6. **Mean/Median/Mode Imputation**

```python
# Assign back instead of inplace=True on a column selection (chained assignment)
df['value'] = df['value'].fillna(df['value'].mean())
```

* **Explanation**: Replace with the mean (or median/mode) of the entire series.
* ✅ Advantage: Very simple.
* ❌ Disadvantage: Ignores seasonality & time trend.

---

## 7. **Moving Average (Rolling Window)**

```python
# Assign back instead of inplace=True on a column selection (chained assignment)
df['value'] = df['value'].fillna(df['value'].rolling(3, min_periods=1).mean())
```

* **Explanation**: Replace with a local average: the mean of the observed values in a trailing window of 3 (for a single gap, the 2 preceding points).
* ✅ Advantage: Smooths noise, respects local structure.
* ❌ Disadvantage: Can oversmooth, lags real trend.

---

## 8. **Seasonal/Trend Decomposition**

```python
from statsmodels.tsa.seasonal import seasonal_decompose
# decompose into trend, seasonality, residuals → fill gaps
```

* **Explanation**: Estimate missing values using seasonal/trend pattern.
* ✅ Advantage: Good for seasonal data (daily, monthly).
* ❌ Disadvantage: Needs lots of data, more complex.
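
`seasonal_decompose` does not accept missing values, so a simpler seasonal fill is often used first. The sketch below is a stand-in for the decomposition approach, not `seasonal_decompose` itself: it fills each gap with the mean of the observed values at the same position in the cycle. The `period` of 4 and the toy series are assumptions, and the method ignores any trend.

```python
import numpy as np
import pandas as pd

period = 4  # assumed season length (hypothetical; e.g. 7 for daily data with a weekly cycle)
pattern = np.array([10.0, 20.0, 30.0, 20.0])
s = pd.Series(np.tile(pattern, 5))      # 5 full seasons, no trend
s.iloc[[6, 13]] = np.nan                # knock out two points

# Position of every observation inside its season: 0, 1, 2, 3, 0, 1, ...
phase = np.arange(len(s)) % period

# Mean of the observed values at each phase (NaNs are skipped), aligned to s.
seasonal_mean = s.groupby(phase).transform("mean")
filled = s.fillna(seasonal_mean)

print(filled.iloc[6], filled.iloc[13])  # 30.0 20.0
```

For data with a trend, the same idea can be applied after detrending, or the gap-free result can be passed on to a proper decomposition.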

---

## 9. **Model-based Imputation**

```python
from sklearn.linear_model import LinearRegression
# predict missing values using regression, ARIMA, or ML models
```

* **Explanation**: Predict missing points using statistical/ML models.
* ✅ Advantage: Captures dependencies & patterns.
* ❌ Disadvantage: Computationally expensive, risk of overfitting.
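
As a minimal sketch of the regression idea, the example below fits a straight line to the observed points and predicts the missing ones. It uses `np.polyfit` in place of `LinearRegression` to stay dependency-light, and the toy series (exactly linear) is an assumption chosen so the result is easy to check; ARIMA or ML models follow the same fit-then-predict pattern.

```python
import numpy as np
import pandas as pd

# Toy series following y = 2 + 2*t with two gaps (illustrative data).
s = pd.Series([2.0, 4.0, np.nan, 8.0, np.nan, 12.0])
t = np.arange(len(s), dtype=float)
observed = s.notna().to_numpy()

# Fit value ~ time on the observed points only.
slope, intercept = np.polyfit(t[observed], s[observed].to_numpy(), deg=1)

# Predict every time step, then use predictions only where data is missing.
predicted = pd.Series(intercept + slope * t, index=s.index)
filled = s.fillna(predicted)

print(filled.round(6).tolist())  # gaps become 6.0 and 10.0
```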

---

# ✅ Quick Summary Table

| Method                  | Best for                 | Advantage                  | Disadvantage               |
| ----------------------- | ------------------------ | -------------------------- | -------------------------- |
| Drop Missing            | Few missing points       | Simple                     | Data loss                  |
| Forward Fill            | Continuous signals       | Easy                       | Extends old values too far |
| Backward Fill           | Rare gaps                | Simple                     | Uses “future” info         |
| Linear Interpolation    | Gradual changes          | Smooth                     | Bad for sudden jumps       |
| Time Interpolation      | Irregular time intervals | Respects actual time gaps  | Assumes smoothness         |
| Mean/Median Imputation  | Few missing values       | Simple                     | Ignores trend/season       |
| Moving Average          | Local smoothing          | Reduces noise              | Oversmoothing              |
| Seasonal/Trend Fill     | Strong seasonality       | Preserves seasonal pattern | Complex                    |
| Model-based (ARIMA, ML) | Large gaps, complex data | Captures dependencies      | Expensive                  |

---

👉 Rule of Thumb:

* **Small gaps** → Forward/Backward fill, Interpolation.
* **Seasonal data** → Seasonal decomposition.
* **Big gaps** → Model-based imputation.
* **Very few missing values** → Drop rows.

---



**ACF (Autocorrelation Function)** and **PACF (Partial Autocorrelation Function)** are two of the most important tools in **time series analysis**, especially when working with **AR, MA, ARMA, and ARIMA models**, and they come up often in **interviews** for time series roles.

---

# 1. **Autocorrelation Function (ACF)**

* **Definition**: Measures the correlation between the time series and its **lagged versions**.
* Simply: *How much does today’s value depend on yesterday’s, the day before, etc.?*
* Formula:

  $$
  ACF(k) = Corr(Y_t, Y_{t-k})
  $$

  where $k$ is the lag.

✅ **Key use**: Helps identify the **order of MA (q)** component in ARIMA.
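
The formula can be computed directly. A minimal sketch of the usual sample estimator (lagged products of the centered series divided by the total sum of squares); the function name `sample_acf` and the five-point series are illustrative:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[:len(x) - k]) / denom
        for k in range(max_lag + 1)
    ])

acf_vals = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(acf_vals)  # lag 0 = 1.0, lag 1 = 0.4, lag 2 = -0.1
```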

---

### Example intuition:

If today’s stock price is highly correlated with yesterday’s price, and less with the day before, the **ACF** will show a slow decay pattern.

---

# 2. **Partial Autocorrelation Function (PACF)**

* **Definition**: Measures the correlation between the series and its **lag**, after removing the effect of all intermediate lags.
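* Simply: *What is the “direct” effect of lag $k$ after removing the effects of lags $1, 2, \dots, k-1$?*
* Formula (conceptual): the correlation between $Y_t$ and $Y_{t-k}$ once both have been regressed on the intermediate lags $Y_{t-1}, \dots, Y_{t-k+1}$; equivalently, the coefficient on lag $k$ when $Y_t$ is regressed on lags $1$ through $k$.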

✅ **Key use**: Helps identify the **order of AR (p)** component in ARIMA.

---

### Example intuition:

Suppose today’s value depends on yesterday’s value. Naturally, it will also seem correlated with the day before yesterday (through yesterday).

* **ACF** will show both correlations.
* **PACF** will filter out the “indirect” correlation and keep only the direct one.
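
The "direct effect" idea can be sketched with a plain regression: the partial autocorrelation at lag k is the coefficient on lag k when the series is regressed on lags 1 through k. The helper name `pacf_by_regression` and the simulated AR(1) series with coefficient 0.7 are assumptions for illustration:

```python
import numpy as np

def pacf_by_regression(x, k):
    """Coefficient on lag k when x_t is regressed on lags 1..k plus an intercept."""
    x = np.asarray(x, dtype=float)
    y = x[k:]
    # Column j holds x_{t-j}, aligned with y = x_t.
    lags = [x[k - j:len(x) - j] for j in range(1, k + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[-1]

# Simulate an AR(1): today depends only on yesterday.
rng = np.random.default_rng(0)
n, phi = 5000, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

pacf_1 = pacf_by_regression(x, 1)  # close to 0.7 (direct effect of yesterday)
pacf_2 = pacf_by_regression(x, 2)  # close to 0 (no direct effect of two days ago)
print(pacf_1, pacf_2)
```

The ACF of the same series would still be clearly positive at lag 2 (about 0.7 squared), because that correlation flows through lag 1.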

---

# 3. **Graphical Patterns for Model Selection**

When you plot ACF and PACF, you use them to determine **p** and **q** in ARIMA:

* **AR(p) process**

  * PACF: Cuts off after lag *p*.
  * ACF: Tails off gradually.

* **MA(q) process**

  * ACF: Cuts off after lag *q*.
  * PACF: Tails off gradually.

* **ARMA(p,q) process**

  * Both ACF and PACF tail off gradually.
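
The MA(q) cut-off can be checked on simulated data. A minimal sketch, assuming an MA(1) process with a made-up coefficient `theta = 0.6`; for MA(1) the theoretical lag-1 autocorrelation is $\theta / (1 + \theta^2)$ and all higher lags are zero:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.6, 20000

# MA(1): y_t = eps_t + theta * eps_{t-1}
eps = rng.normal(size=n + 1)
y = eps[1:] + theta * eps[:-1]

def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    c = x - x.mean()
    return np.sum(c[k:] * c[:len(c) - k]) / np.sum(c ** 2)

theory_lag1 = theta / (1 + theta ** 2)   # about 0.441
lag1, lag2 = autocorr(y, 1), autocorr(y, 2)
print(lag1, theory_lag1)  # sample value is close to the theoretical one
print(lag2)               # close to 0: the ACF cuts off after lag q = 1
```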

---

# 4. **Simple Code (Snapshot)**

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# Assuming 'data' is your time series
plot_acf(data, lags=30)
plot_pacf(data, lags=30)
plt.show()
```

---

# 5. **Advantages & Disadvantages**

### ✅ Advantages:

* Easy diagnostic tool for ARIMA model selection.
* Helps separate AR vs MA processes.
* Provides intuition about temporal dependencies.

### ❌ Disadvantages:

* Can be misleading if the series is **non-stationary** (spurious correlations).
* Noise in data makes interpretation tricky.
* Higher-order dependencies are hard to spot visually.

---

# 6. **Interview Tip**

If asked: *“Why do we need both ACF and PACF?”*
👉 Answer: Because ACF captures total correlation (direct + indirect), while PACF isolates **direct correlation** at each lag. Together, they allow us to identify whether the time series is better modeled by AR, MA, or ARMA.




---

# 🔹 Moving Averages (MA) for Missing Values

A **moving average** smooths data by replacing a missing value with the average of nearby values (past, future, or both).

---

## 1. **Simple Moving Average (SMA)**

👉 Replace missing value with the average of values in a fixed-size **window**.

```python
df['value'] = df['value'].fillna(df['value'].rolling(window=3, min_periods=1).mean())
```

* **Explanation**: Takes the mean of the observed values in a trailing window of 3 (for a single gap, the 2 preceding points); you can choose the window size.
* **Advantage**: Smooths noise, simple, works well for short gaps.
* **Disadvantage**: Can lag behind real trend, not good for highly volatile data.

---

## 2. **Centered Moving Average**

👉 Uses both past and future values to fill missing points.

```python
df['value'] = df['value'].fillna(df['value'].rolling(window=3, center=True, min_periods=1).mean())
```

* **Explanation**: Looks at neighbors on both sides of the missing value.
* **Advantage**: Balances forward and backward trends.
* **Disadvantage**: Uses future values (not valid for forecasting); at the start or end of the series the window is cut to one side.

---

## 3. **Exponential Moving Average (EMA)**

👉 More weight to **recent observations** instead of equal weight.

```python
df['value'] = df['value'].fillna(df['value'].ewm(span=3, adjust=False).mean())
```

* **Explanation**: Recent values are considered more important.
* **Advantage**: Better for financial data, captures quick changes.
* **Disadvantage**: Still a smoothing method → may distort sudden spikes.
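
The three variants can be compared on one toy series with a single gap. The numbers are made up so each fill is easy to verify by hand; with `span=3` the EMA smoothing factor is 2 / (3 + 1) = 0.5:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 6.0, np.nan, 8.0])

trailing = s.rolling(window=3, min_periods=1).mean()
centered = s.rolling(window=3, center=True, min_periods=1).mean()
ema = s.ewm(span=3, adjust=False).mean()

sma_fill = s.fillna(trailing).iloc[3]       # mean of 2 and 6
centered_fill = s.fillna(centered).iloc[3]  # mean of 6 and 8
ema_fill = s.fillna(ema).iloc[3]            # last EMA value carried over the gap

print(sma_fill, centered_fill, ema_fill)    # 4.0 7.0 3.75
```

Only the centered version uses the value after the gap, which is why it lands closest to the neighbors on both sides.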

---

✅ **Summary of Moving Averages for Missing Values**

| Method          | When to Use                  | Advantage               | Disadvantage                    |
| --------------- | ---------------------------- | ----------------------- | ------------------------------- |
| **SMA**         | Short gaps, stable data      | Simple, smooths noise   | Lags, may oversmooth            |
| **Centered MA** | Gaps with data on both sides | Balanced trend filling  | Uses future data; weak at edges |
| **EMA**         | Financial/volatile data      | Captures recent changes | May ignore long-term pattern    |

---

👉 Moving averages are best when:

* Data is **continuous** and not too seasonal.
* Gaps are **short** (few missing points).
* You want **smoothed values** instead of exact estimates.

---



This section breaks down **time series stationarity testing** with the **ADF (Augmented Dickey-Fuller)** and **KPSS (Kwiatkowski–Phillips–Schmidt–Shin)** tests.

---

# 🔹 Why Test for Stationarity?

Most time series models (like **ARIMA, SARIMA**) assume **stationarity**:

* Mean, variance, and autocorrelation are **constant over time**.

If the series is **non-stationary** (has trend/seasonality), forecasts will be unreliable.
So, we use **statistical tests**.

---

# 🔹 1. Augmented Dickey-Fuller (ADF) Test

👉 Tests for **unit root** (non-stationarity).

### Hypotheses:

* **Null (H₀):** Series has a unit root → **non-stationary**.
* **Alternative (H₁):** Series is stationary.

### Python Example:

```python
from statsmodels.tsa.stattools import adfuller

# Drop NaNs first: adfuller cannot handle missing values
result = adfuller(df['value'].dropna())
print('ADF Statistic:', result[0])
print('p-value:', result[1])
```

### Interpretation:

* **p-value < 0.05** → Reject H₀ → evidence that the series is **stationary**.
* **p-value ≥ 0.05** → Fail to reject H₀ → no evidence against a unit root, so treat the series as **non-stationary**.

✅ **Good for detecting non-stationarity due to trend.**
❌ **Weak when series has strong seasonality.**

---

# 🔹 2. KPSS Test

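👉 Tests for **stationarity** around a constant level (`regression='c'`) or a deterministic trend (`regression='ct'`); its null hypothesis is the opposite of ADF's.

### Hypotheses:

* **Null (H₀):** Series is stationary (level-stationary with `regression='c'`, trend-stationary with `regression='ct'`).
* **Alternative (H₁):** Series is **non-stationary**.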

### Python Example:

```python
from statsmodels.tsa.stattools import kpss

# statsmodels looks the KPSS p-value up in a table, so it is capped to [0.01, 0.1]
result = kpss(df['value'].dropna(), regression='c', nlags="auto")
print('KPSS Statistic:', result[0])
print('p-value:', result[1])
```

### Interpretation:

* **p-value < 0.05** → Reject H₀ → Series is **non-stationary**.
* **p-value ≥ 0.05** → Fail to reject H₀ → Series is **stationary**.

✅ **Better at catching trend-stationarity**.
❌ Can over-reject in small samples.

---

# 🔹 Using Both Together

Since ADF and KPSS test **opposite null hypotheses**:

* **ADF rejects, KPSS does not reject → Stationary.**
* **ADF does not reject, KPSS rejects → Non-stationary.**
* **Both reject → Inconclusive; often read as difference-stationary, so try differencing.**
* **Both do not reject → Inconclusive; often read as trend-stationary, so try detrending (or collect more data).**
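
The four cases can be written as a small decision helper. This is a sketch, not part of statsmodels: the function name `stationarity_verdict` and the default `alpha` are assumptions, and the p-values are expected to come from `adfuller` and `kpss` calls like the ones shown earlier.

```python
def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one verdict."""
    adf_rejects = adf_p < alpha     # ADF rejects "has a unit root"
    kpss_rejects = kpss_p < alpha   # KPSS rejects "is stationary"
    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary"
    return "inconclusive"           # the two tests disagree

print(stationarity_verdict(0.01, 0.10))  # stationary
print(stationarity_verdict(0.60, 0.01))  # non-stationary
print(stationarity_verdict(0.01, 0.01))  # inconclusive
```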

---

# 🔹 Summary Table

| Test     | Null Hypothesis (H₀)     | If p < 0.05    | If p ≥ 0.05    |
| -------- | ------------------------ | -------------- | -------------- |
| **ADF**  | Series is non-stationary | Stationary     | Non-stationary |
| **KPSS** | Series is stationary     | Non-stationary | Stationary     |

---

✅ **Rule of Thumb**:

* Always use **both ADF & KPSS** to confirm.
* If series is **non-stationary**, fix it using:

  * **Differencing** (remove trend).
  * **Seasonal differencing**.
  * **Log/Box-Cox transformation** (stabilize variance).
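
A minimal sketch of the log-then-difference recipe on a made-up series that grows 10% per step. The log turns the multiplicative growth into a straight line, and the first difference removes that line:

```python
import numpy as np
import pandas as pd

t = np.arange(1, 9)
s = pd.Series(100.0 * 1.1 ** t)   # exponential growth (illustrative data)

log_s = np.log(s)                 # stabilizes the scale: growth becomes linear
diffed = log_s.diff().dropna()    # first difference removes the linear trend

print(diffed.round(4).tolist())   # every value is log(1.1), about 0.0953
```

For seasonal differencing the same call takes the season length, for example `s.diff(12)` on monthly data (the period of 12 is an assumption about the data).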

---



This section covers **AR, MA, ARMA, and ARIMA** in depth, with an eye on common **interview questions**.

---

# 🔹 1. **AR (AutoRegressive) Model**

### Idea:

* Current value of the time series depends **linearly** on its **past values**.
* Captures **momentum & persistence** in the series.

**Equation (AR(p)):**

$$
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t
$$

* $y_t$: current value
* $y_{t-1}, y_{t-2}, \dots$: past values
* $\phi_i$: AR coefficients
* $\epsilon_t$: white noise

📌 Example: Stock prices, temperature (where today depends on yesterday + previous days).

✅ **Advantage**: Great at capturing **persistence** (momentum) in a stationary series.
❌ **Disadvantage**: Can fail if there is **seasonality** or irregular shocks.
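
For intuition, the AR(1) case can be worked out by hand. Assuming $|\phi_1| < 1$ (the stationarity condition) and white noise $\epsilon_t$, the long-run mean and the autocorrelation at lag $k$ are:

$$
E[y_t] = \frac{c}{1 - \phi_1}, \qquad ACF(k) = Corr(y_t, y_{t-k}) = \phi_1^{k}
$$

With $\phi_1 = 0.7$, for example, the autocorrelations are $0.7, 0.49, 0.343, \dots$, shrinking geometrically but never dropping to exactly zero. This is the "tails off gradually" ACF pattern of an AR process.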

---

# 🔹 2. **MA (Moving Average) Model**

### Idea:

* Current value depends on **past forecast errors (shocks/noise)**.
* Captures **short-term shocks**.

**Equation (MA(q)):**

$$
y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}
$$

* $\epsilon_t$: white noise
* $\theta_i$: MA coefficients

📌 Example: Demand fluctuations influenced by **sudden external shocks**.

✅ **Advantage**: Handles **random shocks** well.
❌ **Disadvantage**: Cannot capture long memory or trend.

---

# 🔹 3. **ARMA (AutoRegressive Moving Average) Model**

### Idea:

* Combines **AR (dependence on past values)** and **MA (dependence on shocks)**.
* Works for **stationary** series.

**Equation (ARMA(p,q)):**

$$
y_t = c + \sum_{i=1}^p \phi_i y_{t-i} + \epsilon_t + \sum_{j=1}^q \theta_j \epsilon_{t-j}
$$

📌 Example: Financial returns, rainfall data (stationary, no trend).

✅ **Advantage**: Captures both **memory + shocks**.
❌ **Disadvantage**: Requires **stationarity** (no trend/seasonality).

---

# 🔹 4. **ARIMA (AutoRegressive Integrated Moving Average)**

### Idea:

* Extension of ARMA for **non-stationary** series.
* Adds a **differencing step** to remove trend.

**Equation (ARIMA(p,d,q)):**

* $p$: order of AR
* $d$: differencing order (number of times differenced to achieve stationarity)
* $q$: order of MA

**Form:**

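$$
\left(1 - \sum_{i=1}^p \phi_i L^i\right)(1-L)^d y_t = c + \left(1 + \sum_{j=1}^q \theta_j L^j\right)\epsilon_t
$$

Where $L$ is the **lag operator** ($L^k y_t = y_{t-k}$); with $d = 0$ this reduces to the ARMA(p,q) equation above.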

📌 Example: Stock market indices, GDP, sales (series with a trend; any seasonality has to be removed or modeled separately).

✅ **Advantage**: Works for **non-stationary** series.
❌ **Disadvantage**: Doesn’t directly handle **seasonality** → need **SARIMA**.
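
The "difference, model, undo the difference" logic can be sketched by hand for ARIMA(1,1,0). This is a simplified illustration with least squares on simulated data (coefficient 0.5 and no constant are assumptions); a library such as statsmodels estimates ARIMA models by maximum likelihood instead.

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi = 3000, 0.5

# Simulate ARIMA(1,1,0): the first difference w_t follows an AR(1).
w = np.zeros(n)
for t in range(1, n):
    w[t] = phi * w[t - 1] + rng.normal()
y = np.cumsum(w)                      # integrate once (d = 1)

# Step 1 (the "I"): difference d = 1 times.
dy = np.diff(y)

# Step 2 (the "AR"): least-squares estimate of phi from dy_t on dy_{t-1}.
phi_hat = np.dot(dy[1:], dy[:-1]) / np.dot(dy[:-1], dy[:-1])

# Step 3: forecast the next difference, then add it back to the last level.
next_diff = phi_hat * dy[-1]
next_y = y[-1] + next_diff

print(phi_hat)  # close to 0.5
```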

---

# 🔹 How to Choose Between AR, MA, ARMA, ARIMA?

* **Autocorrelation Function (ACF)** & **Partial Autocorrelation Function (PACF)** are key:

  * **AR(p):** PACF cuts off after lag p, ACF decays gradually.
  * **MA(q):** ACF cuts off after lag q, PACF decays gradually.
  * **ARMA(p,q):** Both ACF & PACF decay gradually.
  * **ARIMA:** If series is non-stationary → difference first.

---

# 🔹 Interview-Level Insights

✅ **Q1. How do you identify AR vs MA vs ARMA?**

* By looking at ACF/PACF plots.

✅ **Q2. Why ARIMA over ARMA?**

* ARMA requires stationarity, ARIMA allows differencing.

✅ **Q3. What if seasonality exists?**

* Use **SARIMA / Seasonal ARIMA**.

✅ **Q4. Limitations of ARIMA?**

* Assumes linearity, poor for long-term forecasting with regime changes.

✅ **Q5. Alternatives to ARIMA?**

* **Exponential smoothing (ETS), Prophet (Facebook), State-Space Models, LSTMs (Deep Learning)**.

---



This section goes **step by step** through anomaly detection and autoencoders: theory, intuition, math, and use cases.

---

# 🔹 1. What is **Anomaly Detection**?

### Definition

Anomaly detection is the task of identifying **data points that deviate significantly from the majority of the data**. These unusual data points are called **anomalies** or **outliers**.

### Why important?

* Fraud detection (bank transactions, credit card misuse)
* Intrusion detection (network traffic)
* Fault detection (manufacturing, IoT sensors)
* Healthcare (rare diseases in medical scans)
* Predictive maintenance (detecting unusual machine vibrations)

---

### Types of Anomalies:

1. **Point Anomalies**

   * A single data point far away from the rest.
   * Example: A transaction of \$100,000 when typical spending is \$500.

2. **Contextual Anomalies**

   * Normal in one context, abnormal in another.
   * Example: \$200 electricity bill in summer (normal), but same in winter (anomaly).

3. **Collective Anomalies**

   * A group of related points is abnormal.
   * Example: A sudden spike in server requests (possible DDoS attack).

---

### Approaches for Anomaly Detection:

1. **Statistical Methods**

   * Assume data follows distribution (e.g., Gaussian).
   * If probability is very low → anomaly.
   * Example: Z-score, Grubbs’ test.

2. **Distance-based Methods**

   * If a point is too far from neighbors → anomaly.
   * Example: k-NN, Mahalanobis distance.

3. **Density-based Methods**

   * Look at local density of data.
   * If a point is in a sparse region → anomaly.
   * Example: LOF (Local Outlier Factor).

4. **Machine Learning Methods**

   * Supervised: Train classifier with “normal” vs “anomaly” labels.
   * Unsupervised: Assume anomalies are rare and different from normal.
   * Semi-supervised: Train only on “normal” data, detect anomalies at test time.

5. **Deep Learning Methods**

   * **Autoencoders, LSTM, GANs** → useful for high-dimensional & complex data.
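
As a concrete instance of the statistical approach, here is a minimal z-score sketch. The spending history, the new points, and the cut-off of 3 standard deviations are all assumptions for illustration. The mean and standard deviation are estimated from known-normal history, because a large outlier inside a small sample inflates the standard deviation and can hide itself.

```python
import numpy as np

# Known-normal spending history (illustrative data).
history = np.array([500.0, 520.0, 480.0, 510.0, 495.0, 505.0, 490.0, 500.0])
mu = history.mean()
sigma = history.std(ddof=1)

# Score new transactions against the history.
new_points = np.array([515.0, 100000.0])
z = (new_points - mu) / sigma
is_anomaly = np.abs(z) > 3.0

print(is_anomaly.tolist())  # [False, True]
```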

---

# 🔹 2. What is an **Autoencoder**?

### Definition

An **Autoencoder (AE)** is a type of **unsupervised neural network** used to learn an efficient, compressed representation of data.

It has two parts:

1. **Encoder** → Compress input data into a smaller **latent space** (feature vector).
2. **Decoder** → Reconstructs input from latent space.

The goal: **Output ≈ Input**

---

### Architecture

```
Input → [Encoder] → Latent Representation → [Decoder] → Reconstructed Output
```

* Encoder: Reduces dimensionality. (Dense / CNN / LSTM layers)
* Latent Space: Compressed vector (bottleneck).
* Decoder: Expands back to original size.

---

### Loss Function

* For continuous data: **Mean Squared Error (MSE)**
  $L = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{x}_i)^2$
* For binary data: **Binary Cross-Entropy**

---

### Why useful?

* Forces the model to **learn key features** instead of memorizing.
* Learns compressed patterns of "normal" data.

---

# 🔹 3. **Autoencoders for Anomaly Detection**

The intuition is simple:
👉 Train Autoencoder **only on normal data**.
👉 During inference, pass a new data point:

* If it’s normal → autoencoder reconstructs well (low error).
* If it’s anomaly → reconstruction is poor (high error).

Thus, **Reconstruction Error** acts as an anomaly score.

---

### Steps:

1. Collect normal data (majority).
2. Train Autoencoder → minimize reconstruction error.
3. For new data:

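   * Compute error = $\lVert x - \hat{x} \rVert$, the distance between the input $x$ and its reconstruction $\hat{x}$.
   * If error > threshold → anomaly (the threshold is a tuning choice, e.g. a high percentile of the errors on normal data).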
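
The steps above can be sketched without a deep learning framework. In this illustration a PCA projection stands in for a trained autoencoder (a linear autoencoder trained with MSE learns the same subspace); it is not a neural network. The toy data, the 1-D latent space, and the 99th-percentile threshold are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "normal" data, 2-D points lying close to the line x2 = 2 * x1.
x1 = rng.normal(size=500)
train = np.column_stack([x1, 2 * x1 + 0.05 * rng.normal(size=500)])

# Step 2: fit the stand-in model (top principal direction = 1-D latent space).
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
direction = vt[:1]                               # shape (1, 2)

def reconstruction_error(x):
    z = (x - mean) @ direction.T                 # encode to the latent space
    x_hat = z @ direction + mean                 # decode back to input space
    return np.linalg.norm(x - x_hat, axis=1)

# Threshold from the errors on normal data (assumed rule: 99th percentile).
threshold = np.percentile(reconstruction_error(train), 99)

# Step 3: score new points.
new = np.array([[1.0, 2.0],      # follows the normal pattern
                [1.0, -2.0]])    # breaks it
flags = reconstruction_error(new) > threshold
print(flags.tolist())            # [False, True]
```

A real autoencoder replaces the projection with nonlinear encoder and decoder networks, but the scoring and thresholding steps stay the same.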

---

### Example

* Suppose we train an autoencoder on ECG signals of healthy patients.
* When fed abnormal heartbeat patterns → the reconstruction is poor.
* High error → detected as anomaly (possible disease).

---

# 🔹 4. Types of Autoencoders used in Anomaly Detection

1. **Vanilla Autoencoder**

   * Fully connected encoder-decoder.
   * Works for tabular data.

2. **Convolutional Autoencoder (CAE)**

   * Encoder: Conv layers.
   * Decoder: Transposed-convolution (deconv) layers.
   * Works well for images & video.

3. **Recurrent Autoencoder**

   * Encoder: LSTM/GRU.
   * Decoder: LSTM/GRU.
   * Works well for time-series (e.g., stock prices, sensor data).

4. **Variational Autoencoder (VAE)**

   * Learns probability distribution of data (not just compression).
   * Can generate samples + detect anomalies.

---

# 🔹 5. Pros & Cons

✅ Pros:

* Works in high-dimensional data (images, time series).
* No need for labeled anomalies.
* Learns complex patterns.

❌ Cons:

* Needs lots of normal data.
* Sensitive to hyperparameters (latent size, threshold).
* May reconstruct anomalies too well if trained improperly.

---

# 🔹 6. Real-World Applications

* **Finance**: Fraud detection in transactions.
* **Cybersecurity**: Intrusion detection in network traffic.
* **Healthcare**: Detecting tumors or rare diseases from scans.
* **IoT & Manufacturing**: Predictive maintenance.
* **Retail**: Detect unusual customer behaviors.

---

⚡Quick Recap:

* **Anomaly detection** = identifying unusual points.
* **Autoencoders** = compress + reconstruct data.
* For anomaly detection → Train AE on **normal data** → anomalies have **high reconstruction error**.

