In [None]:
#Question 1: What is Anomaly Detection? Explain its types (point, contextual, and collective anomalies) with examples

Anomaly Detection is the process of identifying data points, events, or observations that deviate significantly from the normal behavior of a dataset.
Point Anomalies: A single data point that lies far from the rest of the data.
Example: A credit card transaction for $10,000 when the user's typical spending is under $100.
Contextual Anomalies: Data points that are anomalous only within a specific context (such as time or location).
Example: A temperature reading of 30°C is normal in summer but an anomaly in winter.
Collective Anomalies: A group of data points that may each look normal on their own but are anomalous when they occur together.
Example: A sequence of login attempts from different countries within minutes of each other.
    
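The point and contextual cases above can be sketched with a simple robust score. This is a minimal illustration, not a production detector: the modified z-score (median and MAD), the 3.5 threshold, the helper names `modified_z_scores` and `flag_anomalies`, and all sample values are assumptions chosen for the example.

```python
from statistics import median

def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median: no spread to measure against.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def flag_anomalies(values, threshold=3.5):
    """Return the values whose absolute modified z-score exceeds the threshold."""
    return [v for v, z in zip(values, modified_z_scores(values)) if abs(z) > threshold]

# Point anomaly: one transaction far from the user's usual spending.
transactions = [45, 60, 52, 38, 75, 49, 10000, 55]
point_anomalies = flag_anomalies(transactions)

# Contextual anomaly: score each reading only against its own season.
readings = {
    "winter": [2, 4, 1, 3, 5, 30, 2],
    "summer": [28, 31, 30, 29, 32, 30, 27],
}
contextual_anomalies = {season: flag_anomalies(temps) for season, temps in readings.items()}

print(point_anomalies)
print(contextual_anomalies)
```

The same 30°C reading is flagged in the winter group but not in the summer group, because each group is scored against its own median. A median-based score is used here because a single extreme value inflates the mean and standard deviation enough to hide itself from a plain z-score.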
#Question 2: Compare Isolation Forest, DBSCAN, and Local Outlier Factor (LOF) in terms of their approach and suitable use cases.

Isolation Forest is an anomaly detection algorithm based on the idea that outliers are easier to isolate than normal points.
It builds an ensemble of random trees, where each split picks a random feature and a random split value. Data points that need fewer splits to isolate (a shorter average path length) are scored as anomalies.
Use cases: Fraud detection, network intrusion detection, and large high-dimensional datasets where speed and scalability are important.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters based on data density. Points in low-density regions that belong to no cluster are labeled as noise and can be treated as outliers.
It can find clusters of arbitrary shape and does not require the number of clusters in advance, but its results depend on the choice of eps and min_samples.
Use cases: Spatial data analysis, geolocation data, image segmentation, and datasets where clusters are irregularly shaped and noise detection is important.
Local Outlier Factor (LOF) detects anomalies by comparing the local density of a point with the local densities of its nearest neighbors.
A point is considered an outlier if its local density is significantly lower than that of its neighbors.
Use cases: Detecting local anomalies in datasets with varying densities, such as sensor data, credit card transactions, and medical data.
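To make the comparison concrete, the three methods can be run side by side on the same data. The sketch below assumes scikit-learn is installed; the synthetic two-cluster data, the seed, and the parameter values (`contamination=0.02`, `eps=0.8`, `min_samples=5`) are illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# Two tight clusters plus two injected outliers (the last two rows of X).
inliers = np.r_[rng.normal(3, 0.5, (100, 2)), rng.normal(-3, 0.5, (100, 2))]
outliers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.r_[inliers, outliers]

# Isolation Forest and LOF return 1 for inliers and -1 for outliers.
iso_labels = IsolationForest(contamination=0.02, random_state=42).fit_predict(X)
lof_labels = LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X)
# DBSCAN returns a cluster id per point; -1 marks noise.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

for name, labels in [("IsolationForest", iso_labels),
                     ("LOF", lof_labels),
                     ("DBSCAN", dbscan_labels)]:
    print(f"{name}: {int(np.sum(labels == -1))} points flagged")
```

Isolation Forest and LOF need an expected anomaly fraction (`contamination`), while DBSCAN needs a distance scale (`eps`) and produces outliers only as a by-product of clustering.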


#Question 3: What are the key components of a Time Series? Explain each with one example.

Trend: The long-term increase or decrease in the data. Example: Rise in global temperatures over decades.
Seasonality: Patterns that repeat over a fixed, known period. Example: High toy sales every December.
Cyclical: Rises and falls that recur without a fixed period, often tied to economic conditions. Example: Business boom and recession cycles.
Residual (Irregular): The random variation left over after removing trend, seasonality, and cycles. Example: A sudden stock drop due to a specific news event.

#Question 4: Define stationarity in time series. How can you test and transform a non-stationary series into a stationary one?

A Stationary Series is one whose statistical properties (mean, variance, and autocorrelation) do not change over time.
Test: The Augmented Dickey-Fuller (ADF) test is commonly used. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value < 0.05 rejects the null and suggests the series is stationary.
Transform: Differencing (Y_t - Y_{t-1}) removes a trend, and a transform such as the logarithm stabilizes a variance that grows with the level of the series. The two are often combined.

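#Question 5: Differentiate between AR, MA, ARIMA, SARIMA, and SARIMAX models in terms of structure and application.

AR (Auto-Regressive): Predicts the current value from a linear combination of its own past values. Suited to stationary series with persistent autocorrelation.
MA (Moving Average): Predicts the current value from past forecast errors. Suited to stationary series where shocks have short-lived effects.
ARIMA: Combines AR, MA, and Differencing (I) to handle non-stationary data with a trend.
SARIMA: Adds seasonal AR, MA, and differencing terms to ARIMA for data with a repeating seasonal pattern.
SARIMAX: Adds eXogenous variables (external factors like weather or holidays) to SARIMA.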
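The structural differences can be written compactly with the backshift operator B, where B y_t = y_{t-1}. These are the standard textbook forms; the symbols below (c, μ, φ, θ, Φ, Θ, β, x_t, u_t) are notation introduced here for illustration and do not appear in the original answer.

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):\quad & \phi(B)\,(1-B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t \\
\text{SARIMA}(p,d,q)(P,D,Q)_s:\quad & \phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\, y_t = c + \theta(B)\,\Theta(B^{s})\,\varepsilon_t \\
\text{SARIMAX}:\quad & y_t = \beta^{\top} x_t + u_t, \quad u_t \text{ follows the SARIMA equation above}
\end{aligned}
\qquad
\text{with } \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i}, \quad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

In this notation, a model specified with order (1,1,1) and seasonal order (1,1,1,12) has p = d = q = 1, P = D = Q = 1, and s = 12.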
Part 3: Python Implementation


In [None]:
#Question 6: Load a time series dataset (e.g., AirPassengers), plot the original series, and decompose it into trend, seasonality, and residual components

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Load the monthly airline passengers data, indexed by month
df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv',
                 index_col='Month', parse_dates=True)

# Plot the original series
plt.plot(df['Passengers'], label='Observed')
plt.legend()
plt.title("Monthly Airline Passengers")
plt.show()

# Question 6: Decomposition into trend, seasonal, and residual components
# Multiplicative model: the seasonal swings grow with the level of the series
# period=12 states the yearly cycle of the monthly data explicitly
result = seasonal_decompose(df['Passengers'], model='multiplicative', period=12)
result.plot()
plt.show()

In [None]:
#Question 7: Apply Isolation Forest on a numerical dataset (e.g., NYC Taxi Fare) to detect anomalies. Visualize the anomalies on a 2D scatter plot

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Synthetic 2D data standing in for a numerical dataset such as NYC Taxi Fare
rng = np.random.RandomState(42)  # fixed seed so the plot is reproducible
X = rng.randn(100, 2)
X = np.r_[X + 3, X - 3, [[0, 0], [10, 10]]]  # two clusters plus two injected outliers

# Question 7: Isolation Forest
# contamination=0.05 assumes roughly 5% of the points are anomalies
iso = IsolationForest(contamination=0.05, random_state=42)
y_iso = iso.fit_predict(X)  # 1 = normal, -1 = anomaly

# Visualization: with the RdYlGn colormap, -1 is drawn in red and 1 in green
plt.scatter(X[:, 0], X[:, 1], c=y_iso, cmap='RdYlGn')
plt.title("Isolation Forest Anomaly Detection (Green=Normal, Red=Anomaly)")
plt.show()

In [None]:
#Question 8: Train a SARIMA model on the monthly airline passengers dataset. Forecast the next 12 months and visualize the results. 

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Load the data and set an explicit month-start frequency on the index
df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv',
                 index_col='Month', parse_dates=True)
df = df.asfreq('MS')

# Question 8: SARIMA Forecasting
# order=(p,d,q) and seasonal_order=(P,D,Q,s) with s=12 for the yearly cycle
model = SARIMAX(df['Passengers'], order=(1,1,1), seasonal_order=(1,1,1,12))
results = model.fit(disp=False)  # disp=False hides the optimizer log
forecast = results.get_forecast(steps=12).predicted_mean

# Plot the observed series with the 12-month forecast
plt.plot(df['Passengers'], label='Observed')
plt.plot(forecast, label='Forecast', color='red')
plt.legend()
plt.title("12-Month Airline Passenger Forecast")
plt.show()

In [None]:
#Question 9: Apply Local Outlier Factor (LOF) on any numerical dataset to detect anomalies and visualize them using matplotlib

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Synthetic 2D numerical data
rng = np.random.RandomState(42)  # fixed seed so the plot is reproducible
X = rng.randn(100, 2)
X = np.r_[X + 3, X - 3, [[0, 0], [10, 10]]]  # two clusters plus two injected outliers

# Question 9: LOF
# Each point's local density is compared with that of its 20 nearest neighbors
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
y_lof = lof.fit_predict(X)  # 1 = normal, -1 = anomaly

# Visualization: with the RdYlGn colormap, -1 is drawn in red and 1 in green
plt.scatter(X[:, 0], X[:, 1], c=y_lof, cmap='RdYlGn')
plt.title("Local Outlier Factor Anomaly Detection (Green=Normal, Red=Anomaly)")
plt.show()

In [None]:
#Question 10: You are working as a data scientist for a power grid monitoring company. Your goal is to forecast energy demand and also detect abnormal spikes or drops in real-time consumption data collected every 15 minutes.
#The dataset includes features like timestamp, region, weather conditions, and energy usage. Explain your real-time data science workflow:
#● How would you detect anomalies in this streaming data (Isolation Forest / LOF / DBSCAN)?
#● Which time series model would you use for short-term forecasting (ARIMA / SARIMA / SARIMAX)?
#● How would you validate and monitor the performance over time?
#● How would this solution help business decisions or operations?

Anomaly Detection: I would use Isolation Forest. Once trained, it scores each new 15-minute reading quickly, and it flags sudden spikes or drops by isolating them from the dense bulk of normal consumption. Retraining on a recent window keeps its notion of "normal" up to date.
Short-term Forecasting: I would use SARIMAX. Power demand is highly seasonal (daily/weekly) and influenced by exogenous variables like weather (temperature affects AC/heating usage).
Validation: Use rolling-origin (walk-forward) validation and track the Mean Absolute Percentage Error (MAPE) over time. A sustained rise in MAPE signals that the model needs retraining.
Business Impact: This solution helps prevent grid overloads, optimizes energy purchasing costs, and allows for proactive maintenance when hardware failure causes an anomaly.
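The walk-forward validation step can be sketched as below. To keep the example dependency-free, a seasonal-naive forecaster (repeat the value from one day earlier) stands in for SARIMAX; that substitution, the synthetic demand profile, the 10% drift on the last day, and every function name here are assumptions made for illustration.

```python
import math

READINGS_PER_DAY = 96  # one reading every 15 minutes

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def seasonal_naive_forecast(history, season_length, horizon):
    """Stand-in model: repeat the values observed one season earlier."""
    start = len(history) - season_length
    return [history[start + (h % season_length)] for h in range(horizon)]

def walk_forward_mape(series, season_length, initial_train, horizon):
    """Forecast `horizon` steps from each origin, score them, then roll the origin forward."""
    scores = []
    origin = initial_train
    while origin + horizon <= len(series):
        forecast = seasonal_naive_forecast(series[:origin], season_length, horizon)
        scores.append(mape(series[origin:origin + horizon], forecast))
        origin += horizon
    return scores

# Synthetic demand: six identical days, then a seventh day running 10% higher.
daily_profile = [100 + 20 * math.sin(2 * math.pi * t / READINGS_PER_DAY)
                 for t in range(READINGS_PER_DAY)]
demand = daily_profile * 6 + [1.1 * v for v in daily_profile]

scores = walk_forward_mape(demand, READINGS_PER_DAY,
                           initial_train=3 * READINGS_PER_DAY, horizon=4)
print(f"first fold MAPE: {scores[0]:.2f}%  last fold MAPE: {scores[-1]:.2f}%")
```

Each fold forecasts the next hour (four readings) using only data available at that point, which mirrors how the model would run in production. The jump in MAPE on the last day is the kind of signal that would trigger a retraining or an investigation.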