# 📊 Anomaly Detection in Synthetic Network Traffic using ARIMA

**Objective:** Fit an ARIMA model to a synthetic network traffic time series and detect anomalies as points the model predicts poorly.

We simulate per-minute traffic, inject two spikes, and flag observations whose one-step-ahead prediction error is unusually large.

In [None]:
# Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

In [None]:
# Step 2: Simulate Synthetic Traffic Data
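np.random.seed(42)
# 1000 samples at one-minute frequency
time_index = pd.date_range(start='2024-01-01', periods=1000, freq='min')
# Baseline traffic: Gaussian noise around 100 bytes/min with standard deviation 10
traffic = np.random.normal(loc=100, scale=10, size=1000)

# Inject two anomalies: spikes far above the ~100 baseline
traffic[300] = 200
traffic[700] = 250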

df = pd.DataFrame({'timestamp': time_index, 'bytes': traffic})
df.set_index('timestamp', inplace=True)
df.head()

In [None]:
# Step 3: Visualize the Data
plt.figure(figsize=(14, 4))
plt.plot(df['bytes'], label='Network Traffic')
plt.title("Synthetic Network Traffic")
plt.xlabel("Time")
plt.ylabel("Bytes per Minute")
plt.legend()
plt.show()

In [None]:
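# Step 4: Check for Stationarity
# Augmented Dickey-Fuller test; the null hypothesis is that the series has a unit root
# (is non-stationary), so a small p-value is evidence of stationarity.
result = adfuller(df['bytes'])
print(f"ADF Statistic: {result[0]}")
print(f"p-value: {result[1]}")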
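
The ADF output is easier to act on when the decision rule is explicit. The sketch below is a minimal helper, assuming a significance level of 0.05; the name `adf_suggests_stationary` and the example tuples are hypothetical and are not output from this notebook.

```python
def adf_suggests_stationary(adf_result, alpha=0.05):
    """Return True when the ADF test rejects a unit root at level `alpha`.

    `adf_result` is the tuple returned by statsmodels' adfuller with default
    arguments: (statistic, p-value, lags used, n obs, critical values, icbest).
    """
    p_value = adf_result[1]
    return p_value < alpha


# Made-up result tuples for illustration only (not produced by the data above)
critical = {'1%': -3.44, '5%': -2.86, '10%': -2.57}
example_stationary = (-31.2, 0.0, 0, 999, critical, 7400.0)
example_unit_root = (-1.1, 0.71, 3, 996, critical, 7400.0)

print(adf_suggests_stationary(example_stationary))  # True
print(adf_suggests_stationary(example_unit_root))   # False
```

In the notebook this would be called as `adf_suggests_stationary(result)`. When the test rejects a unit root, the differencing order `d` in the ARIMA model can usually be 0.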

In [None]:
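# Step 5: Fit ARIMA Model
# order=(p, d, q): 5 autoregressive lags, first-order differencing, no moving-average terms.
# Note: if the ADF test above rejects a unit root, d=0 is usually sufficient.
model = ARIMA(df['bytes'], order=(5,1,0))
model_fit = model.fit()
model_fit.summary()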

In [None]:
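# Step 6: Forecast and Compare
# One-step-ahead in-sample predictions, returned on the original (undifferenced) scale.
# Start at 1 because the first observation has no earlier value to difference against.
forecast = model_fit.predict(start=1, end=len(df)-1)
# Absolute prediction error for every point that has a prediction
residuals = (df['bytes'].iloc[1:] - forecast).abs()
# Flag points whose error exceeds the mean error by more than 3 standard deviations
threshold = residuals.mean() + 3 * residuals.std()
anomalies = residuals[residuals > threshold]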
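
The mean and standard deviation used for the threshold are themselves inflated by the anomalies they are meant to catch. One common alternative, shown here as a sketch rather than the method used above, is a cutoff built from the median and the median absolute deviation (MAD). The function name `mad_threshold`, the multiplier `k=3.0`, and the example array are assumptions for illustration.

```python
import numpy as np


def mad_threshold(abs_residuals, k=3.0):
    """Robust cutoff: median + k * scaled MAD of the absolute residuals.

    The 1.4826 factor makes the MAD comparable to a standard deviation for
    normally distributed data; for absolute residuals it is only a rough scaling.
    """
    values = np.asarray(abs_residuals, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return median + k * 1.4826 * mad


# Small made-up example: six ordinary errors and one large one
example = np.array([1.0, 2.0, 1.5, 2.5, 1.0, 2.0, 50.0])
cutoff = mad_threshold(example)
flagged_positions = np.flatnonzero(example > cutoff)
print(cutoff, flagged_positions)
```

With the notebook's variables, this would be `residuals[residuals > mad_threshold(residuals)]`. Because the median and MAD barely move when a few extreme values are present, the cutoff depends less on the size of the injected spikes.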

In [None]:
# Step 7: Plot Anomalies
plt.figure(figsize=(14, 5))
plt.plot(df['bytes'], label='Actual')
plt.plot(forecast, color='green', label='Predicted')
plt.scatter(anomalies.index, df.loc[anomalies.index, 'bytes'], color='red', label='Anomaly', zorder=5)
plt.title("Anomaly Detection using ARIMA")
plt.xlabel("Time")
plt.ylabel("Bytes per Minute")
plt.legend()
plt.show()
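
Because the anomalies were injected at known positions (300 and 700), the flagged timestamps can be checked against them. The sketch below computes precision and recall; the `flagged` index is a hypothetical detector output used only to make the example self-contained, not the result of the model above.

```python
import pandas as pd

# Same time axis as the simulated data
time_index = pd.date_range(start='2024-01-01', periods=1000, freq='min')

# Ground truth: timestamps of the two injected spikes
injected = time_index[[300, 700]]

# Hypothetical detector output: both spikes plus one extra point
flagged = pd.DatetimeIndex([time_index[300], time_index[301], time_index[700]])

true_positives = flagged.intersection(injected)
precision = len(true_positives) / len(flagged)   # share of flags that are real
recall = len(true_positives) / len(injected)     # share of real anomalies found

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In the notebook, `flagged` would be replaced by `anomalies.index`.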

### ✅ Summary
- We simulated network traffic and injected anomalies.
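- An ARIMA(5,1,0) model fitted to the series produced one-step-ahead predictions of the expected traffic.
- Points whose absolute prediction error exceeded the mean error plus three standard deviations were flagged as anomalies.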
- This technique can be applied to logs, traffic flows, or other temporal features in cybersecurity.