# Exploratory Data Analysis for AI-based Intrusion Detection System

This notebook provides an exploratory data analysis (EDA) of the network traffic data used for training the AI-based Intrusion Detection System (IDS). The goal is to understand the data distribution, identify patterns, and visualize anomalies.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

In [None]:
# Load the processed data
data_path = '../data/processed/your_processed_data.csv'
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()

In [None]:
# Summary statistics
data.describe()

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
missing_values[missing_values > 0]
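
Raw counts are hard to judge without the dataset size, so it can help to report the share of missing rows per column as well. The following is a minimal sketch, not part of the original pipeline: the helper name `missing_summary` and the toy columns (`duration`, `bytes`, `proto`) are hypothetical stand-ins for the real `data` frame.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


# Toy frame with hypothetical column names, standing in for `data`
toy = pd.DataFrame({
    "duration": [1.0, None, 3.0, None],
    "bytes": [10, 20, None, 40],
    "proto": ["tcp", "udp", "tcp", "tcp"],
})
print(missing_summary(toy))
```

In the notebook itself, the call would be `missing_summary(data)`.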

In [None]:
# Visualize the distribution of a specific feature
# ('feature_name' is a placeholder; replace it with a column from the dataset)
plt.figure(figsize=(10, 6))
sns.histplot(data['feature_name'], bins=30, kde=True)
plt.title('Distribution of Feature Name')
plt.xlabel('Feature Name')
plt.ylabel('Frequency')
plt.show()
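
The histogram above covers one feature at a time. To get a rough view of which numeric features have extreme values, one option is to count the points that fall outside the interquartile-range (IQR) fences. This is a sketch under assumptions: the 1.5 × IQR rule is a common convention rather than something this notebook prescribes, and the helper name `iqr_outlier_counts` and the toy columns (`pkt_size`, `duration`) are hypothetical.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the frame's columns
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum().sort_values(ascending=False)


# Toy frame with hypothetical column names, standing in for `data`
toy = pd.DataFrame({
    "pkt_size": [10, 12, 11, 13, 12, 11, 500],
    "duration": [1, 2, 3, 4, 5, 6, 7],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

A high count does not by itself indicate an attack; heavy-tailed traffic features often trip this rule, so the output is best read as a list of columns to inspect more closely.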

In [None]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
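# Restrict to numeric columns: pandas >= 2.0 raises an error in corr()
# if any non-numeric column (e.g. a string label) remains
correlation_matrix = data.select_dtypes(include='number').corr()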
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
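
With many features, an annotated heatmap becomes hard to read, so a ranked list of the strongest pairs can complement it. The sketch below is one possible approach, not the notebook's established method: the helper name `top_correlated_pairs`, the 0.9 threshold, and the toy columns are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """List feature pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep the strict upper triangle so each pair appears once
    # and self-correlations on the diagonal are dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # The comparison also discards any masked (NaN) entries
    return pairs[pairs >= threshold].sort_values(ascending=False)


# Toy frame with hypothetical column names, standing in for `data`
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.1],   # nearly a linear function of "a"
    "c": [5, 1, 4, 2, 3],      # only weakly related to "a" and "b"
    "proto": ["tcp", "udp", "tcp", "tcp", "udp"],  # non-numeric, ignored
})
pairs = top_correlated_pairs(toy)
print(pairs)
```

Pairs that appear here are candidates for dropping or merging before model training, since near-duplicate features add little information.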

## Insights
The summary statistics, missing-value check, feature histogram, and correlation heatmap above characterize the data distribution, show which features are correlated, and surface potential anomalies that warrant further investigation before the IDS model is trained.