# Exploratory Data Analysis for SIEM

This notebook performs exploratory data analysis (EDA) on logs collected by the SIEM system. The aim is to understand the structure of the log data and to surface recurring patterns and anomalies.

In [1]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set(style='whitegrid')

## Load Sample Log Data

We load sample log data from the `examples/sample_events` directory, which contains a Linux syslog file and a Windows event log in the binary EVTX format.
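If `linux_syslog.log` is raw syslog text rather than CSV, `pd.read_csv` will not split it into meaningful columns. The following is a minimal sketch, assuming classic BSD-style (RFC 3164) lines of the form `Mmm dd HH:MM:SS host process[pid]: message`. The helper `parse_syslog_lines` and the sample lines are hypothetical. Because this timestamp format omits the year, the caller has to supply one.

```python
import re
from datetime import datetime

import pandas as pd

# Assumed RFC 3164-style line: "Mmm dd HH:MM:SS host process[pid]: message"
SYSLOG_PATTERN = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<message>.*)$"
)


def parse_syslog_lines(lines, year):
    """Hypothetical helper: parse classic syslog lines into a DataFrame.

    The year must be supplied because BSD syslog timestamps omit it.
    Lines that do not match the assumed pattern are skipped.
    """
    rows = []
    for line in lines:
        match = SYSLOG_PATTERN.match(line.rstrip("\n"))
        if match is None:
            continue
        parts = match.groupdict()
        stamp = f"{year} {parts['month']} {parts['day']} {parts['time']}"
        rows.append({
            "timestamp": datetime.strptime(stamp, "%Y %b %d %H:%M:%S"),
            "host": parts["host"],
            "process": parts["process"],
            "pid": int(parts["pid"]) if parts["pid"] else None,  # not every line has a PID
            "message": parts["message"],
        })
    return pd.DataFrame(rows, columns=["timestamp", "host", "process", "pid", "message"])


# Made-up sample lines for illustration only
sample = [
    "Jan 12 10:15:01 web01 sshd[2211]: Failed password for root from 203.0.113.5 port 52144 ssh2",
    "Jan 12 10:15:03 web01 CRON[2215]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jan 12 10:16:00 web01 kernel: eth0: link up",
    "not a syslog line",
]
parsed = parse_syslog_lines(sample, year=2024)
print(parsed[["timestamp", "host", "process", "pid"]])
```

With the real file, `parse_syslog_lines(open(path), year=...)` would produce a frame with the `timestamp` column that the later cells rely on. Lines in other formats, such as RFC 5424, are silently skipped and would need their own pattern.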

In [2]:
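# Load sample log data
# Assumes the syslog sample is comma-separated with a header row that includes 'timestamp'
linux_logs = pd.read_csv('../examples/sample_events/linux_syslog.log')

# EVTX is binary, so read_csv cannot parse it; python-evtx (pip install python-evtx) yields XML per record
import Evtx.Evtx as evtx
with evtx.Evtx('../examples/sample_events/windows_evtx_sample.evtx') as log:
    windows_logs = pd.DataFrame({'xml': [record.xml() for record in log.records()]})

# Display the first few rows of the Linux logs
linux_logs.head()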
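Raw event XML is hard to explore directly. Assuming the Windows records are available as XML strings (the python-evtx package's `Record.xml()` returns one per record), a minimal sketch can pull a few `System` fields into columns using the standard library's `xml.etree.ElementTree`. The helper `evtx_xml_to_frame` and the sample records are hypothetical. The namespace is the one Windows declares on the `<Event>` root element.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Default namespace declared on the <Event> root element of Windows event XML
EVENT_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def evtx_xml_to_frame(xml_records):
    """Hypothetical helper: extract EventID, TimeCreated, and Computer from event XML strings."""
    rows = []
    for xml_text in xml_records:
        root = ET.fromstring(xml_text)
        system = root.find("e:System", EVENT_NS)
        if system is None:
            continue  # skip records without a System block
        event_id = system.find("e:EventID", EVENT_NS)
        created = system.find("e:TimeCreated", EVENT_NS)
        computer = system.find("e:Computer", EVENT_NS)
        rows.append({
            "timestamp": created.get("SystemTime") if created is not None else None,
            "event_id": int(event_id.text) if event_id is not None else None,
            "computer": computer.text if computer is not None else None,
        })
    frame = pd.DataFrame(rows, columns=["timestamp", "event_id", "computer"])
    # SystemTime is a UTC time string; values pandas cannot parse become NaT
    frame["timestamp"] = pd.to_datetime(frame["timestamp"], utc=True, errors="coerce")
    return frame


# Made-up sample records for illustration only
sample_xml = [
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    '<System><EventID>4625</EventID>'
    '<TimeCreated SystemTime="2024-01-12T10:15:01.123456Z"/>'
    '<Computer>DC01</Computer></System></Event>',
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    '<System><EventID>4624</EventID>'
    '<TimeCreated SystemTime="2024-01-12T10:16:30.000000Z"/>'
    '<Computer>WS02</Computer></System></Event>',
]
events = evtx_xml_to_frame(sample_xml)
print(events)
```

Fields under `EventData` vary by event ID and are left out of this sketch; they would need per-event handling.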

## Data Overview

Let's take a look at the structure of the loaded data and check for any missing values.

In [3]:
# Overview of the Linux logs
linux_logs.info()

# Check for missing values
linux_logs.isnull().sum()
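Raw missing-value counts are hard to compare across datasets of different sizes. A small sketch (the helper name `missing_summary` and the demo frame are hypothetical) reports each column's missing count alongside its percentage of rows, sorted so the sparsest columns come first.

```python
import pandas as pd


def missing_summary(df):
    """Hypothetical helper: missing-value count and percentage per column, most-missing first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        # max(..., 1) avoids dividing by zero on an empty frame
        "percent": (counts / max(len(df), 1) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up demo frame for illustration only
demo = pd.DataFrame({
    "timestamp": ["2024-01-12 10:15:01", None, "2024-01-12 10:16:00", "2024-01-12 10:17:30"],
    "host": ["web01", "web01", None, None],
    "message": ["a", "b", "c", "d"],
})
summary = missing_summary(demo)
print(summary)
```

Applied to `linux_logs`, a column that is empty for most events may point to a parsing problem rather than genuinely absent data.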

## Visualize Log Frequency

Plotting the number of log entries over time reveals the overall activity pattern and makes bursts or gaps, which may indicate anomalies, easy to spot.

In [4]:
# Convert timestamp to datetime; unparseable values become NaT instead of raising an error
linux_logs['timestamp'] = pd.to_datetime(linux_logs['timestamp'], errors='coerce')

# Plot log frequency over time
plt.figure(figsize=(12, 6))
sns.histplot(linux_logs['timestamp'], bins=50, kde=True)
plt.title('Log Frequency Over Time')
plt.xlabel('Timestamp')
plt.ylabel('Frequency')
plt.show()
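A histogram shows bursts visually. To flag them programmatically, one option is to count events per fixed time bucket and mark buckets whose count lies more than a few standard deviations from the mean. This global z-score is swapped in for illustration and is not part of the original notebook; the helper `flag_volume_anomalies`, the one-minute bucket, and the threshold of 3 are all assumptions.

```python
import pandas as pd


def flag_volume_anomalies(timestamps, freq="1min", threshold=3.0):
    """Hypothetical helper: count events per time bucket and flag buckets with |z| > threshold.

    Uses a single global mean and standard deviation, so it assumes roughly
    steady traffic and may miss anomalies in strongly cyclical volume.
    """
    series = pd.Series(1, index=pd.DatetimeIndex(timestamps)).sort_index()
    counts = series.resample(freq).sum()  # empty buckets count as 0
    std = counts.std(ddof=0)
    zscore = (counts - counts.mean()) / std if std > 0 else counts * 0.0
    return pd.DataFrame({"count": counts, "zscore": zscore, "anomaly": zscore.abs() > threshold})


# Synthetic data: 5 events per minute for 30 minutes, plus a burst of 55 extra events at 10:12
start = pd.Timestamp("2024-01-12 10:00:00")
stamps = [start + pd.Timedelta(minutes=m, seconds=s) for m in range(30) for s in range(0, 50, 10)]
stamps += [start + pd.Timedelta(minutes=12, seconds=s) for s in range(55)]

result = flag_volume_anomalies(stamps)
print(result[result["anomaly"]])
```

On the notebook's data this could be called as `flag_volume_anomalies(linux_logs['timestamp'].dropna())`. For logs with daily cycles, comparing each bucket against a rolling baseline instead of a global one is likely to produce fewer false alarms.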

## Conclusion

This notebook provides a starting point for exploratory analysis of logs in the SIEM system. Further analysis can break the data down by fields such as host, process, or event ID to identify more specific patterns and anomalies.