Import Library

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### Importing Libraries

In this section, we import the necessary libraries:
- `pandas` for data manipulation and analysis.
- `matplotlib.pyplot` for creating static, animated, and interactive visualizations.
- `seaborn` for making statistical graphics in Python.


In [None]:
# Load the cleaned dataset
cleaned_file_path = '/Users/username/Documents/git/CodeUcapstone/cleaned_cybersecurity_attacks.csv'
df = pd.read_csv(cleaned_file_path)


### Loading the Dataset

We load the cleaned dataset from a CSV file into a pandas DataFrame. This dataset contains information about cybersecurity attacks.


In [None]:
# Display the first few rows of the dataset to inspect
print("Initial dataset:")
print(df.head())


### Inspecting the Dataset

We display the first few rows of the dataset to understand its structure and contents.


In [None]:
# Ensure the 'Timestamp' column is correctly parsed as datetime
df['Timestamp'] = pd.to_datetime(df['Timestamp'])


### Parsing the Timestamp Column

The 'Timestamp' column is converted to a datetime format to facilitate time-based analysis.
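
If some timestamp strings might be malformed, one option is to coerce them to `NaT` instead of letting the conversion raise. The sketch below uses a small made-up frame (`sample` and its values are hypothetical, not taken from the dataset) to show the idea.

```python
import pandas as pd

# Toy data standing in for the real 'Timestamp' column (hypothetical values).
sample = pd.DataFrame(
    {"Timestamp": ["2023-05-30 06:33:58", "not a date", "2020-08-26 07:08:30"]}
)

# errors="coerce" turns unparseable strings into NaT instead of raising.
sample["Timestamp"] = pd.to_datetime(sample["Timestamp"], errors="coerce")

# Count the rows that failed to parse so they can be inspected or dropped.
n_bad = int(sample["Timestamp"].isna().sum())
print(f"Unparseable timestamps: {n_bad}")
```

Whether to drop or repair the coerced rows depends on how many there are, which is why the count is worth checking first.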


In [None]:
# Extract year from the 'Timestamp' column
df['Year'] = df['Timestamp'].dt.year


### Extracting Year

We extract the year from the 'Timestamp' column and store it in a new column called 'Year'.


In [None]:
# Visualization 1: Bar Plot (number of attacks per year)
plt.figure(figsize=(10, 6))
attacks_per_year = df['Year'].value_counts().sort_index()
sns.barplot(x=attacks_per_year.index, y=attacks_per_year.values, palette='viridis')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.title('Number of Cybersecurity Attacks per Year')
plt.xticks(rotation=45)
plt.show()


### Visualization 1: Bar Plot of Attacks per Year

This bar plot shows the number of cybersecurity attacks per year. The x-axis represents the year, and the y-axis represents the number of attacks. We use a 'viridis' color palette to enhance visual appeal.
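
The counting step behind this plot can be seen in isolation on a toy Series (the `years` values below are made up for illustration). `value_counts()` orders by frequency, so `sort_index()` is what puts the bars in chronological order.

```python
import pandas as pd

# Hypothetical year values, deliberately out of order.
years = pd.Series([2022, 2020, 2022, 2021, 2022, 2020], name="Year")

# value_counts() sorts by count, most frequent first.
by_count = years.value_counts()

# sort_index() reorders by year so the x-axis reads chronologically.
by_year = by_count.sort_index()

print(by_year.to_dict())
```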


In [None]:
# Check the dataframe structure and summary statistics
print("DataFrame info:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("DataFrame description:")
print(df.describe())


### Checking DataFrame Structure and Summary Statistics

We examine the structure of the DataFrame and generate summary statistics to better understand the data.
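
By default `describe()` summarizes only numeric columns. As a small sketch on made-up data (the column names and values in `toy` are assumptions for illustration), passing `include="all"` also reports counts, unique values, and the most frequent value for text columns.

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column.
toy = pd.DataFrame(
    {
        "Packet Length": [503, 1174, 306],
        "Protocol": ["ICMP", "UDP", "ICMP"],
    }
)

# Default: numeric columns only.
numeric_summary = toy.describe()

# include="all": adds unique/top/freq rows for non-numeric columns.
full_summary = toy.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.loc[["unique", "top"], "Protocol"])
```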


In [None]:
# Visualization 2: Heatmap (correlation matrix of numerical features)
plt.figure(figsize=(10, 8))
numerical_columns = df.select_dtypes(include='number').columns
correlation_matrix = df[numerical_columns].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix of Features')
plt.show()


### Visualization 2: Heatmap of Correlation Matrix

This heatmap displays the correlation matrix of numerical features in the dataset. It helps us understand the relationships between different numerical variables. The color map used is 'coolwarm'.
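
To make the heatmap values easier to read, here is a minimal sketch on a made-up frame (`toy` and its columns are hypothetical). `corr()` computes Pearson correlation by default: a column that rises in lockstep with another scores close to 1, and one that falls in lockstep scores close to -1.

```python
import pandas as pd

# Hypothetical data with known relationships.
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],              # increases exactly with a
        "c": [4, 3, 2, 1],              # decreases exactly with a
        "label": ["w", "x", "y", "z"],  # non-numeric, excluded below
    }
)

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()

print(corr.round(2))
```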


In [None]:
# Visualization 3: Line Plot (trend of attacks over time)
plt.figure(figsize=(10, 6))
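# Resample a Timestamp-indexed copy so df keeps its 'Timestamp' column and the cell can be re-run
# ('M' = month-end frequency; pandas 2.2+ prefers the alias 'ME')
attacks_per_month = df.set_index('Timestamp').resample('M').size()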
sns.lineplot(x=attacks_per_month.index, y=attacks_per_month.values, marker='o')
plt.xlabel('Date')
plt.ylabel('Number of Attacks')
plt.title('Trend of Cybersecurity Attacks Over Time')
plt.xticks(rotation=45)
plt.show()


### Visualization 3: Line Plot of Attacks Over Time

This line plot illustrates the trend of cybersecurity attacks over time. The x-axis represents the date, and the y-axis represents the number of attacks per month. Markers are used to highlight each data point.
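
The monthly counting can also be done without a resample alias, which avoids the difference between pandas versions over the month-end alias. The sketch below is an alternative to the notebook's `resample` call, not the method it uses, and the `events` values are made up. It counts per calendar month and fills empty months with 0, so gaps still appear on the line.

```python
import pandas as pd

# Hypothetical event timestamps: two in January, none in February, one in March.
events = pd.Series(pd.to_datetime(["2023-01-05", "2023-01-20", "2023-03-02"]))

# Map each timestamp to its calendar month and count occurrences.
months = events.dt.to_period("M")
counts = months.value_counts().sort_index()

# Reindex over the full month range so months without events show up as 0.
full_range = pd.period_range(months.min(), months.max(), freq="M")
monthly = counts.reindex(full_range, fill_value=0)

print(monthly.tolist())
```

For plotting, `monthly.index.to_timestamp()` converts the periods back to dates.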


In [None]:
# Visualization 4: Box Plot (distribution of anomaly scores by day)
plt.figure(figsize=(12, 8))
sns.boxplot(x='Day', y='Anomaly Scores', data=df, palette='viridis')
plt.xlabel('Day')
plt.ylabel('Anomaly Scores')
plt.title('Distribution of Anomaly Scores by Day')
plt.xticks(rotation=45)
plt.show()


### Visualization 4: Box Plot of Anomaly Scores by Day

This box plot shows the distribution of anomaly scores by day of the week. The x-axis represents the day of the week, and the y-axis represents the anomaly scores. The 'viridis' color palette is used to enhance visual differentiation.
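
The notebook never creates the 'Day' column, so it presumably comes from the cleaned CSV. If it were missing, it could be derived from the timestamps roughly as follows. This is a sketch on made-up rows (`sample` and its values are hypothetical); the ordered categorical is one way to keep weekdays in calendar order rather than alphabetical order.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(["2023-05-29", "2023-05-30", "2023-06-04"]),
        "Anomaly Scores": [28.7, 51.5, 87.4],
    }
)

# Derive the weekday name from each timestamp.
sample["Day"] = sample["Timestamp"].dt.day_name()

# Store it as an ordered categorical so sorting and grouping follow Monday..Sunday.
day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]
sample["Day"] = pd.Categorical(sample["Day"], categories=day_order, ordered=True)

print(sample[["Timestamp", "Day"]])
```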
