# ASD 2023–24 Cyber Threat Report: Exploratory Analysis

This notebook loads the cleaned dataset and answers the three main questions. Run the cells in order, because later cells depend on the `df` created in the first one.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import os
%matplotlib inline
BASE = '..'
DATA_PATH = os.path.join(BASE, 'data', 'ASD-Annual-Report-2023-24-Dataset-23-24.csv')
df = pd.read_csv(DATA_PATH)
df.head()

## Dataset summary
We print the row count, the column names, and the number of rows in each category.

In [None]:
print('Rows:', len(df))
print('Columns:', list(df.columns))
print('\nCategory value counts:\n', df['category'].value_counts())
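
The cells in this notebook read the columns `category`, `subcategory`, `metric`, `value`, and `unit`, so it can help to confirm they exist before going further. The following is a minimal sketch of such a check. The helper name `missing_columns`, the `REQUIRED_COLUMNS` list, and the sample frame are hypothetical and are not part of the dataset or its documentation.

```python
import pandas as pd

# Columns that the cells in this notebook read. This list is inferred from
# the notebook code, not from any published schema for the dataset.
REQUIRED_COLUMNS = ['category', 'subcategory', 'metric', 'value', 'unit']

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Hypothetical two-column frame, used only to illustrate the check.
sample = pd.DataFrame({'category': ['Business Impact'], 'metric': ['example']})
print(missing_columns(sample))
```

On the real data, calling `missing_columns(df)` should return an empty list if the CSV matches what the notebook expects.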

### Q1 — When did incident activity peak?

In [None]:
m = df[df['subcategory']=='By month'].copy()
# Pull the YYYY-MM label out of the parentheses; expand=False returns a Series.
m['month'] = m['metric'].str.extract(r'\((\d{4}-\d{2})\)', expand=False)
m['count'] = pd.to_numeric(m['value'], errors='coerce')
m = m.sort_values('month')
plt.figure(figsize=(10,4))
plt.plot(m['month'], m['count'], marker='o')
plt.xticks(rotation=45)
plt.title('Monthly incidents (ASD responses) — FY2023–24')
plt.xlabel('Month')
plt.ylabel('Incidents')
plt.grid(axis='y', linestyle='--', alpha=0.4)
plt.tight_layout()
plt.show()
# Report the month with the highest count, or 'n/a' if no count parsed as a number.
print('\nPeak month:', m.loc[m['count'].idxmax(), 'month'] if m['count'].notna().any() else 'n/a')
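
The month-parsing step can be tried on a small frame without the real CSV. This sketch assumes, as the regular expression in the Q1 cell implies, that each monthly metric carries a `(YYYY-MM)` label in parentheses. The function name `monthly_counts` and every row in `sample` are invented for illustration.

```python
import pandas as pd

def monthly_counts(frame):
    """Extract YYYY-MM labels and numeric counts from 'By month' rows."""
    rows = frame[frame['subcategory'] == 'By month'].copy()
    # expand=False makes str.extract return a Series rather than a DataFrame.
    rows['month'] = rows['metric'].str.extract(r'\((\d{4}-\d{2})\)', expand=False)
    rows['count'] = pd.to_numeric(rows['value'], errors='coerce')
    # Drop rows whose label or count could not be parsed, then order by month.
    return rows.dropna(subset=['month', 'count']).sort_values('month')

# Hypothetical rows; the metric wording and the values are made up.
sample = pd.DataFrame({
    'subcategory': ['By month', 'By month', 'By month', 'Other'],
    'metric': ['Incidents (2023-08)', 'Incidents (2023-07)',
               'Incidents (no date)', 'Total'],
    'value': ['12', '7', '5', '24'],
})
counts = monthly_counts(sample)
peak = counts.loc[counts['count'].idxmax(), 'month']
print(peak)
```

Sorting on the `YYYY-MM` string works here because that format orders lexicographically in the same way as chronologically.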

### Q2 — CI vs non-CI incident types and vectors

In [None]:
ci = df[df['category']=='Critical Infrastructure']
# Keep percentage and incident-count metrics; na=False skips rows with a missing metric.
ci_types = ci[ci['metric'].str.contains(r'%|incidents', case=False, na=False)]
ci_types[['metric','value','unit']]
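
The cell above lists only the Critical Infrastructure rows. One possible way to put CI and non-CI values side by side is sketched here. It rests on two assumptions that may not hold for the real dataset: that every category other than `Critical Infrastructure` counts as non-CI, and that the same `metric` label appears in both groups. The function name `compare_ci` and the sample rows are hypothetical.

```python
import pandas as pd

def compare_ci(frame, ci_label='Critical Infrastructure'):
    """Return one row per metric with 'CI' and 'non-CI' value columns."""
    tagged = frame.assign(
        # Assumption: anything not labelled as CI is treated as non-CI.
        group=frame['category'].eq(ci_label).map({True: 'CI', False: 'non-CI'}),
        value=pd.to_numeric(frame['value'], errors='coerce'),
    )
    # Take the first value per (metric, group) pair and spread groups into columns.
    return tagged.groupby(['metric', 'group'])['value'].first().unstack('group')

# Hypothetical rows; the categories, metrics, and values are made up.
sample = pd.DataFrame({
    'category': ['Critical Infrastructure', 'All sectors', 'Critical Infrastructure'],
    'metric': ['DoS/DDoS (%)', 'DoS/DDoS (%)', 'Phishing (%)'],
    'value': ['10', '4', '25'],
})
table = compare_ci(sample)
print(table)
```

Metrics that exist in only one group show up with a missing value in the other column, which makes gaps in the comparison easy to spot.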


### Q3 — Financial impact by business size

In [None]:
biz = df[df['category']=='Business Impact']
print(biz[['metric','value','unit','change_vs_prev_year']])
# BEC losses
bec = df[df['metric'].str.contains('BEC', case=False, na=False)]
print('\nBEC-related rows:')
print(bec[['metric','value','unit','context']])
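
Comparing losses across rows is only meaningful once `value` is numeric and the rows share a unit. The following sketch ranks rows of a single unit by value. The function name `rank_by_value`, the `'AUD'` unit label, and the sample rows are assumptions for illustration; check the real `unit` column before reusing them.

```python
import pandas as pd

def rank_by_value(frame, unit):
    """Return rows with the given unit, sorted by numeric value, largest first."""
    rows = frame[frame['unit'] == unit].copy()
    # Non-numeric entries become NaN and are dropped before sorting.
    rows['value_num'] = pd.to_numeric(rows['value'], errors='coerce')
    return rows.dropna(subset=['value_num']).sort_values('value_num', ascending=False)

# Hypothetical rows; the metric names, values, and unit label are made up.
sample = pd.DataFrame({
    'metric': ['Loss type A', 'Loss type B', 'Loss type C'],
    'value': ['1500', 'n/a', '42000'],
    'unit': ['AUD', 'AUD', 'AUD'],
})
ranked = rank_by_value(sample, 'AUD')
print(ranked[['metric', 'value_num']])
```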


## Conclusions
- Monthly incident activity peaked in March 2024 and was lowest in January.
- Critical infrastructure (CI) experiences higher rates of DoS/DDoS and credential compromise.
- Business email compromise (BEC) accounts for the largest aggregate financial loss in the dataset.

---
Notebook generated programmatically.