## Exploring Attack Surface Management Data with Pandas and Seaborn

This notebook demonstrates how to import and analyze an **Attack Surface Management (ASM)** dataset.  
We'll use **pandas** for data manipulation and **seaborn** for visualization.

**Setup!**

You are a new analyst in your organization's security operations center (SOC), and you have been asked to explore the organization's **attack surface**. The **attack surface** is the set of points on the boundary of a system, a system element, or an environment where an attacker can try to enter, cause an effect on, or extract data from that system, system element, or environment.

As a member of an organization's security team, you need to understand what you are trying to keep secure: your data, your assets, and even your personnel! **YOU CANNOT SECURE WHAT YOU DON'T KNOW**

We will explore the assets in your organization as part of this exercise.


### Key Questions:
- What is the distribution of risk levels?
- How many assets are in the cloud vs. on-prem?
- Which services are most exposed?
- How does vulnerability count vary across risk levels?

In [1]:
# Packages to import

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


In [None]:

# Configure visualization style
sns.set_style("whitegrid")

# Load dataset (Update path if necessary)
file_path = "fake_tenable_asm_us_canada.csv"  # Ensure the file is in the same directory
df = pd.read_csv(file_path)

# Display first few rows
df.head()
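
Before plotting, it can help to confirm that the loaded file actually contains the columns the later cells refer to. The sketch below is one way to do that. The `missing_columns` helper and the one-row `sample` frame are illustrative assumptions, not part of the dataset. In the notebook you would pass the loaded `df` instead.

```python
import pandas as pd

# Columns that the plotting cells in this notebook refer to.
EXPECTED_COLUMNS = ["Risk Level", "Cloud Provider", "Service", "Vulnerabilities Count"]


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from the frame."""
    return [col for col in expected if col not in frame.columns]


# Hypothetical stand-in for the loaded CSV, deliberately missing one column.
sample = pd.DataFrame(
    {"Risk Level": ["High"], "Cloud Provider": ["AWS"], "Service": ["HTTP"]}
)

print(missing_columns(sample))  # ['Vulnerabilities Count']
```

An empty list means every column used later in the notebook is present.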

In [None]:
# Display dataset info
df.info()

# Check for missing values: a quick way to gauge the quality of your data
# print() is needed because a notebook cell only displays its last expression
print(df.isnull().sum())

# Summary statistics for numerical columns
df.describe()

In [None]:
# Count plot for risk levels
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x="Risk Level", order=["Critical", "High", "Medium", "Low", "Info"], palette="Reds_r")
plt.title("Distribution of Risk Levels")
plt.xlabel("Risk Level")
plt.ylabel("Count")
plt.show()
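
The count plot gives the shape of the distribution. A small table with exact counts and percentage shares can make it easier to quote numbers. The following is a minimal sketch. The `risk_level_summary` helper and the `sample` rows are illustrative assumptions. In the notebook, pass the loaded `df`.

```python
import pandas as pd

# Same severity order used for the plot.
RISK_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


def risk_level_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage share per risk level, in severity order."""
    counts = frame["Risk Level"].value_counts().reindex(RISK_ORDER, fill_value=0)
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})


# Hypothetical stand-in rows; replace with the real df in the notebook.
sample = pd.DataFrame(
    {"Risk Level": ["Critical", "High", "High", "Medium", "Low", "Low", "Low", "Info"]}
)

summary = risk_level_summary(sample)
print(summary)
```

Levels that do not occur in the data appear with a count of 0 rather than being dropped, so the table always has the same five rows.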

In [None]:
# Count cloud vs. on-prem assets
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="Cloud Provider", order=["AWS", "Azure", "GCP", "DigitalOcean", "Linode", "On-Premises"], palette="Blues_d")
plt.title("Cloud vs. On-Prem Assets")
plt.xticks(rotation=30)
plt.xlabel("Hosting Provider")
plt.ylabel("Count")
plt.show()
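
The plot above counts assets per provider. To answer the cloud vs. on-prem question as a two-way split, the providers can be collapsed into two groups. This sketch assumes that on-prem assets are labeled exactly `"On-Premises"` in the `Cloud Provider` column, as in the plot's `order` list, and that every other non-missing value is a cloud provider. The `hosting_split` helper and the `sample` rows are illustrative.

```python
import pandas as pd


def hosting_split(frame: pd.DataFrame) -> pd.Series:
    """Count assets hosted in the cloud vs. on-prem."""
    # Rows with no provider recorded are dropped rather than guessed at.
    providers = frame["Cloud Provider"].dropna()
    labels = providers.map(lambda p: "On-Prem" if p == "On-Premises" else "Cloud")
    return labels.value_counts()


# Hypothetical stand-in rows; replace with the real df in the notebook.
sample = pd.DataFrame(
    {"Cloud Provider": ["AWS", "Azure", "On-Premises", "GCP", "On-Premises", "AWS"]}
)

split = hosting_split(sample)
print(split)
```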

In [None]:
# Count most common exposed services
plt.figure(figsize=(8, 5))
sns.countplot(data=df, y="Service", order=df["Service"].value_counts().index, palette="viridis")
plt.title("Most Exposed Services")
plt.xlabel("Count")
plt.ylabel("Service")
plt.show()
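
Frequency alone does not show which services carry the most severe findings. One possible follow-up is to cross-tabulate services against risk levels, as sketched below. The `service_risk_table` helper and the service names in `sample` are illustrative assumptions. In the notebook, pass the loaded `df`.

```python
import pandas as pd


def service_risk_table(frame: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate services against risk levels, most frequent service first."""
    table = pd.crosstab(frame["Service"], frame["Risk Level"])
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False)


# Hypothetical stand-in rows; replace with the real df in the notebook.
sample = pd.DataFrame(
    {
        "Service": ["HTTP", "HTTP", "SSH", "HTTP", "RDP", "SSH"],
        "Risk Level": ["High", "Low", "Critical", "Medium", "Critical", "Low"],
    }
)

table = service_risk_table(sample)
print(table)
```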

In [None]:
# Box plot: Risk Level vs. Vulnerabilities Count
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x="Risk Level", y="Vulnerabilities Count", order=["Critical", "High", "Medium", "Low", "Info"], palette="coolwarm")
plt.title("Vulnerabilities Count by Risk Level")
plt.xlabel("Risk Level")
plt.ylabel("Vulnerabilities Count")
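# Clip the y-axis at the 95th percentile so extreme outliers do not compress the boxes
# (points above the limit are hidden from view, not removed from df)
plt.ylim(0, df["Vulnerabilities Count"].quantile(0.95))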
plt.show()
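
The box plot can be backed up with per-level summary statistics, which are not affected by the clipped y-axis. The sketch below is one way to produce them. The `vulns_by_risk` helper and the `sample` values are illustrative assumptions. In the notebook, pass the loaded `df`.

```python
import pandas as pd

# Same severity order used for the plot.
RISK_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


def vulns_by_risk(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize the vulnerability count per risk level, in severity order."""
    stats = frame.groupby("Risk Level")["Vulnerabilities Count"].agg(
        ["count", "median", "max"]
    )
    return stats.reindex(RISK_ORDER)


# Hypothetical stand-in rows; replace with the real df in the notebook.
sample = pd.DataFrame(
    {
        "Risk Level": ["Critical", "Critical", "High", "High", "Medium", "Low", "Info"],
        "Vulnerabilities Count": [12, 8, 6, 4, 3, 1, 0],
    }
)

by_risk = vulns_by_risk(sample)
print(by_risk)
```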

## Key Takeaways:
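- **Risk Level Distribution**: Shows how many assets fall into each risk level, which helps prioritize mitigation efforts.
- **Cloud vs. On-Prem Assets**: Shows how assets are split across hosting providers and where cloud exposure is concentrated.
- **Exposed Services**: Highlights the services that appear most often and are therefore the most common potential attack vectors.
- **Risk vs. Vulnerabilities**: Shows how the number of detected vulnerabilities varies across risk levels.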
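
To put a number on the relationship between risk level and vulnerability count, one option is a rank correlation. The sketch below computes Spearman's rho as the Pearson correlation of the ranks. The `SEVERITY` encoding, the `risk_vuln_rank_correlation` helper, and the `sample` rows are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Assumed ordinal encoding of the risk levels (higher means more severe).
SEVERITY = {"Info": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}


def risk_vuln_rank_correlation(frame: pd.DataFrame) -> float:
    """Spearman's rho between risk severity and vulnerability count."""
    paired = pd.DataFrame(
        {
            "severity": frame["Risk Level"].map(SEVERITY),
            "count": frame["Vulnerabilities Count"],
        }
    ).dropna()  # drop rows with an unknown level or a missing count
    # The Pearson correlation of the ranks is Spearman's rho.
    return paired["severity"].rank().corr(paired["count"].rank())


# Hypothetical stand-in rows; replace with the real df in the notebook.
sample = pd.DataFrame(
    {
        "Risk Level": ["Critical", "Critical", "High", "High", "Medium", "Low", "Info"],
        "Vulnerabilities Count": [12, 8, 6, 4, 3, 1, 0],
    }
)

rho = risk_vuln_rank_correlation(sample)
print(round(rho, 3))
```

A value near 1 would suggest that higher-risk assets tend to have more detected vulnerabilities. A value near 0 would suggest that the risk level is driven by something other than the raw count.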

### Next Steps:
- Drill down into specific IPs and domains for targeted mitigation.
- Identify misconfigured or outdated technologies.
- Scan and monitor high-risk assets more frequently.
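
As a starting point for the drill-down, the assets at the highest risk levels can be shortlisted and sorted by vulnerability count. This is a minimal sketch. The `top_risky_assets` helper is illustrative, and the `Domain` column in `sample` is a hypothetical identifier. Use whichever IP or domain columns the dataset actually provides.

```python
import pandas as pd


def top_risky_assets(frame: pd.DataFrame, levels=("Critical", "High"), n=10) -> pd.DataFrame:
    """Return the n assets at the given risk levels with the most vulnerabilities."""
    risky = frame[frame["Risk Level"].isin(levels)]
    return risky.sort_values("Vulnerabilities Count", ascending=False).head(n)


# Hypothetical stand-in rows; "Domain" is an assumed column name.
sample = pd.DataFrame(
    {
        "Domain": ["a.example.com", "b.example.com", "c.example.com", "d.example.com"],
        "Risk Level": ["Critical", "Low", "High", "Critical"],
        "Vulnerabilities Count": [5, 20, 9, 12],
    }
)

shortlist = top_risky_assets(sample, n=2)
print(shortlist)
```

Note that `b.example.com` is excluded despite having the most vulnerabilities, because the filter is applied on risk level first.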