
# **Threat Modeling**
Threat modeling is a continuous information security practice that helps organizations identify, analyze, and mitigate risks to systems, applications, and data. Its results depend on well-prepared data, a careful choice of relevant variables, and collaboration across disciplines, so that effort goes to the threats that matter. Combined with suitable modeling techniques, regular evaluation of results, and data science and machine learning methods, it lets organizations detect emerging threats earlier, allocate security resources more effectively, and embed security throughout the software development lifecycle. These tools support human expertise and organizational security awareness; they do not replace them.

In [1]:
# Importing all the necessary libraries and resources:
import pandas as pd
from sklearn.ensemble import IsolationForest

## **Threat Modeling Example**
The example below shows how Python can support threat modeling: an Isolation Forest, an unsupervised machine learning model, flags entries in security log data that differ sharply from the rest.

In [2]:
# Example security log data:
data = {
    'failed_logins': [2, 3, 1, 50, 2, 3],
    'data_transfer_mb': [10, 12, 9, 500, 11, 10]
}

df = pd.DataFrame(data)

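# Training an anomaly detection model on the numeric features only.
# contamination is the expected share of anomalies in the data:
model = IsolationForest(contamination=0.1, random_state=42)
df['anomaly'] = model.fit_predict(df[['failed_logins', 'data_transfer_mb']])

# Marking suspicious events (fit_predict returns 1 for inliers, -1 for outliers):
df['anomaly'] = df['anomaly'].map({1: 'Normal', -1: 'Potential Threat'})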

print(df)

   failed_logins  data_transfer_mb           anomaly
0              2                10            Normal
1              3                12            Normal
2              1                 9            Normal
3             50               500  Potential Threat
4              2                11            Normal
5              3                10            Normal
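
The labels above are binary, but analysts often want to rank events by how unusual they are. The sketch below is one way to do that, assuming the same illustrative log data; the names `logs`, `features`, and `ranked` are introduced here for illustration and are not part of the original example. It uses `IsolationForest.score_samples`, where lower scores indicate more anomalous points.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Same illustrative log data as in the example above:
logs = pd.DataFrame({
    'failed_logins': [2, 3, 1, 50, 2, 3],
    'data_transfer_mb': [10, 12, 9, 500, 11, 10]
})
features = ['failed_logins', 'data_transfer_mb']

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(logs[features])

# score_samples returns lower values for points that are easier to isolate,
# so the most anomalous events get the lowest scores:
logs['anomaly_score'] = model.score_samples(logs[features])

# Ranking events from most to least anomalous:
ranked = logs.sort_values('anomaly_score')
print(ranked)
```

A ranked list lets a reviewer work from the top down rather than rely on a single `contamination` threshold, which is hard to choose well on a small sample like this one.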


## **Threat Risk Scoring**
Risk scoring is a common task in threat modeling. The example below multiplies each threat's likelihood by its impact (both rated from 1 to 5) to produce a score from 1 to 25, then maps that score to a risk level to help prioritize security actions.

In [3]:
# Example threat data:
data = {
    'threat': [
        'SQL Injection',
        'Phishing Attack',
        'Malware Infection',
        'Insider Threat'
    ],
    'likelihood': [4, 5, 3, 2],  # Scale: 1 (low) to 5 (high)
    'impact': [5, 4, 4, 5]  # Scale: 1 (low) to 5 (high)
}

df = pd.DataFrame(data)

# Calculating risk score:
df['risk_score'] = df['likelihood'] * df['impact']

# Classifying the risk level:
def classify_risk(score):
    if score >= 20:
        return 'Critical'
    elif score >= 12:
        return 'High'
    elif score >= 6:
        return 'Medium'
    else:
        return 'Low'

df['risk_level'] = df['risk_score'].apply(classify_risk)

print(df)

              threat  likelihood  impact  risk_score risk_level
0      SQL Injection           4       5          20   Critical
1    Phishing Attack           5       4          20   Critical
2  Malware Infection           3       4          12       High
3     Insider Threat           2       5          10     Medium
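
The table above is printed in input order, and two threats share the same score. To turn the scores into a work queue, the threats can be sorted. The sketch below is one possible approach, not a prescribed method: it orders by `risk_score` and breaks ties by `impact`, a tie-break rule assumed here for illustration. The names `threats` and `prioritized` are likewise hypothetical.

```python
import pandas as pd

# Same illustrative threat data as in the example above:
threats = pd.DataFrame({
    'threat': ['SQL Injection', 'Phishing Attack',
               'Malware Infection', 'Insider Threat'],
    'likelihood': [4, 5, 3, 2],
    'impact': [5, 4, 4, 5]
})
threats['risk_score'] = threats['likelihood'] * threats['impact']

# Highest risk first; on equal scores, the higher-impact threat comes first
# (an assumed tie-break rule, adjust to your own policy):
prioritized = (
    threats
    .sort_values(['risk_score', 'impact'], ascending=[False, False])
    .reset_index(drop=True)
)

print(prioritized[['threat', 'risk_score', 'impact']])
```

With this rule, SQL Injection is listed ahead of Phishing Attack even though both score 20, because its impact rating is higher.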
