# Network Infrastructure Risk Recommender System

This notebook demonstrates how to build a simple recommender that suggests mitigation actions for risks in network infrastructure projects. It works from your risk register and, optionally, from historical or simulated failure/maintenance data.

It uses:
- **pandas & numpy** for data loading and simulation
- **scikit-learn** for encoding and nearest-neighbor modeling
- **matplotlib & seaborn** for static visualizations
- **plotly** for interactive visualizations

## Goals
- Load and explore risk register data
- (Optionally) combine with historical network failure/maintenance data
- Encode categorical features for ML
- Train a simple recommender to suggest risk mitigations
- Visualize risk patterns and recommendations
- Provide alternatives if no historical data is available

----

In [ ]:
# Install dependencies (uncomment if using Google Colab)
# !pip install pandas numpy scikit-learn matplotlib seaborn plotly

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import NearestNeighbors

## 1. Load Risk Register Data

Replace the filename below with the path to your CSV risk register.

In [ ]:
# Load your risk register
df = pd.read_csv('network_infrastructure_waterfall_risk_register.csv')
df.head()
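
Later cells assume the register contains specific column names, so it can help to check for them right after loading. The sketch below is one way to do that. The helper `missing_columns` and the tiny `demo_df` frame are hypothetical and only stand in for your real data. The column list is taken from the cells in this notebook.

```python
import pandas as pd

# Columns that the later cells in this notebook refer to.
REQUIRED_COLUMNS = [
    'Risk Category', 'Risk Type', 'Severity', 'Likelihood',
    'Status', 'RMS Step', 'Impact Score', 'Recommended Action',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame` (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]

# Made-up frame standing in for the real risk register.
demo_df = pd.DataFrame({
    'Risk Category': ['Technical'],
    'Risk Type': ['Software Bug'],
    'Severity': ['High'],
})

print(missing_columns(demo_df))
```

If the returned list is non-empty, rename or add those columns in the CSV before running the rest of the notebook.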

## 2. Exploratory Data Analysis

Let's explore the most common risk types, categories, and recommended actions.

In [ ]:
# Static visualization: Risk type distribution
plt.figure(figsize=(10,5))
sns.countplot(y='Risk Type', data=df, order=df['Risk Type'].value_counts().index)
plt.title('Distribution of Risk Types')
plt.xlabel('Count')
plt.ylabel('Risk Type')
plt.show()

In [ ]:
# Interactive visualization: Risk Category
fig = px.histogram(df, x='Risk Category', color='Severity', barmode='group', title='Risk Category by Severity')
fig.show()

## 3. Encode Categorical Features

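Most scikit-learn models require numeric features, so we'll encode the categorical columns as integers. Note that `LabelEncoder` assigns codes in sorted (alphabetical) label order, so the numeric gap between two codes does not necessarily reflect how similar the underlying categories are.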

In [ ]:
categorical_cols = ['Risk Category', 'Risk Type', 'Severity', 'Likelihood', 'Status', 'RMS Step']
encoders = {col: LabelEncoder() for col in categorical_cols}
for col in categorical_cols:
    df[col + '_enc'] = encoders[col].fit_transform(df[col])
df.head()
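
For columns with a natural order, such as `Severity`, an explicit mapping may give more meaningful distances than alphabetical codes. The following is a minimal sketch on made-up data. The ordering `Low < Medium < High` and the `demo` frame are assumptions, so adjust them to the labels your register actually uses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up values; your register may use different labels.
demo = pd.DataFrame({'Severity': ['Low', 'High', 'Medium', 'High']})

# LabelEncoder sorts the labels, so classes_ is ['High', 'Low', 'Medium'].
le = LabelEncoder()
demo['Severity_label'] = le.fit_transform(demo['Severity'])

# Assumed ordering: an explicit mapping keeps Low < Medium < High.
severity_order = {'Low': 0, 'Medium': 1, 'High': 2}
demo['Severity_ordinal'] = demo['Severity'].map(severity_order)

print(list(le.classes_))
print(demo)
```

With the alphabetical codes, `High` and `Low` end up adjacent. With the explicit mapping they sit at opposite ends of the scale.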

## 4. Prepare Feature Matrix

We'll use the encoded columns plus the numeric `Impact Score` column as model features, and keep `Recommended Action` as the value the recommender returns.

In [ ]:
feature_cols = [col + '_enc' for col in categorical_cols] + ['Impact Score']
X = df[feature_cols]
actions = df['Recommended Action']
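
Euclidean distance is sensitive to feature scale, so a column with a wider numeric range can dominate the neighbor search. One option, which is not part of the original notebook, is to rescale every feature to the range 0 to 1 first. The sketch below uses scikit-learn's `MinMaxScaler` on a made-up frame named `demo_X`.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix with two columns on different scales.
demo_X = pd.DataFrame({
    'Severity_enc': [0, 1, 2],
    'Impact Score': [5, 7, 9],
})

# Rescale each column independently to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(demo_X), columns=demo_X.columns)

print(X_scaled)
```

If you scale the training features this way, apply the same fitted `scaler` (via `scaler.transform`) to any query before searching for neighbors.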

## 5. (Optional) Prepare Simulated Historical Data

If you don't have real historical failure/maintenance data, you can generate simulated data for demonstration purposes.

In [ ]:
# Simulate historical failure/maintenance data
np.random.seed(42)
sim_hist_df = pd.DataFrame({
    'FailureType': np.random.choice(['Router Outage','Cable Cut','Power Loss','Software Bug','Firewall Breach'], size=50),
    'MaintenanceAction': np.random.choice(['Replaced Router','Repaired Cable','Installed UPS','Patched Software','Upgraded Firewall'], size=50),
    'Severity': np.random.choice(df['Severity'].unique(), size=50),
    'Likelihood': np.random.choice(df['Likelihood'].unique(), size=50),
    'Impact Score': np.random.randint(5, 10, size=50)  # integers 5-9; the upper bound is exclusive
})
sim_hist_df.head()
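
If you would rather not rely on NumPy's global random state, a local generator is an alternative. This is a small sketch using `np.random.default_rng`. The variable names are illustrative, and it also shows how to include the upper bound when drawing integer scores.

```python
import numpy as np

# A local generator keeps the seed from affecting other code that uses np.random.
rng = np.random.default_rng(42)

# Upper bound excluded (same convention as np.random.randint): values 5..9.
exclusive_scores = rng.integers(5, 10, size=1000)

# endpoint=True includes the upper bound: values 5..10.
inclusive_scores = rng.integers(5, 10, size=1000, endpoint=True)

print(exclusive_scores.min(), exclusive_scores.max())
print(inclusive_scores.min(), inclusive_scores.max())
```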

## 6. Visualize Simulated Historical Data

See how failure types and maintenance actions are distributed.

In [ ]:
sns.countplot(y='FailureType', data=sim_hist_df)
plt.title('Simulated Failure Type Distribution')
plt.show()

fig = px.histogram(sim_hist_df, x='MaintenanceAction', color='FailureType', barmode='group', title='Maintenance Actions by Failure Type')
fig.show()

## 7. Train a Simple Content-Based Recommender

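We'll use scikit-learn's `NearestNeighbors` to find the three register entries most similar to a new risk profile (by Euclidean distance) and return their recommended actions.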

In [ ]:
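# Fit a 3-nearest-neighbor index on the feature matrix
nn = NearestNeighbors(n_neighbors=3, metric='euclidean')
nn.fit(X)

def recommend_action(risk_dict):
    """Return the recommended actions of the 3 register entries closest to risk_dict."""
    query = []
    for col in categorical_cols:
        val = risk_dict[col]
        # Labels not seen during fitting fall back to the encoder's first class (a crude default)
        if val not in encoders[col].classes_:
            val = encoders[col].classes_[0]
        query.append(encoders[col].transform([val])[0])
    query.append(risk_dict['Impact Score'])
    # Wrap the query in a DataFrame so its column names match those seen in fit()
    query_df = pd.DataFrame([query], columns=feature_cols)
    distances, indices = nn.kneighbors(query_df)
    return actions.iloc[indices[0]].values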
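
To see how the neighbor lookup behaves, it can help to run it on a toy example and inspect the distances alongside the actions. The sketch below is self-contained and uses made-up data. `toy_X`, `toy_actions`, and the action names are all hypothetical.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Made-up encoded features and matching actions (one row per register entry).
toy_X = pd.DataFrame({
    'Severity_enc': [0, 0, 2, 2],
    'Impact Score': [9, 8, 5, 6],
})
toy_actions = pd.Series(
    ['Patch software', 'Add monitoring', 'Accept risk', 'Review quarterly']
)

toy_nn = NearestNeighbors(n_neighbors=2, metric='euclidean').fit(toy_X)

# Query with the same column names that were used during fit().
query = pd.DataFrame([[0, 9]], columns=toy_X.columns)
distances, indices = toy_nn.kneighbors(query)

# Pair each neighbor's action with its distance to the query.
ranked = [
    (toy_actions.iloc[i], round(float(d), 3))
    for i, d in zip(indices[0], distances[0])
]
print(ranked)
```

A distance of zero means the query matches an existing register entry exactly. Large distances suggest the recommendation is based on a weak match.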

## 8. Example Usage

Try the recommender for a sample risk:

In [ ]:
sample_risk = {
    'Risk Category': 'Technical',
    'Risk Type': 'Software Bug',
    'Severity': 'High',
    'Likelihood': 'Medium',
    'Impact Score': 8,
    'Status': 'Open',
    'RMS Step': 'Analysis'
}
rec_actions = recommend_action(sample_risk)
print("Recommended actions for sample risk:")
for action in rec_actions:
    print('-', action)
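
The neighbors can share the same action, so the printed list may contain duplicates. One possible refinement is to count how often each action appears among the neighbors. The list `demo_rec_actions` below is a hypothetical stand-in for the output of `recommend_action`.

```python
from collections import Counter

# Hypothetical neighbor actions; real values depend on your register.
demo_rec_actions = ['Patch software', 'Add monitoring', 'Patch software']

# Count duplicates and list the most frequent action first.
action_votes = Counter(demo_rec_actions).most_common()

for action, votes in action_votes:
    print(f'- {action} ({votes} of {len(demo_rec_actions)} neighbors)')
```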

## 9. Visualize Top Recommended Actions

In [ ]:
top_actions = actions.value_counts().head(10)
plt.figure(figsize=(8,4))
sns.barplot(x=top_actions.values, y=top_actions.index)
plt.title('Top Recommended Actions')
plt.xlabel('Frequency')
plt.ylabel('Recommended Action')
plt.show()

fig = px.bar(x=top_actions.index, y=top_actions.values, title='Top Recommended Actions (Interactive)', labels={'x':'Action','y':'Count'})
fig.show()

## 10. Save Encoder Mappings (for reproducibility)

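This cell builds a label-to-code dictionary for each encoded column so you can interpret encoded values. It displays the mappings but does not write them to disk.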

In [ ]:
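# Map each original label to its integer code, per column
encoder_mappings = {}
for col in categorical_cols:
    # A label's code is its position in the encoder's sorted classes_ array
    encoder_mappings[col] = {label: int(code) for code, label in enumerate(encoders[col].classes_)}
encoder_mappings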
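
To persist the mappings between sessions, one approach is to write them to a JSON file. The sketch below is self-contained. The encoder, the labels, and the file name `encoder_mappings_demo.json` are assumptions for illustration. Codes are converted to plain `int` because NumPy integers are not JSON serializable.

```python
import json
import os
import tempfile
from sklearn.preprocessing import LabelEncoder

# Made-up encoder standing in for the ones fitted earlier in the notebook.
demo_encoder = LabelEncoder().fit(['Low', 'Medium', 'High'])

# Build a label -> code mapping using plain Python str and int values.
demo_mappings = {
    'Severity': {
        str(label): int(code)
        for code, label in enumerate(demo_encoder.classes_)
    }
}

# Hypothetical output location; change the path to suit your project.
out_path = os.path.join(tempfile.gettempdir(), 'encoder_mappings_demo.json')
with open(out_path, 'w') as f:
    json.dump(demo_mappings, f, indent=2)

# Read the file back to confirm the round trip.
with open(out_path) as f:
    restored = json.load(f)
print(restored)
```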

----
## Summary

- Loaded and explored risk register data
- Simulated historical failure/maintenance data if needed
- Encoded features for ML modeling
- Trained a simple recommender to suggest actions for risks
- Visualized results with both static and interactive plots

You can expand this notebook with more advanced models (Random Forest, XGBoost, etc.), integrate real historical data, or deploy as a web app if needed.

**Questions, feedback, or requests for additional features? Let me know!**