# Astra Guardian: Model Testing and Demonstration

This notebook demonstrates the full functionality of the Astra Guardian system. It loads the trained models and artifacts and uses them to classify network traffic into one of three categories:

1. **Known Application**: traffic that is identified as a known application (e.g., Google, YouTube).
2. **Unknown Anomaly**: traffic that does not conform to the patterns of known, normal applications.
3. **Masquerading Threat**: traffic that the classifier labels as a known application but that exhibits anomalous behavior.
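The notebook computes the masquerading case directly but never assigns all three labels in one place. One way the categories could be combined is sketched here, assuming (this is not confirmed by the source) that a flow is anomalous when its reconstruction error exceeds the anomaly threshold, and that anomalous flows are split into masquerading threats and unknown anomalies by the same 0.9 confidence cutoff used in Section 4. The function `triage` and the category constants are hypothetical names for illustration.

```python
import numpy as np

# Hypothetical category names mirroring the three categories listed above.
KNOWN = "Known Application"
UNKNOWN = "Unknown Anomaly"
MASQUERADING = "Masquerading Threat"


def triage(confidence, mse, anomaly_threshold, confidence_cutoff=0.9):
    """Assign each flow to one of the three categories.

    ASSUMED rule (a sketch, not the project's confirmed logic):
      - error <= threshold                       -> Known Application
      - error > threshold and confidence > cutoff -> Masquerading Threat
      - error > threshold and confidence <= cutoff -> Unknown Anomaly
    """
    confidence = np.asarray(confidence, dtype=float)
    mse = np.asarray(mse, dtype=float)
    is_anomalous = mse > anomaly_threshold
    is_confident = confidence > confidence_cutoff
    return np.select(
        [is_anomalous & is_confident, is_anomalous & ~is_confident],
        [MASQUERADING, UNKNOWN],
        default=KNOWN,
    )


# Toy example with four made-up flows.
labels = triage(
    confidence=[0.99, 0.95, 0.40, 0.60],
    mse=[0.01, 0.50, 0.70, 0.02],
    anomaly_threshold=0.1,
)
print(labels)
```

The confidence cutoff only matters for flows that are already anomalous; a non-anomalous flow keeps the classifier's label whatever its confidence.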

## 1. Load Artifacts and Data

```python
import joblib
import pandas as pd
import numpy as np
from tensorflow.keras.models import load_model
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')
```

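```python
# Load the saved artifacts.
# compile=False: the autoencoder is only used for inference here, so its
# training configuration (optimizer, loss) does not need to be restored.
autoencoder = load_model('../artifacts/autoencoder.h5', compile=False)
classifier = joblib.load('../artifacts/classifier.joblib')
scaler = joblib.load('../artifacts/scaler.joblib')
label_encoder = joblib.load('../artifacts/label_encoder.joblib')
```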

```python
# Load the test data (we re-create the split for demonstration).
# Note: this reproduces the held-out set used during training only if
# encode_and_split_data splits with a fixed random seed.
from astra_guardian.preprocessing import clean_and_prepare_data, encode_and_split_data

df = pd.read_csv('../data/Dataset-Unicauca-Version2-87Atts.csv')
df_clean = clean_and_prepare_data(df)
_, X_test, _, y_test, _, _ = encode_and_split_data(df_clean)
```

## 2. Evaluate the Classifier

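```python
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Classifier Accuracy: {accuracy:.4f}')
print('\nClassification Report:')
# labels= keeps target_names aligned with the encoder's classes even if a class is absent from y_test
print(classification_report(y_test, y_pred, labels=np.arange(len(label_encoder.classes_)), target_names=label_encoder.classes_))
```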
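For reference, the precision, recall, F1 and support columns of `classification_report` follow the standard per-class definitions. A minimal numpy-only sketch of those definitions follows; `per_class_report` is a hypothetical helper for illustration (the notebook itself relies on scikit-learn), and here any ratio with a zero denominator is simply reported as 0.0.

```python
import numpy as np


def per_class_report(y_true, y_pred, n_classes):
    """Hypothetical helper: per-class precision, recall, F1 and support.

    Uses the standard definitions behind classification_report's columns;
    ratios with a zero denominator are reported as 0.0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rows = []
    for c in range(n_classes):
        tp = int(np.sum((y_pred == c) & (y_true == c)))  # predicted c, truly c
        fp = int(np.sum((y_pred == c) & (y_true != c)))  # predicted c, truly another class
        fn = int(np.sum((y_pred != c) & (y_true == c)))  # truly c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((c, precision, recall, f1, tp + fn))  # support = true samples of c
    return rows


# Made-up labels for three classes.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
rows = per_class_report(y_true, y_pred, n_classes=3)
for c, p, r, f, s in rows:
    print(f"class {c}: precision={p:.3f} recall={r:.3f} f1={f:.3f} support={s}")
```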

## 3. Establish Anomaly Threshold

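To identify anomalies, we compute each flow's reconstruction error: the mean squared difference between its input features and the autoencoder's reconstruction of them. We then set a threshold from the distribution of these errors on the test set, and any flow whose error exceeds the threshold is treated as an anomaly. Because the threshold is a percentile of the same errors it is applied to, a 95th-percentile cutoff flags about 5% of the test flows by construction.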
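A small worked example of the error and threshold computation on made-up numbers, using only numpy. The arrays `X` and `X_hat` stand in for real inputs and autoencoder reconstructions. The second half illustrates a property of percentile thresholds: when the threshold is the q-th quantile of the same (distinct) errors it is applied to, a (1 - q) fraction of them exceeds it.

```python
import numpy as np

# Made-up inputs X and reconstructions X_hat: 4 flows x 3 features.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.5, 0.5, 0.5],
              [2.0, 0.0, 1.0]])
X_hat = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])

# Per-flow MSE: average the squared differences across features (axis=1).
mse = np.mean((X - X_hat) ** 2, axis=1)  # values 0, 1/3, 0, 4/3

# With only 4 flows, use the 75th percentile instead of the 95th.
# Linear interpolation: 1/3 + 0.25 * (4/3 - 1/3) = 7/12.
threshold = np.quantile(mse, 0.75)
is_anomaly = mse > threshold  # only the last flow exceeds it
print(mse, threshold, is_anomaly)

# A percentile threshold computed on the same errors flags (1 - q) of them.
rng = np.random.default_rng(0)
errors = rng.exponential(scale=1.0, size=10_000)
flagged = np.mean(errors > np.quantile(errors, 0.95))
print(flagged)  # 0.05
```

This is why the "5% of flows" figure says nothing about how many flows are truly anomalous; it follows from the choice of percentile.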

```python
# Reconstruct each test flow and compute its per-flow mean squared error.
X_test_arr = np.asarray(X_test, dtype=float)
X_test_pred = autoencoder.predict(X_test_arr)
mse = np.mean(np.square(X_test_arr - X_test_pred), axis=1)

plt.figure(figsize=(10, 6))
sns.histplot(mse, bins=50, kde=True)
plt.title('Reconstruction Error Distribution')
plt.xlabel('Mean Squared Error')
plt.ylabel('Frequency')
plt.show()
```

```python
# Set the anomaly threshold (e.g., 95th percentile)
anomaly_threshold = np.quantile(mse, 0.95)
print(f'Anomaly Threshold (95th percentile): {anomaly_threshold:.4f}')
```

## 4. Identify Masquerading Threats

A masquerading threat is a flow that the classifier confidently identifies as a known application but that the anomaly detector flags as anomalous. We find candidates by selecting flows whose predicted-class probability exceeds 0.9 and whose reconstruction error exceeds the anomaly threshold.

```python
# Confidence = probability assigned to the most likely class
y_pred_proba = classifier.predict_proba(X_test)
confidence = np.max(y_pred_proba, axis=1)

# Find masquerading threats: confident prediction AND high reconstruction error
masquerading_threats = (confidence > 0.9) & (mse > anomaly_threshold)

print(f'Found {np.sum(masquerading_threats)} potential masquerading threats.')
```

```python
# Display some examples, with labels decoded back to application names
mask = np.asarray(masquerading_threats, dtype=bool)
masquerading_df = pd.DataFrame({
    'y_true': label_encoder.inverse_transform(np.asarray(y_test)[mask]),
    'y_pred': label_encoder.inverse_transform(np.asarray(y_pred)[mask]),
    'confidence': np.asarray(confidence)[mask],
    'mse': np.asarray(mse)[mask],
})

print('Examples of Masquerading Threats:')
masquerading_df.head(10)
```
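`head(10)` shows the first ten matches in test-set order, which says nothing about how anomalous they are. A hedged sketch of ranking candidates by reconstruction error, most anomalous first, follows; `rank_masquerading` is a hypothetical helper that applies the same rule as the cell above (confidence above 0.9 and error above the threshold). Within the notebook, `masquerading_df.sort_values('mse', ascending=False).head(10)` would give a similar ordering for display.

```python
import numpy as np


def rank_masquerading(confidence, mse, anomaly_threshold, confidence_cutoff=0.9):
    """Hypothetical helper: indices of masquerading candidates, most anomalous first."""
    confidence = np.asarray(confidence, dtype=float)
    mse = np.asarray(mse, dtype=float)
    # Same selection rule as the notebook: confident AND high reconstruction error.
    mask = (confidence > confidence_cutoff) & (mse > anomaly_threshold)
    candidates = np.flatnonzero(mask)
    # Sort candidates by reconstruction error, descending (stable for ties).
    return candidates[np.argsort(-mse[candidates], kind="stable")]


# Made-up scores for five flows.
order = rank_masquerading(
    confidence=[0.95, 0.99, 0.50, 0.97, 0.92],
    mse=[0.30, 0.05, 0.90, 0.60, 0.20],
    anomaly_threshold=0.1,
)
print(order)  # [3 0 4]
```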