## AML_STR_Detector_ML_V2 – Unsupervised Anomaly Detection using Isolation Forest

**_A machine learning–driven financial crime detection system that simulates real-world transaction behavior and flags anomalous activities for STR generation using Isolation Forest_**

## Step 1: Generate Synthetic Transaction Dataset

We generate a synthetic dataset to simulate real-world customer transactions. It includes features such as transaction amount, type, frequency, channel, and customer risk profile, and it is used to train the Isolation Forest model to detect anomalous behavior.


In [2]:
import pandas as pd
import numpy as np
import random

# Set random seeds for reproducibility (the loop below draws from both NumPy and the stdlib `random` module)
np.random.seed(42)
random.seed(42)

# Define parameters
num_customers = 20
num_transactions = 500

# Sample feature pools
transaction_types = ['NEFT', 'IMPS', 'RTGS', 'Crypto', 'SWIFT']
locations = ['India', 'Germany', 'USA', 'UAE', 'Singapore', 'Nigeria']
channels = ['Web', 'Mobile', 'ATM', 'POS']
times = ['Morning', 'Afternoon', 'Night']
risk_bands = ['Low', 'Medium', 'High']

# Generate customer IDs
customer_ids = [f'CUST{i:03d}' for i in range(1, num_customers + 1)]

# Generate synthetic data
transactions = []

for i in range(num_transactions):
    customer_id = random.choice(customer_ids)
    transaction = {
        'transaction_id': f'TXN{i:05d}',
        'customer_id': customer_id,
        'amount': round(np.random.exponential(scale=10000), 2),  # Skewed amount distribution
        'transaction_type': random.choice(transaction_types),
        'location_country': random.choice(locations),
        'channel': random.choice(channels),
        'time_of_day': random.choice(times),
        'txn_frequency_30d': np.random.poisson(lam=15),
        'account_age_days': np.random.randint(30, 2000),
        'customer_risk_band': random.choices(risk_bands, weights=[0.6, 0.3, 0.1])[0]
    }
    transactions.append(transaction)

# Convert to DataFrame
df_txns = pd.DataFrame(transactions)

# Save to CSV
df_txns.to_csv("synthetic_transactions.csv", index=False)

print("Synthetic transaction dataset created and saved as 'synthetic_transactions.csv'")

Synthetic transaction dataset created and saved as 'synthetic_transactions.csv'
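
The cell above draws from two random sources, `random` and `np.random`. As a minimal sketch of an alternative, the same kind of table could be built from a single NumPy `Generator`, so that one seed controls every column. The function name `make_transactions` and its parameters are hypothetical, and the sketch omits the location, channel, and time-of-day columns for brevity.

```python
import numpy as np
import pandas as pd

def make_transactions(n_txns=500, n_customers=20, seed=42):
    """Build a synthetic transaction table from one seeded Generator (illustrative helper)."""
    rng = np.random.default_rng(seed)
    customer_ids = [f"CUST{i:03d}" for i in range(1, n_customers + 1)]
    return pd.DataFrame({
        "transaction_id": [f"TXN{i:05d}" for i in range(n_txns)],
        "customer_id": rng.choice(customer_ids, size=n_txns),
        # Exponential draw gives the same right-skewed amount distribution
        "amount": rng.exponential(scale=10000, size=n_txns).round(2),
        "transaction_type": rng.choice(
            ["NEFT", "IMPS", "RTGS", "Crypto", "SWIFT"], size=n_txns
        ),
        "customer_risk_band": rng.choice(
            ["Low", "Medium", "High"], size=n_txns, p=[0.6, 0.3, 0.1]
        ),
        "txn_frequency_30d": rng.poisson(lam=15, size=n_txns),
        # Upper bound is exclusive, matching np.random.randint(30, 2000)
        "account_age_days": rng.integers(30, 2000, size=n_txns),
    })

df_a = make_transactions()
df_b = make_transactions()
print(df_a.equals(df_b))  # two calls with the same seed give identical tables
```

Vectorized draws also avoid the per-row Python loop, which matters little at 500 rows but helps if the dataset is scaled up.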


## Step 2: Preprocess and Encode Data

To train the Isolation Forest model, we need to prepare the data:
- Encode categorical variables (e.g., transaction_type, channel)
- Scale numeric features if needed
- Drop ID columns that don't add value for anomaly detection


In [5]:
from sklearn.preprocessing import LabelEncoder

# Copy original data for processing
df_processed = df_txns.copy()

# Drop non-feature columns
df_processed = df_processed.drop(['transaction_id', 'customer_id'], axis=1)

# Encode categorical variables
categorical_cols = ['transaction_type', 'location_country', 'channel', 'time_of_day', 'customer_risk_band']
label_encoders = {}

for col in categorical_cols:
    le = LabelEncoder()
    df_processed[col] = le.fit_transform(df_processed[col])
    label_encoders[col] = le  # Store encoders for decoding later if needed

print("Data preprocessing complete. Encoded and ready for model training.")
df_processed.head()

Data preprocessing complete. Encoded and ready for model training.


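|   | amount | transaction_type | location_country | channel | time_of_day | txn_frequency_30d | account_age_days | customer_risk_band |
|---|---|---|---|---|---|---|---|---|
| 0 | 4692.68 | 0 | 0 | 0 | 1 | 16 | 496 | 1 |
| 1 | 1053.33 | 2 | 0 | 2 | 2 | 14 | 901 | 1 |
| 2 | 12312.5 | 1 | 5 | 3 | 1 | 19 | 1245 | 2 |
| 3 | 48551.15 | 4 | 1 | 1 | 1 | 16 | 282 | 2 |
| 4 | 5655.37 | 1 | 1 | 0 | 0 | 12 | 1735 | 1 |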
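
One thing to be aware of with the encoding above: `LabelEncoder` assigns integers in sorted (alphabetical) order, so `customer_risk_band` becomes High=0, Low=1, Medium=2, which does not follow the natural risk ordering. The sketch below shows one possible alternative, assuming an explicit ordinal map for the risk band and one-hot encoding for an unordered column. The `sample` frame and the `risk_order` mapping are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Tiny stand-in for df_txns; column names follow the dataset above.
sample = pd.DataFrame({
    "amount": [4692.68, 1053.33, 12312.50],
    "channel": ["Web", "ATM", "POS"],
    "customer_risk_band": ["Medium", "Low", "High"],
})

# Explicit ordinal mapping keeps Low < Medium < High (assumed ordering).
risk_order = {"Low": 0, "Medium": 1, "High": 2}
encoded = sample.copy()
encoded["customer_risk_band"] = encoded["customer_risk_band"].map(risk_order)

# One-hot encode the unordered category so no artificial order is implied.
encoded = pd.get_dummies(encoded, columns=["channel"], dtype=int)
print(encoded)
```

Whether this changes which transactions are flagged depends on the data; it mainly makes the encoded values easier to interpret when reviewing alerts.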


## Step 3: Train Isolation Forest and Detect Anomalies

The Isolation Forest algorithm is trained on the processed transaction data. It learns the normal patterns and flags outliers as anomalies (suspicious transactions). With `contamination=0.05`, roughly 5% of the transactions are flagged.


In [8]:
from sklearn.ensemble import IsolationForest

# Initialize Isolation Forest
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)

# Train the model
model.fit(df_processed)

# Predict anomalies
df_txns['anomaly_score'] = model.decision_function(df_processed)  # lower (negative) = more anomalous
df_txns['anomaly_flag'] = model.predict(df_processed)  # -1 = anomaly, 1 = normal
df_txns['anomaly_flag'] = df_txns['anomaly_flag'].map({1: 'Normal', -1: 'Anomalous'})

# Show summary
print("Anomaly detection completed.")
df_txns['anomaly_flag'].value_counts()

Anomaly detection completed.


anomaly_flag
Normal       475
Anomalous     25
Name: count, dtype: int64
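
To make the link between `anomaly_score` and `anomaly_flag` concrete, here is a small self-contained sketch on toy 2-D data (not the transaction dataset). It illustrates that scikit-learn's `predict()` returns -1 exactly where `decision_function()` is negative, and that sorting by score ranks records from most to least anomalous. The toy data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 ordinary points plus 5 far-away points, purely illustrative.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = rng.normal(loc=8.0, scale=0.5, size=(5, 2))
X = np.vstack([normal, far_points])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = model.fit_predict(X)          # -1 = anomaly, 1 = normal
scores = model.decision_function(X)    # negative = anomaly, lower = more anomalous

flagged = labels == -1
print(flagged.sum(), "points flagged")
# Indices ordered from most to least anomalous
print("most anomalous indices:", np.argsort(scores)[:5])
```

Applied to the notebook's data, the same idea would be `df_txns.sort_values('anomaly_score').head()` to review the strongest anomalies first.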

## Step 4: Generate Alert Metadata and Export to Excel

We now generate STR-style metadata for each anomalous transaction flagged by the Isolation Forest model. Each alert includes:
- Alert ID
- Risk Signal narrative
- Red Flags
- STR Status
- Date/Time
- Channel
- Compliance Reference

The output is exported to `ml_alerts_output.xlsx` for audit and regulatory use.

In [11]:
from datetime import datetime

# Filter only anomalies
df_alerts = df_txns[df_txns['anomaly_flag'] == 'Anomalous'].copy()
df_alerts.reset_index(drop=True, inplace=True)

# Generate alert metadata
df_alerts['alert_id'] = ['ALERT_ML_' + str(i+1).zfill(4) for i in range(len(df_alerts))]
df_alerts['rule_trigger'] = 'ML - Isolation Forest'
df_alerts['rule_severity'] = 'High'
df_alerts['risk_score'] = np.random.randint(75, 95, len(df_alerts))  # Randomized for now

# Red flag narrative (can be smarter in V3)
df_alerts['red_flags'] = 'Unusual transaction pattern detected via ML anomaly detection'

# Articulated narrative
df_alerts['transaction_articulation'] = (
    'Transaction flagged by ML model as anomalous based on behavior patterns, '
    'customer risk profile, and transaction attributes.'
)

# Compliance references
df_alerts['compliance_reference'] = 'FATF Rec 20, Basel Principle 3, BaFin Anomaly Risk'

# STR Status
df_alerts['str_status'] = 'STR_TO_BE_FILED'

# Timestamp
df_alerts['detection_timestamp'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

# Final selected columns for export
export_columns = [
    'alert_id', 'transaction_id', 'customer_id', 'amount', 'transaction_type',
    'location_country', 'channel', 'time_of_day', 'txn_frequency_30d',
    'account_age_days', 'customer_risk_band', 'risk_score', 'rule_trigger',
    'rule_severity', 'anomaly_score', 'red_flags', 'transaction_articulation',
    'compliance_reference', 'str_status', 'detection_timestamp'
]

df_alerts_final = df_alerts[export_columns]

# Export to Excel
df_alerts_final.to_excel("ml_alerts_output.xlsx", index=False)
print("ML-based alert report exported to 'ml_alerts_output.xlsx'")

ML-based alert report exported to 'ml_alerts_output.xlsx'
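
The cell above assigns `risk_score` at random ("Randomized for now"). As a hedged sketch of one possible refinement, the score could instead be derived from `anomaly_score`, so that more anomalous transactions receive higher risk scores. The helper `score_to_risk` and the 75 to 95 band are hypothetical choices for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def score_to_risk(anomaly_scores, low=75, high=95):
    """Map anomaly scores to [low, high]; the most negative score gets `high`."""
    s = np.asarray(anomaly_scores, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        # All scores identical: assign the top of the band to every alert
        return np.full(s.shape, high, dtype=int)
    scaled = (s.max() - s) / span  # 0 = least anomalous, 1 = most anomalous
    return np.rint(low + scaled * (high - low)).astype(int)

# Illustrative scores standing in for df_alerts['anomaly_score']
alerts = pd.DataFrame({"anomaly_score": [-0.01, -0.05, -0.12]})
alerts["risk_score"] = score_to_risk(alerts["anomaly_score"])
print(alerts)
```

Min-max scaling is relative to the current batch of alerts, so scores from different runs are not directly comparable; a fixed reference range would be needed for that.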


## Summary: AML_STR_Detector_ML_V2 – Isolation Forest-Based Anomaly Detection

This module uses unsupervised machine learning to detect suspicious financial transactions without predefined rules. The key steps are:

1. **Synthetic Data Generation**  
   Created 500 synthetic transactions with attributes like amount, channel, risk band, and transaction type.

2. **Data Preprocessing**  
   Encoded categorical variables and removed identifiers to prepare the data for model training.

3. **Model Training (Isolation Forest)**  
   Trained the model to learn normal behavior patterns and detect anomalies based on rarity and isolation.

4. **Anomaly Detection**  
   Flagged 25 of 500 transactions (5%, matching the configured `contamination` rate) as anomalous and therefore potentially suspicious, based on model scoring.

5. **Alert Generation**  
   Created structured alert metadata including:
   - Alert ID, Red Flags, Risk Score
   - Transaction Articulation
   - Compliance References (FATF, Basel, BaFin)
   - STR Filing Status and Timestamp

6. **Excel Export**  
   Final alerts exported to `ml_alerts_output.xlsx` for audit and compliance review.

This system forms the ML-based upgrade to the rule-based engine in AML_STR_Detector_Lite_V1.
