# ML System Design Class Notes: Real-Time Fraud Analytics Case Study

## Overview
In this class, we explored the design and implementation of real-time fraud detection systems in the financial technology (fintech) domain, particularly focused on transaction processing. The class discussed various statistical tests, performance metrics, AWS services, and the architecture necessary for building an efficient and scalable fraud detection system.

---

## What are the metrics used in Fraud Analytics in Transactions?

Fraud analytics employs several metrics to evaluate model performance. Understanding these metrics is crucial for assessing the effectiveness of fraud detection models.

### 1. Precision
Precision evaluates the accuracy of positive predictions:
$$ 
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} 
$$
- **Interpretation**: High precision means that when the system predicts fraud, it is likely correct. This is crucial in contexts where false positives can lead to significant financial and reputational losses.

### 2. Recall
Recall, also known as sensitivity, assesses the completeness of the positive predictions:
$$ 
\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} 
$$
- **Interpretation**: High recall means that the system successfully identifies a majority of actual fraudulent transactions, vital when failing to detect fraud incurs heavy costs.

### 3. F1 Score
The F1 Score provides a balance between precision and recall:
$$ 
\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} 
$$
- **Interpretation**: As the harmonic mean of precision and recall, F1 is high only when both are high. It is more informative than plain accuracy when classes are imbalanced, as they are in fraud data.

### 4. AUC-ROC
AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristic) curve measures how well the model separates the two classes across all classification thresholds. A value of 0.5 corresponds to random guessing and 1.0 to perfect separation.

### 5. Confusion Matrix
A confusion matrix is a table that visualizes the performance of an algorithm by showing True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values. This matrix provides insights into where the model is making errors.
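
As a minimal sketch, the formulas above can be computed directly from the four confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical day: 10,000 transactions, 100 of them actually fraudulent.
metrics = classification_metrics(tp=80, fp=40, fn=20, tn=9860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is very high even though a third of the fraud alerts are wrong and a fifth of the fraud is missed, which is why accuracy alone is a poor guide on imbalanced data.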

---

## Difference between Precision, Recall, and F1 Score

Choosing between precision, recall, and the F1 score depends on the context:

- Use **Precision** when the cost of a false positive is high (e.g., blocking legitimate transactions).
- Use **Recall** when the cost of a false negative is high (i.e., missing a fraudulent transaction).
- Use **F1 Score** when it is crucial to maintain a balance between precision and recall, especially with uneven class distribution.

### Summary of Trade-offs
Considerations for metrics in fraud detection:
- High precision minimizes customer dissatisfaction from false blocks.
- High recall enhances overall security by catching as many fraudulent transactions as possible.
- The F1 score is crucial in scenarios where you cannot compromise on either of the two.
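
The trade-off can be made concrete by sweeping the decision threshold over a set of model scores. The sketch below uses made-up scores and labels (1 = fraud); raising the threshold tends to raise precision and lower recall.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted as fraud."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Precision is undefined with no positive predictions; report 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and ground-truth labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.5, 0.8):
    p, r = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```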

---

## What are AWS Services Used for End-to-End Real-Time Fraud Analytics?

The architecture of a real-time fraud detection system in AWS comprises various services:

### 1. Amazon Kinesis Data Streams
Ingests high-throughput transaction streams from diverse sources. Throughput scales with the number of shards; the design target in this case study is **10,000 transactions per second**.

### 2. Amazon Kinesis Data Analytics (Apache Flink)
Performs real-time analytics and processing, allowing incoming transaction data to be analyzed immediately. AWS has since renamed this service Amazon Managed Service for Apache Flink.

### 3. DynamoDB
Provides low-latency lookups, enabling swift access to metadata related to transactions.

### 4. Amazon SageMaker
Facilitates model training, hosting, and monitoring of machine learning models.

### 5. AWS Lambda
Allows for executing code in response to transactions without the need for provisioning servers.

### 6. Amazon API Gateway
Creates and manages the APIs that connect the fraud detection service to other applications.

### 7. Amazon SNS (Simple Notification Service)
Used for sending SMS or email alerts to customers or customer service regarding flagged transactions.

```mermaid
flowchart TD
    A[Amazon Kinesis Data Streams]
    B[Amazon Kinesis Data Analytics]
    C[DynamoDB]
    D[Amazon SageMaker]
    E[AWS Lambda]
    F[Amazon API Gateway]
    G[Amazon SNS]

    A -->|Transaction Ingestion| B
    B --> C
    B --> D
    D --> E
    E --> F
    E --> G
```

---

## 10,000 Transactions per Second - High Throughput

To achieve high throughput for processing transactions using **Amazon Kinesis Data Streams**, sharding is essential:

### Sharding
- Divide Kinesis data streams into shards.
- Each shard accepts writes of up to **1 MB/sec** or **1,000 records per second**, whichever limit is reached first.
  
#### Configuration Example
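To sustain **10,000 transactions per second**, configure at least **10 shards** (10,000 / 1,000 records per second per shard). This holds when records average about 1 KB or less; larger records hit the 1 MB/sec limit first and need more shards. Provisioning some headroom above the minimum helps maintain throughput during peak transaction times.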
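
The shard arithmetic can be sketched as a small helper. The per-shard limits are the ones quoted above; the function name, the `headroom` parameter, and treating 1 MB as 1,000,000 bytes are assumptions made for illustration.

```python
import math

# Per-shard write limits quoted in the notes above.
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_WRITE_BYTES_PER_SEC = 1_000_000  # 1 MB/sec, taken as 10**6 bytes (assumption)


def shards_needed(tps: int, avg_record_bytes: int, headroom: float = 0.0) -> int:
    """Minimum shard count for a write rate, limited by records/sec or bytes/sec."""
    by_records = tps / SHARD_WRITE_RECORDS_PER_SEC
    by_bytes = tps * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC
    # The tighter of the two limits decides; headroom adds spare capacity for peaks.
    return math.ceil(max(by_records, by_bytes) * (1 + headroom))


print(shards_needed(10_000, avg_record_bytes=500))                  # record-count bound
print(shards_needed(10_000, avg_record_bytes=2_000))                # byte-rate bound
print(shards_needed(10_000, avg_record_bytes=500, headroom=0.25))   # with 25% headroom
```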

---

## What Happens in a Lifestyle Store POS Machine?

When a transaction occurs in a Point of Sale (POS) system:

1. The POS machine captures transaction data (e.g., transaction ID, user ID, amount, timestamp, merchant ID).
2. This data is sent to the bank's transaction processing system, which aggregates and analyzes it for fraud detection.

### Who Will Get Maximum Share of 2.5% of Sales?
The **Issuer Bank (e.g., ICICI)** usually receives the largest portion of the transaction fee (the interchange component) because it bears the risk of extending credit.

### Why Does the Issuer Bank Receive Maximum Share?
The issuer bank is primarily responsible for:
- Credit approval
- Fraud investigation
- Handling chargebacks

It therefore carries more risk than the other parties involved in the transaction.

---

## Real-Time Feature Engineering

The **Apache Flink** backend can perform real-time feature engineering tasks such as:

- Generating rolling aggregates (e.g., cumulative spending over the last hour).
- Enriching transaction data by cross-referencing with user or merchant data stored in **DynamoDB**.

### Example of a Real-Time Feature
- **Rolling Spend**: The amount a customer has spent in the last hour. A sudden jump relative to typical spending is a useful signal for the fraud detection model.
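
The windowing logic behind a rolling-spend feature can be sketched in plain Python. This is not Flink code; it only illustrates the sliding-window idea for a single customer, and it assumes events arrive in timestamp order. The class and method names are hypothetical.

```python
from collections import deque


class RollingSpend:
    """Running total of amounts seen within the last `window_seconds`."""

    def __init__(self, window_seconds: int = 3600):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp_seconds, amount), oldest first
        self._total = 0.0

    def add(self, timestamp: float, amount: float) -> float:
        """Record a transaction and return the spend inside the current window."""
        self._events.append((timestamp, amount))
        self._total += amount
        # Drop transactions that are a full window old or older.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_amount = self._events.popleft()
            self._total -= old_amount
        return self._total


spend = RollingSpend(window_seconds=3600)
print(spend.add(0, 100.0))     # first purchase
print(spend.add(1800, 50.0))   # 30 minutes later, both purchases in the window
print(spend.add(3600, 25.0))   # the first purchase has aged out of the window
```

In a streaming job the same state would be kept per customer or per card, keyed by the relevant ID.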

---

## What is Dwell Time?
**Dwell time** refers to the amount of time a customer spends in a store before they leave. Leveraging this metric can help analyze purchasing patterns and signal potential fraudulent transactions.

---

## Where to Store the Data?

For persistent storage of transaction data and model artifacts, **Amazon S3** is the recommended solution. It serves as a staging area for data analysis and model training, allowing for durability and availability.

---

## Which AWS Service Is Used for Training, Hosting, and Monitoring the Model?
**Amazon SageMaker** is specifically designed for the entire machine learning lifecycle, from data preparation to model deployment.

---

## Which Service Is Used for Batch Feature Engineering on Historical Data?

**Amazon EMR** (Elastic MapReduce) is utilized for batch processing and large-scale feature engineering of historical data.

---

## Explain Anomaly Detection Methods

### Isolation Forest
Isolation Forest is an anomaly detection algorithm that builds an ensemble of trees by recursively splitting the data on randomly chosen features and split values. Anomalies are easier to isolate, so they need fewer splits (shorter average path lengths) than normal points.

### Autoencoder
An autoencoder is a type of neural network designed to learn efficient representations of data. It comprises two main components:
- **Encoder**: Compresses data into a lower-dimensional representation.
- **Decoder**: Reconstructs the original data from the encodings.

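Anomalies are detected through reconstruction error: the autoencoder learns to reproduce normal transactions well, so inputs it reconstructs poorly (a large gap between the reconstructed and original values) are flagged as anomalous.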

---

## Supervised vs Unsupervised Anomaly Detection

### Unsupervised Anomaly Detection
Algorithms like Isolation Forest and Autoencoders do not require labeled data. They learn normal patterns and recognize anomalies based on deviations from these patterns.

### Supervised Anomaly Detection
Involves training models on labeled data (fraud vs. non-fraud). Examples include logistic regression, decision trees, and gradient-boosted trees such as XGBoost, all of which depend on previously labeled datasets.

---

## What is the Intent of Having Two Endpoints, One for XGBoost and One for Autoencoders?

Having both models allows for:

- **Comparative Analysis**: Testing and comparing performance to yield more reliable outcomes.
- **Hybrid Approach**: Leveraging the strengths of both supervised (XGBoost) and unsupervised (Autoencoders) methods to improve overall detection accuracy.

### What is the Loss Function?
The loss function used to train autoencoders is typically the **Mean Squared Error (MSE)**: the average of the squared differences between the reconstructed values and the original input values.
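
Written out for a single transaction, the MSE takes the following form. The symbols are assumptions introduced here: $x_i$ is the $i$-th input feature, $\hat{x}_i$ is its reconstruction by the autoencoder, and $n$ is the number of features.

$$ 
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2 
$$

The same quantity, evaluated per transaction at inference time, serves as the reconstruction error used for anomaly scoring.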

---

## What is Location Velocity?
**Location Velocity** gauges the speed at which a customer moves between transactions based on geographic locations, serving as a useful feature for identifying anomalous behaviors.

## What is Geo-Velocity?
**Geo-Velocity** is the implied travel speed between consecutive transactions, computed as the distance between their locations divided by the time elapsed. It highlights instances of impossible travel (for example, two in-person transactions in distant cities minutes apart), which could indicate fraud.
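
A minimal sketch of the calculation, assuming each transaction carries a latitude, longitude, and Unix timestamp. The haversine formula gives the great-circle distance; the 900 km/h cutoff is a hypothetical threshold, not a value from the class.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # hypothetical cutoff, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_velocity_kmh(prev, curr):
    """Implied speed between two transactions given as (lat, lon, unix_seconds)."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600
    if hours <= 0:
        # Same or out-of-order timestamp: any distance implies impossible travel.
        return math.inf if distance > 0 else 0.0
    return distance / hours


# Example: a card used in Mumbai, then in Delhi 30 minutes later.
mumbai = (19.0760, 72.8777, 0)
delhi = (28.6139, 77.2090, 30 * 60)
speed = geo_velocity_kmh(mumbai, delhi)
print(f"{speed:.0f} km/h, impossible travel: {speed > MAX_PLAUSIBLE_KMH}")
```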

---

## Features for Blocking Transactions

Identification of high-risk transactions may rely on:
1. **Fraud Score > 0.9**
2. **Transaction Amount > $1,000**
3. **Geo-Velocity Analysis for Impossible Travel**

---

## How Blocking Happens

1. **Amazon API Gateway**: Exposes the endpoint through which block requests reach the backend processing system.
2. **AWS Lambda**: Runs the decision logic, evaluating each scored transaction against predefined conditions and deciding whether to block it.
3. **Amazon SNS**: Sends notifications and alerts via SMS or email to the relevant stakeholders about flagged transactions.

---

## Monitoring and Retraining Phase

### Data Drift Detection
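To stay effective over time, the model must be monitored for data drift. Statistical tests such as the **Chi-Squared Test** (categorical features) or the **KS Test** (continuous features), and distance measures such as **KL Divergence**, can quantify shifts in the data distribution relative to the training data.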
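
As an illustration of a KS-based drift check on one continuous feature, the sketch below computes the two-sample KS statistic (the largest gap between the two empirical CDFs) and compares it with the standard large-sample critical value at the 5% level. The function names are hypothetical; in practice a library routine such as SciPy's `ks_2samp` would normally be used instead.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Largest absolute difference between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a sample point.
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, c_alpha=1.358):
    """Compare the KS statistic with the asymptotic critical value (1.358 ~ alpha 0.05)."""
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Made-up feature values: the second sample is shifted upwards by 50.
training_amounts = list(range(100))
todays_amounts = list(range(50, 150))
print(ks_statistic(training_amounts, todays_amounts))
print(drift_detected(training_amounts, todays_amounts))
```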

### Experimental Model A vs. Model B
When comparing performance between two models, statistical tests can be utilized:
- **Z-Test** for proportions when comparing two rates (e.g., the detection rates of Model A and Model B) on large samples.
- **T-Test** to compare means between two groups, particularly with small sample sizes.

---

### Proportion Example
If the historical fraud rate is 1% and today, out of 10,000 transactions, 120 were observed as fraudulent, you would compare this against an expected count of 100 (1% of 10,000). The difference (observed - expected = 120 - 100 = 20) can be tested with a one-sample Z-Test for proportions to confirm whether the observed increase is statistically significant.
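
A worked version of this comparison, taking the expected count of 100 out of 10,000 at face value (a baseline rate of 1%). This is a sketch of a one-sided, one-sample Z-Test for a proportion; the function name is hypothetical and the normal approximation is assumed to be adequate at this sample size.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z(observed: int, n: int, p0: float):
    """Z statistic and one-sided p-value for H1: true rate > p0."""
    p_hat = observed / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # upper tail only: testing for an increase
    return z, p_value


z, p_value = one_sample_proportion_z(observed=120, n=10_000, p0=0.01)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Here z comes out at about 2.0 with a one-sided p-value of roughly 0.02, so the increase would be judged significant at the 5% level but not at the 1% level.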

---

## Cost Efficiency - CPU vs. GPU

When considering resource allocation for model training:
- **GPU instances**: Ideal for deep learning tasks that require significant computational power.
- **CPU instances**: Often more cost-effective for traditional machine learning algorithms like XGBoost.

### Key Considerations
- **Training Complexity**: More complex models may necessitate the use of GPUs.
- **Budget Constraints**: CPUs may be more suitable for simpler models with lower training costs.

---

## What is Combined Fraud Score?

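The combined fraud score merges the outputs of XGBoost (a predicted fraud probability) and the autoencoder (a reconstruction error). Because the probability lies in [0, 1] while the reconstruction error is unbounded, the error is typically rescaled to a comparable range first. The two scores are then weighted and summed into a single output that aims to improve overall detection accuracy.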
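
One possible way to combine the two outputs is sketched below. Everything specific here is an assumption for illustration: min-max scaling of the reconstruction error, the `error_min`/`error_max` calibration bounds (for example, taken from a validation set), and the 0.7/0.3 weighting.

```python
def combined_fraud_score(xgb_probability: float,
                         reconstruction_error: float,
                         error_min: float,
                         error_max: float,
                         weight_xgb: float = 0.7) -> float:
    """Weighted blend of a supervised probability and a scaled anomaly score."""
    span = error_max - error_min
    # Min-max scale the reconstruction error, then clamp it into [0, 1].
    scaled_error = (reconstruction_error - error_min) / span if span > 0 else 0.0
    scaled_error = min(1.0, max(0.0, scaled_error))
    return weight_xgb * xgb_probability + (1 - weight_xgb) * scaled_error


# Hypothetical outputs for one transaction.
score = combined_fraud_score(xgb_probability=0.8, reconstruction_error=0.05,
                             error_min=0.0, error_max=0.1)
print(f"combined fraud score: {score:.2f}")
```

The weights would normally be tuned on labeled validation data rather than fixed by hand.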

---

## Additional Resources

For comprehensive insights on implementing statistical hypothesis tests in Python, refer to the [Statistical Hypothesis Tests in Python Cheat Sheet](https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/).

---

This class provided a comprehensive overview of the architecture and considerations necessary for effective real-time fraud detection in transactions.

```mermaid
flowchart TD
    A[POS Machine] -->|Transaction Data| B[Kinesis Data Streams]
    B --> C[Kinesis Data Analytics\n Apache Flink]
    C -->|Real-Time Features| D[DynamoDB\n Metadata Lookup]
    C --> E[SageMaker\n Fraud Prediction]
    E -->|Fraud Score| F[Lambda\n Decision Engine]
    F -->|Block/Allow| G[API Gateway]
    F -->|Alert| H[SNS]
    G --> I[Bank/Customer Systems]
```

---

## Key Metrics in Fraud Analytics

### 1. Precision vs. Recall Trade-off
Understanding when to prioritize precision or recall is fundamental to developing effective fraud detection systems.

```mermaid
pie
    title Metric Selection Criteria
    "Precision (Minimize False Positives)" : 40
    "Recall (Minimize False Negatives)" : 60
```
- Precision is critical when blocking legitimate transactions is costly, while recall is essential when missing fraud is unacceptable.

### 2. Confusion Matrix
Visualization of true and false predictions helps in understanding model performance.

```mermaid
flowchart LR
    A[Actual Fraud] -->|True Positive| B[Predicted Fraud]
    A -->|False Negative| C[Predicted Legit]
    D[Actual Legit] -->|False Positive| B
    D -->|True Negative| C
```

---

## AWS Architecture for 10K TPS

### High-Throughput Design

```mermaid
flowchart LR
    subgraph Kinesis
        B[Shard 1\n1K TPS]
        C[Shard 2\n1K TPS]
        D[...]
        E[Shard 10\n1K TPS]
    end
    A[POS Machines] --> Kinesis
    Kinesis --> F[Flink Processing]
```

- **Sharding**: 10 shards handle 10K transactions/sec with 1K TPS per shard.

---

## Fraud Detection Techniques

### Anomaly Detection Methods
Utilizing various methods for detecting anomalies enhances fraud prevention capabilities.
```mermaid
graph TD
    A[Anomaly Detection] --> B[Supervised\nXGBoost]
    A --> C[Unsupervised\nAutoencoder]
    C --> D[Reconstruction Error]
    B --> E[Probability Score]
    D & E --> F[Combined Fraud Score]
```

### Hybrid Approach Benefits
- **XGBoost** utilizes labeled historical data for prediction.
- **Autoencoder** captures novel patterns of fraud through unsupervised learning.

---

## Feature Engineering

### Real-Time Features
| Feature           | Calculation                                      | Example Value          |
|-------------------|--------------------------------------------------|------------------------|
| Location Velocity | Distance / time between consecutive transactions | 500 km/h (fraud alert) |
| Rolling Spend     | Sum of transaction amounts in the last 1 hour    | $2,000                 |

---

## Transaction Blocking Logic

Decision flow for blocking transactions based on evaluation criteria.

```mermaid
flowchart TD
    A[Fraud Score > 0.9?] -->|Yes| B[Block]
    A -->|No| C[Amount > $1K?] -->|Yes| D[Manual Review]
    C -->|No| E[Allow]
```
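
The decision flow in the diagram can be sketched as a plain function. In the architecture described in these notes the logic would run inside AWS Lambda; the function name and return labels here are hypothetical, and the thresholds are the ones shown in the diagram.

```python
def decide(fraud_score: float, amount: float,
           score_threshold: float = 0.9,
           amount_threshold: float = 1000.0) -> str:
    """Return BLOCK, MANUAL_REVIEW, or ALLOW for a scored transaction."""
    if fraud_score > score_threshold:
        return "BLOCK"            # high model confidence: block outright
    if amount > amount_threshold:
        return "MANUAL_REVIEW"    # large amount without a high score: escalate
    return "ALLOW"


print(decide(fraud_score=0.95, amount=50.0))
print(decide(fraud_score=0.40, amount=2500.0))
print(decide(fraud_score=0.40, amount=50.0))
```

Note that the diagram, and therefore this sketch, does not include the geo-velocity check; adding it would be a further branch.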

---

## Monitoring & Retraining

### Data Drift Detection
Monitoring for shifts in data distribution to ensure model effectiveness.

```mermaid
flowchart LR
    A[New Transactions] --> B[KS Test vs. Training Data]
    B -->|Drift Detected| C[Retrain Model]
```

---

### Concepts of MLOps, LLMOps, and DevOps

```mermaid
mindmap
    root((MLOps vs. LLMOps vs. DevOps))
        MLOps
            Focus: Machine Learning Models
            Key Tools: SageMaker, MLflow
            Challenges: Data drift, model retraining
        LLMOps
            Focus: Large Language Models
            Key Tools: LangChain, LlamaIndex
            Challenges: Prompt engineering, hallucination control
        DevOps
            Focus: Software Deployment
            Key Tools: Jenkins, Kubernetes
            Challenges: CI/CD, infrastructure scaling
```

### Detailed Comparison
| Aspect             | MLOps                  | LLMOps                 | DevOps               |
|--------------------|------------------------|------------------------|----------------------|
| **Primary Goal**    | Deploy ML models       | Deploy LLM pipelines    | Deploy software      |
| **Key Tools**      | SageMaker, Kubeflow    | LangChain, Weaviate    | Docker, Kubernetes   |
| **Unique Challenges**| Data versioning       | Prompt management       | Infrastructure as Code|
| **Testing**        | Model accuracy tests    | Prompt effectiveness    | Unit/integration tests|

### Summary
- **MLOps**: Optimized for traditional machine learning workloads such as fraud detection models.
- **LLMOps**: Addresses LLM-specific requirements, including RAG (Retrieval-Augmented Generation) pipelines.
- **DevOps**: Focuses on efficient software deployment processes across platforms.

---