# Frame the Problem and Look at the Big Picture

### Define the Objective in Business Terms
The objective is to develop a predictive model that identifies fraudulent credit card transactions in a given dataset. The model is intended to help financial institutions and payment processors detect and prevent fraud in real time, reducing financial losses and protecting customer accounts. It should catch as many fraudulent transactions as possible while keeping false alarms low, so that legitimate customers are not inconvenienced.

### How Will Your Solution Be Used?
The solution will be implemented as part of the transaction processing system, providing real-time fraud classification. When a transaction is flagged as fraudulent, it will either be declined or sent for manual review by a fraud analyst. The client will use the model to improve their fraud detection efficiency, reduce reliance on manual reviews, and maintain customer trust by proactively addressing security concerns.

### What Are the Current Solutions/Workarounds (If Any)?
Currently, most fraud detection systems use rule-based engines that rely on predefined thresholds (e.g., large transactions or unusual locations). While effective to a degree, these systems often struggle with adapting to evolving fraud patterns and result in high false-positive rates. Manual reviews are used to address flagged transactions, which can be labor-intensive, slow, and costly.
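
To make the baseline concrete, a rule-based engine of this kind can be sketched as a handful of fixed checks. This is a minimal illustration only: the `Transaction` fields, the `LARGE_AMOUNT` value, and the "different country" rule are assumptions made for the example, not details of any actual system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float       # transaction amount (hypothetical field)
    country: str        # country where the transaction occurred (hypothetical field)
    home_country: str   # cardholder's usual country (hypothetical field)


# Hypothetical threshold chosen for illustration; a real engine would tune it.
LARGE_AMOUNT = 1_000.0


def rule_based_flag(tx: Transaction) -> bool:
    """Flag a transaction if any fixed rule fires."""
    is_large = tx.amount >= LARGE_AMOUNT
    is_unusual_location = tx.country != tx.home_country
    return is_large or is_unusual_location


transactions = [
    Transaction(amount=25.0, country="US", home_country="US"),
    Transaction(amount=4_200.0, country="US", home_country="US"),
    Transaction(amount=60.0, country="BR", home_country="US"),
]
flags = [rule_based_flag(tx) for tx in transactions]
print(flags)
```

Because the rules are static, a legitimate large purchase or a customer travelling abroad is flagged just like real fraud, which is one way such engines end up with high false-positive rates.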

### How Should You Frame This Problem?
This is a supervised binary classification problem where the goal is to classify each transaction as either "fraudulent" or "non-fraudulent" based on labeled historical data. The model will likely serve predictions online, scoring transactions in real time as they occur. Training can use batch learning on historical data, with periodic retraining to adapt to new fraud patterns.

### How Should Performance Be Measured?
- **Metrics:**
  - Precision: To ensure flagged transactions are truly fraudulent, minimizing inconvenience to customers.
  - Recall (prioritized): To catch as many fraudulent transactions as possible.
  - F1 Score: To balance precision and recall.
  - ROC-AUC: To evaluate overall classifier performance.
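
As a worked example of how precision, recall, and F1 relate, the sketch below computes them from confusion counts in plain Python. The labels are made up for illustration (1 = fraudulent, 0 = non-fraudulent), and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for this example.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In practice, scikit-learn's `sklearn.metrics` module provides `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same quantities.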

### Is the Performance Measure Aligned with the Business Objective?
Yes, the performance measures align with the business objective of reducing fraud while maintaining a positive customer experience. Emphasis should be placed on recall to ensure fraud is detected, but precision is also critical to prevent disruption for legitimate customers.

### What Would Be the Minimum Performance Needed to Reach the Business Objective?
The model must outperform the current rule-based system in terms of fraud detection rate while maintaining or improving the false-positive rate. For example, achieving at least 90% recall with a precision above 80% might be a realistic goal to satisfy business requirements.
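
One way to check a target like this is to pick the decision threshold that first reaches the required recall and then read off the precision at that point. The sketch below assumes the model outputs a fraud score per transaction; the scores, labels, and the `pick_threshold` helper are all hypothetical.

```python
def pick_threshold(y_true, scores, min_recall=0.90):
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is at least min_recall, or None if no threshold qualifies."""
    total_pos = sum(y_true)
    if total_pos == 0:
        return None
    # Lowering the threshold can only keep or raise recall, so the first
    # qualifying threshold in descending order is the highest one.
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        recall = tp / total_pos
        precision = tp / (tp + fp)  # tp + fp >= 1 because thr is one of the scores
        if recall >= min_recall:
            return thr, precision, recall
    return None


# Toy scores and labels, invented for this example (1 = fraud).
scores = [0.99, 0.97, 0.95, 0.93, 0.91, 0.88, 0.85, 0.80,
          0.75, 0.70, 0.55, 0.50, 0.30, 0.20, 0.10, 0.05]
y_true = [1, 1, 1, 1, 1, 0, 1, 1,
          1, 1, 0, 1, 0, 0, 0, 0]

threshold, precision, recall = pick_threshold(y_true, scores, min_recall=0.90)
meets_target = recall >= 0.90 and precision > 0.80
print(threshold, precision, recall, meets_target)
```

The same comparison should be run on the rule-based system's flags so that the model is judged against the existing baseline rather than only against fixed numbers.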

### What Are Comparable Problems? Can You Reuse Experience or Tools?
- Comparable problems include the **Squirrel Prediction Model** and the **Early Spring Prediction Model**, as both involved binary classification.
- Tools such as the **DBSCAN** algorithm used in the Squirrel model, along with other clustering algorithms from earlier work such as the **Clustering** notebook, might be useful for finding clusters.

### Is Human Expertise Available?
Human expertise is available in the form of fraud analysts who currently review flagged transactions. Their domain knowledge can guide feature selection and model validation. Analysts can also provide labeled data to improve model performance and adapt it to emerging fraud trends.

### How Would You Solve the Problem Manually?
Manual fraud detection involves analyzing transaction patterns for anomalies such as:
- Unusual transaction locations or times.
- Large transaction amounts.
- Multiple small transactions in a short time span.

Analysts would cross-reference transaction data with customer profiles and historical behaviors to assess risk. While effective for small datasets, this approach is impractical for high transaction volumes due to its inefficiency and susceptibility to human error.
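
The third check in the list above, several small transactions in a short time span, can be sketched with a sliding time window. The amount cap, window length, and count below are hypothetical values chosen for illustration, and the input is assumed to be one card's transactions sorted by time.

```python
from collections import deque


def has_small_transaction_burst(transactions, max_amount=5.0,
                                window_seconds=600, min_count=3):
    """Return True if at least min_count transactions of at most max_amount
    fall within any window of window_seconds.

    transactions: iterable of (timestamp_seconds, amount), sorted by time.
    """
    recent = deque()  # timestamps of small transactions inside the current window
    for ts, amount in transactions:
        if amount > max_amount:
            continue  # only small transactions count toward a burst
        recent.append(ts)
        # Drop timestamps that have fallen out of the window.
        while ts - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= min_count:
            return True
    return False


# Toy histories, invented for this example.
normal_history = [(0, 2.0), (120, 1.5), (3000, 250.0), (3100, 3.0)]
burst_history = [(0, 1.0), (60, 2.0), (200, 1.0)]

print(has_small_transaction_burst(normal_history))
print(has_small_transaction_burst(burst_history))
```

A signal like this could also be turned into a feature for the model (for example, the count of small transactions in the last ten minutes), provided the dataset exposes per-card timestamps.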

### List the Assumptions You Have Made So Far (Verify if Possible)
The dataset is assumed to accurately represent real-world fraud patterns and to contain sufficient labeled examples of both fraudulent and non-fraudulent transactions. Fraudulent transactions are expected to make up only a small proportion of the data; this is supported by the observation that they are only a small fraction of the total samples. The features in the dataset, such as transaction amount, location, and time, are presumed to be predictive of fraud, with fraudulent transactions exhibiting patterns distinguishable from legitimate ones. Given this class imbalance, stratified splitting will be necessary to ensure both classes are properly represented during training.
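
To illustrate what stratification does under heavy class imbalance, the sketch below splits indices per class so that the train and test sets keep the same fraud rate. The 1% fraud rate and the `stratified_split` helper are assumptions made for the example; in a scikit-learn workflow, `train_test_split(..., stratify=y)` serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fraud_rate(labels, indices):
    """Fraction of the given indices labeled as fraud (1)."""
    return sum(labels[i] for i in indices) / len(indices)


# Toy labels, invented for this example: 10 fraud cases among 1,000 transactions.
labels = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

print(len(train_idx), fraud_rate(labels, train_idx))
print(len(test_idx), fraud_rate(labels, test_idx))
```

With a purely random split, a small test set could end up with very few or no fraud cases, which would make recall and precision estimates unreliable.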