# Transaction Anomaly Detection
Transaction anomaly detection with unsupervised learning identifies unusual or suspicious patterns in financial transactions without requiring labeled training data. Because anomalies are rare and hard to predict, unsupervised methods are particularly useful when transactions have not been labeled as normal or fraudulent.
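To make the idea concrete, here is a minimal sketch of one simple unsupervised approach: a robust z-score built from the median and the median absolute deviation (MAD). It is an illustration only, not necessarily the method used later in this notebook. The `amounts` values, the function names, and the 3.5 threshold are assumptions chosen for the example.

```python
import numpy as np

def robust_z_scores(values):
    """Return modified z-scores based on the median and the MAD."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # All points sit at the median; nothing can be scored as unusual
        return np.zeros_like(values)
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores
    return 0.6745 * (values - median) / mad

def flag_anomalies(values, threshold=3.5):
    """Flag points whose absolute robust z-score exceeds the threshold."""
    return np.abs(robust_z_scores(values)) > threshold

# Hypothetical transaction amounts (not taken from the dataset)
amounts = [12.5, 9.99, 14.0, 11.25, 10.5, 13.75, 950.0]
flags = flag_anomalies(amounts)
print(flags)  # only the last amount is flagged
```

No labels are needed here: the score depends only on how far each amount lies from the bulk of the data, which is the core assumption behind most unsupervised detectors.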
## Importing Required Libraries

In [3]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

## 1. Data Exploration & Preprocessing
### 1.1. Examine the data

In [4]:
# Load the dataset
df = pd.read_csv('transactions_data.csv')

# Check the shape of the dataset (rows, columns)
print("Shape of the dataset:", df.shape)

# Get dataset information (column names and data types; pandas omits non-null counts by default for frames this large)
print("Dataset Info:\n")
df.info()

# List all columns (features)
print("Columns in the dataset:", df.columns)

# Check data types of each column
print("Data types:\n", df.dtypes)

# Display first 5 rows
print("\nFirst 5 rows:\n", df.head())

Shape of the dataset: (13305915, 12)
Dataset Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13305915 entries, 0 to 13305914
Data columns (total 12 columns):
 #   Column          Dtype  
---  ------          -----  
 0   id              int64  
 1   date            object 
 2   client_id       int64  
 3   card_id         int64  
 4   amount          object 
 5   use_chip        object 
 6   merchant_id     int64  
 7   merchant_city   object 
 8   merchant_state  object 
 9   zip             float64
 10  mcc             int64  
 11  errors          object 
dtypes: float64(1), int64(5), object(6)
memory usage: 1.2+ GB
Columns in the dataset: Index(['id', 'date', 'client_id', 'card_id', 'amount', 'use_chip',
       'merchant_id', 'merchant_city', 'merchant_state', 'zip', 'mcc',
       'errors'],
      dtype='object')
Data types:
 id                  int64
date               object
client_id           int64
card_id             int64
amount             object
use_chip           object
...

The dataset has 12 columns. The `mcc` column holds the Merchant Category Code (MCC), a four-digit number that card networks such as Visa and Mastercard use to classify businesses by the goods or services they provide.
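Because an MCC is a code rather than a quantity, it may be more natural to treat it as a categorical label than as an integer. The sketch below shows one way to do that. The `sample` frame and the `mcc_code` column name are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample; the values are illustrative, not taken from the dataset
sample = pd.DataFrame({"mcc": [5411, 5812, 5411, 742]})

# Zero-pad to four digits so a code stored as 742 is rendered as "0742",
# then store the result as a categorical label instead of a number
sample["mcc_code"] = sample["mcc"].astype(str).str.zfill(4).astype("category")

print(sample["mcc_code"].tolist())
print(sample["mcc_code"].nunique())
```

Storing the code as a category avoids implying that one MCC is "larger" than another, which matters for distance-based or tree-based detectors that would otherwise read the integer as an ordered magnitude.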

### 1.2. Data Cleaning