### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues like duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.

In [1]:
# Write your code from here
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

np.random.seed(42)

# --- Step 1: Simulate sample transaction data with some common issues ---
def generate_transactions(num=100):
    data = {
        "TransactionID": np.arange(1, num + 1),
        "AccountID": np.random.randint(10000, 10100, size=num),
        "TransactionDate": [datetime.now() - timedelta(days=np.random.randint(0, 30)) for _ in range(num)],
        "Amount": np.round(np.random.uniform(-1000, 5000, num), 2),  # some negative amounts (issues)
        "Currency": np.random.choice(["USD", "EUR", "GBP"], num),
        "Status": np.random.choice(["Completed", "Pending", "Failed"], num, p=[0.8, 0.15, 0.05])
    }
    df = pd.DataFrame(data)

    # Introduce duplicates intentionally by re-appending a random 5% of the rows
    duplicates = df.sample(frac=0.05).copy()
    df = pd.concat([df, duplicates], ignore_index=True)

    return df

# --- Steps 2-3: Validation checks for the issues above ---

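def check_duplicates(df, subset_cols=None):
    """Return every row whose key columns occur more than once (all copies are kept)."""
    subset_cols = subset_cols or ["TransactionID"]  # avoid a mutable default argument
    duplicates = df[df.duplicated(subset=subset_cols, keep=False)]
    return duplicates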

def validate_amounts(df, amount_col="Amount", min_amount=0):
    """Return rows whose amount is below min_amount (negative by default)."""
    invalid_amounts = df[df[amount_col] < min_amount]
    return invalid_amounts

def validate_transaction_dates(df, date_col="TransactionDate"):
    """Return rows dated later than the moment the check runs."""
    now = datetime.now()
    invalid_dates = df[df[date_col] > now]
    return invalid_dates

def validate_status(df, allowed_statuses=("Completed", "Pending", "Failed")):
    """Return rows whose Status is not one of the allowed values."""
    invalid_status = df[~df["Status"].isin(allowed_statuses)]
    return invalid_status

# --- Run validations ---

transactions_df = generate_transactions(200)

duplicates_df = check_duplicates(transactions_df)
invalid_amounts_df = validate_amounts(transactions_df)
invalid_dates_df = validate_transaction_dates(transactions_df)
invalid_status_df = validate_status(transactions_df)

# --- Summary ---

print(f"Total transactions: {len(transactions_df)}")
print(f"Duplicate transactions found: {len(duplicates_df)}")
print(f"Transactions with invalid amounts (less than 0): {len(invalid_amounts_df)}")
print(f"Transactions with invalid future dates: {len(invalid_dates_df)}")
print(f"Transactions with invalid status: {len(invalid_status_df)}")

# --- Optional: Display problematic records ---

def display_issues(name, df):
    print(f"\n{name} ({len(df)} records):")
    if df.empty:
        print("None")
    else:
        print(df.head())

display_issues("Duplicate Transactions", duplicates_df)
display_issues("Invalid Amount Transactions", invalid_amounts_df)
display_issues("Invalid Future Date Transactions", invalid_dates_df)
display_issues("Invalid Status Transactions", invalid_status_df)

# --- Integration Note ---
# These functions can be hooked into a real-time validation pipeline to check transactions on entry.
# They operate on DataFrames, so on submission wrap the incoming record in a one-row DataFrame,
# call `validate_amounts()` on it, and call `check_duplicates()` on the stored rows plus the new record.


Total transactions: 210
Duplicate transactions found: 20
Transactions with invalid amounts (less than 0): 37
Transactions with invalid future dates: 0
Transactions with invalid status: 0

Duplicate Transactions (20 records):
     TransactionID  AccountID            TransactionDate   Amount Currency  \
5                6      10020 2025-05-23 06:58:29.456388  3486.31      GBP   
11              12      10099 2025-05-09 06:58:29.456405  -547.92      GBP   
78              79      10062 2025-05-14 06:58:29.456617  -650.84      USD   
98              99      10052 2025-05-26 06:58:29.456675  4877.06      EUR   
124            125      10027 2025-05-09 06:58:29.456752  3867.23      GBP   

        Status  
5      Pending  
11     Pending  
78   Completed  
98   Completed  
124  Completed  

Invalid Amount Transactions (37 records):
    TransactionID  AccountID            TransactionDate  Amount Currency  \
0               1      10051 2025-05-24 06:58:29.456356 -738.38      EUR   
11       

**Task 2**: Implement Financial Data Validation

**Objective**: Use automated tools to ensure transaction accuracy.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Ensure real-time checks to validate data upon entry.
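
One way to approach these two steps is to validate each record at the point of entry and reject it before it reaches storage. The following is a minimal sketch, not a definitive implementation. It assumes an incoming transaction arrives as a plain dict with the same fields as the sample data in Task 1. The names `validate_on_entry`, `submit_transaction`, `ledger`, `seen_ids`, and `ALLOWED_CURRENCIES` are hypothetical, and the in-memory list and set stand in for whatever financial system actually stores the transactions.

```python
from datetime import datetime, timedelta

# Assumption: allowed values mirror the sample data generated in Task 1.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}
ALLOWED_STATUSES = {"Completed", "Pending", "Failed"}


def validate_on_entry(txn, seen_ids, now=None):
    """Return a list of rule violations for one incoming transaction (a dict)."""
    now = now or datetime.now()
    errors = []
    if txn["TransactionID"] in seen_ids:
        errors.append("duplicate TransactionID")
    if txn["Amount"] < 0:
        errors.append("negative amount")
    if txn["TransactionDate"] > now:
        errors.append("future transaction date")
    if txn["Currency"] not in ALLOWED_CURRENCIES:
        errors.append("unsupported currency")
    if txn["Status"] not in ALLOWED_STATUSES:
        errors.append("unknown status")
    return errors


def submit_transaction(txn, ledger, seen_ids):
    """Hypothetical entry point: store the transaction only if every check passes."""
    errors = validate_on_entry(txn, seen_ids)
    if errors:
        return False, errors
    ledger.append(txn)
    seen_ids.add(txn["TransactionID"])
    return True, []


# Usage: the in-memory ledger and ID set are placeholders for a real data store.
ledger, seen_ids = [], set()
first = {
    "TransactionID": 1,
    "Amount": 250.00,
    "TransactionDate": datetime.now() - timedelta(hours=1),
    "Currency": "USD",
    "Status": "Completed",
}
repeat = dict(first, Amount=-40.00)  # same ID, negative amount

print(submit_transaction(first, ledger, seen_ids))   # (True, [])
print(submit_transaction(repeat, ledger, seen_ids))  # (False, ['duplicate TransactionID', 'negative amount'])
```

Returning every violation at once, rather than stopping at the first, lets the submitting system report all problems with a record in a single response.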

In [None]:
# Write your code from here
