### Finance – Ensuring Accurate Transactions

**Task 1**: Transaction Data Validation Insights

**Objective**: Maintain transaction integrity.

**Steps**:
1. Choose a sample financial transaction dataset.
2. Identify common transaction issues, such as duplicate entries or incorrect amounts.
3. Develop a list of validation checks specific to financial transactions.
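
As a rough illustration of step 3, the list of checks can be kept as data rather than scattered through the code. The sketch below is only one possible layout: `CHECKS`, `summarize_checks`, and the four-row `sample` frame are hypothetical names and values made up for this example, and each rule returns a boolean Series marking the rows that fail.

```python
import pandas as pd

# Hypothetical rule catalogue: check name -> function returning a boolean
# Series that is True for every row FAILING the check.
CHECKS = {
    "duplicate_transaction_id": lambda df: df["Transaction ID"].duplicated(keep=False),
    "non_positive_amount": lambda df: df["Amount"] <= 0,
    "missing_description": lambda df: df["Description"].isna(),
    "unknown_transaction_type": lambda df: ~df["Transaction Type"].isin(["debit", "credit"]),
}


def summarize_checks(df, checks=CHECKS):
    """Return the number of failing rows for each check in the catalogue."""
    counts = {name: int(rule(df).sum()) for name, rule in checks.items()}
    return pd.Series(counts, name="failing_rows")


# Tiny made-up sample with one instance of each issue
sample = pd.DataFrame({
    "Transaction ID": ["TXN0001", "TXN0002", "TXN0002", "TXN0003"],
    "Amount": [100.0, 250.0, 250.0, -5.0],
    "Description": ["Rent", "Groceries", "Groceries", None],
    "Transaction Type": ["debit", "credit", "credit", "withdrawal"],
})

summary = summarize_checks(sample)
print(summary)
```

Keeping the rules in a dictionary makes it easy to add, remove, or report on individual checks without touching the code that runs them.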

In [1]:
# Write your code from here
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Simulate a sample financial transaction dataset
np.random.seed(42)
num_transactions = 1000
start_date = datetime(2024, 1, 1)
dates = [start_date + timedelta(days=np.random.randint(365)) for _ in range(num_transactions)]
account_numbers = [f"ACC{np.random.randint(1000, 9999)}" for _ in range(num_transactions)]
transaction_types = np.random.choice(['debit', 'credit'], num_transactions)
amounts = np.random.uniform(10, 1000, num_transactions)
descriptions = [f"Transaction {i+1}" for i in range(num_transactions)]
transaction_ids = [f"TXN{i+1:04d}" for i in range(num_transactions)]

financial_data = pd.DataFrame({
    'Transaction ID': transaction_ids,
    'Date': dates,
    'Account Number': account_numbers,
    'Transaction Type': transaction_types,
    'Amount': amounts,
    'Description': descriptions
})

# Introduce some common data quality issues
# Duplicate entry
financial_data = pd.concat([financial_data, financial_data.iloc[[50]]]).reset_index(drop=True)
# Incorrect amount (outlier)
financial_data.loc[150, 'Amount'] = 15000
# Missing description
financial_data.loc[250, 'Description'] = None
# Inconsistent transaction type
financial_data.loc[350, 'Transaction Type'] = 'withdrawal' # Should be 'debit'

print("Sample Financial Transaction Data with Issues:")
print(financial_data.head())

Sample Financial Transaction Data with Issues:
  Transaction ID       Date Account Number Transaction Type      Amount  \
0        TXN0001 2024-04-12        ACC3026           credit   88.612519   
1        TXN0002 2024-12-14        ACC4744            debit  734.181525   
2        TXN0003 2024-09-27        ACC1009           credit  195.537825   
3        TXN0004 2024-04-16        ACC1260           credit  859.595283   
4        TXN0005 2024-03-12        ACC7038            debit  820.872879   

     Description  
0  Transaction 1  
1  Transaction 2  
2  Transaction 3  
3  Transaction 4  
4  Transaction 5  


**Task 2**: Implement Financial Data Validation

**Objective**: Use automated tools to ensure transaction accuracy.

**Steps**:
1. Integrate data validation rules into your existing financial systems.
2. Ensure real-time checks to validate data upon entry.
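
One way to read "validate data upon entry" is as a gate in front of the data store: a record is checked first and stored only if it passes. The sketch below assumes an in-memory list stands in for the real system; `ValidatedLedger`, `submit`, and the sample records are hypothetical names chosen for illustration, not part of any existing financial system.

```python
from datetime import datetime


class ValidatedLedger:
    """Hypothetical in-memory ledger that checks each record before storing it."""

    VALID_TYPES = {"debit", "credit"}

    def __init__(self):
        self.accepted = []      # records that passed every check
        self.rejected = []      # (record, errors) pairs kept for review
        self._seen_ids = set()  # IDs already stored, for the uniqueness check

    def _validate(self, txn):
        errors = []
        txn_id = txn.get("Transaction ID")
        if not txn_id:
            errors.append("Transaction ID is missing.")
        elif txn_id in self._seen_ids:
            errors.append("Transaction ID already exists.")
        amount = txn.get("Amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
            errors.append("Amount must be a positive number.")
        if txn.get("Transaction Type") not in self.VALID_TYPES:
            errors.append("Transaction Type is invalid.")
        date = txn.get("Date")
        if not isinstance(date, datetime):
            errors.append("Date is missing or not a datetime.")
        elif date > datetime.now():
            errors.append("Date is in the future.")
        return errors

    def submit(self, txn):
        """Store the record if it is valid; return True on success, False otherwise."""
        errors = self._validate(txn)
        if errors:
            self.rejected.append((txn, errors))
            return False
        self._seen_ids.add(txn["Transaction ID"])
        self.accepted.append(txn)
        return True


ledger = ValidatedLedger()
good = {"Transaction ID": "TXN2001", "Date": datetime(2024, 3, 1),
        "Amount": 120.0, "Transaction Type": "debit"}
bad = {"Transaction ID": "TXN2002", "Date": datetime(2024, 3, 2),
       "Amount": -50, "Transaction Type": "withdrawal"}

first = ledger.submit(good)         # accepted
second = ledger.submit(dict(good))  # rejected: same ID submitted twice
third = ledger.submit(bad)          # rejected: bad amount and type
for record, errors in ledger.rejected:
    print(record["Transaction ID"], errors)
```

Rejected records are kept alongside their error messages rather than discarded, so they can be reviewed or corrected later.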

In [2]:
# Write your code from here
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Re-create the sample financial transaction dataset with issues
np.random.seed(42)
num_transactions = 1000
start_date = datetime(2024, 1, 1)
dates = [start_date + timedelta(days=np.random.randint(365)) for _ in range(num_transactions)]
account_numbers = [f"ACC{np.random.randint(1000, 9999)}" for _ in range(num_transactions)]
transaction_types = np.random.choice(['debit', 'credit'], num_transactions)
amounts = np.random.uniform(10, 1000, num_transactions)
descriptions = [f"Transaction {i+1}" for i in range(num_transactions)]
transaction_ids = [f"TXN{i+1:04d}" for i in range(num_transactions)]

financial_data = pd.DataFrame({
    'Transaction ID': transaction_ids,
    'Date': dates,
    'Account Number': account_numbers,
    'Transaction Type': transaction_types,
    'Amount': amounts,
    'Description': descriptions
})

# Introduce some common data quality issues
financial_data = pd.concat([financial_data, financial_data.iloc[[50]]]).reset_index(drop=True)
financial_data.loc[150, 'Amount'] = 15000
financial_data.loc[250, 'Description'] = None
financial_data.loc[350, 'Transaction Type'] = 'withdrawal'

# 1. Uniqueness Check on Transaction IDs
duplicate_ids = financial_data['Transaction ID'].duplicated(keep=False)
print("\nTransactions with Duplicate IDs:")
print(financial_data[duplicate_ids])

# 2. Duplicate Record Check (across key fields)
duplicate_records = financial_data.duplicated(subset=['Account Number', 'Date', 'Amount', 'Transaction Type'], keep=False)
print("\nDuplicate Transactions (based on key fields):")
print(financial_data[duplicate_records])

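# 3. Amount Range Check (flags amounts outside the 1st to 99th percentile band;
#    by construction roughly 2% of rows are flagged, so treat them as candidates for review)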
amount_threshold_upper = financial_data['Amount'].quantile(0.99)
amount_threshold_lower = financial_data['Amount'].quantile(0.01)
outlier_amounts = financial_data[(financial_data['Amount'] > amount_threshold_upper) | (financial_data['Amount'] < amount_threshold_lower)]
print(f"\nTransactions with Outlier Amounts (Upper Threshold: {amount_threshold_upper:.2f}, Lower Threshold: {amount_threshold_lower:.2f}):")
print(outlier_amounts)

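# 4. Completeness Check for Mandatory Fields
# Note: 'Description' is treated as optional here, so the missing description at row 250 is not flagged.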
mandatory_fields = ['Transaction ID', 'Date', 'Account Number', 'Amount', 'Transaction Type']
missing_values = financial_data[financial_data[mandatory_fields].isnull().any(axis=1)]
print("\nTransactions with Missing Mandatory Fields:")
print(missing_values)

# 5. Consistency Check for Transaction Types
valid_transaction_types = ['debit', 'credit']
inconsistent_types = financial_data[~financial_data['Transaction Type'].isin(valid_transaction_types)]
print("\nTransactions with Inconsistent Transaction Types:")
print(inconsistent_types)

# 6. Format Check for Account Numbers (simple example: length)
expected_account_number_length = 7  # "ACC" prefix plus four digits, as generated above
incorrect_account_length = financial_data[financial_data['Account Number'].str.len() != expected_account_number_length]
print(f"\nTransactions with Account Numbers of Incorrect Length (Expected: {expected_account_number_length}):")
print(incorrect_account_length)

# 7. Date Validity Check (simple example: no future dates)
now = datetime.now()
future_dates = financial_data[financial_data['Date'] > now]
print("\nTransactions with Future Dates:")
print(future_dates)

# Simulate Real-time Checks (example on a new transaction)
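def validate_transaction(transaction):
    """Return a list of error messages for one transaction (empty if it is valid)."""
    errors = []
    if pd.isna(transaction['Transaction ID']):
        errors.append("Transaction ID is missing.")
    if pd.isna(transaction['Date']):
        errors.append("Date is missing.")
    if pd.isna(transaction['Account Number']):
        errors.append("Account Number is missing.")
    if pd.isna(transaction['Amount']):
        errors.append("Amount is missing.")
    # Accept NumPy numeric scalars as well as built-in int/float
    elif not isinstance(transaction['Amount'], (int, float, np.integer, np.floating)) or transaction['Amount'] <= 0:
        errors.append("Amount is invalid.")
    if transaction['Transaction Type'] not in ['debit', 'credit']:
        errors.append("Transaction Type is invalid.")
    # Generated account numbers are "ACC" plus four digits, i.e. 7 characters;
    # skip the length check when the field is missing (already reported above)
    if not pd.isna(transaction['Account Number']) and len(str(transaction['Account Number'])) != 7:
        errors.append("Account Number has incorrect length.")
    # Guard against a missing date: pd.to_datetime(None) returns None, which cannot be compared
    if not pd.isna(transaction['Date']) and pd.to_datetime(transaction['Date']) > datetime.now():
        errors.append("Date is in the future.")
    return errors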

new_transaction = {
    'Transaction ID': 'TXN1002',
    'Date': datetime.now() + timedelta(days=30),  # always in the future, so the date check fires on any run date
    'Account Number': 'ACC123',
    'Transaction Type': 'withdrawal',
    'Amount': -50,
    'Description': 'Test'
}

validation_errors = validate_transaction(new_transaction)
if validation_errors:
    print("\nValidation Errors for New Transaction:")
    for error in validation_errors:
        print(f"- {error}")
else:
    print("\nNew Transaction passed validation.")


Transactions with Duplicate IDs:
     Transaction ID       Date Account Number Transaction Type      Amount  \
50          TXN0051 2024-01-14        ACC3192            debit  872.492437   
1000        TXN0051 2024-01-14        ACC3192            debit  872.492437   

         Description  
50    Transaction 51  
1000  Transaction 51  

Duplicate Transactions (based on key fields):
     Transaction ID       Date Account Number Transaction Type      Amount  \
50          TXN0051 2024-01-14        ACC3192            debit  872.492437   
1000        TXN0051 2024-01-14        ACC3192            debit  872.492437   

         Description  
50    Transaction 51  
1000  Transaction 51  

Transactions with Outlier Amounts (Upper Threshold: 984.17, Lower Threshold: 18.24):
    Transaction ID       Date Account Number Transaction Type        Amount  \
107        TXN0108 2024-08-18        ACC4876           credit    995.483141   
150        TXN0151 2024-02-05        ACC2205            debit  1500