# Challenge

Another way to identify potentially fraudulent transactions is to look for outliers in the data, which are commonly detected using either the standard deviation or quartiles. Using this starter notebook, code two Python functions:

* One that uses standard deviation to identify anomalies for any cardholder.

* Another that uses interquartile range to identify anomalies for any cardholder.
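
For reference, the two rules can be written as follows. The multiplier 3 is the one the standard deviation cell in this notebook uses. The multiplier 1.5 for the interquartile range is the conventional choice (Tukey's fences) and is an assumption here, since the notebook leaves that function unimplemented.

$$
x \text{ is an outlier if } x < \mu - 3\sigma \ \text{ or } \ x > \mu + 3\sigma
$$

$$
x \text{ is an outlier if } x < Q_1 - 1.5\,\mathrm{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\mathrm{IQR}, \qquad \mathrm{IQR} = Q_3 - Q_1
$$

Here $\mu$ and $\sigma$ are the mean and standard deviation of a cardholder's transaction amounts, and $Q_1$ and $Q_3$ are the first and third quartiles of those amounts.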

## Identifying Outliers Using Standard Deviation

In [3]:
# Initial imports
import os

import pandas as pd
import numpy as np
from dotenv import load_dotenv
from sqlalchemy import create_engine

In [4]:
# Load environment variables from the .env file
load_dotenv()
postgress_user = os.getenv("POSTGRES_USER")
postgress_pass = os.getenv("POSTGRES_PASS")

In [5]:
# Create a connection to the database
engine = create_engine(f'postgresql://{postgress_user}:{postgress_pass}@localhost:5432/fraud_detection')
# Open one explicit connection and reuse it for every query, so the database link stays active
connection = engine.connect()

In [47]:
# Write a function that locates outliers using standard deviation
def outlier_std_identifier(ch_id):
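    # Cast to int so that only a numeric ID can ever be interpolated into the SQL text
    ch_id = int(ch_id)
    # Query the transactions for the given card holder ID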
    query = f"""
            SELECT C.cardholder_id,
                    CH.name,
                    T.amount,
                    MC.name
            FROM transaction as T
            INNER JOIN credit_card as C
            ON T.card = C.card
            INNER JOIN card_holder as CH
            ON C.cardholder_id = CH.id
            INNER JOIN merchant as M
            ON T.id_merchant = M.id
            INNER JOIN merchant_category as MC
            ON M.id_merchant_category = MC.id
            WHERE C.cardholder_id = {ch_id};
            """
    # Create a DataFrame from the query result
    ch_df = pd.read_sql(query, connection)
    # FOR DEBUGGING: View a sample of the DataFrame
    # display(ch_df.head())
    # Determine the normal range of values: mean +/- 3 standard deviations
    # (.values.std() is NumPy's population standard deviation, ddof=0)
    mean = ch_df['amount'].values.mean()
    std = ch_df['amount'].values.std()
    range_max = mean + 3 * std
    # Transaction amounts are never negative, so clamp the lower bound at 0
    range_min = max(0, mean - 3 * std)
    # Filter the DataFrame for outlier transactions outside the normal range
    outliers = ch_df.query('amount < @range_min or amount > @range_max')
    # Build a result string summarizing the card holder's transactions;
    # values are rounded for display only, not for the thresholds
    summary = f'mean: {round(mean, 2)}\nstd: {round(std, 2)}'
    if outliers.empty:
        result = f'{summary}\nNo outliers identified'
    else:
        result = f'{summary}\n{outliers}'
    # Return the result
    return result
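
The function above calls `.values.std()`, which goes through NumPy and therefore computes the population standard deviation (`ddof=0`), whereas calling `.std()` directly on a pandas Series defaults to the sample standard deviation (`ddof=1`). The small sketch below, on made-up numbers rather than database rows, shows the difference. With few transactions the two can give noticeably different 3-sigma thresholds.

```python
import numpy as np
import pandas as pd

# Made-up amounts, chosen so the population standard deviation is exactly 2
amounts = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy default: divide by N (population standard deviation, ddof=0)
population_std = amounts.values.std()

# pandas default: divide by N - 1 (sample standard deviation, ddof=1)
sample_std = amounts.std()

print(f'population std (NumPy default):  {population_std:.4f}')
print(f'sample std     (pandas default): {sample_std:.4f}')
```

Either convention is defensible for this exercise. What matters is using the same one when comparing results across cardholders.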

In [48]:
# Find anomalous transactions for 3 random card holders

# Query the list of card holder IDs
query = """
        SELECT DISTINCT C.cardholder_id
        FROM transaction as T
        INNER JOIN credit_card as C
        ON T.card = C.card
        ORDER BY C.cardholder_id;
        """
# Randomly select 3 card holder IDs
ch_IDs = pd.read_sql(query,connection).sample(3)['cardholder_id'].values.tolist()
# Call the outlier identifier function for the selected card holder IDs
for ch_id in ch_IDs:
    print(f'Outlier charges (potential fraud) for card holder {ch_id}:\n{outlier_std_identifier(ch_id)}\n')

Outlier charges (potential fraud) for card holder 24:
mean: 49.81
std: 214.7
     cardholder_id              name  amount        name
40              24  Stephanie Dalton  1011.0         bar
66              24  Stephanie Dalton  1901.0  restaurant
161             24  Stephanie Dalton  1301.0         pub
162             24  Stephanie Dalton  1035.0         pub

Outlier charges (potential fraud) for card holder 8:
mean: 8.39
std: 5.65
No outliers identified

Outlier charges (potential fraud) for card holder 9:
mean: 170.35
std: 426.73
    cardholder_id          name  amount         name
13              9  Laurie Gibbs  1534.0  coffee shop
27              9  Laurie Gibbs  1795.0          pub
60              9  Laurie Gibbs  1724.0          pub



## Identifying Outliers Using Interquartile Range

In [None]:
# Write a function that locates outliers using interquartile range
def outlier_interquartile_identifier(ch_id):
    return None

In [None]:
# Find anomalous transactions for 3 random card holders
for ch_id in ch_IDs:
    print(f'Interquartile outlier analysis for card holder {ch_id}: {outlier_interquartile_identifier(ch_id)}')
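
The stub `outlier_interquartile_identifier` is left as the exercise. As a starting point, here is a minimal sketch of the interquartile rule applied to a DataFrame that has already been loaded, such as the `ch_df` built inside `outlier_std_identifier`. The helper name `iqr_outliers`, the multiplier `k=1.5`, and the toy data are assumptions for illustration and are not part of the starter notebook.

```python
import pandas as pd


def iqr_outliers(df, column="amount", k=1.5):
    """Return the rows of df whose `column` value lies outside the IQR fences."""
    # First and third quartiles (pandas uses linear interpolation by default)
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    # Amounts are never negative, so clamp the lower fence at 0
    lower = max(0, q1 - k * iqr)
    upper = q3 + k * iqr
    return df[(df[column] < lower) | (df[column] > upper)]


# Toy data standing in for one card holder's transactions (made-up values)
toy_df = pd.DataFrame(
    {"amount": [10.0, 10.0, 11.0, 11.0, 11.0, 12.0,
                12.0, 12.0, 12.0, 13.0, 13.0, 500.0]}
)
print(iqr_outliers(toy_df))
```

To complete the challenge, the stub could run the same per-cardholder query as the standard deviation function and pass the resulting DataFrame to a helper like this one.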