# Challenge

Another approach to identifying fraudulent transactions is to look for outliers in the data. The standard deviation and the interquartile range are two common ways to detect them. Using this starter notebook, write two Python functions:

* One that uses standard deviation to identify anomalies for any cardholder.

* Another that uses interquartile range to identify anomalies for any cardholder.
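
For reference, the two rules can be written as cutoffs on a cardholder's transaction amounts. This is a sketch under conventional assumptions: $\mu$ and $\sigma$ are the mean and standard deviation of the amounts, $Q_1$ and $Q_3$ are the first and third quartiles, and the multipliers $k = 3$ and $1.5$ are common defaults rather than fixed requirements. A value is treated as an outlier when it falls below the lower cutoff or above the upper cutoff.

$$
\text{lower} = \mu - k\sigma, \qquad \text{upper} = \mu + k\sigma
$$

$$
\text{IQR} = Q_3 - Q_1, \qquad \text{lower} = Q_1 - 1.5\,\text{IQR}, \qquad \text{upper} = Q_3 + 1.5\,\text{IQR}
$$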

## Identifying Outliers Using Standard Deviation

In [1]:
# Initial imports
import pandas as pd
import numpy as np
import random
from sqlalchemy import create_engine

In [2]:
# Create a connection to the database
engine = create_engine("postgresql://postgres:postgres@localhost:5432/fraud_detection")

In [6]:
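# Write function that locates outliers using standard deviation
def find_outliers_std(data, deviation=3):
    """Return the values in `data` more than `deviation` standard deviations from the mean."""
    # np.std uses ddof=0 (population standard deviation) by default
    data_mean, data_std = np.mean(data), np.std(data)
    cutoff = data_std * deviation
    lower, upper = data_mean - cutoff, data_mean + cutoff
    # Keep only the values that fall outside [lower, upper]
    outliers = [x for x in data if x < lower or x > upper]
    return outliers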
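
To illustrate how the function above behaves, here is a small usage sketch on made-up amounts (the numbers are invented for illustration and are not from the database). The function is repeated so the block runs on its own. It also shows an edge case: a sample with a single value has a standard deviation of zero, so nothing can be flagged.

```python
import numpy as np

def find_outliers_std(data, deviation=3):
    # Same logic as the function above, repeated so this block is self-contained
    data_mean, data_std = np.mean(data), np.std(data)
    cutoff = data_std * deviation
    lower, upper = data_mean - cutoff, data_mean + cutoff
    return [x for x in data if x < lower or x > upper]

# Made-up amounts: twelve small purchases and one very large one
amounts = [12.5, 9.8, 11.2, 10.4, 13.1, 8.9, 10.7, 11.9, 9.5, 12.0, 10.1, 11.4, 1500.0]

flagged = find_outliers_std(amounts)   # the large purchase lies beyond mean + 3 * std
single = find_outliers_std([42.0])     # one value: std is 0, so lower == upper == mean

print(flagged)
print(single)
```

Because the comparison is strict (`<` and `>`), a value exactly on a cutoff is not reported, which is why the single-value case returns an empty list.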

In [16]:
# Find anomalous transactions for 3 random card holders using standard deviation method
card_holder_ids = random.sample(range(1, 101), 3)
for card_holder_id in card_holder_ids:
    query = f"SELECT amount FROM transaction WHERE id = {card_holder_id}"
    df = pd.read_sql(query, engine)
    outliers = find_outliers_std(df["amount"])
    print(f"Card holder {card_holder_id} has {len(outliers)} anomalous transactions using standard deviation method.")

Card holder 100 has 0 anomalous transactions using standard deviation method.
Card holder 22 has 0 anomalous transactions using standard deviation method.
Card holder 37 has 0 anomalous transactions using standard deviation method.
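
Every count above is zero. One possible explanation, offered as an assumption because the schema is not shown in this notebook, is that `transaction.id` is the transaction's own key rather than a cardholder identifier. In that case each query returns at most one row, and a single value can never fall outside its own mean. The sketch below illustrates the alternative of joining through a card table. The table and column names (`credit_card`, `cardholder_id`, `card`) are hypothetical, and an in-memory SQLite database stands in for the PostgreSQL instance so that the example runs on its own.

```python
import sqlite3

# Hypothetical schema: these table and column names are assumptions,
# not taken from the notebook's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE credit_card (card TEXT PRIMARY KEY, cardholder_id INTEGER);
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, amount REAL, card TEXT);
    INSERT INTO credit_card VALUES ('A', 1), ('B', 2);
    INSERT INTO "transaction" (amount, card) VALUES
        (10.5, 'A'), (12.0, 'A'), (980.0, 'A'), (7.25, 'B');
""")

# Filter on the cardholder, not on the transaction's own id.
# The value is passed as a bound parameter instead of being formatted into the SQL.
query = """
    SELECT t.amount
    FROM "transaction" AS t
    JOIN credit_card AS cc ON cc.card = t.card
    WHERE cc.cardholder_id = ?
"""

def amounts_for_cardholder(conn, cardholder_id):
    rows = conn.execute(query, (cardholder_id,)).fetchall()
    return [amount for (amount,) in rows]

amounts_1 = amounts_for_cardholder(conn, 1)
amounts_2 = amounts_for_cardholder(conn, 2)
print(amounts_1, amounts_2)
```

If the real schema resembles this one, a similar join could replace the single-table query passed to `pd.read_sql`, with the placeholder style adapted to the database driver.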


## Identifying Outliers Using Interquartile Range
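
The cell in this section calls `find_outliers_iqr`, whose definition does not appear in the notebook as shown. A minimal sketch of such a function, assuming the conventional 1.5 × IQR fences and NumPy's default linear-interpolation percentiles, might look like this. The `factor` parameter is an illustrative addition, and the sample amounts are made up.

```python
import numpy as np

def find_outliers_iqr(data, factor=1.5):
    """Return the values in `data` outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # First and third quartiles (25th and 75th percentiles)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in data if x < lower or x > upper]

# Made-up amounts for illustration: nine ordinary purchases and one large one
amounts = [10, 12, 11, 13, 12, 11, 10, 14, 12, 250]
flagged = find_outliers_iqr(amounts)
print(flagged)
```

Unlike the standard deviation rule, the quartiles are barely affected by a single extreme value, so this rule can flag an outlier even in a small sample.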

In [21]:
# Find anomalous transactions for 3 random card holders using interquartile range method
card_holder_ids = random.sample(range(1, 101), 3)
for card_holder_id in card_holder_ids:
    query = f"SELECT amount FROM transaction WHERE id = {card_holder_id}"
    df = pd.read_sql(query, engine)
    outliers = find_outliers_iqr(df["amount"])
    print(f"Card holder {card_holder_id} has {len(outliers)} anomalous transactions using interquartile range method.")

Card holder 13 has 0 anomalous transactions using interquartile range method.
Card holder 12 has 0 anomalous transactions using interquartile range method.
Card holder 61 has 0 anomalous transactions using interquartile range method.
