# Challenge

Another approach to identifying fraudulent transactions is to look for outliers in the data. The standard deviation and quartiles are common tools for detecting outliers. Using this starter notebook, write two Python functions:

* One that uses the standard deviation to identify anomalies for any cardholder.

* Another that uses the interquartile range to identify anomalies for any cardholder.
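Written out for one card holder's transaction amounts, the two rules flag an amount $x$ when it falls outside the bounds below. Here $\bar{x}$ is the mean, $s$ the sample standard deviation, $k$ the number of standard deviations (the standard-deviation function in this notebook uses $k = 2$), and $Q_1$, $Q_3$ are the 25th and 75th percentiles. The $1.5$ multiplier on the IQR is the conventional Tukey fence and is an assumption here; the challenge does not fix it.

```latex
\begin{aligned}
&\text{SD rule:}  && x < \bar{x} - k\,s \quad\text{or}\quad x > \bar{x} + k\,s \\
&\text{IQR rule:} && x < Q_1 - 1.5\,\mathrm{IQR} \quad\text{or}\quad x > Q_3 + 1.5\,\mathrm{IQR},
\qquad \mathrm{IQR} = Q_3 - Q_1
\end{aligned}
```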

## Identifying Outliers Using Standard Deviation

In [1]:
# Initial imports
import pandas as pd
import numpy as np
import random
from sqlalchemy import create_engine
from dotenv import load_dotenv
import os

In [2]:
# Load the variables in the .env file (including the PostgreSQL password) into the environment
load_dotenv()

True

In [3]:
db_key = os.getenv("my_pass")
type(db_key)

str
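`os.getenv` returns `None` instead of raising when a variable is missing, so a typo in the `.env` file only surfaces later as a confusing connection error (and `type(db_key)` would show `NoneType`). A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of `os` or `python-dotenv`:

```python
import os

def require_env(name: str) -> str:
    """Return the value of environment variable `name`, raising if it is unset or empty.

    Hypothetical helper: os.getenv itself returns None for a missing variable.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value
```

In the notebook this would replace the plain lookup, e.g. `db_key = require_env("my_pass")`.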

In [4]:
# Create a connection to the database
engine = create_engine(f"postgresql://postgres:{db_key}@localhost:5432/fraud_detection")
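Formatting the password straight into the URL only works while it contains no URL-reserved characters such as `@`, `/` or `:`. SQLAlchemy's engine-configuration documentation recommends percent-encoding the password with `urllib.parse.quote_plus`. A sketch, with `build_pg_url` as a hypothetical helper name:

```python
from urllib.parse import quote_plus

def build_pg_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL URL with the password percent-encoded (hypothetical helper)."""
    # quote_plus encodes reserved characters such as '@' and '/', so they cannot be
    # mistaken for the separators before the host or the database name
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"

url = build_pg_url("postgres", "p@ss/word", "localhost", 5432, "fraud_detection")
print(url)  # postgresql://postgres:p%40ss%2Fword@localhost:5432/fraud_detection
```

The engine would then be created with `create_engine(build_pg_url("postgres", db_key, "localhost", 5432, "fraud_detection"))`.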

In [34]:
# Write function that locates outliers using standard deviation
from sqlalchemy import text

def find_anomalities_sd(card_holder_id: int = 1):
    
    # Query the database; the id is bound as a query parameter so the driver handles quoting
    query = text("""
            SELECT t.date, t.amount, t.card
            FROM transaction AS t
            INNER JOIN credit_card AS cc ON cc.card = t.card
            INNER JOIN card_holder AS ch ON ch.id = cc.cardholder_id
            WHERE ch.id = :card_holder_id
            ORDER BY t.date
            """)
    # Use pandas to create a df from the query results
    df = pd.read_sql(query, engine, params={"card_holder_id": int(card_holder_id)})
    
    # Calculate the mean and the (sample) standard deviation of the amount column
    amount_avg = df['amount'].mean()
    amount_std = df['amount'].std()
    
    # We will use 2 standard deviations for the purpose of our analysis
    lower = amount_avg - (amount_std * 2)
    higher = amount_avg + (amount_std * 2)
    
    # Keep only the transactions more than 2 std below or above the mean
    outliers = df[(df['amount'] < lower) | (df['amount'] > higher)]
    
    # Return the outlier rows, or a message if there are none
    if not outliers.empty:
        return outliers
    else:
        return "No signs of fraudulent transactions were found"
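One property of this rule is worth knowing when reading its results. Because `Series.std()` computes the sample standard deviation ($n - 1$ denominator), no amount among $n$ transactions can lie more than $(n-1)/\sqrt{n}$ standard deviations from their mean (a form of Samuelson's inequality). That bound first exceeds 2 at $n = 6$, so `find_anomalities_sd` cannot flag anything for a card holder with five or fewer transactions, however extreme one of them is. A quick check using the standard-library `statistics` module, whose `stdev` uses the same $n - 1$ denominator; `max_z_score` is a hypothetical helper:

```python
import math
import statistics

def max_z_score(n: int) -> float:
    """Largest |x - mean| / s attainable among n values, with s the sample std (ddof=1)."""
    # Samuelson's inequality: |x - mean| <= sqrt(n - 1) * sigma (population std);
    # since s = sigma * sqrt(n / (n - 1)), the bound in units of s is (n - 1) / sqrt(n)
    return (n - 1) / math.sqrt(n)

# Smallest number of transactions for which a point beyond 2 std is even possible
min_n = next(n for n in range(2, 100) if max_z_score(n) > 2)
print(min_n)  # 6

# The bound is reached when one amount differs and the rest are identical:
# with five transactions, even a huge one stays under 2 standard deviations
amounts = [10.0, 10.0, 10.0, 10.0, 500.0]
z = (amounts[-1] - statistics.mean(amounts)) / statistics.stdev(amounts)
print(round(z, 3), round(max_z_score(5), 3))  # 1.789 1.789
```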

In [65]:
# Find anomalous transactions for 3 random card holders
# random.sample draws 3 distinct ids from 1-25 without replacement
# (range(1, 26) because the stop value of range is exclusive)
card_holder_ids = random.sample(range(1, 26), 3)

for x in card_holder_ids:
    print('*' * 60)
    print(f'LOOKING FOR FRAUDULENT TRANSACTIONS FOR CARD HOLDER ID {x}')
    # display() is provided by IPython/Jupyter and renders a DataFrame as a table
    display(find_anomalities_sd(x))
    print()

************************************************************
LOOKING FOR FRAUDULENT TRANSACTIONS FOR CARD HOLDER ID 7

| | date | amount | card |
|---|---|---|---|
| 1 | 2018-01-04 03:05:18 | 1685.0 | 3516952396080247 |
| 19 | 2018-02-19 16:00:43 | 1072.0 | 3516952396080247 |
| 32 | 2018-04-18 23:23:29 | 1086.0 | 3516952396080247 |
| 88 | 2018-08-07 11:07:32 | 1449.0 | 3516952396080247 |
| 128 | 2018-12-13 15:51:59 | 2249.0 | 3516952396080247 |
| 133 | 2018-12-18 17:20:33 | 1296.0 | 3516952396080247 |

************************************************************
LOOKING FOR FRAUDULENT TRANSACTIONS FOR CARD HOLDER ID 13

| | date | amount | card |
|---|---|---|---|
| 179 | 2018-11-08 02:10:03 | 22.78 | 5135837688671496 |

************************************************************
LOOKING FOR FRAUDULENT TRANSACTIONS FOR CARD HOLDER ID 6

| | date | amount | card |
|---|---|---|---|
| 4 | 2018-01-08 02:34:32 | 1029.0 | 3581345943543942 |
| 23 | 2018-02-27 15:27:32 | 1145.0 | 3581345943543942 |
| 40 | 2018-04-21 19:41:51 | 2108.0 | 3581345943543942 |
| 67 | 2018-07-03 14:56:36 | 1398.0 | 3581345943543942 |
| 79 | 2018-07-24 22:42:00 | 1108.0 | 3581345943543942 |
| 81 | 2018-08-05 01:06:38 | 1379.0 | 3581345943543942 |
| 90 | 2018-09-02 06:17:00 | 2001.0 | 3581345943543942 |
| 92 | 2018-09-11 15:16:47 | 1856.0 | 3581345943543942 |
| 122 | 2018-11-27 17:20:29 | 1279.0 | 3581345943543942 |

## Identifying Outliers Using Interquartile Range

In [None]:
# Write a function that locates outliers using interquartile range
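One possible sketch for this cell. It assumes the same `transaction`, `credit_card` and `card_holder` tables queried by the standard-deviation function, takes the SQLAlchemy engine as an argument, and uses the conventional 1.5 × IQR (Tukey) fences, which the challenge does not prescribe. `iqr_outliers` and `find_anomalies_iqr` are hypothetical names; the quartiles come from `Series.quantile`, which interpolates linearly by default.

```python
import pandas as pd

def iqr_outliers(df: pd.DataFrame, column: str = "amount", k: float = 1.5) -> pd.DataFrame:
    """Return the rows whose `column` falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)  # 25th percentile (linear interpolation by default)
    q3 = df[column].quantile(0.75)  # 75th percentile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[(df[column] < lower) | (df[column] > upper)]

def find_anomalies_iqr(card_holder_id: int, engine):
    """Fetch one card holder's transactions and return the IQR outliers, or a message if none."""
    # Imported here so iqr_outliers() can be used without SQLAlchemy installed
    from sqlalchemy import text

    query = text("""
        SELECT t.date, t.amount, t.card
        FROM transaction AS t
        INNER JOIN credit_card AS cc ON cc.card = t.card
        INNER JOIN card_holder AS ch ON ch.id = cc.cardholder_id
        WHERE ch.id = :card_holder_id
        ORDER BY t.date
    """)
    df = pd.read_sql(query, engine, params={"card_holder_id": int(card_holder_id)})

    outliers = iqr_outliers(df)
    if outliers.empty:
        return "No signs of fraudulent transactions were found"
    return outliers
```

In the notebook it would be called as `display(find_anomalies_iqr(7, engine))`. Splitting the pure filter from the query keeps the statistics testable without a database. Unlike the mean and standard deviation, the quartiles are barely moved by a few extreme amounts, so one large charge cannot hide itself by inflating the spread.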


In [None]:
# Find anomalous transactions for 3 random card holders
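The two rules can disagree on the same card holder. The toy comparison below uses made-up amounts, not rows from the `fraud_detection` database, and `sd_flags` and `iqr_flags` are hypothetical helpers built on the standard-library `statistics` module. The two large charges inflate the standard deviation so much that the 2-SD rule catches only the larger one. The quartiles, set by the routine charges, barely move, so the IQR fences flag both.

```python
import statistics

def sd_flags(amounts, k=2.0):
    """Amounts more than k sample standard deviations from the mean."""
    mean, sd = statistics.mean(amounts), statistics.stdev(amounts)
    return [a for a in amounts if a < mean - k * sd or a > mean + k * sd]

def iqr_flags(amounts, k=1.5):
    """Amounts outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly, like pandas' Series.quantile default
    q1, _, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
    iqr = q3 - q1
    return [a for a in amounts if a < q1 - k * iqr or a > q3 + k * iqr]

# Made-up amounts: routine charges around $10 plus two large ones
amounts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 300, 1000]
print("2-SD rule flags:", sd_flags(amounts))   # [1000]
print("IQR rule flags: ", iqr_flags(amounts))  # [300, 1000]
```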
