# "Human or Robot" 
## Predicting Auction Fraud

On an auction website, human bidders are growing increasingly frustrated because they keep losing auctions to software-controlled bidders. As a result, usage among the site's core customer base is plummeting. To win those customers back, the site owners need to eliminate computer-generated bidding from their auctions.

The goal of this project is to identify online auction bids placed by "robots", so that the site owners can easily flag those users for removal and prevent unfair auction activity.

<img src="../images/robots-greeting.png" width="500">

## Preprocessing and Modeling
- Models to train
    - Logistic Regression
    - k-Nearest Neighbors
    - Decision Trees
    - Support Vector Machine
    - Naive Bayes

- Evaluation
- Conclusion

In [1]:
# import packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Modeling
from sklearn.model_selection import train_test_split, cross_validate, GridSearchCV, learning_curve

In [2]:
# load df
df = pd.read_csv('../data/features-outcome-df')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1984 entries, 0 to 1983
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   outcome               1984 non-null   float64
 1   bids_per_auction      1984 non-null   int64  
 2   country_per_user      1984 non-null   int64  
 3   device_per_user       1984 non-null   int64  
 4   ip_bids_per_user      1984 non-null   float64
 5   url_bids_per_user     1984 non-null   float64
 6   bots_country          1984 non-null   float64
 7   bots_device           1984 non-null   float64
 8   bids_per_user         1984 non-null   int64  
 9   url_per_user          1984 non-null   int64  
 10  auction_per_user      1984 non-null   int64  
 11  ip_per_user           1984 non-null   int64  
 12  bids_per_device       1984 non-null   int64  
 13  bids_per_coutry       1984 non-null   int64  
 14  avg_bids_per_user     1984 non-null   float64
 15  median_bids_per_user 

In [3]:
df.columns

Index(['outcome', 'bids_per_auction', 'country_per_user', 'device_per_user',
       'ip_bids_per_user', 'url_bids_per_user', 'bots_country', 'bots_device',
       'bids_per_user', 'url_per_user', 'auction_per_user', 'ip_per_user',
       'bids_per_device', 'bids_per_coutry', 'avg_bids_per_user',
       'median_bids_per_user', 'auto parts', 'books and music', 'clothing',
       'computers', 'furniture', 'home goods', 'jewelry', 'mobile',
       'office equipment', 'sporting goods'],
      dtype='object')

### Train-Test Split

In [4]:
# declare response variable y
y = df['outcome']

# keep every remaining column as an explanatory variable in X
X = df.drop(columns='outcome')

In [5]:
# train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.20, random_state = 42)

# Check the shape of the X train, y_train, X_test and y_test to make sure the proportions are right
print('X_train: ', X_train.shape, 'y_train: ', y_train.shape, 'X_test :' ,X_test.shape,
      'y_test: ', y_test.shape)

X_train:  (1587, 25) y_train:  (1587,) X_test : (397, 25) y_test:  (397,)
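
The shape check above confirms the 80/20 row split, but it does not show whether both splits contain a similar share of robots. A minimal sketch of one way to check this is given below. It assumes the positive class is rare, which the notebook does not state, and it uses synthetic data from `make_classification` in place of the real `X` and `y`. The names `X_demo`, `y_demo`, `train_rate`, and `test_rate` are illustrative only. Passing `stratify` to `train_test_split` is an optional variation on the split used in this notebook, not what the cell above does.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's X and y.
# Assumption: roughly 5% positives, chosen only for illustration.
X_demo, y_demo = make_classification(
    n_samples=1984, n_features=25, weights=[0.95, 0.05], random_state=42
)

# stratify=y_demo keeps the class ratio nearly identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

# Compare the share of the positive class in each split.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"positive rate, train: {train_rate:.3f}  test: {test_rate:.3f}")
```

With the real data, the same comparison could be made by calling `y_train.mean()` and `y_test.mean()` on the existing split.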


In [6]:
# instantiate models


## Evaluation and Conclusion

-------