# Fraud Detection Project - Main Notebook

Contents:

* [Data Understanding](#Data-Understanding)
    - [Importing Data](#Importing-Data)
    - [Describing Data](#Describing-Data)
    - [Tidying Data](#Tidying-Data)
* [Data Preparation](#Data-Preparation)
    - [Selecting Data](#Selecting-Data)
    - [Cleaning Data](#Cleaning-Data)
    - [Exploratory Data Analysis (EDA)](#EDA)
    - [Feature Engineering](#Feature-Engineering)
* [Modeling](#Modeling)
    - [xyz](#xyz)
* [Evaluation](#Evaluation)
    - [xyz](#xyz)

## Data Understanding

### Importing Data

In [1]:
# Import necessary modules
import json
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

# Display 100 columns max
pd.set_option('display.max_columns', 100)

# Display 60 characters max within a cell (currently disabled)
# pd.set_option('display.max_colwidth', 60)

In [2]:
ads_list = []
with open('data/autos_20190626.json','r') as file:
    for ad in file:
        ads_list.append(json.loads(ad))
autos = pd.DataFrame(ads_list)
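
The cell above reads the file one line at a time, which implies one JSON object per line (JSON Lines). As a hedged alternative sketch, `pd.read_json` with `lines=True` parses the same layout in a single call. The sample records and the temporary file below are made up for illustration. They are not taken from `data/autos_20190626.json`, and pandas may infer dtypes differently from the manual `json.loads` loop.

```python
import json
import os
import tempfile

import pandas as pd

# Two hypothetical ads (illustrative values only), one JSON object per line.
sample_ads = [
    {"make": "Audi", "model": "A1", "price": 15770},
    {"make": "Audi", "model": "A3", "price": 14500, "km": 80000},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_ads.json")
    with open(sample_path, "w", encoding="utf-8") as file:
        for ad in sample_ads:
            file.write(json.dumps(ad) + "\n")

    # lines=True tells pandas to treat each line as a separate JSON record.
    sample_autos = pd.read_json(sample_path, lines=True)

# Keys missing from a record become NaN in the resulting DataFrame.
print(sample_autos)
```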

### Describing Data

In [22]:
# autos.head(3)

In [17]:
# autos.info()

In [23]:
# Show the columns whose proportion of missing values exceeds 0.2, highest first
# autos.isnull().mean()[autos.isnull().mean() > 0.2].sort_values(ascending=False)
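
To make the missing-value filter concrete, here is a minimal sketch on a toy DataFrame. The column names echo the dataset, but the values are invented. `isnull().mean()` gives the share of missing entries per column because `True` counts as 1.

```python
import numpy as np
import pandas as pd

# Toy data with invented values; only the column names echo the real dataset.
toy = pd.DataFrame({
    "make": ["Audi", "Audi", "Opel", "Opel", "Renault"],
    "gears": [np.nan, 7.0, np.nan, np.nan, 5.0],
    "km": [56013.0, np.nan, 83450.0, np.nan, 10.0],
})

# Share of missing values per column, computed once and reused.
missing_share = toy.isnull().mean()

# Keep only columns above the 0.2 threshold, sorted from most to least missing.
high_missing = missing_share[missing_share > 0.2].sort_values(ascending=False)
print(high_missing)
```

Storing the intermediate result in `missing_share` avoids computing `isnull().mean()` twice, which matters on wide frames.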

### Tidying Data

In [13]:
%run -i "functions/name_columns.py"
# %load "functions/name_columns.py"

In [7]:
# Tidy column names
autos.columns = name_columns(autos)
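
The body of `name_columns` lives in `functions/name_columns.py` and is not shown in this notebook. The sketch below is a hypothetical stand-in, assuming the goal is snake_case names like those used in the later cells. Both the function name `name_columns_sketch` and the raw column names are assumptions.

```python
import re

import pandas as pd


def name_columns_sketch(df):
    """Hypothetical stand-in for name_columns: return snake_case column names."""
    tidy = []
    for col in df.columns:
        name = str(col).strip().lower()
        # Collapse every run of non-alphanumeric characters into one underscore.
        name = re.sub(r"[^0-9a-z]+", "_", name)
        tidy.append(name.strip("_"))
    return tidy


# Assumed raw names, for illustration only.
raw = pd.DataFrame(columns=["Body Type", "CO2 Emission", "\nNr. of Doors\n", "hp"])
raw.columns = name_columns_sketch(raw)
print(list(raw.columns))
```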

## Data Preparation

### Selecting Data

In [8]:
t_cols = ['safety_security', 'co2_emission', 'consumption', 'country_version', 
          'cylinders', 'displacement', 'make', 'model', 'model_code', 'next_inspection',
          'non_smoking_vehicle', 'body_type', 'description', "hp", 'kw', 'km', "make_model", 
          "offer_number"]

m_cols = ['entertainment_media', 'availability', 'available_from', 'body',
        'body_color', 'body_color_original', 'full_service', 'gearing_type', 'gears',
        'inspection_new', 'last_service_date', 'last_timing_belt_service_date', 'paint_type',
        'null', 'price', 'registration', 'short_description', 'vat']

v_cols = ['comfort_convenience', 'extras', 'drive_chain',
         'electricity_consumption', 'emission_class', 'emission_label',
         'first_registration', 'fuel', 'nr_of_doors', 'nr_of_seats',
         'other_fuel_types', 'previous_owners', 'type', 'upholstery', 'warranty',
         'weight', 'prev_owner', 'url']
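
Because the three column groups are cleaned separately and joined again later, it can help to confirm that no name appears in more than one list. The helper below is a hypothetical sketch and not part of the project code. It is shown on short made-up lists rather than the full `t_cols`, `m_cols`, and `v_cols`.

```python
from collections import Counter


def find_shared_columns(*groups):
    """Return names that occur more than once across the given lists."""
    counts = Counter(name for group in groups for name in group)
    return sorted(name for name, n in counts.items() if n > 1)


# Short illustrative lists; "make" is repeated on purpose.
group_a = ["make", "model", "hp"]
group_b = ["price", "vat", "body"]
group_c = ["fuel", "weight", "make"]

shared = find_shared_columns(group_a, group_b, group_c)
print(shared)
```

An empty result would mean the groups are disjoint. Note that a name repeated within a single list is reported as well.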

### Cleaning Data

In [19]:
# Run cleaning functions
%run -i "functions/tidy_clean.py"

In [20]:
# %load "functions/tidy_clean.py"

In [None]:
# Clean t_cols and assign the cleaned DataFrame to df_t
df_t = clean_t(autos[t_cols])

# Clean m_cols and assign the cleaned DataFrame to df_m
df_m = clean_m(autos[m_cols])

# Clean v_cols and assign the cleaned DataFrame to df_v
df_v = clean_v(autos[v_cols])

In [11]:
# Join (left join) all DataFrames
df_2 = df_t.join(df_m)
df = df_2.join(df_v)
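
`DataFrame.join` defaults to a left join aligned on the index, so the cell above relies on the three cleaned DataFrames sharing the index of `autos`. A minimal sketch with invented values shows what happens when the right-hand frame lacks a row:

```python
import pandas as pd

# Invented values; the index plays the role of the original row labels.
left = pd.DataFrame({"make": ["Audi", "Opel", "Renault"]}, index=[0, 1, 2])
right = pd.DataFrame({"price": [15770, 9950]}, index=[0, 2])

# Default how="left": every row of `left` is kept and unmatched rows get NaN.
joined = left.join(right)
print(joined)
```

If two frames share a column name, `join` raises a `ValueError` unless `lsuffix` or `rsuffix` is given, so the cleaning functions presumably return disjoint column sets.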

In [14]:
# Run additional cleaning
df = clean_update(df)

In [16]:
df.head(3)

Unnamed: 0,co2_emission,country_version,cylinders,displacement,make,model,model_code,next_inspection,body_type,description,hp,km,offer_number,ss_abs,ss_adaptive_cruise_control,ss_adaptive_headlights,ss_alarm_system,ss_blind_spot_monitor,ss_central_door_lock,ss_central_door_lock_with_remote_control,ss_daytime_running_lights,ss_driver_drowsiness_detection,ss_driver_side_airbag,ss_electronic_stability_control,ss_emergency_brake_assistant,ss_emergency_system,ss_fog_lights,ss_head_airbag,ss_immobilizer,ss_isofix,ss_led_daytime_running_lights,ss_led_headlights,ss_lane_departure_warning_system,ss_night_view_assist,ss_passenger_side_airbag,ss_power_steering,ss_rear_airbag,ss_side_airbag,ss_tire_pressure_monitoring_system,ss_traction_control,ss_traffic_sign_recognition,ss_xenon_headlights,next_inspection_bool,consumption_comb,available_from,body,body_color,gearing_type,gears,inspection_new,...,cc_electric_starter,cc_electric_tailgate,cc_electrical_side_mirrors,cc_electrically_adjustable_seats,cc_electrically_heated_windshield,cc_heads_up_display,cc_heated_steering_wheel,cc_hill_holder,cc_keyless_central_door_lock,cc_leather_seats,cc_leather_steering_wheel,cc_light_sensor,cc_lumbar_support,cc_massage_seats,cc_multi_function_steering_wheel,cc_navigation_system,cc_panorama_roof,cc_park_distance_control,cc_parking_assist_system_camera,cc_parking_assist_system_self_steering,cc_parking_assist_system_sensors_front,cc_parking_assist_system_sensors_rear,cc_power_windows,cc_rain_sensor,cc_seat_heating,cc_seat_ventilation,cc_split_rear_seats,cc_start_stop_system,cc_sunroof,cc_tinted_windows,cc_wind_deflector,cc_windshield,ext_alloy_wheels,ext_cab_or_rented_car,ext_catalytic_converter,ext_handicapped_enabled,ext_right_hand_drive,ext_roof_rack,ext_shift_paddles,ext_ski_bag,ext_sliding_door,ext_sport_package,ext_sport_seats,ext_sport_suspension,ext_touch_screen,ext_trailer_hitch,ext_tuned_car,ext_voice_control,ext_winter_tyres,weight_kg
0,99.0,,3.0,1422.0,Audi,A1,0588/BDF,06/2021,Sedans,Sicherheit: Deaktivierung für Beifahrer-Airbag...,66.0,56013.0,LR-062483,1,0,0,0,0,1,0,1,0,1,1,0,0,1,0,1,1,0,0,0,0,1,1,0,1,1,1,0,1,True,3.8,NaT,Sedans,Black,Automatic,,1,...,0,0,1,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1220.0
1,129.0,,4.0,1798.0,Audi,A1,0588/BCY,,Sedans,Sicherheit: Deaktivierung für Beifahrer-Airbag...,141.0,80000.0,,1,0,0,0,0,1,1,1,0,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,0,1,False,5.6,NaT,Sedans,Red,Automatic,7.0,0,...,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,1255.0
2,99.0,,,1598.0,Audi,A1,,,Sedans,Sicherheit: Deaktivierung für Beifahrer-Airbag...,85.0,83450.0,AM-95365,1,0,0,0,0,1,0,1,0,1,1,0,0,0,0,1,1,0,0,0,0,1,1,0,1,1,1,0,0,False,3.8,NaT,Sedans,Black,Automatic,,0,...,0,0,1,0,0,0,0,1,0,0,1,0,0,0,1,1,0,1,0,0,1,1,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,


### EDA

### Feature Engineering