# LeBonCoin Fraud Detection
This project builds classification models to detect rental fraud on the LeBonCoin marketplace.

### Create the virtual environment
```bash
$ mamba env create --file conda/dev.yaml
$ conda activate data_science_machine_learning
```

### Load datasets


In [2]:
import numpy as np
import pandas as pd
import re

import warnings
warnings.filterwarnings('ignore')

In [3]:
lyondf = pd.read_json('https://pages.isfa.fr/~mai2121213/dataScience/data/lyon/Lyon.json')
lyondf.head()

Unnamed: 0,list_id,first_publication_date,expiration_date,index_date,status,category_id,category_name,subject,body,ad_type,url,price,price_cents,images,attributes,location,owner,options,has_phone,is_boosted
0,2135256182,2022-03-22 17:34:42,2022-05-21 18:34:42,2022-03-30 14:41:10,active,10,Locations,Location Studio Meublé - Lyon 69003,Location Studio Meublé - Lyon 69003\n\nDisponi...,offer,https://www.leboncoin.fr/locations/2135256182.htm,[550],55000,{'thumb_url': 'https://img.leboncoin.fr/api/v1...,"[{'key': 'activity_sector', 'value': '6', 'val...","{'country_id': 'FR', 'region_id': '22', 'regio...","{'store_id': '33130486', 'user_id': '2db03cc4-...","{'has_option': False, 'booster': False, 'photo...",False,1.0
1,2096525179,2022-03-08 19:08:49,2022-05-07 20:08:49,2022-03-29 20:08:50,active,10,Locations,"Déménagement, garde meuble, location box sécurisé",Selfstock est le spécialiste dans la location ...,offer,https://www.leboncoin.fr/locations/2096525179.htm,[59],5900,{'thumb_url': 'https://img.leboncoin.fr/api/v1...,"[{'key': 'activity_sector', 'value': '2', 'val...","{'country_id': 'FR', 'region_id': '22', 'regio...","{'store_id': '39897285', 'user_id': '1e6b823c-...","{'has_option': True, 'booster': False, 'photos...",True,1.0
2,1825420781,2022-02-02 11:26:11,2022-04-03 12:26:11,2022-03-30 12:26:10,active,10,Locations,Location BOX STOCKAGE/GARDE MEUBLE Discount LY...,Resotainer LYON situé à au Port Edouard Herrio...,offer,https://www.leboncoin.fr/locations/1825420781.htm,[32],3200,{'thumb_url': 'https://img.leboncoin.fr/api/v1...,"[{'key': 'activity_sector', 'value': '7', 'val...","{'country_id': 'FR', 'region_id': '22', 'regio...","{'store_id': '5573560', 'user_id': 'a7993a58-d...","{'has_option': True, 'booster': False, 'photos...",True,1.0
3,2137929861,2022-03-28 13:57:52,2022-05-27 13:57:52,2022-03-28 13:57:52,active,10,Locations,Location suite,Location une suite parentale à jean Mace à 5 m...,offer,https://www.leboncoin.fr/locations/2137929861.htm,[500],50000,{'thumb_url': 'https://img.leboncoin.fr/api/v1...,"[{'key': 'real_estate_type', 'value': '2', 'va...","{'country_id': 'FR', 'region_id': '22', 'regio...","{'store_id': '30741331', 'user_id': 'f3982f0b-...","{'has_option': False, 'booster': False, 'photo...",False,
4,2104155199,2022-03-26 18:04:02,2022-05-25 19:04:02,2022-03-26 18:04:02,active,10,Locations,Location cave,Cave vieux Lyon proche musée gadagne et mairie...,offer,https://www.leboncoin.fr/locations/2104155199.htm,[78],7800,{'thumb_url': 'https://img.leboncoin.fr/api/v1...,"[{'key': 'real_estate_type', 'value': '5', 'va...","{'country_id': 'FR', 'region_id': '22', 'regio...","{'store_id': '10016300', 'user_id': '2254b2e2-...","{'has_option': False, 'booster': False, 'photo...",False,


In [4]:
def get_value_from_attributes(attributes, key, value_label=False):
    """Return the value (or value_label) of the first attribute whose key contains `key`."""
    for attribute in attributes:
        # `in` is a substring test: "rooms" would also match a key such as "bedrooms"
        if key in attribute["key"]:
            return attribute["value_label"] if value_label else attribute["value"]
    # no matching attribute in this ad
    return None

Get values from the attributes array

In [5]:
lyondf["real_estate_type"] = lyondf["attributes"].apply(get_value_from_attributes,key="real_estate_type",value_label=True)
lyondf["real_estate_type_value"] = lyondf["attributes"].apply(get_value_from_attributes,key="real_estate_type")
lyondf["furnished"] = lyondf["attributes"].apply(get_value_from_attributes,key="furnished")
lyondf["square"] = lyondf["attributes"].apply(get_value_from_attributes,key="square")
lyondf["charges_included"] = lyondf["attributes"].apply(get_value_from_attributes,key="charges_included")
lyondf["rooms"] = lyondf["attributes"].apply(get_value_from_attributes,key="rooms")
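To make the lookup concrete, here is a minimal sketch run on a hand-written attributes list. The sample entries are hypothetical and only mimic the `key` / `value` / `value_label` shape visible in the `attributes` column above; `lookup_attribute` is a local restatement of the helper, named for illustration only.

```python
# Hypothetical sample mimicking one row of the "attributes" column.
sample_attributes = [
    {"key": "real_estate_type", "value": "2", "value_label": "Appartement"},
    {"key": "furnished", "value": "1", "value_label": "Meublé"},
    {"key": "square", "value": "25", "value_label": "25 m²"},
]

def lookup_attribute(attributes, key, value_label=False):
    """Return the value (or label) of the first attribute whose key contains `key`."""
    for attribute in attributes:
        if key in attribute["key"]:
            return attribute["value_label"] if value_label else attribute["value"]
    return None  # key not present in this ad

print(lookup_attribute(sample_attributes, "real_estate_type", value_label=True))
print(lookup_attribute(sample_attributes, "square"))
print(lookup_attribute(sample_attributes, "rooms"))
```

In the data shown above the values are strings (for example `'value': '6'`), so numeric columns such as `square` or `rooms` may need `pd.to_numeric` before modelling.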

#### Get errors from given text
* Go to `https://textgears.com/signup?shutupandgiveme=thekey`
* Activate the API key

PS: Any email address works (there is no email verification ^^).
(This step might take time because it calls an external service.)

In [29]:
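import requests
from requests.exceptions import ConnectionError

TEXTGEARS_URL = "https://api.textgears.com/grammar"

def get_text_errors_nb(text):
    """Return the number of errors TextGears reports for `text`, or None on failure."""
    # Passing `params` lets requests URL-encode the text (spaces, '&', accents, ...).
    # The text is sent wrapped in double quotes.
    params = {"key": "PzFH3QNGgDc2UBfi", "text": f'"{text}"', "language": "fr-FR"}
    try:
        response = requests.get(TEXTGEARS_URL, params=params)
    except ConnectionError:
        return None
    # a Response is falsy for 4xx/5xx status codes
    if not response:
        return None
    content = response.json()
    if not content["status"]:
        return None
    return len(content["response"]["errors"])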
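Ad descriptions are free text, so they can contain characters such as `&` that have a meaning inside a query string. The sketch below uses only the standard library to show what happens to such a text when it is interpolated into the URL as-is versus encoded first. The sample sentence is made up, and no request is sent.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Made-up description containing an ampersand.
text = "Loyer 550€ charges comprises & caution"

# Naive interpolation: the '&' inside the text starts a new query parameter.
naive_url = f"https://api.textgears.com/grammar?text={text}&language=fr-FR"
naive_text = parse_qs(urlsplit(naive_url).query)["text"][0]

# Encoded query string: the text survives intact.
safe_url = "https://api.textgears.com/grammar?" + urlencode(
    {"text": text, "language": "fr-FR"}
)
safe_text = parse_qs(urlsplit(safe_url).query)["text"][0]

print(repr(naive_text))  # truncated at the ampersand
print(repr(safe_text))   # identical to the original text
```

`requests.get(url, params={...})` performs the same encoding, which is usually simpler than building the query string by hand.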


In [None]:
lyondf["description_errors"] = lyondf["body"].apply(get_text_errors_nb)

In [6]:
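def get_value_from_object(obj, key):
    """Return obj[key], or None when the key is missing."""
    # `obj` rather than `object`, to avoid shadowing the builtin
    try:
        return obj[key]
    except KeyError:
        return None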

Get values from the Location object

In [7]:
lyondf["zipcode"] = lyondf["location"].apply(get_value_from_object,key="zipcode")
lyondf["lat"] = lyondf["location"].apply(get_value_from_object,key="lat")
lyondf["lng"] = lyondf["location"].apply(get_value_from_object,key="lng")

Get values from User object

In [8]:
lyondf["user_id"] = lyondf["owner"].apply(get_value_from_object,key="user_id")
lyondf["user_type"] = lyondf["owner"].apply(get_value_from_object,key="type")
lyondf["user_siren"] = lyondf["owner"].apply(get_value_from_object,key="siren")
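As a possible alternative to one `apply` call per key, `pd.json_normalize` can flatten a column of dictionaries in one pass. This is only a sketch on a hypothetical two-row frame (`demo`, with made-up coordinates), not the approach used in this notebook; the real `location` objects carry more keys, such as `country_id` and `region_id`.

```python
import pandas as pd

# Hypothetical rows mimicking the "location" column; the second one has no "zipcode".
demo = pd.DataFrame({
    "location": [
        {"zipcode": "69003", "lat": 45.75, "lng": 4.85},
        {"lat": 45.76, "lng": 4.83},
    ]
})

# One column per key; keys missing from a row become NaN.
flat = pd.json_normalize(demo["location"].tolist())

# Both frames share the default 0..n-1 index, so join aligns them row by row.
demo = demo.join(flat[["zipcode", "lat", "lng"]])

print(demo[["zipcode", "lat", "lng"]])
```

The join relies on the frame having a default integer index; after filtering or sorting, calling `reset_index(drop=True)` first would be needed.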

Get number of images

In [9]:
lyondf["nb_images"] = lyondf["images"].apply(lambda x: x["nb_images"])

Get description length (number of words)

In [11]:
lyondf["description_length"] = lyondf["body"].apply(lambda x: len(re.findall(r'\w+', x)))
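The length computed here is a word count, not a character count: `\w+` matches maximal runs of letters, digits and underscores. A small sketch with made-up French snippets shows how punctuation and apostrophes affect the count (`count_words` is an illustrative name).

```python
import re

def count_words(text):
    """Count maximal runs of word characters (letters, digits, underscore)."""
    return len(re.findall(r"\w+", text))

print(count_words("Studio meublé, proche métro."))  # 4
print(count_words("l'appartement"))                 # 2: the apostrophe splits the token
```

Elided forms such as `l'` or `d'` therefore count as separate words, which slightly inflates the length of French descriptions.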

In [61]:
lyondf["ratio_description_errors"] = lyondf["description_length"] / ((lyondf["description_errors"] + 1) * lyondf["description_length"])

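*Not sure about this ratio: `description_length` cancels out, so it reduces to `1 / (description_errors + 1)` and no longer depends on the description length.*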

In [62]:
lyondf["ratio_description_errors"].describe()

count    613.000000
mean       0.272850
std        0.228213
min        0.015873
25%        0.125000
50%        0.200000
75%        0.333333
max        1.000000
Name: ratio_description_errors, dtype: float64
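The quartiles above (0.125, 0.2, 0.333333) are all of the form `1 / (n + 1)`, which is what the expression reduces to once `description_length` cancels. One possible alternative, offered only as a sketch and not as the method used in this project, is the number of errors per word. The frame `demo` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values; None stands for a failed API call, 0 for an empty description.
demo = pd.DataFrame({
    "description_errors": [0, 4, None, 2],
    "description_length": [50, 40, 30, 0],
})

# The ratio used above: the length cancels, leaving 1 / (errors + 1).
current = demo["description_length"] / (
    (demo["description_errors"] + 1) * demo["description_length"]
)

# Alternative: errors per word, with NaN where the description is empty.
length = demo["description_length"].replace(0, np.nan)
errors_per_word = demo["description_errors"] / length

print(pd.DataFrame({"current": current, "errors_per_word": errors_per_word}))
```

With this definition a higher value means a more error-dense description, whereas the current ratio decreases as errors increase.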

In [13]:
clean_lyondf = lyondf.copy()

Next, we keep the features that seem important and drop the columns we no longer need. Some columns are dropped because we have already extracted the useful information from them (for example, `images` is replaced by `nb_images`).

In [17]:
clean_lyondf = clean_lyondf.drop(columns=["list_id","index_date","category_id","category_name","price","images","attributes","location","owner","options","expiration_date","first_publication_date","url","subject","body"])

In [18]:
clean_lyondf.columns

Index(['status', 'ad_type', 'price_cents', 'has_phone', 'is_boosted',
       'real_estate_type', 'real_estate_type_value', 'furnished', 'square',
       'charges_included', 'rooms', 'zipcode', 'lat', 'lng', 'user_id',
       'user_type', 'user_siren', 'nb_images', 'description_length'],
      dtype='object')