<a href="https://colab.research.google.com/github/fardinpratama/Project-with-python/blob/master/Order_Brushing_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Shopee Code League</h1>

<h3>-- Order Brushing Detection --</h3>

Order brushing is a technique in which sellers place fake orders to boost the ranking of their shop or of particular items, so that their listings appear higher in Shopee search results.

For this task, a shop is considered to have engaged in order brushing if its concentrate rate is greater than or equal to 3.

<b>Concentrate rate = number of orders within 1 hour / number of unique buyers within 1 hour</b>

Suspicious buyers are the buyers who account for the highest share of orders at a shop that is considered to have engaged in order brushing.
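
As a minimal sketch of the two definitions above, the following computes the concentrate rate for a single 1-hour window and picks the buyers with the most orders in it. The toy data and the helper names `concentrate_rate` and `top_buyers` are assumptions made for illustration; they are not part of the competition data or of the notebook code.

```python
import pandas as pd

# Hypothetical toy data: six orders at one shop inside a single 1-hour window.
window = pd.DataFrame({
    "orderid": [1, 2, 3, 4, 5, 6],
    "userid": [101, 101, 101, 101, 101, 202],
})


def concentrate_rate(orders: pd.DataFrame) -> float:
    """Number of orders divided by number of unique buyers."""
    return len(orders) / orders["userid"].nunique()


def top_buyers(orders: pd.DataFrame) -> list:
    """Buyers tied for the highest number of orders in the window."""
    counts = orders["userid"].value_counts()
    return sorted(counts[counts == counts.max()].index.tolist())


rate = concentrate_rate(window)    # 6 orders / 2 unique buyers
suspects = top_buyers(window)      # buyer 101 placed 5 of the 6 orders
print(rate, suspects)
```

With six orders from two buyers the rate reaches the threshold of 3, so this window would count as order brushing, and buyer 101 would be the suspicious buyer.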

In [0]:
import pandas as pd
import numpy as np
from datetime import timedelta

In [0]:
df = pd.read_csv('order_brush_order.csv') 
df.head()

Unnamed: 0,orderid,shopid,userid,event_time
0,31076582227611,93950878,30530270,2019-12-27 00:23:03
1,31118059853484,156423439,46057927,2019-12-27 11:54:20
2,31123355095755,173699291,67341739,2019-12-27 13:22:35
3,31122059872723,63674025,149380322,2019-12-27 13:01:00
4,31117075665123,127249066,149493217,2019-12-27 11:37:55


In [0]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 222750 entries, 0 to 222749
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   orderid     222750 non-null  int64 
 1   shopid      222750 non-null  int64 
 2   userid      222750 non-null  int64 
 3   event_time  222750 non-null  object
dtypes: int64(3), object(1)
memory usage: 6.8+ MB


In [0]:
# number of unique sellers (shops)
df['shopid'].nunique()

18770

In [0]:
df['event_time'] = pd.to_datetime(df['event_time']) # convert to Timestamp type

In [0]:
shopid = []
userid = []
for sellers in df['shopid'].unique():
    seller = df.loc[df['shopid'] == sellers]
    # a rate of 3 needs at least 3 orders, so smaller shops can be skipped
    if len(seller) >= 3:
        for key, val in seller.iterrows():
            start_date = val['event_time']
            end_date = start_date + timedelta(hours=1) # window ends 1 hour after this order
            # orders placed at this shop within the 1-hour window (both ends inclusive)
            sub_seller = seller.loc[(seller['event_time'] >= start_date) & (seller['event_time'] <= end_date)]
            rate = len(sub_seller) / len(sub_seller['userid'].unique()) # concentrate rate formula
            # the shop is considered to have engaged in order brushing if the rate is at least 3
            if rate >= 3:
                # note: every buyer in the window is recorded, not only the top buyer
                for user in sub_seller['userid'].unique():
                    shopid.append(sellers) # suspected shop
                    userid.append(user) # buyer seen in the suspicious window
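
The window test in the loop above can be tried in isolation on a tiny example. This is only a sketch under assumptions: the helper name `brushing_windows`, its `threshold` parameter, and the toy orders are hypothetical, and `Series.between` is used in place of the two explicit comparisons (it is inclusive on both ends by default, matching `>=` and `<=`).

```python
import pandas as pd


def brushing_windows(shop_orders: pd.DataFrame, threshold: float = 3.0) -> list:
    """Return (window_start, rate) for every 1-hour window, started at an
    order time, whose concentrate rate reaches the threshold."""
    hits = []
    for start in shop_orders["event_time"]:
        end = start + pd.Timedelta(hours=1)
        in_window = shop_orders[shop_orders["event_time"].between(start, end)]
        rate = len(in_window) / in_window["userid"].nunique()
        if rate >= threshold:
            hits.append((start, rate))
    return hits


# Hypothetical orders for one shop: buyer 1 orders three times in 40 minutes.
toy = pd.DataFrame({
    "userid": [1, 1, 1, 2],
    "event_time": pd.to_datetime([
        "2019-12-27 10:00:00",
        "2019-12-27 10:20:00",
        "2019-12-27 10:40:00",
        "2019-12-27 13:00:00",
    ]),
})

hits = brushing_windows(toy)
print(hits)
```

Only the window starting at 10:00 contains three orders from a single buyer, so it is the only one that reaches the threshold; the windows starting at the later orders contain fewer orders.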

In [0]:
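data = {'shopid': shopid, 'userid': userid} # suspected (shop, buyer) pairs
# shops that were never flagged get userid 0; the set difference is computed
# once, before the loop starts appending to the same shopid list
for i in set(df['shopid'].unique()) - set(shopid):
    data['shopid'].append(i) # shop not suspected
    data['userid'].append(0) # placeholder: no suspicious buyer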

In [0]:
result = pd.DataFrame(data).sort_values('shopid') # build the result DataFrame
result['userid'] = result['userid'].apply(str) # convert to string so the ids can be joined
result.drop_duplicates(inplace=True, ignore_index=True) # remove repeated (shop, buyer) pairs
result = result.groupby('shopid')['userid'].apply(' & '.join).reset_index() # one row per shop, buyers joined with ' & '

List of sellers suspected of order brushing, together with the user IDs of their suspicious buyers. A userid of 0 means the shop is not considered to have engaged in order brushing.
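
To make the output format concrete, here is a small sketch of the same deduplicate, group, and join steps on made-up pairs. The helper name `format_result` and the toy ids are assumptions; unlike the cell above, it sorts by both columns so the order of ids inside each joined string is deterministic.

```python
import pandas as pd

# Hypothetical (shop, buyer) pairs; userid 0 marks a shop that was not flagged.
pairs = pd.DataFrame({
    "shopid": [20, 10, 20, 30, 20],
    "userid": [7, 0, 9, 0, 7],
})


def format_result(pairs: pd.DataFrame) -> pd.DataFrame:
    """One row per shop, with its buyer ids joined by ' & '."""
    out = pairs.drop_duplicates().sort_values(["shopid", "userid"])
    out["userid"] = out["userid"].astype(str)  # strings are needed for join
    return out.groupby("shopid")["userid"].apply(" & ".join).reset_index()


formatted = format_result(pairs)
print(formatted)
```

Shop 20 appears once with both of its buyers in a single string, while shops 10 and 30 keep the placeholder `0`.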

In [0]:
result

Unnamed: 0,shopid,userid
0,10009,0
1,10051,0
2,10061,0
3,10084,0
4,10100,0
...,...,...
18765,214662358,0
18766,214949521,0
18767,214964814,0
18768,215175775,0


In [0]:
# sellers suspected of order brushing (at least one suspicious buyer)
result[result['userid'] != '0']

Unnamed: 0,shopid,userid
40,10402,77819
57,10536,672345
111,42472,740844
114,42818,170385453
129,76934,190449497
...,...,...
17401,203531250,114282846
17960,204225676,198662175
18155,208696908,214111334
18557,210197928,52867898


In [0]:
# suspected sellers with more than one suspicious userid
result[result['userid'].str.contains('&')]

Unnamed: 0,shopid,userid
344,823357,133545410 & 188942105
990,8996761,13135622 & 162508227 & 137245836 & 215382704 &...
1474,16001939,205729485 & 1024838
3189,51134277,29857724 & 212200633
3341,54257623,1974334 & 107414154
3911,64394533,92111793 & 194833170
5955,98481320,96474917 & 124597967
8621,136564914,178491887 & 191211430
9045,143281052,99517130 & 186080843
9208,145777302,107406 & 101582282 & 201343856
