# Shopee Code League 5: Logistics
### Data Analytics

### Task
Identify all the orders that are considered late depending on the Service Level Agreements (SLA) with our Logistics Provider.

For the purpose of this question, assume that all deliveries are considered successful by the second attempt.


[Link to Kaggle](https://www.kaggle.com/c/open-shopee-code-league-logistic/overview)


#### 2nd approach using `numpy.busday_count()`

In [1]:
# Importing libraries
import pandas as pd
import numpy as np
from datetime import datetime

## 1 Data exploration
- Load data
- Data preprocessing

In [12]:
# dataset that contains the order & delivery info
# Note: the file exceeds 100 MB, so it is not included in the GitHub repo folder.
order = pd.read_csv('../../gitignore_largefile/delivery_orders_march.csv')
order.head()

Unnamed: 0,orderid,pick,1st_deliver_attempt,2nd_deliver_attempt,buyeraddress,selleraddress
0,2215676524,1583138397,1583385000.0,,"Baging ldl BUENAVISTA,PATAG.CAGAYAN Buagsong,c...",Pantranco vill. 417 Warehouse# katipunan 532 (...
1,2219624609,1583309968,1583463000.0,1583799000.0,coloma's quzom CASANAS Site1 Masiyan 533A Stol...,"BLDG 210A Moras C42B 2B16,168 church) Complex ..."
2,2220979489,1583306434,1583460000.0,,"21-O LumangDaan,Capitangan,Abucay,Bataan .Bign...","#66 150-C, DRIVE, Milagros Joe socorro Metro M..."
3,2221066352,1583419016,1583556000.0,,"616Espiritu MARTINVILLE,MANUYO #5paraiso kengi...","999maII 201,26 Villaruel Barretto gen.t number..."
4,2222478803,1583318305,1583480000.0,,L042 Summerbreezee1 L2(Balanay analyn Lot760 C...,G66MANILA Hiyas Fitness MAYSILO magdiwang Lt.4...


Get the seller and buyer locations (the last word of each address)

In [13]:
order['buyer'] = [address.split()[-1].lower() for address in order['buyeraddress']]
order['seller'] = [address.split()[-1].lower() for address in order['selleraddress']]

order['seller_buyer'] = [i + " " + j for i,j in zip(order['seller'], order['buyer'])]

In [4]:
# Find the unique combination of seller buyer:
set(order['seller_buyer'])

{'luzon luzon',
 'manila luzon',
 'manila manila',
 'manila mindanao',
 'manila visayas'}
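
As an aside, the same extraction can be written with pandas string methods instead of list comprehensions. The following is a minimal sketch on made-up addresses; the `toy` frame and the `last_word_lower` helper are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy addresses (invented for illustration); only the last word matters.
toy = pd.DataFrame({
    "buyeraddress": ["12 Sample St Metro Manila", "Lot 4 Example Rd LUZON"],
    "selleraddress": ["Unit 9 Demo Bldg Metro Manila", "Blk 2 Test Ave Metro Manila"],
})


def last_word_lower(addresses: pd.Series) -> pd.Series:
    """Return the lower-cased last whitespace-separated token of each address."""
    return addresses.str.split().str[-1].str.lower()


toy["buyer"] = last_word_lower(toy["buyeraddress"])
toy["seller"] = last_word_lower(toy["selleraddress"])
# Same "seller buyer" key format as in the notebook.
toy["seller_buyer"] = toy["seller"] + " " + toy["buyer"]
print(toy["seller_buyer"].tolist())
```

The string accessor keeps the work inside pandas and returns missing values for missing addresses instead of raising, which may or may not be what is wanted.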

In [5]:
# load the SLA matrix data
pd.read_excel('./open-shopee-code-league-logistic/SLA_matrix.xlsx')

Unnamed: 0,1st Attempt SLA\n(Working Days),Unnamed: 1,Destination (Buyer),Unnamed: 3,Unnamed: 4,Unnamed: 5
0,,,Metro Manila,Luzon,Visayas,Mindanao
1,Origin\n(Seller),Metro Manila,3 working days,5 working days,7 working days,7 working days
2,,Luzon,5 working days,5 working days,7 working days,7 working days
3,,Visayas,7 working days,7 working days,7 working days,7 working days
4,,Mindanao,7 working days,7 working days,7 working days,7 working days
5,,,,,,
6,"Working Days are defined as Mon - Sat, Excludi...",,,,,
7,SLA calculation begins from the next day after...,,,,,
8,2nd Attempt must be no later than 3 working da...,,,,,


In [6]:
# from the SLA matrix, get the SLA for each unique seller_buyer combination
sla_dict= {'manila manila': 3,
       'manila luzon': 5,
       'manila visayas': 7,
       'manila mindanao': 7,
       'luzon luzon': 5}
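
The dictionary above covers only the five combinations present in this dataset. If other routes could appear, one option is to encode the whole SLA matrix. The sketch below does that with a small function; `REGIONS`, `sla_days` and `full_sla` are hypothetical names, and the values are read off the matrix printed above.

```python
# Regions as they appear after taking the last word of each address.
REGIONS = ("manila", "luzon", "visayas", "mindanao")


def sla_days(seller: str, buyer: str) -> int:
    """First-attempt SLA in working days, following the SLA matrix above."""
    if seller not in REGIONS or buyer not in REGIONS:
        raise ValueError(f"unknown region pair: {seller!r}, {buyer!r}")
    if seller == "manila" and buyer == "manila":
        return 3
    # Any other route that stays within Metro Manila and Luzon.
    if {seller, buyer} <= {"manila", "luzon"}:
        return 5
    # Every route that touches Visayas or Mindanao.
    return 7


# All 16 "seller buyer" keys, in the same format as sla_dict.
full_sla = {f"{s} {b}": sla_days(s, b) for s in REGIONS for b in REGIONS}
print(full_sla["luzon manila"])
```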

## 2 Data analysis

In [15]:
# list of timestamp columns
order_col = ['pick', '1st_deliver_attempt', '2nd_deliver_attempt']

In [16]:
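# orders without a 2nd attempt get 0 (the epoch) as a placeholder;
# their working-day count becomes negative and never exceeds the SLA
order[order_col[-1]] = order['2nd_deliver_attempt'].fillna(0).astype('int')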

In [25]:
# convert Unix timestamps (seconds) to day numbers (days since 1970-01-01) in GMT+8
GMT8_OFFSET = 3600 * 8  # timezone offset in seconds
DURATION_1DAY = 3600 * 24  # seconds per day

# floor division (//) rounds down to the nearest whole day
order[order_col] = (order[order_col] + GMT8_OFFSET) // DURATION_1DAY
order.head()

Unnamed: 0,orderid,pick,1st_deliver_attempt,2nd_deliver_attempt,buyeraddress,selleraddress,buyer,seller,seller_buyer
0,2215676524,18323,18326.0,0,"Baging ldl BUENAVISTA,PATAG.CAGAYAN Buagsong,c...",Pantranco vill. 417 Warehouse# katipunan 532 (...,manila,manila,manila manila
1,2219624609,18325,18327.0,18331,coloma's quzom CASANAS Site1 Masiyan 533A Stol...,"BLDG 210A Moras C42B 2B16,168 church) Complex ...",manila,manila,manila manila
2,2220979489,18325,18327.0,0,"21-O LumangDaan,Capitangan,Abucay,Bataan .Bign...","#66 150-C, DRIVE, Milagros Joe socorro Metro M...",manila,manila,manila manila
3,2221066352,18326,18328.0,0,"616Espiritu MARTINVILLE,MANUYO #5paraiso kengi...","999maII 201,26 Villaruel Barretto gen.t number...",manila,manila,manila manila
4,2222478803,18325,18327.0,0,L042 Summerbreezee1 L2(Balanay analyn Lot760 C...,G66MANILA Hiyas Fitness MAYSILO magdiwang Lt.4...,luzon,manila,manila luzon
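
To see why the offset is added before the floor division, here is a small worked example using only the standard library. The first timestamp is the `pick` value of the first row above. The second timestamp is an invented one chosen to fall on different calendar days in UTC and GMT+8, and `local_day_number` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta, timezone

GMT8_OFFSET = 3600 * 8      # seconds between UTC and GMT+8
DURATION_1DAY = 3600 * 24   # seconds per day
EPOCH = date(1970, 1, 1)


def local_day_number(ts: int) -> int:
    """Days since 1970-01-01 for a Unix timestamp, counted in GMT+8."""
    return (ts + GMT8_OFFSET) // DURATION_1DAY


# 'pick' of the first order in the table above.
first_pick = 1583138397
day = local_day_number(first_pick)
as_date = EPOCH + timedelta(days=day)
print(day, as_date)

# Invented timestamp: 17:00 UTC on 1 March is already 01:00 on 2 March in GMT+8.
late_evening_utc = int(datetime(2020, 3, 1, 17, 0, tzinfo=timezone.utc).timestamp())
utc_day = late_evening_utc // DURATION_1DAY
local_day = local_day_number(late_evening_utc)
print(utc_day, local_day)
```

Without the offset, any event between 16:00 and 24:00 UTC would be assigned to the previous local calendar day.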


In [41]:
pd.to_datetime(order['pick'][0], unit = 'D').date()

datetime.date(2020, 3, 2)

In [43]:
# convert day numbers (days since the Unix epoch) to date objects
for col in order_col:
    order[col] = pd.to_datetime(order[col], unit = 'D').dt.date
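
An alternative, sketched here on a few standalone values rather than on the `order` frame, is to cast the day numbers straight to NumPy's `datetime64[D]`, which is the type `np.busday_count()` works with. The array names are hypothetical.

```python
import numpy as np

# Day numbers as produced by the floor division above; 0 stands for "no 2nd attempt".
day_numbers = np.array([18323, 18326, 0], dtype="int64")

# Interpret each integer as a number of days since 1970-01-01.
dates = day_numbers.astype("datetime64[D]")
print(dates)
```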

### Compute the number of working days between two dates
`np.busday_count()` counts the number of valid days between `begindates` and
`enddates`, not including the day of `enddates`. Its signature is:    
`busday_count(begindates, enddates, weekmask='1111100', holidays=[], busdaycal=None, out=None)`
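
A short illustration of how `weekmask` and `holidays` change the count, using dates taken from the order table further down and the Mon-Sat mask and holiday list used in this notebook. The variable names for the results are hypothetical.

```python
import numpy as np

workdays = "1111110"  # Mon-Sat count as working days
holidays = ["2020-03-08", "2020-03-25", "2020-03-30", "2020-03-31"]

pick = np.array(["2020-03-02", "2020-03-02"], dtype="datetime64[D]")
first_attempt = np.array(["2020-03-05", "2020-03-10"], dtype="datetime64[D]")

# Mon-Sat week: the Saturday 7 March is counted, the Sunday 8 March is not.
mon_to_sat = np.busday_count(pick, first_attempt, weekmask=workdays, holidays=holidays)

# Default Mon-Fri week, no holidays: Saturday is skipped as well.
mon_to_fri = np.busday_count(pick, first_attempt)

# A listed holiday (Wednesday 25 March) is skipped even though it is a weekday.
around_holiday = np.busday_count(
    np.datetime64("2020-03-24"), np.datetime64("2020-03-26"),
    weekmask=workdays, holidays=holidays,
)
print(mon_to_sat, mon_to_fri, around_holiday)
```

The begin date is included and the end date is excluded, so a pickup on Monday 2 March with a first attempt on Thursday 5 March counts 2, 3 and 4 March.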

In [50]:
workdays = '1111110'  # Mon-Sat are working days, Sunday is not
holidays = ['2020-03-08', '2020-03-25', '2020-03-30', '2020-03-31']

# working days from pick-up to the 1st attempt, and from the 1st to the 2nd attempt
order['num_1stdelivery'] = np.busday_count(order['pick'], order['1st_deliver_attempt'],
                                           weekmask = workdays, holidays = holidays)
order['num_2nddelivery'] = np.busday_count(order['1st_deliver_attempt'], order['2nd_deliver_attempt'],
                                           weekmask = workdays, holidays = holidays)

### Get the SLA based on `seller_buyer` and `sla_dict`

In [55]:
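# drop unwanted columns; .copy() makes an independent DataFrame so that
# adding columns later does not trigger SettingWithCopyWarning
order = order[['orderid', 'pick', '1st_deliver_attempt', '2nd_deliver_attempt','seller_buyer', 'num_1stdelivery', 'num_2nddelivery']].copy()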

In [59]:
# add in another column to get the SLA based on seller_buyer matrix
order['sla'] = [sla_dict[seller_buyer] for seller_buyer in order['seller_buyer']]


In [62]:
order.head(10)

Unnamed: 0,orderid,pick,1st_deliver_attempt,2nd_deliver_attempt,seller_buyer,num_1stdelivery,num_2nddelivery,sla
0,2215676524,2020-03-02,2020-03-05,1970-01-01,manila manila,3,-15708,3
1,2219624609,2020-03-04,2020-03-06,2020-03-10,manila manila,2,3,3
2,2220979489,2020-03-04,2020-03-06,1970-01-01,manila manila,2,-15709,3
3,2221066352,2020-03-05,2020-03-07,1970-01-01,manila manila,2,-15710,3
4,2222478803,2020-03-04,2020-03-06,1970-01-01,manila luzon,2,-15709,5
5,2222597288,2020-03-04,2020-03-07,1970-01-01,manila manila,3,-15710,3
6,2222738456,2020-03-02,2020-03-05,2020-03-09,manila manila,3,3,3
7,2224695304,2020-03-02,2020-03-10,1970-01-01,manila manila,7,-15712,3
8,2224704587,2020-03-04,2020-03-05,2020-03-09,manila luzon,1,3,5
9,2225138267,2020-03-04,2020-03-10,1970-01-01,manila visayas,5,-15712,7


In [67]:
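# 1st attempt is late if it took more working days than the route's SLA
mask_1st_late = order['num_1stdelivery'] > order['sla']
# 2nd attempt is late if it came more than 3 working days after the 1st attempt
mask_2nd_late = order['num_2nddelivery'] > 3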


In [76]:
# combine both masks with OR, then convert the boolean result to integers (0, 1)
order['is_late'] = (mask_1st_late | mask_2nd_late).astype('int')



In [79]:
# create another submission dataframe from order dataframe
submission = order[['orderid', 'is_late']]
submission.head(30)

Unnamed: 0,orderid,is_late
0,2215676524,0
1,2219624609,0
2,2220979489,0
3,2221066352,0
4,2222478803,0
5,2222597288,0
6,2222738456,0
7,2224695304,1
8,2224704587,0
9,2225138267,0
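
The lateness rule can be checked by hand on a few rows. The sketch below rebuilds four rows from the `order.head(10)` output above (indices 0, 1, 7 and 9) in a standalone frame; `toy` and `SECOND_ATTEMPT_SLA` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Four rows copied from the order.head(10) output (indices 0, 1, 7, 9).
toy = pd.DataFrame({
    "orderid": [2215676524, 2219624609, 2224695304, 2225138267],
    "num_1stdelivery": [3, 2, 7, 5],
    # Large negative values come from the 1970-01-01 placeholder for "no 2nd attempt".
    "num_2nddelivery": [-15708, 3, -15712, -15712],
    "sla": [3, 3, 3, 7],
})

SECOND_ATTEMPT_SLA = 3  # working days allowed between the 1st and 2nd attempt

first_late = toy["num_1stdelivery"] > toy["sla"]
second_late = toy["num_2nddelivery"] > SECOND_ATTEMPT_SLA

# An order is late if either attempt missed its SLA.
toy["is_late"] = (first_late | second_late).astype("int")
print(toy[["orderid", "is_late"]])
```

Only order 2224695304 is flagged: its first attempt took 7 working days against an SLA of 3. The negative counts for missing second attempts can never exceed the threshold, which is why the placeholder does not produce false positives.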


In [80]:
# export to csv
# score = 1 
submission.to_csv('submission_v2.csv', index=False)