# <a id='toc1_'></a>[Data Unification - Orders](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Data Unification - Orders](#toc1_)    
  - [Import libraries and PyPower](#toc1_1_)    
  - [Import Data](#toc1_2_)    
  - [Data Source 1 - Otter](#toc1_3_)    
    - [New columns](#toc1_3_1_)    
      - ['vat'](#toc1_3_1_1_)    
      - ['software'](#toc1_3_1_2_)    
      - ['payment_method'](#toc1_3_1_3_)    
      - ['driver_tip'](#toc1_3_1_4_)    
      - ['quantity'](#toc1_3_1_5_)    
    - [Order columns](#toc1_3_2_)    
    - [Final DataFrame](#toc1_3_3_)    
  - [Data Source 2 - Deliverect](#toc1_4_)    
    - [New columns](#toc1_4_1_)    
      - ['brand'](#toc1_4_1_1_)    
      - ['quantity'](#toc1_4_1_2_)    
      - ['tip'](#toc1_4_1_3_)    
      - ['discount'](#toc1_4_1_4_)    
      - ['order_type'](#toc1_4_1_5_)    
      - ['delivery_timing'](#toc1_4_1_6_)    
      - ['software'](#toc1_4_1_7_)    
      - ['driver_tip'](#toc1_4_1_8_)    
      - ['location'](#toc1_4_1_9_)    
    - [Order columns](#toc1_4_2_)    
    - [Final DataFrame](#toc1_4_3_)    
  - [Concatenate Data Source 1: Otter and Data Source 2: Deliverect](#toc1_5_)    
  - [Export data](#toc1_6_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_1_'></a>[Import libraries and PyPower](#toc0_)

In [166]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)


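import sys
sys.path.append('../Python')  # make the project's helper module importable
from PyPower import *  # project helpers (fill_random, change_type and replace_to are used below)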

## <a id='toc1_2_'></a>[Import Data](#toc0_)

In [167]:
otter = pd.read_csv('../../data/2-analyzed/otter_EDA.csv')

otter_item = pd.read_csv('../../data/2-analyzed/otter_item_EDA.csv')

deliverect = pd.read_csv('../../data/2-analyzed/deliverect_EDA.csv')

deliverect_item = pd.read_csv('../../data/2-analyzed/deliverect_item_EDA.csv')

## <a id='toc1_3_'></a>[Data Source 1 - Otter](#toc0_)

In [168]:
otter.head()

Unnamed: 0,created_datetime,order_id_software,channel,status,brand,location,quantity,subtotal,tip,discount,total,order_type,delivery_timing
0,"31/10/2023, 23:22:36",457EE,Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2,46.0,0.0,-12.64,23.73,DELIVERY,Immediately
1,"31/10/2023, 23:19:01",0390E,Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,4,62.8,0.0,-25.27,28.16,DELIVERY,Immediately
2,"31/10/2023, 23:15:19",EBBB9,Uber Eats,DELIVERED,Tepuy Basics,Umbrella - Goleta,3,30.6,0.0,0.0,30.6,DELIVERY,Immediately
3,"31/10/2023, 23:14:29",58A44,Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2,45.8,0.0,-12.64,23.6,DELIVERY,Immediately
4,"31/10/2023, 23:01:13",776960000,Gloria Foods,DELIVERED,Umbrella SH,Umbrella - Goleta,6,44.0,0.0,0.0,44.0,PICK UP,Immediately


In [169]:
otter.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55678 entries, 0 to 55677
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   created_datetime   55678 non-null  object 
 1   order_id_software  55678 non-null  object 
 2   channel            55678 non-null  object 
 3   status             55678 non-null  object 
 4   brand              55678 non-null  object 
 5   location           55678 non-null  object 
 6   quantity           55678 non-null  int64  
 7   subtotal           55678 non-null  float64
 8   tip                55678 non-null  float64
 9   discount           55678 non-null  float64
 10  total              55678 non-null  float64
 11  order_type         55678 non-null  object 
 12  delivery_timing    55678 non-null  object 
dtypes: float64(4), int64(1), object(8)
memory usage: 5.5+ MB


### <a id='toc1_3_1_'></a>[New columns](#toc0_)

#### <a id='toc1_3_1_1_'></a>['vat'](#toc0_)
Otter does not provide tax data, so the column is created and set to 0 to keep the schema consistent with Deliverect, which does report 'vat'.

In [170]:
otter['vat']=0

#### <a id='toc1_3_1_2_'></a>['software'](#toc0_)
Identifies the data source of each row.

In [171]:
otter['software'] = 'Otter'

#### <a id='toc1_3_1_3_'></a>['payment_method'](#toc0_)
Filled with randomly generated values in the proportion 95% CARD, 5% CASH.

In [172]:
otter['payment_method'] =  fill_random(otter.shape[0], 42, ['CARD', 'CASH'], [0.95, 0.05])
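`fill_random` comes from the project's PyPower module and its source is not shown here. As a rough illustration only, a helper with the same call shape could be sketched as below; the name `fill_random_sketch`, the reading of the second argument as a random seed, and the use of NumPy's `Generator.choice` are all assumptions, not the actual implementation.

```python
import numpy as np

def fill_random_sketch(n, seed, values, probabilities):
    """Hypothetical stand-in: draw n items from `values` with the given probabilities."""
    rng = np.random.default_rng(seed)  # seeded generator, so repeated runs give the same draw
    return rng.choice(values, size=n, p=probabilities)

sample = fill_random_sketch(10_000, 42, ['CARD', 'CASH'], [0.95, 0.05])
card_share = (sample == 'CARD').mean()
print(round(card_share, 3))  # close to 0.95, but not exactly 0.95
```

The observed share only approximates the requested proportion, because each row is drawn independently.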

#### <a id='toc1_3_1_4_'></a>['driver_tip'](#toc0_)

In [173]:
otter['driver_tip']=0

#### <a id='toc1_3_1_5_'></a>['quantity'](#toc0_)
Fill 'quantity' with the sum of the otter_item quantities whose 'order_id_software' matches the order.

In [174]:
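# Sum item quantities per order; after the merge, 'quantity_x' comes from otter and 'quantity_y' from otter_item
merged_df = pd.merge(otter, otter_item.groupby('order_id_software')['quantity'].sum().reset_index(), on='order_id_software', how='left')
# Use the item-level sum, keeping the original value where an order has no matching items
otter['quantity'] = merged_df['quantity_y'].fillna(merged_df['quantity_x']).astype('int64')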
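Because both frames carry a `quantity` column, pandas renames them with suffixes during the merge. The toy example below (made-up IDs and values, not the real data) shows which suffix belongs to which side and what happens to an order without items.

```python
import pandas as pd

# Toy stand-ins for the order table and the item table
orders = pd.DataFrame({'order_id_software': ['A1', 'B2', 'C3'], 'quantity': [1, 1, 1]})
items = pd.DataFrame({'order_id_software': ['A1', 'A1', 'B2'], 'quantity': [2, 3, 4]})

item_totals = items.groupby('order_id_software')['quantity'].sum().reset_index()
merged = pd.merge(orders, item_totals, on='order_id_software', how='left')

print(merged.columns.tolist())
# ['order_id_software', 'quantity_x', 'quantity_y']
print(merged['quantity_x'].tolist())  # [1, 1, 1]       -> left frame (orders)
print(merged['quantity_y'].tolist())  # [5.0, 4.0, nan] -> right frame (summed items)
```

The `_y` column is the per-order item sum. It becomes float because order `C3` has no items and a left merge fills that gap with NaN.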

Example for verification

In [175]:
otter_item[(otter_item.order_id_software == 'F1ACE')]['quantity'].sum()

22

In [176]:
otter[otter.order_id_software=='F1ACE']['quantity']

881    22
Name: quantity, dtype: int64
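The single-order check can be extended to every order at once. The sketch below uses toy frames (made-up IDs and values) and lists the orders whose stored quantity differs from the sum of their items; orders without any items are skipped rather than flagged.

```python
import pandas as pd

# Toy stand-ins: order 'B2' deliberately disagrees with its items
orders = pd.DataFrame({'order_id_software': ['A1', 'B2', 'C3'], 'quantity': [5, 9, 2]})
items = pd.DataFrame({'order_id_software': ['A1', 'A1', 'B2'], 'quantity': [2, 3, 4]})

# Per-order item sum, aligned to the order rows (NaN where an order has no items)
expected = orders['order_id_software'].map(items.groupby('order_id_software')['quantity'].sum())
has_items = expected.notna()

mismatches = orders[has_items & (orders['quantity'] != expected)]
print(mismatches['order_id_software'].tolist())  # ['B2']
```

An empty list would mean every order with items matches its item-level sum.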

### <a id='toc1_3_2_'></a>[Order columns](#toc0_)

In [177]:
otter = otter[['order_id_software', 'created_datetime', 'channel', 'status', 'brand', 'location', 'quantity', 'subtotal',
                'tip', 'driver_tip', 'discount', 'total', 'order_type', 'delivery_timing', 'payment_method', 'vat', 'software' ]]

### <a id='toc1_3_3_'></a>[Final DataFrame](#toc0_)

In [178]:
otter.head()

Unnamed: 0,order_id_software,created_datetime,channel,status,brand,location,quantity,subtotal,tip,driver_tip,discount,total,order_type,delivery_timing,payment_method,vat,software
0,457EE,"31/10/2023, 23:22:36",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2,46.0,0.0,0,-12.64,23.73,DELIVERY,Immediately,CARD,0,Otter
1,0390E,"31/10/2023, 23:19:01",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,4,62.8,0.0,0,-25.27,28.16,DELIVERY,Immediately,CASH,0,Otter
2,EBBB9,"31/10/2023, 23:15:19",Uber Eats,DELIVERED,Tepuy Basics,Umbrella - Goleta,3,30.6,0.0,0,0.0,30.6,DELIVERY,Immediately,CARD,0,Otter
3,58A44,"31/10/2023, 23:14:29",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2,45.8,0.0,0,-12.64,23.6,DELIVERY,Immediately,CARD,0,Otter
4,776960000,"31/10/2023, 23:01:13",Gloria Foods,DELIVERED,Umbrella SH,Umbrella - Goleta,6,44.0,0.0,0,0.0,44.0,PICK UP,Immediately,CARD,0,Otter


## <a id='toc1_4_'></a>[Data Source 2 - Deliverect](#toc0_)

In [179]:
deliverect.head()

Unnamed: 0,created_datetime,location,order_id_software,channel,status,payment_method,total,subtotal,vat
0,2023-04-30 19:56:04,Alicante - Periodista Rodolfo Salazar,128399797,Just Eat,DELIVERED,CARD,35.09,33.1,3.01
1,2023-04-30 12:51:20,Alicante - Periodista Rodolfo Salazar,128359424,Just Eat,DELIVERED,CARD,27.29,25.3,2.3
2,2023-04-29 21:25:39,Alicante - Periodista Rodolfo Salazar,128335005,Just Eat,DELIVERED,CARD,41.19,39.2,3.57
3,2023-04-29 20:49:51,Alicante - Periodista Rodolfo Salazar,128328474,Just Eat,DELIVERED,CARD,63.69,61.7,5.62
4,2023-04-29 13:16:34,Alicante - Periodista Rodolfo Salazar,128288843,Just Eat,DELIVERED,CARD,67.99,66.0,5.99


In [180]:
deliverect.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 595 entries, 0 to 594
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   created_datetime   595 non-null    object 
 1   location           595 non-null    object 
 2   order_id_software  595 non-null    int64  
 3   channel            595 non-null    object 
 4   status             595 non-null    object 
 5   payment_method     595 non-null    object 
 6   total              595 non-null    float64
 7   subtotal           595 non-null    float64
 8   vat                595 non-null    float64
dtypes: float64(3), int64(1), object(5)
memory usage: 42.0+ KB


### <a id='toc1_4_1_'></a>[New columns](#toc0_)

#### <a id='toc1_4_1_1_'></a>['brand'](#toc0_)

In [181]:
deliverect['brand'] = 'Tepuy Burger'

#### <a id='toc1_4_1_2_'></a>['quantity'](#toc0_)

This column requires the Deliverect item data: the item quantities are summed per order.

In [182]:
deliverect.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 595 entries, 0 to 594
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   created_datetime   595 non-null    object 
 1   location           595 non-null    object 
 2   order_id_software  595 non-null    int64  
 3   channel            595 non-null    object 
 4   status             595 non-null    object 
 5   payment_method     595 non-null    object 
 6   total              595 non-null    float64
 7   subtotal           595 non-null    float64
 8   vat                595 non-null    float64
 9   brand              595 non-null    object 
dtypes: float64(3), int64(1), object(6)
memory usage: 46.6+ KB


In [183]:
merged_df = pd.merge(deliverect, deliverect_item.groupby('order_id_software')['quantity'].sum().reset_index(), on='order_id_software', how='left')
deliverect['quantity'] = merged_df['quantity']

In [None]:
#Change order_id_software type (df, column, type_from, type_to):

deliverect = change_type(deliverect, 'order_id_software', 'int64', 'object')
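`change_type` is another PyPower helper whose source is not shown. Going only by the signature in the comment, one plausible reading is "cast the column to `type_to` if it currently has dtype `type_from`". The sketch below implements that reading under the hypothetical name `change_type_sketch`; the real helper may behave differently.

```python
import pandas as pd

def change_type_sketch(df, column, type_from, type_to):
    """Hypothetical stand-in: cast df[column] to type_to only if its dtype is type_from."""
    if str(df[column].dtype) == type_from:
        df = df.copy()  # avoid modifying the caller's frame in place
        df[column] = df[column].astype(type_to)
    return df

demo = pd.DataFrame({'order_id_software': [128399797, 128359424]})
demo = change_type_sketch(demo, 'order_id_software', 'int64', 'object')

print(demo['order_id_software'].dtype)  # object
print((demo['order_id_software'] == 128399797).tolist())  # [True, False]
```

Under this reading the values stay Python integers inside an object column, so comparisons against an integer ID still match. If the IDs were meant to be text, `astype(str)` would be the cast to use instead.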

Example for verification

In [185]:
deliverect_item[deliverect_item.order_id_software == 130370486]

Unnamed: 0,order_id_software,item,price,quantity
1440,130370486,Tequeños,990,1
1441,130370486,Cheesebacon,1250,1
1442,130370486,Mixto de patatas y boniato,150,1
1443,130370486,Una más de ternera gallega,250,1
1444,130370486,Huevo Frito,100,1


In [186]:
deliverect[deliverect.order_id_software ==130370486]

Unnamed: 0,created_datetime,location,order_id_software,channel,status,payment_method,total,subtotal,vat,brand,quantity
392,2023-06-02 19:28:15,Alicante - Periodista Rodolfo Salazar,130370486,Just Eat,DELIVERED,CARD,29.39,27.4,2.5,Tepuy Burger,5.0


#### <a id='toc1_4_1_3_'></a>['tip'](#toc0_)

In [187]:
deliverect['tip'] = 0

#### <a id='toc1_4_1_4_'></a>['discount'](#toc0_)

In [188]:
deliverect['discount'] = 0

#### <a id='toc1_4_1_5_'></a>['order_type'](#toc0_)

In [189]:
deliverect['order_type'] = 'DELIVERY'

#### <a id='toc1_4_1_6_'></a>['delivery_timing'](#toc0_)

In [190]:
deliverect['delivery_timing'] = 'Immediately'

#### <a id='toc1_4_1_7_'></a>['software'](#toc0_)

In [191]:
deliverect['software'] = 'Deliverect'

#### <a id='toc1_4_1_8_'></a>['driver_tip'](#toc0_)

In [192]:
deliverect['driver_tip'] = 0

#### <a id='toc1_4_1_9_'></a>['location'](#toc0_)

In [193]:
deliverect.location.value_counts()

location
Alicante - Periodista Rodolfo Salazar    549
Alicante - Bazan                          46
Name: count, dtype: int64

In [194]:
deliverect.location = deliverect.location.apply(replace_to,  change='Alicante - Bazan', change_to='Tepuy Burger - Calle Bazan')
deliverect.location = deliverect.location.apply(replace_to,  change='Alicante - Periodista Rodolfo Salazar', change_to='Tepuy Burger - Rodolfo Salazar')
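`replace_to` is a PyPower helper and its source is not shown. For comparison, the same renaming can be sketched with plain pandas by passing a mapping to `Series.replace`; the series below is toy data and the dictionary name `location_names` is made up for the example.

```python
import pandas as pd

location = pd.Series([
    'Alicante - Bazan',
    'Alicante - Periodista Rodolfo Salazar',
    'Alicante - Bazan',
])

# Old name -> new name, applied to whole cell values in a single pass
location_names = {
    'Alicante - Bazan': 'Tepuy Burger - Calle Bazan',
    'Alicante - Periodista Rodolfo Salazar': 'Tepuy Burger - Rodolfo Salazar',
}
renamed = location.replace(location_names)

print(renamed.value_counts().to_dict())
# {'Tepuy Burger - Calle Bazan': 2, 'Tepuy Burger - Rodolfo Salazar': 1}
```

Keeping the mapping in one dictionary makes it easier to add further locations later.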

### <a id='toc1_4_2_'></a>[Order columns](#toc0_)

In [195]:
deliverect = deliverect[['order_id_software', 'created_datetime', 'channel', 'status', 'brand', 'location', 'quantity', 'subtotal',
                'tip', 'driver_tip', 'discount', 'total', 'order_type', 'delivery_timing', 'payment_method', 'vat', 'software' ]]
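Both sources are reordered with the same 17-column list, so the list could live in one place. The sketch below is one way to do that; the constant `ORDER_COLUMNS` and the function `order_columns` are hypothetical names, and the explicit missing-column check is an addition for illustration.

```python
import pandas as pd

ORDER_COLUMNS = [
    'order_id_software', 'created_datetime', 'channel', 'status', 'brand',
    'location', 'quantity', 'subtotal', 'tip', 'driver_tip', 'discount',
    'total', 'order_type', 'delivery_timing', 'payment_method', 'vat', 'software',
]

def order_columns(df, columns=ORDER_COLUMNS):
    """Return df restricted to `columns`, in that order; fail loudly if any is absent."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f'missing columns: {missing}')
    return df[columns]

# Toy frame with the columns in reverse order
demo = pd.DataFrame({c: [0] for c in reversed(ORDER_COLUMNS)})
print(order_columns(demo).columns.tolist() == ORDER_COLUMNS)  # True
```

Using one shared list guarantees that the two frames have identical column order before they are concatenated.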

### <a id='toc1_4_3_'></a>[Final DataFrame](#toc0_)

In [196]:
deliverect.head()

Unnamed: 0,order_id_software,created_datetime,channel,status,brand,location,quantity,subtotal,tip,driver_tip,discount,total,order_type,delivery_timing,payment_method,vat,software
0,128399797,2023-04-30 19:56:04,Just Eat,DELIVERED,Tepuy Burger,Tepuy Burger - Rodolfo Salazar,5.0,33.1,0,0,0,35.09,DELIVERY,Immediately,CARD,3.01,Deliverect
1,128359424,2023-04-30 12:51:20,Just Eat,DELIVERED,Tepuy Burger,Tepuy Burger - Rodolfo Salazar,3.0,25.3,0,0,0,27.29,DELIVERY,Immediately,CARD,2.3,Deliverect
2,128335005,2023-04-29 21:25:39,Just Eat,DELIVERED,Tepuy Burger,Tepuy Burger - Rodolfo Salazar,8.0,39.2,0,0,0,41.19,DELIVERY,Immediately,CARD,3.57,Deliverect
3,128328474,2023-04-29 20:49:51,Just Eat,DELIVERED,Tepuy Burger,Tepuy Burger - Rodolfo Salazar,9.0,61.7,0,0,0,63.69,DELIVERY,Immediately,CARD,5.62,Deliverect
4,128288843,2023-04-29 13:16:34,Just Eat,DELIVERED,Tepuy Burger,Tepuy Burger - Rodolfo Salazar,7.0,66.0,0,0,0,67.99,DELIVERY,Immediately,CARD,5.99,Deliverect
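The two previews show `created_datetime` as text in different layouts: `31/10/2023, 23:22:36` for Otter and `2023-04-30 19:56:04` for Deliverect. Whether a later step parses them is not shown in this notebook. If a single datetime column is wanted, one option is to parse each source with its own format before combining, as in this sketch on toy frames; the format strings are inferred from the previews only.

```python
import pandas as pd

otter_demo = pd.DataFrame({'created_datetime': ['31/10/2023, 23:22:36'], 'software': ['Otter']})
deliverect_demo = pd.DataFrame({'created_datetime': ['2023-04-30 19:56:04'], 'software': ['Deliverect']})

# Parse each source with the layout seen in its preview (day-first for Otter)
otter_demo['created_datetime'] = pd.to_datetime(
    otter_demo['created_datetime'], format='%d/%m/%Y, %H:%M:%S')
deliverect_demo['created_datetime'] = pd.to_datetime(
    deliverect_demo['created_datetime'], format='%Y-%m-%d %H:%M:%S')

combined = pd.concat([otter_demo, deliverect_demo], ignore_index=True)
print(combined['created_datetime'].dt.month.tolist())  # [10, 4]
```

Passing an explicit `format` avoids day/month ambiguity in the Otter timestamps.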


## <a id='toc1_5_'></a>[Concatenate Data Source 1: Otter and Data Source 2: Deliverect](#toc0_)

In [197]:
print('Otter rows: ', otter.shape[0], ' Deliverect rows:', deliverect.shape[0], 'Sum:' , otter.shape[0]+deliverect.shape[0])

Otter rows:  55678  Deliverect rows: 595 Sum: 56273


In [198]:
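# Named all_orders rather than all, so the Python built-in all() is not shadowed
all_orders = pd.concat([otter, deliverect])

all_orders.shape

(56273, 17)

## <a id='toc1_6_'></a>[Export data](#toc0_)

In [199]:
all_orders.to_csv('../../data/3-processed/all_orders.csv',  index=False)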
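The combined `order_id_software` column mixes alphanumeric Otter IDs such as `0390E` with purely numeric ones. When the exported file is read back, it may help to request a string dtype for that column so every ID has the same type. The sketch below shows the round trip on a toy frame written to an in-memory buffer instead of the real file.

```python
import io
import pandas as pd

# Toy frame mixing a text ID with integer IDs, as in the combined data
demo = pd.DataFrame({
    'order_id_software': ['0390E', 776960000, 128399797],
    'total': [28.16, 44.0, 35.09],
})

buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

# Read the ID column as text so every ID comes back as a string
reloaded = pd.read_csv(buffer, dtype={'order_id_software': str})
print(reloaded['order_id_software'].tolist())
# ['0390E', '776960000', '128399797']
```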