# <a id='toc1_'></a>[Database Structure](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Database Structure](#toc1_)    
  - [Import libraries and Python files](#toc1_1_)    
  - [Import Data](#toc1_2_)    
  - [Create new dataframes](#toc1_3_)    
    - [Orders](#toc1_3_1_)    
      - [brand DataFrame](#toc1_3_1_1_)    
      - [location DataFrame](#toc1_3_1_2_)    
      - [channel DataFrame](#toc1_3_1_3_)    
      - [status DataFrame](#toc1_3_1_4_)    
      - [order_type DataFrame](#toc1_3_1_5_)    
      - [software DataFrame](#toc1_3_1_6_)    
      - [delivery_timing DataFrame](#toc1_3_1_7_)    
    - [Items](#toc1_3_2_)    
      - [menu DataFrame](#toc1_3_2_1_)    
  - [Ids association in Orders](#toc1_4_)    
    - [Fill each column with its associated id in the corresponding table](#toc1_4_1_)    
      - [Brand](#toc1_4_1_1_)    
      - [Location](#toc1_4_1_2_)    
      - [Channel](#toc1_4_1_3_)    
      - [Status](#toc1_4_1_4_)    
      - [Order_type](#toc1_4_1_5_)    
      - [Software](#toc1_4_1_6_)    
      - [Delivery_timing](#toc1_4_1_7_)    
    - [Create table location_brand](#toc1_4_2_)    
  - [Ids association in Items](#toc1_5_)    
  - [Create own id for the orders table and associate it to the items table](#toc1_6_)    
  - [Convert ids to integer](#toc1_7_)    
  - [Drop columns](#toc1_8_)    
  - [Check and drop NaN](#toc1_9_)    
  - [Order DataFrames](#toc1_10_)    
  - [Export DataFrames](#toc1_11_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_1_'></a>[Import libraries and Python files](#toc0_)

In [1]:
import pandas as pd
import numpy as np
import warnings

pd.set_option('display.max_columns', None)

warnings.filterwarnings('ignore')

import sys
sys.path.append('../Python')
from PyPower import *
from PySql import *

## <a id='toc1_2_'></a>[Import Data](#toc0_)

In [2]:
all_orders = pd.read_csv('../../data/3-processed/all_orders.csv')

all_items = pd.read_csv('../../data/3-processed/all_items.csv')

In [3]:
all_orders.head()

Unnamed: 0,order_id_software,created_datetime,channel,status,brand,location,quantity,subtotal,tip,driver_tip,discount,total,order_type,delivery_timing,payment_method,vat,software
0,457EE,"31/10/2023, 23:22:36",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2.0,46.0,0.0,0,-12.64,23.73,DELIVERY,Immediately,CARD,0.0,Otter
1,0390E,"31/10/2023, 23:19:01",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,4.0,62.8,0.0,0,-25.27,28.16,DELIVERY,Immediately,CASH,0.0,Otter
2,EBBB9,"31/10/2023, 23:15:19",Uber Eats,DELIVERED,Tepuy Basics,Umbrella - Goleta,3.0,30.6,0.0,0,0.0,30.6,DELIVERY,Immediately,CARD,0.0,Otter
3,58A44,"31/10/2023, 23:14:29",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2.0,45.8,0.0,0,-12.64,23.6,DELIVERY,Immediately,CARD,0.0,Otter
4,776960000,"31/10/2023, 23:01:13",Gloria Foods,DELIVERED,Umbrella SH,Umbrella - Goleta,6.0,44.0,0.0,0,0.0,44.0,PICK UP,Immediately,CARD,0.0,Otter


## <a id='toc1_3_'></a>[Create new dataframes](#toc0_)

### <a id='toc1_3_1_'></a>[Orders](#toc0_)

#### <a id='toc1_3_1_1_'></a>[brand DataFrame](#toc0_)

In [4]:
brand = new_table(all_orders, 'brand', 'brand_id','brand')

brand

Unnamed: 0,brand_id,brand
0,1,Umbrella SH
1,2,Tepuy Basics
2,3,Tepuy Burger
3,4,Arepas Doña Emilia
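
`new_table` comes from the project's own modules and its source is not shown in this notebook. Judging only from the output above, it appears to return one row per distinct value with an id starting at 1. A minimal sketch of that behaviour follows; the name `build_lookup_table`, the sample data, and the choice to skip missing values are assumptions for illustration, not the project's actual implementation.

```python
import pandas as pd

def build_lookup_table(df, source_column, id_column, name_column):
    """Hypothetical stand-in for new_table: one row per distinct value, ids from 1."""
    # Keep distinct non-null values in order of first appearance (assumed behaviour)
    values = df[source_column].dropna().drop_duplicates().reset_index(drop=True)
    return pd.DataFrame({
        id_column: range(1, len(values) + 1),
        name_column: values,
    })

# Made-up sample data, not the real all_orders
sample_orders = pd.DataFrame(
    {'brand': ['Umbrella SH', 'Umbrella SH', 'Tepuy Basics', 'Umbrella SH']}
)
sample_brand = build_lookup_table(sample_orders, 'brand', 'brand_id', 'brand')
print(sample_brand)
```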


#### <a id='toc1_3_1_2_'></a>[location DataFrame](#toc0_)

In [5]:
location = new_table(all_orders, 'location', 'location_id','location')

location

Unnamed: 0,location_id,location
0,1,Umbrella - Salamanca
1,2,Umbrella - Goleta
2,3,Umbrella - San Vicent del Raspeig
3,4,Umbrella - Calle Linares
4,5,Tepuy Burger - Rodolfo Salazar
5,6,Umbrella - Tetuan
6,7,Tepuy Burger - Benidorm
7,8,Tepuy Burger - Calle Bazan


#### <a id='toc1_3_1_3_'></a>[channel DataFrame](#toc0_)

In [6]:
channel = new_table(all_orders, 'channel', 'channel_id','channel')

channel

Unnamed: 0,channel_id,channel
0,1,Uber Eats
1,2,Gloria Foods
2,3,Glovo
3,4,Just Eat
4,5,Direct Orders
5,6,Phone


#### <a id='toc1_3_1_4_'></a>[status DataFrame](#toc0_)

In [7]:
status = new_table(all_orders, 'status', 'status_id','status')

status

Unnamed: 0,status_id,status
0,1,DELIVERED
1,2,NEW
2,3,CANCELED
3,4,UNKNOWN
4,5,ACCEPTED


#### <a id='toc1_3_1_5_'></a>[order_type DataFrame](#toc0_)

In [8]:
order_type = new_table(all_orders, 'order_type', 'order_type_id','order_type')

order_type

Unnamed: 0,order_type_id,order_type
0,1,DELIVERY
1,2,PICK UP
2,3,OWN DELIVERY
3,4,UNKNOWN


#### <a id='toc1_3_1_6_'></a>[software DataFrame](#toc0_)

In [9]:
software = new_table(all_orders, 'software', 'software_id','software')

software

Unnamed: 0,software_id,software
0,1,Otter
1,2,Deliverect


#### <a id='toc1_3_1_7_'></a>[delivery_timing DataFrame](#toc0_)

In [10]:
delivery_timing = new_table(all_orders, 'delivery_timing', 'delivery_timing_id','delivery_timing')

delivery_timing

Unnamed: 0,delivery_timing_id,delivery_timing
0,1,Immediately
1,2,Scheduled
2,3,Unknown


### <a id='toc1_3_2_'></a>[Items](#toc0_)

#### <a id='toc1_3_2_1_'></a>[menu DataFrame](#toc0_)

In [11]:
menu = new_table(all_items, 'item', 'item_id', 'item')

menu.head()

Unnamed: 0,item_id,item
0,1,Patatas fritas Pequeñas
1,2,Kahuna Burger
2,3,Sweet Home Sanvi
3,4,Chicken Tenders
4,5,Tequeños


## <a id='toc1_4_'></a>[Ids association in Orders](#toc0_)

In [12]:
# Create empty id columns; each one is filled from its lookup table below

all_orders['brand_id'] = np.nan 
all_orders['location_id'] = np.nan
all_orders['channel_id'] = np.nan
all_orders['status_id'] = np.nan
all_orders['order_type_id'] = np.nan
all_orders['software_id'] = np.nan
all_orders['delivery_timing_id'] = np.nan

### <a id='toc1_4_1_'></a>[Fill each column with its associated id in the corresponding table](#toc0_)

#### <a id='toc1_4_1_1_'></a>[Brand](#toc0_)

In [13]:
all_orders = all_orders.apply(fill_column, df_to_compare=brand, column_to_compare='brand', column_to_change='brand_id', axis=1)
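
`fill_column` is also a project helper whose source is not shown here; it is applied row by row (`axis=1`), which can be slow on large frames. As a possible alternative, not the author's method, the same name-to-id association can be sketched with the vectorized `Series.map`. The sample frames below are made up for illustration.

```python
import pandas as pd

# Made-up sample data standing in for the brand table and all_orders
sample_brand = pd.DataFrame(
    {'brand_id': [1, 2], 'brand': ['Umbrella SH', 'Tepuy Basics']}
)
sample_orders = pd.DataFrame(
    {'brand': ['Umbrella SH', 'Tepuy Basics', 'Umbrella SH', 'Not In Table']}
)

# Build a name -> id Series and look every order up in one vectorized call
brand_to_id = sample_brand.set_index('brand')['brand_id']
sample_orders['brand_id'] = sample_orders['brand'].map(brand_to_id)

# Names missing from the lookup table come back as NaN, so they are easy to spot
unmatched = sample_orders[sample_orders['brand_id'].isna()]
print(unmatched)
```

The same pattern would apply to the other lookup tables by swapping the table and column names.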

#### <a id='toc1_4_1_2_'></a>[Location](#toc0_)

In [14]:
all_orders = all_orders.apply(fill_column, df_to_compare=location, column_to_compare='location', column_to_change='location_id', axis=1)

#### <a id='toc1_4_1_3_'></a>[Channel](#toc0_)

In [15]:
all_orders = all_orders.apply(fill_column, df_to_compare=channel, column_to_compare='channel', column_to_change='channel_id', axis=1)

#### <a id='toc1_4_1_4_'></a>[Status](#toc0_)

In [16]:
all_orders = all_orders.apply(fill_column, df_to_compare=status, column_to_compare='status', column_to_change='status_id', axis=1)

#### <a id='toc1_4_1_5_'></a>[Order_type](#toc0_)

In [17]:
all_orders = all_orders.apply(fill_column, df_to_compare=order_type, column_to_compare='order_type', column_to_change='order_type_id', axis=1)

#### <a id='toc1_4_1_6_'></a>[Software](#toc0_)

In [18]:
all_orders = all_orders.apply(fill_column, df_to_compare=software, column_to_compare='software', column_to_change='software_id', axis=1)

#### <a id='toc1_4_1_7_'></a>[Delivery_timing](#toc0_)

In [19]:
all_orders = all_orders.apply(fill_column, df_to_compare=delivery_timing, column_to_compare='delivery_timing', column_to_change='delivery_timing_id', axis=1)

### <a id='toc1_4_2_'></a>[Create table location_brand](#toc0_)

In [34]:
location_brand = all_orders[['location_id', 'brand_id']].drop_duplicates()

In [33]:
location_brand.head()

Unnamed: 0,location_id,brand_id
0,1,1
2,2,2
4,2,1
6,3,1
8,4,1


## <a id='toc1_5_'></a>[Ids association in Items](#toc0_)

In [35]:
all_items.columns

Index(['item', 'quantity', 'price', 'total_item', 'order_id_software',
       'service_fee'],
      dtype='object')

In [36]:
all_items['item_id'] = np.nan 

In [37]:
all_items = all_items.apply(fill_column, df_to_compare=menu, column_to_compare='item', column_to_change='item_id', axis=1)

## <a id='toc1_6_'></a>[Create own id for the orders table and associate it to the items table](#toc0_)

In [38]:
# Sequential ids starting at 1, in the current row order
all_orders['order_id'] = range(1, len(all_orders) + 1)

In [39]:
all_orders.head()

Unnamed: 0,order_id_software,created_datetime,channel,status,brand,location,quantity,subtotal,tip,driver_tip,discount,total,order_type,delivery_timing,payment_method,vat,software,brand_id,location_id,channel_id,status_id,order_type_id,software_id,delivery_timing_id,order_id
0,457EE,"31/10/2023, 23:22:36",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2.0,46.0,0.0,0,-12.64,23.73,DELIVERY,Immediately,CARD,0.0,Otter,1,1,1,1,1,1,1,1
1,0390E,"31/10/2023, 23:19:01",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,4.0,62.8,0.0,0,-25.27,28.16,DELIVERY,Immediately,CASH,0.0,Otter,1,1,1,1,1,1,1,2
2,EBBB9,"31/10/2023, 23:15:19",Uber Eats,DELIVERED,Tepuy Basics,Umbrella - Goleta,3.0,30.6,0.0,0,0.0,30.6,DELIVERY,Immediately,CARD,0.0,Otter,2,2,1,1,1,1,1,3
3,58A44,"31/10/2023, 23:14:29",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Salamanca,2.0,45.8,0.0,0,-12.64,23.6,DELIVERY,Immediately,CARD,0.0,Otter,1,1,1,1,1,1,1,4
4,776960000,"31/10/2023, 23:01:13",Gloria Foods,DELIVERED,Umbrella SH,Umbrella - Goleta,6.0,44.0,0.0,0,0.0,44.0,PICK UP,Immediately,CARD,0.0,Otter,1,2,2,1,2,1,1,5


In [40]:
orders = all_orders.copy()
items = all_items.copy()

In [41]:
all_items['order_id'] = np.nan 

In [42]:
all_items = all_items.apply(fill_column, df_to_compare=all_orders, column_to_compare='order_id_software', column_to_change='order_id', axis=1)

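Verification: look up a single order by its new `order_id` to confirm that the ids were assigned as expected (row index 2088 should carry `order_id` 2089).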

In [44]:
all_orders[(all_orders.order_id==2089)]

Unnamed: 0,order_id_software,created_datetime,channel,status,brand,location,quantity,subtotal,tip,driver_tip,discount,total,order_type,delivery_timing,payment_method,vat,software,brand_id,location_id,channel_id,status_id,order_type_id,software_id,delivery_timing_id,order_id
2088,EC727,"22/10/2023, 18:15:56",Uber Eats,DELIVERED,Umbrella SH,Umbrella - Tetuan,3.0,20.4,0.0,0,0.0,13.49,DELIVERY,Immediately,CARD,0.0,Otter,1,6,1,1,1,1,1,2089
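
The spot check above covers one order. A broader check, sketched here under assumptions, is to count items that found no matching order after the association. The sample frames are made up, and the sketch assumes each `order_id_software` value identifies exactly one order; if ids from different software systems could collide, the lookup would be ambiguous and would need an extra key.

```python
import pandas as pd

# Made-up sample data standing in for all_orders and all_items
sample_orders = pd.DataFrame({
    'order_id_software': ['457EE', '0390E', 'EBBB9'],
    'order_id': [1, 2, 3],
})
sample_items = pd.DataFrame({
    'item': ['Kahuna Burger', 'Tequeños', 'Chicken Tenders', 'Kahuna Burger'],
    'order_id_software': ['457EE', '457EE', 'EBBB9', 'ZZZZZ'],
})

# A software id must map to exactly one order_id for the lookup to be unambiguous
assert sample_orders['order_id_software'].is_unique

software_to_order = sample_orders.set_index('order_id_software')['order_id']
sample_items['order_id'] = sample_items['order_id_software'].map(software_to_order)

# Items whose order was not found keep NaN in order_id
orphan_items = sample_items[sample_items['order_id'].isna()]
print(f"{len(orphan_items)} of {len(sample_items)} items have no matching order")
```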


## <a id='toc1_7_'></a>[Convert ids to integer](#toc0_)

In [45]:
brand = change_type(brand, 'brand_id', 'object', 'int')
location = change_type(location, 'location_id', 'object', 'int')
channel = change_type(channel, 'channel_id', 'object', 'int')
status = change_type(status, 'status_id', 'object', 'int')
order_type = change_type(order_type, 'order_type_id', 'object', 'int')
software = change_type(software, 'software_id', 'object', 'int')
delivery_timing = change_type(delivery_timing, 'delivery_timing_id', 'object', 'int')
menu = change_type(menu, 'item_id', 'object', 'int')
location_brand = change_type(location_brand, 'location_id', 'object', 'int')
location_brand = change_type(location_brand, 'brand_id', 'object', 'int')
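
`change_type` is a project helper; its source is not shown, so the following is only a sketch of how an object column of ids could be converted to integers with plain pandas. The sample column is made up. One point worth keeping in mind: a regular `int64` column cannot hold `NaN`, so a column with missing ids needs either cleaning first or pandas' nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Made-up object column with one missing id
sample = pd.DataFrame({'location_id': pd.Series([1, 2, np.nan], dtype='object')})

# Convert object -> numeric first; missing values force float64 here
numeric_ids = pd.to_numeric(sample['location_id'])

# Option 1: drop missing ids, then cast to a plain integer dtype
complete_ids = numeric_ids.dropna().astype('int64')

# Option 2: keep missing ids as <NA> using the nullable integer dtype
nullable_ids = numeric_ids.astype('Int64')

print(complete_ids.dtype, nullable_ids.dtype)
```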


## <a id='toc1_8_'></a>[Drop columns](#toc0_)

In [46]:
all_orders.drop(columns=['brand', 'brand_id', 'location', 'channel', 'status', 'order_type', 'software', 'delivery_timing'], inplace=True)

## <a id='toc1_9_'></a>[Check and drop NaN](#toc0_)

In [58]:
check_nan(all_orders, False)

'N nan cols: 1'

quantity    0.005331
dtype: float64

In [60]:
all_orders.dropna(inplace=True)
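
`check_nan` is a project helper and its source is not shown; from the output it seems to report how many columns contain missing values and a per-column share, though it is not clear whether that share is a fraction or a percentage. A rough equivalent with plain pandas, using made-up sample data, might look like this. The sketch also drops rows on an explicit `subset`, which avoids losing rows to missing values in columns that were not inspected.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for all_orders
sample_orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'quantity': [2.0, np.nan, 3.0, 6.0],
    'total': [23.73, 28.16, 30.6, 44.0],
})

# Fraction of missing values per column, keeping only columns that have any
nan_share = sample_orders.isna().mean()
nan_share = nan_share[nan_share > 0]
print(f"N nan cols: {len(nan_share)}")
print(nan_share)

# Drop only the rows where the inspected column is missing
cleaned = sample_orders.dropna(subset=['quantity'])
```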

## <a id='toc1_10_'></a>[Order DataFrames](#toc0_)

In [61]:
all_orders = all_orders[['order_id', 'order_id_software', 'created_datetime', 'quantity', 'subtotal', 'tip', 'driver_tip', 'discount', 'total', 
                        'payment_method', 'vat', 'location_id', 'channel_id', 'status_id', 'order_type_id', 'software_id', 'delivery_timing_id']]

In [62]:
all_items = all_items[['order_id', 'item_id', 'item', 'quantity', 'price', 'total_item', 'service_fee', 'order_id_software']]

## <a id='toc1_11_'></a>[Export DataFrames](#toc0_)

In [64]:
brand.to_csv('../../data/4-fill_db/brand.csv',  index=False)
location.to_csv('../../data/4-fill_db/location.csv',  index=False)
channel.to_csv('../../data/4-fill_db/channel.csv',  index=False)
status.to_csv('../../data/4-fill_db/status.csv',  index=False)
order_type.to_csv('../../data/4-fill_db/order_type.csv',  index=False)
software.to_csv('../../data/4-fill_db/software.csv',  index=False)
delivery_timing.to_csv('../../data/4-fill_db/delivery_timing.csv',  index=False)
menu.to_csv('../../data/4-fill_db/menu.csv',  index=False)
location_brand.to_csv('../../data/4-fill_db/location_brand.csv',  index=False)
all_items.to_csv('../../data/4-fill_db/item.csv',  index=False)
all_orders.to_csv('../../data/4-fill_db/order.csv',  index=False)