# How To Improve: An Analysis of Restaurant Orders With The Aim of Improving Customer Experience and Company Performance

The following is an analysis of the order history of two restaurants, with the aim of improving their service level and financial performance. To this end, the following themes will be explored:

- Pricing
- Inventory Management
- Facility Management

## Data Import

In [0]:
# Displays the SQLContext already available in the notebook; it is used below to load the data
sqlContext

Out[191]: <pyspark.sql.context.SQLContext at 0x7f5d6cafd610>

In [0]:
# Load order data from both restaurants
res_one_orders = sqlContext.read.load('/FileStore/tables/restaurant_1_orders.csv', format='csv', header=True)
res_two_orders = sqlContext.read.load('/FileStore/tables/restaurant_2_orders.csv', format='csv', header=True)
# Load product prices
res_one_prod_prices = sqlContext.read.load('/FileStore/tables/restaurant_1_products_price.csv', format='csv', header=True)
res_two_prod_prices = sqlContext.read.load('/FileStore/tables/restaurant_2_products_price.csv', format='csv', header=True)


## Data Cleaning

In [0]:
res_one_orders.show()

+------------+----------------+--------------------+--------+-------------+--------------+
|Order Number|      Order Date|           Item Name|Quantity|Product Price|Total products|
+------------+----------------+--------------------+--------+-------------+--------------+
|       16118|03/08/2019 20:25|       Plain Papadum|       2|          0.8|             6|
|       16118|03/08/2019 20:25|    King Prawn Balti|       1|        12.95|             6|
|       16118|03/08/2019 20:25|         Garlic Naan|       1|         2.95|             6|
|       16118|03/08/2019 20:25|       Mushroom Rice|       1|         3.95|             6|
|       16118|03/08/2019 20:25| Paneer Tikka Masala|       1|         8.95|             6|
|       16118|03/08/2019 20:25|       Mango Chutney|       1|          0.5|             6|
|       16117|03/08/2019 20:17|          Plain Naan|       1|          2.6|             7|
|       16117|03/08/2019 20:17|       Mushroom Rice|       1|         3.95|             7|

In [0]:
res_two_orders.show()

+--------+----------------+--------------------+--------+-------------+--------------+
|Order ID|      Order Date|           Item Name|Quantity|Product Price|Total products|
+--------+----------------+--------------------+--------+-------------+--------------+
|   25583|03/08/2019 21:58|Tandoori Mixed Grill|       1|        11.95|            12|
|   25583|03/08/2019 21:58|        Madras Sauce|       1|         3.95|            12|
|   25583|03/08/2019 21:58|       Mushroom Rice|       2|         3.95|            12|
|   25583|03/08/2019 21:58|         Garlic Naan|       1|         2.95|            12|
|   25583|03/08/2019 21:58|             Paratha|       1|         2.95|            12|
|   25583|03/08/2019 21:58|          Plain Rice|       1|         2.95|            12|
|   25583|03/08/2019 21:58|         Prawn Puree|       1|         4.95|            12|
|   25583|03/08/2019 21:58|       Plain Papadum|       1|          0.8|            12|
|   25583|03/08/2019 21:58|       Mango Chu

In [0]:
res_one_orders.printSchema()

root
 |-- Order Number: string (nullable = true)
 |-- Order Date: string (nullable = true)
 |-- Item Name: string (nullable = true)
 |-- Quantity: string (nullable = true)
 |-- Product Price: string (nullable = true)
 |-- Total products: string (nullable = true)



In [0]:
res_two_orders.printSchema()

root
 |-- Order ID: string (nullable = true)
 |-- Order Date: string (nullable = true)
 |-- Item Name: string (nullable = true)
 |-- Quantity: string (nullable = true)
 |-- Product Price: string (nullable = true)
 |-- Total products: string (nullable = true)



From the above we see that the two tables have the same layout and hold data in the same formats; the only difference is the first column header ('Order Number' versus 'Order ID'). However, every column in both tables is typed as a string. We will therefore rename the 'Order Number' column in restaurant one's orders table (res_one_orders) to 'Order ID', standardize the capitalization of 'Total products', and cast each column to an appropriate data type.

In [0]:
res_one_orders = res_one_orders.withColumnRenamed("Order Number", "Order ID")
res_one_orders = res_one_orders.withColumnRenamed("Total products", "Total Products")
res_two_orders = res_two_orders.withColumnRenamed("Total products", "Total Products")
res_one_orders.show()

+--------+----------------+--------------------+--------+-------------+--------------+
|Order ID|      Order Date|           Item Name|Quantity|Product Price|Total Products|
+--------+----------------+--------------------+--------+-------------+--------------+
|   16118|03/08/2019 20:25|       Plain Papadum|       2|          0.8|             6|
|   16118|03/08/2019 20:25|    King Prawn Balti|       1|        12.95|             6|
|   16118|03/08/2019 20:25|         Garlic Naan|       1|         2.95|             6|
|   16118|03/08/2019 20:25|       Mushroom Rice|       1|         3.95|             6|
|   16118|03/08/2019 20:25| Paneer Tikka Masala|       1|         8.95|             6|
|   16118|03/08/2019 20:25|       Mango Chutney|       1|          0.5|             6|
|   16117|03/08/2019 20:17|          Plain Naan|       1|          2.6|             7|
|   16117|03/08/2019 20:17|       Mushroom Rice|       1|         3.95|             7|
|   16117|03/08/2019 20:17|Tandoori Chicken

In [0]:
from pyspark.sql.functions import col, rtrim, ltrim

# Strip leading and trailing whitespace from the order identifiers
res_one_orders = res_one_orders.withColumn("Order ID", rtrim(ltrim(col("Order ID"))))
res_two_orders = res_two_orders.withColumn("Order ID", rtrim(ltrim(col("Order ID"))))

# Do the same for item names so identical items group together
res_one_orders = res_one_orders.withColumn("Item Name", rtrim(ltrim(col("Item Name"))))
res_two_orders = res_two_orders.withColumn("Item Name", rtrim(ltrim(col("Item Name"))))

In [0]:
from pyspark.sql.functions import col, to_timestamp

# Parse the day-first 'Order Date' strings (e.g. 03/08/2019 20:25) into timestamps, in place
res_one_orders = res_one_orders.withColumn("Order Date", to_timestamp(col("Order Date"), "dd/MM/yyyy HH:mm"))
res_two_orders = res_two_orders.withColumn("Order Date", to_timestamp(col("Order Date"), "dd/MM/yyyy HH:mm"))
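The pattern passed to `to_timestamp` reads the strings day first (`dd/MM`), which matters because a value such as 03/08/2019 is also a valid month-first date. The plain-Python sketch below illustrates the difference on the first 'Order Date' value shown earlier. It uses `datetime.strptime` as a stand-in for the Spark parser, so the format codes are Python's rather than Spark's.

```python
from datetime import datetime

# First 'Order Date' value from the res_one_orders preview
raw = "03/08/2019 20:25"

# Day-first reading, the Python analogue of Spark's "dd/MM/yyyy HH:mm"
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M")

# Month-first reading, shown only to illustrate the wrong interpretation
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M")

print("day first:  ", day_first.isoformat())
print("month first:", month_first.isoformat())
```

The day-first result falls on 3 August 2019, which matches the timestamps in the cleaned tables further down.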

In [0]:
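# Check for non-whole numbers in 'Quantity' and 'Total Products'.
# Caveat: select() returns one row per input row whatever the condition evaluates to,
# so the two counts are always equal and this comparison is True for any data.
res_one_orders.select(res_one_orders['Quantity'] % 1 != 0).count() == res_one_orders.count()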

Out[200]: True

In [0]:
res_two_orders.select(res_two_orders["Quantity"] % 1 != 0).count() == res_two_orders.count()

Out[201]: True

In [0]:
res_one_orders.select(res_one_orders["Total Products"] % 1 != 0).count() == res_one_orders.count()

Out[202]: True

In [0]:
res_two_orders.select(res_two_orders["Total Products"] % 1 != 0).count() == res_two_orders.count()

Out[203]: True
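The four checks above are meant to confirm that 'Quantity' and 'Total Products' hold only whole numbers before they are cast to integers. A check that can actually fail has to count the rows that violate the condition (in Spark, by filtering on it rather than selecting it). The per-value logic can be sketched in plain Python as follows, assuming the values arrive as strings as in the schema above. The helper `is_whole_number` and the sample values are hypothetical and not taken from the data.

```python
from decimal import Decimal, InvalidOperation


def is_whole_number(value: str) -> bool:
    """Return True if the string parses as a number with no fractional part."""
    try:
        return Decimal(value.strip()) % 1 == 0
    except InvalidOperation:
        # Not a number (or not a finite one), so it cannot be cast safely
        return False


# Hypothetical sample: three clean values, one fraction, one non-numeric entry
sample_quantities = ["2", "1", "1.0", "1.5", "abc"]

# Collect the values that would be altered or lost by a cast to int
offenders = [q for q in sample_quantities if not is_whole_number(q)]
print(offenders)
```

An empty `offenders` list is the condition that makes the integer cast safe.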

In [0]:
# Cast the count columns from string to integer
res_one_orders = res_one_orders.withColumn("Quantity", res_one_orders['Quantity'].cast('int'))
res_two_orders = res_two_orders.withColumn("Quantity", res_two_orders['Quantity'].cast('int'))
res_one_orders = res_one_orders.withColumn("Total Products", res_one_orders['Total Products'].cast('int'))
res_two_orders = res_two_orders.withColumn("Total Products", res_two_orders['Total Products'].cast('int'))

In [0]:
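# Cast prices from string to float (a 32-bit type in Spark, so values such as 0.8 are stored approximately)
res_one_orders = res_one_orders.withColumn("Product Price", res_one_orders['Product Price'].cast('float'))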
res_two_orders = res_two_orders.withColumn("Product Price", res_two_orders['Product Price'].cast('float'))
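Spark's 'float' type is a 32-bit floating-point number, so most prices cannot be stored exactly, and the small errors show up as long trailing digits once prices are summed. The sketch below reproduces the effect in plain Python by round-tripping the six product prices of order 16118 through 32-bit storage with `struct`, then compares the result with an exact `Decimal` total. The helper `as_float32` is hypothetical. A wider or exact Spark type such as 'double' or 'decimal(10,2)' would be one way to reduce the effect, though that was not tested here.

```python
import struct
from decimal import Decimal


def as_float32(x: float) -> float:
    """Round-trip a Python float through 32-bit storage, as a 'float' column would."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Product prices of the six items in order 16118, kept as strings like the raw CSV
prices = ["0.8", "12.95", "2.95", "3.95", "8.95", "0.5"]

# Sum after 32-bit storage versus exact decimal arithmetic
float32_total = sum(as_float32(float(p)) for p in prices)
decimal_total = sum(Decimal(p) for p in prices)

print("0.8 stored as float32:", as_float32(0.8))
print("float32 total:", float32_total)
print("decimal total:", decimal_total)
```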

In [0]:
res_one_orders.printSchema()
res_two_orders.printSchema()

root
 |-- Order ID: string (nullable = true)
 |-- Order Date: timestamp (nullable = true)
 |-- Item Name: string (nullable = true)
 |-- Quantity: integer (nullable = true)
 |-- Product Price: float (nullable = true)
 |-- Total Products: integer (nullable = true)

root
 |-- Order ID: string (nullable = true)
 |-- Order Date: timestamp (nullable = true)
 |-- Item Name: string (nullable = true)
 |-- Quantity: integer (nullable = true)
 |-- Product Price: float (nullable = true)
 |-- Total Products: integer (nullable = true)



In [0]:
display(res_one_orders)

Order ID,Order Date,Item Name,Quantity,Product Price,Total Products
16118,2019-08-03T20:25:00.000+0000,Plain Papadum,2,0.8,6
16118,2019-08-03T20:25:00.000+0000,King Prawn Balti,1,12.95,6
16118,2019-08-03T20:25:00.000+0000,Garlic Naan,1,2.95,6
16118,2019-08-03T20:25:00.000+0000,Mushroom Rice,1,3.95,6
16118,2019-08-03T20:25:00.000+0000,Paneer Tikka Masala,1,8.95,6
16118,2019-08-03T20:25:00.000+0000,Mango Chutney,1,0.5,6
16117,2019-08-03T20:17:00.000+0000,Plain Naan,1,2.6,7
16117,2019-08-03T20:17:00.000+0000,Mushroom Rice,1,3.95,7
16117,2019-08-03T20:17:00.000+0000,Tandoori Chicken (1/4),1,4.95,7
16117,2019-08-03T20:17:00.000+0000,Vindaloo - Lamb,1,7.95,7


In [0]:
display(res_two_orders)

Order ID,Order Date,Item Name,Quantity,Product Price,Total Products
25583,2019-08-03T21:58:00.000+0000,Tandoori Mixed Grill,1,11.95,12
25583,2019-08-03T21:58:00.000+0000,Madras Sauce,1,3.95,12
25583,2019-08-03T21:58:00.000+0000,Mushroom Rice,2,3.95,12
25583,2019-08-03T21:58:00.000+0000,Garlic Naan,1,2.95,12
25583,2019-08-03T21:58:00.000+0000,Paratha,1,2.95,12
25583,2019-08-03T21:58:00.000+0000,Plain Rice,1,2.95,12
25583,2019-08-03T21:58:00.000+0000,Prawn Puree,1,4.95,12
25583,2019-08-03T21:58:00.000+0000,Plain Papadum,1,0.8,12
25583,2019-08-03T21:58:00.000+0000,Mango Chutney,2,0.5,12
25583,2019-08-03T21:58:00.000+0000,Onion Chutney,1,0.5,12


## Data Analysis

### Seasonality

In [0]:
from pyspark.sql.functions import dayofweek
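# Restaurant one only. Line revenue = quantity x unit price; dayofweek() numbers days from 1 (Sunday) to 7 (Saturday)
revenue_over_time = res_one_orders.withColumn("Total Price", col("Quantity") * col("Product Price")).withColumn("Day Of Week", dayofweek(col("Order Date")))
# sum() with no arguments totals every numeric column; sum(Product Price) and sum(Day Of Week) are by-products with no useful interpretation
display(revenue_over_time.groupBy("Day Of Week").sum())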

Day Of Week,sum(Quantity),sum(Product Price),sum(Total Products),sum(Total Price),sum(Day Of Week)
1,13769,59431.74911546707,74977,67316.94906365871,11112
6,21357,89918.99865031242,121335,100681.14859724043,102972
3,7742,34149.29949915409,40256,38262.649486243725,18999
5,9317,40458.59939599037,48613,46177.54937493801,37555
4,8487,36524.89945584536,44223,41376.199437618256,27280
7,24108,100808.59852343798,146827,112513.64848744868,136115
2,8261,34232.59949231148,42365,40189.29949456453,12870


Databricks visualization. Run in Databricks to view.
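The grouped output above is unordered and identifies days only by number. As a reading aid, the sketch below copies the sum(Total Price) column for restaurant one (rounded to two decimals), labels each day using Spark's `dayofweek` convention of 1 = Sunday through 7 = Saturday, and ranks the days by revenue. The names `revenue_by_day`, `DAY_NAMES` and `weekend_share` are introduced here for illustration only.

```python
# sum(Total Price) per 'Day Of Week' for restaurant one, copied from the output above
revenue_by_day = {
    1: 67316.95,
    6: 100681.15,
    3: 38262.65,
    5: 46177.55,
    4: 41376.20,
    7: 112513.65,
    2: 40189.30,
}

# Spark's dayofweek() convention: 1 = Sunday ... 7 = Saturday
DAY_NAMES = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

total = sum(revenue_by_day.values())

# Rank days from highest to lowest revenue
ranked = sorted(revenue_by_day.items(), key=lambda kv: kv[1], reverse=True)
for day, revenue in ranked:
    print(f"{DAY_NAMES[day]:<9} {revenue:>10.2f} {revenue / total:6.1%}")

# Share of weekly revenue earned on Friday, Saturday and Sunday
weekend_share = sum(revenue_by_day[d] for d in (6, 7, 1)) / total
print(f"Friday to Sunday share: {weekend_share:.1%}")
```

Read this way, Saturday and Friday are the strongest days for restaurant one, followed by Sunday, and the three together account for roughly 63% of the revenue in this table.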

### Facility Management