## Introduction

Imagine you work for NV-WholeSale, a wholesale food products distribution company with hundreds of locations spread across the United States. NV-WholeSale offers low warehouse prices for quality products. As a result, most restaurants in the vicinity purchase groceries, raw materials, disposable kitchenware, and other essentials in bulk from NV-WholeSale through online orders. The company has established Distribution Centers (DCs) in most cities, and raw materials are transported daily from these DCs to restaurants, commercial clients, and healthcare facilities.

As a member of the Logistics Optimization team, your role is to develop software that facilitates the efficient scheduling and delivery of products to customers.

## Data Analysis

We have three <mark>.csv</mark> files representing important building blocks for data modeling.

1. orders.csv: order details and other related information
2. vehicles.csv: information about vehicles available for transportation of goods
3. depot.csv: distribution centers information

We use [RAPIDS cuDF](https://rapids.ai/), the GPU equivalent of pandas, for data analysis. cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers and data scientists, so they can accelerate their workflows without going into the details of CUDA programming. cuDF speeds up workflows on large datasets by leveraging the GPU(s). To understand why cuDF performs well on GPUs, feel free to refer to this [blog](https://developer.nvidia.com/blog/pandas-dataframe-tutorial-beginners-guide-to-gpu-accelerated-dataframes-in-python/).
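
To make the "pandas-like API" point concrete, here is a minimal sketch of a filter followed by a group-wise aggregation. It is written against pandas so that it runs on any machine; the toy data and column names are hypothetical and are not taken from the course CSV files. On a GPU machine the same calls are expected to work after replacing the import with `cudf`, but treat that as an assumption to verify in your environment.

```python
import pandas as pd  # assumption: on a GPU machine, `import cudf as pd` would be the swap

# Toy data; the column names are hypothetical, not from orders.csv.
df = pd.DataFrame({
    "city": ["Dallas", "Austin", "Dallas", "Houston"],
    "weight_kg": [120.0, 80.0, 45.5, 60.0],
})

# Keep only the rows heavier than 50 kg.
heavy = df[df["weight_kg"] > 50]

# Sum the weights per city.
total_by_city = df.groupby("city")["weight_kg"].sum()

print(len(heavy))
print(total_by_city["Dallas"])
```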

In [None]:
# DO NOT CHANGE THIS CELL
# import dependencies
import pandas as pd
import cudf as cudf
import time

In [None]:
data_path = "data/"

In [None]:
%%time
# DO NOT CHANGE THIS CELL
# loading a sample dataframe with pandas

pd.read_csv(data_path+"custom_2020.csv")

In [None]:
%%time
# DO NOT CHANGE THIS CELL
# loading the same dataframe with cuDF

cudf.read_csv(data_path+"custom_2020.csv")
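
The `%%time` magic only works inside a notebook. As a rough sketch of the same kind of measurement in a plain Python script, `time.perf_counter` can be wrapped around the load. The example below uses pandas and a small in-memory CSV (made-up data) so that it runs without the course files. To time cuDF instead, the assumption is that you would call `cudf.read_csv` on a file path in the same place. A single measurement can be noisy, and the first GPU call may include one-time initialization cost, so repeat the measurement before drawing conclusions.

```python
import io
import time

import pandas as pd

# Build a small in-memory CSV so the sketch runs without any data files.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))

# Measure wall-clock time around the load.
start = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text))
elapsed = time.perf_counter() - start

print(f"Loaded {len(df)} rows in {elapsed:.4f} s")
```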

Data modeling is an important step before using cuOpt. To refresh your data analysis skills, complete the following exercises as a warm-up.

Note: If you are familiar with pandas, please skip to the next notebook.

In [None]:
# DO NOT CHANGE THIS CELL
# loading data from .csv to dataframes

orders_df = cudf.read_csv(data_path+"orders.csv")
depot_df = cudf.read_csv(data_path+"depot.csv")
vehicles_df = cudf.read_csv(data_path+"vehicles.csv")

# To view the orders dataframe
orders_df.head()

In [None]:
# Find total number of orders
n_orders = <<<<FIXME>>>> # Add your code here
n_orders

In [None]:
# EXERCISE
# Similarly, find total number of depots and total number of vehicles
n_depot = <<<<FIXME>>>> # Add your code here
n_vehicles = <<<<FIXME>>>> # Add your code here

n_depot, n_vehicles

In [None]:
# DO NOT CHANGE THIS CELL

# To view the depot dataframe
depot_df.head()

You will see different vehicle attributes in <mark>vehicles_df</mark>.

1. <mark>vehicle_id</mark>
2. <mark>vehicle_type</mark>

In [None]:
# DO NOT CHANGE THIS CELL

# To view the vehicles dataframe
vehicles_df.head()

In [None]:
# DO NOT CHANGE THIS CELL
# Find out the number of unique vehicle IDs. It should equal the number of vehicles, since each vehicle has a unique ID

vehicle_ids = vehicles_df['vehicle_id'].unique()
len(vehicle_ids)

In [None]:
# DO NOT CHANGE THIS CELL
# Find out the carrying capacity of trucks

truck_capacity = vehicles_df[vehicles_df['vehicle_type'] == "Truck"]["vehicle_capacity"].unique()
truck_capacity
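
The pattern in the cell above (filter rows by a condition, then take the distinct values of one column) can be sketched on toy data as follows. The table, column names, and values are made up for illustration, and the sketch uses pandas so it runs without a GPU. It also shows `nunique()`, which returns the count of distinct values rather than the values themselves.

```python
import pandas as pd

# Toy stand-in for a vehicles table; names and values are hypothetical.
fleet = pd.DataFrame({
    "kind": ["Bike", "Bike", "Cart", "Cart", "Cart"],
    "capacity": [10, 10, 40, 40, 55],
})

# Distinct capacities among rows whose kind is "Cart".
cart_capacities = fleet[fleet["kind"] == "Cart"]["capacity"].unique()

# Number of distinct kinds in the table.
n_kinds = fleet["kind"].nunique()

print(sorted(cart_capacities.tolist()), n_kinds)
```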

In [None]:
# EXERCISE
# How many types of vehicles?
vehicle_types = <<<<FIXME>>>> # Add your code here

# What is the maximum distance an 'EV Van' can go?
ev_van_max_distance = <<<<FIXME>>>> # Add your code here

print("Vehicle Types = ", vehicle_types)
print("Maximum Distance for an EV Van = ", ev_van_max_distance)

In [None]:
# EXERCISE
# Find out the different order types
order_types = <<<<FIXME>>>> # Add your code here

# HINT: Use the pandas-style min and std methods
# Find out the minimum order weight
min_wt = <<<<FIXME>>>> # Add your code here

# Find out the standard deviation of the order weights
std_wt = <<<<FIXME>>>> # Add your code here

order_types, min_wt, std_wt
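
One detail worth knowing when using `std`: in pandas it computes the sample standard deviation (dividing by n - 1) unless `ddof=0` is passed. The sketch below shows the difference on a small made-up series. Whether your cuDF version uses the same default is an assumption to check against its documentation.

```python
import pandas as pd

# Made-up values chosen so the population standard deviation is exactly 2.
weights = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

lowest = weights.min()
sample_std = weights.std()            # pandas default ddof=1: divide by n - 1
population_std = weights.std(ddof=0)  # divide by n instead

print(lowest, sample_std, population_std)
```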

In [None]:
# DO NOT CHANGE THIS CELL
# Find out the vehicle IDs of EV vans 

EV_ids= vehicles_df.loc[vehicles_df["vehicle_type"]=="EV Van",["vehicle_id"]]
EV_ids

In [None]:
# DO NOT CHANGE THIS CELL
# Find out the order IDs, service times and order weights for orders with latitude < 32.5 and longitude < -96

orders_df.loc[(orders_df['lat'] < 32.5) & (orders_df['lng'] < -96), ["order_ID", "service_time","order_wt"]]
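
The selection above combines two boolean masks with `&` and passes a list of column names to `.loc`. A minimal sketch of the same idea on toy data follows (pandas, with hypothetical column names and values). Naming each mask separately makes the logic easier to read. When the comparisons are written inline, each one needs its own parentheses because `&` binds more tightly than `<`.

```python
import pandas as pd

# Toy table of delivery stops; names and values are made up.
stops = pd.DataFrame({
    "stop_id": [1, 2, 3, 4],
    "lat": [32.1, 33.0, 32.4, 32.9],
    "lng": [-96.5, -96.8, -95.9, -97.1],
    "minutes": [10, 15, 20, 5],
})

# One boolean mask per condition.
south = stops["lat"] < 32.5
west = stops["lng"] < -96

# Rows where both masks are True, restricted to two columns.
selected = stops.loc[south & west, ["stop_id", "minutes"]]

print(selected)
```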

In [None]:
# EXERCISE
# Find out the order IDs, service times and address of all "Retailer" orders

retailer_order = <<<<FIXME>>>> # Add your code here
retailer_order

In [None]:
# EXERCISE
# Find out the vehicle IDs and the vehicle start and end times of all vehicles of type "Truck"

trucks = <<<<FIXME>>>> # Add your code here
trucks