<img src='https://cnet1.cbsistatic.com/img/AvDDpG5BMtTqvvLNZWg3Y54Saq0=/2020/01/08/5e97dfbc-c4ab-456c-a80d-88c86276d945/flipdish-manna-drone.jpg' width='500'>

This notebook takes the input text file and transforms it into three dataframes, one for products, one for warehouses, and one for orders. Then it combines two of the dataframes to make a 3-d array suitable for optimization work.

In [None]:
import numpy as np
import pandas as pd

with open('../input/hashcode-drone-delivery/busy_day.in') as file:
    line_list = file.read().splitlines()

# Data extraction

## Products

The first data frame describes the 400 products. It contains product weights and the inventories at each of the 10 warehouses.

In [None]:
weights = line_list[2].split()
products_df = pd.DataFrame({'weight': weights})

wh_count = int(line_list[3])
wh_endline = (wh_count*2)+4

wh_invs = line_list[5:wh_endline+1:2]
for i, wh_inv in enumerate(wh_invs):
    products_df[f'wh{i}_inv'] = wh_inv.split()

products_df = products_df.astype(int)
products_df
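
To make the slicing above easier to follow, here is a minimal sketch on a made-up input with 3 products and 2 warehouses. Every value in `toy_lines` is invented for illustration. Only the line layout (weights on line 2, warehouse count on line 3, then alternating location and inventory lines) is taken from the cell above.

```python
import pandas as pd

# Hypothetical miniature input: same layout as the real file, invented values.
toy_lines = [
    "10 10 2 50 100",  # first line (not used in this sketch)
    "3",               # number of product types
    "5 7 9",           # product weights
    "2",               # number of warehouses
    "0 0",             # warehouse 0 location
    "4 0 1",           # warehouse 0 inventory
    "5 5",             # warehouse 1 location
    "0 2 3",           # warehouse 1 inventory
    "1",               # number of orders
    "1 1",             # order 0 location
    "2",               # order 0 item count
    "0 2",             # order 0 items
]

toy_wh_count = int(toy_lines[3])
toy_wh_endline = toy_wh_count * 2 + 4  # index of the "number of orders" line

toy_products = pd.DataFrame({"weight": toy_lines[2].split()})
# Inventory lines sit at indices 5, 7, ... and stop before toy_wh_endline.
for i, inv_line in enumerate(toy_lines[5:toy_wh_endline + 1:2]):
    toy_products[f"wh{i}_inv"] = inv_line.split()
toy_products = toy_products.astype(int)
print(toy_products)
```

The step of 2 in the slice skips the location lines, so each pass of the loop sees exactly one warehouse's inventory.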


## Warehouses

This is a simple dataframe with warehouse locations. You could change this to also include products so it matches the Orders dataframe below.

In [None]:
wh_locs = line_list[4:wh_endline:2]
wh_rows = [wl.split()[0] for wl in wh_locs]
wh_cols = [wl.split()[1] for wl in wh_locs]

warehouse_df = pd.DataFrame({'wh_row': wh_rows, 'wh_col': wh_cols}).astype(np.uint16)
warehouse_df

## Orders

This dataframe contains information on orders, including the delivery location and the specific items in each order.

In [None]:
order_locs = line_list[wh_endline+1::3]
o_rows = [ol.split()[0] for ol in order_locs]
o_cols = [ol.split()[1] for ol in order_locs]

orders_df = pd.DataFrame({'row': o_rows, 'col': o_cols})

orders_df[orders_df.duplicated(keep=False)].sort_values('row')

orders_df['product_count'] = line_list[wh_endline+2::3]

order_array = np.zeros((len(orders_df), len(products_df)), dtype=np.uint16)
order_lines = line_list[wh_endline+3::3]
for i, order_line in enumerate(order_lines):
    products = [int(prod) for prod in order_line.split()]
    # Count repeats so an order with several items of one product keeps its quantity
    order_array[i] = np.bincount(products, minlength=len(products_df))

df = pd.DataFrame(data=order_array, columns=['p_' + str(i) for i in range(len(products_df))],
                  index=orders_df.index)

orders_df = orders_df.astype(np.uint16).join(df)
orders_df
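
How repeated product ids on a single order line can be tallied is easiest to see on one small case. This is a minimal sketch with an invented order line and a hypothetical catalogue of 5 products. It uses `np.bincount`, which counts how often each non-negative integer occurs.

```python
import numpy as np

n_products = 5            # hypothetical catalogue size
items_line = "2 0 2 4 2"  # invented order: product 2 appears three times

items = [int(p) for p in items_line.split()]
# bincount tallies each product id; minlength pads the result to the full catalogue.
order_row = np.bincount(items, minlength=n_products).astype(np.uint16)
print(order_row)
print(order_row.sum())    # total number of items in the order
```

Assigning `1` through a fancy index would only flag which products are present, whereas the tally keeps the quantity of each product.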


# EDA

We might as well do basic data exploration with these DFs.

### What is the demand for products? How does it compare to the inventory?

The chart below counts orders by the number of items they contain (`product_count`). Order sizes look fairly evenly distributed, with most falling roughly in the 2 to 10 range.


In [None]:
import holoviews as hv
from holoviews import dim, opts
hv.extension('bokeh')

# Each bar shows how many orders contain a given number of items
chart_opts = {'width': 500,
              'xlabel': "Items per Order",
              'ylabel': "Count of Orders"}

counts = orders_df.product_count \
                  .value_counts() \
                  .sort_index() \
                  .reset_index()
hv.Bars(counts).opts(**chart_opts)

Here is a comparison of supply and demand by product. It's hard to see 400 products on one chart, so I'll check whether demand exceeds supply for any product. It looks like we're good: no product shows a negative surplus.

In [None]:
supply = products_df.drop(columns='weight').sum(axis=1)
supply

demand = orders_df.loc[:, orders_df.columns.str.contains("p_")].sum()
demand

surplus = supply.to_numpy() - demand.to_numpy()
print(np.amin(surplus))


freqs, edges = np.histogram(surplus, 20)
hv.Histogram((edges, freqs)).opts(width=600, xlabel="surplus")
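
If a shortfall did exist, the minimum alone would not say which products are affected. A minimal sketch of listing them, using invented totals for four products (the names `toy_supply` and `toy_demand` are hypothetical):

```python
import numpy as np
import pandas as pd

# Invented totals for four products, indexed by product id.
toy_supply = pd.Series([10, 3, 8, 0])
toy_demand = pd.Series([4, 5, 8, 0])

# Cast to a signed type first so a shortfall shows up as a negative number
# instead of wrapping around in an unsigned dtype.
toy_surplus = toy_supply.astype(np.int64) - toy_demand.astype(np.int64)

# Keep only the products whose demand exceeds supply.
shortfalls = toy_surplus[toy_surplus < 0]
print(shortfalls)
```

The index of `shortfalls` gives the product ids to look at, and the values give the size of each gap.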


Next, let's see what the warehouses have in stock. This could be broken down in more detail, but here it is at a high level. Total inventory doesn't vary much between warehouses, although one warehouse holds a bit less than the rest.


In [None]:
chart_opts = {'width': 500,
              'xlabel': "Warehouse",
              'ylabel': "Total Inventory",
              'yticks': list(range(0,1801,200))}


total_prods = products_df.loc[:, products_df.columns.str.contains("wh")].sum()
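# Plot one bar per warehouse: column names on the x-axis, total stock on the y-axis
hv.Bars((total_prods.index.to_list(), total_prods.to_numpy())).opts(**chart_opts)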

OK, just a couple more things to look at. Here is the distribution of weights for products.

In [None]:
hv.Distribution(products_df.weight).opts(width=500)

And finally we have locations. Here is the grid of warehouses and orders. The density varies quite a bit throughout the grid.

In [None]:
chart_opts = dict(width=600, height=400, alpha=0.7)

customers = hv.Points(orders_df, kdims = ['col', 'row']).opts(**chart_opts)
warehouses = hv.Points(warehouse_df, kdims = ['wh_col', 'wh_row']).opts(size=8, **chart_opts)
customers * warehouses

# N-dimensional arrays

Dataframes work better for EDA than they do for optimization problems like this one. That's where multi-dimensional arrays can be really useful. Here's a 3-d array, indexed by grid row, grid column, and product, that holds the quantity of each product at each grid location at the start (only the warehouse locations are non-zero).

In [None]:
# Axes: grid row, grid column, product id
inventory_array = np.zeros((400, 600, 400), dtype=np.uint16)

wh = warehouse_df.to_numpy()
inv = products_df.drop(columns='weight').T.to_numpy()
inventory_array[wh[:, 0], wh[:, 1]] = inv

inventory_array.sum()
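
The indexed assignment above can be hard to picture at full size, so here is a minimal sketch on an invented 4 x 5 grid with 3 products and 2 warehouses. All names and values are hypothetical.

```python
import numpy as np

# Invented example: a 4 x 5 grid, 3 products, 2 warehouses.
toy_wh = np.array([[0, 1], [3, 4]])         # (row, col) of each warehouse
toy_inv = np.array([[4, 0, 1], [0, 2, 3]])  # one row of stock per warehouse

toy_grid = np.zeros((4, 5, 3), dtype=np.uint16)
# Indexing with the row and column vectors selects one grid cell per
# warehouse; each selected cell receives that warehouse's stock vector.
toy_grid[toy_wh[:, 0], toy_wh[:, 1]] = toy_inv

print(toy_grid[3, 4])  # stock vector at the second warehouse
print(toy_grid.sum())  # total items on the grid

# Going back: grid cells holding any stock are the warehouse locations.
toy_locs = np.argwhere(toy_grid.sum(axis=2) > 0)
print(toy_locs)
```

One caveat of this layout: if two warehouses shared a grid cell, the later assignment would overwrite the earlier one rather than add to it.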

It's pretty easy to go back and forth from the dataframes. As a check, let's see how many items of product 1 are at warehouse 5. It should be 16. We can also make sure the total number of each product from products_df is the same as what's in the 3-d array.

In [None]:
# Look up warehouse 5's grid location instead of hard-coding it
print(inventory_array[wh[5, 0], wh[5, 1], 1],
    np.array_equal(inventory_array.sum(axis=(0, 1)), inv.sum(axis=0)))

It checks out. Good luck with the challenge!