# Homework 8: I/O and data structures practice

## Goal

In this assignment we will practice working with files, more practice with dictionaries, and performing some basic geometric computations. Recall earlier in the course I walked through an example of processing retail data where we
had data about a set of products in a store, and the set of baskets that people bought.  We will work
with that and extend it a little for this assignment.  Our working data will be the following:

- _Product inventory_: This will be a table of products where each row corresponds to one product, and each column represents a property of the objects.  Specifically, our product inventory table will have five columns: a unique product ID, a text description of the product, a unit price in dollars, an X coordinate, and a Y coordinate.

- _Sales data_: This will be a table of records from the point of sale system.  Each row will correspond to an item in a basket.  The columns will be the basket ID, the product ID, product quantity, and the product pick up order.  The product pick up order for a basket containing $n$ items will range from 1 to $n$ and corresponds to the order that the customer picked up the products as they went through the store.

We will also be provided additional parameters about the store.  Specifically, we will be given the X,Y
coordinates of the entry door, the checkout stand, and the exit door.  We are going to assume that this
store has only one checkout line and all customers are good at following signs and always use the appropriate door for entry and exit.  We are also going to assume that customers have the magical ability to teleport through shelves so that they can take the shortest path from one product to the next, allowing us to avoid worrying about complex calculations of the distance between any two products.

The coordinates of the entrance, exit, and checkout are:

In [22]:
import math

In [23]:
store_entrance = (10,0)
store_exit = (90,0)
store_checkout = (50,10)

## Sample I/O code

To get you started, here is code that reads in the inventory data from the CSV file and produces a dictionary of dictionaries.  The outer dictionary maps product IDs as strings to the inner dictionary, and the inner dictionary for each product maps an attribute (e.g., 'unit_price') to its value.  Numerical values are stored as floating point numbers, _not strings_.

In [24]:
import csv

In [25]:
def read_inventory(filename):
    # initialize empty dict
    inventory={}
    
    # open the given file and name the file handle f
    with open(filename) as f:
        # create a CSV reader object from the file
        reader = csv.reader(f)
        
        # advance the reader to skip the first header line
        next(reader)
        
        # for each row in the CSV file, create the appropriate
        # entry in the inventory.  this includes converting the
        # strings for price, x, and y into floats so we can do
        # arithmetic with them later.
        for row in reader:
            inventory[row[0]] = { 'desc':row[1],
                                  'unit_price':float(row[2]),
                                  'x':float(row[3]),
                                  'y':float(row[4]) }
            
        return inventory

In [26]:
inventory = read_inventory('inventory.csv')

Test it: what is the price of product ID 4?

In [27]:
inventory['4']['unit_price']

0.39

In [28]:
inventory['2']

{'desc': 'apple', 'unit_price': 0.49, 'x': 4.0, 'y': 70.0}

## Part 1: read the basket data

Complete the following function to return a dictionary mapping basket ID to some data structure of your choice to represent the basket contents.

In [29]:
def read_baskets(filename):
    # initialize the empty dict
    baskets = {}
    
    # open the given file and name the file handle f
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)
        
        items = []
        basket_ids = []
        
        for row in reader:
            items.append(row)
            if row[0] not in basket_ids:
                basket_ids.append(row[0])
                
        for basket_id in basket_ids:
            baskets[basket_id] = []
        
        for item in items: 
            baskets[item[0]].append({
                'product_id': item[1],
                'quantity': float(item[2]),
                'pickup_order': float(item[3])
            })

        return baskets

In [30]:
baskets = read_baskets('baskets.csv')

In [44]:
baskets['1']

[{'pickup_order': 1.0, 'product_id': '1', 'quantity': 2.0},
 {'pickup_order': 2.0, 'product_id': '3', 'quantity': 1.0}]

In [32]:
baskets['1'][0]['product_id']

'1'

In [33]:
baskets['1'][1]['product_id']

'3'

In [34]:
# need to print every item in the 'baskets' dictionary
for i in baskets:
    total_price = 0
    print(" ")
    print("Basket:", i)
    basket_number = str(i)
    for n in baskets[basket_number]:
        # print(n)
        product = str(n['product_id'])
        print( inventory[product]['desc'])
        total_price += inventory[product]['unit_price'] * n['quantity']
    print("Total basket cost:", total_price)

 
Basket: 1
banana
cucumber
Total basket cost: 1.97
 
Basket: 2
banana
onion
cucumber
cheddar cheese
Total basket cost: 17.85
 
Basket: 3
coffee
potato chips
coca cola classic 6-pack
Total basket cost: 21.950000000000003
 
Basket: 4
whole milk
cucumber
salsa
ground beef 1lb
whole wheat fig bars
taco shells
Total basket cost: 29.43
 
Basket: 5
banana
apple
cucumber
onion
Total basket cost: 18.88
 
Basket: 6
onion
cucumber
apple
banana
Total basket cost: 4.23
 
Basket: 7
coffee
taco shells
Total basket cost: 13.49
 
Basket: 8
whole milk
Total basket cost: 1.99
 
Basket: 9
whole milk
whole wheat fig bars
coca cola classic 6-pack
onion
salsa
Total basket cost: 27.5
 
Basket: 10
banana
cucumber
potato chips
whole milk
Total basket cost: 8.44


## Part 2: Calculate the path length of each customer

Given the basket data and inventory data, write a function that calculates the distance traveled by a customer through the store.  Their trip should go entry -> first product -> second product -> ... -> checkout -> exit.  You should assume that the customer takes a straight path from each point to the next.

In [35]:
def dist(a, b):                                 # Distance to points passed to function
    return math.hypot(b[0] - a[0], b[1] - a[1])

def customer_trip(inventory, baskets, basket_id):
                                                # pre-defined: store_entrance, store_exit, store_checkout
    item_locations = []                         # init empty list of item locations
    
    total_distance = 0                          # init dist length for basket
    basket_number = str(basket_id)              # convert basket number is a string
    
    for n in baskets[basket_number]:            # gather products listing and add item location to list
        item = str( inventory[ n['product_id'] ]['desc'] )
        item_location_x = inventory[ n['product_id'] ]['x']
        item_location_y = inventory[ n['product_id'] ]['y']
        item_locations.append( (item_location_x, item_location_y))
 
    # print('List of item locations:', item_locations)                   # print the list of all item locations
    
    total_distance += dist(store_entrance, item_locations[0])          # store_entrance --> first item 
    
    for i in range(1, len(item_locations)):
        total_distance += dist(item_locations[1], item_locations[i-1]) # first item --> second item --> n items
        
    total_distance += dist(item_locations[-1], store_checkout)         # last item --> store_checkout
    
    total_distance += dist(store_checkout, store_exit)                 # store_checkout --> store_exit
    
    return total_distance

In [36]:
for i in baskets:
    distance = customer_trip(inventory, baskets, i)
    print('Total distance for basket is',distance)

Total distance for basket is 215.93949787012758
Total distance for basket is 193.48802396347443
Total distance for basket is 251.81709268444462
Total distance for basket is 352.86596943976394
Total distance for basket is 216.4556448800485
Total distance for basket is 229.47637975508894
Total distance for basket is 244.36564958947673
Total distance for basket is 219.71963427413766
Total distance for basket is 330.140504873586
Total distance for basket is 247.93214532645038


## Part 3: Calculate the total price for each basket

Given the basket and inventory data, write a function that calculates the total cost of a basket.

In [37]:
def basket_total(inventory, baskets, basket_id):
    
    total_price = 0
    basket_number = str(basket_id)
    
    for n in baskets[basket_number]:
        # print(n)
        product = str(n['product_id'])
        total_price += inventory[product]['unit_price'] * n['quantity']

    # print("Total basket cost:", total_price)
    return total_price

In [38]:
for i in baskets:
    price = basket_total(inventory, baskets, i)
    print('Total basket cost is', price)

Total basket cost is 1.97
Total basket cost is 17.85
Total basket cost is 21.950000000000003
Total basket cost is 29.43
Total basket cost is 18.88
Total basket cost is 4.23
Total basket cost is 13.49
Total basket cost is 1.99
Total basket cost is 27.5
Total basket cost is 8.44


## Part 4: Calculate the price per unit of distance traveled for all baskets

For each basket we have a distance traveled and a total price.  Write a function that returns a dictionary mapping the basket ID to the price per unit distance travelled.

In [43]:
price_per_unit_trav = {}
for i in baskets:
    distance = customer_trip(inventory, baskets, i)
    print(distance)
    price = basket_total(inventory, baskets, i)
    print(price)
    div = price / distance
    print(div)
    
    price_per_unit_trav[ i[0] ] = { 'ratio':div }

print(price_per_unit_trav)

    


215.93949787012758
1.97
0.0091229257242453
193.48802396347443
17.85
0.0922537717547295
251.81709268444462
21.950000000000003
0.08716644198376892
352.86596943976394
29.43
0.08340277201206237
216.4556448800485
18.88
0.08722341249387411
229.47637975508894
4.23
0.018433269709564496
244.36564958947673
13.49
0.055204158287642274
219.71963427413766
1.99
0.00905699668841218
330.140504873586
27.5
0.08329786740506143
247.93214532645038
8.44
0.03404157209581321
{'1': {'ratio': 0.03404157209581321}, '2': {'ratio': 0.0922537717547295}, '3': {'ratio': 0.08716644198376892}, '4': {'ratio': 0.08340277201206237}, '5': {'ratio': 0.08722341249387411}, '6': {'ratio': 0.018433269709564496}, '7': {'ratio': 0.055204158287642274}, '8': {'ratio': 0.00905699668841218}, '9': {'ratio': 0.08329786740506143}}


## Part 5: EXTRA CREDIT.  

### Calculate the difference between the length of the path each customer took versus the shortest path they could have taken.

Each customer may have travelled the store inefficiently.  We would like to know the excess distance each customer covered versus what they could have done had they planned their trip more carefully.  Write a function that calculates the shortest path that a customer could have taken.

In [40]:
def customer_shortest_trip(inventory, baskets, basket_id):
    # fill me in
    pass