# Lecture notes: 10/17/2017

We want to read comma-separated data from two files, "retail/products.csv" and "retail/baskets.csv".  Before writing a function to do this, we do some exploratory coding to work through the details.  First, we create an empty dictionary for the inventory, open the products file, and read its contents into a list of lines.

In [1]:
inventory = {}

f = open('retail/products.csv')

lines = f.readlines()

We can look at what was read in and see a few features we want to get rid of: the header line and, at the end of each line, the single character '\n', which is the UNIX line terminator.  (The '\ufeff' at the start of the header is a byte order mark stored in the file; it goes away when we drop the header line.)  Look at the lines list:

In [2]:
lines

['\ufeffproduct ID,description,unit price\n',
 '1234,bananas,0.33\n',
 '33,apples,0.49\n',
 '39,chicken,7.72\n',
 '452,soup,1.5\n',
 '888,potato chips,2.99\n',
 '111,beer,6.99\n',
 '8,newspaper,1\n',
 '6,soap,2.99\n',
 '12,toothpaste,4.99\n',
 '999,coffee,13.99\n']
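The '\ufeff' at the start of the header line above is a Unicode byte order mark (BOM).  It does no harm here because the header is thrown away, but it would matter if we ever needed the first field of the first line.  As a minimal sketch, assuming the file is UTF-8 with a BOM, opening it with encoding='utf-8-sig' strips the mark while reading.  The temporary file below is a hypothetical stand-in for retail/products.csv.

```python
import os
import tempfile

# Hypothetical stand-in for retail/products.csv: UTF-8 text with a leading BOM.
raw = '\ufeffproduct ID,description,unit price\n1234,bananas,0.33\n'.encode('utf-8')

with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
    tmp.write(raw)
    path = tmp.name

# Plain utf-8 keeps the BOM as the first character of the first line.
with open(path, encoding='utf-8') as f:
    with_bom = f.readline()

# utf-8-sig removes the BOM while decoding.
with open(path, encoding='utf-8-sig') as f:
    without_bom = f.readline()

os.remove(path)

print(repr(with_bom))     # '\ufeffproduct ID,description,unit price\n'
print(repr(without_bom))  # 'product ID,description,unit price\n'
```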

We can use slice notation to get the list of lines without the first line:

In [3]:
lines[1:]

['1234,bananas,0.33\n',
 '33,apples,0.49\n',
 '39,chicken,7.72\n',
 '452,soup,1.5\n',
 '888,potato chips,2.99\n',
 '111,beer,6.99\n',
 '8,newspaper,1\n',
 '6,soap,2.99\n',
 '12,toothpaste,4.99\n',
 '999,coffee,13.99\n']

Looking at the first of these lines, we can drop its last character (the newline) with the slice [:-1].  Once we have this, we can use the split() method on strings to break the comma-separated line into a list with one entry per column.

In [6]:
lines[1:][0][:-1].split(',')

['1234', 'bananas', '0.33']
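One caveat with the [:-1] slice: it removes the last character whether or not that character is a newline.  If the final line of a file has no trailing newline, the slice would cut off the last digit of the price.  A small sketch of an alternative, using rstrip('\n') on two made-up sample lines:

```python
# Two sample lines: one ends with a newline, one does not
# (as can happen on the last line of a file).
with_newline = '1234,bananas,0.33\n'
without_newline = '999,coffee,13.99'

# Slicing always drops the last character, newline or not.
sliced = without_newline[:-1].split(',')

# rstrip('\n') removes trailing newlines only if they are present.
stripped_a = with_newline.rstrip('\n').split(',')
stripped_b = without_newline.rstrip('\n').split(',')

print(sliced)      # ['999', 'coffee', '13.9']
print(stripped_a)  # ['1234', 'bananas', '0.33']
print(stripped_b)  # ['999', 'coffee', '13.99']
```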

We can iterate over the lines and do this line by line, parsing each line into a list called "parts".  We interpret parts[0] as the product ID (the key for our dictionary), parts[1] as the description of the product, and parts[2] as the unit price in dollars.

In [14]:
for line in lines[1:]:
    parts = line[:-1].split(',')
    inventory[parts[0]] = (parts[1], float(parts[2]))
    print(parts[0]+' maps to '+str(inventory[parts[0]]))

1234 maps to ('bananas', 0.33)
33 maps to ('apples', 0.49)
39 maps to ('chicken', 7.72)
452 maps to ('soup', 1.5)
888 maps to ('potato chips', 2.99)
111 maps to ('beer', 6.99)
8 maps to ('newspaper', 1.0)
6 maps to ('soap', 2.99)
12 maps to ('toothpaste', 4.99)
999 maps to ('coffee', 13.99)


Look at the dictionary that was created:

In [15]:
inventory

{'111': ('beer', 6.99),
 '12': ('toothpaste', 4.99),
 '1234': ('bananas', 0.33),
 '33': ('apples', 0.49),
 '39': ('chicken', 7.72),
 '452': ('soup', 1.5),
 '6': ('soap', 2.99),
 '8': ('newspaper', 1.0),
 '888': ('potato chips', 2.99),
 '999': ('coffee', 13.99)}
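Note that the keys of this dictionary are strings, because split() returns strings and we never converted the product ID.  A short sketch of what that means for lookups, using a one-entry sample dictionary (the name inventory_sample is only for illustration):

```python
# One-entry excerpt of the inventory dictionary shown above.
inventory_sample = {'1234': ('bananas', 0.33)}

# The value is a (description, unit price) tuple, so it can be unpacked.
description, price = inventory_sample['1234']
print(description, price)         # bananas 0.33

# The integer 1234 is a different key from the string '1234'.
found_as_int = 1234 in inventory_sample
found_as_str = '1234' in inventory_sample
print(found_as_int, found_as_str)  # False True
```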

To clarify the split operation: we can split on other characters too, such as splitting a sentence on spaces to get its list of words.

In [16]:
s = 'i am a sentence'

In [17]:
s.split(' ')

['i', 'am', 'a', 'sentence']
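A related detail: split(' ') splits on every single space, so repeated spaces produce empty strings.  Calling split() with no argument treats any run of whitespace as one separator.  A quick comparison on a made-up sentence:

```python
s2 = 'i  am a   sentence'   # note the repeated spaces

# Splitting on a single space yields empty strings between adjacent spaces.
on_space = s2.split(' ')
print(on_space)        # ['i', '', 'am', 'a', '', '', 'sentence']

# With no argument, split() treats any run of whitespace as one separator.
on_whitespace = s2.split()
print(on_whitespace)   # ['i', 'am', 'a', 'sentence']
```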

We can wrap everything above up into a function that takes a filename and returns the fully populated inventory dictionary.

In [18]:
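def read_inventory(fname):
    inventory = {}
    f = open(fname)
    lines = f.readlines()
    # Skip the header line, then parse one product per line.
    for line in lines[1:]:
        # Drop the trailing newline (assumes every line ends with one).
        parts = line[:-1].split(',')
        # Product ID -> (description, unit price in dollars).
        inventory[parts[0]] = (parts[1], float(parts[2]))
    f.close()
    return inventory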

In [23]:
inventory = read_inventory('retail/products.csv')
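For comparison, Python's standard csv module can do the line splitting for us.  This is not what we did in lecture, just a sketch of the same idea; the function name read_inventory_csv is hypothetical, and the temporary file stands in for retail/products.csv.

```python
import csv
import os
import tempfile

def read_inventory_csv(fname):
    """Hypothetical variant of read_inventory built on csv.reader."""
    inventory = {}
    # utf-8-sig strips a leading byte order mark if there is one;
    # newline='' is what the csv documentation recommends for file objects.
    with open(fname, newline='', encoding='utf-8-sig') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            product_id, description, price = row
            inventory[product_id] = (description, float(price))
    return inventory

# Write a small sample file to try the function on.
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.csv',
                                 newline='', encoding='utf-8') as tmp:
    tmp.write('product ID,description,unit price\n1234,bananas,0.33\n8,newspaper,1\n')
    path = tmp.name

demo_inventory = read_inventory_csv(path)
os.remove(path)

print(demo_inventory)  # {'1234': ('bananas', 0.33), '8': ('newspaper', 1.0)}
```

The csv reader also copes with quoted fields that contain commas, which a plain split(',') would break apart.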

We can do the same thing for baskets.  The basic structure of the function is the same: open the file, read the lines, and parse each line into parts.  The difference is that each basket ID can correspond to multiple product ID / quantity pairs, so updating the sales dictionary is a bit more complex.  We need to check whether we already have an entry in the sales dictionary for the given basket ID.  If we do, we append the product ID / quantity pair to the existing list of items.  If not, we set the value for the basket ID to a new list containing that first pair.

In [20]:
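def read_baskets(fname):
    sales = {}
    f = open(fname)
    lines = f.readlines()
    # Skip the header line, then parse one basket entry per line.
    for line in lines[1:]:
        parts = line[:-1].split(',')
        item = (parts[1], int(parts[2]))  # (product ID, quantity)

        if parts[0] in sales:
            # Basket already seen: add to its list of items.
            sales[parts[0]].append(item)
        else:
            # First item for this basket: start a new list.
            sales[parts[0]] = [item]

    f.close()
    return sales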
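The "check if the key exists, else create the list" pattern is common enough that dictionaries have a method for it.  As a sketch of an alternative to the if/else above, setdefault can do both steps in one line.  The rows below are made-up samples in the same basket ID, product ID, quantity order.

```python
# Hypothetical sample rows in "basket ID,product ID,quantity" form.
rows = ['1,1234,2', '1,33,1', '4,33,1']

sample_sales = {}
for row in rows:
    basket_id, product_id, quantity = row.split(',')
    # setdefault returns the existing list for basket_id, or stores and
    # returns a new empty list if the key is not present yet.
    sample_sales.setdefault(basket_id, []).append((product_id, int(quantity)))

print(sample_sales)  # {'1': [('1234', 2), ('33', 1)], '4': [('33', 1)]}
```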

In [22]:
sales = read_baskets('retail/baskets.csv')

Print the dictionaries to make sure everything worked as expected:

In [24]:
sales

{'1': [('1234', 2), ('33', 1)],
 '2': [('1234', 1), ('39', 2), ('452', 1)],
 '3': [('888', 2), ('111', 1)],
 '4': [('33', 1)],
 '5': [('8', 2), ('1234', 1)],
 '6': [('6', 1), ('33', 2), ('12', 2), ('999', 1)]}

In [25]:
inventory

{'111': ('beer', 6.99),
 '12': ('toothpaste', 4.99),
 '1234': ('bananas', 0.33),
 '33': ('apples', 0.49),
 '39': ('chicken', 7.72),
 '452': ('soup', 1.5),
 '6': ('soap', 2.99),
 '8': ('newspaper', 1.0),
 '888': ('potato chips', 2.99),
 '999': ('coffee', 13.99)}

Now that we've organized our data as dictionaries, we can write functions that work with the dictionaries to compute quantities from the data.  Here is an example: given a basket ID, compute the total cost of the basket - the sum of each product price times its quantity in the basket.

In [26]:
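def total_basket(b_id, inv, sales):
    # List of (product ID, quantity) pairs in this basket.
    b = sales[b_id]

    total = 0.0
    for product_id, quant in b:
        # inv[product_id] is (description, unit price); index 1 is the price.
        total += inv[product_id][1] * quant

    return total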

Try it out:

In [28]:
total_basket('3', inventory, sales)

12.97
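The same calculation can be applied to every basket at once.  Here is a sketch that uses small excerpts of the two dictionaries; the names basket_total, inv_sample and sales_sample are only for illustration.  Totals are rounded to cents because floating point sums can be off in the last digits.

```python
# Small excerpts of the inventory and sales dictionaries shown earlier.
inv_sample = {'888': ('potato chips', 2.99),
              '111': ('beer', 6.99),
              '33': ('apples', 0.49)}
sales_sample = {'3': [('888', 2), ('111', 1)],
                '4': [('33', 1)]}

def basket_total(b_id, inv, sales):
    # Sum of unit price times quantity over the items in one basket.
    return sum(inv[product_id][1] * quant for product_id, quant in sales[b_id])

# Build a dictionary mapping each basket ID to its total, rounded to cents.
totals = {b_id: round(basket_total(b_id, inv_sample, sales_sample), 2)
          for b_id in sales_sample}

print(totals)  # {'3': 12.97, '4': 0.49}
```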

The way we work with the files above is not the only way to do it.  We can use "with" to open the file and bind it to a variable for the duration of a block of code.  The nice part about this is that we don't have to remember to close the file ourselves: when the block ends, the file is closed for us, even if an error occurs inside the block.  The variable f still exists after the block, but it refers to a closed file.

In [None]:
def read_inventory(fname):
    inventory = {}
    
    with open(fname) as f:
        lines = f.readlines()
        for line in lines[1:]:
            parts = line[:-1].split(',')
            inventory[parts[0]] = (parts[1], float(parts[2]))    
    return inventory
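To see that "with" really does close the file, we can check the file object's closed attribute inside and after the block.  A small sketch using a throwaway temporary file (the file contents are made up):

```python
import os
import tempfile

# Create a throwaway file to read back.
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt') as tmp:
    tmp.write('header\nrow\n')
    path = tmp.name

with open(path) as f:
    demo_lines = f.readlines()
    closed_inside = f.closed   # still open inside the block

closed_after = f.closed        # f still exists, but the file is now closed
os.remove(path)

print(closed_inside, closed_after)  # False True
```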