# LBA: Grocery store prices
In this assignment, we model the cost of groceries in different parts of the world. To what extent do
grocery prices vary by country and store brand? Are grocery prices and the geographical
distribution of different grocery stores correlated with other cost-of-living measures, such as
rent and real estate prices?

## Task 1: Prices
What is the basic average price for each product? You need to think carefully about how to
anchor the basic price for each product, since this will depend on the currency used as well
as the distribution of prices.

- Anchor prices in euros (EUR).
- Anchor quantities at units that make sense for each category and are base-10 metric, e.g. milk in l, apples and potatoes in kg, meat and butter in 0.1 kg.
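
The quantity anchoring above amounts to dividing each price by the quantity bought and multiplying by the target unit. The following is a minimal sketch of that arithmetic, not the final implementation: `TARGET_UNITS` and `price_per_target_unit` are hypothetical names, and the target units simply follow the examples in the list (with chicken treated as meat).

```python
# Hypothetical target units per product, in kg or l (assumed from the list above).
TARGET_UNITS = {
    "milk": 1.0,
    "apples": 1.0,
    "potatoes": 1.0,
    "butter": 0.1,
    "chicken": 0.1,
}


def price_per_target_unit(price, quantity, target_unit):
    """Scale a price paid for `quantity` units to the price of `target_unit` units."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return price / quantity * target_unit


# Example: 0.25 kg of butter for 2.99 (in the local currency), anchored at 0.1 kg.
butter = price_per_target_unit(2.99, 0.25, TARGET_UNITS["butter"])
print(round(butter, 3))
```

Currency conversion would then be applied to the anchored price, so that all products are compared in EUR per target unit.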

In [5]:
# Test a currency conversion package
# pip install CurrencyConverter
from currency_converter import CurrencyConverter
from datetime import date

c = CurrencyConverter()
# Convert 100 USD to EUR at the 2020-09-24 exchange rate
c.convert(100, 'USD', 'EUR', date=date(2020, 9, 24))

85.87376556462

## Task 2: Factors influencing prices
How much does each of the following factors modify the basic price of the product (up or
down)?
- The geographical location (country) of the grocery store.
- Brand of the grocery store. Since we are getting data from multiple countries, you will need to specify whether the store brand is considered budget (cheap), mid-range, or luxury (expensive). This should be based on what you think the general public perception of the store brand is.
- Does price variation by geographical location correlate with variation in rental prices, or
not?

Explain in your report how strong each of these effects is. Which has the greatest influence
on price variation between shops?
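
For the rent question, one simple measure is the Pearson correlation between the normalized price at each store and the rent reported for that location. The sketch below is only an illustration under assumptions: `pearson_r` is a hypothetical helper, and the numbers are made up for the example rather than taken from the survey.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Illustrative values only (not survey data): rent in EUR, price in EUR per kg.
rents = [500.0, 750.0, 1000.0, 1250.0]
prices = [1.0, 1.5, 2.0, 2.5]
r = pearson_r(rents, prices)
print(r)
```

A value of `r` near 1 or -1 would indicate a strong linear relationship, a value near 0 a weak one. With only a few stores per location, the estimate will be noisy.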

Notes:
- Two categorical variables
    - country
    - grocery store rating
- one numeric variable: rent price
- Determine significance and size of the variation between categories
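
As a rough way to quantify the size of the variation between categories, one option is eta squared: the share of total price variance explained by the grouping. This is a sketch under assumptions, not the method the assignment prescribes: `eta_squared` is a hypothetical helper, and the prices are invented for illustration.

```python
def eta_squared(groups):
    """Share of total variance explained by group membership.

    `groups` maps a category label to a list of prices in that category.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("all prices are identical; effect size is undefined")
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Illustrative prices only (not survey data), grouped by store rating.
by_rating = {
    "budget": [1.0, 1.2],
    "mid-range": [1.5, 1.7],
    "luxury": [2.0, 2.2],
}
eta = eta_squared(by_rating)
print(eta)
```

Eta squared describes effect size only. Significance would need a separate test, for example a one-way ANOVA such as `scipy.stats.f_oneway`, assuming SciPy is available.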

In [7]:
import numpy as np

In [14]:
# I downloaded the data as TSV to avoid clashes with commas in people's responses
data = np.loadtxt("LBAData.tsv", delimiter="\t", dtype=object)

In [21]:
# Dimensions (not displayed, since a cell only echoes its last expression)
np.shape(data)
# Example datapoint
print(data[4])

['10/23/2020 18:40:00' 'katja.dellalibera@minerva.kgi.edu'
 'Katja Della Libera' 'Germany' 'EUR'
 'Alnatura Super Natur Markt, Münzgasse 4A, 78462 Konstanz'
 'Luxury (expensive)' '1000' '1' '3.99' '1' '3.99' '1' '3.99' '1' '2.29'
 '' '' '' '' '0.1' '0.79' '1' '5.99' '0.1' '0.99' '1' '2.49' '2' '4.99'
 '2' '5.99' '1' '1.49' '1' '0.95' '1' '1.49' '1' '3.99' '1' '5.99' '1'
 '7.69' '1' '1.49' '1' '1.29' '1' '1.15' '0.25' '2.99' '0.25' '2.49'
 '0.25' '2.59' '10' '4.29' '10' '1.99' '1' '0.45' '1' '32.9' '' '' '' '']


In [40]:
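products = ['apples', 'bananas', 'tomatoes', 'potatoes', 'flour', 'rice', 'milk', 'butter', 'eggs', 'chicken']
# Column positions of the store-level fields in each row
indexes = {'country': 3,
           'currency': 4,
           'category': 6,
           'rent': 7}

# Product columns start at 8. Each product has 3 (quantity, price) pairs,
# i.e. 6 columns per product. Store the index of each quantity column;
# the matching price is in the next column.
for no, prod in enumerate(products):
    for i in range(3):
        indexes[prod + str(i)] = 8 + 6*no + i*2
print(indexes)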

{'country': 3, 'currency': 4, 'category': 6, 'rent': 7, 'apples0': 8, 'apples1': 10, 'apples2': 12, 'bananas0': 14, 'bananas1': 16, 'bananas2': 18, 'tomatoes0': 20, 'tomatoes1': 22, 'tomatoes2': 24, 'potatoes0': 26, 'potatoes1': 28, 'potatoes2': 30, 'flour0': 32, 'flour1': 34, 'flour2': 36, 'rice0': 38, 'rice1': 40, 'rice2': 42, 'milk0': 44, 'milk1': 46, 'milk2': 48, 'butter0': 50, 'butter1': 52, 'butter2': 54, 'eggs0': 56, 'eggs1': 58, 'eggs2': 60, 'chicken0': 62, 'chicken1': 64, 'chicken2': 66}


In [43]:
data[4][indexes['bananas1']]

''
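
The empty string above marks a missing response, so any numeric conversion has to skip such cells instead of failing on them. A minimal sketch of one way to do this, assuming the column layout described earlier (quantity at an index, price in the next column); `parse_number` and `quantity_and_price` are hypothetical helpers.

```python
def parse_number(cell):
    """Return the cell as a float, or None if it is empty or not numeric."""
    text = str(cell).strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def quantity_and_price(row, quantity_index):
    """Return (quantity, price) for one product slot, or None if unusable."""
    quantity = parse_number(row[quantity_index])
    price = parse_number(row[quantity_index + 1])
    if quantity is None or price is None or quantity <= 0:
        return None
    return quantity, price


# A short row shaped like part of the example datapoint: one filled slot,
# one missing slot, one filled slot.
row = ['1', '2.29', '', '', '0.1', '0.79']
print(quantity_and_price(row, 0))
print(quantity_and_price(row, 2))
```

Returning `None` for unusable slots keeps the decision about dropping or imputing missing prices in one place.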

In [53]:
# Attempt to convert the column to numbers; `num` is neither a built-in nor a NumPy function
num(data[1:,indexes['bananas0']])

NameError: name 'num' is not defined

In [65]:
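def get_normalized_price(i, target_unit):
    """Return the prices in column i+1, scaled to target_unit and converted to EUR."""
    c = CurrencyConverter()
    # Quantities are in column i, prices in column i+1 (row 0 is the header)
    units = np.array(data[1:, i], dtype=float)
    price = np.array(data[1:, i+1], dtype=float)
    # Price per target unit, still in the original currency
    normalized_price = price / units * target_unit
    normalized_price_EUR = []
    for no, x in enumerate(normalized_price):
        original_currency = str(data[no+1, indexes['currency']])
        # Convert each normalized price to EUR
        normalized_price_EUR.append(c.convert(x, original_currency, 'EUR', date=date(2020, 9, 24)))
    return normalized_price_EUR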

In [66]:
get_normalized_price(indexes['bananas0'], 1)

ValueError: MAD is not a supported currency
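
The error above shows that the converter does not cover every currency in the survey. One possible workaround is to fall back to manually supplied rates for those currencies. The sketch below is a hedged illustration: `to_eur`, `stub_convert`, and `MANUAL_RATES` are hypothetical names, the stub stands in for the real converter so the example runs without the package, and the MAD rate is a placeholder, not a real exchange rate.

```python
def to_eur(amount, currency, convert, manual_rates):
    """Convert `amount` to EUR, using `manual_rates` when `convert` rejects the currency."""
    try:
        return convert(amount, currency)
    except ValueError:
        if currency not in manual_rates:
            raise  # no fallback known: surface the original error
        return amount * manual_rates[currency]


def stub_convert(amount, currency):
    """Stand-in for the real converter; only knows EUR and raises otherwise."""
    supported = {"EUR": 1.0}
    if currency not in supported:
        raise ValueError(f"{currency} is not a supported currency")
    return amount * supported[currency]


# Placeholder value for illustration only, NOT a real exchange rate.
MANUAL_RATES = {"MAD": 0.1}

converted = to_eur(10.0, "MAD", stub_convert, MANUAL_RATES)
print(converted)
```

With the real package, `convert` could be a small wrapper such as `lambda a, cur: c.convert(a, cur, 'EUR', date=date(2020, 9, 24))`, and the manual rates would need to be looked up for the same date.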

In [56]:
data[:,indexes['bananas0']]

array(['Product 1 quantity (kg)', '1', '1', '1', '1', '1', '1', '1', '1',
       '0.9', '0.118', '0.118', '0.45', '1', '1', '0.118', '1', '1',
       '0.4535924', '0.453592', '0.54', '1', '1', '1', '1', '1', '1', '1',
       '1', '1', '1', '1', '1', '0.6', '0.6', '0.78', '1', '1', '0.91',
       '1', '0.118', '1', '1', '1', '1', '0.12', '0.45', '0.45', '0.2',
       '0.5', '1', '3.5', '1', '0.4536', '1.36078', '1.3608', '0.453592',
       '1.36078', '0.453592', '1', '1', '1', '1', '1', '1'], dtype=object)

In [58]:
data[:,indexes['bananas0']+1]

array(['Product 1 price', '1.73', '1.73', '2.35', '2.29', '1.73', '73',
       '0.95', '1.95', '0.64', '0.25', '0.25', '0.59', '1.4', '2.2',
       '0.19', '1.05', '0.86', '0.79', '0.79', '0.39', '0.83', '1.95',
       '0.86', '1.64', '1.95', '1.56', '0.73', '0.73', '1.05', '1.95',
       '1.05', '1.64', '1.4', '1.4', '1.25', '1.76', '1.63', '1.79', '14',
       '0.19', '19', '1.56', '14.95', '10', '0.25', '0.69', '0.49',
       '0.29', '25750', '33800', '14900', '1.73', '1.99', '10', '1.99',
       '0.49', '14.7', '0.8', '1.15', '1.64', '0.83', '0.83', '1.96',
       '1.75'], dtype=object)