# Tesco Creative Extension Project

In [433]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [434]:
# load the Tesco grocery data and the ward-level datasets
# (header=2: the column names sit on the third row of these files)
ward_tesco = pd.read_csv('data/tesco/year_osward_grocery.csv')
ward_crime = pd.read_csv('data/crime_wards.csv', header=2)
ward_demographics = pd.read_csv('data/demographics_ward.csv', header=2)
ward_education = pd.read_csv('data/education_ward.csv', header=2)
ward_environment = pd.read_csv('data/environment_ward.csv', header=2)
ward_property = pd.read_csv('data/property_wards.csv', header=2)
ward_total_wellbeing = pd.read_csv('data/total_stats_ward.csv')  # serves, loosely, as a validation set

In [435]:
# drop badly formatted data: rows with missing values, except for
# demographics, where columns with missing values are dropped instead
ward_crime = ward_crime.dropna()
ward_demographics = ward_demographics.dropna(axis=1)
ward_education = ward_education.dropna()
ward_property = ward_property.dropna()

In [436]:
# rename every "New Code" column to "area_id" so the tables can be merged
master_data = {'crime': ward_crime, 'demographics': ward_demographics, 'education': ward_education,
              'environment': ward_environment, 'property': ward_property}

for key in master_data:
    master_data[key].rename(columns= {"New Code":"area_id"}, inplace=True)

We need to create 11 different indicators, plus one more based on the food data. Here is the list:
* Housing
* Income
* Jobs
* Community
* Education
* Environment
* Civic Engagement
* Health
* Life Satisfaction
* Safety
* Work-Life Balance

To build each indicator we first need to define the measures behind it. Let's go step by step:

**Housing**:
- the average number of rooms shared per person, which gives an idea of how densely packed living conditions are -- here called `household_density`
- access to an indoor private flushing toilet
- a way of measuring housing expenditure: the ratio of housing costs to household gross adjusted disposable income

In [453]:
# keep only the most recent data available (the Tesco data is from 2015), which here is the 2011 Census
column_names = ["area_id","Names","All Household spaces - 2011 Census",
                               "Household composition - 2011 Census All Households",
                               "Household composition - 2011 Census Couple household with dependent children",
                               "Household composition - 2011 Census Couple household without dependent children",
                               "Household composition - 2011 Census Lone parent household",
                               "Household composition - 2011 Census One person household",
                               "Household composition - 2011 Census Other multi person household",
                               "Accomodation Type - 2011 Census Whole house or bungalow: Detached",
                               "Accomodation Type - 2011 Census Whole house or bungalow: Semi-detached",
                               "Accomodation Type - 2011 Census Whole house or bungalow: Terraced",
                               "Accomodation Type - 2011 Census Flat, maisonette or apartment"]
housing = pd.DataFrame(data=ward_demographics, 
                       columns=column_names)

# also record the Tesco column names for population, area and density
# (the columns themselves come in through the merge with ward_tesco)
column_names.extend(["population", "area_sq_km", "people_per_sq_km"])

In [454]:
housing = housing.merge(ward_tesco, on='area_id', how='inner')
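
An inner merge silently drops any ward that appears in only one of the two tables. A minimal sketch of how the overlap could be checked first, using two small made-up frames in place of `housing` and `ward_tesco` (the ward codes and values below are illustrative, not taken from the data):

```python
import pandas as pd

# Toy stand-ins for the census and Tesco frames (all values are made up).
census = pd.DataFrame({"area_id": ["E1", "E2", "E3"], "households": [100, 200, 300]})
tesco = pd.DataFrame({"area_id": ["E2", "E3", "E4"], "population": [500, 700, 900]})

# An outer merge with indicator=True adds a "_merge" column saying
# whether each row was found in the left frame, the right frame, or both.
check = census.merge(tesco, on="area_id", how="outer", indicator=True)
merge_counts = check["_merge"].value_counts().to_dict()

# Only the rows labelled "both" survive an inner merge.
inner = census.merge(tesco, on="area_id", how="inner")

print(merge_counts)
print(len(inner))
```

If the `left_only` or `right_only` counts are large on the real data, the inner merge is discarding more wards than expected.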

Calculating some important statistics, such as the average number of people in a household:

In [455]:
household_type = ["Household composition - 2011 Census All Households",
                  "Household composition - 2011 Census Couple household with dependent children",
                  "Household composition - 2011 Census Couple household without dependent children",
                  "Household composition - 2011 Census Lone parent household",
                  "Household composition - 2011 Census One person household",
                  "Household composition - 2011 Census Other multi person household"]
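# estimate the average household size using an assumed size per household type:
# couple with dependent children = 4 (i.e. 2 children), couple without = 2,
# lone parent = 1, one person = 1, other multi-person = 3
# note: counting lone-parent households as 1 leaves out their children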
housing = housing.assign(avg_people_per_household= ((housing[household_type[1]]*4 + housing[household_type[2]]*2 
                        + housing[household_type[3]]+housing[household_type[4]]+3*housing[household_type[5]])
                        / housing[household_type[0]]))
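
To make the weighting concrete, here is the same calculation for a single hypothetical ward. The short keys and the household counts are invented for illustration; the weights mirror the ones in the cell above, and the sketch assumes the "All Households" total equals the sum of the five types.

```python
import pandas as pd

# Assumed number of people per household type (same weights as the cell above).
weights = {
    "couple_with_children": 4,
    "couple_no_children": 2,
    "lone_parent": 1,
    "one_person": 1,
    "other_multi_person": 3,
}

# Made-up household counts for one ward.
ward = pd.Series({
    "couple_with_children": 10,
    "couple_no_children": 20,
    "lone_parent": 5,
    "one_person": 40,
    "other_multi_person": 25,
})

all_households = ward.sum()  # assumption: the total is the sum of the five types
estimated_people = sum(weights[kind] * ward[kind] for kind in weights)
avg_people = estimated_people / all_households

print(avg_people)
```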

... and the average number of bedrooms found in a dwelling:

In [456]:
# average number of bedrooms per dwelling, rounded to a whole number;
# "4+ bedrooms" is counted as 4 and Annex/Other/Unknown properties are excluded from the total
rooms = ["1 bedroom","2 bedrooms", "3 bedrooms", "4+ bedrooms", "All properties (2015)", "Annex/Other/Unknown"]
rooms_temp = pd.DataFrame(data=ward_property, columns=rooms)
rooms_temp = rooms_temp.assign(avg_rooms= round((1*rooms_temp["1 bedroom"]+2*rooms_temp["2 bedrooms"]+
                               3*rooms_temp["3 bedrooms"]+4*rooms_temp["4+ bedrooms"])
                                                /(rooms_temp[rooms[4]]-rooms_temp[rooms[5]])))
housing = housing.assign(avg_rooms_per_household = rooms_temp["avg_rooms"])
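
Note that `assign` lines the new column up by row index, not by ward code, so the step above is only correct if `rooms_temp` and `housing` happen to share the same index. A hedged sketch of a key-based alternative on toy data, assuming the property table carries an `area_id` column after the rename earlier (the frames and counts here are invented, and the denominator is simplified to the sum of the four bedroom columns):

```python
import pandas as pd

housing_toy = pd.DataFrame({
    "area_id": ["E1", "E2", "E3"],
    "avg_people_per_household": [2.2, 2.1, 1.8],
})

# Property table in a different row order and with a non-default index,
# which is where index-based alignment would go wrong.
property_toy = pd.DataFrame(
    {
        "area_id": ["E3", "E1", "E2"],
        "1 bedroom": [70, 10, 20],
        "2 bedrooms": [20, 30, 40],
        "3 bedrooms": [10, 40, 30],
        "4+ bedrooms": [0, 20, 10],
    },
    index=[7, 8, 9],
)

bedroom_weights = {"1 bedroom": 1, "2 bedrooms": 2, "3 bedrooms": 3, "4+ bedrooms": 4}
total = property_toy[list(bedroom_weights)].sum(axis=1)
weighted = sum(w * property_toy[col] for col, w in bedroom_weights.items())
property_toy["avg_rooms_per_household"] = (weighted / total).round()

# Join on the ward code so each ward gets its own value regardless of row order.
merged = housing_toy.merge(
    property_toy[["area_id", "avg_rooms_per_household"]], on="area_id", how="left"
)
print(merged)
```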

... which allows us to calculate the average number of rooms per person in a household. We also calculate household density:

In [457]:
housing = housing.assign(rooms_shared_per_household = housing["avg_rooms_per_household"]/housing["avg_people_per_household"])

# this gives a rough estimate of the household density per area; it could be improved by
# weighting the different dwelling types, e.g. apartments are more densely packed
housing = housing.assign(household_per_sq_km = housing["All Household spaces - 2011 Census"]/housing["area_sq_km"])
housing = housing.assign(household_density = housing["household_per_sq_km"]/housing["people_per_sq_km"])
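
Because both densities are divided by the same area, `household_density` reduces to household spaces per resident. A quick check with the Abbey ward figures from the output table further down; the household-space count of 4753 is back-computed from `household_per_sq_km × area_sq_km`, so treat it as an approximation rather than the census value.

```python
# Abbey ward, values from the housing output table.
area_sq_km = 1.26
population = 14370
household_spaces = 4753  # back-computed from 3772.222222 * 1.26, an assumption

household_per_sq_km = household_spaces / area_sq_km
people_per_sq_km = population / area_sq_km

# The area cancels out, so the two expressions agree.
household_density = household_per_sq_km / people_per_sq_km
households_per_person = household_spaces / population

print(round(household_density, 6), round(households_per_person, 6))
```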

We want to get a sense of how much it costs residents to keep a roof over their heads:

In [458]:
column_names = ["area_id","Median House Price 2014"]
house_cost = pd.DataFrame(data=master_data["demographics"], columns= column_names)
house_cost.rename(columns={"Median House Price 2014":"median_house_price"}, inplace=True)

In [459]:
housing = housing.merge(house_cost, on='area_id', how='inner')

And we keep only the important columns for our frame:

In [460]:
column_names = ["area_id","Names","median_house_price", "population","area_sq_km", "people_per_sq_km","avg_people_per_household",
               "avg_rooms_per_household","household_per_sq_km","household_density"]
housing = pd.DataFrame(data=housing, columns=column_names)

In [461]:
housing

| | area_id | Names | median_house_price | population | area_sq_km | people_per_sq_km | avg_people_per_household | avg_rooms_per_household | household_per_sq_km | household_density |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E05000026 | Abbey | 173000 | 14370.0 | 1.26 | 11404.761905 | 2.260061 | 2.0 | 3772.222222 | 0.330759 |
| 1 | E05000027 | Alibon | 215000 | 10845.0 | 1.36 | 7974.264706 | 2.096701 | 2.0 | 2974.264706 | 0.372983 |
| 2 | E05000028 | Becontree | 210000 | 13856.0 | 1.29 | 10741.085271 | 2.104980 | 2.0 | 3393.798450 | 0.315964 |
| 3 | E05000029 | Chadwell Heath | 240500 | 10850.0 | 3.38 | 3210.059172 | 2.055048 | 2.0 | 1201.775148 | 0.374378 |
| 4 | E05000030 | Eastbrook | 240000 | 11348.0 | 3.45 | 3289.275362 | 2.194237 | 2.0 | 1152.753623 | 0.350458 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 478 | E05000645 | Tachbrook | 715650 | 8996.0 | 0.36 | 24988.888889 | 1.709201 | 3.0 | 14444.444444 | 0.578035 |
| 479 | E05000646 | Vincent Square | 840000 | 11276.0 | 0.60 | 18793.333333 | 1.791183 | 3.0 | 9523.333333 | 0.506740 |
| 480 | E05000647 | Warwick | 857250 | 10086.0 | 0.58 | 17389.655172 | 1.772537 | 3.0 | 9434.482759 | 0.542534 |
| 481 | E05000648 | Westbourne | 499975 | 13668.0 | 0.67 | 20400.000000 | 1.904780 | 2.0 | 8165.671642 | 0.400278 |
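
These columns sit on very different scales (prices in pounds, densities as ratios), so combining them into a single housing indicator would need some form of rescaling first. One option, assumed here rather than taken from the notebook, is min-max scaling to the range 0 to 1. The sketch below uses three rows copied from the output above; `min_max` is a hypothetical helper.

```python
import pandas as pd

# Three rows copied from the housing output; the scaling choice is an assumption.
sample = pd.DataFrame(
    {
        "median_house_price": [173000, 215000, 840000],
        "household_density": [0.330759, 0.372983, 0.506740],
    },
    index=["Abbey", "Alibon", "Vincent Square"],
)


def min_max(column: pd.Series) -> pd.Series:
    """Rescale a column so its smallest value is 0 and its largest is 1."""
    return (column - column.min()) / (column.max() - column.min())


scaled = sample.apply(min_max)
print(scaled.round(3))
```

Whether a high value counts as good or bad (for example, a high house price) still has to be decided per column before the scaled values are averaged.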


**Income**
link here: http://www.oecdbetterlifeindex.org/topics/income/
* income
* household net wealth: average total value of household assets (savings, stocks) minus liabilities (loans)
* household net adjusted disposable income

In [173]:
#code here

**Jobs** 
link here: http://www.oecdbetterlifeindex.org/topics/jobs/
* job security -- expected loss of earnings when someone becomes unemployed
* personal earnings
* long-term unemployment rate (people who have been unemployed and actively looking for a job for 12 months or more)
* employment rate

In [174]:
#code here

**Community**
link here: http://www.oecdbetterlifeindex.org/topics/community/
* community
* quality of support network -- how much you can rely on friends --> we should perhaps replace this with an indicator of community diversity, e.g. ethnic group diversity and religious diversity
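
If we do go with a diversity indicator, one common way to quantify it is the Simpson diversity index, `1 - sum(p_i^2)`, where `p_i` is the share of group `i` in a ward. This is one possible choice, not something settled in this notebook, and the group columns and counts below are invented for illustration.

```python
import pandas as pd

# Made-up population counts per group for three hypothetical wards.
groups = pd.DataFrame(
    {
        "group_a": [900, 400, 250],
        "group_b": [50, 300, 250],
        "group_c": [50, 300, 250],
        "group_d": [0, 0, 250],
    },
    index=["ward_1", "ward_2", "ward_3"],
)

# Share of each group within its ward (each row sums to 1).
shares = groups.div(groups.sum(axis=1), axis=0)

# Simpson index: 0 when everyone is in one group, higher when groups are evenly mixed.
simpson = 1 - (shares ** 2).sum(axis=1)
print(simpson)
```

The same function could be applied separately to ethnic-group and religion counts and the two results averaged.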

In [175]:
#code here

**Education**
link here: http://www.oecdbetterlifeindex.org/topics/education/
* years in education
* student skills -- average performance of students, here GCSE results or similar
* educational attainment -- percentage of people aged 25-64 with at least an upper-secondary education

In [177]:
#code here

**Environment**
link here: http://www.oecdbetterlifeindex.org/topics/environment/
* water quality
* air pollution -- measured as PM2.5
* in addition: access to parks and green spaces

In [178]:
#code here

**Civic Engagement** 
link here: http://www.oecdbetterlifeindex.org/topics/civic-engagement/
* voter turnout in latest elections
* stakeholder engagement for developing regulations -- might be hard to do

In [179]:
#code here

**Health**
link here: http://www.oecdbetterlifeindex.org/topics/health/
* life expectancy
* self-reported health (hard to obtain)
* we could also include a proxy such as ambulance data

In [180]:
#code here

**Life Satisfaction**
link here: http://www.oecdbetterlifeindex.org/topics/life-satisfaction/
* life satisfaction -- how satisfied are you with your life? We have data for that

In [181]:
#code here

**Safety**
link here: http://www.oecdbetterlifeindex.org/topics/safety/
* homicide rate
* feeling safe walking alone at night (self-reported) --> could use burglaries or a similar crime measure as a proxy
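
If burglaries are used as the proxy, raw counts would favour small wards, so a rate per 1,000 residents is probably the fairer measure. A minimal sketch with invented numbers; the `burglaries` column name is hypothetical and the real crime table may be laid out differently.

```python
import pandas as pd

# Made-up burglary counts and populations for two hypothetical wards.
safety_toy = pd.DataFrame({
    "area_id": ["E1", "E2"],
    "burglaries": [120, 45],
    "population": [12000, 9000],
})

# Burglaries per 1,000 residents, so wards of different sizes are comparable.
safety_toy["burglaries_per_1000"] = (
    safety_toy["burglaries"] / safety_toy["population"] * 1000
)
print(safety_toy)
```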

In [182]:
# code here

**Work-Life Balance** 
link here: http://www.oecdbetterlifeindex.org/topics/work-life-balance/
* time devoted to leisure and personal care
* employees working very long hours

In [183]:
#code here