## Living Cost and Food Survey: Geospatial Analysis

This script contains the following:
#### 01. Importing Libraries and LCF Data
#### 02. Import and examine the UK regions GeoJSON file
#### 03. Create a Subset of the Data to Explore
* Match the regions in the BMI dataframe to the geoJSON file
* Create subsets of the BMI data for adults only (16+) and the years 2015 and 2019

#### 04. Map Percentage Classified as Obese for the Years 2015 and 2019
* Mapping Obesity in the Year 2015
* Mapping Obesity in the Year 2019

#### 05. Map Percentage Classified as Overweight for the Years 2015 and 2019
* Mapping Overweight in the Year 2015
* Mapping Overweight in the Year 2019

#### 06. Create a Subset of the LCF Analysis Data to Explore
* Match the lcf_analysis regions to the geoJSON file

#### 07. Map the Percentage spent on Ultra-Processed Food for the Years 2015 to 2020, 2015 and 2020
* Mapping the Percentage Spent on Ultra-Processed Food for the Period 2015 - 2020
* Mapping % Ultra-Processed Food for the Year 2015
* Mapping % Ultra-Processed Food for the Year 2020

#### 08. Map the Percentage spent on Unprocessed Food for the Years 2015 to 2020, 2015 and 2020
* Mapping the Percentage Spent on Unprocessed Food for the Period 2015 - 2020
* Mapping the Percentage Spent on Unprocessed Food for the Year 2015
* Mapping the Percentage Spent on Unprocessed Food for the Year 2020

#### 09. Export the Dataframe as lcf_analysis_regions.csv
#### 10. Discussion of the results and what they mean
--- 

## 01. Importing Libraries and LCF Data

In [1]:
# import the required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import os
import folium
import json
import plotly.express as px

# automatically display the charts in the notebook 
%matplotlib inline  

In [2]:
# Assign the main project folder path to the variable path
path = r'/Users/elsaekevall/Jupyter_Notebook/Career_Foundry/09_2022_LCF_Analysis/'
path

'/Users/elsaekevall/Jupyter_Notebook/Career_Foundry/09_2022_LCF_Analysis/'

In [3]:
# Use os.path.join() to import lcf_analysis_eda.pkl as a pandas dataframe and view the first fifteen rows
df_lcf_analysis = pd.read_pickle(os.path.join(path, '02_Data', '02_2_Prepared_Data', 'lcf_analysis_eda.pkl'))
df_lcf_analysis.head(15)

Unnamed: 0,unique_id,no_people,household_type,quarter,OECD_disposable_income,region,total_income,OECD_scale,gross_income,weekly_disposable_income,...,W_total_food_cost,W_adult_total_expenditure,W_child_total_expenditure,W_total_expenditure,W_unprocessed_food,W_processed_food,W_ultra_processed_food,%unprocessed_food,%processed_food,%ultra_processed_food
0,1,1,Index,January to March,173.27,Eastern,179.684359,1.0,173.27,173.27,...,988.172709,2511.076176,0.0,2511.076176,373.039988,7.026438,554.833106,37.750485,0.711054,56.147382
1,2,5,Index,July to September,333.667857,North West and Merseyside,1117.34,2.8,1117.34,934.27,...,1851.163542,13186.349938,114.43666,13300.786598,323.283563,39.149384,789.311802,17.463803,2.114853,42.638686
2,3,2,Index,January to March,678.86,North West and Merseyside,1269.12,1.5,732.74,1018.29,...,2697.053132,16984.800931,0.0,16984.800931,748.575896,133.122488,750.798131,27.755326,4.93585,27.837721
3,4,3,Index,July to September,180.0,South East,391.475289,1.6,288.0,288.0,...,784.453365,9200.604165,15.847543,9216.451707,224.321967,61.488466,189.774324,28.59596,7.838384,24.191919
4,5,3,Index,April to June,375.455,North West and Merseyside,922.302564,2.0,911.7,750.91,...,2608.318219,9434.408332,610.013743,10127.284984,196.333554,65.180202,863.439444,7.527209,2.498936,33.103302
5,6,4,Index,July to September,254.204,South East,719.12,2.5,719.12,635.51,...,8339.835694,44028.146914,0.0,44028.146914,786.446068,92.277452,1133.217217,9.429995,1.106466,13.588004
6,7,1,Index,July to September,430.7164,Eastern,545.270577,1.0,499.77,430.7164,...,316.637369,4992.797215,0.0,4992.797215,96.29733,13.911332,35.473896,30.412497,4.393459,11.20332
7,8,1,Index,October to December,434.1212,South West,544.506146,1.0,523.95,434.1212,...,800.843774,5060.872092,0.0,5060.872092,85.762104,44.027604,389.002227,10.708968,5.497652,48.574047
8,9,2,Index,April to June,1004.068,Wales,1863.292,1.5,1863.292,1506.102,...,1169.910709,3590.546326,0.0,3590.546326,313.551775,211.472312,313.551775,26.801342,18.075936,26.801342
9,10,2,Index,April to June,858.526667,London,1726.29,1.5,0.0,1287.79,...,7082.301474,25763.427321,0.0,25763.427321,774.771086,66.371239,661.261182,10.939538,0.937142,9.336812


In [4]:
# Checking the shape of the dataframe
df_lcf_analysis.shape

(26146, 43)

In [5]:
# Use os.path.join() to import BMI.csv as a pandas dataframe and view the first fifteen rows
df_BMI = pd.read_csv(os.path.join(path, '02_Data', '02_2_Prepared_Data', 'BMI.csv'), index_col = False)
df_BMI.head(15)

Unnamed: 0.1,Unnamed: 0,region,% underweight,% normal,% overweight,% obese,% morbidly obese,% overweight (25+),% obese (30+),year,age_group
0,0,North East,2.582531,31.197943,32.244137,29.483562,4.491828,66.21953,33.97539,2019,16+
1,1,North West and Merseyside,0.560719,30.982224,38.279053,26.739931,3.438074,68.457054,30.178005,2019,16+
2,2,Yorkshire and the Humber,0.747205,35.800674,34.390793,24.56245,4.498879,63.45212,29.061329,2019,16+
3,3,East Midlands,2.335773,32.92297,33.95813,26.669912,4.113216,64.74126,30.783129,2019,16+
4,4,West Midlands,2.657712,28.137455,35.628075,29.294754,4.282002,69.204834,33.576756,2019,16+
5,5,Eastern,1.308611,35.41282,37.54746,22.697529,3.033584,63.278572,25.731112,2019,16+
6,6,London,2.85791,37.22865,36.47197,21.314987,2.126483,59.91344,23.441471,2019,16+
7,7,South East,1.725926,38.382877,36.133995,20.73367,3.023532,59.891193,23.757202,2019,16+
8,8,South West,1.57521,29.76165,38.60615,26.602253,3.454734,68.66314,30.056988,2019,16+
9,9,England,1.768487,33.991184,36.218113,28.022217,3.314329,67.55466,31.336544,2019,16+


In [6]:
# Drop the 'Unnamed: 0' column (the index saved when BMI.csv was written)
df_BMI = df_BMI.drop(['Unnamed: 0'], axis = 1)
df_BMI.head()

Unnamed: 0,region,% underweight,% normal,% overweight,% obese,% morbidly obese,% overweight (25+),% obese (30+),year,age_group
0,North East,2.582531,31.197943,32.244137,29.483562,4.491828,66.21953,33.97539,2019,16+
1,North West and Merseyside,0.560719,30.982224,38.279053,26.739931,3.438074,68.457054,30.178005,2019,16+
2,Yorkshire and the Humber,0.747205,35.800674,34.390793,24.56245,4.498879,63.45212,29.061329,2019,16+
3,East Midlands,2.335773,32.92297,33.95813,26.669912,4.113216,64.74126,30.783129,2019,16+
4,West Midlands,2.657712,28.137455,35.628075,29.294754,4.282002,69.204834,33.576756,2019,16+


In [7]:
# Checking the shape of the dataframe
df_BMI.shape

(75, 10)

---

## 02. Import and examine the UK regions GeoJSON file

Borders of the 12 regions of the UK in GeoJSON format, for choropleth visualization of the UK regions.<br>
Original file downloaded from https://www.kaggle.com/datasets/dorianlazar/uk-regions-geojson?resource=download

In [8]:
# Assign the path to the UK regions GeoJSON file
UKregions_geo = os.path.join(path, '02_Data', '02_1_Original_Data', 'uk_regions.geojson')

In [9]:
# Examine the GeoJSON file contents to see how the regions are spelt

# open the file (closed automatically by the with block) and load the JSON object as a dictionary
with open(UKregions_geo) as f:
    data = json.load(f)

# iterate through the list of features and print each one
for i in data['features']:
    print(i)

{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[-2.03, 55.77], [-2.02, 55.77], [-2.03, 55.77], [-2.04, 55.76], [-2.05, 55.76], [-2.07, 55.76], [-2.09, 55.76], [-2.24, 55.65], [-2.33, 55.63], [-2.31, 55.62], [-2.23, 55.52], [-2.2, 55.48], [-2.18, 55.47], [-2.33, 55.41], [-2.34, 55.39], [-2.33, 55.39], [-2.34, 55.39], [-2.33, 55.39], [-2.34, 55.37], [-2.64, 55.26], [-2.63, 55.25], [-2.63, 55.23], [-2.69, 55.19], [-2.53, 55.08], [-2.53, 55.09], [-2.53, 55.08], [-2.53, 55.09], [-2.52, 55.09], [-2.5, 55.09], [-2.49, 55.09], [-2.5, 55.09], [-2.49, 55.09], [-2.5, 55.07], [-2.51, 55.04], [-2.51, 55.03], [-2.51, 55.04], [-2.51, 55.03], [-2.6, 54.97], [-2.6, 54.96], [-2.6, 54.97], [-2.6, 54.96], [-2.41, 54.85], [-2.33, 54.81], [-2.33, 54.8], [-2.33, 54.81], [-2.33, 54.8], [-2.33, 54.81], [-2.33, 54.8], [-2.34, 54.71], [-2.35, 54.7], [-2.16, 54.46], [-2.17, 54.46], [-2.16, 54.46], [-1.79, 54.49], [-1.66, 54.53], [-1.64, 54.52], [-1.59, 54.51], [-1.58, 54.51], [-1.59, 5

**Regions in the uk_regions file: East, North West, South East, South West, Wales, London, West Midlands, Yorkshire and the Humber, North East, Northern Ireland, East Midlands and Scotland**<br>
The region names are stored in each feature's `rgn19nm` property, so map the regions using *key_on = 'feature.properties.rgn19nm'*
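Printing every feature floods the output with coordinates. A minimal sketch of a more targeted check, assuming the GeoJSON is loaded into a dict as in the cell above: collect the `rgn19nm` names and list the dataframe regions that have no match. The helpers `geojson_region_names` and `unmatched_regions` are hypothetical, and the tiny inline GeoJSON only stands in for `uk_regions.geojson`.

```python
import json

def geojson_region_names(geojson, key="rgn19nm"):
    """Return the set of region names stored in each feature's properties."""
    return {feature["properties"][key] for feature in geojson["features"]}

def unmatched_regions(region_values, geojson, key="rgn19nm"):
    """Return the dataframe region names with no matching GeoJSON feature, sorted."""
    return sorted(set(region_values) - geojson_region_names(geojson, key))

# Tiny stand-in for uk_regions.geojson (geometry omitted for brevity)
sample_geojson = json.loads("""
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"rgn19nm": "North West"}, "geometry": null},
   {"type": "Feature", "properties": {"rgn19nm": "East"}, "geometry": null},
   {"type": "Feature", "properties": {"rgn19nm": "London"}, "geometry": null}
 ]}
""")

bmi_regions = ["North West and Merseyside", "Eastern", "London", "England"]
print(unmatched_regions(bmi_regions, sample_geojson))
# ['Eastern', 'England', 'North West and Merseyside']
```

In the notebook this could be called as `unmatched_regions(df_BMI['region'].unique(), data)`; any name it returns either needs renaming (as for 'North West and Merseyside' and 'Eastern') or has no area on the map (as for 'England').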

---

## 03. Create a Subset of the Data to Explore

### Match the regions in the BMI dataframe to the geoJSON file

In [10]:
# Create a subset of the BMI data and show the first 5 rows
df_BMI_geo = df_BMI[['region', '% overweight (25+)', '% obese (30+)', 'year', 'age_group']]
df_BMI_geo.head()

Unnamed: 0,region,% overweight (25+),% obese (30+),year,age_group
0,North East,66.21953,33.97539,2019,16+
1,North West and Merseyside,68.457054,30.178005,2019,16+
2,Yorkshire and the Humber,63.45212,29.061329,2019,16+
3,East Midlands,64.74126,30.783129,2019,16+
4,West Midlands,69.204834,33.576756,2019,16+


**Change 'North West and Merseyside' to 'North West' and 'Eastern' to 'East' to match the GeoJSON region names**

In [11]:
# show the unique values in the BMI region column
df_BMI['region'].unique()

array(['North East', 'North West and Merseyside',
       'Yorkshire and the Humber', 'East Midlands', 'West Midlands',
       'Eastern', 'London', 'South East', 'South West', 'England',
       'Scotland', 'Wales', 'Northern Ireland'], dtype=object)

In [12]:
# change the region names to match the GeoJSON file and recheck the column; assigning the
# result back avoids the SettingWithCopyWarning raised by .replace(inplace = True) on a slice
df_BMI_geo = df_BMI_geo.replace({'region': {'North West and Merseyside' : 'North West', 'Eastern' : 'East'}})
df_BMI_geo['region'].unique()


array(['North East', 'North West', 'Yorkshire and the Humber',
       'East Midlands', 'West Midlands', 'East', 'London', 'South East',
       'South West', 'England', 'Scotland', 'Wales', 'Northern Ireland'],
      dtype=object)

### Create subsets of the BMI data for adults only (16+) and the years 2015 and 2019

The dataset was previously cleaned and checked. The values for Wales in 2015 are missing.

In [13]:
# select only the adult age group 
df_BMI_16_geo = df_BMI_geo.groupby(['age_group']).get_group('16+')
df_BMI_16_geo.head()

Unnamed: 0,region,% overweight (25+),% obese (30+),year,age_group
0,North East,66.21953,33.97539,2019,16+
1,North West,68.457054,30.178005,2019,16+
2,Yorkshire and the Humber,63.45212,29.061329,2019,16+
3,East Midlands,64.74126,30.783129,2019,16+
4,West Midlands,69.204834,33.576756,2019,16+


In [14]:
# Check the unique values in each column - age_group should only have one
df_BMI_16_geo.nunique()

region                13
% overweight (25+)    54
% obese (30+)         53
year                   5
age_group              1
dtype: int64

---

## 04. Map Percentage Classified as Obese for the Years 2015 and 2019

### Mapping Obesity in the Year 2015

In [15]:
# Create a data frame with just the regions and the obesity values for 2015
df_BMI_16_geo15 = df_BMI_16_geo.groupby(['year']).get_group(2015)
data_to_plot1 = df_BMI_16_geo15[['region','% obese (30+)']]
data_to_plot1

Unnamed: 0,region,% obese (30+)
44,North East,34.09976
45,North West,29.550465
46,Yorkshire and the Humber,28.422194
47,East Midlands,27.656828
48,West Midlands,30.502756
49,East,25.780594
50,London,23.243416
51,South East,25.08762
52,South West,21.74644
53,England,29.758091


In [16]:
# Check for null values
data_to_plot1.isnull().sum()

region           0
% obese (30+)    1
dtype: int64

In [17]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage of Adults Classified as Obese (BMI 30+) in 2015'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the parameters for the legend colour bar
myscale = np.linspace(data_to_plot1['% obese (30+)'].min()-2, data_to_plot1['% obese (30+)'].max()+1, 4)

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot1,
    columns = ['region','% obese (30+)'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'BuPu', 
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,  # folium's parameter name is lowercase 'highlight'
    name = 'Percentage of Adults Classified as Obese',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage of Adults Classified as Obese').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

**There was no obesity data for Wales in 2015. In England, the North East and the West Midlands, along with Scotland, have the highest percentages of adults classified as obese, while London and the South West have the lowest. Has this changed over time?**

In [18]:
# Change the working directory and show directory path
os.chdir(r'/Users/elsaekevall/Jupyter_Notebook/Career_Foundry/09_2022_LCF_Analysis/04_Analysis')
os.getcwd()

'/Users/elsaekevall/Jupyter_Notebook/Career_Foundry/09_2022_LCF_Analysis/04_Analysis'

In [19]:
# save the 2015 obesity map
map.save('map_obesity15.html')

#### Add the tooltip function for the values

*Tried lots of different ways, but couldn't get a tooltip showing the values to work on any of the maps. The error raised was:*
`ValueError('Cannot render objects with any missing geometries' ': {!r}'.format(data))`

*Not sure whether it is to do with the GeoJSON file for the regions, as I couldn't get plotly to work either.*
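One possible workaround, offered as an untested sketch. The quoted `ValueError` appears to be the one folium raises when `folium.GeoJson` receives something that is not a GeoJSON dict, a path/string or an object with `__geo_interface__` (for example a plain pandas DataFrame), rather than a problem with the region file itself. Instead of passing the dataframe, each region's value can be copied into the matching feature's `properties` so a tooltip can read it. The helper `add_values_to_geojson` and its parameters are hypothetical.

```python
import copy
import json

def add_values_to_geojson(geojson, values, key="rgn19nm", field="value", missing=None):
    """Return a copy of `geojson` with values[region] stored under properties[field].

    Regions absent from `values` get `missing`, so every feature carries the field
    (folium's GeoJsonTooltip checks that the listed fields are present).
    """
    enriched = copy.deepcopy(geojson)  # leave the original dict untouched
    for feature in enriched["features"]:
        region = feature["properties"][key]
        feature["properties"][field] = values.get(region, missing)
    return enriched

sample_geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"rgn19nm": "North East"}, "geometry": None},
    {"type": "Feature", "properties": {"rgn19nm": "Wales"}, "geometry": None},
]}
# 2015 obesity value for the North East as printed above; Wales has no 2015 value
obesity_2015 = {"North East": 34.09976}

enriched = add_values_to_geojson(sample_geojson, obesity_2015, field="% obese (30+)")
print(json.dumps([feature["properties"] for feature in enriched["features"]]))
```

In the notebook the values could come from `dict(zip(data_to_plot1['region'], data_to_plot1['% obese (30+)']))`, the enriched dict could be passed as `geo_data` to `folium.Choropleth`, and the tooltip extended to `folium.features.GeoJsonTooltip(['rgn19nm', '% obese (30+)'], ...)`. Converting pandas NaN values (such as Wales in 2015) to `None` first is probably the safer choice for the embedded JSON.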

### Mapping Obesity in the Year 2019

In [20]:
# Create a data frame with just the regions and the obesity values for 2019
df_BMI_16_geo19 = df_BMI_16_geo.groupby(['year']).get_group(2019)
data_to_plot2 = df_BMI_16_geo19[['region','% obese (30+)']]
data_to_plot2

Unnamed: 0,region,% obese (30+)
0,North East,33.97539
1,North West,30.178005
2,Yorkshire and the Humber,29.061329
3,East Midlands,30.783129
4,West Midlands,33.576756
5,East,25.731112
6,London,23.441471
7,South East,23.757202
8,South West,30.056988
9,England,31.336544


In [21]:
# Check for null values
data_to_plot2.isnull().sum()

region           0
% obese (30+)    0
dtype: int64

In [22]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage of Adults Classified as Obese (BMI 30+) in 2019'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# use the same scale as the 2015 map so the colours are comparable
myscale = np.linspace(data_to_plot1['% obese (30+)'].min()-2, data_to_plot1['% obese (30+)'].max()+1, 4)

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot2,
    columns = ['region', '% obese (30+)'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'BuPu',
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    show = True,
    overlay = True,
    name = 'Percentage of Adults Classified as Obese',
    legend_name = 'Percentage of Adults Classified as Obese').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl().add_to(map)

map

In [23]:
# save the 2019 obesity map
map.save('map_obesity19.html')

**The 2019 map shows more regions with a higher percentage of adults classified as obese. The North West and East Midlands in England now also have 30-35 percent of adults classified as obese. Most noticeable is the rise in the percentage of adults classified as obese in the South West between 2015 and 2019. Why?**
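Because the 2019 map reuses the 2015 thresholds, the comparison only holds if every 2019 value falls inside that scale. A minimal standard-library sketch of the check, using the ten 2015 and 2019 rows printed above (the dataframes also hold rows, such as Scotland, that the output does not show). `linspace` mimics `np.linspace` for this purpose and `outside_scale` is a hypothetical helper.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive, like np.linspace."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def outside_scale(values, scale):
    """Return the values outside the range spanned by the thresholds."""
    return [v for v in values if not scale[0] <= v <= scale[-1]]

# % obese (30+) rows printed above for 2015 and 2019
obese_2015 = [34.09976, 29.550465, 28.422194, 27.656828, 30.502756,
              25.780594, 23.243416, 25.08762, 21.74644, 29.758091]
obese_2019 = [33.97539, 30.178005, 29.061329, 30.783129, 33.576756,
              25.731112, 23.441471, 23.757202, 30.056988, 31.336544]

# Same construction as myscale in the notebook: min - 2 to max + 1, four edges
scale_2015 = linspace(min(obese_2015) - 2, max(obese_2015) + 1, 4)
print([round(edge, 1) for edge in scale_2015])  # [19.7, 24.9, 30.0, 35.1]
print(outside_scale(obese_2019, scale_2015))    # []
```

The middle edge of about 30.0 is where the "30-35 percent" band comes from. Any value the check returns may not be coloured as expected, so widening the scale (or building it from both years together) would be the safer choice.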

---

## 05. Map Percentage Classified as Overweight for the Years 2015 and 2019

### Mapping Overweight in the Year 2015

In [24]:
# Create a data frame with just the regions and the overweight values in 2015
data_to_plot3 = df_BMI_16_geo15[['region','% overweight (25+)']]
data_to_plot3

Unnamed: 0,region,% overweight (25+)
44,North East,68.65693
45,North West,66.07106
46,Yorkshire and the Humber,64.21664
47,East Midlands,66.875755
48,West Midlands,67.61381
49,East,62.14406
50,London,56.94491
51,South East,60.696827
52,South West,58.89248
53,England,65.812035


In [25]:
# Check for null values
data_to_plot3.isnull().sum()

region                0
% overweight (25+)    1
dtype: int64

In [26]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage of Adults Classified as Overweight (BMI 25+) in 2015'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the parameters for the legend colour bar
myscale = np.linspace(data_to_plot3['% overweight (25+)'].min()-2, data_to_plot3['% overweight (25+)'].max()+1, 4)

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot3,
    columns = ['region', '% overweight (25+)'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'BuPu', 
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,  # folium's parameter name is lowercase 'highlight'
    name = 'Percentage of Adults Classified as Overweight',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage of Adults Classified as Overweight').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

**In 2015, Scotland and the northern and midland regions of England had the highest percentages of adults classified as overweight, while the South West and London had the lowest.**

### Mapping Overweight in the Year 2019

In [27]:
# Create a data frame with just the regions and the overweight values in 2019
data_to_plot4 = df_BMI_16_geo19[['region','% overweight (25+)']]
data_to_plot4

Unnamed: 0,region,% overweight (25+)
0,North East,66.21953
1,North West,68.457054
2,Yorkshire and the Humber,63.45212
3,East Midlands,64.74126
4,West Midlands,69.204834
5,East,63.278572
6,London,59.91344
7,South East,59.891193
8,South West,68.66314
9,England,67.55466


In [28]:
# Check for null values
data_to_plot4.isnull().sum()

region                0
% overweight (25+)    0
dtype: int64

In [29]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage of Adults Classified as Overweight (BMI 25+) in 2019'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the parameters for the legend colour bar using the data for the 2015 overweight plot
myscale = np.linspace(data_to_plot3['% overweight (25+)'].min()-2, data_to_plot3['% overweight (25+)'].max()+1, 4)

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot4,
    columns = ['region', '% overweight (25+)'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'BuPu', 
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,  # folium's parameter name is lowercase 'highlight'
    name = 'Percentage of Adults Classified as Overweight',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage of Adults Classified as Overweight').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

**The overweight data tells the same story as the obesity data. By 2019 the percentage of adults classified as overweight in the South West had risen sharply, from about 59% to 69%, while in the East Midlands it had fallen only slightly, from about 67% to 65%. Why?**
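A short sketch that puts numbers on the change between the two overweight maps: the percentage-point change for each English region, using the 2015 and 2019 rows printed above. Scotland, Wales and Northern Ireland are not among the printed rows, so they are left out.

```python
# % overweight (25+) values printed above (English regions only)
overweight_2015 = {"North East": 68.65693, "North West": 66.07106,
                   "Yorkshire and the Humber": 64.21664, "East Midlands": 66.875755,
                   "West Midlands": 67.61381, "East": 62.14406, "London": 56.94491,
                   "South East": 60.696827, "South West": 58.89248}
overweight_2019 = {"North East": 66.21953, "North West": 68.457054,
                   "Yorkshire and the Humber": 63.45212, "East Midlands": 64.74126,
                   "West Midlands": 69.204834, "East": 63.278572, "London": 59.91344,
                   "South East": 59.891193, "South West": 68.66314}

# Percentage-point change per region (2019 minus 2015), largest increase first
change = {region: overweight_2019[region] - overweight_2015[region]
          for region in overweight_2015}
for region, diff in sorted(change.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<26}{diff:+6.2f}")
```

On these rows the South West has by far the largest rise, close to 10 percentage points.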

---

## 06. Create a Subset of the LCF Analysis Data to Explore

**The subset of the data will contain two new variables, r%unprocessed_food and r%ultra_processed_food: the mean spend on each food type as a percentage of the mean total food spend, calculated for each region and year (see below). It will also include the year and region.**

*['year', 'region', 'r%unprocessed_food', 'r%ultra_processed_food']*<br>
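Computed this way, `r%ultra_processed_food` is a ratio of two means (mean ultra-processed spend divided by mean total food spend) rather than the mean of each household's `%ultra_processed_food`. The two generally differ, because the ratio of means gives more weight to households that spend more on food. A small illustration with two made-up households:

```python
from statistics import mean

# Two hypothetical households: (ultra-processed spend, total food spend)
households = [(50.0, 100.0), (100.0, 400.0)]

# Ratio of means: the approach used for r%ultra_processed_food
ratio_of_means = mean(u for u, _ in households) * 100 / mean(t for _, t in households)

# Mean of ratios: averaging each household's own percentage
mean_of_ratios = mean(u / t * 100 for u, t in households)

print(ratio_of_means)  # 30.0
print(mean_of_ratios)  # 37.5
```

Neither is wrong, but the map legends describe a share of average regional spend, not an average household share.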

In [30]:
# Create a new column with the mean total food cost for each region and year
df_lcf_analysis['rW_total_food_cost'] = df_lcf_analysis.groupby(['region', 'year'])['W_total_food_cost'].transform('mean')
df_lcf_analysis[['region', 'year', 'rW_total_food_cost']].head(20) 

Unnamed: 0,region,year,rW_total_food_cost
0,Eastern,2016,1939.661948
1,North West and Merseyside,2015,1759.924158
2,North West and Merseyside,2016,1729.658686
3,South East,2015,2055.044167
4,North West and Merseyside,2015,1759.924158
5,South East,2015,2055.044167
6,Eastern,2015,1908.358848
7,South West,2015,1777.917498
8,Wales,2015,1683.241202
9,London,2015,2807.363772


In [31]:
# Create a new column with the mean ultra-processed food cost for each region and year
df_lcf_analysis['rW_ultra_processed_food'] = df_lcf_analysis.groupby(['region', 'year'])['W_ultra_processed_food'].transform('mean')
df_lcf_analysis[['region', 'year', 'rW_ultra_processed_food']].head(20)                                                                

Unnamed: 0,region,year,rW_ultra_processed_food
0,Eastern,2016,617.777123
1,North West and Merseyside,2015,551.273975
2,North West and Merseyside,2016,548.335454
3,South East,2015,615.104431
4,North West and Merseyside,2015,551.273975
5,South East,2015,615.104431
6,Eastern,2015,561.560598
7,South West,2015,544.943729
8,Wales,2015,568.14477
9,London,2015,737.5561


In [32]:
# Create a new column for the % of ultra_processed food by region based on the mean
df_lcf_analysis['r%ultra_processed_food'] = df_lcf_analysis['rW_ultra_processed_food']/df_lcf_analysis['rW_total_food_cost'] * 100
df_lcf_analysis[['region', 'year', 'r%ultra_processed_food']].head(20)   

Unnamed: 0,region,year,r%ultra_processed_food
0,Eastern,2016,31.849732
1,North West and Merseyside,2015,31.323735
2,North West and Merseyside,2016,31.701946
3,South East,2015,29.931446
4,North West and Merseyside,2015,31.323735
5,South East,2015,29.931446
6,Eastern,2015,29.426363
7,South West,2015,30.650676
8,Wales,2015,33.753022
9,London,2015,26.272196


In [33]:
# Create a new column with the mean unprocessed food cost for each region and year
df_lcf_analysis['rW_unprocessed_food'] = df_lcf_analysis.groupby(['region', 'year'])['W_unprocessed_food'].transform('mean')
df_lcf_analysis[['region', 'year', 'rW_unprocessed_food']].head(20) 

Unnamed: 0,region,year,rW_unprocessed_food
0,Eastern,2016,459.5605
1,North West and Merseyside,2015,416.637546
2,North West and Merseyside,2016,419.067614
3,South East,2015,484.539985
4,North West and Merseyside,2015,416.637546
5,South East,2015,484.539985
6,Eastern,2015,431.970013
7,South West,2015,432.899032
8,Wales,2015,424.15764
9,London,2015,668.777116


In [34]:
# Create a new column for the % of unprocessed food by region based on the mean
df_lcf_analysis['r%unprocessed_food'] = df_lcf_analysis['rW_unprocessed_food']/df_lcf_analysis['rW_total_food_cost'] * 100
df_lcf_analysis[['region', 'year', 'r%unprocessed_food']].head(20)  

Unnamed: 0,region,year,r%unprocessed_food
0,Eastern,2016,23.692814
1,North West and Merseyside,2015,23.673608
2,North West and Merseyside,2016,24.228342
3,South East,2015,23.578081
4,North West and Merseyside,2015,23.673608
5,South East,2015,23.578081
6,Eastern,2015,22.635681
7,South West,2015,24.348657
8,Wales,2015,25.198863
9,London,2015,23.822246


### Match the lcf_analysis regions to the geoJSON file

In [35]:
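# point df_lcf_analysis_geo at df_lcf_analysis; this does not make a copy, so changes
# made through either name (such as the region renaming below) affect the same dataframe
df_lcf_analysis_geo = df_lcf_analysis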

# show the unique values in the region column
df_lcf_analysis_geo['region'].unique()

array(['Eastern', 'North West and Merseyside', 'South East', 'South West',
       'Wales', 'London', 'West Midlands', 'Yorkshire and the Humber',
       'North East', 'Northern Ireland', 'East Midlands', 'Scotland'],
      dtype=object)

**Change 'North West and Merseyside' to 'North West' and 'Eastern' to 'East' to match the GeoJSON region names**

In [36]:
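# change the region names and recheck the column; assigning the result back is used because
# .replace(inplace = True) on a single column may not update the dataframe in newer pandas
df_lcf_analysis_geo['region'] = df_lcf_analysis_geo['region'].replace({'North West and Merseyside' : 'North West', 'Eastern' : 'East'})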
df_lcf_analysis_geo['region'].unique()

array(['East', 'North West', 'South East', 'South West', 'Wales',
       'London', 'West Midlands', 'Yorkshire and the Humber',
       'North East', 'Northern Ireland', 'East Midlands', 'Scotland'],
      dtype=object)

**The r% columns hold the same value on every row of a given region and year, so keep only the last row for each year and region combination and create a subset with the required columns**

In [37]:
# Create a subset df_lcf_analysis_geo of the df_lcf_analysis dataframe with the specified columns

# keep one row (the last) for each year and region combination
df_lcf_analysis_geo = df_lcf_analysis.drop_duplicates(['year', 'region'], keep='last')

# create a subset with only the relevant columns
df_lcf_analysis_geo =  df_lcf_analysis_geo[['year', 'region', 'r%unprocessed_food', 'r%ultra_processed_food']]
df_lcf_analysis_geo.head(60)

Unnamed: 0,year,region,r%unprocessed_food,r%ultra_processed_food
4817,2015,Northern Ireland,20.377653,26.524968
4823,2015,Scotland,22.253107,32.520439
4858,2015,West Midlands,24.42406,31.376977
4869,2015,London,23.822246,26.272196
4875,2015,East,22.635681,29.426363
4881,2015,Wales,25.198863,33.753022
4883,2015,South East,23.578081,29.931446
4884,2015,North West,23.673608,31.323735
4886,2015,Yorkshire and the Humber,22.719292,30.162747
4888,2015,South West,24.348657,30.650676


---

## 07. Map the Percentage spent on Ultra-Processed Food for the Years 2015 to 2020, 2015 and 2020

### Mapping the Percentage Spent on Ultra-Processed Food for the Period 2015 - 2020

In [38]:
# Create a data frame with just the regions and the % ultra-processed food values (one row per region per year)
data_to_plot5 = df_lcf_analysis_geo[['region', 'r%ultra_processed_food']]
data_to_plot5.head(15)

Unnamed: 0,region,r%ultra_processed_food
4817,Northern Ireland,26.524968
4823,Scotland,32.520439
4858,West Midlands,31.376977
4869,London,26.272196
4875,East,29.426363
4881,Wales,33.753022
4883,South East,29.931446
4884,North West,31.323735
4886,Yorkshire and the Humber,30.162747
4888,South West,30.650676
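`df_lcf_analysis_geo` holds one row per region for each year, so `data_to_plot5` lists every region several times. As far as I can tell from folium's source (worth checking against the installed version), `Choropleth` turns `data` into a lookup via `set_index(columns[0])[columns[1]].to_dict()`, and a Python dict keeps only the last value for a repeated key, so the map would show each region's last row rather than a 2015-2020 average. A standard-library sketch of averaging per region first, using the 2015 and 2020 values printed in this notebook for two regions; in pandas, something like `df_lcf_analysis_geo.groupby('region', as_index=False)['r%ultra_processed_food'].mean()` should do the same.

```python
from collections import defaultdict
from statistics import mean

# (year, region, r%ultra_processed_food) rows, as in df_lcf_analysis_geo
rows = [
    (2015, "London", 26.272196),
    (2015, "Wales", 33.753022),
    (2020, "London", 23.503362),
    (2020, "Wales", 33.578029),
]

# What a region -> value dict keeps when regions repeat: the last row wins
last_wins = {region: value for _, region, value in rows}

# Group by region and average across the years instead
by_region = defaultdict(list)
for _, region, value in rows:
    by_region[region].append(value)
average = {region: mean(values) for region, values in by_region.items()}

print(last_wins)  # {'London': 23.503362, 'Wales': 33.578029}
print({region: round(v, 2) for region, v in average.items()})  # {'London': 24.89, 'Wales': 33.67}
```

This averages the yearly percentages, giving every year equal weight; pooling all households across the years first would be the alternative if years with more respondents should count for more.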


In [39]:
# Check for null values
data_to_plot5.isnull().sum()

region                    0
r%ultra_processed_food    0
dtype: int64

In [40]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Average Percentage Spent on Ultra-processed Food from 2015 to 2020'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
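# set the parameters for the legend colour bar; the lowest threshold must sit below the
# smallest value, otherwise that region falls outside the colour scale
myscale = np.linspace(data_to_plot5['r%ultra_processed_food'].min()-1, data_to_plot5['r%ultra_processed_food'].max()+3, 5)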

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot5,
    columns = ['region', 'r%ultra_processed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlOrRd', 
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,  # folium's parameter name is lowercase 'highlight'
    name = 'Percentage Spent on Ultra-Processed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Ultra-Processed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

In [41]:
# save the 2015 to 2020 % ultra-processed food map
map.save('map_perultra_all.html')

### Mapping % Ultra-Processed Food for the Year 2015

In [42]:
# Create a data frame with just the regions and the % ultra-processed food values for 2015
df_lcf_analysis_geo15 = df_lcf_analysis_geo.groupby('year').get_group(2015)
data_to_plot6 = df_lcf_analysis_geo15[['region', 'r%ultra_processed_food']]
data_to_plot6.head(15)

Unnamed: 0,region,r%ultra_processed_food
4817,Northern Ireland,26.524968
4823,Scotland,32.520439
4858,West Midlands,31.376977
4869,London,26.272196
4875,East,29.426363
4881,Wales,33.753022
4883,South East,29.931446
4884,North West,31.323735
4886,Yorkshire and the Humber,30.162747
4888,South West,30.650676


In [43]:
# Check for null values
data_to_plot6.isnull().sum()

region                    0
r%ultra_processed_food    0
dtype: int64

In [44]:
# Set up a folium map for the United Kingdom at a high-level zoom
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage Spent on Ultra-processed Food in 2015'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the parameters for the legend colour bar
myscale = np.linspace(data_to_plot6['r%ultra_processed_food'].min()-6, data_to_plot6['r%ultra_processed_food'].max()+6, 5)

# create the choropleth map
choropleth = folium.Choropleth(
    geo_data = UKregions_geo, 
    data = data_to_plot6,
    columns = ['region', 'r%ultra_processed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlOrRd', 
    threshold_scale = myscale,
    fill_opacity = 1, 
    line_opacity = 0.7,
    smooth_factor = 0,
    Highlight= True,
    name = 'Percentage Spent on Ultra-Processed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Ultra-Processed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

In [45]:
# save the 2015 % ultra-processed food map
map.save('map_perultra_food15.html')
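The legend bins come from `np.linspace`, which returns 5 evenly spaced edges (so 4 colour classes) between the 2015 minimum minus 6 and the 2015 maximum plus 6. The sketch below recomputes those edges from the ten 2015 regional values printed above and uses `np.digitize` to show which class each region falls into. It illustrates how the bins partition the data; it is not folium's internal colouring code.

```python
import numpy as np

# 2015 regional values of r%ultra_processed_food, copied from the output above
ultra_2015 = {
    "Northern Ireland": 26.524968,
    "Scotland": 32.520439,
    "West Midlands": 31.376977,
    "London": 26.272196,
    "East": 29.426363,
    "Wales": 33.753022,
    "South East": 29.931446,
    "North West": 31.323735,
    "Yorkshire and the Humber": 30.162747,
    "South West": 30.650676,
}
values = np.array(list(ultra_2015.values()))

# Same construction as myscale: 5 edges -> 4 equal-width colour classes,
# padded by 6 percentage points at each end
edges = np.linspace(values.min() - 6, values.max() + 6, 5)
print(np.round(edges, 2))

# Class 0 is the lightest colour on the YlOrRd scale, class 3 the darkest
classes = np.digitize(values, edges[1:-1])
for region, cls in zip(ultra_2015, classes):
    print(f"{region:<26} class {cls}")
```

With these values the edges come out at roughly 20.3, 25.1, 30.0, 34.9 and 39.8, and every 2015 region lands in the second or third class, so the 6-point padding leaves the lightest and darkest colours unused on the 2015 map.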

### Mapping % Ultra-Processed Food for the Year 2020

In [46]:
# Create a data frame with just the regions and the % ultra-processed food values
# note: despite the '19' suffix, df_lcf_analysis_geo19 holds the 2020 rows
df_lcf_analysis_geo19 = df_lcf_analysis_geo.groupby('year').get_group(2020)
data_to_plot7 = df_lcf_analysis_geo19[['region','r%ultra_processed_food']]
data_to_plot7.head(15)

|       | region | r%ultra_processed_food |
|-------|--------|------------------------|
| 26043 | South West | 29.222212 |
| 26063 | North West | 35.576769 |
| 26072 | Northern Ireland | 33.671381 |
| 26079 | Wales | 33.578029 |
| 26093 | London | 23.503362 |
| 26103 | Scotland | 33.134563 |
| 26105 | South East | 30.277927 |
| 26135 | East | 30.158357 |
| 26141 | North East | 36.570528 |
| 26143 | Yorkshire and the Humber | 33.180941 |


In [47]:
# Check for null values
data_to_plot7.isnull().sum()

region                    0
r%ultra_processed_food    0
dtype: int64

In [48]:
# Set up a folium map centred on the United Kingdom, zoomed out to country level
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage Spent on Ultra-processed Food in 2020'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the bin edges for the legend colour bar: reuse the 2015 range (data_to_plot6)
# so the 2015 and 2020 maps share the same colour scale
myscale = np.linspace(data_to_plot6['r%ultra_processed_food'].min()-6, data_to_plot6['r%ultra_processed_food'].max()+6, 5)

# create the choropleth map
# (bins replaces the deprecated threshold_scale; the highlight parameter is lowercase)
choropleth = folium.Choropleth(
    geo_data = UKregions_geo,
    data = data_to_plot7,
    columns = ['region', 'r%ultra_processed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlOrRd',
    bins = myscale,
    fill_opacity = 1,
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,
    name = 'Percentage Spent on Ultra-Processed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Ultra-Processed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

In [49]:
# save the 2020 % ultra-processed food map
map.save('map_perultra_food19.html')
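The 2020 map reuses bin edges derived from the 2015 data (`data_to_plot6`), so both maps share one colour scale. That only works if every 2020 value lies between the outermost edges; a value outside them cannot be assigned to any colour class. A quick check using the 2020 values printed above:

```python
import numpy as np

# Edges built the same way as myscale in the 2020 cell: from the 2015 min and max
# (London 26.272196, Wales 33.753022) padded by 6 percentage points
edges = np.linspace(26.272196 - 6, 33.753022 + 6, 5)

# 2020 regional values of r%ultra_processed_food, copied from the output above
ultra_2020 = {
    "South West": 29.222212,
    "North West": 35.576769,
    "Northern Ireland": 33.671381,
    "Wales": 33.578029,
    "London": 23.503362,
    "Scotland": 33.134563,
    "South East": 30.277927,
    "East": 30.158357,
    "North East": 36.570528,
    "Yorkshire and the Humber": 33.180941,
}
values = np.array(list(ultra_2020.values()))

# Every value must sit between the outermost edges to be assigned a colour class
inside = (values >= edges[0]) & (values <= edges[-1])
print("all 2020 values inside the bins:", bool(inside.all()))

# Colour class per region (0 = lightest, 3 = darkest)
classes = dict(zip(ultra_2020, np.digitize(values, edges[1:-1]).tolist()))
print(classes)
```

Here all ten values fit; London falls in the lightest class and the North East and North West in the darkest, so the shared scale still separates the 2020 extremes.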

---

## 08. Map the Percentage spent on Unprocessed Food for the Years 2015 to 2020, 2015 and 2020

### Mapping the Percentage Spent on Unprocessed Food for the Period 2015 - 2020

In [50]:
# Create a data frame with just the regions and the % unprocessed food values
data_to_plot8 = df_lcf_analysis_geo[['region', 'r%unprocessed_food']]
data_to_plot8.head(15)

|      | region | r%unprocessed_food |
|------|--------|--------------------|
| 4817 | Northern Ireland | 20.377653 |
| 4823 | Scotland | 22.253107 |
| 4858 | West Midlands | 24.42406 |
| 4869 | London | 23.822246 |
| 4875 | East | 22.635681 |
| 4881 | Wales | 25.198863 |
| 4883 | South East | 23.578081 |
| 4884 | North West | 23.673608 |
| 4886 | Yorkshire and the Humber | 22.719292 |
| 4888 | South West | 24.348657 |


In [51]:
# Check for null values
data_to_plot8.isnull().sum()

region                0
r%unprocessed_food    0
dtype: int64
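`df_lcf_analysis_geo` holds rows for more than one year (the 2015 and 2020 subsets above are both taken from it), so `data_to_plot8` repeats each region once per year. A choropleth colours each region with a single value: folium turns the region and value columns into a region-to-value mapping, and a mapping can hold only one value per region, so the map is unlikely to show a true 2015 to 2020 average. Averaging per region first avoids this. The hypothetical miniature below uses only the 2015 and 2020 values for two regions, copied from this notebook's outputs, so its "average" is a two-year mean for illustration.

```python
import pandas as pd

# Hypothetical miniature of data_to_plot8: one row per region per year.
# Values are the 2015 and 2020 r%unprocessed_food figures printed in this notebook.
long_df = pd.DataFrame({
    "region": ["London", "Wales", "London", "Wales"],
    "year": [2015, 2015, 2020, 2020],
    "r%unprocessed_food": [23.822246, 25.198863, 28.770634, 25.377112],
})

# A region -> value mapping built from the raw rows:
# later rows overwrite earlier ones, so only the 2020 value is kept
naive_lookup = long_df.set_index("region")["r%unprocessed_food"].to_dict()

# Average first, giving exactly one row per region for the choropleth
region_avg = long_df.groupby("region", as_index=False)["r%unprocessed_food"].mean()

print(naive_lookup)
print(region_avg)
```

In the notebook, the same `groupby('region', as_index=False)['r%unprocessed_food'].mean()` call applied to `df_lcf_analysis_geo` would give one unweighted mean per region across all years present.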

In [52]:
# Set up a folium map centred on the United Kingdom, zoomed out to country level
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Average Percentage Spent on Unprocessed Food from 2015 to 2020'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the bin edges for the legend colour bar (5 edges, 4 colour classes)
# from the full 2015 to 2020 range
myscale = np.linspace(data_to_plot8['r%unprocessed_food'].min(), data_to_plot8['r%unprocessed_food'].max()+1, 5)

# create the choropleth map
# (bins replaces the deprecated threshold_scale; the highlight parameter is lowercase)
choropleth = folium.Choropleth(
    geo_data = UKregions_geo,
    data = data_to_plot8,
    columns = ['region', 'r%unprocessed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlGn',
    bins = myscale,
    fill_opacity = 1,
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,
    name = 'Percentage Spent on Unprocessed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Unprocessed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

### Mapping the Percentage Spent on Unprocessed Food for the Year 2015 

In [53]:
# Create a data frame with just the regions and the % unprocessed food values
data_to_plot9 = df_lcf_analysis_geo15[['region','r%unprocessed_food']]
data_to_plot9.head(15)

|      | region | r%unprocessed_food |
|------|--------|--------------------|
| 4817 | Northern Ireland | 20.377653 |
| 4823 | Scotland | 22.253107 |
| 4858 | West Midlands | 24.42406 |
| 4869 | London | 23.822246 |
| 4875 | East | 22.635681 |
| 4881 | Wales | 25.198863 |
| 4883 | South East | 23.578081 |
| 4884 | North West | 23.673608 |
| 4886 | Yorkshire and the Humber | 22.719292 |
| 4888 | South West | 24.348657 |


In [54]:
# Check for null values
data_to_plot9.isnull().sum()

region                0
r%unprocessed_food    0
dtype: int64

In [55]:
# Set up a folium map centred on the United Kingdom, zoomed out to country level
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage Spent on Unprocessed Food in 2015'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the bin edges from the full 2015 to 2020 range (data_to_plot8)
# so the unprocessed food maps share one colour scale
myscale = np.linspace(data_to_plot8['r%unprocessed_food'].min(), data_to_plot8['r%unprocessed_food'].max()+1, 5)

# create the choropleth map
# (bins replaces the deprecated threshold_scale; the highlight parameter is lowercase)
choropleth = folium.Choropleth(
    geo_data = UKregions_geo,
    data = data_to_plot9,
    columns = ['region', 'r%unprocessed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlGn',
    bins = myscale,
    fill_opacity = 1,
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,
    name = 'Percentage Spent on Unprocessed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Unprocessed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

In [56]:
# save the 2015 % unprocessed food map
map.save('map_perun_food15.html')

### Mapping the Percentage Spent on Unprocessed Food for the Year 2020

In [57]:
# Create a data frame with just the regions and the % unprocessed food values
data_to_plot10 = df_lcf_analysis_geo19[['region','r%unprocessed_food']]
data_to_plot10.head(15)

|       | region | r%unprocessed_food |
|-------|--------|--------------------|
| 26043 | South West | 25.815048 |
| 26063 | North West | 23.87747 |
| 26072 | Northern Ireland | 23.496769 |
| 26079 | Wales | 25.377112 |
| 26093 | London | 28.770634 |
| 26103 | Scotland | 24.227071 |
| 26105 | South East | 22.476785 |
| 26135 | East | 24.789197 |
| 26141 | North East | 22.555328 |
| 26143 | Yorkshire and the Humber | 22.505418 |


In [58]:
# Check for null values
data_to_plot10.isnull().sum()

region                0
r%unprocessed_food    0
dtype: int64

In [59]:
# Set up a folium map centred on the United Kingdom, zoomed out to country level
map = folium.Map(location = [56, 0], zoom_start = 5, tiles = 'cartodbpositron')

# add a title
title = 'Percentage Spent on Unprocessed Food in 2020'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(title)   

# add tile layers to the map
tiles = ['cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map)
    
# set the bin edges from the full 2015 to 2020 range (data_to_plot8)
# so the unprocessed food maps share one colour scale
myscale = np.linspace(data_to_plot8['r%unprocessed_food'].min(), data_to_plot8['r%unprocessed_food'].max()+1, 5)

# create the choropleth map
# (bins replaces the deprecated threshold_scale; the highlight parameter is lowercase)
choropleth = folium.Choropleth(
    geo_data = UKregions_geo,
    data = data_to_plot10,
    columns = ['region', 'r%unprocessed_food'],
    key_on = 'feature.properties.rgn19nm',
    fill_color = 'YlGn',
    bins = myscale,
    fill_opacity = 1,
    line_opacity = 0.7,
    smooth_factor = 0,
    highlight = True,
    name = 'Percentage Spent on Unprocessed Food',
    show = True,
    overlay = True,
    nan_fill_color = 'Grey',
    legend_name = 'Percentage Spent on Unprocessed Food').add_to(map)

# add labels showing the name of the region
style_function = "font-size: 15px; font-weight: bold"

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['rgn19nm'], style=style_function, labels=False))

# add a title
map.get_root().html.add_child(folium.Element(title_html))

folium.LayerControl(collapsed=False).add_to(map)

map

In [60]:
# save the 2020 % unprocessed food map
map.save('map_perun_food19.html')

### Create Map Using Plotly

*As with the tooltip above, I couldn't get this to work and didn't have time to explore why.*

---

## 09. Export the Dataframe as lcf_analysis_regions.csv

In [61]:
# Check the columns in the dataframe
df_lcf_analysis.columns

Index(['unique_id', 'no_people', 'household_type', 'quarter',
       'OECD_disposable_income', 'region', 'total_income', 'OECD_scale',
       'gross_income', 'weekly_disposable_income', 'income_source',
       'adult_food_cost', 'child_food_cost', 'total_food_cost',
       'adult_total_expenditure', 'child_total_expenditure',
       'total_expenditure', 'eng_rural_urb', 'scot_rural_urb',
       'quarterly_weight', 'year', 'no_children', 'no_adult',
       'unprocessed_food', 'processed_food', 'ultra_processed_food',
       'no_households', 'W_OECD_disposable_income', 'W_total_income',
       'W_gross_income', 'W_disposable_income', 'W_adult_food_cost',
       'W_child_food_cost', 'W_total_food_cost', 'W_adult_total_expenditure',
       'W_child_total_expenditure', 'W_total_expenditure',
       'W_unprocessed_food', 'W_processed_food', 'W_ultra_processed_food',
       '%unprocessed_food', '%processed_food', '%ultra_processed_food',
       'rW_total_food_cost', 'rW_ultra_processed_food',

In [62]:
# Check the shape of the dataframe
df_lcf_analysis.shape

(26146, 48)

In [63]:
# Export the dataframe into the Prepared_Data folder as lcf_analysis_regions.csv, and also as a pickle
df_lcf_analysis.to_csv(os.path.join(path, '02_Data','02_2_Prepared_Data', 'lcf_analysis_regions.csv'))
df_lcf_analysis.to_pickle(os.path.join(path, '02_Data','02_2_Prepared_Data', 'lcf_analysis_regions.pkl'))
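By default `to_csv` writes the row index as an extra, unnamed first column. When `lcf_analysis_regions.csv` is loaded again, `pd.read_csv` labels that column `Unnamed: 0` unless `index_col=0` is passed (or the file is written with `index=False` if the index isn't needed); the pickle keeps the index as-is. A small in-memory round trip, using two rows with values copied from the 2020 output above:

```python
import io

import pandas as pd

# Two rows with non-default index labels, like the household-level index used above
df = pd.DataFrame(
    {"region": ["London", "Wales"], "r%unprocessed_food": [28.770634, 25.377112]},
    index=[26093, 26079],
)

buffer = io.StringIO()
df.to_csv(buffer)  # default index=True: the index becomes an unnamed first column
csv_text = buffer.getvalue()

naive = pd.read_csv(io.StringIO(csv_text))                  # index returns as 'Unnamed: 0'
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)  # index restored

print(naive.columns.tolist())
print(restored.index.tolist())
```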

---

## 10. Discussion of the results and what they mean
* Does the analysis answer any of your existing research questions?
* Does the analysis lead you to any new research questions?

The maps show how the regional figures changed between the years mapped. To answer the research questions, factors such as income source and household type would need to be incorporated. It would also be useful to look at the change within each region over time, broken down by category, rather than relying on single-year plots.
**One question that stands out: why did obesity in the South West increase by more than in the other regions, rising from around 1 in 5 adults to 1 in 3?**
* With the exception of London and Wales, the percentage of adults classified as obese rose or stayed roughly the same between 2015 and 2019.
* In 2015, the share of the total household food budget spent on ultra-processed food ranged from roughly 26% (London) to 34% (Wales) across the regions mapped.
* In 2020, households in the northern regions of England spent around a third of their total food budget on ultra-processed food.
* Households in London, by contrast, spent only around a quarter of their total food budget on ultra-processed food.
* The range between the regions for the percentage spent on unprocessed food is much smaller than the range for ultra-processed food. 
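The last point can be checked directly from the regional values printed in sections 07 and 08. The sketch below copies the ten values shown for each year and compares the spread (maximum minus minimum) for the two food categories. The ten regions listed differ slightly between 2015 and 2020, so this covers only the regions printed.

```python
import pandas as pd

# Regional values copied from the notebook outputs (r% columns), ten regions per year
values = {
    ("ultra_processed", 2015): [26.524968, 32.520439, 31.376977, 26.272196, 29.426363,
                                33.753022, 29.931446, 31.323735, 30.162747, 30.650676],
    ("unprocessed", 2015): [20.377653, 22.253107, 24.42406, 23.822246, 22.635681,
                            25.198863, 23.578081, 23.673608, 22.719292, 24.348657],
    ("ultra_processed", 2020): [29.222212, 35.576769, 33.671381, 33.578029, 23.503362,
                                33.134563, 30.277927, 30.158357, 36.570528, 33.180941],
    ("unprocessed", 2020): [25.815048, 23.87747, 23.496769, 25.377112, 28.770634,
                            24.227071, 22.476785, 24.789197, 22.555328, 22.505418],
}

# Spread between the highest- and lowest-spending region, per category and year
spread = pd.Series({key: max(v) - min(v) for key, v in values.items()})
spread = spread.rename_axis(["category", "year"])
print(spread.round(2).unstack("year"))
```

With the printed values the spread is about 7.5 points for ultra-processed versus 4.8 for unprocessed food in 2015, and 13.1 versus 6.3 points in 2020, so the difference is most pronounced in 2020.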