# Choropleth Maps for Merged Data

This notebook cleans and manipulates the GreenSpace geopackage (GPKG) file and uses its geometry to create choropleth maps at the city, division, and region levels.

Objectives:
- Exploring the GPKG file from GreenSpace datasets
- Create choropleth map by using converted gpkg file and merged datasets


Conclusion:
- The file can be read by geopandas. 
- The ...V1_2.gpkg file contains MULTIPOLYGON data, which can be used as area boundaries for the choropleth map.

*Note: the ...V1_2.gpkg file is very large, so it may take a few minutes to load.*

In [1]:
import Merged_Map as mm

# import geopandas as gpd
# import pandas as pd
# import numpy as np
# import folium
# import json
# import os

# from shapely.geometry import Point



## Prepare the Geopackage file
The geopackage file contains columns that are almost identical to those in the greenspace CSV file, with the addition of a geometry column containing MULTIPOLYGON data.

To make sure the GeoDataFrame (`gdf`) is aligned with greenspaced_cleaned.csv and merged_greenspace_mh.csv, we clean it by reusing the cleaning steps that were applied to those two files.

In the end, we will keep the essential data for gdf and output as a geojson file `Greenspace_US.geojson`.

In [2]:
# Load and clean the geopackage file
gpkg_path = 'GreenspaceDownload/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg'

green_us = mm.loading_and_cleaning(gpkg_path)
green_us.head(3)

Loading existing cleaned file...


Unnamed: 0,UC_Grouping,Latitude,Longitude,Country,Urban Center,Cities in Urban Center,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,...,B15,BUCAP15,INCM_CMI,DEV_CMI,GDP15_SM,E_BM_NM_LST,E_WR_T_14,Cities in Urban Center_copy,State,geometry
0,0,21.340678,-157.893497,United States,Honolulu,Honolulu,0.36929,183.811667,0.226415,56.41,...,80.647377,157.252219,HIC,MDR,21926680000.0,Tropical and Subtropical Dry Broadleaf Forests,23.526622,Honolulu; Waipahu; Pearl City; Aiea,HI,"MULTIPOLYGON (((-158.01244 21.42219, -157.9915..."
1,2,34.923123,-120.434372,United States,Santa Maria,Santa Maria,0.312846,54.450694,0.040129,23.64,...,42.000805,340.96742,HIC,MDR,4174295000.0,"Mediterranean Forests, Woodlands, and Scrub",14.718191,Santa Maria,CA,"MULTIPOLYGON (((-120.46375 34.98933, -120.4411..."
2,4,34.427664,-119.743693,United States,Santa Barbara,Santa Barbara,0.362785,59.576284,0.061348,36.5,...,38.101749,332.032274,HIC,MDR,4159702000.0,"Mediterranean Forests, Woodlands, and Scrub",15.376907,Santa Barbara,CA,"MULTIPOLYGON (((-119.82444 34.45783, -119.8131..."


After cleaning the geometry file and generating the dataframe `green_us` based on Greenspace_Data_Cleaning.ipynb, we need a further cleaning pass to make sure the geometry data aligns with the merged dataset.

So we load the merged dataset and manipulate the rows of `green_us` to match it.

In [3]:
file_path = 'merged_greenspace_mh.csv'

In [4]:
# load merged dataset and manipulate rows to match the merged dataset
uc_merged, up_greenus = mm.rows_matching_with_merged(file_path, green_us, key_col = 'UC_Grouping')

# presenting merged dataframe
print(uc_merged.shape)
uc_merged.head(3)

(329, 19)


Unnamed: 0,Population2010,MHLTH_AdjPrev,UC_Grouping,Latitude,Longitude,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,B15,BUCAP15,GDP15_SM,E_WR_T_14,State,INCM_CMI,DEV_CMI,E_BM_NM_LST,Cities in Urban Center_copy
0,212237,15.6,485,33.509025,-86.823651,0.494568,219.99623,0.773812,74.85,196387.767,152.894608,778.534274,6184143000.0,17.497644,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Birmingham;
1,180105,13.4,501,34.726065,-86.609995,0.521522,88.700999,0.802599,66.37,86467.06209,59.674004,690.135667,2498489000.0,16.321889,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Huntsville
2,195111,15.0,422,30.692377,-88.093685,0.467515,122.669298,0.822213,63.32,118578.6789,71.298004,601.271703,4072112000.0,20.312027,AL,HIC,MDR,Temperate Coniferous Forests,Mobile


In [5]:
uc_merged.columns

Index(['Population2010', 'MHLTH_AdjPrev', 'UC_Grouping', 'Latitude',
       'Longitude', 'E_GR_AV14', 'E_GR_AT14', 'SDG_A2G14', 'SDG_OS15MX', 'P15',
       'B15', 'BUCAP15', 'GDP15_SM', 'E_WR_T_14', 'State', 'INCM_CMI',
       'DEV_CMI', 'E_BM_NM_LST', 'Cities in Urban Center_copy'],
      dtype='object')

In [6]:
# presenting updated green_us dataframe
up_greenus.shape
up_greenus.head()

Unnamed: 0,UC_Grouping,Latitude,Longitude,Country,Urban Center,Cities in Urban Center,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,...,B15,BUCAP15,INCM_CMI,DEV_CMI,GDP15_SM,E_BM_NM_LST,E_WR_T_14,Cities in Urban Center_copy,State,geometry
0,0,21.340678,-157.893497,United States,Honolulu,Honolulu,0.36929,183.811667,0.226415,56.41,...,80.647377,157.252219,HIC,MDR,21926680000.0,Tropical and Subtropical Dry Broadleaf Forests,23.526622,Honolulu; Waipahu; Pearl City; Aiea,HI,"MULTIPOLYGON (((-158.01244 21.42219, -157.9915..."
1,2,34.923123,-120.434372,United States,Santa Maria,Santa Maria,0.312846,54.450694,0.040129,23.64,...,42.000805,340.96742,HIC,MDR,4174295000.0,"Mediterranean Forests, Woodlands, and Scrub",14.718191,Santa Maria,CA,"MULTIPOLYGON (((-120.46375 34.98933, -120.4411..."
2,4,34.427664,-119.743693,United States,Santa Barbara,Santa Barbara,0.362785,59.576284,0.061348,36.5,...,38.101749,332.032274,HIC,MDR,4159702000.0,"Mediterranean Forests, Woodlands, and Scrub",15.376907,Santa Barbara,CA,"MULTIPOLYGON (((-119.82444 34.45783, -119.8131..."
3,6,36.688991,-121.640831,United States,Salinas,Salinas,0.339631,53.886276,0.076114,24.61,...,41.044956,274.027026,HIC,MDR,4813837000.0,"Mediterranean Forests, Woodlands, and Scrub",15.27411,Salinas,CA,"MULTIPOLYGON (((-121.66750 36.74127, -121.6560..."
4,7,34.217486,-119.209132,United States,Oxnard,Oxnard,0.299903,135.224578,0.036199,28.65,...,97.043526,325.861123,HIC,MDR,10745820000.0,"Mediterranean Forests, Woodlands, and Scrub",17.053577,Oxnard; Ventura,CA,"MULTIPOLYGON (((-119.31772 34.29254, -119.2952..."


The updated green_us seems to have more rows than the merged dataset.

It's actually OK if the number of rows in green_us doesn't match the merged dataset, as long as we have complete geometry data covering every city area in the merged dataset.

So we check that the updated green_us holds enough geometry data by comparing the number of unique UC_Grouping values in both datasets.

In [7]:
# testing unique number of UC_Grouping in both datasets
n_mer = len(uc_merged['UC_Grouping'].unique())
n_green = len(up_greenus['UC_Grouping'].unique())

if n_mer == n_green:
    print(f"The numbers of unique UC_Grouping are the same, which is {n_mer}.")
else:
    print(f"Numbers are not the same. The merged dataset has {n_mer} unique UC_Grouping values; green_us has {n_green}.")

The numbers of unique UC_Grouping are the same, which is 228.
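
Equal counts of unique keys do not by themselves guarantee that both tables hold the same keys. A minimal sketch of a stricter check using plain Python sets follows; `compare_keys` is a hypothetical helper (not part of `Merged_Map`), and the key values are illustrative only.

```python
def compare_keys(merged_keys, geo_keys):
    """Return the keys missing from each side (both lists empty when the sets match)."""
    merged_set, geo_set = set(merged_keys), set(geo_keys)
    return {
        "missing_geometry": sorted(merged_set - geo_set),  # merged rows with no polygon
        "unused_geometry": sorted(geo_set - merged_set),   # polygons with no merged row
    }

# Illustrative keys: both sides have 3 unique values, yet the members differ
merged_keys = [485, 501, 422, 485]
geo_keys = [485, 501, 999]

report = compare_keys(merged_keys, geo_keys)
print(report)
```

In the notebook, the two `UC_Grouping` columns could be passed in directly, since a pandas Series can be turned into a set of its values.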


Since all cities are covered, we can keep just one geometry per unique UC_Grouping and drop the less relevant columns to make the file smaller.


In [8]:
ess_geo = mm.smaller_file(up_greenus)
print(ess_geo.shape)
ess_geo.head()

(228, 2)


Unnamed: 0,UC_Grouping,geometry
0,0,"MULTIPOLYGON (((-158.01244 21.42219, -157.9915..."
1,2,"MULTIPOLYGON (((-120.46375 34.98933, -120.4411..."
2,4,"MULTIPOLYGON (((-119.82444 34.45783, -119.8131..."
3,6,"MULTIPOLYGON (((-121.66750 36.74127, -121.6560..."
4,7,"MULTIPOLYGON (((-119.31772 34.29254, -119.2952..."
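
The reduction above can be sketched in plain Python as "keep the first row per key, then keep only the key and geometry columns". This is an assumption about what `mm.smaller_file` does, not its actual code; `first_per_key` is a hypothetical helper and the geometry strings are placeholders.

```python
def first_per_key(rows, key):
    """Keep the first row seen for each key value, preserving input order."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)  # later duplicates are ignored
    return list(seen.values())

# Placeholder rows: UC_Grouping 0 appears twice
rows = [
    {"UC_Grouping": 0, "geometry": "MULTIPOLYGON A", "State": "HI"},
    {"UC_Grouping": 2, "geometry": "MULTIPOLYGON B", "State": "CA"},
    {"UC_Grouping": 0, "geometry": "MULTIPOLYGON A", "State": "HI"},
]

# Drop every column except the join key and the geometry
slim = [
    {"UC_Grouping": r["UC_Grouping"], "geometry": r["geometry"]}
    for r in first_per_key(rows, "UC_Grouping")
]
print(slim)
```

With pandas, the same idea would typically be written as `drop_duplicates(subset='UC_Grouping')` followed by a column selection.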


Having removed the less relevant rows and columns, we are ready to output the geojson file.

In [9]:
# output cleaned geometry df to geojson
_ = mm.df_to_geojson(ess_geo)

Geojson file already exists.
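
For orientation, the structure of such a GeoJSON file can be sketched with the standard library alone. The helper `to_feature_collection` and the square polygon are made up for illustration; `mm.df_to_geojson` may well rely on geopandas instead.

```python
import json

def to_feature_collection(records):
    """Wrap (key, MultiPolygon coordinates) records in a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # the join key lives in "properties" so a map can match it to data rows
            "properties": {"UC_Grouping": rec["UC_Grouping"]},
            "geometry": {"type": "MultiPolygon", "coordinates": rec["coordinates"]},
        })
    return {"type": "FeatureCollection", "features": features}

# One made-up square: MultiPolygon -> polygon -> ring -> [lon, lat] positions.
# A ring must end on the same position it starts with.
square = [[[[-158.0, 21.3], [-157.9, 21.3], [-157.9, 21.4], [-158.0, 21.4], [-158.0, 21.3]]]]

collection = to_feature_collection([{"UC_Grouping": 0, "coordinates": square}])
text = json.dumps(collection)
print(text[:80])
```

Note that GeoJSON stores coordinates as longitude first, then latitude, which matches the MULTIPOLYGON output shown earlier.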


# Choropleth Map for Merged Dataset

Now we will create a choropleth map to visualize features geographically.

In [10]:
# check the geojson file
with open('GEOJSON/Greenspace_US.geojson', 'r') as f:
    pass  # preview if needed: f.readlines()[:10]

## City-level Map
First, we look for insights in the city-level map.

In [11]:
# create a list of columns to be used in the choropleth map
unwanted_col = ['Population2010','UC_Grouping','Latitude','Longitude','State','Cities in Urban Center_copy']
col_list = [x for x in uc_merged.columns if x not in unwanted_col]

city_map = mm.merged_choropleth_map('GEOJSON/Greenspace_US.geojson',uc_merged, col_list)

# output the map to html if needed
# mm.output_map_html(city_map, 'city_map.html')

# display(city_map)
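
Conceptually, a choropleth joins each area's value to its polygon through a shared key and then bins the values into colour classes. The sketch below illustrates only the binning step with quantile breaks; `color_classes` and the hex colours are hypothetical, and mapping libraries normally handle this internally, so this is not a description of `mm.merged_choropleth_map`.

```python
from statistics import quantiles

def color_classes(values_by_key, palette):
    """Give each key a palette colour chosen by quantile breaks of its value."""
    # n colour classes need n - 1 cut points, which is what quantiles() returns
    breaks = quantiles(values_by_key.values(), n=len(palette))
    colors = {}
    for key, value in values_by_key.items():
        class_index = sum(value > b for b in breaks)  # how many breaks the value exceeds
        colors[key] = palette[class_index]
    return colors

# MHLTH_AdjPrev for the three UC_Grouping keys shown in the merged table above
prevalence = {485: 15.6, 501: 13.4, 422: 15.0}
palette = ["#fee8c8", "#fdbb84", "#e34a33"]  # light to dark, illustrative colours

colors = color_classes(prevalence, palette)
print(colors)
```

Quantile breaks put roughly the same number of areas in each class, which keeps the map readable when a few cities have extreme values.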

### Findings:
- The inspected cities are concentrated in the southwest and northeast of the US, with fewer cities in the central area.

- Areas that share the same `E_BM_NM_LST` label, such as cities around New York (labeled as Temperate Broadleaf and Mixed Forests), may have notably high mental health issue prevalence. Holding `E_BM_NM_LST` fixed would allow the influence of other features to be examined further.

- Cities with a higher `E_GR_AT14` (i.e., total area of greenness estimated for 2015) may have lower mental illness prevalence.

- `GDP15_SM` seems correlated with mental illness prevalence; further research is needed.

- A higher `E_WR_T_14` (average temperature) seems to go with lower mental illness prevalence.

- Some features seem to have no values, such as:
    - `E_GR_AV14`: average greenness estimated for 2014 located in the built-up area of epoch 2014.
    - `SDG_A2G14`: share of population living in the high green area in 2015 in the Urban Centre of 2015.
    - These need further inspection.

- `P15`, `B15`, and `BUCAP15` seem more related to the size of cities, less relevant to mental illness prevalence.

- `INCM_CMI` seems less relevant here: it labels all cities as "HIC", meaning "High Income Countries". This is likely a country-level label, so it can be removed from the merged dataset, along with `DEV_CMI`.


## Division-level Map
The inspected cities are spread widely across the US, so we group them at the division level based on the [Census Regions and Divisions of the United States](https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf).

In [12]:
# call us_division function to return the division dictionary
division_dic = mm.us_division()

# apply the division labels to the merged dataset
div_merged01 = mm.apply_geo_labels(uc_merged, 'Division', division_dic, 'State')
div_merged01.head(3)

Unnamed: 0,Population2010,MHLTH_AdjPrev,UC_Grouping,Latitude,Longitude,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,B15,BUCAP15,GDP15_SM,E_WR_T_14,State,INCM_CMI,DEV_CMI,E_BM_NM_LST,Cities in Urban Center_copy,Division
0,212237,15.6,485,33.509025,-86.823651,0.494568,219.99623,0.773812,74.85,196387.767,152.894608,778.534274,6184143000.0,17.497644,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Birmingham;,East South Central
1,180105,13.4,501,34.726065,-86.609995,0.521522,88.700999,0.802599,66.37,86467.06209,59.674004,690.135667,2498489000.0,16.321889,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Huntsville,East South Central
2,195111,15.0,422,30.692377,-88.093685,0.467515,122.669298,0.822213,63.32,118578.6789,71.298004,601.271703,4072112000.0,20.312027,AL,HIC,MDR,Temperate Coniferous Forests,Mobile,East South Central
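
A minimal sketch of the labelling step, assuming the division dictionary maps each division name to its member states. The actual shape of what `mm.us_division()` returns is not shown in this notebook, so `DIVISIONS` and `invert` are illustrative; the memberships themselves follow the Census Bureau divisions.

```python
# Census Bureau divisions and their member states (plus DC)
DIVISIONS = {
    "New England": ["CT", "ME", "MA", "NH", "RI", "VT"],
    "Middle Atlantic": ["NJ", "NY", "PA"],
    "East North Central": ["IL", "IN", "MI", "OH", "WI"],
    "West North Central": ["IA", "KS", "MN", "MO", "NE", "ND", "SD"],
    "South Atlantic": ["DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA", "WV"],
    "East South Central": ["AL", "KY", "MS", "TN"],
    "West South Central": ["AR", "LA", "OK", "TX"],
    "Mountain": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY"],
    "Pacific": ["AK", "CA", "HI", "OR", "WA"],
}

def invert(groups):
    """Turn {group: [members]} into {member: group} for row-wise lookups."""
    return {member: group for group, members in groups.items() for member in members}

state_to_division = invert(DIVISIONS)

# Label a few state codes the way a 'State' column would be labelled
states = ["AL", "CA", "NY"]
labels = [state_to_division.get(s) for s in states]
print(labels)
```

Using `.get` returns `None` for an unknown state code instead of raising, which makes unmatched rows easy to spot after labelling.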


We use `enumerate()` on the unique values of `E_BM_NM_LST`; each value's index serves as its numeric label.

- 0: 'Temperate Broadleaf and Mixed Forests',
- 1: 'Temperate Coniferous Forests',
- 2: 'Boreal Forests/Taiga',
- 3: 'Deserts and Xeric Shrublands',
- 4: 'Mediterranean Forests, Woodlands, and Scrub',
- 5: 'Temperate Grasslands, Savannas, and Shrublands',
- 6: 'Flooded Grasslands and Savannas',
- 7: 'Tropical and Subtropical Dry Broadleaf Forests',
- 8: 'Tropical and subtropical grasslands, savannas, and shrublands'
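
The labelling scheme listed above can be sketched as follows. `category_codes` is a hypothetical stand-in for `mm.cate_to_num_labels`, assuming labels are numbered in order of first appearance, which is consistent with the first rows of the merged table.

```python
def category_codes(values):
    """Map each distinct value to an integer, numbered in order of first appearance."""
    # dict.fromkeys keeps insertion order, so duplicates collapse in place
    distinct = dict.fromkeys(values)
    return {label: code for code, label in enumerate(distinct)}

# Biome labels of the first rows of the merged dataset, plus one repeat
biomes = [
    "Temperate Broadleaf and Mixed Forests",
    "Temperate Broadleaf and Mixed Forests",
    "Temperate Coniferous Forests",
    "Temperate Broadleaf and Mixed Forests",
]

codes = category_codes(biomes)
encoded = [codes[b] for b in biomes]
print(codes)
print(encoded)
```

These codes are nominal: the numbers carry no order or magnitude, so averaging them is not meaningful, which is why a mode is the natural aggregate for such a column.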

In [13]:
# convert `E_BM_NM_LST` to numeric categorical labels
env_dict = mm.cate_to_num_labels(div_merged01, 'E_BM_NM_LST')

# apply to the merged dataset by using the function
div_merged02 = mm.apply_num_labels(div_merged01, 'Biome_Class', env_dict, 'E_BM_NM_LST')

div_merged02.head(3)



Unnamed: 0,Population2010,MHLTH_AdjPrev,UC_Grouping,Latitude,Longitude,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,...,BUCAP15,GDP15_SM,E_WR_T_14,State,INCM_CMI,DEV_CMI,E_BM_NM_LST,Cities in Urban Center_copy,Division,Biome_Class
0,212237,15.6,485,33.509025,-86.823651,0.494568,219.99623,0.773812,74.85,196387.767,...,778.534274,6184143000.0,17.497644,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Birmingham;,East South Central,0
1,180105,13.4,501,34.726065,-86.609995,0.521522,88.700999,0.802599,66.37,86467.06209,...,690.135667,2498489000.0,16.321889,AL,HIC,MDR,Temperate Broadleaf and Mixed Forests,Huntsville,East South Central,0
2,195111,15.0,422,30.692377,-88.093685,0.467515,122.669298,0.822213,63.32,118578.6789,...,601.271703,4072112000.0,20.312027,AL,HIC,MDR,Temperate Coniferous Forests,Mobile,East South Central,1


In [14]:
drop_lst = ['E_BM_NM_LST','State', 'Cities in Urban Center_copy','INCM_CMI', 'DEV_CMI','Latitude', 'Longitude','UC_Grouping','Population2010']

div_merged03 = mm.div_merged_dropcols(div_merged02, drop_lst)
div_merged03.head(3)

Unnamed: 0,MHLTH_AdjPrev,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,B15,BUCAP15,GDP15_SM,E_WR_T_14,Division,Biome_Class
0,15.6,0.494568,219.99623,0.773812,74.85,196387.767,152.894608,778.534274,6184143000.0,17.497644,East South Central,0
1,13.4,0.521522,88.700999,0.802599,66.37,86467.06209,59.674004,690.135667,2498489000.0,16.321889,East South Central,0
2,15.0,0.467515,122.669298,0.822213,63.32,118578.6789,71.298004,601.271703,4072112000.0,20.312027,East South Central,1


In [15]:
division_df = mm.aggregate_division(div_merged03, non_mean_cols = ['E_BM_NM_LST','Biome_Class', 'Division'], mode_col = 'Biome_Class',groupby_col = 'Division')
division_df

Unnamed: 0,Division,MHLTH_AdjPrev,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,B15,BUCAP15,GDP15_SM,E_WR_T_14,Biome_Class
0,East North Central,13.380952,0.516777,774.961635,0.694572,70.210238,1154564.0,551.798662,556.505963,37947440000.0,10.293027,0
1,East South Central,14.545455,0.506293,203.779326,0.744445,69.174545,230505.7,135.019078,581.579101,6840949000.0,16.397906,0
2,Middle Atlantic,14.673333,0.521972,997.342724,0.586376,71.242667,2586011.0,669.952574,416.810349,106589800000.0,10.850969,0
3,Mountain,11.545,0.331719,798.551943,0.123852,31.426,1317501.0,578.163302,444.736287,42253140000.0,14.014378,3
4,New England,14.82381,0.524327,212.376133,0.578994,68.622381,359803.3,133.033906,392.815708,14503150000.0,10.644886,0
5,Pacific,12.26044,0.333942,1939.330183,0.102669,29.927253,4812977.0,1536.889651,345.002676,185071100000.0,16.090889,4
6,South Atlantic,12.747619,0.501641,830.607874,0.64259,61.842857,1289559.0,554.068995,495.78543,49641710000.0,19.482446,1
7,West North Central,11.466667,0.513145,354.265216,0.746819,67.955714,428671.1,241.532846,548.684956,13716180000.0,11.082957,5
8,West South Central,12.176087,0.440452,1086.265811,0.377547,42.223261,1527971.0,763.32852,522.963661,54526360000.0,19.41294,5
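
The aggregation above appears to average the numeric columns per division and take the most common `Biome_Class`. A minimal standard-library sketch of that idea follows; `aggregate` is a hypothetical helper, not the code of `mm.aggregate_division`, and the Pacific row is made up for illustration.

```python
from collections import defaultdict
from statistics import mean, mode

def aggregate(rows, group_col, mode_col):
    """Per group: mode for the categorical column, mean for every other column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)

    result = []
    for name, members in groups.items():
        summary = {group_col: name}
        for col in members[0]:
            if col == group_col:
                continue
            column = [m[col] for m in members]
            summary[col] = mode(column) if col == mode_col else mean(column)
        result.append(summary)
    return result

rows = [
    # first three rows of the city-level table above
    {"Division": "East South Central", "MHLTH_AdjPrev": 15.6, "Biome_Class": 0},
    {"Division": "East South Central", "MHLTH_AdjPrev": 13.4, "Biome_Class": 0},
    {"Division": "East South Central", "MHLTH_AdjPrev": 15.0, "Biome_Class": 1},
    # made-up row so that a second group exists
    {"Division": "Pacific", "MHLTH_AdjPrev": 12.0, "Biome_Class": 4},
]

division_rows = aggregate(rows, group_col="Division", mode_col="Biome_Class")
print(division_rows)
```

These are unweighted means over cities, so a small city counts as much as a large one; weighting by population would be a different design choice.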


In [16]:
# create the divisional choropleth map

# create a list of columns to be used in the choropleth map
col_list = list(division_df.columns)

division_map = mm.merged_choropleth_map('GEOJSON/division_gdf.geojson',division_df, col_list, geo_col = ['Division', 'MHLTH_AdjPrev'], key = 'NAME')

# output the map to html if needed
# mm.output_map_html(division_map, 'division_map.html')

# display(division_map)


## Region-level Map

In [21]:
region_df = mm.aggregation_manipulation(mm.us_region(),uc_merged, non_mean_cols = ['Biome_Class', 'Region'], mode_col = 'Biome_Class',groupby_col = 'Region')
region_df.sort_values(by ='MHLTH_AdjPrev', ascending = False)


Unnamed: 0,Region,MHLTH_AdjPrev,E_GR_AV14,E_GR_AT14,SDG_A2G14,SDG_OS15MX,P15,B15,BUCAP15,GDP15_SM,E_WR_T_14,Biome_Class
1,Northeast,14.72973,0.521819,544.908017,0.581369,69.451892,1289376.0,359.062028,400.710715,53014870000.0,10.796793,0
0,Midwest,12.742857,0.515566,634.729495,0.711987,69.45873,912599.5,448.376723,553.89896,29870350000.0,10.556337,5
2,South,12.672449,0.473796,881.160955,0.530497,53.475306,1281866.0,606.397195,519.915053,47043650000.0,19.167966,0
3,West,12.041985,0.333263,1591.000949,0.109137,30.384885,3745656.0,1244.148781,375.455687,141462600000.0,15.45684,4


In [20]:
# create the regional choropleth map

# create a list of columns to be used in the choropleth map
col_list = list(region_df.columns)

region_map = mm.merged_choropleth_map('GEOJSON/region_gdf.geojson', region_df, col_list, geo_col = ['Region', 'MHLTH_AdjPrev'], key = 'NAME')

# output the map to html if needed
# mm.output_map_html(region_map, 'region_map.html')

display(region_map)