In [2]:
from matplotlib import pyplot as plt
from scipy.stats import linregress
import numpy as np
from sklearn import datasets
import pandas as pd
from pathlib import Path
import scipy.stats as st
import panel as pn
import geopandas as gpd
import hvplot.pandas


%matplotlib inline

Unnamed: 0,State,CityName,UniqueID,CityFIPS,TractFIPS,ObesityScore,Lat,Lon,WalkScore
0,TX,Houston,4835000-48201451401,4835000.0,48201450000.0,25.5,29.750367,-95.612367,51.0
1,TX,Garland,4829000-48113018138,4829000.0,48113020000.0,30.6,32.850897,-96.582081,26.0
2,TX,Mesquite,4847892-48113012500,4847892.0,48113010000.0,33.4,32.831949,-96.657915,46.0
3,TX,Dallas,4819000-48113004800,4819000.0,48113000000.0,38.5,32.743677,-96.81755,85.0
4,TX,Fort Worth,4827000-48439105800,4827000.0,48439110000.0,37.5,32.665385,-97.342787,54.0


In [6]:

# Convert "ObesityScore" and "WalkScore" columns to numeric, coercing non-numeric values to NaN
merged_df["ObesityScore"] = pd.to_numeric(merged_df["ObesityScore"], errors="coerce")
merged_df["WalkScore"] = pd.to_numeric(merged_df["WalkScore"], errors="coerce")

# Create the GeoDataFrame with latitude and longitude and set CRS to EPSG:4326
gdf = gpd.GeoDataFrame(merged_df, geometry=gpd.points_from_xy(merged_df["Lon"], merged_df["Lat"]))
gdf.crs = "EPSG:4326"

# Check for and drop rows with invalid geometries
gdf = gdf[gdf.geometry.is_valid]

# Create the map plot using hvplot
map_plot = gdf.hvplot(geo=True, tiles="CartoDark", c="State", cmap="Category20", size=15, alpha=0.75,
                      hover_cols=["WalkScore", "ObesityScore"],
                      # With geo=True the axes are map coordinates, not the two scores
                      xlabel="Longitude", ylabel="Latitude",
                      title="Walk Score vs Obesity Rate on US Map")
map_plot
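
The coercion step at the top of the cell above can be checked in isolation. The sketch below is a minimal illustration and not part of the original notebook: the `raw` Series and its placeholder strings are hypothetical, chosen only to show what `errors="coerce"` does to values that are not numbers.

```python
import pandas as pd

# Hypothetical raw column: numeric strings mixed with placeholder text.
raw = pd.Series(["25.5", "30.6", "n/a", "unknown", "38.5"])

# errors="coerce" replaces anything that cannot be parsed with NaN
# instead of raising an exception.
scores = pd.to_numeric(raw, errors="coerce")

# Count the coerced values so the data loss is visible before plotting.
n_missing = int(scores.isna().sum())
print(scores.tolist())
print(f"{n_missing} of {len(scores)} values became NaN")
```

Running the same `isna().sum()` check on `merged_df["ObesityScore"]` and `merged_df["WalkScore"]` would show how many rows the coercion affected in the real data.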

1st Map (Color by State):

Purpose: The first map plots every row of the dataset as a point at its latitude and longitude and colors the point by state. Each row appears to be a census tract (the data carries a TractFIPS column), so a state is drawn as a cluster of points rather than a single marker.
Insights:
The color shows which state each point belongs to and where the sampled tracts are concentrated geographically.
Hovering over a point shows that tract's walk score and obesity rate.
Neither score is encoded in the color, so the map alone does not rank states by walk score or obesity rate.
Analysis:
You can see the geographic coverage of the data at a glance and inspect individual tracts through the tooltip.
Looking for patterns or correlations between walk score and obesity rate requires either reading values point by point or computing them directly from the data.
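
The patterns or correlations mentioned above can also be checked numerically rather than by eye. The following is a minimal sketch that uses only the five Texas rows from the data preview, which is far too few for a real conclusion; `sample` and `pairs` are illustrative names, and the full `merged_df` would take the place of `sample` in practice.

```python
import pandas as pd

# The five rows shown in the data preview (all from Texas).
sample = pd.DataFrame({
    "CityName": ["Houston", "Garland", "Mesquite", "Dallas", "Fort Worth"],
    "ObesityScore": [25.5, 30.6, 33.4, 38.5, 37.5],
    "WalkScore": [51.0, 26.0, 46.0, 85.0, 54.0],
})

# Keep only rows where both scores are present.
pairs = sample[["WalkScore", "ObesityScore"]].dropna()

# Pearson correlation coefficient between the two columns.
r = pairs["WalkScore"].corr(pairs["ObesityScore"])
print(f"n = {len(pairs)}, Pearson r = {r:.3f}")
```

With only five points the coefficient says little; the value of the check comes from running it on the full dataset, optionally per state with `groupby("State")`.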

In [10]:

# Check for and drop rows with invalid geometries
gdf = gdf[gdf.geometry.is_valid]

# Group the data by state and calculate the mean walk score and obesity rate for each state
grouped_data = gdf.groupby("State").agg({"WalkScore": "mean", "ObesityScore": "mean"}).reset_index()
grouped_data.rename(columns={"WalkScore": "WalkScore_mean", "ObesityScore": "ObesityScore_mean"}, inplace=True)

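# Merge the per-state means back onto every row of the GeoDataFrame.
# Dropping any existing mean columns first keeps the cell safe to re-run
# (otherwise the merge would produce suffixed *_x / *_y duplicates).
gdf = gdf.drop(columns=["WalkScore_mean", "ObesityScore_mean"], errors="ignore")
gdf = gdf.merge(grouped_data, on="State")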

# Create the map plot using hvplot
map_plot = gdf.hvplot(geo=True, tiles="CartoDark", c="ObesityScore_mean", cmap="viridis", size=15, alpha=0.75,
                      hover_cols=["State", "WalkScore_mean", "ObesityScore_mean"],
                      # With geo=True the axes are map coordinates, not the two scores
                      xlabel="Longitude", ylabel="Latitude",
                      title="Average Walk Score vs Obesity Rate on US Map (Color per Obesity Rate)")

map_plot


2nd Map (Color by Average Obesity Score):

Purpose: The second map visualizes the average obesity score of each state, giving a more aggregated view of the data. The points are the same as in the first map, but every point is colored by its state's mean.

Insights:
It highlights the states with higher or lower average obesity rates.
Because the viridis scale is continuous, states can be ordered by average obesity score instead of merely told apart.
All points within a state share one color, so variation inside a state is not visible on this map.

Analysis:
You can identify states with above-average or below-average obesity rates, which is a starting point for deciding where health interventions or policy measures might be targeted.
The tooltip shows the state's average walk score next to its average obesity rate, so the two can be compared state by state.
By comparing this map with other relevant data (e.g., socioeconomic factors, access to healthcare), you can explore potential correlations and patterns that might explain the obesity rates in different regions.


In summary, the first map gives a detailed view of the individual data points, with each point's own walk score and obesity rate available on hover, while the second map gives a broader overview of the average obesity rate per state. Depending on the goal of the analysis, either map, or both together, can be used to gain insights about walk scores and obesity rates at the state level.


Code Analysis:

The two code cells are related and serve the same purpose: each creates a map plot to visualize walk scores and obesity rates across states. They differ in data aggregation, color mapping, and hover information.

Data Aggregation:

1st illustration: The first cell performs no aggregation. It converts the "ObesityScore" and "WalkScore" columns to numeric type, builds a GeoDataFrame (gdf) from the CSV data, drops rows with invalid geometries, and creates the map plot directly with hvPlot. The color (c) of each point is taken from the "State" column.

2nd illustration: The second cell aggregates the data with the pandas groupby and agg methods. It groups gdf by "State", calculates the mean walk score and mean obesity rate for each state, and stores the result in a new DataFrame called grouped_data. It then merges grouped_data back into gdf on the "State" column, so every row carries its state's means. The color (c) of each point is the average obesity rate (ObesityScore_mean) of its state.
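
The group-then-merge pattern described above can be sketched on the five rows from the data preview. This is an illustration under that assumption and not the notebook's own cell: `tracts` and `state_means` are illustrative names, and pandas named aggregation is used here in place of `agg` followed by `rename`.

```python
import pandas as pd

# The five rows shown in the data preview (all from Texas).
tracts = pd.DataFrame({
    "State": ["TX", "TX", "TX", "TX", "TX"],
    "CityName": ["Houston", "Garland", "Mesquite", "Dallas", "Fort Worth"],
    "ObesityScore": [25.5, 30.6, 33.4, 38.5, 37.5],
    "WalkScore": [51.0, 26.0, 46.0, 85.0, 54.0],
})

# One row per state, with the mean of each score.
state_means = tracts.groupby("State", as_index=False).agg(
    WalkScore_mean=("WalkScore", "mean"),
    ObesityScore_mean=("ObesityScore", "mean"),
)

# Broadcast the state-level means back onto every row of that state.
merged = tracts.merge(state_means, on="State", how="left")
print(merged[["CityName", "ObesityScore", "ObesityScore_mean"]])
```

Because all five rows belong to Texas, every row receives the same pair of means. The same thing happens to each state's points in the second map, which is why they all share one color.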

Color Mapping:

1st illustration: The color mapping is based on the unique values in the "State" column, so each state is represented by a different color from the "Category20" palette.

2nd illustration: The color mapping is based on the calculated average obesity rate (ObesityScore_mean) for each state. The color scale comes from the "viridis" colormap, and the color of each point corresponds to the average obesity rate of its state.

Hover Information:

1st illustration: The first cell specifies "WalkScore" and "ObesityScore" as hover columns. When hovering over a point on the map, the tooltip displays the values of these columns for that individual point.

2nd illustration: The second cell specifies "State", "WalkScore_mean", and "ObesityScore_mean" as hover columns. When hovering over a point on the map, the tooltip displays the state name along with that state's average walk score and average obesity rate.

Overall, the second cell is better suited to comparing states, because it shows the average walk score and obesity rate for each state, while the first keeps the detail of the individual points.