Which states suit a person best depends on how that person weighs different aspects of life. Someone for whom a low cost of living is the priority will favor states with economic advantages, someone who values safety will favor states with lower crime rates, and someone who values healthcare highly will favor states that perform well on it. This analysis compares the states on cost of living, homicide rate, and healthcare so that readers can weigh these factors according to their own priorities. The best state for one person may not be the best for another, so personal values should guide how the data are interpreted.

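The following cells compute a score for every state under four weighting schemes (equal weights, and double weight on healthcare, cost of living, or homicide rate), then display the five best-scoring states for each scheme on a map. Lower scores are better, because the states are sorted in ascending order of score.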
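As a sketch of the same idea in reusable form, the hypothetical helper `rank_states` below accepts any dictionary of weights (assumed to sum to 1), divides each metric by its mean, adds the weighted terms, and keeps the `n` lowest scores. It omits the notebook's +1 on `Healthcare Score`; that offset adds the same amount to every state, so the ordering is unaffected. The demo uses the first five rows of `final_data.csv` as printed in this notebook, so the means (and therefore the scores) are computed over those rows only and will not match the notebook's values.

```python
import pandas as pd


def rank_states(df: pd.DataFrame, weights: dict, n: int = 5) -> pd.DataFrame:
    """Return the n best (lowest-scoring) rows under a weighted, mean-normalized score.

    Hypothetical helper: `weights` maps a column name to its weight (assumed to sum to 1).
    Every metric is treated as lower-is-better, matching the notebook's ascending sort.
    """
    cols = list(weights)
    # Divide each metric by its column mean so the metrics share a comparable scale
    normalized = df[cols] / df[cols].mean()
    # Multiply each normalized column by its weight and add across the row
    score = (normalized * pd.Series(weights)).sum(axis=1)
    return df.assign(Score=score).nsmallest(n, "Score")


# First five rows of final_data.csv as printed in the notebook (means are over these rows only)
sample = pd.DataFrame({
    "State": ["California", "Hawaii", "New York", "New Hampshire", "Vermont"],
    "Homicide Rate (per 100,000)": [0.313473, 0.416218, 0.625129, 1.007925, 1.239215],
    "Healthcare Score": [56.74, 78.81, 32.75, 30.47, 56.22],
    "Cost of Living Index": [134.5, 179.0, 125.1, 115.0, 114.9],
})

equal_weights = {
    "Cost of Living Index": 1 / 3,
    "Healthcare Score": 1 / 3,
    "Homicide Rate (per 100,000)": 1 / 3,
}
print(rank_states(sample, equal_weights, n=3)["State"].tolist())
# ['New York', 'California', 'New Hampshire']
```

Changing the dictionary (for example, giving `Healthcare Score` a weight of 1/2 and the other two 1/4) reproduces the other weighting schemes without copying the formula.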

In [4]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import time
import scipy.stats as st
import hvplot.pandas

# Import the OpenWeatherMap API key
from api_keys import weather_api_key

In [5]:
# Read in final state data
state_data=pd.read_csv("final_data.csv")
state_data.head()

|   | Abbreviation | State | Homicide Rate (per 100,000) | Healthcare Score | Cost of Living Index |
|---|---|---|---|---|---|
| 0 | CA | California | 0.313473 | 56.74 | 134.5 |
| 1 | HI | Hawaii | 0.416218 | 78.81 | 179.0 |
| 2 | NY | New York | 0.625129 | 32.75 | 125.1 |
| 3 | NH | New Hampshire | 1.007925 | 30.47 | 115.0 |
| 4 | VT | Vermont | 1.239215 | 56.22 | 114.9 |


In [6]:
# Use the OpenWeatherMap current-weather endpoint to get a latitude and longitude for each state.
# Note: "q" is matched against city names, so a state name can resolve to a city of the same name
# (the coordinates returned for "California", 38.3004, -76.5074, are in southern Maryland).
url = "http://api.openweathermap.org/data/2.5/weather?"
units = "metric"

# Build the query URL once; only the state name changes between requests
state_url = f"{url}appid={weather_api_key}&units={units}&q="

info_geo_state = []

# Print to logger
print("Beginning Data Retrieval     ")
print("-----------------------------")

for record_count, state in enumerate(state_data["State"], start=1):
    # Uncomment to log progress:
    # print(f"Processing Record {record_count} | {state}")
    try:
        state_weather = requests.get(state_url + state + ",US", timeout=10).json()
        state_lat = state_weather["coord"]["lat"]
        state_lng = state_weather["coord"]["lon"]

        info_geo_state.append({"State": state,
                               "Lat": state_lat,
                               "Lng": state_lng})
    except (requests.RequestException, ValueError, KeyError):
        # Request failed, response was not JSON, or it had no "coord" (state not found)
        print(f"{state} not found. Skipping...")

info_geo_state

Beginning Data Retrieval     
-----------------------------


[{'State': 'California', 'Lat': 38.3004, 'Lng': -76.5074},
 {'State': 'Hawaii', 'Lat': 20.7503, 'Lng': -156.5003},
 {'State': 'New York', 'Lat': 40.7143, 'Lng': -74.006},
 {'State': 'New Hampshire', 'Lat': 43.667, 'Lng': -71.4998},
 {'State': 'Vermont', 'Lat': 44.0003, 'Lng': -72.7498},
 {'State': 'Maine', 'Lat': 45.5003, 'Lng': -69.2498},
 {'State': 'New Jersey', 'Lat': 40.1671, 'Lng': -74.4999},
 {'State': 'North Dakota', 'Lat': 47.5003, 'Lng': -100.0007},
 {'State': 'Massachusetts', 'Lat': 42.3657, 'Lng': -71.1083},
 {'State': 'Idaho', 'Lat': 44.5002, 'Lng': -114.2512},
 {'State': 'Iowa', 'Lat': 42.0003, 'Lng': -93.5005},
 {'State': 'Alaska', 'Lat': 64.0003, 'Lng': -150.0003},
 {'State': 'Utah', 'Lat': 39.2502, 'Lng': -111.751},
 {'State': 'Arizona', 'Lat': 34.5003, 'Lng': -111.501},
 {'State': 'Montana', 'Lat': 47.0003, 'Lng': -109.751},
 {'State': 'South Dakota', 'Lat': 44.5003, 'Lng': -100.2507},
 {'State': 'Wyoming', 'Lat': 43.0002, 'Lng': -107.5009},
 {'State': 'Kansas', 'Lat':
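The lookup loop can also be split so that the HTTP call and the parsing are separate, which makes each failure mode explicit and lets the parsing be checked without network access. This is a sketch under assumptions: `fetch_json`, `geocode_states`, and `fake_fetch` are hypothetical names, and the JSON shape (`{"coord": {"lat": ..., "lon": ...}}`) is the one the loop above already reads.

```python
def fetch_json(state, api_key, url="http://api.openweathermap.org/data/2.5/weather"):
    """Request current weather for '<state>,US' and return the decoded JSON body."""
    import requests  # imported here so the parsing below can be exercised offline

    response = requests.get(url, params={"appid": api_key, "q": f"{state},US"}, timeout=10)
    response.raise_for_status()  # an HTTP error status (e.g. 404) becomes an exception
    return response.json()


def geocode_states(states, fetch):
    """Return (found, skipped): coordinate records and the names that could not be resolved.

    `fetch` is any callable mapping a state name to an OpenWeatherMap-style JSON dict.
    """
    found, skipped = [], []
    for state in states:
        try:
            coord = fetch(state)["coord"]
            found.append({"State": state, "Lat": coord["lat"], "Lng": coord["lon"]})
        except (OSError, ValueError, KeyError):
            # requests' exceptions derive from OSError; bad JSON raises ValueError;
            # a response without "coord" raises KeyError
            skipped.append(state)
    return found, skipped


def fake_fetch(state):
    """Offline stand-in for fetch_json, shaped like the responses the notebook reads."""
    if state == "Atlantis":
        return {"cod": "404", "message": "city not found"}
    return {"coord": {"lat": 42.0003, "lon": -93.5005}}


print(geocode_states(["Iowa", "Atlantis"], fake_fetch))
```

With the real API, the call would look like `info_geo_state, missing = geocode_states(state_data["State"], lambda s: fetch_json(s, weather_api_key))`, and `missing` lists every state that needs a manual check.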

In [8]:
# Put information into DataFrame
state_data_df = pd.DataFrame(info_geo_state)
state_data_df.head()

|   | State | Lat | Lng |
|---|---|---|---|
| 0 | California | 38.3004 | -76.5074 |
| 1 | Hawaii | 20.7503 | -156.5003 |
| 2 | New York | 40.7143 | -74.006 |
| 3 | New Hampshire | 43.667 | -71.4998 |
| 4 | Vermont | 44.0003 | -72.7498 |


In [9]:
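# Merge with final data (the outer join keeps states whose coordinates were not found, with NaN Lat/Lng)
state_data_complete = pd.merge(state_data_df, state_data, on="State", how="outer")
state_data_complete.head()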

|   | State | Lat | Lng | Abbreviation | Homicide Rate (per 100,000) | Healthcare Score | Cost of Living Index |
|---|---|---|---|---|---|---|---|
| 0 | California | 38.3004 | -76.5074 | CA | 0.313473 | 56.74 | 134.5 |
| 1 | Hawaii | 20.7503 | -156.5003 | HI | 0.416218 | 78.81 | 179.0 |
| 2 | New York | 40.7143 | -74.006 | NY | 0.625129 | 32.75 | 125.1 |
| 3 | New Hampshire | 43.667 | -71.4998 | NH | 1.007925 | 30.47 | 115.0 |
| 4 | Vermont | 44.0003 | -72.7498 | VT | 1.239215 | 56.22 | 114.9 |
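Because the merge is an outer join, any state that the API loop skipped is kept with `NaN` in `Lat` and `Lng`, and such a state cannot be placed on the maps. pandas' `indicator=True` option adds a `_merge` column that makes those rows easy to list, and `validate="one_to_one"` raises an error if a state name appears twice on either side. A minimal sketch using a few rows taken from the tables in this notebook, with Kansas left out of the coordinate table to simulate a failed lookup:

```python
import pandas as pd

# Coordinates found by the API (Kansas deliberately omitted to simulate a skipped state)
geo = pd.DataFrame({
    "State": ["Maine", "Wyoming"],
    "Lat": [45.5003, 43.0002],
    "Lng": [-69.2498, -107.5009],
})
# Metrics for the same states plus Kansas
metrics = pd.DataFrame({
    "State": ["Maine", "Wyoming", "Kansas"],
    "Cost of Living Index": [111.5, 92.8, 87.7],
})

merged = pd.merge(geo, metrics, on="State", how="outer",
                  indicator=True,         # adds "_merge": both / left_only / right_only
                  validate="one_to_one")  # raises MergeError if a State repeats on either side
missing_coords = merged.loc[merged["_merge"] != "both", "State"].tolist()
print(missing_coords)  # ['Kansas']
```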


In [11]:
# Create a new column with a total score, weighting each category equally. Each variable is divided by
# its mean so that the three metrics are on comparable scales and can be added together. Lower scores are better.
# "Healthcare Score" has 1 added to it (South Dakota's score is 0). Because this adds the same amount
# (weight / mean) to every state's total, it shifts all scores equally and does not change the ranking.

state_data_complete["Rank Same Priority"]=(
    (((state_data_complete["Cost of Living Index"])/(state_data_complete["Cost of Living Index"].mean()))*(1/3))
    +(((state_data_complete["Healthcare Score"]+1)/(state_data_complete["Healthcare Score"].mean()))*(1/3))
    +(((state_data_complete["Homicide Rate (per 100,000)"])/(state_data_complete["Homicide Rate (per 100,000)"].mean())))*(1/3))
state_data_complete.head()

|   | State | Lat | Lng | Abbreviation | Homicide Rate (per 100,000) | Healthcare Score | Cost of Living Index | Rank Same Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | California | 38.3004 | -76.5074 | CA | 0.313473 | 56.74 | 134.5 | 0.83046 |
| 1 | Hawaii | 20.7503 | -156.5003 | HI | 0.416218 | 78.81 | 179.0 | 1.124783 |
| 2 | New York | 40.7143 | -74.006 | NY | 0.625129 | 32.75 | 125.1 | 0.662184 |
| 3 | New Hampshire | 43.667 | -71.4998 | NH | 1.007925 | 30.47 | 115.0 | 0.639943 |
| 4 | Vermont | 44.0003 | -72.7498 | VT | 1.239215 | 56.22 | 114.9 | 0.825068 |
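Written out, the score cells compute, for state $i$ with cost-of-living index $c_i$, healthcare score $h_i$, homicide rate $r_i$ (a bar denotes the mean over all states) and weights $(w_c, w_h, w_r)$:

$$
S_i = w_c\,\frac{c_i}{\bar{c}} + w_h\,\frac{h_i + 1}{\bar{h}} + w_r\,\frac{r_i}{\bar{r}}
    = \left( w_c\,\frac{c_i}{\bar{c}} + w_h\,\frac{h_i}{\bar{h}} + w_r\,\frac{r_i}{\bar{r}} \right) + \frac{w_h}{\bar{h}}
$$

with $(w_c, w_h, w_r) = (\tfrac13, \tfrac13, \tfrac13)$ for equal priority, $(\tfrac14, \tfrac12, \tfrac14)$ for healthcare, $(\tfrac12, \tfrac14, \tfrac14)$ for cost of living and $(\tfrac14, \tfrac14, \tfrac12)$ for homicide rate. The last term, $w_h/\bar{h}$, is the same for every state, so the +1 offset raises all scores in a column by the same amount without changing their order.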


In [12]:
# Create a score column with Healthcare Costs receiving the priority
state_data_complete["Rank Healthcare Priority"]=(
    (((state_data_complete["Cost of Living Index"])/(state_data_complete["Cost of Living Index"].mean()))*(1/4))
    +(((state_data_complete["Healthcare Score"]+1)/(state_data_complete["Healthcare Score"].mean()))*(1/2))
    +(((state_data_complete["Homicide Rate (per 100,000)"])/(state_data_complete["Homicide Rate (per 100,000)"].mean())))*(1/4))

# Create a column with Cost of Living Index receiving the priority
state_data_complete["Rank Cost Living Priority"]=(
    (((state_data_complete["Cost of Living Index"])/(state_data_complete["Cost of Living Index"].mean()))*(1/2))
    +(((state_data_complete["Healthcare Score"]+1)/(state_data_complete["Healthcare Score"].mean()))*(1/4))
    +(((state_data_complete["Homicide Rate (per 100,000)"])/(state_data_complete["Homicide Rate (per 100,000)"].mean())))*(1/4))

# Create a column with Homicide Rate receiving the priority
# (assign only this column; chaining "= state_data_complete["Rank Same Priority"] =" would overwrite the equal-weight score)
state_data_complete["Rank Homicide Priority"]=(
    (((state_data_complete["Cost of Living Index"])/(state_data_complete["Cost of Living Index"].mean()))*(1/4))
    +(((state_data_complete["Healthcare Score"]+1)/(state_data_complete["Healthcare Score"].mean()))*(1/4))
    +(((state_data_complete["Homicide Rate (per 100,000)"])/(state_data_complete["Homicide Rate (per 100,000)"].mean())))*(1/2))
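Because each priority scheme gives one metric a weight of 1/2 and the other two 1/4, every metric's weights add up to 1/2 + 1/4 + 1/4 = 1 across the three schemes. The three priority scores for a state therefore sum to exactly three times its equal-weight score, which gives a cheap consistency check: the equal-weight column should equal the mean of the three priority columns. The sketch below demonstrates it on three rows printed earlier, with a hypothetical `add_scores` helper that mirrors the notebook's formulas; assigning one expression to two columns at once would make the check fail.

```python
import numpy as np
import pandas as pd

COST = "Cost of Living Index"
HEALTH = "Healthcare Score"
HOMICIDE = "Homicide Rate (per 100,000)"

# Weights (cost, healthcare, homicide) for each score column, matching the notebook cells
SCHEMES = {
    "Rank Same Priority": (1 / 3, 1 / 3, 1 / 3),
    "Rank Healthcare Priority": (1 / 4, 1 / 2, 1 / 4),
    "Rank Cost Living Priority": (1 / 2, 1 / 4, 1 / 4),
    "Rank Homicide Priority": (1 / 4, 1 / 4, 1 / 2),
}


def add_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one score column per scheme to a copy of df."""
    out = df.copy()
    cost = out[COST] / out[COST].mean()
    health = (out[HEALTH] + 1) / out[HEALTH].mean()  # same +1 offset as the notebook
    homicide = out[HOMICIDE] / out[HOMICIDE].mean()
    for column, (w_c, w_h, w_r) in SCHEMES.items():
        # Each scheme is assigned to exactly one column, so none overwrites another
        out[column] = w_c * cost + w_h * health + w_r * homicide
    return out


# Three rows of final_data.csv as printed in the notebook
sample = pd.DataFrame({
    "State": ["California", "Hawaii", "New York"],
    HOMICIDE: [0.313473, 0.416218, 0.625129],
    HEALTH: [56.74, 78.81, 32.75],
    COST: [134.5, 179.0, 125.1],
})

scored = add_scores(sample)
priority_cols = ["Rank Healthcare Priority", "Rank Cost Living Priority", "Rank Homicide Priority"]
# The equal-weight score should equal the mean of the three priority scores
print(np.allclose(scored["Rank Same Priority"], scored[priority_cols].mean(axis=1)))  # True
```

On the notebook's data, the same `np.allclose` line can be run against `state_data_complete` after the four score columns are created.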

## MAPPING TOP 5 STATES

### EQUAL PRIORITY FOR EVERY CATEGORY


In [21]:
# Sort states by the equal priority score
equal_priority = state_data_complete.sort_values("Rank Same Priority")
# Select just the first 5 states 
equal_priority = equal_priority[["State", "Lat", "Lng", "Healthcare Score", "Cost of Living Index", 
                    "Homicide Rate (per 100,000)", "Rank Same Priority"]].reset_index(drop = True).head()

In [22]:
equal_priority_map=equal_priority.hvplot.points(
    "Lng",
    "Lat",
    geo=True,
    tiles="OSM",
    frame_width = 800,
    frame_height = 600,
    size="Rank Same Priority",
    scale=15,
    color="State",
    hover_cols=["Homicide Rate (per 100,000)","Healthcare Score","Cost of Living Index"]
)
# Save the interactive map as HTML; plt.savefig only saves the current (empty) Matplotlib figure
hvplot.save(equal_priority_map, "plots/Map_Same_Prior.html")
equal_priority_map

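The five best-scoring states are listed in order in the legend, and hovering over a marker shows that state's homicide rate, healthcare score, and cost of living index. Marker size grows with the score, so the best-ranked state has the smallest marker.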
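To make the ranking visible in the hover text as well as the legend, one option is an explicit rank column. A minimal sketch, where `add_rank` is a hypothetical helper and the input frame must already be sorted best-first (as the top-five frames are); the demo rows are the healthcare-priority results from this notebook.

```python
import pandas as pd


def add_rank(top: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy of a best-first frame with a 1-based "Rank" column in front."""
    ranked = top.reset_index(drop=True)  # returns a new frame, so the input is not modified
    ranked.insert(0, "Rank", list(range(1, len(ranked) + 1)))
    return ranked


top5 = pd.DataFrame({
    "State": ["South Dakota", "Wyoming", "Maine", "New Hampshire", "West Virginia"],
    "Rank Healthcare Priority": [0.375947, 0.587247, 0.597465, 0.636098, 0.661813],
})
print(add_rank(top5))
```

After `equal_priority = add_rank(equal_priority)`, adding `"Rank"` to `hover_cols` shows each state's position when hovering.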


### HEALTHCARE PRIORITY

In [15]:
# Sort states by Healthcare Priority score
healthcare_priority = state_data_complete.sort_values("Rank Healthcare Priority")
# Select the first 5 states
healthcare_priority = healthcare_priority[["State", "Lat", "Lng", "Healthcare Score", "Cost of Living Index", 
                                        "Homicide Rate (per 100,000)", "Rank Healthcare Priority"]].reset_index(drop = True).head()
# Display the top 5 states with Healthcare Costs as the highest priority
healthcare_priority



|   | State | Lat | Lng | Healthcare Score | Cost of Living Index | Homicide Rate (per 100,000) | Rank Healthcare Priority |
|---|---|---|---|---|---|---|---|
| 0 | South Dakota | 44.5003 | -100.2507 | 0.0 | 93.8 | 2.903808 | 0.375947 |
| 1 | Wyoming | 43.0002 | -107.5009 | 21.37 | 92.8 | 2.937096 | 0.587247 |
| 2 | Maine | 45.5003 | -69.2498 | 25.92 | 111.5 | 1.311717 | 0.597465 |
| 3 | New Hampshire | 43.667 | -71.4998 | 30.47 | 115.0 | 1.007925 | 0.636098 |
| 4 | West Virginia | 38.5004 | -80.5001 | 17.69 | 90.3 | 5.328221 | 0.661813 |
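The top-five selections sort in ascending order (the default for `sort_values`) and keep the first rows with `.head()`, so they return the lowest, and here best, scores. `DataFrame.nsmallest` expresses the same selection directly. A minimal sketch using the equal-priority scores printed earlier for five states:

```python
import pandas as pd

scores = pd.DataFrame({
    "State": ["California", "Hawaii", "New York", "New Hampshire", "Vermont"],
    "Rank Same Priority": [0.83046, 1.124783, 0.662184, 0.639943, 0.825068],
})

# The notebook's pattern: sort ascending, then keep the first rows
by_sort = scores.sort_values("Rank Same Priority").head(3).reset_index(drop=True)
# Equivalent selection that states the intent ("the 3 smallest scores")
by_nsmallest = scores.nsmallest(3, "Rank Same Priority").reset_index(drop=True)
print(by_nsmallest["State"].tolist())  # ['New Hampshire', 'New York', 'Vermont']
```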


In [16]:
healthcare_map=healthcare_priority.hvplot.points(
    "Lng",
    "Lat",
    geo=True,
    tiles="OSM",
    frame_width = 800,
    frame_height = 600,
    size="Rank Healthcare Priority",
    scale=15,
    color="State",
    hover_cols=["Homicide Rate (per 100,000)","Healthcare Score","Cost of Living Index"]
)

healthcare_map

The best five states are displayed in order in the legend. More information can be seen by hovering over each state. When Healthcare Costs are prioritized, South Dakota has the best score. 
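The four map cells differ only in the DataFrame, the score column used for `size`, and the `scale` value, so the shared `hvplot.points` arguments can be kept in one place. A sketch with a hypothetical `map_options` helper; the keyword arguments are exactly those passed in the map cells, with `"Lng"` and `"Lat"` given by name as `x` and `y`.

```python
HOVER_COLS = ["Homicide Rate (per 100,000)", "Healthcare Score", "Cost of Living Index"]


def map_options(score_col: str, scale: int = 15) -> dict:
    """Hypothetical helper: keyword arguments shared by every top-five map."""
    return dict(
        x="Lng",
        y="Lat",
        geo=True,
        tiles="OSM",
        frame_width=800,
        frame_height=600,
        size=score_col,  # marker size follows the score column
        scale=scale,
        color="State",
        hover_cols=list(HOVER_COLS),  # fresh list so callers cannot mutate the shared one
    )


print(map_options("Rank Healthcare Priority")["size"])
```

A map cell would then reduce to `healthcare_map = healthcare_priority.hvplot.points(**map_options("Rank Healthcare Priority"))`.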

### COST OF LIVING PRIORITY

In [17]:
# Sort states by Cost of Living Priority score
cost_priority = state_data_complete.sort_values("Rank Cost Living Priority")
# Select the first 5 states
cost_priority = cost_priority[["State", "Lat", "Lng", "Healthcare Score", "Cost of Living Index", 
                            "Homicide Rate (per 100,000)", "Rank Cost Living Priority"]].reset_index(drop = True).head()
# Display the top 5 states with Cost of Living as the highest priority
cost_priority



|   | State | Lat | Lng | Healthcare Score | Cost of Living Index | Homicide Rate (per 100,000) | Rank Cost Living Priority |
|---|---|---|---|---|---|---|---|
| 0 | South Dakota | 44.5003 | -100.2507 | 0.0 | 93.8 | 2.903808 | 0.594864 |
| 1 | Wyoming | 43.0002 | -107.5009 | 21.37 | 92.8 | 2.937096 | 0.697749 |
| 2 | Maine | 45.5003 | -69.2498 | 25.92 | 111.5 | 1.311717 | 0.730024 |
| 3 | New Hampshire | 43.667 | -71.4998 | 30.47 | 115.0 | 1.007925 | 0.754435 |
| 4 | Kansas | 38.5003 | -98.5006 | 41.16 | 87.7 | 2.964647 | 0.772942 |


In [18]:
cost_map=cost_priority.hvplot.points(
    "Lng",
    "Lat",
    geo=True,
    tiles="OSM",
    frame_width = 800,
    frame_height = 600,
    size="Rank Cost Living Priority",
    scale=10,
    color="State",
    hover_cols=["Homicide Rate (per 100,000)","Healthcare Score","Cost of Living Index"]
)

cost_map

The best five states are displayed in order in the legend. More information can be seen by hovering over each state. When Cost of Living is prioritized, South Dakota has the best score. 

### HOMICIDE RATE PRIORITY

In [19]:
# Sort states by Homicide Priority score
homicide_priority = state_data_complete.sort_values("Rank Homicide Priority")
# Select the first 5 states
homicide_priority = homicide_priority[["State", "Lat", "Lng", "Healthcare Score", "Cost of Living Index", 
                         "Homicide Rate (per 100,000)", "Rank Homicide Priority"]].reset_index(drop = True).head()
# Display the top 5 states with Homicide Rate as the highest priority
homicide_priority

|   | State | Lat | Lng | Healthcare Score | Cost of Living Index | Homicide Rate (per 100,000) | Rank Homicide Priority |
|---|---|---|---|---|---|---|---|
| 0 | South Dakota | 44.5003 | -100.2507 | 0.0 | 93.8 | 2.903808 | 0.513132 |
| 1 | New York | 40.7143 | -74.006 | 32.75 | 125.1 | 0.625129 | 0.527239 |
| 2 | Maine | 45.5003 | -69.2498 | 25.92 | 111.5 | 1.311717 | 0.52811 |
| 3 | New Hampshire | 43.667 | -71.4998 | 30.47 | 115.0 | 1.007925 | 0.529297 |
| 4 | Wyoming | 43.0002 | -107.5009 | 21.37 | 92.8 | 2.937096 | 0.620033 |


In [20]:
homicide_map=homicide_priority.hvplot.points(
    "Lng",
    "Lat",
    geo=True,
    tiles="OSM",
    frame_width = 800,
    frame_height = 600,
    size="Rank Homicide Priority",
    scale=10,
    color="State",
    hover_cols=["Homicide Rate (per 100,000)","Healthcare Score","Cost of Living Index"]
)

homicide_map

The best five states are displayed in order in the legend. More information can be seen by hovering over each state. When Homicide Rate is prioritized, South Dakota has the best score. 
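The three priority tables shown above (healthcare, cost of living, and homicide rate) share four states in their top five. The check below only copies those top-five lists from the output tables and intersects them; it does not recompute any scores.

```python
healthcare_top5 = ["South Dakota", "Wyoming", "Maine", "New Hampshire", "West Virginia"]
cost_top5 = ["South Dakota", "Wyoming", "Maine", "New Hampshire", "Kansas"]
homicide_top5 = ["South Dakota", "New York", "Maine", "New Hampshire", "Wyoming"]

# States in the top five no matter which metric gets double weight,
# listed in the order of the healthcare-priority ranking
common = set(cost_top5) & set(homicide_top5)
in_all = [state for state in healthcare_top5 if state in common]
print(in_all)  # ['South Dakota', 'Wyoming', 'Maine', 'New Hampshire']
```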