# Climate change and heat-related deaths in the USA

This project studies whether hotter summer temperatures are associated with higher heat-related mortality across U.S. states. The idea is simple. If extreme heat is becoming more common, we want to know whether it is already producing measurable health impacts. We begin by describing the data we use, then we explore the relationship visually through scatter plots, and finally we estimate a regression model to test the relationship more formally.

##### Heat-related deaths data

The first part of the dataset contains information on heat-related deaths per 100,000 people. This measure adjusts for population size, so it allows us to compare states fairly. A small state and a large state can have very different numbers of total deaths, but the rate per 100k lets us observe the real health impact. The data is reported annually for each state, which gives us a time series within each state and allows us to observe how mortality changes over time.
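As a small worked example of why the rate matters, the sketch below computes deaths per 100,000 for one real row from the deaths table in this notebook (Alabama, 1999) and for two made-up states. The function name `rate_per_100k` and the two hypothetical states are illustrative assumptions, not part of the original analysis.

```python
def rate_per_100k(deaths: float, population: float) -> float:
    """Crude death rate per 100,000 residents."""
    return deaths / population * 100_000


# Alabama, 1999: 11 deaths in a population of 4,430,141 (row from the deaths table)
alabama_1999 = rate_per_100k(11, 4_430_141)

# Hypothetical states, invented for illustration only:
# the large state has 20 times more deaths but half the rate of the small one.
small_state = rate_per_100k(2, 500_000)
large_state = rate_per_100k(40, 20_000_000)

print(f"Alabama 1999: {alabama_1999:.6f}")
print(f"Small state:  {small_state:.2f}")
print(f"Large state:  {large_state:.2f}")
```

Comparing raw counts would rank the large hypothetical state as worse, while the rate shows the opposite.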

In [58]:
import pandas as pd

# Data URL (NOAA monthly maximum temperature by state)
url = "https://www.ncei.noaa.gov/pub/data/cirs/climdiv/climdiv-tmaxst-v1.0.0-20251117"

# Load raw fixed-width data: one 10-digit code column followed by 12 monthly values
tmax_raw = pd.read_fwf(url, header=None)

# The code column is read as a number, so convert it to a string and restore leading zeros
tmax_raw[0] = tmax_raw[0].astype(str).str.strip().str.zfill(10)

# Extract year (last 4 digits)
tmax_raw["Year"] = tmax_raw[0].str[-4:].astype(int)

# Extract state code (first 3 digits)
tmax_raw["StateCode3"] = tmax_raw[0].str[:3]

# Rename the code column, the 12 monthly columns, and the two derived columns
tmax_raw.columns = [
    "code_year","Jan","Feb","Mar","Apr","May","Jun",
    "Jul","Aug","Sep","Oct","Nov","Dec",
    "Year","StateCode3"
]

# Summer heat measure: mean of the June, July and August monthly maximum temperatures
tmax_raw["summer_max"] = tmax_raw[["Jun","Jul","Aug"]].mean(axis=1)

# Assign readable state names
state_map = {
    "001": "Alabama","002": "Arizona","003": "Arkansas","004": "California",
    "005": "Colorado","006": "Connecticut","007": "Delaware","008": "Florida",
    "009": "Georgia","010": "Idaho","011": "Illinois","012": "Indiana",
    "013": "Iowa","014": "Kansas","015": "Kentucky","016": "Louisiana",
    "017": "Maine","018": "Maryland","019": "Massachusetts","020": "Michigan",
    "021": "Minnesota","022": "Mississippi","023": "Missouri","024": "Montana",
    "025": "Nebraska","026": "Nevada","027": "New Hampshire","028": "New Jersey",
    "029": "New Mexico","030": "New York","031": "North Carolina","032": "North Dakota",
    "033": "Ohio","034": "Oklahoma","035": "Oregon","036": "Pennsylvania",
    "037": "Rhode Island","038": "South Carolina","039": "South Dakota",
    "040": "Tennessee","041": "Texas","042": "Utah","043": "Vermont",
    "044": "Virginia","045": "Washington","046": "West Virginia",
    "047": "Wisconsin","048": "Wyoming","049": "Hawaii","050": "Alaska"
}
tmax_raw["State"] = tmax_raw["StateCode3"].map(state_map)

# Average to one value per state-year; rows with an unmapped code have State = NaN and are dropped by groupby
tmax_state = (
    tmax_raw.groupby(["State","Year"])["summer_max"]
    .mean()
    .reset_index()
)

#Display
tmax_state.head()


| | State | Year | summer_max |
|---|---|---|---|
| 0 | Alabama | 1895 | 89.233333 |
| 1 | Alabama | 1896 | 91.166667 |
| 2 | Alabama | 1897 | 92.1 |
| 3 | Alabama | 1898 | 90.533333 |
| 4 | Alabama | 1899 | 92.066667 |
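The cell above slices the 10-character code column by position. A minimal sketch of that parsing step follows. The helper name `parse_climdiv_code` is hypothetical, and the meaning given to the middle three digits (a division digit and a two-digit element code) is an assumption about the NOAA file layout. The notebook itself only relies on the first three and the last four characters.

```python
def parse_climdiv_code(code) -> dict:
    """Split a 10-character code into its positional fields."""
    # Restore leading zeros that are lost when the code is read as an integer
    code = str(code).strip().zfill(10)
    return {
        "state_code": code[:3],   # used by the notebook
        "division": code[3],      # assumed layout, not used by the notebook
        "element": code[4:6],     # assumed layout, not used by the notebook
        "year": int(code[-4:]),   # used by the notebook
    }


# An integer-parsed code has lost its two leading zeros
parsed = parse_climdiv_code(10271895)
print(parsed)
```

Padding before slicing matters: without `zfill(10)`, the first three characters of `10271895` would be `102` rather than `001`.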


##### Temperature data

The second part of the dataset includes `summer_max`, the mean of the June, July and August monthly maximum temperatures (in °F) for each state and year. This variable captures the intensity of heat exposure. We use it as our main explanatory variable because the hypothesis of interest is straightforward: hotter summers may lead to more heat-related deaths.
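To make the definition concrete, here is a minimal sketch of the `summer_max` calculation on a toy table. The monthly values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical monthly maximum temperatures (°F) for two state-years
toy = pd.DataFrame({
    "State": ["Alabama", "Arizona"],
    "Year": [2000, 2000],
    "Jun": [88.0, 100.0],
    "Jul": [91.0, 104.0],
    "Aug": [90.5, 102.0],
})

# Row-wise mean of the three summer months
toy["summer_max"] = toy[["Jun", "Jul", "Aug"]].mean(axis=1)

print(toy[["State", "Year", "summer_max"]])
```

Note that `mean(axis=1)` skips missing values by default, so a row with one missing month would still get an average of the remaining two.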

In [76]:
import pandas as pd

# Load heat-related death data
deaths = pd.read_csv("Multiple Cause of Death, 1999-2020.csv")

# Keep only real states (drop "Total" rows); copy so later column assignments do not act on a view
deaths = deaths[deaths["State"] != "Total"].copy()

# Make sure Year is a (nullable) integer
deaths["Year"] = pd.to_numeric(deaths["Year"], errors="coerce").astype("Int64")

# Compute deaths per 100k
deaths["deaths_per_100k"] = deaths["Deaths"] / deaths["Population"] * 100000

deaths.head()


| | Notes | State | State Code | Year | Year Code | Deaths | Population | Crude Rate | deaths_per_100k |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Alabama | 1.0 | 1999 | 1999.0 | 11.0 | 4430141.0 | Unreliable | 0.248299 |
| 1 | NaN | Alabama | 1.0 | 2007 | 2007.0 | 14.0 | 4672840.0 | Unreliable | 0.299604 |
| 2 | NaN | Alabama | 1.0 | 2016 | 2016.0 | 12.0 | 4863300.0 | Unreliable | 0.246746 |
| 3 | NaN | Alabama | 1.0 | 2017 | 2017.0 | 10.0 | 4874747.0 | Unreliable | 0.205139 |
| 4 | NaN | Alabama | 1.0 | 2018 | 2018.0 | 16.0 | 4887871.0 | Unreliable | 0.327341 |


##### Merging the datasets

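We merge both datasets on `State` and `Year`, keeping every row of the deaths data (a left join). After merging, each observation represents one state in one year, with the death rate and, where a match exists, the summer temperature. This structure is a panel dataset, which is helpful because it lets us compare states to each other and also examine changes within the same state over time. The panel is unbalanced: in the rows shown below, Alabama jumps from 1999 to 2007, so not every state appears in every year.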

In [78]:
#Merge two data sets
merged = deaths.merge(
    tmax_state,
    on=["State","Year"],
    how="left"
)

#Display
merged.head()

| | Notes | State | State Code | Year | Year Code | Deaths | Population | Crude Rate | deaths_per_100k | summer_max |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Alabama | 1.0 | 1999 | 1999.0 | 11.0 | 4430141.0 | Unreliable | 0.248299 | 90.3 |
| 1 | NaN | Alabama | 1.0 | 2007 | 2007.0 | 14.0 | 4672840.0 | Unreliable | 0.299604 | 92.633333 |
| 2 | NaN | Alabama | 1.0 | 2016 | 2016.0 | 12.0 | 4863300.0 | Unreliable | 0.246746 | 91.733333 |
| 3 | NaN | Alabama | 1.0 | 2017 | 2017.0 | 10.0 | 4874747.0 | Unreliable | 0.205139 | 87.866667 |
| 4 | NaN | Alabama | 1.0 | 2018 | 2018.0 | 16.0 | 4887871.0 | Unreliable | 0.327341 | 90.0 |
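A left join silently leaves `summer_max` empty for any death record without a temperature match, and duplicates in either table would multiply rows. The sketch below shows one way such a merge could be checked, using the `validate` and `indicator` options of `DataFrame.merge`. The toy tables and their values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the deaths and temperature tables
deaths_toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Texas"],
    "Year": [1999, 2007, 1999],
    "deaths_per_100k": [0.25, 0.30, 0.40],
})
temps_toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Arizona"],
    "Year": [1999, 2007, 1999],
    "summer_max": [90.3, 92.6, 103.0],
})

checked = deaths_toy.merge(
    temps_toy,
    on=["State", "Year"],
    how="left",
    validate="one_to_one",  # raises MergeError if a state-year key is duplicated
    indicator=True,         # adds a _merge column: "both" or "left_only"
)

# Death records that found no temperature row
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["State", "Year"]])
```

In this toy example the Texas row has no temperature match, so it is flagged as `left_only` and its `summer_max` is missing.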


##### State-level graphs

We begin by creating scatter plots with trend lines for individual states. These visualizations help us form an initial intuition.

In California, the points show a slight upward pattern. Years with higher summer temperatures tend to have slightly higher heat-related death rates. This early pattern suggests a positive relationship.

In [79]:
#Graph California
import plotly.express as px
state = "California"
sub = merged[merged["State"]==state]
fig = px.scatter(
    sub,
    x="summer_max",
    y="deaths_per_100k",
    hover_name="Year",
    title=f"{state}: Heat vs deaths",
    labels={
        "summer_max":"Summer max temperature",
        "deaths_per_100k":"Heat-related deaths per 100k"
    },
    trendline="ols"
)
fig.show()

In Arizona, the upward trend is stronger. Arizona is hotter in general, and the mortality rate seems more responsive to temperature changes. These first two states make the relationship look intuitive and consistent.

In [80]:
#Graph Arizona
import plotly.express as px
state = "Arizona"
sub = merged[merged["State"]==state]
fig = px.scatter(
    sub,
    x="summer_max",
    y="deaths_per_100k",
    hover_name="Year",
    title=f"{state}: Heat vs deaths",
    labels={
        "summer_max":"Summer max temperature",
        "deaths_per_100k":"Heat-related deaths per 100k"
    },
    trendline="ols"
)
fig.show()

However, Texas does not fit this pattern. The trend line for Texas slopes downward: hotter years in Texas are not associated with higher death rates. This contradicts the simple hypothesis and signals that the relationship is not universal. It suggests that other factors, such as adaptation, infrastructure, demographics, or reporting differences, may influence heat-related mortality.

In [81]:
#Graph Texas
import plotly.express as px
state = "Texas"
sub = merged[merged["State"]==state]
fig = px.scatter(
    sub,
    x="summer_max",
    y="deaths_per_100k",
    hover_name="Year",
    title=f"{state}: Heat vs deaths",
    labels={
        "summer_max":"Summer max temperature",
        "deaths_per_100k":"Heat-related deaths per 100k"
    },
    trendline="ols"
)
fig.show()
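The visual comparison of trend lines can also be made numerically by fitting a straight line per state and reading off the slope. The sketch below does this with `np.polyfit` on a toy table. The two states and their values are hypothetical, built so that one slope is positive and the other negative. It is a stand-in for the plotted trend lines, not the plotting library's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: state "A" rises with temperature, state "B" falls
toy = pd.DataFrame({
    "State": ["A"] * 4 + ["B"] * 4,
    "summer_max": [90.0, 91.0, 92.0, 93.0, 95.0, 96.0, 97.0, 98.0],
    "deaths_per_100k": [0.20, 0.25, 0.30, 0.35, 0.50, 0.45, 0.40, 0.35],
})


def ols_slope(group: pd.DataFrame) -> float:
    """Slope of a least-squares line of deaths_per_100k on summer_max."""
    slope, _intercept = np.polyfit(
        group["summer_max"], group["deaths_per_100k"], deg=1
    )
    return float(slope)


# One slope per state
slopes = {state: ols_slope(group) for state, group in toy.groupby("State")}

for state, slope in slopes.items():
    print(f"{state}: {slope:+.3f} deaths per 100k per degree")
```

Applied to `merged`, the same loop would give one slope per state, which makes it easy to count how many states slope upward and how many slope downward.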

##### Plots for all states

When we plot all states together, the overall trend becomes very weak. The variation across states is large, and many states do not show a clear relationship between heat and deaths. This reinforces the idea that the simple visual pattern is not strong enough to confirm the hypothesis.

In [82]:
#Graph all states
import plotly.express as px
fig = px.scatter(
    merged,
    x="summer_max",
    y="deaths_per_100k",
    color="State",
    hover_name="State",
    hover_data={"Year":True, "summer_max":True,"deaths_per_100k":True},
    title="Heat vs deaths in all states of USA",
    labels={
        "summer_max":"Summer max temperature",
        "deaths_per_100k":"Heat-related deaths per 100k"
    },
    trendline="ols",
    opacity=0.7
)
fig.update_traces(marker=dict(size=6))
fig.show()

##### Regression with fixed effects

To study the relationship more formally, we estimate a regression model with fixed effects. Fixed effects help isolate the effect of temperature by controlling for two important sources of variation. State fixed effects control for characteristics that are constant within each state over the sample period, such as baseline climate, infrastructure, demographics, or long-term adaptation capacity. Year fixed effects control for national shocks that affect all states in the same year, such as federal policy changes, heat wave reporting standards, or nationwide economic conditions. In the model formula, a `C(Year)` term supplies the year effects and a `C(State)` term supplies the state effects; the specification estimated here contains `summer_max` and `C(Year)`.
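To illustrate why the choice of fixed effects matters, here is a minimal sketch on a synthetic panel. It uses plain least squares with dummy variables (`np.linalg.lstsq`) rather than the `statsmodels` call used in this notebook. The states, effect sizes, and the helper `slope_on_temperature` are all hypothetical. The data are built so that hotter states have a lower baseline death rate, while the true within-state effect of temperature is positive.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
true_beta = 0.02  # assumed true effect of one degree on deaths per 100k

rows = []
for i, state in enumerate(["A", "B", "C", "D"]):
    for j, year in enumerate(range(2000, 2006)):
        temp = 80 + 5 * i + 0.3 * j + rng.normal(0, 1)
        state_effect = 1.0 - 0.2 * i  # hotter states start from a lower baseline
        year_effect = 0.01 * j        # common national drift
        rows.append({
            "State": state,
            "Year": year,
            "summer_max": temp,
            "deaths_per_100k": state_effect + year_effect + true_beta * temp,
        })
panel = pd.DataFrame(rows)


def slope_on_temperature(df: pd.DataFrame, fixed_effects: list) -> float:
    """Least-squares coefficient on summer_max, with dummies for each listed column."""
    parts = [np.ones((len(df), 1)), df[["summer_max"]].to_numpy(dtype=float)]
    for col in fixed_effects:
        # drop_first avoids perfect collinearity with the intercept
        parts.append(pd.get_dummies(df[col], drop_first=True).to_numpy(dtype=float))
    X = np.hstack(parts)
    y = df["deaths_per_100k"].to_numpy(dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coefs[1])  # column 1 is summer_max


pooled = slope_on_temperature(panel, [])
year_only = slope_on_temperature(panel, ["Year"])
two_way = slope_on_temperature(panel, ["State", "Year"])

print(f"Pooled OLS:        {pooled:+.4f}")
print(f"Year FE only:      {year_only:+.4f}")
print(f"State and year FE: {two_way:+.4f}")
```

In this construction the pooled and year-only slopes come out negative because they mix in the differences between states, while the specification with both sets of dummies recovers the assumed within-state effect.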

In [83]:
# Estimate the temperature coefficient and its statistical significance

import pandas as pd
import statsmodels.formula.api as smf

# Drop incomplete rows first, so missing values are not turned into strings by the cast below
df = merged.dropna(subset=["summer_max","deaths_per_100k","State","Year"]).copy()

# Treat State and Year as categorical labels
df["State"] = df["State"].astype(str)
df["Year"] = df["Year"].astype(str)

# C(Year) adds one dummy per year (year fixed effects);
# state fixed effects would require an additional C(State) term.
# HC3 gives heteroskedasticity-robust standard errors.
fe_model = smf.ols(
    formula="deaths_per_100k ~ summer_max + C(Year)",
    data=df
).fit(cov_type="HC3")

# Print results
coef = fe_model.params["summer_max"]
pval = fe_model.pvalues["summer_max"]
print("\nTemperature effect (controlling for year FE):")
print(f"  Coefficient on summer_max: {coef:.4f}")
print(f"  p-value: {pval:.4f}")


Temperature effect (controlling for year FE):
  Coefficient on summer_max: 0.0148
  p-value: 0.2317


With the year controls included, the estimated effect of summer temperature is small and statistically insignificant. The coefficient on `summer_max` is 0.0148 and the p-value is 0.2317, well above the conventional 0.05 threshold. After accounting for differences across years, summer temperature does not show a clear or consistent relationship with heat-related deaths in this dataset. Because the formula has no `C(State)` term, permanent differences between states are not absorbed, so the estimate is not a purely within-state comparison.

This result suggests that the simple patterns in the California and Arizona plots do not establish a general or causal relationship. Other factors not included in the model likely play a major role in explaining heat-related mortality, and temperature alone is not sufficient to predict changes in death rates across states.