As you've seen in the [previous notebook](https://www.kaggle.com/jedbell/climate-analysis-i-how-much-time), we're hurtling towards a day of reckoning when the global average temperature will have risen by 1.5 degrees Celsius. But when that day comes, how much will it cost? How much will we regret not acting sooner? And what will the cost of inaction be in one or two decades' time? This notebook seeks to answer these questions.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from statsmodels.tsa.stattools import acf, pacf, adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Sea Level Forecast

One well-known impact of climate change is rising sea levels, caused by melting glaciers and by the expansion of seawater as it warms. But how much are sea levels expected to rise over the next 20 years? Let's take a look and do some forecasting.

In [None]:
pred_temps = pd.read_csv('/kaggle/input/predicted-global-temps-through-2040/PredTemps2040.csv', index_col=0)
temps = pd.read_csv('/kaggle/input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv', index_col=0)

pred_temps.index = pd.to_datetime(pred_temps.index)
temps.index = pd.to_datetime(temps.index)

# 12-month rolling mean of the land-and-ocean average temperature
temps = temps[['LandAndOceanAverageTemperature']].copy()
temps['LOMA_Temp'] = temps['LandAndOceanAverageTemperature'].rolling(12).mean()
temps = temps.drop(columns='LandAndOceanAverageTemperature')

# Historical temperatures used as the exogenous regressor when fitting the model
exog_temps = temps.loc['1880-01-01':'2013-12-01']

# Give the observed temperatures the same columns as the predicted-temperature file
scenario_cols = ['Upper_Mean', 'Lower_Mean', 'Highest_Bound', 'Lowest_Bound', 'SO_Mean']
temps = temps.rename(columns={'LOMA_Temp': 'Mean_Predicted'})
for col in scenario_cols:
    temps[col] = np.nan

# Prepend the observed 2014-2015 values to the predicted temperatures
pred_temps = pd.concat([temps.loc['2014-01-01':'2015-12-01'], pred_temps])

# For the observed months, every scenario column equals the observed mean
for col in scenario_cols:
    pred_temps[col] = pred_temps[col].fillna(pred_temps['Mean_Predicted'])

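First, I load the predicted temperatures from my previous notebook to use as exogenous variables. The observed 2014-2015 temperatures (as 12-month rolling means) are prepended so that the exogenous series covers every month of the forecast.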
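Because the model uses temperature as an exogenous regressor, every forecast step needs exactly one exogenous value, with no gaps. A minimal sanity check could look like the sketch below. It assumes monthly month-start data whose sample ends in December 2013 and a forecast running to March 2041, as in the forecast cells later on; `check_exog_horizon` is a hypothetical helper, not part of statsmodels, and the constant temperature path is a placeholder.

```python
import pandas as pd

def check_exog_horizon(last_obs, exog):
    """Return the number of forecast steps implied by `exog`, after checking that
    it starts the month after `last_obs`, is contiguous, and has no missing values."""
    expected = pd.date_range(start=pd.Timestamp(last_obs) + pd.offsets.MonthBegin(1),
                             periods=len(exog), freq='MS')
    if not exog.index.equals(expected):
        raise ValueError('exog index must be contiguous monthly dates starting after the sample')
    if exog.isna().any():
        raise ValueError('exog contains missing values')
    return len(exog)

# Hypothetical exogenous path: a placeholder temperature for every month 2014-01 .. 2041-03
future_index = pd.date_range('2014-01-01', '2041-03-01', freq='MS')
exog_path = pd.Series(15.0, index=future_index)

steps = check_exog_horizon('2013-12-01', exog_path)
print(steps)  # 327
```

In the notebook, the same check could be run on `pred_temps['Mean_Predicted']` before calling `get_forecast`, which would catch a misaligned or incomplete temperature file early.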

In [None]:
sea = pd.read_csv('/kaggle/input/sea-level-change/sea_levels_2015.csv')
sea['Time'] = pd.to_datetime(sea['Time'])
sea.set_index('Time', inplace=True)
sea = sea.resample(rule='MS').mean()
sea['GMSL_Rolling'] = sea['GMSL'].rolling(12).mean()
sea.dropna(inplace=True)

fig = px.line(sea, x=sea.index, y='GMSL_Rolling', title='Sea Level Height 1880-2013', labels={'Time':'Date', 'GMSL_Rolling':'Height (mm)'})
fig.show()

Here is a graph of the twelve-month rolling average of the Global Mean Sea Level (GMSL) height. There is a very obvious upward trend, and there may be some seasonality, so let's check.

In [None]:
decomp_sea = seasonal_decompose(sea['GMSL_Rolling'], period=12)

fig, ax = plt.subplots(3,1, figsize=(20, 8))
ax[0].plot(decomp_sea.seasonal)
ax[1].plot(decomp_sea.trend)
ax[2].plot(decomp_sea.resid)
ax[0].set_title('Seasonality')
ax[1].set_title('Trend')
ax[2].set_title('Residual')
plt.show()

There seems to be a negligible amount of seasonality in this time series, which is expected since the 12-month rolling average already smooths out any annual cycle. I will therefore use an ARIMA model without any seasonal components.
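Why a 12-month rolling mean leaves almost no seasonality can be shown on synthetic data: each window averages all twelve calendar months once, so a purely annual cycle cancels out. The sketch below uses a made-up linear trend plus a sine wave with a 12-month period (not the GMSL data) and measures how much of the cycle survives the smoothing.

```python
import numpy as np
import pandas as pd

# Ten years of synthetic monthly data (not the GMSL data)
months = pd.date_range('2000-01-01', periods=120, freq='MS')
t = np.arange(len(months))

trend = 0.25 * t                            # hypothetical steady rise
annual = 5.0 * np.sin(2 * np.pi * t / 12)   # hypothetical cycle with a 12-month period
series = pd.Series(trend + annual, index=months)

# 12-month trailing rolling means of the full series and of the trend alone
smoothed = series.rolling(12).mean()
smoothed_trend = pd.Series(trend, index=months).rolling(12).mean()

# Whatever seasonal signal survives the smoothing is the gap between the two
leftover = (smoothed - smoothed_trend).dropna()
print(leftover.abs().max())  # floating-point noise only
```

A real seasonal pattern is rarely a perfect sine wave, so some residue can remain, but an annual cycle of any shape averages to a constant over a full 12-month window.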

In [None]:
# First difference of the smoothed sea-level series
diff = sea.diff().dropna()

fig, ax = plt.subplots(1, 1, figsize=(20, 3))
ax.plot(diff.GMSL_Rolling)
plt.show()

# Augmented Dickey-Fuller test on the differenced series
result = adfuller(diff.GMSL_Rolling)
adtest = pd.Series(result[0:4], index=['Test Statistic', 'p-value', 'Lags Used', 'Observations'])
print(adtest)

According to the Augmented Dickey-Fuller test, taking the first difference makes the data stationary, so let's move on to the ACF and PACF plots.

In [None]:
fig, ax = plt.subplots(1,2, figsize=(20,5))

plot_acf(diff.GMSL_Rolling, lags=11, zero=False, ax=ax[0], title='Autocorrelation')
plot_pacf(diff.GMSL_Rolling, lags=11, zero=False, ax=ax[1], title='Partial Autocorrelation')

plt.show()


The PACF here looks a bit unusual. These plots suggest that the model should only have AR terms, but in my experiments an AR-only model did not give the best (lowest) AIC.

In [None]:
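# Order (3,1,2) gave the lowest AIC (199); no seasonal component
order = (3, 1, 2)

# ARIMA(3,1,2) with a constant and the 12-month mean temperature as an exogenous regressor
sarima = SARIMAX(sea.GMSL_Rolling, order=order, exog=exog_temps.loc['1880-12-01':], trend='c')

sea_model = sarima.fit()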

sea_model.summary()


Using an order of (3,1,2) gave the best AIC despite what the ACF and PACF suggested. Let's check the in-sample predictions and the diagnostics to make sure this is a good model.
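The experimentation over model orders can be made explicit with a small grid search that keeps the order with the lowest AIC. The sketch below is generic: `score` is any function mapping a `(p, d, q)` order to an AIC value. In this notebook it might be something like `lambda o: SARIMAX(sea.GMSL_Rolling, order=o, exog=exog_temps.loc['1880-12-01':], trend='c').fit(disp=False).aic`, but that wiring is an assumption rather than the author's code, and the toy scorer in the example is entirely made up.

```python
import itertools

def search_orders(score, p_values, d_values, q_values):
    """Evaluate every (p, d, q) combination with `score` and return the
    lowest-scoring order together with a dict of all scores.
    Orders whose evaluation raises an exception are skipped."""
    results = {}
    for order in itertools.product(p_values, d_values, q_values):
        try:
            results[order] = score(order)
        except Exception:  # e.g. a fit that fails to converge
            continue
    if not results:
        raise ValueError('no order could be scored')
    best = min(results, key=results.get)
    return best, results

# Toy stand-in for a model's AIC that pretends (3, 1, 2) fits best
def toy_aic(order):
    target = (3, 1, 2)
    return 199.0 + sum((a - b) ** 2 for a, b in zip(order, target))

best, scores = search_orders(toy_aic, range(0, 4), [1], range(0, 3))
print(best)  # (3, 1, 2)
```

Keeping `d` fixed, as in the example, matters: models with different amounts of differencing are fit to different data, so their AIC values are not directly comparable.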

In [None]:
forecast = sea_model.get_prediction(start=-240, exog=exog_temps)
fig1 = px.line(sea, x=sea.index, y=sea.GMSL_Rolling)
fig1.add_scatter(x=forecast.predicted_mean.index, y=forecast.predicted_mean, mode='lines', name='Predicted')
fig1.add_scatter(x=forecast.predicted_mean.index, y=forecast.conf_int().iloc[:,0], name='Lower Bound')
fig1.add_scatter(x=forecast.predicted_mean.index, y=forecast.conf_int().iloc[:,1], fill='tonexty', name='Upper Bound')
fig1.show()

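def mape(fc, true):
    """Mean absolute percentage error, returned as a fraction (0.01 == 1%)."""
    return np.mean(np.abs(fc - true) / np.abs(true))

# In-sample accuracy over the last 240 months (20 years)
acc = mape(forecast.predicted_mean, sea['GMSL_Rolling'].iloc[-240:])
print('Mean Absolute Percent Error:', acc)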

sea_model.plot_diagnostics(figsize=(20, 10))
plt.show()

The MAPE is below 1%, and the diagnostics look fine, although the residuals could be a little closer to normally distributed. I think this model is good enough for forecasting.
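MAPE divides each error by the observed value, so it is only informative when the observed values stay well away from zero. GMSL series are typically expressed relative to a reference level, so it can help to report a scale-dependent metric such as the mean absolute error (in millimetres) alongside it. The sketch below uses hypothetical values; `safe_mape` and `mean_abs_error` are illustrative names, not library functions.

```python
import numpy as np

def safe_mape(forecast, actual):
    """Mean absolute percentage error as a fraction (0.01 == 1%).
    Raises if any actual value is zero, where MAPE is undefined."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if np.any(actual == 0):
        raise ValueError('MAPE is undefined when an actual value is zero')
    return np.mean(np.abs(forecast - actual) / np.abs(actual))

def mean_abs_error(forecast, actual):
    """Mean absolute error, in the units of the data (mm for GMSL)."""
    return np.mean(np.abs(np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)))

# The same 2 mm miss on every point, at two different distances from zero
far_from_zero = safe_mape([102.0, 98.0], [100.0, 100.0])  # 0.02, i.e. 2%
near_zero = safe_mape([4.0, 0.0], [2.0, 2.0])             # 1.0, i.e. 100%
maes = (mean_abs_error([102.0, 98.0], [100.0, 100.0]),
        mean_abs_error([4.0, 0.0], [2.0, 2.0]))           # both 2.0 mm
```

The identical 2 mm error looks negligible or catastrophic depending only on where the baseline sits, which is why the absolute error is a useful companion number.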

In [None]:
# 327 monthly steps: January 2014 through March 2041
fc_m = sea_model.get_forecast(steps=327, exog=pred_temps['Mean_Predicted'])

print('March 2041 GMSL Prediction:', round(fc_m.predicted_mean.iloc[-1], 3), 'mm')

fig = px.line(sea, x=sea.index, y=sea.GMSL_Rolling)
fig.add_scatter(x=fc_m.predicted_mean.index, y=fc_m.predicted_mean, mode='lines', name='Predicted')
fig.add_scatter(x=fc_m.predicted_mean.index, y=fc_m.conf_int().iloc[:,0], name='Lower Bound')
fig.add_scatter(x=fc_m.predicted_mean.index, y=fc_m.conf_int().iloc[:,1], fill='tonexty', name='Upper Bound')
fig.show()

Here is my first prediction, using the mean predicted temperatures as the exogenous variable. It predicts that the Global Mean Sea Level will rise by 44 mm between 2014 and 2041. While this doesn't seem like a lot, keep in mind that this 44 mm isn't distributed evenly across the ocean: some places will see much larger rises, while others could even see small decreases in sea level. This prediction is meant for a more general analysis, to confirm that sea levels are indeed rising steadily and show no sign of reversing.
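A headline rise like this is the last forecast value minus a starting level, and dividing it by the forecast horizon gives an average rate in mm per year that is easier to compare with published figures. The sketch below is hypothetical: `rise_and_rate` is an illustrative helper, and the forecast is a made-up straight line that climbs 44 mm over 327 months. In the notebook, the inputs would come from `fc_m.predicted_mean` and the last value of `sea['GMSL_Rolling']`.

```python
import pandas as pd

def rise_and_rate(last_observed, forecast):
    """Total rise from the last observation to the end of a monthly forecast,
    and the implied average rate per year."""
    rise = float(forecast.iloc[-1]) - float(last_observed)
    years = len(forecast) / 12  # monthly steps -> years
    return rise, rise / years

# Hypothetical forecast: 327 monthly values climbing linearly by 44 mm in total
idx = pd.date_range('2014-01-01', periods=327, freq='MS')
fake_forecast = pd.Series([44.0 * (i + 1) / 327 for i in range(327)], index=idx)

rise, rate = rise_and_rate(0.0, fake_forecast)
print(round(rise, 1), round(rate, 2))  # 44.0 1.61
```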

In [None]:
fc_h = sea_model.get_forecast(steps=327, exog=pred_temps['Highest_Bound'])

print('March 2041 GMSL Prediction:', round(fc_h.predicted_mean.iloc[-1], 3), 'mm')

fig = px.line(sea, x=sea.index, y=sea.GMSL_Rolling)
fig.add_scatter(x=fc_h.predicted_mean.index, y=fc_h.predicted_mean, mode='lines', name='Predicted')
fig.add_scatter(x=fc_h.predicted_mean.index, y=fc_h.conf_int().iloc[:,0], name='Lower Bound')
fig.add_scatter(x=fc_h.predicted_mean.index, y=fc_h.conf_int().iloc[:,1], fill='tonexty', name='Upper Bound')
fig.show()

Here is my second prediction, using the highest-bound values of the predicted temperatures as the exogenous variable. This resulted in only a very small additional increase in Global Mean Sea Level over this timescale. Further research[1] suggests that the sea-level response to different temperature pathways becomes much more pronounced over longer timescales (roughly 2050-2100). In other words, this is a longer-term issue: while the rise predicted here appears unavoidable, there is still time to prevent even more disastrous levels of sea-level rise.

[1] Lindsey, Rebecca. “Climate Change: Global Sea Level.” Climate.gov, NOAA, 25 Jan. 2021, www.climate.gov/news-features/understanding-climate/climate-change-global-sea-level. 

In [None]:
fc_l = sea_model.get_forecast(steps=327, exog=pred_temps['Lowest_Bound'])

print('March 2041 GMSL Prediction:', round(fc_l.predicted_mean.iloc[-1], 3), 'mm')

fig = px.line(sea, x=sea.index, y=sea.GMSL_Rolling)
fig.add_scatter(x=fc_l.predicted_mean.index, y=fc_l.predicted_mean, mode='lines', name='Predicted')
fig.add_scatter(x=fc_l.predicted_mean.index, y=fc_l.conf_int().iloc[:,0], name='Lower Bound')
fig.add_scatter(x=fc_l.predicted_mean.index, y=fc_l.conf_int().iloc[:,1], fill='tonexty', name='Upper Bound')
fig.show()


Here is my final prediction, using the lowest-bound values of the predicted temperatures as the exogenous variable. Again, it made very little difference compared to the previous two. Since neither the highest nor the lowest temperature bound changed the forecast much, I didn't run predictions for the values in between, which would have had an even smaller effect.

# Natural Disaster Analysis

Another well documented effect of increasing temperatures is an increased frequency and severity of natural disasters. Let's take a look at the frequency and severity of natural disasters over time and see what's changed.

In [None]:
dis_dam = pd.read_csv('/kaggle/input/natural-disaster-data/economic-damage-from-natural-disasters.csv')
dis_ct = pd.read_csv('/kaggle/input/natural-disaster-data/number-of-natural-disaster-events.csv')
deaths = pd.read_csv('/kaggle/input/global-cause-of-the-deaths-other-than-diseases/Caused of Deaths.csv')

dis_dam.drop('Code', axis=1, inplace=True)
dis_dam.rename(mapper={'Total economic damage from natural disasters (US$)':'Damage Cost'}, axis=1, inplace=True)
dis_ct.drop('Code', axis=1, inplace=True)
dis_ct.rename({'Number of reported natural disasters (reported disasters)':'Disasters'}, axis=1, inplace=True)

In [None]:
# Per-type rows only (exclude the "All natural disasters" aggregate)
brk_dis = dis_dam[dis_dam['Entity'] != 'All natural disasters']
brk_dis_ct = dis_ct[dis_ct['Entity'] != 'All natural disasters']

fig1 = px.bar(brk_dis_ct, x='Year', y='Disasters', color='Entity', title='Global Natural Disaster Count by Type 1900-2018', hover_name='Entity', labels={'Disasters':'Disaster Count'})
fig1.show()

# Year-over-year percent change in the total disaster count
all_dc = dis_ct[dis_ct['Entity'] == 'All natural disasters'].copy()
all_dc['Pct_Chnge'] = all_dc['Disasters'].pct_change()
print('Mean Percent Change Per Year:', round(all_dc['Pct_Chnge'].mean(), 4))

fig2=px.bar(brk_dis, x='Year', y='Damage Cost', color='Entity', title='Global Disaster Costs by Type 1900-2018', labels={'Damage Cost':'Damage Cost (2018 USD)'})
fig2.show()

nd_deaths = deaths[deaths['Cause']=='Natural Disaster']

dby = pd.DataFrame(nd_deaths.groupby('Year')['Deaths'].sum())

fig3=px.bar(dby, x=dby.index, y='Deaths', title='Natural Disaster Deaths 1980-2017')
fig3.show()

The top graph shows the number of natural disasters worldwide each year from 1900 to 2018. The number of natural disasters per year has exploded since the 1960s, driven largely by increases in flooding and extreme weather (hurricanes and cyclones). Since 1900, the year-over-year percent change has averaged 16.5%. This arithmetic mean is likely inflated by large swings in the very small early counts, though, so it should not be read as a compound annual growth rate.
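Averaging year-over-year percent changes is not the same as measuring growth: compounding 16.5% every year from 1900 to 2018 would multiply the count by a factor of tens of millions. A compound annual growth rate (CAGR) between the first and last years is a more conservative summary. The sketch below compares the two on made-up counts that bounce around before growing; the real comparison would use `all_dc['Disasters']` indexed by year.

```python
import pandas as pd

# Hypothetical yearly counts that bounce around before growing
counts = pd.Series([2, 6, 3, 9, 4, 8, 16], index=range(1900, 1907))

# Arithmetic mean of the year-over-year percent changes (what the cell above prints)
mean_pct_change = counts.pct_change().mean()

# Compound annual growth rate between the first and last years
years = counts.index[-1] - counts.index[0]
cagr = (counts.iloc[-1] / counts.iloc[0]) ** (1 / years) - 1

print(round(mean_pct_change, 3), round(cagr, 3))  # 0.824 0.414
```

Here the series grows eightfold in six years, which is about 41% per year compounded, yet the mean percent change reports about 82% because the up-and-down swings do not cancel in an arithmetic average.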

The middle graph shows the economic damage caused by natural disasters each year. It also has a strong upward trend, but a less consistent one, since not all natural disasters are created equal: some do far more damage than others, depending on their size and where they hit. Still, more frequent disasters mean more opportunities for the most devastating ones to strike. And with more disasters per year, even years without a catastrophic event will tend to be more expensive than comparable years in the past.

The final graph shows total deaths from natural disasters. It doesn't have the same kind of upward trend as the previous two, but it does have increasingly frequent spikes. The reason is similar to the cost graph: some disasters are far deadlier than others depending on their location and severity, and a higher frequency of disasters is likely to make spikes like those in 2004, 2008, and 2010 more common.

Note: On the bright side, those three spikes were mainly caused by earthquakes (and an ensuing tsunami in 2004), which aren't really a result of increased temperature or CO2 emissions[2], meaning these were more random than part of a trend.

[2] Buis, Alan. “Can Climate Affect Earthquakes, Or Are the Connections Shaky?” Climate Change: Vital Signs of the Planet, NASA, 29 Oct. 2019.

In [None]:
us_dis = pd.read_csv('/kaggle/input/us-natural-disaster-declarations/us_disasters_m5.csv')
us_dis_dec = pd.read_csv('/kaggle/input/us-natural-disaster-declarations/us_disaster_declarations.csv')


Now let's narrow our scope a bit and take a look at the United States specifically. How has this increase in natural disasters affected it and its economy?

In [None]:
# 23 disaster types, 17 natural disasters
# All "Biological" declarations are COVID-19
nat_dis = list(us_dis_dec['incident_type'].unique())
for non_natural in ['Human Cause', 'Terrorist', 'Biological']:
    nat_dis.remove(non_natural)

# Keep natural disasters only, with one row per declaration request
nd = us_dis_dec[us_dis_dec['incident_type'].isin(nat_dis)]
nd = nd.drop_duplicates(subset='declaration_request_number')

Here I filtered out diseases (all COVID-19), terrorist attacks, and human-caused incidents from the dataframe of US disaster declarations since 1953, and kept one row per declaration request.

In [None]:
nde = nd[nd['fy_declared']<=1987]
ndl = nd[nd['fy_declared']>1987]
early_ct = nde.groupby('state')['declaration_title'].count()
late_ct = ndl.groupby('state')['declaration_title'].count()

edf = pd.DataFrame({'1953-1987 Disaster Count':early_ct})
ldf = pd.DataFrame({'1988-2021 Disaster Count':late_ct})

st_df = edf.merge(ldf, how='left', left_on=edf.index, right_on=ldf.index)
st_df['Disaster Change'] = st_df['1988-2021 Disaster Count']-st_df['1953-1987 Disaster Count']
st_df.rename({'key_0':'State'}, axis=1, inplace=True)

fig1 = px.bar(st_df, x=st_df['State'], y='Disaster Change', title='Comparing State Disaster Counts from 1953-1987 to 1988-2021')
fig1.show()

e_ct = nde.groupby('incident_type')['incident_type'].count()
l_ct = ndl.groupby('incident_type')['incident_type'].count()

e_df = pd.DataFrame({'1953-1987 Disaster Count':e_ct})
l_df = pd.DataFrame({'1988-2021 Disaster Count':l_ct})

type_df = e_df.merge(l_df, how='left', left_on=e_df.index, right_on=l_df.index)
type_df['Disaster Change'] = type_df['1988-2021 Disaster Count']-type_df['1953-1987 Disaster Count']
type_df.rename({'key_0':'Disaster Type'},axis=1, inplace=True)

fig2=px.bar(type_df, x='Disaster Type', y='Disaster Change', title='Comparing Disaster Counts from 1953-1987 to 1988-2021 by Type')
fig2.show()

yr_ct = nd.groupby('fy_declared')['state'].count() 
ydf = pd.DataFrame({'Disaster Count':yr_ct})

# Drop the final (partial) fiscal year before plotting
fig3 = px.line(ydf[:-1], x=ydf[:-1].index, y='Disaster Count', labels={'fy_declared':'Year'}, title='Yearly US Disaster Declarations')
fig3.show()

print('Cumulative Disaster Count 1953-1995:', ydf.loc[:1995]['Disaster Count'].sum())
print('Cumulative Disaster Count 1996-2020:', ydf.loc[1996:2020]['Disaster Count'].sum())

The top graph shows the change in the number of natural disaster declarations for each state between the first 35 years of data (1953-1987) and the following 34 years (1988-2021). Every state had a positive difference, with California and Texas showing the largest increases, both with over 200 more declarations in the second period. This is somewhat expected, as they are two of the largest and most populous states, but the fact that every single state saw an increase reinforces the earlier conclusion that natural disasters are becoming more frequent.

The middle graph shows the change in each disaster type between the two periods. Fires, hurricanes, and severe storms increased massively, while, surprisingly, floods decreased slightly. Still, the large increases in the other three are consistent with the effects of rising global temperatures.
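The two-period comparisons above can also be written as a single cross-tabulation. Unlike the left merge on the early-period counts, a cross-tabulation keeps states or disaster types that appear in only one period, filling the missing period with 0. The sketch below uses a toy frame with the notebook's column names (`state`, `fy_declared`); the rows are invented, and the same pattern would apply to `incident_type`.

```python
import numpy as np
import pandas as pd

# Toy declarations table (invented rows) using the notebook's column names
toy = pd.DataFrame({
    'state': ['CA', 'CA', 'TX', 'TX', 'TX', 'FL'],
    'fy_declared': [1960, 1995, 1970, 1990, 2005, 2010],
})

# Label each declaration with its period (1987 belongs to the first period)
toy['period'] = np.where(toy['fy_declared'] <= 1987, '1953-1987', '1988-2021')

# Rows: states, columns: periods, cells: number of declarations (0 if none)
counts = pd.crosstab(toy['state'], toy['period'])
counts['Disaster Change'] = counts['1988-2021'] - counts['1953-1987']
print(counts)
```

In this toy data, `FL` only has a declaration in the second period; it stays in the table with a first-period count of 0, whereas a left merge on the early counts would drop it.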

The final graph shows the number of US disaster declarations per year, and it reinforces again that disaster frequency is increasing rapidly. Before 1996, the US declared 50 or more disasters in a single year only seven times. In the 25 years since, it has declared fewer than 100 in a year only four times.
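Counts like these are easy to reproduce from the yearly totals with a boolean mask. The sketch below uses a hypothetical yearly series shaped like `ydf['Disaster Count']` (integer fiscal years as the index); the numbers are invented.

```python
import pandas as pd

# Hypothetical yearly declaration counts indexed by fiscal year
yearly = pd.Series([30, 55, 40, 70, 120, 95, 130, 150],
                   index=[1990, 1992, 1994, 1995, 1996, 1997, 1998, 1999])

early = yearly.loc[:1995]   # label-based slice, includes 1995
late = yearly.loc[1996:]    # includes 1996

years_at_least_50_early = int((early >= 50).sum())
years_under_100_late = int((late < 100).sum())
print(years_at_least_50_early, years_under_100_late)  # 2 1
```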

Next, let's take a look at a map of where disasters are the most frequent.

In [None]:
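# Collapse Texas statewide fire declarations to one per declaration date
txfires = nd[(nd['state'] == 'TX') & (nd['designated_area'] == 'Statewide') & (nd['incident_type'] == 'Fire')]
txfires_new = txfires.drop_duplicates(subset='declaration_date')

# Remove the original Texas statewide fire rows (and, as before, any row with a
# missing value in any column), then add back the de-duplicated rows
nd = pd.concat([nd.drop(index=txfires.index).dropna(), txfires_new])
nd['fips'] = nd['fips'].astype(int)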



In [None]:
# Split county-level and statewide declarations
cnt_grp = nd[nd['designated_area'] != 'Statewide']
st_grp = nd[nd['designated_area'] == 'Statewide']

# Number of declarations per county and per state (counted via the 'state' column)
cnties = cnt_grp.groupby(['fips', 'designated_area']).count().sort_values('state', ascending=False)
sts = st_grp.groupby(['fips']).count().sort_values('state', ascending=False)

cdf = pd.DataFrame({'Disaster Count': cnties['state']})
cdf.reset_index(inplace=True)
cdf['fips'] = cdf['fips'].astype('string')

sdf = pd.DataFrame({'Disaster Count': sts['state']})
sdf.reset_index(inplace=True)
sdf['fips'] = sdf['fips'].astype('string')


In [None]:
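def concat(s1):
    """Left-pad a 4-digit FIPS code with a zero so that every code has 5 digits."""
    if len(s1) == 4:
        s1 = '0' + s1
    return s1

cdf['fips'] = cdf['fips'].apply(concat)
sdf['fips'] = sdf['fips'].apply(concat)

# The first two digits of a FIPS code identify the state
cdf['state'] = cdf['fips'].str[:2]
sdf['state'] = sdf['fips'].str[:2]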


In [None]:
codes = pd.read_csv('/kaggle/input/zipcodes-county-fips-crosswalk/ZIP-COUNTY-FIPS_2017-06.csv')

# Unique (county FIPS, county name) pairs
fips = codes.groupby(['STCOUNTYFP', 'COUNTYNAME']).first()
fips.reset_index(inplace=True)

fdf = pd.DataFrame({'Fips': fips['STCOUNTYFP'], 'County': fips['COUNTYNAME']})
fdf['Fips'] = fdf['Fips'].astype('string')
fdf['Fips'] = fdf['Fips'].apply(concat)

# Attach county-level counts; counties without declarations get 0
fdf = fdf.merge(cdf, how='left', left_on='Fips', right_on='fips')
fdf['Disaster Count'] = fdf['Disaster Count'].fillna(0)
fdf.drop(['designated_area', 'fips'], axis=1, inplace=True)
fdf['state'] = fdf['state'].fillna(fdf['Fips'].str[:2])

# Add each county's statewide declarations; states without any count as 0
ddf = fdf.merge(sdf, how='left', on=['state'])
ddf['Total_Count'] = ddf['Disaster Count_x'].add(ddf['Disaster Count_y'], fill_value=0)

In [None]:
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)


In [None]:
fig = px.choropleth(ddf, geojson=counties, locations='Fips', color='Total_Count',
                           color_continuous_scale="reds",
                           range_color=(0, 70),
                           scope="usa",
                           labels={'Total_Count':'Natural Disaster Count'}, 
                           hover_name='County', title='Natural Disaster Declarations By County Since 1953')
fig.show()

Here is a map of disaster declarations by county. Texas has the most, although its count is largely driven by two historically bad wildfire seasons in 1996 and 1998. Still, the map suggests that the places most susceptible to disasters are Texas, Florida, Oklahoma, and the Pacific Coast.

In [None]:
fl_cnt_grp = cnt_grp[cnt_grp['incident_type']=='Flood']
fl_st_grp = st_grp[st_grp['incident_type']=='Flood']

fl_cnties = fl_cnt_grp.groupby(['fips', 'designated_area']).count().sort_values('state', ascending=False)
fl_sts = fl_st_grp.groupby('fips').count().sort_values('state', ascending=False)

fl_cdf = pd.DataFrame({'Flood Count':fl_cnties['state']})
fl_cdf.reset_index(inplace=True)
fl_cdf['fips'] = fl_cdf['fips'].astype({'fips':'string'})

fl_sdf = pd.DataFrame({'Flood Count':fl_sts['state']})                  
fl_sdf.reset_index(inplace=True)
fl_sdf['fips'] = fl_sdf['fips'].astype({'fips':'string'})

fl_cdf['fips'] = fl_cdf['fips'].apply(concat)
fl_sdf['fips'] = fl_sdf['fips'].apply(concat)

fl_cdf['state'] = fl_cdf['fips'].str[:2]
fl_sdf['state'] = fl_sdf['fips'].str[:2]

fl_fdf = pd.DataFrame({'Fips':fips['STCOUNTYFP'], 'County':fips['COUNTYNAME']})
fl_fdf['Fips'] = fl_fdf['Fips'].astype({'Fips':'string'})
fl_fdf['Fips'] = fl_fdf['Fips'].apply(concat)

fl_fdf = fl_fdf.merge(fl_cdf, how='left', left_on='Fips', right_on='fips')

fl_fdf['Flood Count'] = fl_fdf['Flood Count'].fillna(0)
fl_fdf.drop(['designated_area', 'fips'], axis=1, inplace=True)
fl_fdf['state'] = fl_fdf['state'].fillna(fl_fdf['Fips'].str[:2])

# Add statewide flood declarations; states without any count as 0
fl_ddf = fl_fdf.merge(fl_sdf, how='left', on='state')

fl_ddf['Total_Count'] = fl_ddf['Flood Count_x'].add(fl_ddf['Flood Count_y'], fill_value=0)
fl_ddf.drop(['Flood Count_x', 'state', 'fips', 'Flood Count_y'], axis=1, inplace=True)


In [None]:
fig = px.choropleth(fl_ddf, geojson=counties, locations='Fips', color='Total_Count',
                           color_continuous_scale="blues",
                           range_color=(0, 10),
                           scope="usa",
                           labels={'Total_Count':'Flood Count'}, 
                           hover_name='County', title='Flood Disaster Declarations By County Since 1953')
fig.show()

Some of the disaster counts were drowned out by the large number of wildfire declarations in the previous map, so I broke the data into two more maps of the next most common disasters: floods, and hurricanes/severe storms. Here is a map of floods by county. The Pacific Coast, the states along the Mississippi River, and West Virginia have the highest susceptibility to flooding. Flooding will only get worse in Florida and along the Atlantic Coast as sea levels continue to rise.

In [None]:
storm_types = ['Hurricane', 'Severe Storm(s)', 'Coastal Storm', 'Typhoon']
st_cnt_grp = cnt_grp[cnt_grp['incident_type'].isin(storm_types)]
st_st_grp = st_grp[st_grp['incident_type'].isin(storm_types)]

st_cnties = st_cnt_grp.groupby(['fips', 'designated_area']).count().sort_values('state', ascending=False)
st_sts = st_st_grp.groupby('fips').count().sort_values('state', ascending=False)

st_cdf = pd.DataFrame({'Storm Count':st_cnties['state']})
st_cdf.reset_index(inplace=True)
st_cdf['fips'] = st_cdf['fips'].astype({'fips':'string'})

st_sdf = pd.DataFrame({'Storm Count':st_sts['state']})                  
st_sdf.reset_index(inplace=True)
st_sdf['fips'] = st_sdf['fips'].astype({'fips':'string'})

st_cdf['fips'] = st_cdf['fips'].apply(concat)
st_sdf['fips'] = st_sdf['fips'].apply(concat)

st_cdf['state'] = st_cdf['fips'].str[:2]
st_sdf['state'] = st_sdf['fips'].str[:2]

st_fdf = pd.DataFrame({'Fips':fips['STCOUNTYFP'], 'County':fips['COUNTYNAME']})
st_fdf['Fips'] = st_fdf['Fips'].astype({'Fips':'string'})
st_fdf['Fips'] = st_fdf['Fips'].apply(concat)

st_fdf = st_fdf.merge(st_cdf, how='left', left_on='Fips', right_on='fips')

st_fdf['Storm Count'] = st_fdf['Storm Count'].fillna(0)
st_fdf.drop(['designated_area', 'fips'], axis=1, inplace=True)
st_fdf['state'] = st_fdf['state'].fillna(st_fdf['Fips'].str[:2])

# Add statewide storm declarations; states without any count as 0
st_ddf = st_fdf.merge(st_sdf, how='left', on='state')

st_ddf['Total_Count'] = st_ddf['Storm Count_x'].add(st_ddf['Storm Count_y'], fill_value=0)
st_ddf.drop(['Storm Count_x', 'state', 'fips', 'Storm Count_y'], axis=1, inplace=True)

In [None]:
fig = px.choropleth(st_ddf, geojson=counties, locations='Fips', color='Total_Count',
                           color_continuous_scale="greens",
                           range_color=(0, 10),
                           scope="usa",
                           labels={'Total_Count':'Storm Count'}, 
                           hover_name='County', title='Storm Disaster Declarations By County Since 1953')
fig.show()

Finally, here is a map of hurricanes and severe storms by county. States along the Gulf of Mexico and the East Coast are the most susceptible to these disasters. Floods also often accompany these disasters in the form of storm surge, which will only be made worse as sea levels continue to rise.

In [None]:
bdd = pd.read_csv('/kaggle/input/billion-dollar-us-natural-disasters-19802021/BillionDollarDisasters1980-2021.csv')

# Manual fix for one malformed date entry (row 178, column 2) so the dates parse
bdd.iloc[178, 2] = '2012-01-01'
bdd['Begin Date'] = pd.to_datetime(bdd['Begin Date'])
bdd['End Date'] = pd.to_datetime(bdd['End Date'])
bdd.set_index('Begin Date', inplace=True)
bdd['Year'] = bdd.index.year
bdd.rename(mapper={'Total CPI-Adjusted Cost (Millions of Dollars)':'Cost_Mil'}, axis=1, inplace=True)

# Keep complete years only (through 2020); sorting makes the date slice reliable
bdd = bdd.sort_index().loc[:'2020-12-31'].copy()
bdd['Cost_Mil'] = bdd['Cost_Mil'].astype('float')

For the last part of this analysis, we will look at the financial impact natural disasters have had on the US.

In [None]:
yr_dmg = bdd.groupby('Year')['Cost_Mil'].sum()
yr_cnt = bdd.groupby('Year')['Cost_Mil'].count()

dddf = pd.DataFrame({'Cost (Mil)':yr_dmg})
cddf = pd.DataFrame({'Count':yr_cnt})

fig2 = px.line(cddf, x=cddf.index, y=cddf['Count'], title='Number of Billion Dollar US Natural Disasters', labels={'Count':'Number of Disasters'})
fig2.show()

fig = px.line(dddf, x=dddf.index, y=dddf['Cost (Mil)'], title='Total Cost of Billion Dollar Natural Disasters', labels={'Cost (Mil)':'Cost (In Millions of 2020 USD)'})
fig.show()

The top graph shows the number of disasters each year since 1980 that cost at least one billion dollars. There is a clear upward trend, just like the earlier graph of the total number of natural disasters worldwide. Not only are disasters becoming more frequent, but *expensive* disasters are becoming more frequent too.

The bottom graph shows the total cost of these disasters each year in 2020 US dollars. As with the earlier graph of global costs, the upward trend is less clear, but there are significantly more spikes, meaning that more frequent disasters also increase the potential for particularly devastating ones. The 10 most expensive disasters are listed in the table below. Note that this list is slightly outdated: by one estimate, the 2021 winter storm in Texas has already cost $195 billion, which would make it the single most expensive disaster in US history.

In [None]:
bdd.sort_values('Cost_Mil', ascending=False).head(10)

The number of natural disasters each year roughly resembles a Poisson distribution, but its variance and mean are not equal (a Poisson distribution requires them to be), so it is not a true Poisson distribution. This makes forecasting harder than for data like temperature, CO2, or sea levels. However, the trend is still clear: billion-dollar disasters are becoming more frequent, and 7 of the 10 most expensive on record have occurred since 2005. Some areas are more susceptible than others, especially the Pacific, Gulf, and East coasts. Rising temperatures are wreaking havoc not only on the environment but also on our collective wallets, and the recent data suggest this problem is likely to get worse.
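A quick way to check the Poisson assumption is the index of dispersion, the sample variance divided by the mean: it is about 1 for Poisson counts, and values well above 1 indicate overdispersion. An upward trend by itself also inflates the variance of raw yearly counts, so computing the ratio within shorter windows gives a fairer check. The sketch below uses invented yearly counts; in the notebook, the input would be `cddf['Count']`.

```python
import pandas as pd

def dispersion_index(counts):
    """Sample variance divided by mean; about 1 for Poisson-distributed counts."""
    counts = pd.Series(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Invented yearly counts of billion-dollar disasters with an upward trend
counts = pd.Series([3, 5, 2, 6, 4, 9, 7, 12, 10, 16], index=range(2011, 2021))

# Over the whole period the trend inflates the variance
overall = dispersion_index(counts)

# Within 5-year windows the trend matters much less
by_window = counts.groupby((counts.index - 2011) // 5).apply(dispersion_index)
print(round(overall, 2))
print(by_window.round(2))
```

In this made-up series the full-period ratio is about 2.6, while both 5-year windows come out close to or below 1, showing how much of the apparent overdispersion can come from the trend alone.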

# Part III

For the final part of my climate analysis, which looks at the impact of increasing renewable energy, check out [Part III](https://www.kaggle.com/jedbell/climate-analysis-iii-what-now).