In [1]:
import pandas as pd
from warnings import filterwarnings
filterwarnings(action='ignore', category=FutureWarning)

EVENTS = '/kaggle/input/database-of-volcanic-eruptions/volcano-events.csv'
df = pd.read_csv(filepath_or_buffer=EVENTS)
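# Keep every row except the first (same as df.iloc[1:]); presumably the first row is not an eruption record
df = df.tail(n=len(df)-1)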

df.head()

Unnamed: 0,Year,Month,Day,Name,Location,Country,Latitude,Longitude,Elevation (m),Type,...,Total Deaths,Total Death Description,Total Missing,Total Missing Description,Total Injuries,Total Injuries Description,Total Damage ($Mil),Total Damage Description,Total Houses Destroyed,Total Houses Destroyed Description
1,-4360.0,,,Macauley,Kermadec Is,New Zealand,-30.21,-178.475,238.0,Caldera,...,,,,,,,,,,
2,-4350.0,,,Kikai,Ryukyu Is,Japan,30.793,130.305,704.0,Caldera,...,,3.0,,,,,,3.0,,3.0
3,-4050.0,,,Masaya,Nicaragua,Nicaragua,11.985,-86.165,594.0,Caldera,...,,,,,,,,,,
4,-4000.0,,,Witori,New Britain-SW Pac,Papua New Guinea,-5.576,150.516,724.0,Caldera,...,,1.0,,,,,,1.0,,
5,-3580.0,,,Taal,Luzon-Philippines,Philippines,14.002,120.993,311.0,Stratovolcano,...,,,,,,,,,,


In [2]:
from plotly.express import histogram
histogram(data_frame=df, x='Year', log_y=True)

Unsurprisingly, almost all of our data comes from the 19th, 20th, and 21st centuries: the further back we go, the fewer eruptions were observed and recorded.
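To put a number on "almost all", one could bucket the known years into 100-year bins and compute the share of records from 1800 onward. The sketch below is a minimal illustration: `records_per_century` and `share_since` are hypothetical helper names, and the toy `Year` values are made up rather than taken from the dataset. The bins start at multiples of 100, so 1800 to 1899 is one bin. That is close to, but not exactly, the strict 19th century (1801 to 1900).

```python
import pandas as pd


def records_per_century(df: pd.DataFrame) -> pd.Series:
    """Count records per 100-year bin, labelled by the bin's start year."""
    years = df['Year'].dropna()
    # Floor division rounds down, so 1815 -> 1800 and -4360 -> -4400.
    buckets = (years // 100 * 100).astype(int)
    return buckets.value_counts().sort_index()


def share_since(df: pd.DataFrame, start_year: int = 1800) -> float:
    """Fraction of records with a known Year that is >= start_year."""
    years = df['Year'].dropna()
    return float((years >= start_year).mean())


# Invented example values, not taken from the dataset.
toy = pd.DataFrame({'Year': [-4360.0, 79.0, 1815.0, 1883.0, 1991.0, 2022.0, None]})
print(records_per_century(toy))
print(share_since(toy))
```

On the real data these would be called as `records_per_century(df)` and `share_since(df)` after the loading cell.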

In [3]:
from plotly.express import scatter_mapbox
from plotly.colors import sequential

scatter_mapbox(data_frame=df, center={'lat': 0, 'lon': 0}, lat='Latitude', lon='Longitude', color='Type', mapbox_style='open-street-map', zoom=1, height=900).show()
scatter_mapbox(data_frame=df, center={'lat': 0, 'lon': 0}, lat='Latitude', lon='Longitude', color='VEI', mapbox_style='open-street-map', zoom=1, height=900, color_continuous_scale=sequential.Magma).show()

Because VEI is a logarithmic scale, we expect to see exponentially more low-VEI eruptions than high-VEI eruptions, and almost no VEI 7 or 8 eruptions (8 is the top of the scale). Is that in fact what we see?
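To make "logarithmic" concrete: VEI is assigned largely from the volume of ejected tephra. The commonly cited lower bounds are 10^4 m³ for VEI 1 and 10^6 m³ for VEI 2, then one order of magnitude per level up to 10^12 m³ (1000 km³) for VEI 8. The sketch below encodes those thresholds. `vei_from_volume` is a hypothetical helper. Real VEI assignments also weigh eruption-column height and other observations, so treat this as an illustration of the scale, not a classifier.

```python
from bisect import bisect_right

# Commonly cited lower bounds of ejected tephra volume (m^3) for VEI 1..8.
# Anything below 1e4 m^3 is VEI 0. From VEI 2 up, each step is a factor of 10.
VEI_LOWER_BOUNDS_M3 = [1e4, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12]


def vei_from_volume(volume_m3: float) -> int:
    """Return the VEI (0-8) whose volume band contains volume_m3.

    Hypothetical helper: uses tephra volume only and puts a value that sits
    exactly on a threshold into the higher band.
    """
    # bisect_right counts how many thresholds are <= volume_m3,
    # which is exactly the VEI level under the bands above.
    return bisect_right(VEI_LOWER_BOUNDS_M3, volume_m3)


for volume in (5e3, 1e4, 5e9, 2e12):
    print(f"{volume:.0e} m^3 -> VEI {vei_from_volume(volume)}")
```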

In [4]:
histogram(data_frame=df, x='VEI')

We see almost no eruptions above VEI 6, suggesting that the scale has enough headroom to accommodate global catastrophes. The rest of the distribution is a little surprising, though: it suggests that most eruptions below VEI 3 go unnoticed, or more precisely, that most small eruptions simply do not register on current detectors.
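One way to check the "exponentially more" expectation is to divide the count at each VEI level by the count one level up. Under a steady exponential falloff, that ratio would stay roughly constant and above 1. Ratios below 1 at the low end would reflect the shortage of small eruptions described above. `vei_step_ratios` is a hypothetical helper, and the toy values are invented for illustration.

```python
import pandas as pd


def vei_step_ratios(vei: pd.Series) -> pd.Series:
    """Ratio of eruption counts at VEI k to counts at VEI k + 1."""
    counts = vei.dropna().astype(int).value_counts().sort_index()
    # Fill in any VEI level with no eruptions so that shift(-1) below
    # always compares adjacent levels rather than skipping over a gap.
    full_range = range(int(counts.index.min()), int(counts.index.max()) + 1)
    counts = counts.reindex(full_range, fill_value=0)
    # shift(-1) lines up level k + 1 with level k; the top level gets NaN.
    return counts / counts.shift(-1)


# Invented example values, not taken from the dataset.
toy_vei = pd.Series([0, 1, 1, 2, 2, 2, 2, 3, 3, 4, None])
print(vei_step_ratios(toy_vei))
```

On the real data this would be `vei_step_ratios(df['VEI'])`.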

We expect the damage data to be somewhat correlated with VEI. However, each kind of damage also depends on variables we can't observe directly (for example, how many people live near the volcano), so the strength of the correlation will vary by type of damage.
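VEI is an ordinal scale and the damage figures are heavily skewed, so a rank-based (Spearman) correlation is arguably a better fit than the Pearson coefficients that `DataFrame.corr` computes by default. Note that this is a swapped-in technique, not the notebook's method. For each damage column, the sketch keeps only rows where both values are present and computes Spearman's rho against VEI. The column names come from the `df.head()` header above. `vei_damage_spearman` and the toy values are hypothetical.

```python
import pandas as pd

# Damage columns as they appear in the events file header.
DAMAGE_COLUMNS = ['Total Deaths', 'Total Missing', 'Total Injuries',
                  'Total Damage ($Mil)', 'Total Houses Destroyed']


def vei_damage_spearman(df: pd.DataFrame, columns=DAMAGE_COLUMNS) -> pd.Series:
    """Spearman's rho between VEI and each damage column (pairwise-complete rows)."""
    result = {}
    for column in columns:
        if column not in df.columns:
            continue  # tolerate frames that lack some damage columns
        pair = df[['VEI', column]].dropna()
        # corr(method='spearman') ranks both columns (averaging ties)
        # and returns the Pearson correlation of the ranks.
        result[column] = pair.corr(method='spearman').iloc[0, 1]
    return pd.Series(result, dtype=float).sort_values(ascending=False)


# Invented example values, not taken from the dataset.
toy = pd.DataFrame({
    'VEI': [1, 2, 3, 4, 5],
    'Total Deaths': [0, 10, 5, 100, 1000],
    'Total Missing': [1, 2, 3, 4, 5],
})
print(vei_damage_spearman(toy))
```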

In [5]:
from plotly.express import scatter
from plotly.colors import sequential

scatter(data_frame=df, x='Deaths', y='Total Deaths', log_x=True, log_y=True, color='VEI', color_continuous_scale=sequential.Magma)

There is probably a good reason why this dataset has figures for both deaths and total deaths, but the two very rarely differ.
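To quantify "very rarely", one can count the rows where both figures are present and they disagree. `deaths_disagreement` is a hypothetical helper, and the toy values are invented.

```python
import pandas as pd


def deaths_disagreement(df: pd.DataFrame) -> tuple:
    """Return (rows where Deaths != Total Deaths, rows where both are known)."""
    # Compare only rows that report both figures; a missing value on either
    # side says nothing about whether the two measures agree.
    both = df[['Deaths', 'Total Deaths']].dropna()
    differ = int((both['Deaths'] != both['Total Deaths']).sum())
    return differ, len(both)


# Invented example values, not taken from the dataset.
toy = pd.DataFrame({
    'Deaths': [1, 5, None, 30000],
    'Total Deaths': [1, 7, 3, 30000],
})
differ, known = deaths_disagreement(toy)
print(f"{differ} of {known} rows disagree")
```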

In [6]:
from plotly.express import imshow
from plotly.colors import sequential
imshow(img=df.drop(columns=[column for column in df.columns if 'Description' in column]).corr(numeric_only=True),
       height=900, color_continuous_scale=sequential.Magma)

Not surprisingly, the damage measures are highly correlated with one another, and everything else is only moderately correlated. Perhaps more surprising is that VEI correlates more strongly with the number of missing than with the number of deaths.
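One caveat before reading too much into that comparison: `DataFrame.corr` uses pairwise-complete observations, so each coefficient in the matrix can rest on a different, and possibly small, subset of eruptions. The hypothetical helper below pulls out the VEI column of the same Pearson matrix and reports how many rows contributed to each coefficient. The toy values are invented.

```python
import pandas as pd


def vei_correlation_summary(df: pd.DataFrame, target: str = 'VEI') -> pd.DataFrame:
    """Pearson r with `target` plus the number of rows behind each coefficient."""
    # Same preprocessing as the heatmap cell: drop the Description columns
    # and keep only numeric ones.
    numeric = df.drop(columns=[c for c in df.columns if 'Description' in c])
    numeric = numeric.select_dtypes(include='number')
    r = numeric.corr()[target].drop(target)
    # Count rows where both the target and each other column are present.
    present = numeric.notna()
    n_pairs = present.loc[present[target]].sum().drop(target)
    return pd.DataFrame({'r': r, 'n_pairs': n_pairs}).sort_values('r', ascending=False)


# Invented example values, not taken from the dataset.
toy = pd.DataFrame({
    'VEI': [1, 2, 3, 4, None],
    'Total Deaths': [1, 2, 3, 5, 10],
    'Total Missing': [None, None, 1, 3, None],
})
print(vei_correlation_summary(toy))
```

In the toy frame, the VEI-Missing coefficient is a perfect 1.0 only because it rests on two rows, which is the kind of effect worth ruling out on the real data with `vei_correlation_summary(df)`.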