In [None]:
import pandas as pd
df = pd.read_csv(filepath_or_buffer='/kaggle/input/tornados/tornados.csv', parse_dates=['datetime_utc'])
df.head()

In [None]:
df.info()

In [None]:
from plotly.express import bar
bar(data_frame=df['st'].value_counts().to_frame().reset_index(), x='st', y='count')

In [None]:
bar(data_frame=df['yr'].value_counts().to_frame().reset_index(), x='yr', y='count')

It really looks like the number of tornadoes observed annually is increasing. Let's add a linear model and see what it tells us.

In [None]:
from plotly.express import scatter
scatter(data_frame=df['yr'].value_counts().to_frame().reset_index(), x='yr', y='count', trendline='ols')

Our linear model has an R² of 0.62 and a slope of 12.4. An increase of about 12.4 tornadoes per year, on average, seems like a lot.
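To get the slope and R² as numbers instead of reading them off the hover text, we can fit the same kind of least-squares line directly. This is a minimal sketch, not the exact computation Plotly performs: `fit_line` is a hypothetical helper, and the `counts` frame is invented toy data standing in for the yearly counts.

```python
import numpy as np
import pandas as pd

def fit_line(x, y):
    """Return (slope, intercept, r_squared) for an ordinary least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))        # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total variation
    return float(slope), float(intercept), 1.0 - ss_res / ss_tot

# Toy stand-in for df['yr'].value_counts().to_frame().reset_index();
# these values are invented for illustration only.
counts = pd.DataFrame({'yr': [2000, 2001, 2002, 2003, 2004],
                       'count': [10, 14, 13, 19, 22]})

slope, intercept, r_squared = fit_line(counts['yr'], counts['count'])
print(f'slope={slope:.2f} per year, r2={r_squared:.2f}')
```

The slope is in units of tornadoes per year, so it is the average year-over-year change in the annual count, and R² is the share of the variation in the counts that the straight line accounts for.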

In [None]:
from plotly.express import scatter_geo
scatter_geo(data_frame=df.sample(n=len(df) // 10), lat='slat', lon='slon', color='st', scope='usa')

If we take a random sample and color the points by state, we see that the density of tornadoes varies significantly from state to state, and that there are relatively few west of the Rockies.

In [None]:
from plotly.express import histogram
histogram(data_frame=df, x='mo')

Our data show strong monthly seasonality: there really is such a thing as Tornado Season.
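One way to put a number on the seasonality is to compute the share of events that falls in each calendar month. A minimal sketch, assuming a `mo` column of month numbers like ours; the `toy` frame is invented data, not the tornado records.

```python
import pandas as pd

# Toy stand-in for the 'mo' column (values invented for illustration).
toy = pd.DataFrame({'mo': [4, 5, 5, 5, 6, 6, 1, 11, 5, 4]})

# Share of events per calendar month; months that never occur are filled in as 0.
monthly_share = (toy['mo']
                 .value_counts(normalize=True)
                 .reindex(range(1, 13), fill_value=0.0))
peak_month = int(monthly_share.idxmax())

print(monthly_share.round(2))
print('peak month:', peak_month)
```

With no seasonality every month would hold roughly one twelfth of the events, so the further the shares drift from that, the stronger the season.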

In [None]:
scatter_geo(data_frame=df.sample(n=len(df) // 10), lat='slat', lon='slon', color='mo', scope='usa')

Tornado Season also does not appear to fall at the same time of year everywhere.
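The map only hints at this, so a table of the busiest month per state may be easier to read. A minimal sketch, assuming `st` and `mo` columns like ours; the `toy` frame is invented data used purely to show the shape of the computation.

```python
import pandas as pd

# Toy stand-in for the 'st' and 'mo' columns (values invented for illustration).
toy = pd.DataFrame({
    'st': ['TX', 'TX', 'TX', 'FL', 'FL', 'FL', 'MN', 'MN'],
    'mo': [5, 5, 4, 6, 6, 9, 7, 7],
})

# Count events per (state, month), one row per state and one column per month.
by_state_month = toy.groupby(['st', 'mo']).size().unstack(fill_value=0)

# For each state, the month (column label) with the most events.
peak_by_state = by_state_month.idxmax(axis=1)
print(peak_by_state)
```

Ties go to the earliest month, so for states with few records the peak month should be read with some caution.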

In [None]:
histogram(data_frame=df, x='mag')

Magnitude takes a small number of evenly spaced integer values, so it reads as an ordinal rating rather than a continuous measurement, and not surprisingly less severe tornadoes dominate our data.

In [None]:
scatter_geo(data_frame=df.sample(n=len(df) // 10), lat='slat', lon='slon', color='mag', scope='usa')

At first glance the more severe tornadoes seem to be randomly distributed.
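A quick way to check that impression is to compare the share of severe tornadoes from state to state. A minimal sketch: the cutoff `SEVERE_MAG = 3` is an assumption made here for illustration (the notebook does not define "more severe"), and the `toy` frame is invented data.

```python
import pandas as pd

SEVERE_MAG = 3  # assumed cutoff for "more severe"; not defined in the notebook

# Toy stand-in for the 'st' and 'mag' columns (values invented for illustration).
toy = pd.DataFrame({
    'st': ['OK', 'OK', 'OK', 'OK', 'FL', 'FL', 'FL', 'FL'],
    'mag': [0, 1, 3, 4, 0, 0, 1, 3],
})

# The mean of a boolean column is the fraction of True values, i.e. the severe share.
severe_share = ((toy['mag'] >= SEVERE_MAG)
                .groupby(toy['st'])
                .mean()
                .sort_values(ascending=False))
print(severe_share)
```

If severe tornadoes really were spread at random, these shares would be roughly equal across states with enough records.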

In [None]:
scatter_geo(data_frame=df.sample(n=len(df) // 10), lat='slat', lon='slon', color='stf', scope='usa')

stf is the state FIPS code, which is assigned roughly in alphabetical order by state name, so when we plot it we see the state map again, but with our continuous color scheme.
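We can confirm that the two columns carry the same information by checking that each state abbreviation maps to exactly one code. A minimal sketch, assuming `st` and `stf` columns like ours; the `toy` frame is a hand-made sample using a few real FIPS codes, not rows from the data.

```python
import pandas as pd

# Hand-made sample: AL=1, AK=2, AZ=4, TX=48 are the FIPS codes for those states.
toy = pd.DataFrame({'st': ['AL', 'AL', 'AK', 'AZ', 'TX'],
                    'stf': [1, 1, 2, 4, 48]})

# Each state should map to exactly one numeric code.
codes_per_state = toy.groupby('st')['stf'].nunique()
one_code_per_state = bool((codes_per_state == 1).all())

# Lookup table from code to abbreviation, ordered by code.
lookup = toy.drop_duplicates(subset='st').set_index('stf')['st'].sort_index()
print(one_code_per_state)
print(lookup)
```

Note that the ordering follows the state name rather than the abbreviation, which is why Alabama (AL) comes before Alaska (AK) here.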

In [None]:
inj_df = df[df['inj'] > 0]
scatter_geo(data_frame=inj_df, lat='slat', lon='slon', color='inj', scope='usa')

In [None]:
# Heavy-tailed columns get a log-scaled y-axis so the rare large values stay visible.
log_columns = {'inj', 'fat', 'loss', 'len', 'wid'}
for column in ['inj', 'fat', 'loss', 'len', 'wid', 'ns', 'sn', 'f1', 'f2', 'f3', 'f4']:
    histogram(data_frame=df, x=column, log_y=column in log_columns).show()