# Tornadoes [1950-2022]

Source: https://www.kaggle.com/datasets/sujaykapadnis/tornados



### Scenario:
- We have been hired by NOAA to investigate US tornadic activity using historical data from 1950 to 2022.

### Objectives:
- Understand the historical trends of tornadoes. This includes examination of qualities such as severity, duration, and geographical area affected.
- Determine how these qualities have changed over time by comparing old figures to those from present day.
- Make statement(s) about the pattern(s) we find in tornadic activity. We hope to use our findings to recommend next steps for more robust efforts to quantify future tornadic activity, such as predictive modeling and forecasting.  

### Questions:
1. **Tornado severity**
- Has the length of paths changed? (look at both the mean and the median)
- Has the severity (on F/EF scale) changed?

2. **Tornado frequency**
- Has tornado frequency increased over time?
- Has the frequency of more severe tornadoes increased over time?

3. **Tornado zone** (Optional)
- Has "tornado alley" increased/decreased in size over time? (look at the states affected)
- And/or has this area moved?  
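
Question 1 asks for both the mean and the median path length. A minimal sketch of that comparison per decade, on a few made-up rows (not the real data), assuming the `yr`, `len`, and `mag` columns listed in the variables section below and the same "1950-1959" decade labels built later in the notebook:

```python
import pandas as pd

# Made-up rows for illustration only (not values from the dataset)
toy = pd.DataFrame({
    'yr':  [1955, 1958, 1959, 2015, 2016, 2018],
    'len': [2.0, 10.0, 30.0, 1.0, 3.0, 5.0],
    'mag': [1, 2, 3, 0, 1, 1],
})

# Decade labels in the same "1950-1959" style as the `decade` feature
start = toy['yr'] // 10 * 10
toy['decade'] = start.astype(str) + '-' + (start + 9).astype(str)

# Mean and median path length, plus mean magnitude, per decade
summary = toy.groupby('decade').agg(
    mean_len=('len', 'mean'),
    median_len=('len', 'median'),
    mean_mag=('mag', 'mean'),
)
print(summary)
```

In the made-up 1950s rows, one 30-mile track pulls the mean (14 miles) well above the median (10 miles), which is why looking at both matters.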



### Data Acquisition
`tornados.csv`

In [None]:
import pandas as pd

tornadoes = pd.read_csv('tornados.csv', index_col=None)
tornadoes.head()

In [None]:
tornadoes.info()

variables:

- `om`: Integer - Tornado number. Effectively an ID for this tornado in this year.
- `yr`: Integer - Year, 1950-2022.
- `mo`: Integer - Month, 1-12.
- `dy`: Integer - Day of the month, 1-31.
- `date`: Date - Date.
- `time`: Time - Time.
- `tz`: Character - Canonical tz database timezone.
- `datetime_utc`: Datetime - Date and time normalized to UTC.
- `st`: Character - Two-letter postal abbreviation for the state (DC = Washington, DC; PR = Puerto Rico; VI = Virgin Islands).
- `stf`: Integer - State FIPS (Federal Information Processing Standards) number.
- `mag`: Integer - Magnitude on the F scale (EF beginning in 2007). Some of these values are estimated (see fc).
- `inj`: Integer - Number of injuries. When summing for state totals, use sn == 1 (see below).
- `fat`: Integer - Number of fatalities. When summing for state totals, use sn == 1 (see below).
- `loss`: Double - Estimated property loss information in dollars. Prior to 1996, values were grouped into ranges. The reported number for such years is the maximum of its range.
- `slat`: Double - Starting latitude in decimal degrees.
- `slon`: Double - Starting longitude in decimal degrees.
- `elat`: Double - Ending latitude in decimal degrees.
- `elon`: Double - Ending longitude in decimal degrees.
- `len`: Double - Length in miles.
- `wid`: Double - Width in yards.
- `ns`: Integer - Number of states affected by this tornado. 1, 2, or 3.
- `sn`: Integer - State number for this row. 1 means the row contains the entire track information for this state, 0 means there is at least one more entry for this state for this tornado (om + yr).
- `f1`: Integer - FIPS code for the 1st county.
- `f2`: Integer - FIPS code for the 2nd county.
- `f3`: Integer - FIPS code for the 3rd county.
- `f4`: Integer - FIPS code for the 4th county.
- `fc`: Logical - Was the mag column estimated?
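
The `inj` and `fat` notes say to use `sn == 1` when summing state totals. A minimal sketch of that filter on hypothetical rows, where we assume an extra `sn == 0` row for Oklahoma overlaps figures already covered by the `sn == 1` rows (the values are made up; only the filtering rule comes from the notes):

```python
import pandas as pd

# Hypothetical rows: tornado om=101 in 1990 has two Oklahoma rows (sn 0 and 1)
# and one Kansas row; we assume the sn == 0 row overlaps figures covered elsewhere
toy = pd.DataFrame({
    'yr':  [1990, 1990, 1990, 1991],
    'om':  [101, 101, 101, 5],
    'st':  ['OK', 'OK', 'KS', 'TX'],
    'sn':  [0, 1, 1, 1],
    'inj': [12, 8, 4, 3],
    'fat': [2, 1, 1, 0],
})

# Naive totals also count the overlapping sn == 0 row
naive = toy.groupby('st')[['inj', 'fat']].sum()

# State totals following the dataset notes: keep only sn == 1 rows
state_totals = toy[toy['sn'] == 1].groupby('st')[['inj', 'fat']].sum()
print(state_totals)
```

Here the naive Oklahoma injury total is 20, while the `sn == 1` total is 8.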

### Initial Data Exploration
1. check dataset shape
2. check data types
3. obtain basic statistics for quantitative/numeric columns

In [None]:
# Get the size of the dataset
n_rows, n_cols = tornadoes.shape
print(f"There are {n_cols} columns and {n_rows} rows.")

In [None]:
# Display columns in tornadoes
tornadoes.columns

In [None]:
# Display dtypes of all variables
tornadoes.dtypes

In [None]:
# Display descriptive stats for numerical columns
tornadoes.describe().round(3)

### Dataset Size: 
- Number of rows (tornado records): 68,693
- Number of columns (variables): 27
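
Because a tornado can span more than one row (see `sn`), the row count is not necessarily the number of distinct tornadoes. A sketch of counting distinct tornadoes, assuming (as the `om` and `sn` notes suggest) that the pair `yr` + `om` identifies one tornado; the rows are made up:

```python
import pandas as pd

# Made-up rows: om=101 appears twice in 1990 (one tornado, two rows)
# and again in 1991 (a different tornado that reused the number)
toy = pd.DataFrame({
    'yr': [1990, 1990, 1990, 1991],
    'om': [101, 101, 102, 101],
})

n_rows = len(toy)

# Assumes (yr, om) identifies one tornado, as the `om` and `sn` notes suggest
unique_tornadoes = toy.drop_duplicates(subset=['yr', 'om'])
n_tornadoes = len(unique_tornadoes)
per_year = unique_tornadoes.groupby('yr').size()
print(n_rows, n_tornadoes)
```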

### Data Types:

- **Numerical Data (int64 or float64)**: `om`, `yr`, `mo`, `dy`, `stf`, `mag`, `inj`, `fat`, `loss`, `slat`, `slon`, `elat`, `elon`, `len`, `wid`, `ns`, `sn`, `f1`, `f2`, `f3`, `f4`
- **Categorical Data (object)**: `date`, `time`, `tz`, `datetime_utc`, `st`
- **Boolean Data (bool)**: `fc`

### Basic Statistics:
For the numerical columns:
- `om`: Ranges from 1 to 622,080.
- `yr`: Ranges from 1950 to 2022.
- `mo`: Ranges from 1 to 12.
- `dy`: Ranges from 1 to 31.
- `stf`: Ranges from 1 to 78.
- `mag`: Ranges from 0 to 5.
- `inj`: Ranges from 0 to 1,740.
- `fat`: Ranges from 0 to 158.
- `loss`: Ranges from $50 to $2,800,100,000.
- `slat`: Ranges from 17.7212 to 61.02.
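- `slon`: Range not listed here; see the `describe()` output above.
- `elat`: Has a minimum value of 0, which likely marks a missing ending coordinate rather than a real latitude.
- `elon`: Range not listed here; see the `describe()` output above.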
- `len`: Ranges from 0 to 234.7 miles.
- `wid`: Ranges from 0 to 4,576 yards.
- `ns`: Ranges from 1 to 3.
- `sn`: Ranges from 0 to 1.
- `f1`: Has a maximum value of 810.
- `f2`: Has a maximum value of 820.
- `f3`: Has a maximum value of 710.
- `f4`: Has a maximum value of 507.

# Data Cleansing: 
1. identify **missing values** and decide whether to impute, fill, or drop.

2. check for and remove **duplicates**.

3. ensure that each column is of the correct **data type**, and convert if not.

4. look for **outliers** using statistical methods or visualization.  

5. **standardization**: if necessary

In [None]:
# Check for missing values
missing_counts = tornadoes.isnull().sum()

# Keep only columns that have at least one missing value
missing_counts = missing_counts[missing_counts > 0]

# Build a table of missing counts and percentages for each column
missing_values = pd.DataFrame({
    'Count Missing': missing_counts,
    '% Missing': (missing_counts / tornadoes.shape[0] * 100).round(3),
})
missing_values

In [None]:
# Visualize missing values
import missingno as msno
msno.matrix(tornadoes)

columns with missing values:
- **mag**: 756 missing values, or 1.101%
- **loss**: 27170 missing values, or 39.553%

how we will handle these:
- **mag**: Drop rows with missing values, since they make up only about 1% of the data.
- **loss**: Drop this variable, since it is missing for nearly 40% of rows and our core questions don't depend on it.

In [None]:
# Drop NA values from mag.
tornadoes.dropna(subset=['mag'], inplace=True)

# Drop loss variable.
tornadoes.drop(['loss'], axis=1, inplace=True)

In [None]:
tornadoes.info()

now that missing values have been handled, we'll check for duplicates. 

In [None]:
# Check for duplicate rows
duplicate_rows = tornadoes.duplicated().sum()
print("There are " + str(duplicate_rows) + " duplicate row(s).\n")

# Get a boolean series indicating which rows are duplicates (including the original rows)
duplicate_mask_all = tornadoes.duplicated(keep=False)

# Use this mask to filter and display both the original and duplicate rows
duplicate_rows_all_df = tornadoes[duplicate_mask_all]
duplicate_rows_all_df

In [None]:
# Remove duplicate rows, if any
if duplicate_rows > 0: 
    tornadoes.drop_duplicates(inplace=True)

# Verify that the removal worked
duplicate_rows = tornadoes.duplicated().sum()
print("There are " + str(duplicate_rows) + " duplicate row(s).\n")

now let's determine if any data types need conversion.

In [None]:
# Display data types for each column
tornadoes.dtypes

we might consider converting `date`, `time`, and `datetime_utc` (the date and time normalized to UTC) from `object` to a datetime dtype, but we can wait until we know we need these columns.
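
A minimal sketch of that conversion with `pd.to_datetime`, on made-up values; the exact string formats of `date` and `datetime_utc` in the CSV are an assumption here, so check a few raw values before relying on it:

```python
import pandas as pd

# Made-up values; the string formats here are assumptions about how the CSV stores them
toy = pd.DataFrame({
    'date': ['1950-10-01', '2022-04-13'],
    'datetime_utc': ['1950-10-01T21:00:00Z', '2022-04-13T23:45:00Z'],
})

toy['date'] = pd.to_datetime(toy['date'])
# utc=True returns timezone-aware timestamps in UTC
toy['datetime_utc'] = pd.to_datetime(toy['datetime_utc'], utc=True)

# After conversion, the .dt accessor exposes parts such as the UTC hour
hours = toy['datetime_utc'].dt.hour
print(toy.dtypes)
```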

# Exploratory Data Analysis

### 1. quick feature creation to aid in analysis
**create `region` by mapping states to cultural region**  

**create `decade` by organizing years into decades**  

**create categories which classify each tornado based on levels of its `wid`, `len`**  
*> This is to get a more well-rounded idea of tornado severity rather than relying only on `mag`*  
*> current labels are `wid_level` for `wid` and `Track Length` for `len`. I need to rename these so they are easier to understand*

**create `in_alley` column to classify each tornado as either "inside" or "outside" tornado alley**

**create subset dataframes for each decade**  

**create other features as needed**


### 2. visualization
**visualize simple frequency of tornadoes over time.**  
*> this will let us see and discuss the overall trend over time.*  
*> then we can break this down further into categories of severity, using `mag` and the newly-created categorical variables `wid_level` and `Track Length`*  
*> then break the overall frequency down into inside/outside tornado alley using `in_alley` so we can see how the overall frequency is changing in these locations over time.*  

**what is the frequency of tornadoes across states and regions? what about for larger (higher mag, larger wid, longer len) tornadoes?**  
*> bar charts displaying frequencies in regions and states*  
*> like above, can do further break-down into the severity trends*  

**of which months in the year are tornadoes most frequent? what about for larger (higher mag, larger wid, longer len) tornadoes?**  
*> we could analyze the seasonality of tornadoes over time using `mo`. the hypothesis might be that "tornado season" is lengthening (tornadoes are common in more months of the year)*   
*> if we figure out how to do the above, we could then see how the further break-down looks across different severity levels*  
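
One way to frame the seasonality question, sketched on made-up rows: normalize monthly counts within each decade (so growth in the total count doesn't dominate), then count how many months hold a meaningful share. The 10% threshold and the "season length" proxy are hypothetical choices, not an established definition:

```python
import pandas as pd

# Made-up rows: month (`mo`) of each tornado in two decades
toy = pd.DataFrame({
    'decade': ['1950-1959'] * 4 + ['2010-2019'] * 6,
    'mo':     [4, 5, 5, 6, 3, 4, 5, 6, 7, 11],
})

# Share of each decade's tornadoes falling in each month (each row sums to 1)
monthly_share = pd.crosstab(toy['decade'], toy['mo'], normalize='index')

# Hypothetical "season length" proxy: months holding at least 10% of a decade's tornadoes
threshold = 0.10
season_months = (monthly_share >= threshold).sum(axis=1)
print(season_months)
```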

**other indicators**  
*> injuries, fatalities as indicators*  

**geographical analysis**  
*> what does the geographical distribution of all tornadoes look like over the decades?*  
*> what about across severity levels?*  
*> use heatmaps to highlight hotspots*  

### 3. statistical analysis
**visualize descriptive statistics of numerical columns**

### 4. relationships between variables
**determine whether correlations exist** 

**do any variables exhibit linear relationships?**    

**do we see an uptick in tornado frequency across decades?**  

**do we see an uptick in tornado intensity across decades?**  

**do we see an uptick in tornado frequency in certain regions across decades?**  

**do we see an uptick in injuries, fatalities, or property loss across decades?** *(property loss would need the `loss` column, which was dropped during cleaning)*  

**do we see any change in the mean latitudes and longitudes across decades? (this would indicate a shift in the tornado zone)**  
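
A minimal sketch of the mean-position comparison, on made-up rows using `slat`/`slon` and the string `decade` labels; since a few far-flung records (e.g. PR or AK) can pull a mean, the median is worth checking as well:

```python
import pandas as pd

# Made-up starting coordinates for two decades
toy = pd.DataFrame({
    'decade': ['1950-1959', '1950-1959', '2010-2019', '2010-2019'],
    'slat':   [35.0, 37.0, 34.0, 36.0],
    'slon':   [-98.0, -100.0, -90.0, -92.0],
})

# Mean starting latitude and longitude for each decade
centroids = toy.groupby('decade')[['slat', 'slon']].mean()

# Change from the previous decade, in degrees (positive slon change = eastward shift)
shift = centroids.diff()
print(shift)
```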

**look for nonlinear relationships**  
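
For the correlation and nonlinear-relationship questions, comparing Pearson and Spearman coefficients is a simple first step: Pearson captures linear association, while Spearman (Pearson on ranks) captures any monotonic one. A sketch on made-up rows for `mag`, `len`, and `wid`:

```python
import pandas as pd

# Made-up rows with a monotonic but strongly curved mag-len relationship
toy = pd.DataFrame({
    'mag': [0, 1, 1, 2, 3, 4],
    'len': [0.5, 2.0, 3.0, 8.0, 20.0, 60.0],
    'wid': [20, 50, 40, 200, 400, 900],
})

# Pearson measures linear association; Spearman works on ranks,
# so it also captures monotonic nonlinear relationships
pearson = toy.corr(method='pearson')
spearman = toy.corr(method='spearman')
print(pearson.round(2))
print(spearman.round(2))
```

When Spearman is clearly higher than Pearson, as for `mag` vs `len` here, the relationship is likely monotonic but not linear.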

### 1. quick feature creation
**create `region`:**  
below we define a dictionary mapping the 50 US states plus DC to regions, then map each postal code in `st` to its region. codes not in the dictionary (such as PR and VI) become NaN.  

In [None]:
# Dictionary to map the 50 US states to regions
states_and_regions = {
    'ME': 'New England', 'NH': 'New England', 'VT': 'New England', 'MA': 'New England', 
    'RI': 'New England', 'CT': 'New England',
    
    'NY': 'Mid-Atlantic', 'NJ': 'Mid-Atlantic', 'PA': 'Mid-Atlantic', 
    'DE': 'Mid-Atlantic', 'MD': 'Mid-Atlantic', 'DC': 'Mid-Atlantic', 
    
    'VA': 'Upper South', 'WV': 'Upper South', 'KY': 'Upper South', 'TN': 'Upper South', 
    'NC': 'Upper South', 'AR': 'Upper South', 'MO': 'Upper South',
    
    'OK': 'Deep South', 'SC': 'Deep South', 'GA': 'Deep South', 'AL': 'Deep South', 
    'MS': 'Deep South', 'LA': 'Deep South', 'FL': 'Deep South',
    
    'TX': 'Southwest', 'AZ': 'Southwest', 'NM': 'Southwest', 
    
    'CA': 'Pacific', 'NV': 'Pacific', 'OR': 'Pacific', 'WA': 'Pacific', 'HI': 'Pacific', 
    
    'AK': 'Pacific Northwest',
    
    'OH': 'Midwest', 'MI': 'Midwest', 'IN': 'Midwest', 'IL': 'Midwest', 'WI': 'Midwest',
    'MN': 'Midwest', 'IA': 'Midwest', 'NE': 'Midwest', 'SD': 'Midwest', 'ND': 'Midwest', 
    'KS': 'Midwest',
    
    'MT': 'Mountain West', 'ID': 'Mountain West', 'WY': 'Mountain West', 
    'CO': 'Mountain West', 'UT': 'Mountain West'
}

# Map the states to a region in new column
tornadoes['region'] = tornadoes['st'].map(states_and_regions)
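
`map` returns NaN for any postal code not in the dictionary, and the `st` notes allow for PR and VI, neither of which is in `states_and_regions`. A quick check of which codes were left unmapped, sketched on made-up rows with a three-entry excerpt of the dictionary:

```python
import pandas as pd

# Excerpt of `states_and_regions` (three of its entries)
regions = {'OK': 'Deep South', 'KS': 'Midwest', 'TX': 'Southwest'}

toy = pd.DataFrame({'st': ['OK', 'KS', 'PR', 'TX', 'VI', 'PR']})
toy['region'] = toy['st'].map(regions)

# Postal codes missing from the dictionary come back as NaN
unmapped = toy.loc[toy['region'].isna(), 'st'].value_counts()
print(unmapped)
```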

**create `decade`:**  
this maps each `yr` to a decade label, stored as a string in a new column called `decade`, in the format "1950-1959".

In [None]:
# classifies each row's `yr` into new column of objects, `decade`
tornadoes['decade'] = tornadoes['yr'].apply(lambda x: f"{x//10*10}-{x//10*10+9}")

**create `wid_level`**  
this classifies each tornado's width (`wid`, a continuous variable) into one of three classes.  
*get source for class info.*  

In [None]:
# Create new column that classifies each tornado by width
def wid_categories(wid):
    if wid < 50:
        return 'Narrow (< 50 Yards)'
    elif wid >= 50 and wid < 500:
        return 'Medium (50 - 500 Yards)'
    elif wid >= 500:
        return 'Large (>= 500 Yards)'

tornadoes['wid_level'] = tornadoes['wid'].apply(wid_categories)
tornadoes['wid_level']
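
The same three width classes can be assigned without a Python-level function using `pd.cut`. A sketch on made-up widths, reusing the labels above:

```python
import numpy as np
import pandas as pd

wid = pd.Series([10, 50, 499, 500, 2000])  # made-up widths in yards

labels = ['Narrow (< 50 Yards)', 'Medium (50 - 500 Yards)', 'Large (>= 500 Yards)']

# right=False closes each bin on the left: [-inf, 50), [50, 500), [500, inf),
# matching the < 50 / 50-500 / >= 500 rules in wid_categories
wid_level = pd.cut(wid, bins=[-np.inf, 50, 500, np.inf], labels=labels, right=False)
print(wid_level.tolist())
```

The result is an ordered categorical, so the classes keep their Narrow, Medium, Large order; bins of `[-np.inf, 10, 50, np.inf]` would reproduce the track-length classes the same way.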

**create `wid_category`**  
*same three classes as `wid_level`, but with shorter labels; may be redundant and could be deleted.*  

In [None]:
# Create new column that classifies each tornado by width
def wid_categories2(wid):
    if wid < 50:
        return 'Narrow'
    elif wid >= 50 and wid < 500:
        return 'Medium'
    elif wid >= 500:
        return 'Large'

tornadoes['wid_category'] = tornadoes['wid'].apply(wid_categories2)
tornadoes['wid_category']

**create `Track Length`**  
iterate thru `len` and classify each tornado into one of three classes based on its track length (the distance the tornado travels while in contact with the ground).  
*get source for class info.*  
*also consider renaming this to match `wid_level` (e.g. `len_level`), or vice versa.*  

In [None]:
# Create new column that classifies each tornado by length
def len_categories(length):
    # `length` avoids shadowing Python's built-in len()
    if length < 10:
        return 'Short-Track (< 10 Miles)'
    elif length < 50:
        return 'Medium-Track (10 - 50 Miles)'
    elif length >= 50:
        return 'Long-Track (>= 50 Miles)'

tornadoes['Track Length'] = tornadoes['len'].apply(len_categories)
tornadoes['Track Length']

**create `in_alley`**  
using longitudes -105 to -95 (inclusive) as a rough boundary for 'tornado alley', this classifies each tornado as either 'inside' or 'outside' tornado alley based on its `slon` (starting longitude). note that this is a longitude-only band and ignores latitude.  
*get source for this info.*  

In [None]:
# Create new column that classifies each tornado by its location (inside tornado alley, outside tornado alley)
def inside_outside(slon):
    if slon >= -105 and slon <= -95:
        return 'Inside Tornado Alley'
    else:
        return 'Outside Tornado Alley'

tornadoes['in_alley'] = tornadoes['slon'].apply(inside_outside)
tornadoes['in_alley']
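
A vectorized sketch of the same classification with `Series.between` and `np.where`, on made-up longitudes; it should give the same labels as `inside_outside`, including for missing values:

```python
import numpy as np
import pandas as pd

slon = pd.Series([-110.0, -105.0, -100.0, -95.0, -90.0, np.nan])  # made-up longitudes

# between() includes both endpoints by default, matching -105 <= slon <= -95;
# NaN compares as False, so missing longitudes fall "outside", as with inside_outside()
inside = slon.between(-105, -95)
in_alley = pd.Series(
    np.where(inside, 'Inside Tornado Alley', 'Outside Tornado Alley'),
    index=slon.index,
)
print(in_alley.tolist())
```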

**create subset dfs, `tornadoes_1950s` thru `tornadoes_2010s`**  
*might use for examining seasonality and/or geographical representations*  

In [None]:
# Create subsets of tornadoes for each decade
# (`decade` holds string labels such as "1950-1959", so compare against those strings)
tornadoes_1950s = tornadoes.query("decade == '1950-1959'")
tornadoes_1960s = tornadoes.query("decade == '1960-1969'")
tornadoes_1970s = tornadoes.query("decade == '1970-1979'")
tornadoes_1980s = tornadoes.query("decade == '1980-1989'")
tornadoes_1990s = tornadoes.query("decade == '1990-1999'")
tornadoes_2000s = tornadoes.query("decade == '2000-2009'")
tornadoes_2010s = tornadoes.query("decade == '2010-2019'")
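
Since `decade` is built with an f-string, it holds labels like "1950-1959", not integers. A sketch on made-up years showing why a numeric comparison finds nothing, and an alternative that builds one subset per decade (including the partial 2020-2022 decade) with `groupby`:

```python
import pandas as pd

# Made-up years, labeled the same way as the `decade` feature
toy = pd.DataFrame({'yr': [1951, 1957, 1962, 2021]})
start = toy['yr'] // 10 * 10
toy['decade'] = start.astype(str) + '-' + (start + 9).astype(str)

# `decade` holds strings, so comparing it with the integer 1950 matches nothing
print((toy['decade'] == 1950).any())  # False

# One subset per decade label, including the partial 2020s
by_decade = {label: group for label, group in toy.groupby('decade')}
print(sorted(by_decade))
```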

### 2. Visualization
**Tornado frequencies over time**  
*Below, we can see that the overall count of all tornadoes is increasing over time.*  

*Then, we break this down into tornadoes inside and outside tornado alley. Tornadoes appear to be increasing in frequency outside tornado alley, which suggests that the area we consider "tornado alley" is either shifting or widening (tornadoes are spilling out).*  

*Using the width classes, we notice that tornadoes of different widths appear to be increasing at different rates. We see an increased frequency of medium- and large-width tornadoes, while narrow ones remain the same or decrease. This might suggest that, overall, tornadoes are getting bigger.*  

*We do the same with track lengths: short-track and medium-track tornadoes seem to be increasing in frequency, but long-track ones don't seem to be changing much.*  

**all tornadoes:**

In [None]:
# Visualize simple frequencies of all tornadoes over time
import matplotlib.pyplot as plt
import seaborn as sns

# Create figure
sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', data=tornadoes, color='black', ax=ax, bins=12)
ax.set_title('Tornado Frequency over time (1950-2022)', weight='bold')
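
The histogram uses 12 bins over 73 years, so each bar spans roughly six years. A sketch of exact annual counts with a least-squares trend line, on made-up years; note that a positive slope alone doesn't separate a real change from changes in how tornadoes are detected and reported:

```python
import numpy as np
import pandas as pd

# Made-up event years: 2, 3, 3 and 5 tornadoes in 1950-1953
toy = pd.DataFrame({'yr': [1950] * 2 + [1951] * 3 + [1952] * 3 + [1953] * 5})

# Number of tornadoes in each year, in year order
annual = toy['yr'].value_counts().sort_index()

# Least-squares line through the annual counts (years centered for numerical stability);
# the slope is the average change in tornadoes per year
years = annual.index.to_numpy() - annual.index.min()
slope, intercept = np.polyfit(years, annual.to_numpy(), deg=1)
print(round(slope, 3))
```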

**by location (in/out of tornado alley):**

In [None]:
sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', data=tornadoes, hue='in_alley', ax=ax, bins=12)
ax.set_title('Tornado frequencies (inside, outside tornado alley) over Time (1950-2022)', weight='bold')

**by width:**

In [None]:
# Histogram of the number of different sized tornadoes over time 

# Filter tornadoes by width 
high_wid = tornadoes[tornadoes['wid'] >= 500]
med_wid = tornadoes[(tornadoes['wid'] >= 50) & (tornadoes['wid'] < 500)]
low_wid = tornadoes[tornadoes['wid'] < 50]

sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', data=low_wid, color="Blue", ax=ax, bins=12)
sns.histplot(x='yr', data=med_wid, color="Yellow", ax=ax, bins=12)
sns.histplot(x='yr', data=high_wid, color="Green", ax=ax, bins=12)
ax.set_title('Tornado Frequencies over Time (1950-2022) and widths', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')

legend_labels=['Narrow (< 50 Yards)', 'Medium (50 - 500 Yards)', 'Large (>= 500 Yards)']
legend_colors=['Blue', 'Yellow', 'Green']
legend_handles = [plt.Line2D([0], [0], marker='s', color='White', label=label, 
                             markersize=11, markerfacecolor=color, linestyle='None') 
                  for label, color in zip(legend_labels, legend_colors)]
ax.legend(handles=legend_handles, title='Tornado Width')
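
Raw counts rise in every width class as the total number of records grows, so the shares within each decade are a cleaner test of whether tornadoes are getting bigger. A sketch with `pd.crosstab(..., normalize='index')` on made-up rows using the `wid_category` labels:

```python
import pandas as pd

# Made-up rows using the short labels from `wid_category`
toy = pd.DataFrame({
    'decade':       ['1950-1959'] * 4 + ['2010-2019'] * 4,
    'wid_category': ['Narrow', 'Narrow', 'Narrow', 'Medium',
                     'Narrow', 'Medium', 'Medium', 'Large'],
})

# Share of each width class within a decade (each row sums to 1),
# which separates a change in the size mix from growth in the total count
shares = pd.crosstab(toy['decade'], toy['wid_category'], normalize='index')
print(shares)
```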

**by track length:**  
*two representations because I was still figuring out how to do it.*  

In [None]:
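from matplotlib.patches import Patch

# Split tornadoes into short-, medium-, and long-track groups
high_len = tornadoes[tornadoes['len'] >= 50]
med_len = tornadoes[(tornadoes['len'] >= 10) & (tornadoes['len'] < 50)]
low_len = tornadoes[tornadoes['len'] < 10]

sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', data=low_len, color="Blue", ax=ax, bins=12)
sns.histplot(x='yr', data=med_len, color="Brown", ax=ax, bins=12)
sns.histplot(x='yr', data=high_len, color="Green", ax=ax, bins=12)
ax.set_title('Tornado frequencies over time (1950-2022) and track length', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')

# Legend matching the colors used above
legend_labels = ['Short-Track (< 10 Miles)', 'Medium-Track (10 - 50 Miles)', 'Long-Track (>= 50 Miles)']
legend_colors = ['Blue', 'Brown', 'Green']
ax.legend(handles=[Patch(facecolor=c, label=l) for l, c in zip(legend_labels, legend_colors)],
          title='Track Length')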

In [None]:
# just another way to visualize track length

sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', hue='Track Length', data=tornadoes, palette='deep', bins=12, ax=ax)
ax.set_title('Tornado frequencies over time (1950-2022) and track length', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')
plt.show()

**by magnitude:**  
*several representations because I was still figuring out how to do it.*  

In [None]:
# Frequency of tornadoes by magnitude over time

sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', hue='mag', data=tornadoes, palette='deep', bins=12, alpha=0.7)
ax.set_title('Tornado frequencies over time (1950-2022) and magnitude', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')
plt.show()

In [None]:
# Magnitudes over time again
sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', hue='mag', multiple='layer', data=tornadoes, palette="deep")
ax.set_title('Frequency of all tornadoes over time (1950-2022) by magnitude', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')

In [None]:
# Magnitudes over time (another)
sns.set_style('darkgrid')
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.histplot(x='yr', hue='mag', multiple='stack', data=tornadoes, palette="Pastel2")
ax.set_title('Frequency of tornadoes over time (1950-2022) by magnitude', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of tornadoes', weight='bold')

### 3. statistical analysis
**starting to visualize descriptive statistics of numerical columns**  
*probably could use some more basic univariate visualizations and discussion*  

**various bivariate/multivariate exploratory plots are below**  
*needs some cleanup and discussion*  

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.boxplot(x='decade', y='wid', data=high_wid, palette='Pastel2')
ax.set_title('Large-Width (>= 500 Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax.set_xlabel('Decade', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.boxplot(x='decade', y='wid', data=med_wid, palette='Pastel2')
ax.set_title('Medium-Width (50 - 500 Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax.set_xlabel('Decade', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.boxplot(x='decade', y='wid', hue='in_alley', data=high_wid, palette='Pastel2')
ax.set_title('High-Width (500+ Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax.set_xlabel('Decade', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.boxplot(x='decade', y='wid', hue='in_alley', data=high_len, palette='Pastel2')
ax.set_title('Widths of Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax.set_xlabel('Decade', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(3, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()

# Plot for short-track only
sns.boxplot(x='decade', y='wid', hue='in_alley', data=low_len, palette='Pastel2', ax=ax[0])
ax[0].set_title('Widths of Short-Track (< 10 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

# Plot for med-track only
sns.boxplot(x='decade', y='wid', hue='in_alley', data=med_len, palette='Pastel2', ax=ax[1])
ax[1].set_title('Widths of Medium-Track (10-50 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

# Plot for long-track only
sns.boxplot(x='decade', y='wid', hue='in_alley', data=high_len, palette='Pastel2', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

**what is the frequency of tornadoes across states and regions? what about for larger (higher mag, larger wid, longer len) tornadoes?**
- The top 10 states of all time, as well as the regions, should be charted here.   
*As expected, southern and midwestern regions and their states are the most common*  
*Visualize this over time*  
*Put total frequency and in/out tornado alley stuff here*
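
A minimal sketch of the top-10-states and per-region counts on made-up rows; on the full data, you may want to filter to `sn == 1` first (per the dataset notes) so that a multi-row tornado isn't counted twice in one state:

```python
import pandas as pd

# Made-up rows; `region` values follow `states_and_regions`
toy = pd.DataFrame({
    'st':     ['TX', 'TX', 'KS', 'OK', 'TX', 'KS', 'FL'],
    'region': ['Southwest', 'Southwest', 'Midwest', 'Deep South',
               'Southwest', 'Midwest', 'Deep South'],
})

# Ten most frequent states, and counts for every region
top_states = toy['st'].value_counts().head(10)
region_counts = toy['region'].value_counts()
print(top_states)
print(region_counts)
```

Calling `.plot.barh()` on either Series would give the bar charts planned earlier.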

In [None]:
# Other indicators across space and time
f, ax = plt.subplots(2, 1, figsize=(12, 12))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()

# Plot for injuries
sns.boxplot(x='decade', y='inj', hue='in_alley', data=tornadoes, palette='Pastel2', ax=ax[0])
ax[0].set_title('Injuries per Tornado by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Injuries', weight='bold')
ax[0].set_ylim(0,5)

# Plot for fatalities
sns.boxplot(x='decade', y='fat', hue='in_alley', data=tornadoes, palette='Pastel2', ax=ax[1])
ax[1].set_title('Fatalities per Tornado by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Fatalities', weight='bold')
ax[1].set_ylim(0,20)

In [None]:
# Injuries by track length
f, ax = plt.subplots(3, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()

# Plot for short track only
sns.boxplot(x='decade', y='inj', hue='in_alley', data=low_len, palette='Pastel2', ax=ax[0])
ax[0].set_title('Injuries due to Short-Track (< 10 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Injuries', weight='bold')

# Plot for med track only
sns.boxplot(x='decade', y='inj', hue='in_alley', data=med_len, palette='Pastel2', ax=ax[1])
ax[1].set_title('Injuries due to Medium-Track (10-50 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Injuries', weight='bold')

# Plot for long track only
sns.boxplot(x='decade', y='inj', hue='in_alley', data=high_len, palette='Pastel2', ax=ax[2])
ax[2].set_title('Injuries due to Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Injuries', weight='bold')

In [None]:
# Fatalities by track length
f, ax = plt.subplots(3, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.boxplot(x='decade', y='fat', hue='in_alley', data=low_len, palette='Pastel2', ax=ax[0])
ax[0].set_title('Fatalities due to Short-Track (< 10 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Fatalities', weight='bold')

sns.boxplot(x='decade', y='fat', hue='in_alley', data=med_len, palette='Pastel2', ax=ax[1])
ax[1].set_title('Fatalities due to Medium-Track (10-50 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Fatalities', weight='bold')

sns.boxplot(x='decade', y='fat', hue='in_alley', data=high_len, palette='Pastel2', ax=ax[2])
ax[2].set_title('Fatalities due to Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Fatalities', weight='bold')

In [None]:
# widths by track length
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.boxplot(x='decade', y='wid', hue='in_alley', data=low_len, palette='deep', ax=ax[0])
ax[0].set_title('Widths of Short-Track (< 10 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.boxplot(x='decade', y='wid', hue='in_alley', data=med_len, palette='deep', ax=ax[1])
ax[1].set_title('Widths of Medium-Track (10-50 Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.boxplot(x='decade', y='wid', hue='in_alley', data=high_len, palette='deep', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
# widths within each width class
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.boxplot(x='decade', y='wid', hue='in_alley', data=low_wid, palette='deep', ax=ax[0])
ax[0].set_title('Low-Width (< 50 Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.boxplot(x='decade', y='wid', hue='in_alley', data=med_wid, palette='deep', ax=ax[1])
ax[1].set_title('Medium-Width (50 - 500 Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.boxplot(x='decade', y='wid', hue='in_alley', data=high_wid, palette='deep', ax=ax[2])
ax[2].set_title('High-Width (500+ Yard) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
# widths by track length (line charts)
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.lineplot(x='yr', y='wid', hue='in_alley', data=low_len, palette='muted', ax=ax[0])
ax[0].set_title('Widths of Short-Track (< 10 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[0].set_xlabel('Year', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.lineplot(x='yr', y='wid', hue='in_alley', data=med_len, palette='muted', ax=ax[1])
ax[1].set_title('Widths of Medium-Track (10-50 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[1].set_xlabel('Year', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.lineplot(x='yr', y='wid', hue='in_alley', data=high_len, palette='muted', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[2].set_xlabel('Year', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
# width by year and location
f, ax = plt.subplots(figsize=(12, 6))
sns.lineplot(x='yr', y='wid', data=tornadoes, hue='in_alley')

In [None]:
# scatter of long-track tornado widths by year
f, ax = plt.subplots(figsize=(12, 6))
sns.despine()
sns.scatterplot(x='yr', y='wid', hue='in_alley', data=high_len, palette='deep', ax=ax)
ax.set_title('Widths of Long-Track (50+ Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')

In [None]:
# widths by track length (scatter)
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.scatterplot(x='yr', y='wid', hue='in_alley', data=low_len, palette='deep', ax=ax[0])
ax[0].set_title('Widths of Short-Track (< 10 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[0].set_xlabel('Year', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', hue='in_alley', data=med_len, palette='deep', ax=ax[1])
ax[1].set_title('Widths of Medium-Track (10-50 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[1].set_xlabel('Year', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', hue='in_alley', data=high_len, palette='deep', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[2].set_xlabel('Year', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.scatterplot(x='yr', y='len', data=low_wid, color='Green', ax=ax[0])
ax[0].set_title('Track Lengths of Narrow (< 50 Yard) Tornadoes by Year (1950-2022)', weight='bold')
ax[0].set_xlabel('Year', weight='bold')
ax[0].set_ylabel('Track Length (Miles)', weight='bold')

sns.scatterplot(x='yr', y='len', data=med_wid, color='Orange', ax=ax[1])
ax[1].set_title('Track Lengths of Medium-Width (50-500 Yard) Tornadoes by Year (1950-2022)', weight='bold')
ax[1].set_xlabel('Year', weight='bold')
ax[1].set_ylabel('Track Length (Miles)', weight='bold')

sns.scatterplot(x='yr', y='len', data=high_wid, color='Red', ax=ax[2])
ax[2].set_title('Track Lengths of Wide (500+ Yard) Tornadoes by Year (1950-2022)', weight='bold')
ax[2].set_xlabel('Year', weight='bold')
ax[2].set_ylabel('Track Length (Miles)', weight='bold')

In [None]:
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.scatterplot(x='yr', y='wid', data=low_len, color='green', ax=ax[0])
ax[0].set_title('Widths of Short-Track (< 10 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[0].set_xlabel('Year', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', data=med_len, color='orange', ax=ax[1])
ax[1].set_title('Widths of Medium-Track (10-50 Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[1].set_xlabel('Year', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', data=high_len, color='red', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Year (1950-2022)', weight='bold')
ax[2].set_xlabel('Year', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(figsize=(12, 8))
sns.set_style('ticks')
sns.despine()
# Dark background so the two overlaid series stand out
ax.set_facecolor('black')
sns.scatterplot(x='yr', y='wid', data=med_len, color='blue', alpha=0.4, label='Medium-Track', ax=ax)
sns.scatterplot(x='yr', y='wid', data=high_len, color='red', marker='D', s=70, label='Long-Track (50+ Mile)', ax=ax)
ax.set_title('Widths of Medium- and Long-Track Tornadoes by Decade (1950-2022)', weight='bold')
ax.set_xlabel('Decade', weight='bold')
ax.set_ylabel('Width (Yards)', weight='bold')
ax.legend()

In [None]:
f, ax = plt.subplots(3, 1, figsize=(12, 8))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine()
sns.scatterplot(x='yr', y='wid', hue='in_alley', data=low_len, palette='deep', ax=ax[0])
ax[0].set_title('Widths of Short-Track Tornadoes by Decade (1950-2022)', weight='bold')
ax[0].set_xlabel('Decade', weight='bold')
ax[0].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', hue='in_alley', data=med_len, palette='deep', ax=ax[1])
ax[1].set_title('Widths of Medium-Track Tornadoes by Decade (1950-2022)', weight='bold')
ax[1].set_xlabel('Decade', weight='bold')
ax[1].set_ylabel('Width (Yards)', weight='bold')

sns.scatterplot(x='yr', y='wid', hue='in_alley', data=high_len, palette='deep', ax=ax[2])
ax[2].set_title('Widths of Long-Track (50+ Mile) Tornadoes by Decade (1950-2022)', weight='bold')
ax[2].set_xlabel('Decade', weight='bold')
ax[2].set_ylabel('Width (Yards)', weight='bold')

In [None]:
f, ax = plt.subplots(2, 1, figsize=(10, 15))
# Box plots of width and track length for each magnitude (F/EF) rating
sns.boxplot(x='mag', y='wid', data=tornadoes, ax=ax[0])
ax[0].set_title('Tornado Widths by Magnitude')
ax[0].set_ylabel('Width (yards)')

sns.boxplot(x='mag', y='len', data=tornadoes, ax=ax[1])
ax[1].set_title('Tornado Track Lengths by Magnitude')
ax[1].set_ylabel('Path (miles)')
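
Box plots show the spread, but the medians are hard to read when a few extreme tornadoes stretch the y-axis. A small numeric table alongside the plots can help. This sketch assumes unrated tornadoes have a missing `mag` and drops them; `summarize_by_mag` is a hypothetical helper.

```python
import pandas as pd


def summarize_by_mag(df: pd.DataFrame) -> pd.DataFrame:
    """Median width, median track length, and tornado count for each magnitude."""
    return (
        df.dropna(subset=['mag'])        # unrated tornadoes assumed to have NaN mag
          .groupby('mag')
          .agg(median_wid=('wid', 'median'),
               median_len=('len', 'median'),
               n=('wid', 'size'))        # row count per magnitude
    )
```

For example, `summarize_by_mag(tornadoes)` would give one row per magnitude that can be read next to the two box plots.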

In [None]:
f, ax = plt.subplots(2, 1, figsize=(10, 20))
# Box plots of injuries and fatalities per tornado for each decade
sns.boxplot(x=tornadoes['decade'], y=tornadoes['inj'], ax=ax[0])
ax[0].set_title('Injuries by Decade')
ax[0].set_ylabel('Injuries')

sns.boxplot(x=tornadoes['decade'], y=tornadoes['fat'], ax=ax[1])
ax[1].set_title('Fatalities by Decade')
ax[1].set_ylabel('Fatalities')
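
Most tornadoes cause no injuries or deaths, so per-tornado box plots of `inj` and `fat` tend to collapse onto zero with a scatter of outliers. One alternative summary is the share of tornadoes in each decade with at least one injury or fatality. The sketch below assumes the notebook's `decade` column is the year floored to the decade, i.e. `(yr // 10) * 10`; the helper name is hypothetical.

```python
import pandas as pd


def casualty_share_by_decade(df: pd.DataFrame) -> pd.DataFrame:
    """Per decade: tornado count and share with at least one fatality / injury."""
    # Assumed decade definition: 1950-1959 -> 1950, 2020-2022 -> 2020
    out = df.assign(decade=(df['yr'] // 10) * 10)
    return (
        out.groupby('decade')
           .agg(tornadoes=('fat', 'size'),
                share_with_fatality=('fat', lambda s: (s > 0).mean()),
                share_with_injury=('inj', lambda s: (s > 0).mean()))
    )
```

Because the share is normalized by the number of tornadoes, it separates "are tornadoes becoming deadlier" from "are there more tornadoes".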

In [None]:
# Count tornadoes in each width category per year
low_wid_counts = low_wid.groupby('yr').size().reset_index(name='count')
med_wid_counts = med_wid.groupby('yr').size().reset_index(name='count')
high_wid_counts = high_wid.groupby('yr').size().reset_index(name='count')

In [None]:
# Count tornadoes in each width category per year, split by inside/outside Tornado Alley
low_wid_counts2 = low_wid.groupby(['yr', 'in_alley']).size().reset_index(name='count')
med_wid_counts2 = med_wid.groupby(['yr', 'in_alley']).size().reset_index(name='count')
high_wid_counts2 = high_wid.groupby(['yr', 'in_alley']).size().reset_index(name='count')

low_wid_counts2 = low_wid_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()
med_wid_counts2 = med_wid_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()
high_wid_counts2 = high_wid_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()

low_wid_counts2.columns.name = None 
med_wid_counts2.columns.name = None 
high_wid_counts2.columns.name = None 
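
One caveat with the `groupby(...).size()` plus `pivot` pattern: a year with no tornadoes in a given class and region produces no row, which `pivot` turns into `NaN` (or, if the year is missing in both regions, drops it entirely), so line plots show gaps rather than zeros. A minimal sketch using `pd.crosstab`, which fills missing combinations with 0, followed by a `reindex` over the full year range. It assumes `in_alley` holds the labels 'Inside Tornado Alley' and 'Outside Tornado Alley', as the plotting cells in this notebook imply; `yearly_counts_by_alley` is a hypothetical helper.

```python
import pandas as pd


def yearly_counts_by_alley(df: pd.DataFrame) -> pd.DataFrame:
    """Tornado counts per year (rows) and in_alley label (columns), zeros included."""
    # crosstab fills year/label combinations with no tornadoes with 0
    counts = pd.crosstab(df['yr'], df['in_alley'])
    # Years with no tornadoes at all in this subset are absent; add them as zeros
    all_years = range(int(df['yr'].min()), int(df['yr'].max()) + 1)
    counts = counts.reindex(all_years, fill_value=0)
    counts.columns.name = None
    return counts.rename_axis('yr').reset_index()
```

For example, `yearly_counts_by_alley(low_wid)` would produce the same column layout as `low_wid_counts2`, with explicit zeros instead of missing values.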

In [None]:
# Count tornadoes in each track-length category per year
low_len_counts = low_len.groupby('yr').size().reset_index(name='count')
med_len_counts = med_len.groupby('yr').size().reset_index(name='count')
high_len_counts = high_len.groupby('yr').size().reset_index(name='count')
high_len_counts

In [None]:
# Count tornadoes in each track-length category per year, split by inside/outside Tornado Alley
low_len_counts2 = low_len.groupby(['yr', 'in_alley']).size().reset_index(name='count')
med_len_counts2 = med_len.groupby(['yr', 'in_alley']).size().reset_index(name='count')
high_len_counts2 = high_len.groupby(['yr', 'in_alley']).size().reset_index(name='count')

low_len_counts2 = low_len_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()
med_len_counts2 = med_len_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()
high_len_counts2 = high_len_counts2.pivot(index='yr', columns='in_alley', values='count').reset_index()

low_len_counts2.columns.name = None 
med_len_counts2.columns.name = None 
high_len_counts2.columns.name = None 
low_len_counts2

In [None]:
# Plotting
f, ax = plt.subplots(3, 1, figsize=(15, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine(f)
sns.lineplot(x='yr', y='Inside Tornado Alley', data=low_wid_counts2, ax=ax[0], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=low_wid_counts2, ax=ax[0], color='purple', label='Outside Tornado Alley')
ax[0].set_title('Number of Narrow Tornadoes Over Time')
ax[0].set_ylabel('Frequency')
ax[0].legend()

sns.lineplot(x='yr', y='Inside Tornado Alley', data=med_wid_counts2, ax=ax[1], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=med_wid_counts2, ax=ax[1], color='purple', label='Outside Tornado Alley')
ax[1].set_title('Number of Medium-Width Tornadoes Over Time')
ax[1].set_ylabel('Frequency')
ax[1].legend()

sns.lineplot(x='yr', y='Inside Tornado Alley', data=high_wid_counts2, ax=ax[2], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=high_wid_counts2, ax=ax[2], color='purple', label='Outside Tornado Alley')
ax[2].set_title('Number of Wide Tornadoes Over Time')
ax[2].set_ylabel('Frequency')
ax[2].legend()

In [None]:
# Plotting
f, ax = plt.subplots(3, 1, figsize=(15, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine(f)
sns.lineplot(x='yr', y='Inside Tornado Alley', data=low_len_counts2, ax=ax[0], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=low_len_counts2, ax=ax[0], color='purple', label='Outside Tornado Alley')
ax[0].set_title('Number of Short-Track Tornadoes Over Time')
ax[0].set_ylabel('Frequency')
ax[0].legend()

sns.lineplot(x='yr', y='Inside Tornado Alley', data=med_len_counts2, ax=ax[1], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=med_len_counts2, ax=ax[1], color='purple', label='Outside Tornado Alley')
ax[1].set_title('Number of Medium-Track Tornadoes Over Time')
ax[1].set_ylabel('Frequency')
ax[1].legend()

sns.lineplot(x='yr', y='Inside Tornado Alley', data=high_len_counts2, ax=ax[2], color='green', label='Inside Tornado Alley')
sns.lineplot(x='yr', y='Outside Tornado Alley', data=high_len_counts2, ax=ax[2], color='purple', label='Outside Tornado Alley')
ax[2].set_title('Number of Long-Track Tornadoes Over Time')
ax[2].set_ylabel('Frequency')
ax[2].legend()

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(15, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine(f)
sns.lineplot(x='yr', y='count', data=low_wid_counts, ax=ax, color='green', label='Narrow')
sns.lineplot(x='yr', y='count', data=med_wid_counts, ax=ax, color='orange', label='Medium-Width')
sns.lineplot(x='yr', y='count', data=high_wid_counts, ax=ax, color='red', label='Wide')
ax.set_title('Number of Tornadoes Over Time (by Width)')
ax.set_ylabel('Frequency')
ax.legend()

In [None]:
# Plotting
f, ax = plt.subplots(3, 1, figsize=(15, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine(f)
sns.lineplot(x='yr', y='count', data=low_len_counts, ax=ax[0], color='orange')
ax[0].set_title('Number of Short-Track Tornadoes Over Time')
ax[0].set_ylabel('Number of Short-Track Tornadoes')

sns.lineplot(x='yr', y='count', data=med_len_counts, ax=ax[1], color='orange')
ax[1].set_title('Number of Medium-Track Tornadoes Over Time')
ax[1].set_ylabel('Number of Medium-Track Tornadoes')

sns.lineplot(x='yr', y='count', data=high_len_counts, ax=ax[2], color='orange')
ax[2].set_title('Number of Long-Track Tornadoes Over Time')
ax[2].set_ylabel('Number of Long-Track Tornadoes')

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(15, 10))
plt.subplots_adjust(hspace=0.5, wspace=0.5)
sns.despine(f)
sns.lineplot(x='yr', y='count', data=low_len_counts, ax=ax, color='green', label='Short-Track')
sns.lineplot(x='yr', y='count', data=med_len_counts, ax=ax, color='orange', label='Medium-Track')
sns.lineplot(x='yr', y='count', data=high_len_counts, ax=ax, color='red', label='Long-Track (50+ Mile)')
ax.set_title('Number of Tornadoes Over Time (by Track Length)')
ax.legend()
ax.set_ylabel('Frequency')
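
To put a number on whether frequency has increased (Question 2), one simple option is the least-squares slope of the annual counts, read as "additional tornadoes per year, each year". This is a sketch using `np.polyfit`; the `linear_trend` helper is hypothetical, and a straight-line fit says nothing about statistical significance. Two cautions: the `groupby('yr').size()` tables above omit years with zero tornadoes in a class, which can bias the fit for rare classes such as long-track tornadoes, and the slope is likely sensitive to changes in detection and reporting over the record (for example, more weak tornadoes being reported in recent decades).

```python
import numpy as np
import pandas as pd


def linear_trend(counts: pd.DataFrame, x: str = 'yr', y: str = 'count') -> float:
    """Least-squares slope of y against x (change in y per unit of x)."""
    slope, _intercept = np.polyfit(counts[x].to_numpy(dtype=float),
                                   counts[y].to_numpy(dtype=float),
                                   deg=1)
    return float(slope)
```

For example, comparing `linear_trend(low_len_counts)` with `linear_trend(high_len_counts)` would show whether any increase is concentrated in one track-length class.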

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(15, 10))
sns.despine(f)
sns.lineplot(x='yr', y='count', data=high_wid_counts, ax=ax, color='green')
ax.set_title('Number of Wide Tornadoes Over Time')
ax.set_ylabel('Number of Wide Tornadoes')
plt.show()

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(15, 10))
sns.despine(f)
sns.scatterplot(x='yr', y='Outside Tornado Alley', data=high_wid_counts2, ax=ax, color='green')
ax.set_title('Number of Wide Tornadoes Outside Tornado Alley Over Time')
ax.set_ylabel('Number of Wide Tornadoes')
plt.show()

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(15, 10))
sns.despine(f)
sns.scatterplot(x='yr', y='count', data=high_wid_counts, ax=ax, color='green')
ax.set_title('Number of Wide Tornadoes Over Time')
ax.set_ylabel('Number of Wide Tornadoes')
plt.show()

In [None]:
corr_matrix = tornadoes.corr(numeric_only=True)
corr_matrix

In [None]:
# Plot the heatmap
plt.figure(figsize=(12, 10))  # Adjust the size as needed
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', annot_kws={"size": 6})
plt.show()
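
With this many numeric columns the annotated heatmap is crowded, and each pair appears twice. A hedged sketch: build a mask for the upper triangle (including the diagonal), which `sns.heatmap` accepts through its `mask` parameter, and list the strongest remaining pairs by absolute correlation. The helper names are hypothetical. Note that some numeric columns, such as `om` and `stf`, are identifiers or codes, so their correlations are not physically meaningful.

```python
import numpy as np
import pandas as pd


def upper_triangle_mask(corr: pd.DataFrame) -> np.ndarray:
    """Boolean mask that is True on and above the diagonal (cells to hide)."""
    return np.triu(np.ones(corr.shape, dtype=bool))


def strongest_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Up to n distinct column pairs, sorted by absolute correlation (largest first)."""
    # Positions of the strictly lower triangle, so each pair appears exactly once
    rows, cols = np.tril_indices(len(corr), k=-1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    )
    return pairs.dropna().sort_values(key=lambda s: s.abs(), ascending=False).head(n)
```

Usage might look like `sns.heatmap(corr_matrix, mask=upper_triangle_mask(corr_matrix), annot=True, cmap='coolwarm', annot_kws={"size": 6})` followed by `strongest_pairs(corr_matrix, n=10)`.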

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(12, 8))
sns.despine(f)
sns.scatterplot(x='yr', y='inj', hue='wid_level', data=tornadoes)
ax.set_title('Injuries due to Tornadoes Over Time (1950-2022)', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of Injuries', weight='bold')
plt.show()

In [None]:
# Plotting
f, ax = plt.subplots(figsize=(12, 8))
sns.despine(f)
sns.scatterplot(x='yr', y='fat', hue='wid_level', data=tornadoes)
ax.set_title('Fatalities due to Tornadoes Over Time (1950-2022)', weight='bold')
ax.set_xlabel('Year', weight='bold')
ax.set_ylabel('Number of Fatalities', weight='bold')
plt.show()

In [None]:
import numpy as np

# read the csv file
df = pd.read_csv("tornados.csv")
# Filter and group the data by year to analyze trends over time
df['date'] = pd.to_datetime(df['date'])  # Convert the 'date' column to a datetime object
df['yr'] = df['date'].dt.year  # Extract the year from the 'date' column
# Group by year and calculate property loss and fatalities
property_loss_by_year = df.groupby('yr')['loss'].sum()
fatalities_by_year = df.groupby('yr')['fat'].sum()
# Calculate mean and median for property loss and fatalities for each year
property_loss_mean = df.groupby('yr')['loss'].mean()
property_loss_median = df.groupby('yr')['loss'].median()
fatalities_mean = df.groupby('yr')['fat'].mean()
fatalities_median = df.groupby('yr')['fat'].median()
# tornado path length
path_length_mean = df.groupby('yr')['len'].mean()
path_length_median = df.groupby('yr')['len'].median()
# tornado severity
tornado_severity_mean = df.groupby('yr')['mag'].mean()
all_avg_severity = df['mag'].mean()
tornado_severity_median = df.groupby('yr')['mag'].median()
print("\nMean Property Loss by Year:")
print(property_loss_mean)
print("\nMedian Property Loss by Year:")
print(property_loss_median)
print("\nMean Fatalities by Year:")
print(fatalities_mean)
print("\nMedian Fatalities by Year:")
print(fatalities_median)
print("\nAverage Tornado Path Length by Year:")
print(path_length_mean)
print("\nMedian Tornado Path Length by Year:")
print(path_length_median)
print("\nYear-over-Year Change in Average Tornado Severity (fraction, 0.05 = 5%):")
print(tornado_severity_mean.pct_change())
print("\nAverage Tornado Severity by Year:")
print(tornado_severity_mean)
print("\nMedian Tornado Severity by Year:")
print(tornado_severity_median)
#graph to represent loss, fatalities, path length, and tornado severity
plt.figure(figsize=(16, 16))
plt.subplot(4, 1, 1)
plt.plot(property_loss_by_year.index, property_loss_by_year, label='Property Loss')
plt.title('Property Loss due to Tornadoes Over Time')
plt.xlabel('Year')
plt.ylabel('Property Loss (Dollars)')
plt.legend()
plt.subplot(4, 1, 2)
plt.plot(fatalities_by_year.index, fatalities_by_year, label='Fatalities', color='orange')
plt.title('Fatalities due to Tornadoes Over Time')
plt.xlabel('Year')
plt.ylabel('Fatalities')
plt.legend()
plt.subplot(4, 1, 3)
plt.plot(path_length_mean.index, path_length_mean, label='Average Path Length', color='green')
plt.plot(path_length_median.index, path_length_median, label='Median Path Length', color='blue')
plt.title('Tornado Path Length Over Time')
plt.xlabel('Year')
plt.ylabel('Path Length (Miles)')
plt.legend()
plt.subplot(4, 1, 4)
plt.plot(tornado_severity_mean.index, tornado_severity_mean, label='Average Severity', color='red')
plt.plot(tornado_severity_median.index, tornado_severity_median, label='Median Severity', color='purple')
plt.axhline(y=all_avg_severity, color='blue', linestyle='--', label='Overall Average Severity (1950-2022)')
plt.title('Tornado Severity Over Time')
plt.xlabel('Year')
plt.ylabel('Magnitude (F/EF Scale)')
plt.legend()
plt.tight_layout()  
plt.show()
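
The year-over-year `pct_change` printed above is noisy, because a single unusual year can swing the mean magnitude. Two hedged alternatives: a centered rolling mean, and a direct comparison of the F-scale era with the EF-scale era, since the variable notes say magnitudes are on the EF scale beginning in 2007. The 10-year window and the helper names are assumptions.

```python
import numpy as np
import pandas as pd


def smooth(series: pd.Series, window: int = 10) -> pd.Series:
    """Centered rolling mean; needs at least half a window of non-missing values."""
    return series.rolling(window=window, center=True,
                          min_periods=max(1, window // 2)).mean()


def mag_by_scale_era(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, and count of rated tornadoes before and after the 2007 EF switch."""
    era = np.where(df['yr'] >= 2007, 'EF (2007+)', 'F (1950-2006)')
    # 'count' excludes tornadoes with a missing magnitude
    return df.groupby(era)['mag'].agg(['mean', 'median', 'count'])
```

For example, `plt.plot(tornado_severity_mean.index, smooth(tornado_severity_mean))` could be added to the severity panel, and `mag_by_scale_era(df)` gives a two-row table; a shift at 2007 may reflect the change of rating scale as much as any change in the tornadoes themselves.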

In [None]:
import matplotlib.ticker as mtick

injuries_by_year = df.groupby('yr')['inj'].sum()

# Calculate mean and median injuries per tornado for each year
injuries_mean = df.groupby('yr')['inj'].mean()
injuries_median = df.groupby('yr')['inj'].median()

# Plot average property loss, fatalities, and injuries per tornado
plt.figure(num=1,figsize=(10, 2))
plt.subplot(1, 1, 1)
plt.plot(property_loss_mean.index, property_loss_mean, label='Property Loss')
plt.title('Avg Property Loss due to Tornadoes Over Time')
plt.xlabel('Year')
plt.ylabel('Property Loss (Dollars)')
plt.legend()

plt.figure(num=2,figsize=(10, 2))
plt.subplot(1, 2, 1)
plt.plot(fatalities_mean.index, fatalities_mean, label='Avg Fatalities')
plt.title('Avg Fatalities due to Tornadoes Over Time')
plt.xlabel('Year')
plt.ylabel('Fatalities')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(injuries_mean.index, injuries_mean, label='Avg Injuries')
plt.title('Avg Injuries due to Tornadoes Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Injuries')
plt.legend()