# Introduction to Data
The dataset, scraped from Weather Underground's hurricane archive, provides detailed information on all 175 Atlantic storms recorded from 2010 to 2020, the time of writing (Weather Underground, n.d.). The archive itself holds severe weather data going back to 1851 and spans multiple basins, including the Pacific and Indian Oceans; we chose to focus on the Atlantic basin between 2010 and 2020. The initial attributes of the scraped data are: date, time, latitude (lat), longitude (lon), wind (mph), pressure (mb), storm type, dates (start date - end date), and maximum strength. From this initial dataset we derived a final dataset that adds several attributes computed from the initial ones.

# Data Description
#### Shape of Dataset: 26 attributes and 10448 tuples
#### Attributes Description
- NameYear: The name and year of the storm. It acts as the storm's identifier, since two storms from the same year cannot share a name. Storms from different years may share the same name, though.
- Time: Timestamp in UNIX time calculated using the date and time of the initial dataset.
- Lon, Lat: Current longitude and latitude of the storm.
- Wind: Current wind speed in mph.
- Pressure: Current pressure of the storm in mb.
- DurationDays: Duration of the storm in days, calculated from the initial dataset's start and end dates.
- Season: Season in which the storm occurred (Summer, Autumn, Winter or Spring).
- StormType: Type/category of the storm at the current observation.
- MaxStrength: Maximum type/category the storm reached over its course.
- StartTime, StartLat, StartLon, StartWind, StartPressure, StartStormType: Time, location, wind, pressure and storm type at which the storm started.
- EndTime, EndLat, EndLon, EndWind, EndPressure, EndStormType: Time, location, wind, pressure and storm type at which the storm ended.
- MinWind, MaxWind: Minimum and maximum wind of the storm over its course.
- MinPressure, MaxPressure: Minimum and maximum pressure of the storm over its course.

The dataset is relatively clean with very few NA values (only 23 NA values for StormType and 83 for EndStormType).
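
Since the Time attribute is stored as a UNIX timestamp, it usually needs converting back to a readable datetime before analysis. The sketch below shows one way to do this with pandas, using the first two Time values of Alex_2010 from the appendix. It assumes the timestamps are seconds since the epoch in UTC, which the source does not state; the `sample` frame and the `Timestamp` and `Step` columns are illustrative names, not part of the dataset.

```python
import pandas as pd

# Two Time values copied from the Alex_2010 rows shown in the appendix.
sample = pd.DataFrame({
    "NameYear": ["Alex_2010", "Alex_2010"],
    "Time": [1277402400, 1277413200],
})

# Assumption: Time is seconds since the Unix epoch, expressed in UTC.
sample["Timestamp"] = pd.to_datetime(sample["Time"], unit="s", utc=True)

# Gap between consecutive observations of the same storm.
sample["Step"] = sample.groupby("NameYear")["Timestamp"].diff()

print(sample)
```

Under that assumption the first observation falls on 2010-06-24 18:00 UTC and consecutive observations are three hours apart.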

# References

Weather Underground. (n.d.). Hurricane and Tropical Cyclones. Retrieved July 17, 2020, from https://www.wunderground.com/hurricane/archive <br>
Stages of Development from disturbance to hurricane. (n.d.). Retrieved July 18, 2020, from http://ww2010.atmos.uiuc.edu/(Gh)/guides/mtr/hurr/stages.rxml

# Appendices

## Generating Extra Columns

In [1]:
import pandas as pd

In [2]:
df=pd.read_csv('Storm_2010_2020.csv')
df.head()

Unnamed: 0,NameYear,Time,Lat,Lon,Wind,Pressure,DurationDays,Season,StormType,MaxStrength
0,Alex_2010,1277402400,15.9,-82.0,29,1007,9,Summer,Tropical Low,Hurricane
1,Alex_2010,1277413200,15.95,-82.04,29,1006,9,Summer,Tropical Low,Hurricane
2,Alex_2010,1277424000,16.0,-82.1,29,1006,9,Summer,Tropical Low,Hurricane
3,Alex_2010,1277434800,16.05,-82.193,29,1006,9,Summer,Tropical Low,Hurricane
4,Alex_2010,1277445600,16.1,-82.3,29,1006,9,Summer,Tropical Low,Hurricane


#### Getting the start time and end time of each storm

In [3]:
df_time=df[['NameYear','Time']]                                       #extracting the columns Time and NameYear
df_time_start=df_time.groupby(['NameYear']).min()                     #getting the minimum time of each storm
df_time_start.rename(columns={'Time':'StartTime'},inplace=True)       #renaming the column Time to StartTime
df_time_end=df_time.groupby(['NameYear']).max()                      #getting the maximum time of each storm
df_time_end.rename(columns={'Time':'EndTime'},inplace=True)          #renaming the column Time to EndTime

#### Getting the initial longitude, latitude, wind, pressure, and storm type of each storm

In [4]:
InitialStorm=pd.merge(df_time_start, df, left_on=['NameYear','StartTime'], right_on=['NameYear','Time'])
InitialStorm.rename(columns={'Lon':'StartLon', 'Lat':'StartLat', 'Wind':'StartWind', 'Pressure':'StartPressure',
                             'StormType':'StartStormType'},inplace=True)  #renaming the columns
InitialStorm.drop(['Time','DurationDays','Season','MaxStrength'], axis=1, inplace=True)

#### Getting the ending longitude, latitude, wind, pressure, and storm type of each storm

In [5]:
EndingStorm=pd.merge(df_time_end, df, left_on=['NameYear','EndTime'], right_on=['NameYear','Time'])
EndingStorm.rename(columns={'Lon':'EndLon', 'Lat':'EndLat', 'Wind':'EndWind', 'Pressure':'EndPressure',
                            'StormType':'EndStormType'},inplace=True)  #renaming the columns
EndingStorm.drop(['Time','DurationDays','Season','MaxStrength'], axis=1, inplace=True)
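
As an alternative to merging on the minimum and maximum time, the first and last observation of each storm can be selected directly by row label. The sketch below is not the approach used in this notebook; it runs on a small made-up table (`toy`, with invented values) and uses `idxmin`/`idxmax`, which return exactly one row per storm even if a timestamp is repeated.

```python
import pandas as pd

# Hypothetical miniature of the storm table; all values are made up.
toy = pd.DataFrame({
    "NameYear": ["A_2010", "A_2010", "A_2010", "B_2011", "B_2011"],
    "Time": [100, 200, 300, 150, 250],
    "Lat": [10.0, 11.0, 12.0, 20.0, 21.0],
    "Lon": [-50.0, -51.0, -52.0, -60.0, -61.0],
    "Wind": [30, 60, 40, 35, 45],
    "Pressure": [1005, 990, 1000, 1004, 998],
    "StormType": ["Tropical Depression", "Hurricane", "Tropical Storm",
                  "Tropical Depression", "Tropical Storm"],
})

cols = ["Time", "Lat", "Lon", "Wind", "Pressure", "StormType"]

# Row labels of the earliest and latest observation of each storm.
first_idx = toy.groupby("NameYear")["Time"].idxmin()
last_idx = toy.groupby("NameYear")["Time"].idxmax()

# Select those rows and prefix the per-observation columns.
start = (toy.loc[first_idx, ["NameYear"] + cols]
            .rename(columns={c: "Start" + c for c in cols})
            .reset_index(drop=True))
end = (toy.loc[last_idx, ["NameYear"] + cols]
          .rename(columns={c: "End" + c for c in cols})
          .reset_index(drop=True))

print(start)
print(end)
```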

#### Getting the minimum and maximum wind of each storm

In [6]:
df_wind=df[['NameYear','Wind']]                                       #extracting the columns Wind and NameYear
df_wind_min=df_wind.groupby(['NameYear']).min()                       #getting the minimum Wind of each storm
df_wind_min.rename(columns={'Wind':'MinWind'},inplace=True)           #renaming the column Wind to MinWind
df_wind_max=df_wind.groupby(['NameYear']).max()                       #getting the maximum Wind of each storm
df_wind_max.rename(columns={'Wind':'MaxWind'},inplace=True)           #renaming the column Wind to MaxWind

#### Getting the minimum and maximum pressure of each storm

In [7]:
df_pressure=df[['NameYear','Pressure']]                                       #extracting the columns Pressure and NameYear
df_pressure_min=df_pressure.groupby(['NameYear']).min()                       #getting the minimum Pressure of each storm
df_pressure_min.rename(columns={'Pressure':'MinPressure'},inplace=True)       #renaming the column Pressure to MinPressure
df_pressure_max=df_pressure.groupby(['NameYear']).max()                       #getting the maximum Pressure of each storm
df_pressure_max.rename(columns={'Pressure':'MaxPressure'},inplace=True)       #renaming the column Pressure to MaxPressure
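
The four per-storm extremes computed in the two cells above could also be obtained in a single pass with pandas named aggregation. The following is a minimal sketch on a made-up table (`toy` and its values are invented for illustration); it is offered as a more compact equivalent, not as the code that produced the dataset.

```python
import pandas as pd

# Hypothetical miniature of the storm table; all values are made up.
toy = pd.DataFrame({
    "NameYear": ["A_2010", "A_2010", "A_2010", "B_2011", "B_2011"],
    "Wind": [30, 60, 40, 35, 45],
    "Pressure": [1005, 990, 1000, 1004, 998],
})

# One groupby call; each keyword names an output column and
# maps it to a (source column, aggregation function) pair.
extremes = toy.groupby("NameYear", as_index=False).agg(
    MinWind=("Wind", "min"),
    MaxWind=("Wind", "max"),
    MinPressure=("Pressure", "min"),
    MaxPressure=("Pressure", "max"),
)

print(extremes)
```

The result has one row per storm, so it can be merged back onto the observation table with a single merge on NameYear.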

#### Merging the initial dataset with the derived sub-datasets to generate the final dataset

In [8]:
StormDataset= pd.merge(pd.merge(pd.merge(pd.merge(pd.merge(pd.merge(df, InitialStorm, on='NameYear'),
                                                           EndingStorm, on='NameYear'),
                                                  df_wind_min, on='NameYear'),
                                         df_wind_max, on='NameYear'),
                                df_pressure_min, on='NameYear'),
                       df_pressure_max, on='NameYear')
StormDataset.head()

Unnamed: 0,NameYear,Time,Lat,Lon,Wind,Pressure,DurationDays,Season,StormType,MaxStrength,...,EndTime,EndLat,EndLon,EndWind,EndPressure,EndStormType,MinWind,MaxWind,MinPressure,MaxPressure
0,Alex_2010,1277402400,15.9,-82.0,29,1007,9,Summer,Tropical Low,Hurricane,...,1278028800,23.2,-101.9,35,997,Tropical Depression,29,109,946,1007
1,Alex_2010,1277413200,15.95,-82.04,29,1006,9,Summer,Tropical Low,Hurricane,...,1278028800,23.2,-101.9,35,997,Tropical Depression,29,109,946,1007
2,Alex_2010,1277424000,16.0,-82.1,29,1006,9,Summer,Tropical Low,Hurricane,...,1278028800,23.2,-101.9,35,997,Tropical Depression,29,109,946,1007
3,Alex_2010,1277434800,16.05,-82.193,29,1006,9,Summer,Tropical Low,Hurricane,...,1278028800,23.2,-101.9,35,997,Tropical Depression,29,109,946,1007
4,Alex_2010,1277445600,16.1,-82.3,29,1006,9,Summer,Tropical Low,Hurricane,...,1278028800,23.2,-101.9,35,997,Tropical Depression,29,109,946,1007
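
The nested merge above can be hard to read and gives no warning if one of the per-storm tables accidentally contains a storm twice, which would silently multiply rows. One possible way to chain the merges and guard against that is sketched below on made-up data; `rows`, `per_storm` and `attach` are hypothetical names introduced only for this example.

```python
from functools import reduce

import pandas as pd

# Hypothetical observation table; values are made up.
rows = pd.DataFrame({
    "NameYear": ["A_2010", "A_2010", "B_2011"],
    "Wind": [30, 60, 35],
})

# Per-storm tables, each with exactly one row per NameYear.
per_storm = [
    rows.groupby("NameYear", as_index=False).agg(MinWind=("Wind", "min")),
    rows.groupby("NameYear", as_index=False).agg(MaxWind=("Wind", "max")),
]

def attach(left, right):
    # validate="many_to_one" makes pandas raise a MergeError
    # if NameYear is not unique in the right-hand table.
    return pd.merge(left, right, on="NameYear", validate="many_to_one")

# Fold the per-storm tables onto the observation table one at a time.
merged = reduce(attach, per_storm, rows)

# The merge should add columns without adding or dropping rows.
assert len(merged) == len(rows)
print(merged)
```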


#### Exporting the final dataset StormDataset

In [10]:
StormDataset.to_csv('StormDatasetFinal.csv', index = False)
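
Passing `index = False` matters here: when the index is written out, pandas stores it as an unnamed first column, which `read_csv` then loads back as `Unnamed: 0`. The small sketch below illustrates the difference on a made-up frame, using in-memory buffers instead of files; the `toy` frame and buffer names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row frame; values are made up.
toy = pd.DataFrame({"NameYear": ["A_2010", "B_2011"], "Wind": [30, 35]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)

print(cols_with_index)            # ['Unnamed: 0', 'NameYear', 'Wind']
print(reloaded.columns.tolist())  # ['NameYear', 'Wind']
```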

## Data Description

#### Shape

In [11]:
StormDataset.shape

(10448, 26)

#### Counting the Null values

In [12]:
StormDataset.isna().sum()

NameYear           0
Time               0
Lat                0
Lon                0
Wind               0
Pressure           0
DurationDays       0
Season             0
StormType         23
MaxStrength        0
StartTime          0
StartLat           0
StartLon           0
StartWind          0
StartPressure      0
StartStormType     0
EndTime            0
EndLat             0
EndLon             0
EndWind            0
EndPressure        0
EndStormType      83
MinWind            0
MaxWind            0
MinPressure        0
MaxPressure        0
dtype: int64
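
One plausible reason EndStormType has more missing values than StormType is that the end-of-storm attributes are copied onto every row of a storm, so a single missing StormType in a storm's last observation shows up once per row of that storm. This has not been verified against the real data; the sketch below only reproduces the effect on a made-up table (`toy` and its values are invented) and shows how the affected storms could be listed.

```python
import pandas as pd

# Hypothetical table: storm A_2010 has no StormType in its last observation.
toy = pd.DataFrame({
    "NameYear": ["A_2010", "A_2010", "A_2010", "B_2011", "B_2011"],
    "Time": [100, 200, 300, 150, 250],
    "StormType": ["Tropical Depression", "Hurricane", None,
                  "Tropical Depression", "Tropical Storm"],
})

# Take the storm type of each storm's last observation...
last_idx = toy.groupby("NameYear")["Time"].idxmax()
end_type = (toy.loc[last_idx, ["NameYear", "StormType"]]
               .rename(columns={"StormType": "EndStormType"}))

# ...and broadcast it to every row of that storm.
final = toy.merge(end_type, on="NameYear")

# One missing StormType becomes three missing EndStormType values.
na_counts = final[["StormType", "EndStormType"]].isna().sum()
affected = final.loc[final["EndStormType"].isna(), "NameYear"].unique().tolist()

print(na_counts)
print(affected)
```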