# North Atlantic Tropical Cyclone Visualization, Part 1
## Megan Collins

## North Atlantic Tropical Cyclone Wind Speed Predictions

### Introduction

This notebook contains visualizations that support research on a predictive statistical model for the maximum sustained wind speed of hurricanes in the North Atlantic, using only satellite data. The model uses linear regression: data about the ocean surface and upper ocean near a hurricane are used to predict that hurricane's maximum sustained wind speed. Ultimately, the goal is a tool that lets hurricane researchers monitor hurricanes approaching the continental United States more closely, without expensive and potentially dangerous data collection in close proximity to the storm, by leveraging data that satellites orbiting Earth already collect daily. Here, the primary question is whether specific types of data about the upper ocean and the ocean surface in the North Atlantic are correlated with the observed maximum sustained wind speed of past North Atlantic hurricanes.

Although predictions of the maximum sustained wind speed are most important for storms classified as hurricanes, storms classified as tropical storms also matter for protecting coastal communities ahead of hurricane and tropical storm landfall. Hence, these visualizations examine the relationships between the maximum sustained wind speed of hurricanes or tropical storms and data about the upper ocean and the ocean surface in the North Atlantic.
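As a rough illustration of the kind of linear regression described above, the sketch below fits ordinary least squares with NumPy on synthetic data. The choice of predictors, the coefficients, and every number are assumptions made for illustration only; this is not the model or the fit used in the research project.

```python
import numpy as np

# Synthetic stand-in data (NOT values from the notebook's dataset).
rng = np.random.default_rng(0)
n = 50
delta_sst = rng.normal(-0.5, 0.3, n)    # change in sea surface temperature
delta_q = rng.normal(0.005, 0.001, n)   # change in sea surface humidity
mld = rng.normal(40.0, 5.0, n)          # mixed layer depth

# Hypothetical coefficients, used only to generate example wind speeds:
# intercept, Delta SST, Delta Q, Mixed Layer Depth.
true_coef = np.array([60.0, -8.0, 2000.0, 0.1])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), delta_sst, delta_q, mld])
wind = X @ true_coef  # noise-free, so the fit should recover true_coef

# Ordinary least squares fit and the resulting predictions.
coef, *_ = np.linalg.lstsq(X, wind, rcond=None)
predicted = X @ coef
```

With real observations the wind speeds would be noisy, so the fitted coefficients would only approximate any underlying relationship.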

### Data

The data used to generate these visualizations combine several datasets. The composite dataset is built from the National Oceanic and Atmospheric Administration (NOAA) North Atlantic tropical cyclone dataset, available at: https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2018-051019.txt, sea surface temperature (SST) data from NOAA's AVHRR satellite-based product, available at: https://podaac.jpl.nasa.gov/dataset/AVHRR_OI-NCEI-L4-GLOB-v2.0, ERA5 sea surface humidity data from the Copernicus Climate Data Store, available at: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels-monthly-means?tab=overview, and climatological mixed layer depth data from the French Research Institute for the Exploration of the Sea (IFREMER), available at: http://www.ifremer.fr/cerweb/deboyer/mld/Surface_Mixed_Layer_Depth.php. As part of a research project, the data were cleaned, processed, and combined into a single CSV file that provides, at each point along a storm's track, data about the storm and data about the ocean surface and upper ocean.

In order to investigate the question of how specific types of data about the upper ocean and the ocean surface in the North Atlantic are correlated with the observed maximum sustained wind speed of hurricanes and tropical storms in the North Atlantic, we will begin by importing the necessary packages, altair and pandas.  

In [1]:
import altair as alt
import pandas as pd

We will now read in the CSV file containing the data about tropical cyclones and the upper ocean and ocean surface in the North Atlantic. Once the data has been read into pandas, we select only storms from the 2005 Atlantic hurricane season that eventually received a name, meaning that at some point in their lifecycle they were classified as a tropical storm. These storms may also have attained hurricane classification, and at the beginning and end of their lifecycles they are weaker than a tropical storm. However, storms that eventually organize into a tropical storm have clouds that are more tightly organized around the center of the storm, called the eye. This organization contributes to the relationship the storm has with the upper ocean and ocean surface, as well as to the maximum sustained wind speed the storm can attain during its lifecycle. We therefore compare the maximum sustained wind speed at each recorded point in the storm's lifecycle with the properties of the upper ocean and ocean surface at that point, in order to find a method for predicting the maximum wind speed of these storms from data that satellites collect on a daily basis.

Here we have selected only tropical storms and hurricanes from the 2005 Atlantic hurricane season because it was the most active Atlantic hurricane season for which complete instrumental records exist. It therefore has the most storms of any season with complete records, which lets us analyze the largest number of storms.

*Note*: The 2020 Atlantic hurricane season was more active than the 2005 Atlantic hurricane season, but final data about the storms in the 2020 Atlantic hurricane season is still being compiled by NOAA, so it has not yet been incorporated into this dataset.  

In [3]:
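# Load the combined storm and ocean dataset.
data = pd.read_csv('Hurricanes.csv')
# Keep only named storms from the 2005 Atlantic hurricane season.
data = data[(data['Year'] == 2005) & (data['Name'] != 'UNNAMED')]
data.head()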

| Unnamed: 0 | Category | Delta Q | Delta SST | Fractional Julian Days | Julian Days | Key | Latitude | Longitude | Maximum Wind Speed | Minimum Pressure | Mixed Layer Depth | Name | Predicted Wind | Record Identifiers | Status Values | Storm Classification | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8197 | | 0.004704 | -0.539978 | 159.75 | 160 | ARLENE2005 | 16.9 | 276.0 | 25 | 1004 | 39.006447 | ARLENE | -4.559544 | | 2 | TD | 2005 |
| 8198 | | 0.004907 | -0.080017 | 160.0 | 160 | ARLENE2005 | 17.4 | 276.1 | 30 | 1003 | 39.297527 | ARLENE | -0.652639 | | 2 | TD | 2005 |
| 8199 | | 0.005032 | -1.200012 | 160.25 | 161 | ARLENE2005 | 18.2 | 276.1 | 35 | 1003 | 39.297527 | ARLENE | -9.54432 | | 3 | TS | 2005 |
| 8200 | | 0.005091 | -0.839996 | 160.5 | 161 | ARLENE2005 | 19.0 | 276.0 | 35 | 1002 | 39.297527 | ARLENE | -6.602754 | | 3 | TS | 2005 |
| 8201 | | 0.004908 | -0.580017 | 160.75 | 161 | ARLENE2005 | 19.7 | 275.9 | 35 | 1002 | 33.35261 | ARLENE | -4.013646 | | 3 | TS | 2005 |
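A name filter on its own does not guarantee that every retained storm reached tropical storm strength, so it can be worth checking. The sketch below applies the same filter to a toy table and flags storms that never reached tropical storm (TS) or hurricane (HU) status. The storm names `ALPHA` and `TEN` and all rows are hypothetical; whether `Hurricanes.csv` contains such entries is not known from this notebook.

```python
import pandas as pd

# Toy records (hypothetical; not taken from Hurricanes.csv).
toy = pd.DataFrame({
    'Name': ['ALPHA', 'ALPHA', 'ALPHA', 'TEN', 'TEN', 'UNNAMED'],
    'Year': [2005, 2005, 2005, 2005, 2005, 2005],
    'Storm Classification': ['TD', 'TS', 'HU', 'TD', 'TD', 'TS'],
})

# Same filter as in the cell above.
named = toy[(toy['Year'] == 2005) & (toy['Name'] != 'UNNAMED')]

# Number of records kept for each storm.
records_per_storm = named.groupby('Name').size()

# For each storm, did any record reach TS or HU status?
reached_ts = (
    named['Storm Classification'].isin(['TS', 'HU'])
    .groupby(named['Name'])
    .any()
)

# Storms that passed the name filter but never reached TS or HU.
never_ts = sorted(reached_ts[~reached_ts].index)
print(records_per_storm)
print(never_ts)
```

Running the same two checks on `data` would show how many storms the analysis covers and whether any of them need a closer look.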


### Visualization 1

The first two visualizations are a pair of interacting visualizations. The first is a scatterplot with the maximum wind speed on the y-axis and the minimum pressure of the storm on the x-axis. Based on the physics of how tropical cyclones and hurricanes form and behave, we expect an approximately linear relationship between the two, so the quantities should be correlated: a change in a storm's minimum pressure corresponds to a change in its maximum wind speed, and vice versa. Each point in the scatterplot is colored by the name of its storm. When the audience clicks on a point, all points for that storm keep their color while every other point turns white, which makes it easy to pick out all the minimum pressure and maximum wind speed pairs for one storm, one point for each record in the storm's lifecycle. Unfortunately, the 2005 Atlantic hurricane season had a very large number of storms, so the colors in this plot repeat several times. Because the scatterplot is interactive, the audience can zoom in to see individual points more clearly and select the storm they want to examine in more detail.

Moreover, if the audience holds the cursor over a specific point in the plot, a small window pops up next to the cursor showing the storm's name and the minimum pressure and maximum wind speed values of that point. This gives the user more control over how they view and interact with the plot and lets them select the data that is most relevant to them.

### Visualization 2

The second visualization in this pair is a histogram. It shows the total number of days on which hurricanes (HU), tropical storms (TS), tropical depressions (TD), tropical waves (WV), subtropical storms (SS), subtropical depressions (SD), lows (LO), or extratropical cyclones (EX) were present in the North Atlantic during the 2005 Atlantic hurricane season. Strictly speaking, each bar counts rows of the dataset; the rows previewed above are 0.25 days apart, so four rows correspond to roughly one day. In this context, subtropical storms, subtropical depressions, lows, and tropical waves are not very relevant to the audience, but they are included for the sake of completeness. Tropical depressions are the first stage in the formation of a tropical storm or hurricane and occur at the beginning of the lifecycle of every tropical storm or hurricane, so the number of days when tropical depressions were in the North Atlantic indicates the number of days when tropical storms started forming during the 2005 season. The numbers for tropical storms and hurricanes indicate the number of days in the 2005 season when tropical storms or hurricanes were present in the North Atlantic. Finally, extratropical cyclones are tropical storms or hurricanes that have moved out of the tropics but are still functioning as a storm system and have not yet broken apart into a loose area of clouds. The number for extratropical cyclones indicates the number of days when a tropical storm or hurricane was functioning as a storm outside of the tropics. Since this histogram is connected to the scatterplot of maximum wind speed against minimum pressure, whenever the audience selects a named storm on the scatterplot on the left, the histogram changes to show the number of days that the selected storm spent in each of these classifications, giving the audience a breakdown of how the storm progressed through its lifecycle in the North Atlantic.

In [4]:
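# Clicking a point selects every record that shares its storm name.
# (Altair 4 API; Altair 5 renames these calls to alt.selection_point and add_params.)
selection = alt.selection(type='multi', fields=['Name'])

# Scatterplot: selected storms keep their color, all other points turn white.
scatter = alt.Chart(data).mark_circle().encode(
    x = 'Minimum Pressure',
    y = 'Maximum Wind Speed',
    tooltip = ['Name', 'Minimum Pressure', 'Maximum Wind Speed'],
    color = alt.condition(selection, 'Name', alt.value('white'))
).add_selection(selection).interactive()

# Bar chart: number of records per storm classification, filtered to the selection.
bar = alt.Chart(data).mark_bar().encode(
    x = 'Storm Classification',
    y = 'count()',
    tooltip = ['count()']
).transform_filter(selection)

# Place the two charts side by side.
scatter | bar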
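The `count()` aggregate in the bar chart counts rows, not days. In the preview of the data above, consecutive rows are 0.25 days apart, so a minimal sketch for converting row counts into approximate days, assuming one row every six hours throughout the file, could look like the following. The toy rows and the storm names `ALPHA` and `BETA` are hypothetical.

```python
import pandas as pd

# Toy track records spaced six hours apart (hypothetical values).
toy = pd.DataFrame({
    'Name': ['ALPHA'] * 6 + ['BETA'] * 4,
    'Fractional Julian Days': [200.0, 200.25, 200.5, 200.75, 201.0, 201.25,
                               210.0, 210.25, 210.5, 210.75],
    'Storm Classification': ['TD', 'TD', 'TS', 'TS', 'TS', 'HU',
                             'TD', 'TS', 'TS', 'TS'],
})

RECORD_INTERVAL_DAYS = 0.25  # assumption: one record every six hours

# What count() shows in the bar chart: rows per classification.
record_counts = toy['Storm Classification'].value_counts()

# Approximate time spent in each classification, in days.
days_per_class = record_counts * RECORD_INTERVAL_DAYS

# Same breakdown for one selected storm, mirroring transform_filter(selection).
alpha_days = (
    toy.loc[toy['Name'] == 'ALPHA', 'Storm Classification'].value_counts()
    * RECORD_INTERVAL_DAYS
)
```

If the real file contains extra rows at irregular times, the fixed interval would overestimate durations, and differencing `Fractional Julian Days` within each storm would be a safer basis.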

### Visualization 1 Analysis

Notice in the scatterplot shown above that there is a clear linear relationship between maximum sustained wind speed and minimum pressure for storms in the 2005 Atlantic hurricane season. This relationship is negative, which means that as the minimum pressure increases, the maximum wind speed decreases. This trend is expected: as a tropical storm or hurricane strengthens, the minimum pressure in the eye of the storm decreases and the wind speeds increase overall, which raises the maximum sustained wind speed. The clear negative linear relationship in the scatterplot therefore indicates a strong correlation between the maximum sustained wind speed and the minimum pressure of tropical storms and hurricanes during the 2005 Atlantic hurricane season.

Observe that even if we select a single tropical storm or hurricane from the scatterplot, we still see a clear negative linear relationship: as the minimum pressure of the storm increases, its maximum sustained wind speed decreases. Hence, the scatterplot provides evidence of a strong correlation between the maximum sustained wind speed and the minimum pressure for individual tropical storms and hurricanes as well.
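The visual impression of a strong negative relationship can be quantified with a Pearson correlation coefficient, both over all records and within each storm. The sketch below shows one way to do this with pandas on toy numbers; the values and the storm names `ALPHA` and `BETA` are hypothetical and were chosen only so that the code has something to run on.

```python
import pandas as pd

# Toy pressure and wind records for two hypothetical storms.
toy = pd.DataFrame({
    'Name': ['ALPHA'] * 4 + ['BETA'] * 4,
    'Minimum Pressure': [1005, 995, 980, 960, 1008, 1002, 998, 990],
    'Maximum Wind Speed': [30, 45, 65, 90, 25, 35, 40, 50],
})

# Pearson correlation over all records.
overall_r = toy['Minimum Pressure'].corr(toy['Maximum Wind Speed'])

# Pearson correlation within each storm, mirroring a click on the scatterplot.
per_storm_r = {
    name: group['Minimum Pressure'].corr(group['Maximum Wind Speed'])
    for name, group in toy.groupby('Name')
}
```

A coefficient close to -1 would support the reading of the scatterplot given above; applying the same two expressions to `data` would give the values for the 2005 season.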

### Visualization 2 Analysis

For the histogram, observe that tropical storms accounted for the largest number of days of all the classifications, followed by hurricanes, then tropical depressions, and finally extratropical cyclones. In the 2005 Atlantic hurricane season, lows, subtropical depressions, subtropical storms, and tropical waves were rare, with subtropical depressions, subtropical storms, and tropical waves being very rare. Overall, this suggests that the majority of storms in the 2005 season spent only a few days as a tropical depression before strengthening into a tropical storm. From there, these storms either remained tropical storms through the end of their lifecycle or strengthened again to form hurricanes. Notice that once the storms became extratropical cyclones, their remaining lifecycles were shorter because, as they continue to move north, they encounter progressively colder sea surface temperatures. As the sea surface temperatures become colder, the extratropical cyclones have less fuel; eventually the clouds begin to separate and the storm is no longer classified as a cyclone.

When we select a storm on the scatterplot and the histogram is filtered to show the distribution of days spent in each classification for that storm's lifecycle, we see that all of the storms achieve a classification of tropical storm, due to the filtering in our dataset, and that the tropical storm classification usually occupies the largest portion of the storm's lifecycle. Most of the storms also spent at least some time as a tropical depression, because storms usually begin their lifecycle as a tropical depression before strengthening into a tropical storm. The few exceptions are storms that formed in the subtropics and then traveled into the tropics to become a tropical storm. These storms start their lifecycle as a subtropical depression and either become a subtropical storm and then a tropical storm, or become a tropical storm directly. A smaller number of storms spend some days as a low, and even fewer are a tropical wave for part of their lifecycle. Since the 2005 Atlantic hurricane season was a fairly active season, many of the storms spend some time as a hurricane. Some storms rapidly strengthen from a tropical depression to a hurricane and spend the majority of their lifecycle as a hurricane, while others spend the majority of their lifecycle as a tropical storm and take more time to strengthen into a hurricane. Only a subset of these hurricanes eventually become extratropical cyclones, because some hurricanes make landfall in the tropics and lose enough organization to no longer be classified as a cyclone at all while still in the tropics. Since these storms are still in the tropics when their clouds break apart, they are never classified as extratropical cyclones.

Based on these observations, we see that the majority of storms in the 2005 Atlantic hurricane season spent a large portion of their lifecycle as either tropical storms or hurricanes, with most of these storms strengthening from tropical depressions before achieving tropical storm classification. Overall, this histogram demonstrates how the changes in the strength of a storm over its lifecycle relate to the changes seen between the maximum sustained wind speed and the minimum pressure of these storms.
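The histogram shows how much of its lifecycle a storm spent in each classification, but not the order of the stages. A small sketch that recovers the order for one storm, and the share of its records in each classification, is given below. The helper name `lifecycle_stages` and the toy track are hypothetical, and sorting by `Fractional Julian Days` assumes the track does not cross a year boundary.

```python
import pandas as pd

# Toy track for one hypothetical storm.
toy = pd.DataFrame({
    'Name': ['ALPHA'] * 6,
    'Fractional Julian Days': [200.0, 200.25, 200.5, 200.75, 201.0, 201.25],
    'Storm Classification': ['TD', 'TD', 'TS', 'HU', 'HU', 'EX'],
})

def lifecycle_stages(track):
    """Return one storm's classifications in time order, without consecutive repeats."""
    ordered = track.sort_values('Fractional Julian Days')['Storm Classification']
    # Keep a record only where the classification differs from the previous record.
    changed = ordered != ordered.shift()
    return ordered[changed].tolist()

# Ordered stages of the storm's lifecycle.
stages = lifecycle_stages(toy)

# Fraction of the storm's records spent in each classification.
share = toy['Storm Classification'].value_counts(normalize=True)
```

Applied per storm to `data`, for example inside a loop over `data.groupby('Name')`, this would make it possible to check statements such as which storms skipped the tropical depression stage.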

### Visualization 3

The next two visualizations are again grouped as a pair of interacting visualizations. The first visualization in the pair is a bubblechart with the maximum wind speed of the storm on the y-axis and the mixed layer depth on the x-axis. Mixed layer depth is the depth of the topmost layer of the ocean, whose water is the warmest of all the ocean's layers. Mixed layer depth is important because warm water provides fuel to tropical storms and hurricanes in the same way that wood provides fuel for a fire: more warm water results in a stronger tropical storm or hurricane, usually denoted by higher categories for hurricanes, just as more wood results in a larger fire. The deeper the mixed layer, the more warm water is available at the top of the ocean to fuel a tropical storm or hurricane, and the more likely the storm is to strengthen. When a tropical storm or hurricane strengthens, its maximum wind speed increases, so we would expect deeper mixed layer depths to be correlated with higher maximum wind speed values. An exception is shallow coastal water, where all of the water is very warm but the water is also very shallow because of its proximity to the coast.

Each of the bubbles in the bubblechart is assigned a color based on the classification of the storm, using the categories described for the histogram in the previous pair of visualizations. This allows the audience to examine how the maximum wind speed of a storm is related to the classification of that storm. To further reinforce this connection, the bubbles are assigned a size based on a numerical value related to the classification of that storm. One unfortunate caveat here is that the largest sizes correspond to storms that occur infrequently. However, the size of the bubble increases as the storm progresses through its lifecycle, which allows the audience to trace the storm through its life. This plot is also interactive, so the audience can zoom in and pan around the plot to see specific bubbles more clearly. If a user hovers their cursor over a specific bubble, a window pops up to the right of the cursor displaying the name of the storm that the bubble corresponds to, as well as the maximum wind speed, mixed layer depth, and status value of the storm at that point in its lifecycle. Moreover, clicking a bubble of any color or size selects its storm classification and turns all the other bubbles light grey, so that it is easier to see the trends between maximum wind speed and mixed layer depth for storms with a specific classification.

### Visualization 4

The second visualization in the pair is a scatterplot with the maximum sustained wind speed on the y-axis and the sea surface temperature delta on the x-axis. The sea surface temperature delta is the difference between the sea surface temperature 3 days before the storm passed over a section of the ocean surface and the sea surface temperature 3 days after it passed. The sea surface temperature delta is important because, as a tropical storm or hurricane passes over a section of the ocean surface, the force of its winds disturbs the ocean surface and pushes the warm surface water outwards, away from the eye of the storm. As the warm surface water moves outwards, colder water from deeper in the ocean moves upwards to take its place, resulting in colder sea surface temperatures on the section of the ocean surface that the tropical storm or hurricane passed over. These colder sea surface temperatures remain for several days after the storm has passed. Tropical storms or hurricanes with stronger maximum sustained wind speeds disturb the ocean surface more than those with weaker maximum sustained wind speeds. They move more of the warm surface water away, so the water that replaces it comes from deeper in the ocean. Since that water comes from deeper in the ocean, the sea surface temperatures will be colder after a storm with stronger maximum sustained wind speeds passes over than after a storm with weaker maximum sustained wind speeds.
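To make the definition of the sea surface temperature delta concrete, the sketch below computes it for a single location from a hypothetical daily SST series. The dates and temperatures are invented, and the sign convention (temperature after the storm minus temperature before it) is an assumption; it is chosen because it makes cooling negative, which is consistent with the negative `Delta SST` values in the preview of the data above.

```python
import pandas as pd

# Hypothetical daily SST (degrees Celsius) at one location.
sst = pd.Series(
    [29.1, 29.0, 29.2, 29.1, 28.6, 27.9, 27.5, 27.4, 27.6],
    index=pd.date_range('2005-08-06', periods=9, freq='D'),
)

passage = pd.Timestamp('2005-08-10')  # hypothetical day the storm passes over
window = pd.Timedelta(days=3)

# Assumed sign convention: SST 3 days after passage minus SST 3 days before,
# so surface cooling behind the storm gives a negative delta.
sst_before = sst.loc[passage - window]
sst_after = sst.loc[passage + window]
delta_sst = sst_after - sst_before
```

Under this convention, a stronger storm that brings up colder water would produce a more negative delta.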

Returning to the scatterplot, each of the points in the plot is given a color that corresponds to the classification of the storm at that point in its lifecycle.  Furthermore, this scatterplot is interactive so the audience can zoom in and out of the plot and pan around the plot in order to examine the distribution of points more clearly.  Additionally, if the user puts their cursor over a specific point on the scatterplot, a window will pop up next to the cursor that displays the name of the storm as well as its sea surface temperature delta and maximum sustained wind speed values at that point in its lifecycle.  Since this scatterplot is connected to the bubblechart through the interaction feature, when a user selects a storm classification by choosing a bubble of a specific size/color on the bubblechart, the points on the scatterplot that represent storms with a different classification disappear, so that the user only sees the points that correspond to storms with the selected storm classification.  This makes it easier for the user to see how the correlation between the maximum sustained wind speed and sea surface temperature delta changes based on the classification of the storm.

In [9]:
# Clicking a bubble selects every record with the same storm classification.
selection = alt.selection(type='multi', fields=['Storm Classification'])

# Bubblechart: color by classification (light grey when not selected), size by status value.
mld_viz = alt.Chart(data).mark_circle().encode(
    x = 'Mixed Layer Depth',
    y = 'Maximum Wind Speed',
    color = alt.condition(selection, 'Storm Classification', alt.value('lightgrey')),
    size = 'Status Values',
    tooltip = ['Name', 'Maximum Wind Speed', 'Mixed Layer Depth', 'Status Values']
).add_selection(selection).interactive()

# Scatterplot: shows only the records whose classification is selected in the bubblechart.
sst_viz = alt.Chart(data).mark_circle().encode(
    x = 'Delta SST',
    y = 'Maximum Wind Speed',
    color = 'Storm Classification',
    tooltip = ['Name', 'Delta SST', 'Maximum Wind Speed']
).transform_filter(selection).interactive()

# Place the two charts side by side.
mld_viz | sst_viz

### Visualization 3 Analysis

Observe that in the bubblechart shown above, there is no clear correlation or relationship between the maximum sustained wind speed of a tropical storm or hurricane and the mixed layer depth of the ocean. Instead, the maximum sustained wind speed can vary widely between tropical storms or hurricanes that encounter regions of the ocean with the same mixed layer depth. Storms that are weaker, meaning they are classified as tropical waves, lows, subtropical depressions, subtropical storms, or tropical depressions, have relatively uniform maximum sustained wind speed values across several different values of mixed layer depth. For stronger storms, meaning storms that are classified as tropical storms, hurricanes, or extratropical cyclones, there is larger variability in the maximum sustained wind speed values across various values of mixed layer depth. However, there is no clear relationship or correlation between the maximum sustained wind speed and the mixed layer depth, even for these stronger storms. This indicates that there is no clear correlation between the maximum sustained wind speed of a tropical storm or hurricane and the mixed layer depth. Recall that this can occur because shallow mixed layers close to the coast contain warm water, which helps storms strengthen and sustain higher maximum wind speeds; at the same time, storms also tend to form in these shallower coastal regions and have lower wind speeds while they are first forming there.

Similarly, on the open ocean, storms can strengthen at different rates. One storm might encounter a warmer region of sea surface temperature with a large mixed layer depth, while another encounters a slightly cooler region with a similarly large mixed layer depth. The first storm would then have a higher maximum sustained wind speed than the second, despite both being in areas of roughly equal mixed layer depth.


### Visualization 4 Analysis

As with the bubblechart, the scatterplot shown above shows no clear correlation between the maximum sustained wind speed of a tropical storm or hurricane and the sea surface temperature delta. There is large variability in the maximum sustained wind speed among storms with the same sea surface temperature delta. Even when the bubblechart, and hence the scatterplot, is filtered by storm classification, there is no clear relationship or correlation between the maximum sustained wind speed of a storm and the sea surface temperature delta. As in the bubblechart, storms with weaker classifications (tropical wave, low, subtropical storm, subtropical depression, tropical depression) have lower variability in the maximum sustained wind speed at the same value of sea surface temperature delta, even though there is no relationship between the two quantities. Similarly, storms with stronger classifications (tropical storm, hurricane, extratropical cyclone) have larger variability in maximum sustained wind speed despite having the same value of sea surface temperature delta. This can result from the differing mixed layer depths throughout the ocean and the differing rates of storm strengthening. Over the middle of the North Atlantic, the mixed layer depths tend to be shallower, so a storm with a stronger maximum sustained wind speed will cause a larger sea surface temperature delta than a storm with a weaker maximum sustained wind speed. However, when a storm with a strong maximum sustained wind speed encounters a coastal region with a deeper mixed layer depth, the water that replaces the displaced warm surface water is still relatively warm, so the sea surface temperature delta is still relatively low despite the high maximum sustained wind speed.
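The observations for the bubblechart and the scatterplot, namely the spread of wind speeds within each classification and the lack of a clear relationship with mixed layer depth or sea surface temperature delta, can also be checked numerically. The sketch below computes, for each storm classification, the standard deviation of the wind speed and its Pearson correlation with each ocean variable. The toy values are hypothetical and were chosen only to exercise the code, so the numbers they produce say nothing about the 2005 season.

```python
import pandas as pd

# Toy records for two classifications (hypothetical values).
toy = pd.DataFrame({
    'Storm Classification': ['TD'] * 4 + ['HU'] * 4,
    'Maximum Wind Speed': [25, 30, 30, 25, 70, 120, 90, 150],
    'Mixed Layer Depth': [20.0, 35.0, 50.0, 65.0, 20.0, 35.0, 50.0, 65.0],
    'Delta SST': [-0.2, -0.6, -0.4, -0.8, -0.3, -0.9, -0.5, -1.2],
})

# Per-classification spread of wind speed and its correlation with each predictor.
rows = {}
for cls, group in toy.groupby('Storm Classification'):
    wind = group['Maximum Wind Speed']
    rows[cls] = {
        'wind_std': wind.std(),
        'r_mld': wind.corr(group['Mixed Layer Depth']),
        'r_dsst': wind.corr(group['Delta SST']),
    }

# One row per classification, one column per statistic.
summary = pd.DataFrame(rows).T
```

On the real data, correlations near zero in every row would agree with the reading of the two plots, while a larger `wind_std` for the stronger classifications would reflect the greater variability noted above.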

### Conclusions

Overall, we see a strong negative linear relationship between the maximum sustained wind speed and minimum pressure of tropical storms and hurricanes, no clear relationship between the maximum sustained wind speed of tropical storms and hurricanes and mixed layer depth, and no clear relationship between the maximum sustained wind speed of tropical storms and hurricanes and the sea surface temperature delta.

### Code Citations

For this module, I used the code from the Altair lectures to guide the design of the bubblechart and scatterplots.

For the histogram, I referenced the code from the Altair gallery, at: https://altair-viz.github.io/gallery/simple_histogram.html

For the interactions between the charts, I referenced the suggested notebook in the description for Module 2 on Canvas, available at: https://infovis.fh-potsdam.de/tutorials/infovis3interaction.html