### CSC110 Final Project | University of Toronto

Source code distributed under the MIT License.    
Datasets distributed under their respective original licenses.

```
MIT License

Copyright (c) 2020 Mu "Samm" Du

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

```

# The Effect of Global Warming on Hurricane and Typhoon Occurrence

### 0. Install & Import Dependencies

In [1]:
!pip install numpy
!pip install scikit-learn
!pip install plotly

You should consider upgrading via the '/home/samm/Documents/UofT/Year1/Courses/2020_Fall/CSC110/csc110_env/bin/python3.8 -m pip install --upgrade pip' command.[0m
You should consider upgrading via the '/home/samm/Documents/UofT/Year1/Courses/2020_Fall/CSC110/csc110_env/bin/python3.8 -m pip install --upgrade pip' command.[0m
You should consider upgrading via the '/home/samm/Documents/UofT/Year1/Courses/2020_Fall/CSC110/csc110_env/bin/python3.8 -m pip install --upgrade pip' command.[0m


In [2]:
import csv
import pprint as pp
import math
import datetime as dt
import numpy as np
import sklearn as skl
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### 1. Preprocess Data

#### 1.1. Preprocess Berkeley Earth Temperature Data

Note from the data file:
```
Temperatures are in Celsius and reported as anomalies 
relative to the Jan 1951-Dec 1980 average. Uncertainties represent the 95% confidence 
interval for statistical and spatial undersampling effects as well as ocean biases.
```

```
Estimated Jan 1951-Dec 1980 global mean temperature (C)
    Using air temperature above sea ice:   14.168 +/- 0.045
    Using water temperature below sea ice: 14.714 +/- 0.045
```

There are 2 sets of data within this file, one where "Sea Ice Temperature Inferred from Air Temperatures", the other where "Sea Ice Temperature Inferred from Water Temperatures". 

I will only use the dataset where sea ice temperatures are inferred from water temperatures, the specific choice is not relevant for my analysis, and I do not have the domain-specific knowledge to weigh the benefits of each one.

I will also trim off columns not relevant for my analysis

In [3]:
# reference temperature
temp_ref = 14.714
temp_ref_unc = 0.045

# only retain relevant data
temp_data_trimmed = []
with open('data/berkeley-earth-global-temperature.txt', 'r') as temp_file:
    correct_line = False
    for line in temp_file:
        # only use data where Sea Ice Temperature Inferred from Water Temperatures
        if "Water Temperatures" in line:
            correct_line = True
        # skip header, empty lines, and data where Sea Ice Temperature Inferred from Air Temperatures,
        # and only use the year, month, temperature anomaly, and uncertainty values
        if correct_line and line != " \n" and line[0] != "%":
            temp_data_trimmed.append(line.strip().split()[:4])
        # print out omitted header
        elif correct_line:
            print("[HEADER]> ", line)

[HEADER]>  % Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Water Temperatures

[HEADER]>  % 

[HEADER]>  %                  Monthly          Annual          Five-year        Ten-year        Twenty-year

[HEADER]>  % Year, Month,  Anomaly, Unc.,   Anomaly, Unc.,   Anomaly, Unc.,   Anomaly, Unc.,   Anomaly, Unc.

[HEADER]>   



The trimmed temperature data is organized as follows:
```
[year: Str, month: Str, temperature_anomaly: Str, uncertainty: Str]
```

In [4]:
# print out processed data for visual inspection
pp.pprint(temp_data_trimmed[:10])
pp.pprint(temp_data_trimmed[-10:])

[['1850', '1', '-0.698', '0.385'],
 ['1850', '2', '-0.196', '0.372'],
 ['1850', '3', '-0.420', '0.355'],
 ['1850', '4', '-0.602', '0.311'],
 ['1850', '5', '-0.611', '0.218'],
 ['1850', '6', '-0.352', '0.240'],
 ['1850', '7', '-0.217', '0.248'],
 ['1850', '8', '-0.214', '0.244'],
 ['1850', '9', '-0.425', '0.210'],
 ['1850', '10', '-0.584', '0.223']]
[['2020', '1', '1.060', '0.051'],
 ['2020', '2', '1.084', '0.059'],
 ['2020', '3', '0.992', '0.056'],
 ['2020', '4', '0.955', '0.058'],
 ['2020', '5', '0.820', '0.055'],
 ['2020', '6', '0.761', '0.058'],
 ['2020', '7', '0.801', '0.055'],
 ['2020', '8', '0.798', '0.054'],
 ['2020', '9', '0.794', '0.052'],
 ['2020', '10', '0.714', '0.065']]


Print out the number of entries:

In [5]:
print(len(temp_data_trimmed))

2050


I will convert values to more appropriate datatypes, apply the reference temperature to the monthly anomaly data, and compute the combined uncertainties.   

The processed temperature data would be organized as follows:
```
[year: Int, month: Int, monthly_temerature_celsius: Float, uncertainty: Float]
```

In [57]:
temp_data = []

for line in temp_data_trimmed:
    
    line_proc = []

    # convert the year value to an integer
    line_proc.append(int(line[0]))

    # convert the month value to an integer
    line_proc.append(int(line[1]))

    # apply the reference temperature to the monthly anomaly
    line_proc.append(float(line[2]) + temp_ref)

    # compute the combined uncertainties between the monthly uncertainty and the reference uncertainty
    line_proc.append(
        math.sqrt(
            float(line[3]) ** 2 + temp_ref_unc ** 2
        )
    )

    temp_data.append(line_proc)
    

In [58]:
# print out processed data for visual inspection
pp.pprint(temp_data[:10])
pp.pprint(temp_data[-10:])

[[1850, 1, 14.016, 0.3876209488662861],
 [1850, 2, 14.518, 0.3747118893229837],
 [1850, 3, 14.294, 0.3578407467016578],
 [1850, 4, 14.112, 0.31423876272668844],
 [1850, 5, 14.103, 0.22259604668547012],
 [1850, 6, 14.362, 0.2441823089414956],
 [1850, 7, 14.497, 0.2520495982936692],
 [1850, 8, 14.5, 0.24811489274124598],
 [1850, 9, 14.289, 0.21476731594914528],
 [1850, 10, 14.13, 0.22749505489130967]]
[[2020, 1, 15.774000000000001, 0.0680147042925278],
 [2020, 2, 15.798, 0.07420242583635658],
 [2020, 3, 15.706, 0.07184010022264724],
 [2020, 4, 15.669, 0.07340980860893181],
 [2020, 5, 15.534, 0.07106335201775947],
 [2020, 6, 15.475, 0.07340980860893181],
 [2020, 7, 15.515, 0.07106335201775947],
 [2020, 8, 15.512, 0.07029224708315988],
 [2020, 9, 15.508000000000001, 0.06876772498781678],
 [2020, 10, 15.428, 0.07905694150420949]]


Print out the number of entries (which should be the same as before):

In [59]:
print(len(temp_data))

2050


Plotting the Berkeley Earth Global Temperature Data

In [62]:
# procure x- and y-values for the plot

xvals = [
    dt.datetime(year=line[0], month=line[1], day=1)
    for line in temp_data
]

yvals = [line[2] for line in temp_data]

In [72]:
fig = go.Figure()

# produce a scatter plot of all temperature measurements
fig.add_trace(
    go.Scatter(
        x=xvals,
        y=yvals,
        mode='markers',
        name="Monthly Temperature",
        marker=dict(
            size=5,
            cmax=max(yvals),
            cmin=min(yvals),
            color=[line[2] for line in temp_data],
            colorscale="Inferno"
        ),
    )
)

# plot the reference temperature used to record anomaly values in the original dataset
fig.add_trace(
    go.Scatter(
        x=[dt.datetime(year=1850, month=1, day=1), dt.datetime(year=2020, month=10, day=1)],
        y=[temp_ref, temp_ref],
        mode="lines",
        line=go.scatter.Line(color="gray"),
        name="Jan 1951-Dec 1980 <br>global mean temp."
    )
)

fig.update_layout(
    title="Berkeley Earth | Monthly Global Temperature 1850-2014",
    xaxis_title="Year and Month",
    yaxis_title="Global Temperature (°C)",
)

fig.show()

#### 1.2 Preprocess NOAA Hurricane and Typhoon Data

Raw HURDAT2 format references:
* Atlantic: https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-nov2019.pdf
* Pacific: https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-nencpac.pdf

I will read the NOAA HURDAT2 data into the following format for each unique hurricane and typhoon event:
```
[ID: Str, [(year: Int, month: Int)]]
```

In [67]:
def read_noaa_data(filename):
    """Read the NOAA Hurricane and Typhoon data given by the filename."""
    
    noaa_data = []
    
    with open(filename, 'r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')

        curr_event = ['', 0, 0, 0, 0]
        
        for line in csv_reader:

            # identify event header
            if len(line) <= 10:
                # if current event is already populated, append it to the
                # dataset and clear the current event
                if curr_event != ['', 0, 0, 0, 0]:
                    noaa_data.append(curr_event)
                    curr_event = ['', 0, 0, 0, 0]
                # record the event ID to the current event
                curr_event[0] = line[0]

            else:
                # populate start year and month if not populated
                if curr_event[1] == 0 and curr_event[2] == 0:
                    curr_event[1] = int(line[0][:4])
                    curr_event[2] = int(line[0][4:6])
                # update the end year and month if the current line's value is graeter than the
                #https://gitlab.com/sammdu/csc110-final-project/-/raw/master/main.ipynb existing values
                if int(line[0]) > int(str(curr_event[3]) + str(curr_event[4])):
                    curr_event[3] = int(line[0][:4])
                    curr_event[4] = int(line[0][4:6])

    return noaa_data

In [68]:
# read and process NOAA data
ht_data_atl = read_noaa_data("data/noaa-atlantic.txt")
ht_data_pac = read_noaa_data("data/noaa-pacific.txt")

In [69]:
# print out processed atlantic data for visual inspection
pp.pprint(ht_data_atl[:10])
pp.pprint(ht_data_atl[-10:])

[['AL011851', 1851, 6, 1851, 6],
 ['AL021851', 1851, 7, 1851, 7],
 ['AL031851', 1851, 7, 1851, 7],
 ['AL041851', 1851, 8, 1851, 8],
 ['AL051851', 1851, 9, 1851, 9],
 ['AL061851', 1851, 10, 1851, 10],
 ['AL011852', 1852, 8, 1852, 8],
 ['AL021852', 1852, 9, 1852, 9],
 ['AL031852', 1852, 9, 1852, 9],
 ['AL041852', 1852, 9, 1852, 9]]
[['AL102019', 2019, 9, 2019, 9],
 ['AL112019', 2019, 9, 2019, 9],
 ['AL122019', 2019, 9, 2019, 9],
 ['AL132019', 2019, 9, 2019, 10],
 ['AL142019', 2019, 10, 2019, 10],
 ['AL152019', 2019, 10, 2019, 10],
 ['AL162019', 2019, 10, 2019, 10],
 ['AL172019', 2019, 10, 2019, 10],
 ['AL182019', 2019, 10, 2019, 10],
 ['AL192019', 2019, 10, 2019, 11]]


In [70]:
# print out processed pacific data for visual inspection
pp.pprint(ht_data_pac[:10])
pp.pprint(ht_data_pac[-10:])

[['EP011949', 1949, 6, 1949, 6],
 ['EP021949', 1949, 6, 1949, 6],
 ['EP031949', 1949, 9, 1949, 9],
 ['EP041949', 1949, 9, 1949, 9],
 ['EP051949', 1949, 9, 1949, 9],
 ['EP061949', 1949, 9, 1949, 9],
 ['EP011950', 1950, 6, 1950, 6],
 ['EP021950', 1950, 7, 1950, 7],
 ['EP031950', 1950, 7, 1950, 7],
 ['CP011950', 1950, 8, 1950, 8]]
[['EP102019', 2019, 8, 2019, 8],
 ['EP112019', 2019, 9, 2019, 9],
 ['EP122019', 2019, 9, 2019, 9],
 ['EP132019', 2019, 9, 2019, 9],
 ['EP142019', 2019, 9, 2019, 9],
 ['EP152019', 2019, 9, 2019, 9],
 ['EP162019', 2019, 9, 2019, 10],
 ['EP182019', 2019, 10, 2019, 10],
 ['EP192019', 2019, 10, 2019, 10],
 ['EP202019', 2019, 11, 2019, 11]]


I will aggregate the hurricane and typhoon data into number of events per month, into the following format:
```
[year: Int, month: Int, event_count: Int]
```

In [None]:
def count_ht_data(data):
    """Return a dictionary of the number of hurricane events for every month from the input data."""
    event_counts = {}
    
    for event in data:
        pass
    
    return None