<a id=toc></a>
# MSDS 7333 - Week 14 Case Study: Analyzing Airline Flight Delays

### Investigators
- [Matt Baldree](mailto:mbaldree@smu.edu?subject=lab14)
- [Ben Brock](mailto:bbrock@smu.edu?subject=lab14)
- [Tom Elkins](mailto:telkins@smu.edu?subject=lab14)
- [Austin Kelly](mailto:ajkelly@smu.edu?subject=lab14)


<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:5px;'>
    <h3>Instructions</h3>
    <p>Work with the airline data set (use R or Python to manage out-of-core).</p>
     <p>Answer the following questions by using the split-apply-combine technique:</p>
    <ol>
        <li>Which airports are most likely to be delayed flying out of or into?</li>
        <li>Which flights with same origin and destination are most likely to be delayed?</li>
        <li>Can you regress how delayed a flight will be before it is delayed?</li>
        <li>What are the most important features for this regression?
            <ul>
            <li>Remember to properly cross-validate models.
            <li>Use meaningful evaluation criteria.
            <li>Create at least one new feature variable for the regression.
            </ul>
            
    </ol> 
            

    <p>Report Sections:</p>
    <ol>
        <li>[Introduction](#introduction) <b>(5 points)</b></li>
        <li>[Background](#background) <b>(10 points)</b></li>
        <li>[Methods](#methods) <b>(30 points)</b></li>
        <li>[Results](#results) <b>(30 points)</b></li>
        <li>[Conclusion](#conclusion) <b>(5 points)</b></li>
        <li>[Bibliography and Citation](#biblio) <b>(5 points)</b></li>
        <li>[Code](#code) <b>(5 points)</b></li>
    </ol>
     <p>Other Grading Criteria:</p>
    <ol>
        <li>Grammar and Organization <b>(10 points)</b></li>
    </ol>
</div>

<a id='introduction'></a>
## 1 - Introduction
<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Introduction (<b>5 points total</b>)</h3>
</div>

### Explain what we will be working with and how we intend to approach the issue

For this case study, we are tasked with acquiring and combining airline data from 22 separate years of airline history. Once the data is downloaded, it is parsed and appended to a data frame from which we can compute summary statistics. With such a large amount of data, conventional in-memory methods for aggregation and calculation become notably difficult to use.

The data in question totals just over 123.5 million records and occupies about 14 gigabytes **uncompressed** as CSV files alone.

That's a lot of data.

In order to not only handle the data but also perform calculations over the data frame, we need to use more than a single core of the (currently) 4-core processors in our machines. Using more than one processor core brings us into the realm of parallel computing. As we parse and sift through the data, parallel computing allows a simple idea: break the data into even parts and process all of the parts at the same time. Many titans of industry use platforms such as the Hadoop Distributed File System (HDFS) to manage massive amounts of data relatively quickly on clusters of commodity servers. When a massive data file comes through (in our case, 12-14 GB), instead of using a single core to process all of the data, we use three cores to process 4-5 GB of data _each_, leaving a spare core (the master) to manage the other three.
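
The split, process, and combine idea described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the `delays` list is made-up data, the helper names `split_into_chunks` and `count_delayed` are hypothetical, and a thread pool stands in for the three worker cores (it is not how Dask schedules work internally).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list into n_chunks contiguous parts of near-equal size."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def count_delayed(chunk, threshold=15):
    """Apply step: count the delays in one chunk that exceed the threshold."""
    return sum(1 for delay in chunk if delay > threshold)


# Hypothetical departure delays in minutes (illustrative values only).
delays = [3, 22, -1, 47, 16, 0, 15, 90, 5]

# Split into three parts, process each part on its own worker, then combine.
chunks = split_into_chunks(delays, 3)
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_delayed, chunks))

total_delayed = sum(partial_counts)  # combine step
print(partial_counts, total_delayed)
```

Each worker only ever sees its own chunk, so the same pattern carries over when the chunks are multi-gigabyte files instead of short lists.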

For this case study, we met many roadblocks, such as software compatibility with hardware and managing software versions. We found it quite difficult to manage older versions of R alongside the newest version of Python, all in the same Jupyter notebook. To minimize these roadblocks, our team used the Python 3.4 package [Dask](https://dask.pydata.org/en/latest/) along with Python 2.7's [Graphlab-Create](https://turi.com/). Once these processes were executed in their entirety, we decided to cross-validate our findings by building an equivalent JavaScript environment to test them independently.

[&uarr; ToC](#toc)

<a id="background"></a>
## 2 - Background

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Background (<b>10 points total</b>)</h3>
</div>

The dataset our group acquired comprises just over 123 million records with 29 attributes. These attributes are described in the following table:

### Variable descriptions of original data set
|Item|Name|Description|
|:--:|:--|:--|
|1|	Year	|1987-2008|
|2|	Month	|1-12|
|3|	DayofMonth	|1-31|
|4|	DayOfWeek	|1 (Monday) - 7 (Sunday)|
|5|	DepTime	|actual departure time (local, hhmm)|
|6|	CRSDepTime	|scheduled departure time (local, hhmm)|
|7|	ArrTime	|actual arrival time (local, hhmm)|
|8|	CRSArrTime	|scheduled arrival time (local, hhmm)|
|9|	UniqueCarrier	|unique carrier code|
|10|	FlightNum	|flight number|
|11|	TailNum	|plane tail number|
|12|	ActualElapsedTime	|in minutes|
|13|	CRSElapsedTime	|in minutes|
|14|	AirTime	|in minutes|
|15|	ArrDelay	|arrival delay, in minutes|
|16|	DepDelay	|departure delay, in minutes|
|17|	Origin	|origin IATA airport code|
|18|	Dest	|destination IATA airport code|
|19|	Distance	|in miles|
|20|	TaxiIn	|taxi in time, in minutes|
|21|	TaxiOut	|taxi out time, in minutes|
|22|	Cancelled	|was the flight cancelled?|
|23|	CancellationCode	|reason for cancellation (A = carrier, B = weather, C = NAS, D = security)|
|24|	Diverted	|1 = yes, 0 = no|
|25|	CarrierDelay	|in minutes|
|26|	WeatherDelay	|in minutes|
|27|	NASDelay	|in minutes|
|28|	SecurityDelay	|in minutes|
|29|	LateAircraftDelay	|in minutes|

The three most-important (and required) questions are:

(click on each question to navigate to the section of the notebook)

<li>[Q1. What airports have the most delayed departures and arrivals?](#Question1)</li> 
<li>[Q2. What flights are most frequently delayed with same origin and destination?](#Question2)</li>
<li>[Q3. Can you predict a flight's delayed time in minutes?](#Question3)</li>

While these questions may seem obvious, it is important to state clearly what we intend to explore so that we arrive at appropriate answers to the right questions.

First and foremost, we want to investigate (using basic aggregation functions) which airports are the main culprits for delayed departures and which are subject to late arrivals. For this study, a flight is considered delayed if it leaves or arrives more than 15 minutes after its scheduled time. Something to investigate at a later date (when adequate resources are available) is whether late departures influence late arrivals more than late arrivals affect late departures.

The second question calls for investigating whether a specific route is plagued with such delays. With so many unique routes, it will be interesting to see whether one route really sticks out over the rest. Since our analysis is limited to our data, we will not be seeing many "entire" routes. This is because many entire routes (e.g. New York to Los Angeles) are typically made up of multiple sub-routes. Thus, we will focus on the sub-routes that make up the longer routes.

With all of the data we have at our disposal, we will explore the possibility of being able to predict just _how_ delayed a flight will be based on the many factors involved. While we do have numerous factors to possibly influence the outcome of our predictions, there are also several factors outside of the scope of this study that will be considered to be confounding variables. One such variable is the weather of the locations involved. As difficult as it may be to predict the delay of a particular flight based on the day of the week coupled with the carrier, it will be far more difficult to predict exact snowfall along with wind speed for the area in question, ultimately grounding unsuspecting travelers. 

[&uarr; ToC](#toc)

<a id="methods"></a>
## 3 - Methods

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Methods (<b>30 points total</b>)</h3>
</div>

### Discuss Dask approach and our decision to go this route

In [2]:
# general stuff
import locale
# locale.setlocale(locale.LC_ALL, 'en_US')

import numpy as np

In [3]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('compute.use_bottleneck', True)
pd.set_option('compute.use_numexpr', True)

In [4]:
import os

In [5]:
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

In [6]:
import dask.dataframe as dd
from dask import compute, persist
from dask.distributed import Client, progress

# !pwd

In [7]:
# flags

PURGE_DATA = True

<div style='color:red'>
<h3>&darr; consider turning the results of this into a screenshot of whatever machine we use to do our final pass. </h3></div>

In [8]:
# start Dask distributed client and print out stats

c = Client()
c

tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x0000022991DEC730>, <tornado.concurrent.Future object at 0x0000022991DE7518>)
Traceback (most recent call last):
  File "C:\Users\austi\Anaconda3\lib\site-packages\bokeh\server\tornado.py", line 437, in _start_async
    signal.signal(signal.SIGTERM, self._sigterm)
  File "C:\Users\austi\Anaconda3\lib\signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\austi\Anaconda3\lib\site-packages\tornado\ioloop.py", line 604, in _run_callback
    ret = callback()
  File "C:\Users\austi\Anaconda3\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\austi\Anaconda3\lib\site-packages\tornado\ioloop.py", line

|Client|Cluster|
|:--|:--|
|Scheduler: tcp://127.0.0.1:50358<br>Dashboard: http://127.0.0.1:8787|Workers: 4<br>Cores: 4<br>Memory: 2.53 GB|


In [9]:
# if loading data, then purge existing data directory.

if PURGE_DATA:
    # delete /data directory
    from shutil import rmtree

    path = 'data'
    if os.path.exists(path):
        rmtree(path)
    
    # make /data if it doesn't exist
    path = 'data'
    if not os.path.exists(path):
        os.mkdir(path)

In [11]:
# if loading data, download files and decompress them in parallel.

DOWNLOAD_ONE_FILE_ONLY = False

if PURGE_DATA:
    import urllib.request
    import shutil
    import bz2
    
    def download_file(baseurl, yr):
        file_name = ''

        url_of_data_file = baseurl%(yr)
        file_name = 'data/%d.csv'%(yr)
        size = 0

        print('downloading', url_of_data_file, 'to', file_name)
        decompressor = bz2.BZ2Decompressor()

        # download file and decompress it
        with urllib.request.urlopen(url_of_data_file) as response, open(file_name, 'wb') as out_file:
                data = decompressor.decompress(response.read())
                out_file.write(data)
                size = len(data)
                print('file size (MB)', locale.format_string('%.1f', size/1000000, grouping=True))

        return(file_name, size)
    
    def print_files(files):
        totalSize = 0        
        for f in files:
            size = f[1]/1000000
            totalSize += size
            print('downloaded file:', f[0], ', of size (MB):', 
                  locale.format_string('%.1f', size, grouping=True))
            
        print('Number of files downloaded:', len(files), 'for a total size (MB):', 
              locale.format_string('%.1f', totalSize, grouping=True))


    if DOWNLOAD_ONE_FILE_ONLY:
        # testing
        download_file('http://stat-computing.org/dataexpo/2009/%d.csv.bz2', 1988)
    else:    
        # download airline data from 1987 through 2008 (range() excludes the end value 2009)
        yrs = range(1987, 2009)
        baseurl = 'http://stat-computing.org/dataexpo/2009/%d.csv.bz2'

        from dask import delayed
        download_file = delayed(download_file)

        files = [download_file(baseurl, yr) for yr in yrs]
        files = delayed(files)

        %time files = files.compute()   
      
        print_files(files)

Wall time: 11min 3s
downloaded file: data/1987.csv , of size (MB): 127.2
downloaded file: data/1988.csv , of size (MB): 501.0
downloaded file: data/1989.csv , of size (MB): 486.5
downloaded file: data/1990.csv , of size (MB): 509.2
downloaded file: data/1991.csv , of size (MB): 491.2
downloaded file: data/1992.csv , of size (MB): 492.3
downloaded file: data/1993.csv , of size (MB): 490.8
downloaded file: data/1994.csv , of size (MB): 501.6
downloaded file: data/1995.csv , of size (MB): 530.8
downloaded file: data/1996.csv , of size (MB): 533.9
downloaded file: data/1997.csv , of size (MB): 540.3
downloaded file: data/1998.csv , of size (MB): 538.4
downloaded file: data/1999.csv , of size (MB): 552.9
downloaded file: data/2000.csv , of size (MB): 570.2
downloaded file: data/2001.csv , of size (MB): 600.4
downloaded file: data/2002.csv , of size (MB): 530.5
downloaded file: data/2003.csv , of size (MB): 626.7
downloaded file: data/2004.csv , of size (MB): 669.9
downloaded file: data/2005

In [12]:
# print the head of a csv file

print('csv file format')
# !head data/1987.csv

# load csv files into a dataframe in parallel

filename = os.path.join('data', '*.csv')
print('Loading', filename, 'files')

import dask.dataframe as dd
%time df_csv = dd.read_csv(filename, assume_missing=True, \
                           dtype={'TailNum':object, 'CancellationCode':object}, \
                           storage_options={'anon': True}).rename(columns=str.lower)

print(df_csv.dtypes)
df_csv.head()

csv file format
Loading data\*.csv files
Wall time: 523 ms
year                 float64
month                float64
dayofmonth           float64
dayofweek            float64
deptime              float64
crsdeptime           float64
arrtime              float64
crsarrtime           float64
uniquecarrier         object
flightnum            float64
tailnum               object
actualelapsedtime    float64
crselapsedtime       float64
airtime              float64
arrdelay             float64
depdelay             float64
origin                object
dest                  object
distance             float64
taxiin               float64
taxiout              float64
cancelled            float64
cancellationcode      object
diverted             float64
carrierdelay         float64
weatherdelay         float64
nasdelay             float64
securitydelay        float64
lateaircraftdelay    float64
dtype: object


Unnamed: 0,year,month,dayofmonth,dayofweek,deptime,crsdeptime,arrtime,crsarrtime,uniquecarrier,flightnum,tailnum,actualelapsedtime,crselapsedtime,airtime,arrdelay,depdelay,origin,dest,distance,taxiin,taxiout,cancelled,cancellationcode,diverted,carrierdelay,weatherdelay,nasdelay,securitydelay,lateaircraftdelay
0,1987.0,10.0,14.0,3.0,741.0,730.0,912.0,849.0,PS,1451.0,,91.0,79.0,,23.0,11.0,SAN,SFO,447.0,,,0.0,,0.0,,,,,
1,1987.0,10.0,15.0,4.0,729.0,730.0,903.0,849.0,PS,1451.0,,94.0,79.0,,14.0,-1.0,SAN,SFO,447.0,,,0.0,,0.0,,,,,
2,1987.0,10.0,17.0,6.0,741.0,730.0,918.0,849.0,PS,1451.0,,97.0,79.0,,29.0,11.0,SAN,SFO,447.0,,,0.0,,0.0,,,,,
3,1987.0,10.0,18.0,7.0,729.0,730.0,847.0,849.0,PS,1451.0,,78.0,79.0,,-2.0,-1.0,SAN,SFO,447.0,,,0.0,,0.0,,,,,
4,1987.0,10.0,19.0,1.0,749.0,730.0,922.0,849.0,PS,1451.0,,93.0,79.0,,33.0,19.0,SAN,SFO,447.0,,,0.0,,0.0,,,,,


In [13]:
# drop columns we don't need or want
df_csv = df_csv.drop(['tailnum', 'actualelapsedtime', 'crselapsedtime', 'airtime', \
             'taxiin', 'taxiout', 'cancelled', 'cancellationcode', \
             'diverted', 'carrierdelay', 'weatherdelay', 'nasdelay', \
             'securitydelay', 'lateaircraftdelay'], axis=1)

In [None]:
# categorize appropriate columns

object_columns = ['uniquecarrier', 'origin', 'dest', 'year', 'month', 'dayofmonth', 'dayofweek']
for i in object_columns:
    df_csv[i] = df_csv[i].astype('category')

df_csv = df_csv.categorize()
df_csv.dtypes

In [1]:
# length of dataframe

number_of_items = len(df_csv)
locale.format_string('%d', number_of_items, grouping=True)

NameError: name 'df_csv' is not defined

In [None]:
# Top 10 origin airports

origin_counts = df_csv.origin.value_counts().head(10)
print('Top origin airports')
print(origin_counts)

origin_counts.plot(kind='barh', figsize=(8,4), title='Top Origin Airports')

[&uarr; ToC](#toc)

<a id=Question1></a>
## Q1. What airports have the most delayed departures and arrivals?

A flight is delayed if it leaves or arrives more than 15 minutes after its scheduled time.
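
Raw counts of delayed flights tend to favor the busiest airports, so one complementary view is the share of flights delayed at each airport. The sketch below applies split-apply-combine with the 15-minute rule in plain Python. It is an illustration only: the `flights` records are a tiny made-up sample, and `delay_rate_by_airport` is a hypothetical helper, not part of the notebook or of Dask.

```python
from collections import defaultdict

DELAY_THRESHOLD_MIN = 15  # a flight counts as delayed beyond this many minutes


def delay_rate_by_airport(flights, airport_key, delay_key):
    """Return {airport: fraction of its flights delayed past the threshold}."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for flight in flights:
        delay = flight[delay_key]
        if delay is None:  # missing delay (for example a cancelled flight): skip
            continue
        airport = flight[airport_key]  # split: group by airport code
        totals[airport] += 1
        if delay > DELAY_THRESHOLD_MIN:  # apply: test the 15-minute rule
            delayed[airport] += 1
    # combine: one delay rate per airport
    return {airport: delayed[airport] / totals[airport] for airport in totals}


# Small illustrative sample; the values are assumptions, not study results.
flights = [
    {"origin": "SAN", "dest": "SFO", "depdelay": 11, "arrdelay": 23},
    {"origin": "SAN", "dest": "SFO", "depdelay": -1, "arrdelay": 14},
    {"origin": "SAN", "dest": "SFO", "depdelay": 19, "arrdelay": 33},
    {"origin": "SFO", "dest": "SAN", "depdelay": 40, "arrdelay": None},
]

departure_rates = delay_rate_by_airport(flights, "origin", "depdelay")
arrival_rates = delay_rate_by_airport(flights, "dest", "arrdelay")
print(departure_rates)
print(arrival_rates)
```

Ranking by rate rather than count answers "most likely to be delayed" more directly, although airports with very few flights may need a minimum-volume cutoff before their rates are meaningful.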

In [None]:
# Filter departure flights

df_delayed_departure = df_csv[df_csv.depdelay > 15]
delayed_counts = df_delayed_departure.origin.value_counts().head(10)
print('Top delayed origin airports')
print(delayed_counts)

delayed_counts.plot(kind='barh', figsize=(8,4), title='Top Delayed Origin Airports')

In [None]:
# Filter arrival flights

df_delayed_arrival = df_csv[df_csv.arrdelay > 15]
delayed_counts = df_delayed_arrival.dest.value_counts().head(10)
print('Top delayed destination airports')
print(delayed_counts)

delayed_counts.plot(kind='barh', figsize=(8,4), title='Top Delayed Destination Airports')

[&uarr; ToC](#toc)

<a id = Question2></a>
## Q2. What flights are most frequently delayed with same origin and destination?

A flight is delayed if it leaves or arrives more than 15 minutes after its scheduled time.

In [None]:
# Filter the dataset to delayed flights

df_filtered = df_csv.loc[(df_csv.arrdelay > 15) | (df_csv.depdelay > 15)].compute()
df_filtered.head()

In [None]:
# categorize appropriate columns
# (df_filtered is a pandas DataFrame after .compute(), so dask's categorize() does not apply)

object_columns = ['origin', 'dest', 'flightnum']
for i in object_columns:
    df_filtered[i] = df_filtered[i].astype('category')

In [None]:
df_filtered.dtypes

In [None]:
# Group filtered dataset by origin, dest, and flightnum

# observed=True keeps only combinations that occur when the keys are categorical
grp = df_filtered.groupby(['origin', 'dest', 'flightnum'], observed=True) \
.flightnum.count().reset_index(name='count').sort_values(['count'], ascending=False)

grp.head(15)

In [None]:
# Plot results

grp.head(15).plot(kind='barh', figsize=(8,4), \
                  title='Top Delayed Flights with Same Origin and Destination Airports')

[&uarr; ToC](#toc)

<a id=Question3></a>
## Q3. Can you predict a flight's delayed time in minutes?

We create a model to predict flight delays. Features such as weather were not added because the weather on the day of a flight cannot be known accurately in advance. A new feature named `hdays` was added to indicate how many days a flight is from the nearest holiday, since holidays have a significant impact on travel. Other features might be helpful, but time did not permit further exploration.
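
As a rough illustration of the `hdays` idea, the distance to the nearest holiday can be found with a binary search over a sorted holiday list instead of scanning every holiday for every flight. This is a sketch under stated assumptions: `HOLIDAYS` is a short hand-picked list used only for the example, and `days_from_nearest_holiday` is a hypothetical helper name.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical holiday list for illustration only; it must be sorted.
HOLIDAYS = sorted([date(2008, 1, 1), date(2008, 7, 4), date(2008, 12, 25)])


def days_from_nearest_holiday(d, holidays=HOLIDAYS):
    """Return the number of days between d and the closest holiday."""
    i = bisect_left(holidays, d)  # index of the first holiday on or after d
    candidates = []
    if i < len(holidays):
        candidates.append((holidays[i] - d).days)      # next holiday
    if i > 0:
        candidates.append((d - holidays[i - 1]).days)  # previous holiday
    return float(min(candidates))


print(days_from_nearest_holiday(date(2008, 7, 1)))
```

Only the two neighbouring holidays are compared for each date, so the cost per flight grows with the logarithm of the number of holidays rather than linearly.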

[&darr; we should probably add this to the end &darr;](#biblio)

The work is borrowed from 
> `https://jessesw.com/Air-Delays/`,  
> `https://gist.github.com/mrocklin/19c89d78e34437e061876a9872f4d2df`, and 
> `https://github.com/dmlc/xgboost/blob/master/demo/guide-python/sklearn_examples.py`.

In [None]:
# Function to determine the difference between flight date and nearest holiday.
# see https://jessesw.com/Air-Delays/

from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
holidays = cal.holidays(start='1987-01-01', end='2008-12-31').to_pydatetime()

from datetime import date, datetime, timedelta

def days_until_holiday(d):
    ans = timedelta(100000)
    for i in range(len(holidays)):
        candidate = abs(holidays[i]-d)
        if candidate < ans:
            ans = candidate
    return(float(ans.days))

#assert(days_until_holiday(datetime(2001, 1, 1))==0)
#assert(days_until_holiday(datetime(2009, 1, 1))==7)

In [None]:
# random sample of data (features)
df = df_csv.sample(frac=0.05)

# drop rows with NaN
prev_count = len(df)
df = df.dropna()
number_of_items = len(df)
print('dropped %s percent of rows with NA' % locale.format_string('%.2f', 100.0 * (prev_count - number_of_items) / prev_count))

# target data
y = df.depdelay

df = df.drop(['depdelay'], axis=1)

df, y = persist(df, y)
progress(df, y)

In [None]:
# Using apply is slow. Need to see if there is a faster way.

# create days from nearest holiday column
df_csv['hdays'] = df_csv.apply(lambda r: days_until_holiday(datetime(int(r.year), int(r.month), int(r.dayofmonth))),
                               meta=('hdays', 'f8'), axis=1)

# convert scheduled time (hhmm) into hrs; column arithmetic avoids a slow row-wise apply
df_csv['depthr'] = (df_csv.crsdeptime // 100).astype('category')
df_csv['arrhr'] = (df_csv.crsarrtime // 100).astype('category')

df_csv.head()

In [None]:
# drop columns we don't need or want
df_csv = df_csv.drop(['deptime', 'arrtime', 'flightnum', 'crsarrtime', 'crsdeptime'], axis=1)
df_csv.dtypes

In [None]:
# categorize appropriate columns

object_columns = ['uniquecarrier', 'origin', 'dest', 'year', 'month', 'dayofmonth', 'dayofweek']
for i in object_columns:
    df[i] = df[i].astype('category')

df = df.categorize()

In [None]:
df.dtypes

In [None]:
# random sample of data (features)
df = df_csv.sample(frac=0.2)

# drop rows with NaN
prev_count = len(df)
df = df.dropna()
number_of_items = len(df)
print('dropped %s percent of rows with NA' % locale.format_string('%.2f', 100.0 * (prev_count - number_of_items) / prev_count))

# target data
y = df.depdelay

df = df.drop(['depdelay'], axis=1)

df, y = persist(df, y)
progress(df, y)

In [None]:
df.head()

In [None]:
y.head()

## One hot encode

In [None]:
X = dd.get_dummies(df.categorize())

In [None]:
X.describe().compute()

In [None]:
len(X.columns)

In [None]:
X.head()

[&uarr; ToC](#toc)

## Split and Train

In [None]:
data_train, data_test = X.random_split([0.9, 0.1], random_state=1234)
labels_train, labels_test = y.random_split([0.9, 0.1], random_state=1234)

## Model

In [None]:
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# scaler.fit(data_train)
# data_train = scaler.transform(data_train)
# data_test = scaler.transform(data_test)

In [None]:
from sklearn.linear_model import BayesianRidge
model = BayesianRidge()
model.fit(data_train, labels_train)

In [None]:
# This cell was interrupted by keyboard. Likely where we're going to leave it for the night.
from dask import delayed
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
# fit inside a delayed task and keep the fitted estimator that fit() returns
model = delayed(model.fit)(data_train, labels_train).compute()

In [None]:
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(random_state = 0) 
model.fit(data_train, labels_train)

In [None]:
from dask import delayed
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)
# fit inside a delayed task and keep the fitted estimator that fit() returns
model = delayed(model.fit)(data_train, labels_train).compute()

In [None]:
from sklearn.metrics import mean_absolute_error

y_true, y_pred = labels_test, model.predict(data_test)

# report the error for whichever model was fit last
print('Mean absolute error of %s regression was:' % type(model).__name__)
print(locale.format_string('%.1f', mean_absolute_error(y_true, y_pred), grouping=True))
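
A single 90/10 split gives only one error estimate. A k-fold loop, sketched below in plain Python, reports the mean absolute error on every held-out fold. This is a minimal sketch under stated assumptions: the `delays` values are made up, the helper names are hypothetical, and the "model" is just a predict-the-training-mean baseline standing in for the regressors above.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_rows, k, seed=1234):
    """Shuffle the row indices once and deal them into k disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_mean_baseline(y, k=5, seed=1234):
    """Return the held-out MAE of a mean-of-training-data baseline per fold."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        test = [y[i] for i in held_out]
        prediction = sum(train) / len(train)  # "fit" on the training folds only
        scores.append(mae(test, [prediction] * len(test)))
    return scores


# Hypothetical departure delays in minutes (illustrative values only).
delays = [11.0, -1.0, 11.0, -1.0, 19.0, 0.0, 35.0, 4.0, -3.0, 60.0]

fold_scores = cross_validate_mean_baseline(delays, k=5)
average_mae = sum(fold_scores) / len(fold_scores)
print(fold_scores, average_mae)
```

Any regression model is worth comparing against a baseline of this kind: if its cross-validated error is not clearly lower, the added features are not carrying much signal.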

### Discuss GraphLab Create approach

### Discuss JavaScript approach

<a id="results"></a>
## 4 - Results

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Results (<b>30 points total</b>)</h3>
</div>

### Discuss results of Dask

### Discuss results of GraphLab Create

### Discuss results of JavaScript

### Talk about how they all tie in (cross validation)

[&uarr; ToC](#toc)

<a id="conclusion"></a>
## 5 - Conclusion

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Conclusion (<b>5 points total</b>)</h3>
</div>

### What did we find and how did we answer all of the questions?

[&uarr; ToC](#toc)

<a id="biblio"></a>
## 6 - Bibliography and Citation

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Bibliography and Citation (<b>5 points total</b>)</h3>
</div>

[&uarr; ToC](#toc)

<a id="code"></a>
## 7 - Code

<div style='margin-left:10%;margin-right:10%;margin-top:15px;background-color:#d3d3d3;padding:10px;'>
<h3>Code (5 points)</h3>
</div>

[&uarr; ToC](#toc)