# Lab 5, Part 2 – Conclusion
## CSS Summer Bootcamp, Week 1 🥾

In [None]:
# Run this cell, and don't change it!
!pip install otter-grader

In [None]:
# Run this cell, and don't change it!
import otter
grader = otter.Notebook()

import numpy as np
import pandas as pd
import json

## Question 1: Round the World ✈️

In this section, we will work with data about large airports around the world. Specifically, we're going to work towards **finding the total distance traveled along a sequence of airports**. When travelers fly to multiple airports around the world on a single itinerary, they take what is known as a "round-the-world" (RTW) trip.

<img src=https://www.ana.co.jp/www2/amc/partner-flight-awards/roundtheworld_en.jpg width=40%>

This question has many subparts – think of it as a mini-project – but the end result is really cool!

### Question 1a

Along with this lab, we've uploaded a file with the extension `.json` that contains information about airports around the world. Your job in this question is to figure out what that file is called and to load it into your notebook as a Python dictionary!

To see what the file is called, click the "Jupyter" button in the top left of your notebook window. Then, navigate in the file explorer to the folder for this lab, and check in the data folder. You'll see a file that ends in `.json`. Below, assign `file_name` to the name of this file (including the extension) as a string. **Make sure to include `'data/'` at the start of the file name!**

<!--
BEGIN QUESTION
name: q1a
-->

In [None]:
file_name = ...
file_name

In [None]:
grader.check("q1a")

### Question 1b

Now, use the `with open` structure we learned about in lecture to load the file from the previous subpart as a **dictionary**. Store it in the variable `airports`.

***Hint:*** We've already run `import json`.
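
As a reminder of the general pattern, here is a minimal sketch that writes a tiny JSON file and reads it back as a dictionary. The file name `toy_example.json` and its contents are made up for illustration; they are not the lab's data file.

```python
import json
import os
import tempfile

# Hypothetical toy data, NOT the lab's airport file
toy = {"A": {"value": 1}, "B": {"value": 2}}

# Write the toy data to a temporary .json file so this example is self-contained
toy_path = os.path.join(tempfile.mkdtemp(), "toy_example.json")
with open(toy_path, "w") as f:
    json.dump(toy, f)

# Read it back; `with open` closes the file automatically when the block ends
with open(toy_path) as f:
    loaded = json.load(f)

print(type(loaded))   # <class 'dict'>
print(loaded["B"])    # {'value': 2}
```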

<!--
BEGIN QUESTION
name: q1b
-->

In [None]:
airports = ...
airports

In [None]:
grader.check("q1b")

### Question 1c

The dictionary `airports` contains four pieces of information for each airport:
- Code (stored as the key)
- Name
- Latitude
- Longitude

In [None]:
airports['SAN']

As mentioned at the start of the question, we're eventually going to write a function that takes in a list of airport codes and returns the total distance between them all. To do this, we'll need an easy way of accessing the latitudes and longitudes of an airport, given its code.

Below, complete the implementation of the function `lat_lon`, which takes in a string `code` and returns the latitude and longitude of the airport whose code is `code` **as a tuple**. Example behavior is given below.

```py
>>> lat_lon('SAN')
(32.7336006165, -117.190002441)

>>> lat_lon('DTW')
(42.212398529052734, -83.35340118408203)
```
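
The general idea of looking up a record by key and returning two of its values as a tuple might look like the sketch below. The dictionary `toy_airports`, its key names `'lat'` and `'lon'`, and the function `toy_lat_lon` are all hypothetical; inspect `airports['SAN']` to see how the real data is organized.

```python
# Hypothetical structure for illustration only; the real `airports`
# dictionary may use different key names or a different layout.
toy_airports = {
    "SAN": {"lat": 32.7336006165, "lon": -117.190002441},
}

def toy_lat_lon(code):
    entry = toy_airports[code]            # look up the inner record by code
    return (entry["lat"], entry["lon"])   # the parentheses build a tuple

result = toy_lat_lon("SAN")
print(result)   # (32.7336006165, -117.190002441)
```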

<!--
BEGIN QUESTION
name: q1c
-->

In [None]:
def lat_lon(code):
    ...
    
# Uncomment this, and also test the function out on your own inputs!
# lat_lon('SAN')

In [None]:
grader.check("q1c")

### Question 1d

Great! Now we're able to easily access the latitude and longitude of an airport.

Next, we need to be able to find the **distance** between two locations on the surface of Earth, given their latitudes and longitudes. Earth is approximately a sphere, so to find the distance between two points on its surface, we can use the **Haversine distance** formula.

<img src='https://miro.medium.com/max/564/1*c6YJw_Cv8u3O42CaAOrLRw.png' width=20%>

Buckle up! Per [Wikipedia](https://en.wikipedia.org/wiki/Haversine_formula), the Haversine distance $d$ between two points is

$$d = 2 r \arcsin \left( \sqrt{ \sin^2 \left( \frac{\text{lat}_2 - \text{lat}_1}{2} \right) + \cos \left( \text{lat}_1 \right) \cdot \cos \left( \text{lat}_2 \right) \cdot \sin^2 \left(  \frac{\text{lon}_2 - \text{lon}_1}{2} \right)          } \right)$$
where...
- $(\text{lat}_1, \text{lon}_1)$ are the latitude and longitude of the first point, and $(\text{lat}_2, \text{lon}_2)$ are the latitude and longitude of the second point **in radians**
- $r$ is the radius of the Earth (in your calculation, use 3958.8, the Earth's radius in miles)
- $\sin^2 (x)$ means the same as $(\sin(x))^2$
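
As a quick sanity check of the formula (a worked example added for illustration), take two points on the equator that are 90° of longitude apart, so $\text{lat}_1 = \text{lat}_2 = 0$ and $\text{lon}_2 - \text{lon}_1 = \frac{\pi}{2}$ radians. Since $\sin(0) = 0$ and $\cos(0) = 1$, the formula reduces to

$$d = 2 r \arcsin \left( \sqrt{ 0 + 1 \cdot 1 \cdot \sin^2 \left( \frac{\pi}{4} \right) } \right) = 2 r \cdot \frac{\pi}{4} = \frac{\pi r}{2} \approx 6218.5 \text{ miles}$$

which is one quarter of Earth's circumference for $r = 3958.8$, as expected.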

Below, complete the implementation of the function `haversine_distance`, which takes in the latitudes and longitudes of two points on Earth's surface and returns the distance between them in miles, as per the above formula. Example behavior is given below.

```py
# Test your function on this exact set of arguments!
>>> haversine_distance(32.7336006165, -117.190002441, 42.212398529052734, -83.35340118408203)
1952.4829738365934
```

***Note:*** The formula above only works when the latitudes and longitudes are in **radians**, but the information we have is in terms of **degrees**. As such, we've added in a line at the top of `haversine_distance` that converts the input arguments from degrees to radians; don't change this.

It's a good idea to break the calculation into smaller pieces, and save those smaller pieces to variables. Also, **don't spend more than 10 minutes on this subpart!** If you're stuck, ask us for help! 😄

<!--
BEGIN QUESTION
name: q1d
-->

In [None]:
def haversine_distance(lat1, lon1, lat2, lon2):
    # Note: the latitudes and longitudes in our data are stored in degrees
    # but like most mathematical formulas, Haversine distance requires them to be in radians
    lat1, lon1, lat2, lon2 = np.radians(lat1), np.radians(lon1), np.radians(lat2), np.radians(lon2)
    
    ...
    
# Uncomment the following line once you've finished
# The result should be close to 1952.4829738365934
# haversine_distance(32.7336006165, -117.190002441, 42.212398529052734, -83.35340118408203)

In [None]:
grader.check("q1d")

### Question 1e

Now, complete the implementation of the function `airport_distance`, which takes in the codes of two airports as strings and returns the Haversine distance between the two airports. Example behavior is given below.

```py
>>> airport_distance('SAN', 'DTW')
1952.4829738365934

>>> airport_distance('LAX', 'SYD')
7494.520943620303

>>> airport_distance('JFK', 'AMS')
3633.5227886711236
```

***Hint:*** Use the work you did in Questions 1c and 1d. Our solution is only three lines long, and uses two of the previous functions you defined!

<!--
BEGIN QUESTION
name: q1e
-->

In [None]:
def airport_distance(code1, code2):
    ...

In [None]:
grader.check("q1e")

### Question 1f

We're finally ready to put everything together and solve our original problem. Recall, our goal at the start of this section was to find the distance between a sequence of airports. That's what you'll do here.

Below, complete the implementation of the function `total_ticket_distance`, which takes in a list of airport codes and returns the total distance between all airports, if one were to fly from airport 0 to airport 1, then airport 1 to airport 2, then airport 2 to airport 3, and so on.

For example, `total_ticket_distance(['EWR', 'SFO', 'LHR', 'HKG'])` should return the distance between `'EWR'` and `'SFO'`, plus the distance between `'SFO'` and `'LHR'`, plus the distance between `'LHR'` and `'HKG'`. Example behavior is shown below.

```py
>>> total_ticket_distance(['EWR', 'SFO', 'LHR', 'HKG'])
13897.246208798144

>>> total_ticket_distance(['SAN', 'LAX', 'OAK', 'SEA'])
1118.577286339682

```

Remember to rely on your previously defined functions – don't repeat yourself!
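
One common way to walk through consecutive pairs of a list is to `zip` the list with itself shifted by one. The sketch below uses made-up stops on a number line and a hypothetical `toy_distance` function instead of airports; it is one possible approach, not necessarily the intended solution.

```python
# Hypothetical positions on a number line, standing in for airports
positions = {"A": 0, "B": 3, "C": 10, "D": 4}

def toy_distance(p, q):
    # Distance between two toy stops: absolute difference of their positions
    return abs(positions[p] - positions[q])

stops = ["A", "B", "C", "D"]

# Pair each stop with the one that follows it
legs = list(zip(stops, stops[1:]))
total = sum(toy_distance(p, q) for p, q in legs)

print(legs)    # [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(total)   # 3 + 7 + 6 = 16
```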

<!--
BEGIN QUESTION
name: q1f
-->

In [None]:
def total_ticket_distance(codes):
    ...

In [None]:
grader.check("q1f")

Nice job! We can verify our work by using the site [Great Circle Mapper](http://www.gcmap.com/mapui?P=EWR-SFO-LHR-HKG), which does the same thing. That link computes the distance EWR -> SFO -> LHR -> HKG; its result should be within ~1% of yours above (a small difference is to be expected).

### Question 1g

Now that you're able to calculate the distance traveled on a **route** consisting of several cities (this is what `total_ticket_distance` does), your friend has come to you with a favor to ask. They want to give you a list of cities that they're interested in visiting, and want you to come up with the order in which they should visit these cities **to minimize travel distance**.

For example, they may say they're interested in visiting BOS, DEN, and SAN. There are 6 possible orders in which they could fly between these cities (ignore what happens before their trip and after their trip):
- BOS -> DEN -> SAN
- BOS -> SAN -> DEN
- DEN -> BOS -> SAN
- DEN -> SAN -> BOS
- SAN -> BOS -> DEN
- SAN -> DEN -> BOS

Some of these routes are longer than others. The longest possible route is SAN -> BOS -> DEN (and the equivalent reverse route DEN -> BOS -> SAN), at 4332.02 miles. The shortest possible route is SAN -> DEN -> BOS (and the equivalent reverse route BOS -> DEN -> SAN), at 2601.74 miles. In this case, you would advise your friend to fly SAN -> DEN -> BOS (and in our case, to visit us first 😊).

Before we set you off to compute the **best route**, there's a module you need to know about: `itertools`. Below, we import the `permutations` function from `itertools`. `permutations` takes in a collection and returns an iterator of **tuples** covering all possible ways of ordering the elements in the collection; wrapping it in `list` gives a **list of tuples**.

In [None]:
from itertools import permutations

In [None]:
list(permutations([1, 2, -3]))
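
A small sketch of how `permutations` behaves, using the same input as the cell above. Note that the iterator is used up once it has been turned into a list.

```python
from itertools import permutations
from math import factorial

perms = permutations([1, 2, -3])   # an iterator, not yet a list
orderings = list(perms)            # materialize all orderings as tuples

print(orderings)
# [(1, 2, -3), (1, -3, 2), (2, 1, -3), (2, -3, 1), (-3, 1, 2), (-3, 2, 1)]

# n distinct elements have n! orderings
print(len(orderings) == factorial(3))   # True

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(perms)
print(leftover)                         # []
```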

Below, complete the implementation of the function `optimal_route`, which takes in a **set** containing airport codes and returns a **tuple** of airport codes whose total travel distance is as short as possible. Example behavior is given below. 

```py
>>> optimal_route({'SAN', 'DEN', 'BOS'})
('BOS', 'DEN', 'SAN')

>>> optimal_route({'SAN', 'BOM', 'LHR', 'CDG', 'SLC', 'MSP'})
('SAN', 'SLC', 'MSP', 'LHR', 'CDG', 'BOM')
```

***Note:*** It's possible that your function returns the desired tuples in reverse order, and that's fine. (After all, the distance from A to B to C in the air is the same as the distance from C to B to A.)

<!--
BEGIN QUESTION
name: q1g
-->

In [None]:
def optimal_route(codes_set):
    # Don't change these two lines.
    # You will need to update min_route and min_dist
    # within your loop.
    min_route = None
    min_dist = np.inf
    
    ...
    
    return min_route

In [None]:
grader.check("q1g")

Note that the algorithm you (almost certainly) used is very _inefficient_. In fact, adding just a single new airport increases the amount of time it takes to find the optimal route dramatically! Take a look:

In [None]:
%%time
optimal_route({'SIN', 'DTW', 'LAX', 'EWR', 'DPS'})

In [None]:
%%time
optimal_route({'SIN', 'DTW', 'LAX', 'EWR', 'DPS', 'SAN'})

What you likely saw above is that it took several times longer to find the optimal route with 6 airports than with 5 (there are 6 times as many orderings to check). What if we add 1 more?

In [None]:
%%time
optimal_route({'SIN', 'DTW', 'LAX', 'EWR', 'DPS', 'SAN', 'YYZ'})

Woah! It took ~7x longer to find the optimal route with 7 airports than with 6. If we had even just 10 airports, this process would take quite a while! This may be one of those instances where it's easier to open up a map and look for the optimal solution ourselves than it is to have Python figure it out for us.
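
To see why, note that a brute-force search over every ordering (which is what trying all permutations amounts to) examines $n!$ routes for $n$ airports. A quick sketch of how fast that count grows:

```python
from math import factorial

# Number of orderings a brute-force search would examine for n airports
route_counts = {n: factorial(n) for n in range(5, 11)}

for n, count in route_counts.items():
    print(n, count)
# 5 120
# 6 720
# 7 5040
# 8 40320
# 9 362880
# 10 3628800
```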

That's it!