# Homework 2
### DSCI 510 / USC / Hermjakob section / Spring 2025

Hello and welcome to HW2.

Guidelines:

Please write and submit the programs and data files below by the deadline: Friday, March 28, at 6:00pm Pacific time

You must complete the assignments individually. If you have trouble completing the assignment, please let one of the teaching assistants (TAs) know, during the lab or their office hours. They will help and guide you, but they will not write code for you and no one else should :) !!!

The submission will include several components. Submissions to HW2 will contain **3** files:
* HW2_[YOUR FIRSTNAME]_[YOUR LASTNAME].ipynb (for all your code)
* HW2_[YOUR FIRSTNAME]_[YOUR LASTNAME]_airports.tsv (result of Q2)
* HW2_[YOUR FIRSTNAME]_[YOUR LASTNAME]_airport-coordinates.tsv (result of Q3)

You may look up resources online like python docs and stackoverflow. You may look up topics, but not the questions themselves.

You can submit only one time. Your grade will be based on this submission.

This Homework2 requires a substantial amount of work. You can almost certainly **not** do it in a single day, so please start early.

# Q1 [20 points]

### Country population data

**Python topics:** BeautifulSoup, requests, dictionaries

The task is about extracting relevant population data from a Wikipedia table into a Python dictionary.

Write the function __load_population_dict__ to load the content of https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population into a dictionary,
using the country as a key, and the tuple (population_count, population_percentage) as a value.

Strategy
* Use the _requests_ library to read the content of the data file on the internet.
* Use the _BeautifulSoup_ library to first __parse__ the content you downloaded into a _soup_ object.
* Use the _BeautifulSoup_ library to navigate the _soup_ object and extract the info you need.

Tips
- One way to gain a quick overview if the content of an HTML webpage is to visit it in a browser (such as Chrome) and inspect the HTML code on that page through "View Page Source" or similar.

- Note that the HTML file contains several tables, of which only one is of interest to us; one way to find the correct table is to check its caption for tell-tale signs.

FAQs: <br>
Q. How to read/parse a website?  
A. For that, as taught in class you would need to use libraries like BeautifulSoup and requests.  

Q. I don't know html, what should I do?  
A. No worries. For this lab, you only need to understand that html is another type of a language made up of tags. For example, it has heading info in the *head* tag, paragraph has *p* tag, and html links have *a* tag. So, using the tag, you can get the info you need!  

Q. How do I get the tag info?  
A. When you get your BeautifulSoup object you can do something like object.find_all('tag') or object('tag') to get all 'tag' tags.  

Q. What are tag attributes and how can I access that?  
A. HTML attributes are special words used inside the tag to control the element's behaviour. One can access attributes using the *get* function.

Q. I got the tag, but what I got seems cryptic. What now?  
A. So, once you get the info from a particular tag, you get the whole html code inside that tag. If you want some particular information from that, you would need to call functions like _get_ and _get_text_ to get the information directly without messing with the html code.  

Q. What if the number of `<td>` elements inside a `<tr>` varies from row to row?<br>
A. This might be due to some `<td>` element being shared between multiple rows or for other reasons. To address this, you can include checks on the number of `<td>` elements inside a `<tr>` and/or use a `<td>` with a recognizable pattern (of its text, of an attribute, or of a sub-element) as a reference point.

Q. I am not sure how to use these functions. Help?  
A. You can read from the official documentation and study the examples that we have been providing.
* Documentation link: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ 
* Study the BeautifulSoup section in https://www.isi.edu/~ulf/teaching/python.html 
* Study the examples in week9-1_Requests.ipynb (Blackboard -> Lectures -> Notebook section)

Good Feature: Use the .prettify() feature of BeautifulSoup. You'll get a better insight of what is that you are scrapping. See example below.

Before you begin, make sure you have _BeautifulSoup_ installed:
```
pip install bs4
```

In [615]:
# Example for .prettify().  Applicable to full HTML text or parts thereof.
from bs4 import BeautifulSoup
s = '<tr><td><a href="https://www.france.fr">France</a></td><td cat="capital city" pop="2,102,650">Paris</td></tr>'
table_row = BeautifulSoup(s, 'html.parser')
print(table_row.prettify())

<tr>
 <td>
  <a href="https://www.france.fr">
   France
  </a>
 </td>
 <td cat="capital city" pop="2,102,650">
  Paris
 </td>
</tr>



In [617]:
import requests
from bs4 import BeautifulSoup

def load_population_dict(url: str) -> dict:
    html = requests.get(url).text
    if soup := BeautifulSoup(html,'html.parser'):
        table = soup.find('table')
        headers = []
        countries_dict = dict()
        
        for th in table.find('tr').find_all('th'):
            headers.append(th.text.strip())

        location_index = headers.index('Location')
        pop_index = headers.index('Population')
        percent_index = headers.index('% ofworld')

        for row in table.find_all('tr'):
            data =  row.find_all('td')
            if len(data) > percent_index:
                country = data[location_index].text.strip()
                population = data[pop_index].text.strip()
                percent = data[percent_index].text.strip()
                countries_dict[country] = (population,percent)
            
        return countries_dict



In [618]:
# open test
def print_population_info(country: str, population_dict: dict) -> str:
    if pop_tuple := population_dict.get(country, None):
        pop_count, pop_percent = pop_tuple
        print(f'{country} has a population of {pop_count}, which is {pop_percent} of the world population.')
    else:
        print(f'Sorry, no info available for {country}.')
       
pop_dict = load_population_dict("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population")
print(f'Loaded {len(pop_dict)} entries.')
for country in ['Australia', 'Brazil', 'China', 'England', 'India', 'Liechtenstein', 'Vietnam']:
    print_population_info(country, pop_dict)

Loaded 240 entries.
Australia has a population of 27,309,396, which is 0.3% of the world population.
Brazil has a population of 212,583,750, which is 2.6% of the world population.
China has a population of 1,408,280,000, which is 17.2% of the world population.
Sorry, no info available for England.
India has a population of 1,413,324,000, which is 17.3% of the world population.
Liechtenstein has a population of 40,900, which is 0.0005% of the world population.
Vietnam has a population of 101,343,800, which is 1.2% of the world population.


# Q2 [20 points]

### Overview Q2-Q5

The remaining questions (Q2-Q5) will guide you to build a simple __Airport Information System (AIS)__.
* Build a file airports.tsv with basic airport info, scraped from Wikipedia. (Q2)
* Build a file airport-coordinates.tsv with airport geo-coordinates, also scraped from Wikipedia. (Q3)
* Define a GeoCoordinates class and functions that convert between DMS and plain latitude/longitude format. (Q4)
* Define Airport and GeoAirport classes and write a Python program to answer the type of queries below. (Q5)
   * show airport info given an airport's IATA code
   * show airport info given part of an airport's name
   * show nearest airport(s) given geo-coordinates in either format
* Libraries/tools/examples for Q2 and Q3
    * To download a file from the internet, use the *requests* library.
    * To extract information from a downloaded HTML file, use *BeautifulSoup*.
    * For examples, see *week9-1_Requests.ipynb* and *week9-1_BeautifulSoup.ipynb* (on Brightspace -> lectures)
   
### Q2   
   
Wikipedia lists airports at https://en.wikipedia.org/wiki/List_of_airports_by_IATA_airport_code:_A
through https://en.wikipedia.org/wiki/List_of_airports_by_IATA_airport_code:_Z .

Write code that scrapes core airport info from those 26 Wikipedia pages: 
(1) IATA code, (2) name, (3) location, (4) URL,<br>
creating a file __airports.tsv__ with the 4 fields separated by tab (not comma). E.g. 
```
AAA	Anaa Airport	Anaa, Tuamotus, French Polynesia	https://en.wikipedia.org/wiki/Anaa_Airport
AAB	Arrabury Airport	Arrabury, Queensland, Australia	https://en.wikipedia.org/wiki/Arrabury_Airport
AAC	El Arish International Airport	El Arish, Egypt	https://en.wikipedia.org/wiki/El_Arish_International_Airport
...
```
We recommend that you first download the 26 Wikipedia webpages to files on your computer (caching),
and then extract the relevant data from those cached files. This avoids you having to download the same files from
the Web several times as you develop and test your extraction code. 
To further avoid bursts of requests, you might want to space out your requests
by "sleeping" (waiting) for a few seconds between requests.

Make sure to excludes footnotes such as "[1]" from IATA codes etc.<br>
IATA codes should be coomposed of three uppercase ASCII letters [A-Z].<br>
You will need the URLs to look up geo-coordinates for Californian airports in Q3.

In [None]:
import time
import requests
from bs4 import BeautifulSoup
import csv
import re

#Remove any captions from the text (e.g. [1])
def remove_captions(data: list):
    new_list = []
    for text in data:
        no_caption = re.sub(r'\[.*?\]','',text)
        new_list.append(no_caption)
    return new_list

#Create list of wikipedia pages        
wiki_pages = []
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
for letter in alphabet:
    link = f'https://en.wikipedia.org/wiki/List_of_airports_by_IATA_airport_code:_{letter}'
    wiki_pages.append(link)

for url in wiki_pages:    
    airport_data = []
    html = requests.get(url).text

    if soup := BeautifulSoup(html,'html.parser'):
        table = soup.find('table')
        for row in table.find_all('tr')[1:]:

            #obtain the airport wikipedia link for the specifed row
            anchors = row.find_all('a')
            if len(anchors) > 0:
                anchor = anchors[0].get('href')
                wiki_link = f"https://en.wikipedia.org{anchor}"

            #Add row data into a clean list, remove any captions from the strings
            data = row.find_all('td')
            if len(data) > 3:
                row_data = [data[0].text.strip(), data[2].text.strip(), data[3].text.strip(), wiki_link]
                row_data_captionless = remove_captions(row_data)
                airport_data.append(row_data_captionless)
        
    # Writing to a TSV file
    with open("airports.tsv", "a", newline="") as file:
        writer = csv.writer(file, delimiter="\t")  # Use tab as the delimiter
        writer.writerows(airport_data)  # Write all rows
    
    time.sleep(2)

# Q3. [20 points]

For the airports in _airports.tsv_ that you created in Q2, create a second file __airport-coordinates.tsv__ with the 
airports' geo-coordinates from the airports' Wikipedia pages.<br>
Limit this to airports in the US state of California (based on _airports.tsv_ info).
This will greatly limit the number of requests to the web and processing time on your computer.<br>
Store the data in tab-separated format.<br>
Fields: (1) IATA code (2) geo-coordinates. E.g.:
```
ACV     40°58′40″N 124°06′30″W
AHC     40°15′57″N 120°09′02″W
APC     38°12′47.50″N 122°16′50.50″W
...
```
As for Q2, we recommend caching the web pages on your computer and waiting a few seconds between requests.<br>
For almost all California airports, Wikipedia lists a normal airport Wikipedia page that contains coordinates.<br>
For the very small number of exceptions, where Wikipedia does not list a page with coordinates, 
you don't need to write anything to _airport-coordinates.tsv_, i.e.
no need to find the coordinates elsewhere on the Web.

In [None]:
import time
import requests
from bs4 import BeautifulSoup
import csv
import sys

with open("airports.tsv",'r', newline = '') as airports_file,\
    open('airport_coordinates.tsv','a', newline = '') as coordinates_file:
    
    #initialize reader/writer/lists
    airport_reader = csv.reader(airports_file,delimiter = '\t')
    writer = csv.writer(coordinates_file,delimiter="\t") # use tab as delimiter
    california_airports = []

    #obtain list of only airports in california
    for row in airport_reader:
        location = row[2].lower()
        if 'california' in location and 'united states' in location:
            california_airports.append(row)

    for row in california_airports:
        url = row[3]
        IATA_code = row[0]
        
        try:
            html = requests.get(url, timeout = 5).text
            soup = BeautifulSoup(html,'html.parser')

            # Find span with the class "latitude" and "longitude"
            latitude_span = soup.find(class_="latitude")
            longitude_span = soup.find(class_="longitude")

            if latitude_span and longitude_span:
                coordinate_data = f'{latitude_span.text.strip()} {longitude_span.text.strip()}'
                writer.writerow([IATA_code,coordinate_data]) #write row to tsv
                coordinates_file.flush()
        except:
            sys.stderr.write(f'unable to fetch data from {row[3]}')
            


unable to fetch from https://en.wikipedia.org/wiki/Apple_Valley_Airport_(California)

# Q4. [20 points]

Define a class __GeoCoordinates__ that stores geo-coordinates in both DMS (degree/minute/second) format 
and plain latitude/longitude format. Include functions that convert between the formats. A complete
GeoCoordinates instance will store the coordinates in both formats. For example, the geo-coordinates for USC are
* 34°01′18″N 118°17′07″W
* which is equivalent to a latitude of 34.02167 and a longitude of -118.28528

34°01′18″N is short for 34 degrees, 1 minute, 18 seconds north.<br>
There are 60 minutes in a degree, and 60 seconds in a minute.<br>
34°01′18″ = 34 + 1/60 + 18/3600 = 34.02167<br>
Southern latitudes and Western longitudes are expressed as negative values.<br>

An instance of a new _GeoCoordinates_ can be created by either providing a DMS (of type str) 
or, alternatively, both latitude and longitude (both float).

Variables for _GeoCoordinates_ should include dms (str), lat (float), long (float).<br>
Methods for _GeoCoordinates_ should include at least \_\_init\_\_(), \_\_str\_\_(), distance().
We provide the distance() method to you below.<br>
You probably want to write separate functions that convert from DMS to lat/long and vice versa.

In [42]:
from __future__ import annotations
from math import sin, cos, sqrt, atan2, radians
from typing import Optional, Tuple
import re

class GeoCoordinates:

    def __init__(self, lat: Optional[float] = None, long: Optional[float] = None, dms: Optional[str] = None):
        #convert a float number to dms string format
        def float_to_dms(number: float, lat_long: str ) -> str:
            if lat_long == 'lat':
                direction = 'N' if number >= 0 else 'S' #negative values means South
            elif lat_long == 'long':
                direction = 'E' if number >= 0 else 'W' #negative values means West
            #calculate degrees/minutes/seconds
            d = int(number)
            m = int((number - d)*60) 
            s = round(((number - d)*60 - m) * 60)
            if number < 0:
                d = -d
                m = -m
                s = -s
            return f'{str(d)}°{str(m)}′{str(s)}″{direction}' 

        #Convert float value of Latitude and Longitude into DMS string format
        if lat and long:
            self.lat = lat
            self.long = long
            if not dms:
                lat_dms = float_to_dms(lat,'lat')
                long_dms = float_to_dms(long,'long')
                
                self.dms_lat = lat_dms
                self.dms_long = long_dms
                self.dms = f'{lat_dms} {long_dms}'

        #Create the Latitude and Longitude float value attributes from a DMS string
        if dms:
            self.dms = dms
            if not lat and not long:
                pattern = r'(-?\d+)°(\d+)?′?(\d+\.?\d+?)?″?([NESW])'
                coords = re.findall(pattern,dms)
                if coords:
                    lat_coords = coords[0]
                    long_coords = coords[1]
                    if len(coords) == 2:
                        lat_degree = int(lat_coords[0])
                        lat_min = int(lat_coords[1]) if lat_coords[1] else 0
                        lat_sec = float(lat_coords[2]) if lat_coords[2] else 0
                        
                        long_degree = int(long_coords[0])
                        long_min = int(long_coords[1]) if long_coords[1] else 0
                        long_sec = float(long_coords[2]) if long_coords[2] else 0

                        latitude = lat_degree + lat_min/60 + lat_sec/3600
                        longitude = long_degree + long_min/60 + long_sec/3600

                        self.lat = latitude if coords[0][3] == 'N' else -latitude
                        self.long = longitude if coords[1][3] == 'E' else -longitude
    
    def __str__(self) -> str:
        return f'{self.dms} (lat: {self.lat}, long: {self.long})'
   
    def distance(self, other: GeoCoordinates) -> float:
        earth_radius = 6373.0  # in kilometers
        lat1 = radians(self.lat)
        lon1 = radians(self.long)
        lat2 = radians(other.lat)
        lon2 = radians(other.long)
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        distance = earth_radius * c
        return distance  # in kilometers

In [43]:
# open test
gc_la = GeoCoordinates(dms='34°03′N 118°15′W')
gc_sf = GeoCoordinates(lat=37.7775, long=-122.41639)
print('Coordinates of Los Angeles:', gc_la)
print('Coordinates of San Francisco:', gc_sf)
print(f'Straight-line distance from Los Angeles to San Francisco: {round(gc_la.distance(gc_sf))}km')

Coordinates of Los Angeles: 34°03′N 118°15′W (lat: 34.05, long: -118.25)
Coordinates of San Francisco: 37°46′39″N 122°24′59″W (lat: 37.7775, long: -122.41639)
Straight-line distance from Los Angeles to San Francisco: 559km


should print
```
Coordinates of Los Angeles: 34°03′N 118°15′W (lat: 34.05, long: -118.25)
Coordinates of San Francisco: 37°46′39″N 122°24′59″W (lat: 37.7775, long: -122.41639)
Straight-line distance from Los Angeles to San Francisco: 559km
```

# Q5. [20 points]

Define two classes _Airport_ and its subclass _GeoAirport_ for those airports that have an entry in _airport-coordinates.tsv_ (true for almost all California airports).<br>
_Airport_ instances will have variables for at least their IATA codes, their names, and their locations.<br>
_GeoAirport_ instances will also have variables for their geo-coordinates as well as a list of nearby airports, 
within a 100km range, technically a list of tuples (distance: float, nearby_airport: GeoAirport).<br>
Both should have reasonable \_\_str\_\_() methods. _GeoAirport_ should also have a method show_nearby_airports() that returns
a string with nearby airports in pretty format.<br>

Define a function __load_airports()__ that will build a list of airports (Airport and GeoAirport) from the files airports.tsv and airport-coordinates.tsv that you built.
Define a function __ais()__ (Airport Information System), which given a user request (of type str), will return airport information (one or more lines).
Please make sure to use modular programming with meaningful functions (that have meaningful return values).
Your program should process the following 6 formats as input for __ais()__:

* ais("\<IATA-code\>")
* ais("\<DMS-lat\> \<DMS-long\>")
* ais("\<DMS-lat\> \<DMS-long\> \<number-of-nearest-airports\>")
* ais("\<latitude\> \<longitude\>")
* ais("\<latitude\> \<longitude\> \<number-of-nearest-airports\>")
* ais("\<airport-name-substring\>")

Apply the _substring_ format only if none of the above formats did not match.
The _substring_ can be anywhere in the airport's name, incl. in the middle.
    
## Examples

### Example 1 (IATA code of regular Airport)
Command: 
```
ais("PEK")
```
Returns:
```
PEK  Beijing Capital International Airport  (Beijing, China)
```

### Example 2 (IATA code of GeoAirport)
Command: 
```
ais("LAX")
```
Returns:
```
LAX  Los Angeles International Airport  (Los Angeles, California, United States; 33°56′33″N 118°24′29″W)
   Nearby:   7km HHR Hawthorne Municipal Airport (Jack Northrop Field)
   Nearby:   9km SMO Santa Monica Municipal Airport
   Nearby:  16km CPM Compton/Woodley Airport
   Nearby:  17km TOA Zamperini Field
   Nearby:  27km LGB Long Beach Airport
   ...
   Nearby:  92km RIR Flabob Airport
```

### Example 3 (substring of airport name)

Command: 
```
ais("Oak")
```
Returns:
```
OAK  Oakland International Airport  (Oakland, California, United States; 37°43′17″N 122°13′15″W)
ODC  Oakdale Airport (FAA: O27)  (Oakdale, California, United States; 37°45′23″N 120°48′01″W)
OKY  Oakey Army Aviation Centre  (Oakey, Queensland, Australia)
PTK  Oakland County International Airport  (Pontiac, Michigan, United States)
```
   
### Example 4 (USC coordinates, plain lat/long format, without number of nearest airports)

Command:
```
ais("34.02167 -118.28528")
```
Returns:
```
 12km HHR  Hawthorne Municipal Airport (Jack Northrop Field)  (Hawthorne, California, United States; 33°55′22″N 118°20′07″W)
```

### Example 5 (USC coordinates, DMS format, with number of nearest airports)

Command:
```
ais("34°01′18″N 118°17′07″W 4")
```
Returns:
```
 12km HHR  Hawthorne Municipal Airport (Jack Northrop Field)  (Hawthorne, California, United States; 33°55′22″N 118°20′07″W)
 14km LAX  Los Angeles International Airport  (Los Angeles, California, United States; 33°56′33″N 118°24′29″W)
 15km CPM  Compton/Woodley Airport  (Compton, California, United States; 33°53′24″N 118°14′37″W)
 15km SMO  Santa Monica Municipal Airport  (Santa Monica, California, United States; 34°00′57″N 118°27′05″W)
```

In [6]:
import csv
import re

all_airports = dict()
geo_airport_dict = dict()
airport_names = []

class Airport:
    def __init__(self,iata,name,location):
        self.IATA = iata
        self.name = name
        self.location = location

class GeoAirport(Airport):
    def __init__(self,iata,name,location,dms):
        super().__init__(iata,name,location)
        self.dms = dms
        self.geo_coordinates = GeoCoordinates(dms=dms)
        self.lat = self.geo_coordinates.lat
        self.long = self.geo_coordinates.long

    def __str__(self):
        return f'{self.name}: {self.lat} {self.long}'
    
    def show_nearby_airports(self) -> list:
        nearby_airports = []
        for iata,other_airport in geo_airport_dict.items():
            distance = round(self.geo_coordinates.distance(other_airport.geo_coordinates))
            if distance > 0 and distance <= 100:
                nearby_airports.append((distance,other_airport))
        
        nearby_airports_sorted = sorted(nearby_airports, key=lambda x:x[0])
        return nearby_airports_sorted


In [41]:
import sys

def load_airports():  
    """This function loads the airport information from the files built in Q2 and Q3
    and populates class objects for Airport and GeoAirport"""

    with open('airport_coordinates.tsv','r',newline='') as coordinates_file,\
        open('airports.tsv','r',newline='') as airports_file:

        #initialize reader/writer/lists
        airports_list = list(csv.reader(airports_file,delimiter = '\t'))
        coordinates_list = list(csv.reader(coordinates_file,delimiter = '\t'))
        
        iata_w_coords_dict = dict()
        for x in coordinates_list:
            iata_w_coords_dict[x[0]] = x[1]

        for airport in airports_list:
            iata, name, location = airport[:3]
            all_airports[iata] = Airport(iata,name,location)
            airport_names.append(name)

            if iata in iata_w_coords_dict:
                dms = iata_w_coords_dict[iata]
                geo_airport_dict[iata] = GeoAirport(iata,name,location,dms)


def get_nearby_airports(geo_coords: GeoCoordinates, print_length: Optional[str] = None):
    nearby_airports = []
    
    for iata, other_airport in geo_airport_dict.items():
        other_gc = other_airport.geo_coordinates
        distance = round(geo_coords.distance(other_gc))
        if distance <= 100:
            nearby_airports.append((distance,f'{distance}km {other_airport.name} ({other_airport.location}; {other_airport.dms} )'))
    nearby_airports_sorted = sorted(nearby_airports, key=lambda x:x[0])
    nearby_airports_sorted = [x[1] for x in nearby_airports_sorted]

    if len(nearby_airports) == 0:
        return "no nearby airports"
    elif print_length:
        num = int(print_length[0])
        max = len(nearby_airports_sorted)
        small = min(num,max)
        result = nearby_airports_sorted[:small]
        return '\n'.join(result)
    else: 
        return nearby_airports_sorted[0]

def ais(s: str) -> str:
    s = s.strip()

    #regex to find DMS and latitude/longitude values if present in argument string
    lat_long_pattern = r'-?\d+\.-?\d+'
    dms_pattern = r'(-?\d+°\d+?′?\d+\.?\d+?″?[NS])\s*(-?\d+°\d+?′?\d+\.?\d+?″?[EW])'
    number_pattern = r"\s+\d+$"
    coordinates_found = re.findall(dms_pattern,s)
    end_number_found = re.findall(number_pattern,s)
    lat_long_found = re.findall(lat_long_pattern,s)

    #If the string argument matches an IATA code and is located in california
    if s in all_airports and 'california' in all_airports[s].location.lower() and 'united states' in all_airports[s].location.lower():
        results = []
        airport = all_airports[s]
        geo_airport = geo_airport_dict[airport.IATA]
        nearby_airports = geo_airport.show_nearby_airports()

        results.append(f'{airport.IATA}   {airport.name}   ({airport.location})')
        for nearby in nearby_airports:
            distance = nearby[0]
            nearby_airport = nearby[1]
            results.append(f'    Nearby: {str(distance)}km {nearby_airport.IATA} {nearby_airport.name}')
   
        return '\n'.join(results)

    #Else-if the argument matches and IATA but is not in california        
    elif all_airports.get(s):
        airport = all_airports[s]       
        return f'{airport.IATA}    {airport.name}   ({airport.location})'

    #Else if the argument is a DMS with or w/o a number at the end of the string
    elif coordinates_found:
        try:
            dms = f'{coordinates_found[0][0]} {coordinates_found[0][1]}' 
            gc = GeoCoordinates(dms = dms)
            results = get_nearby_airports(gc,end_number_found)
            return results
        except:
            sys.stderr.write(f'Unable to get coordinates for {coordinates_found}')

    #Else if the argument is a string with Latitude and Longitude with or w/o a number at the end of the string
    elif len(lat_long_found) == 2:
        try:
            gc = GeoCoordinates(lat = float(lat_long_found[0]), long = float(lat_long_found[1]))
            results = get_nearby_airports(gc,end_number_found)
            return results
        except:
            sys.stderr.write(f'Unable to get coordinates for {lat_long_found}')

    #Else-if the argument is a substring somewhere in the airport name
    elif any(s in i for i in airport_names): #checks if argument string is in any of the airport name strings
        results =[]
        for iata, airport in all_airports.items():
            if s in airport.name:
                results.append(f'{iata} {airport.name} ({airport.location})')

        return '\n'.join(results)
    
    #If none of the conditions apply, notify user
    else:
        return f'{s} not found'
            

In [38]:
# open test

load_airports() 
for arg in ("PEK", "LAX", "Oak", "34.02167 -118.28528", "34°01′18″N 118°17′07″W 4"):
    print (f"Input: {arg}\n{ais(arg)}\n")

Input: PEK
PEK    Beijing Capital International Airport   (Beijing, China)

Input: LAX
LAX   Los Angeles International Airport   (Los Angeles, California, United States)
    Nearby: 7km HHR Hawthorne Municipal Airport (Jack Northrop Field)
    Nearby: 9km SMO Santa Monica Municipal Airport
    Nearby: 16km CPM Compton/Woodley Airport
    Nearby: 17km TOA Zamperini Field
    Nearby: 27km LGB Long Beach Airport
    Nearby: 29km BUR Bob Hope Airport
    Nearby: 31km VNY Van Nuys Airport
    Nearby: 35km WHP Whiteman Airport
    Nearby: 38km EMT San Gabriel Valley Airport
    Nearby: 40km FUL Fullerton Municipal Airport
    Nearby: 58km SNA John Wayne Airport (Orange County Airport)
    Nearby: 60km AVX Catalina Airport
    Nearby: 60km POC Brackett Field
    Nearby: 69km CCB Cable Airport
    Nearby: 69km NTD NAS Point Mugu (Naval Base Ventura County)
    Nearby: 71km CNO Chino Airport
    Nearby: 75km ONT Ontario International Airport
    Nearby: 75km SZP Santa Paula Airport
    Nearby: 

In [39]:
#private test
# Open test script for ais() function
load_airports()

test_cases = [
    ("PEK", "Valid IATA code - regular Airport"),
    ("LAX", "Valid IATA code - GeoAirport"),
    ("Oak", "Valid airport name substring"),
    ("34.02167 -118.28528", "Valid lat/long format, without number of nearest airports"),
    ("34°01′18″N 118°17′07″W 4", "Valid DMS format, with number of nearest airports"),
    ("ZZZ", "Non-existent IATA code"),
    ("RandomCity", "Non-existent airport name substring"),
    ("JFK", "Valid IATA code but not a GeoAirport"),
    ("60°00′00″N 160°00′00″W 5", "Valid DMS coordinates with no nearby airports"),
    ("ATL", "IATA code that is also a substring of another airport name"),
    ("34.02167 -118.28528 100", "Latitude/longitude with excessive nearest airports requested"),
    ("0.0000 0.0000", "Latitude/longitude with no nearby airports"),
    ("random text", "Invalid input format"),
    ("34°01′18″N118°17′07″W", "DMS format without a space between coordinates"),
    ("York", "Airport name substring matching multiple locations")
]

for arg, description in test_cases:
    print(f"Test: {description}\nInput: {arg}\nOutput:\n{ais(arg)}\n{'-'*50}\n")


Test: Valid IATA code - regular Airport
Input: PEK
Output:
PEK    Beijing Capital International Airport   (Beijing, China)
--------------------------------------------------

Test: Valid IATA code - GeoAirport
Input: LAX
Output:
LAX   Los Angeles International Airport   (Los Angeles, California, United States)
    Nearby: 7km HHR Hawthorne Municipal Airport (Jack Northrop Field)
    Nearby: 9km SMO Santa Monica Municipal Airport
    Nearby: 16km CPM Compton/Woodley Airport
    Nearby: 17km TOA Zamperini Field
    Nearby: 27km LGB Long Beach Airport
    Nearby: 29km BUR Bob Hope Airport
    Nearby: 31km VNY Van Nuys Airport
    Nearby: 35km WHP Whiteman Airport
    Nearby: 38km EMT San Gabriel Valley Airport
    Nearby: 40km FUL Fullerton Municipal Airport
    Nearby: 58km SNA John Wayne Airport (Orange County Airport)
    Nearby: 60km AVX Catalina Airport
    Nearby: 60km POC Brackett Field
    Nearby: 69km CCB Cable Airport
    Nearby: 69km NTD NAS Point Mugu (Naval Base Ventura Count

Unable to get coordinates for ['0.0000', '0.0000']