# Global Airports

Let us explore a dataset of global airports. In this notebook, we will learn to:
- Locate and download the dataset locally.
- Read the dataset and build a `GeoDataFrame` from the lat/lon columns.
- Create and customize plots, color coding the airports by continent.
- Create an interactive plot and customize the tooltip to make it user friendly.

Let's start with some imports.

In [None]:
import geopandas
import pandas as pd
import kagglehub  # For downloading datasets.

We will download two datasets from Kaggle. If you have a Kaggle API token set up, you can download the datasets using the `kagglehub` library. Otherwise, you can download them manually and read the local files with the `pandas` library.

In [None]:


# Download the latest version of the airports dataset
airport_path = kagglehub.dataset_download("samvelkoch/global-airports-iata-icao-timezone-geo")

# Download the latest version of the countries-by-continent dataset
continent_country_path = kagglehub.dataset_download("hserdaraltan/countries-by-continent")

Now, initialize the dataframes from the downloaded files.

NOTE: The continents CSV file in the downloaded dataset had a slightly different name that contained spaces. I renamed the file to `continents.csv` to keep it simple.

In [None]:
airports_df = pd.read_csv(f"{airport_path}/airports.csv")
countries_df = pd.read_csv(f"{continent_country_path}/continents.csv")
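If you would rather not rename the file by hand, one option is to look the CSV up by a keyword in its name. The helper below is only a sketch: `find_csv` is a hypothetical name, not part of `kagglehub` or `pandas`, and it assumes the CSV files sit directly in the downloaded folder.

```python
from pathlib import Path


def find_csv(directory, keyword):
    """Return the first CSV in `directory` whose file name contains `keyword`.

    Hypothetical helper: the match is case-insensitive and only looks at
    files directly inside `directory` (no recursion).
    """
    matches = sorted(
        path
        for path in Path(directory).glob("*.csv")
        if keyword.lower() in path.name.lower()
    )
    if not matches:
        raise FileNotFoundError(f"No CSV containing {keyword!r} found in {directory}")
    return matches[0]
```

With this in place, something like `pd.read_csv(find_csv(continent_country_path, "continent"))` should work regardless of spaces in the file name.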

Now, let's inspect the dataframes.

In [None]:
countries_df.head()

In [None]:
airports_df.head()

Later, we will merge these dataframes and color code the airports based on the continent they belong to. Notice that some countries are named differently in `countries_df` and in `airports_df`. The code below replaces the mismatched values in `airports_df` so that they match `countries_df`. Also, some of the island countries are not part of `countries_df`. We will ignore that for now!

In [None]:
# Replace some of the non-matching values before join
country_name_fixes = {
    "United States of America": "United States",
    "United States Minor Outlying Islands": "United States",
    "Puerto Rico": "United States",
    "Chili": "Chile",
    "Somali": "Somalia",
    "American Samoa": "Samoa",
    "Russian Federation": "Russia",
    "Great Britain (United Kingdom)": "United Kingdom",
    "Federated States of Micronesia": "Micronesia",
    "Cote d'Ivoire": "Ivory Coast",
    "Korea (South)": "South Korea",
    "Korea (North)": "North Korea",
    "Netherlands (Holland)": "Netherlands",
    "Myanmar": "Burma (Myanmar)",
    "Viet Nam": "Vietnam",
    "Lao People's Democratic Republic": "Laos",
    "Timor-Leste": "East Timor",
    "Saint Kitts (Christopher) and Nevis": "Saint Kitts and Nevis",
    "Czech Republic": "Czechia",
    "Marocco": "Morocco",
    "Greenland": "Denmark",
    "Burkina-Faso": "Burkina",
    "Slovak Republic": "Slovakia",
    "Bosnia & Herzegovina": "Bosnia and Herzegovina",
    "Libyan Arab Jamahiriya": "Libya",
    "Congo, Democratic Republic of the": "Democratic Republic of Congo",
}
airports_df.replace(country_name_fixes, inplace=True)
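A quick way to find such mismatches is to compare the two sets of country names. The sketch below uses tiny made-up dataframes (`sample_airports` and `sample_countries` are hypothetical stand-ins) and assumes the country columns are `Country_Name` and `Country`, as in the merge later in this notebook.

```python
import pandas as pd

# Hypothetical miniature versions of airports_df and countries_df
sample_airports = pd.DataFrame(
    {"Country_Name": ["Viet Nam", "France", "Chili", "France"]}
)
sample_countries = pd.DataFrame({"Country": ["Vietnam", "France", "Chile"]})

# Country names used by airports that have no exact match in the countries table
known = set(sample_countries["Country"])
unmatched = sorted(set(sample_airports["Country_Name"].dropna()) - known)
print(unmatched)
```

Running the same comparison on the real dataframes should list the names that still need a replacement (or that are simply missing from `countries_df`).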

Now, we will create a `GeoDataFrame` from `airports_df`. We will use the latitude and longitude columns of `airports_df`, `GeoPointLat` and `GeoPointLong`, to initialize a [Point](https://shapely.readthedocs.io/en/stable/reference/shapely.Point.html) from the `shapely` library for each airport. Note that `Point` expects the coordinates in (x, y) order, that is (longitude, latitude). The `crs` argument of the `GeoDataFrame` object tells GeoPandas how to interpret these coordinates; `EPSG:4326` is the standard latitude/longitude system (WGS 84).

In [None]:
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(airports_df['GeoPointLong'], airports_df['GeoPointLat'])]
gdf = geopandas.GeoDataFrame(airports_df, geometry=geometry, crs="EPSG:4326")
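As an alternative to building the `Point` objects in a list comprehension, GeoPandas offers `geopandas.points_from_xy`, which creates the geometry column directly from the two coordinate columns. The sketch below uses a small made-up dataframe (`sample` is hypothetical and the coordinates are approximate) with the same column names as above.

```python
import geopandas
import pandas as pd

# Hypothetical two-row sample; coordinates are approximate and for illustration only
sample = pd.DataFrame(
    {
        "IATA": ["JFK", "LHR"],
        "GeoPointLat": [40.64, 51.47],
        "GeoPointLong": [-73.78, -0.45],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude)
sample_gdf = geopandas.GeoDataFrame(
    sample,
    geometry=geopandas.points_from_xy(sample["GeoPointLong"], sample["GeoPointLat"]),
    crs="EPSG:4326",
)
print(sample_gdf.geometry)
```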

Let's plot this `GeoDataFrame` using the `viridis` colormap.

In [None]:
gdf.plot(markersize=.01, cmap="viridis")

Amazing! However, an interactive plot would give us a better experience navigating the map and locating the airports! As an added bonus, we will color code the airports based on their continent (which we can obtain using `countries_df`).

First, let's find all the unique continent names. We have 6 continents in total (Antarctica does not appear in the dataset!)

In [None]:
countries_df["Continent"].unique()

Now, let's assign each continent a unique color.

In [None]:
colors = ['darkred', 'darkgreen', 'blue', 'purple', 'yellow', 'magenta']
colors_df = pd.DataFrame({'Continent': countries_df["Continent"].unique().tolist(), 'colors': colors})
colors_df
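The same mapping can also be kept as a plain dictionary and applied with `Series.map`, which avoids one of the merges. This is only a sketch of that alternative, using a small made-up dataframe (`sample_countries` is hypothetical); it also checks that the palette has enough colors for the continents found.

```python
import pandas as pd

# Hypothetical stand-in for countries_df
sample_countries = pd.DataFrame(
    {
        "Country": ["France", "Kenya", "Japan", "Brazil"],
        "Continent": ["Europe", "Africa", "Asia", "South America"],
    }
)

palette = ["darkred", "darkgreen", "blue", "purple", "yellow", "magenta"]
continents = sample_countries["Continent"].unique().tolist()

# Fail early if there are more continents than colors
if len(continents) > len(palette):
    raise ValueError("Not enough colors for the continents in the data")

# Pair each continent with a color, in order of first appearance
color_by_continent = dict(zip(continents, palette))
sample_countries["colors"] = sample_countries["Continent"].map(color_by_continent)
print(sample_countries)
```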

Now, let's merge the colors and countries dataframes to obtain a color for each country, and finally merge that with the `GeoDataFrame` to obtain a color for each airport.

In [None]:
continent_colors_df = pd.merge(countries_df, colors_df, on='Continent', how='left')
merged_df = pd.merge(gdf, continent_colors_df, left_on='Country_Name', right_on='Country', how='left')

Now, let's inspect the merged dataframe.

In [None]:
merged_df.head()

Some of the island countries are not available in `countries_df`, so the corresponding airports have no color after the merge. We can either drop these rows or assign a default color to all airports not found in `countries_df`. Here, we assign black as the default.

In [None]:
merged_df["colors"] = merged_df["colors"].fillna('black')

In [None]:
merged_df

In [None]:
merged_df.plot(markersize=0.001, color=merged_df['colors'])

And we are now ready to create an interactive plot!!

In [None]:
merged_df.explore(markersize=0.001, color=merged_df['colors'])

Aah that looks awesome (barring a few black dots). Notice that when you hover over any airport, the tooltip shows all the columns. This is not ideal! We are really only interested in the country, airport code and airport name.

Let's customize the tooltip! First, we will inspect the column names of the dataframe using `columns`.

In [None]:
merged_df.columns

By passing a `list` of column names to the `tooltip` argument of the `explore` function, we get exactly what we are looking for!

In [None]:
merged_df.explore(markersize=0.001, color=merged_df['colors'], tooltip=["AirportName", "IATA","Country"])

Aah that looks much better!!!

Notice that you will see more airports than you might expect! The dataset also includes regional and private airports and helipads. For example, New York City shows many more than 3 airports!!

In [None]:
airports_df[airports_df["City_IATA"] == 'NYC']
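To see which city codes have the most entries, `value_counts` on the `City_IATA` column gives a quick overview. The sketch below runs on a small made-up dataframe (`sample_airports` is a hypothetical, illustrative sample, not the real data).

```python
import pandas as pd

# Hypothetical sample with the same City_IATA and IATA columns
sample_airports = pd.DataFrame(
    {
        "City_IATA": ["NYC", "NYC", "NYC", "LON", "LON", "PAR"],
        "IATA": ["JFK", "LGA", "EWR", "LHR", "LGW", "CDG"],
    }
)

# Number of airports per city code, largest first
airports_per_city = sample_airports["City_IATA"].value_counts()
print(airports_per_city)
```

On the full dataset, `airports_df["City_IATA"].value_counts().head(10)` should show the ten city codes with the most airports.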