# Worldwide Airports Dataset Analysis

This analysis explores the [Global Aviation Hub: Dataset of Airports Worldwide](https://www.kaggle.com/datasets/harshalhonde/global-aviation-hub-dataset-of-airports-worldwide/data) dataset, published on Kaggle by Harshal H, for learning purposes only.
Here I practice data cleanup, DataFrame and Series manipulation, and visualization of insights.
There are no predefined objectives, so as I explore the data I will dig into whatever I come across along the way.

Dataset Columns:
- **Airport Identifiers**: Unique identifiers and codes for each airport, including ICAO, IATA, and local codes.
- **Geographical Coordinates**: Precise latitude and longitude coordinates for accurate mapping and analysis.
- **Elevation**: Elevation data in feet for each airport's location.
- **Geographical Region**: Information about the continent, country, and region where each airport is situated.
- **Municipality**: The city or municipality associated with each airport's location. 
- **Scheduled Service**: Indicates whether the airport offers scheduled commercial air services.
- **Useful Links**: Links to homepages and Wikipedia pages for further information.
- **Keywords**: Keywords that provide additional context and categorization for each airport.

# Importing libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import kagglehub

# Get the dataset and setup the pandas dataframe

In [10]:
# Download the latest version of the dataset (version 1 at the time of writing).
path = kagglehub.dataset_download("harshalhonde/global-aviation-hub-dataset-of-airports-worldwide")

df = pd.read_csv(f"{path}/airports .csv")
df.head()

|  | id | ident | type | name | latitude_deg | longitude_deg | elevation_ft | continent | iso_country | iso_region | municipality | scheduled_service | gps_code | iata_code | local_code | home_link | wikipedia_link | keywords |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6523 | 00A | heliport | Total RF Heliport | 40.070985 | -74.933689 | 11.0 |  | US | US-PA | Bensalem | no | K00A |  | 00A | https://www.penndot.pa.gov/TravelInPA/airports... |  |  |
| 1 | 323361 | 00AA | small_airport | Aero B Ranch Airport | 38.704022 | -101.473911 | 3435.0 |  | US | US-KS | Leoti | no | 00AA |  | 00AA |  |  |  |
| 2 | 6524 | 00AK | small_airport | Lowell Field | 59.947733 | -151.692524 | 450.0 |  | US | US-AK | Anchor Point | no | 00AK |  | 00AK |  |  |  |
| 3 | 6525 | 00AL | small_airport | Epps Airpark | 34.864799 | -86.770302 | 820.0 |  | US | US-AL | Harvest | no | 00AL |  | 00AL |  |  |  |
| 4 | 506791 | 00AN | small_airport | Katmai Lodge Airport | 59.093287 | -156.456699 | 80.0 |  | US | US-AK | King Salmon | no | 00AN |  | 00AN |  |  |  |
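
The empty `continent` cells in the US rows of this preview may be a parsing effect rather than genuinely missing data. This is an assumption, not something verified against this file: by default `pd.read_csv` treats the literal string `NA` as a missing value, which would swallow a continent code `NA` (North America) if the file encodes it that way. A minimal sketch with a made-up two-row CSV (the `csv_text` content is hypothetical):

```python
import io

import pandas as pd

# Hypothetical CSV snippet; how the real file encodes continents is assumed, not verified.
csv_text = "ident,continent,iso_country\n00A,NA,US\nSBGR,SA,BR\n"

# Default parsing: the string "NA" is one of pandas' default missing-value markers.
default_parse = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False plus na_values=[""] treats only empty cells as missing.
strict_parse = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default_parse["continent"].tolist())
print(strict_parse["continent"].tolist())
```

If the same pattern holds for the real file, passing the same two arguments to the `pd.read_csv` call above would be one way to check it.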


# Getting to know the dataset

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76367 entries, 0 to 76366
Data columns (total 18 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 76367 non-null  int64  
 1   ident              76367 non-null  object 
 2   type               76367 non-null  object 
 3   name               76367 non-null  object 
 4   latitude_deg       76367 non-null  float64
 5   longitude_deg      76367 non-null  float64
 6   elevation_ft       61969 non-null  float64
 7   continent          39372 non-null  object 
 8   iso_country        76108 non-null  object 
 9   iso_region         76367 non-null  object 
 10  municipality       71317 non-null  object 
 11  scheduled_service  76367 non-null  object 
 12  gps_code           41345 non-null  object 
 13  iata_code          8889 non-null   object 
 14  local_code         32792 non-null  object 
 15  home_link          3694 non-null   object 
 16  wikipedia_link     110

We can see that all airports have non-null data in important columns such as ident, type, name, latitude_deg, longitude_deg and iso_region.
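
To put the non-null counts from `df.info()` in proportion, the share of missing values per column can be computed directly. The following is a minimal sketch on a small made-up frame (the `sample` data is hypothetical, not taken from the dataset); on the real data the same expression would be applied to `df`.

```python
import pandas as pd

# Hypothetical miniature stand-in for the airports dataframe.
sample = pd.DataFrame({
    "ident": ["00A", "00AA", "00AK", "00AL"],
    "elevation_ft": [11.0, None, 450.0, 820.0],
    "iata_code": [None, None, None, "XYZ"],
})

# isna() marks missing cells; mean() turns the booleans into a fraction per column,
# which is then scaled to a percentage and sorted from most to least missing.
missing_share = sample.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_share)
```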

Most of the iso_country column is filled. It would be good to have the same coverage for municipality, but we will see how those null values are distributed. Perhaps for large, developed countries the data is all there for us to explore.
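
One way to see how the missing municipality values are distributed is to count them per country. This is a minimal sketch on hypothetical rows (only the column names match the dataset, and `missing_by_country` is an illustrative name), not the analysis actually run on the full data.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the dataset.
sample = pd.DataFrame({
    "iso_country": ["US", "US", "BR", "BR", "BR", "FR"],
    "municipality": ["Leoti", None, None, None, "Manaus", "Paris"],
})

# Flag missing municipalities, then aggregate the flag per country:
# total airports, number missing, and the missing fraction.
missing_by_country = (
    sample.assign(municipality_missing=sample["municipality"].isna())
    .groupby("iso_country")["municipality_missing"]
    .agg(airports="size", missing="sum", missing_share="mean")
    .sort_values("missing_share", ascending=False)
)
print(missing_by_country)
```

Note that `groupby` drops rows whose key is missing by default, so airports with a null iso_country would not appear in this summary unless `dropna=False` is passed.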