# `clean_airport()`: Clean Airport Codes or Names

Validate airport codes or names against the databases listed in `Resources`, and extract detailed information for each matched airport.

# Features

1. Create databases of airport codes and their related detailed information.

2. Check whether the input airport code or name is valid (present in the database).

3. Convert invalid airport codes or names to `NaN`.

4. Standardize null values.

5. Let the user specify which detailed information to output.
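
The validation, `NaN` conversion, and null standardization steps above can be sketched roughly as follows. This is a minimal, library-free illustration, not the planned implementation: the `AIRPORTS` table, the `NULL_VALUES` list, and the helper names `is_null` and `clean_identifier` are all hypothetical.

```python
import math

# Hypothetical miniature database keyed by identifier; a real one would be
# built from the sources listed under Resources.
AIRPORTS = {
    "ZBAA": {"name": "Beijing Capital International Airport"},
    "KJFK": {"name": "John F Kennedy International Airport"},
}

# Strings treated as null markers (an assumed list, not part of the design).
NULL_VALUES = {"", "na", "n/a", "nan", "null", "none"}


def is_null(value):
    """Return True for None, a float NaN, or a string null marker."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in NULL_VALUES


def clean_identifier(value):
    """Return the normalized identifier if it is in the database, else NaN."""
    if is_null(value):
        return math.nan
    code = str(value).strip().upper()
    return code if code in AIRPORTS else math.nan


cleaned = [clean_identifier(v) for v in ["zbaa", " KJFK ", "XXXX", "N/A", None]]
print(cleaned)  # ['ZBAA', 'KJFK', nan, nan, nan]
```

Both invalid identifiers and the various null markers end up as the same `NaN` value, so downstream code only has to handle one kind of missing entry.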

# Tentative design

In [None]:
from typing import List, Optional, Union

import dask.dataframe as dd
import pandas as pd


def clean_airport(
    df: Union[pd.DataFrame, dd.DataFrame],
    column: str,
    input_format: str = "auto",
    output_format: str = "name",
    output_detail: Optional[List[str]] = ['country_name', 'latitude', 'longitude'],
    fuzzy: bool = False,
    fuzzy_dist: float = 0.0,
    inplace: bool = False,
    report: bool = True,
    progress: bool = True,
) -> pd.DataFrame:
    """
    Parameters
    ----------
    df
        A pandas or Dask DataFrame containing the data to be cleaned.
    column
        The name of the column containing airport codes or names.
    input_format
        - 'auto': infer the input format
        - 'name': airport name ('Beijing Capital International Airport')
        - 'identifier': identification code ('ZBAA')
        (default: 'auto')
    output_format
        - 'name': airport name ('Beijing Capital International Airport')
        - 'identifier': identification code ('ZBAA')
        (default: 'name')
    output_detail
        The list of detail fields to extract for each airport. Available options:
        - 'country_name'
        - 'country_alpha_2'
        - 'country_alpha_3'
        - 'region'
        - 'city'
        - 'municipality'
        - 'latitude'
        - 'longitude'
        - 'timezone'
        - 'type'
        - 'elevation_feet'
    fuzzy
        If False, matching for the 'name' input format requires a direct
        match. If True, matching is done by searching the input for a
        regex match.
        (default: False)
    fuzzy_dist
        The maximum edit distance (number of single-character insertions, deletions,
        or substitutions required to change one word into the other) between an airport
        value and the input that will count as a match. Only applies to the 'auto' and
        'name' input formats.
        (default: 0.0)
    inplace
        If True, delete the column containing the data that was cleaned. Otherwise,
        keep the original column.
        (default: False)
    report
        If True, output the summary report. Otherwise, output no report.
        (default: True)
    progress
        If True, display a progress bar.
        (default: True)
    """

# Resources
1. [airportsdata](https://github.com/mborsetti/airportsdata)
2. [airport-codes](https://github.com/datasets/airport-codes/blob/master/data/airport-codes.csv)
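
As a rough sketch of how a lookup database might be built from a CSV resource like the ones above: the column names (`ident`, `type`, `name`, `iso_country`, `municipality`) are assumed here and should be checked against the actual file header, the sample rows are illustrative only, and `build_database` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows only; the column names are an assumption about the
# airport-codes CSV layout and should be verified against the real file.
SAMPLE_CSV = """ident,type,name,iso_country,municipality
ZBAA,large_airport,Beijing Capital International Airport,CN,Beijing
KJFK,large_airport,John F Kennedy International Airport,US,New York
"""


def build_database(csv_file):
    """Index rows by identifier, plus a lowercase name -> identifier map."""
    by_ident = {}
    by_name = {}
    for row in csv.DictReader(csv_file):
        ident = row["ident"].strip().upper()
        by_ident[ident] = row
        by_name[row["name"].strip().lower()] = ident
    return by_ident, by_name


by_ident, by_name = build_database(io.StringIO(SAMPLE_CSV))
print(by_name["beijing capital international airport"])  # ZBAA
print(by_ident["ZBAA"]["iso_country"])                   # CN
```

Keeping two indexes lets the same data serve both the 'identifier' and 'name' input formats: a name is first resolved to its identifier, and the identifier then gives access to the detail fields.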