# Creating PlaneFlight Input Files for GEOS-Chem

This notebook demonstrates how to use `planeflight_io` to create `Planeflight.dat.YYYYMMDD` input files for the GEOS-Chem planeflight diagnostic. These input files tell GEOS-Chem when, where, and what to sample along an aircraft flight track (or any set of observation points).

## Step 0: Learn About the Planeflight Diagnostic

Before creating input files, it's helpful to understand what the planeflight diagnostic does. Visit the GEOS-Chem documentation for a full description:

🔗 https://geos-chem.readthedocs.io/en/stable/gcclassic-user-guide/planeflight.html

In this notebook, we use `pln.make_planeflight_inputs()` to create properly-formatted `Planeflight.dat` input files. See the function's docstring (or the companion `.py` script) for a full description of all arguments.

In [None]:
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
import planeflight_io as pln

# Set this to the path of the 'examples/' folder in your local clone of the planeflight_io repo.
path_to_examples = '/path/to/planeflight_io/examples'  # <-- Update this!

## Step 1: Load Your Flight Data

`make_planeflight_inputs()` requires four arrays describing the observation points:

| Argument | Description | Type |
|---|---|---|
| `datetimes` | Times to sample (UTC) | `pd.Series` of `pd.Timestamp` |
| `lat_arr` | Latitudes (−90 to 90°) | 1-D array-like |
| `lon_arr` | Longitudes (−180 to 180°) | 1-D array-like |
| `vert_arr` | Pressure (hPa) **or** altitude (m above ground) | 1-D array-like |

Use pressure inputs whenever possible: pressure sampling is more accurate than altitude for aircraft data. Set `vert_is_pres=True` when passing pressure values.
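
As a minimal sketch of the requirements in the table above, the helpers below check that the coordinate sequences have equal length and fall in the stated ranges, and wrap 0 to 360° longitudes into −180 to 180°. `wrap_longitude` and `check_points` are hypothetical names introduced for illustration and are not part of `planeflight_io`; whether the library performs similar checks itself is not covered here.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees east (e.g. 0 to 360) into [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def check_points(lats, lons, verts):
    """Raise ValueError if the inputs break the length or range rules.

    Hypothetical helper. Works on plain lists and on 1-D NumPy arrays.
    NaN values fail the range comparisons and are reported as out of range.
    """
    if not (len(lats) == len(lons) == len(verts)):
        raise ValueError("lat, lon, and vertical arrays must have equal length")
    if not all(-90.0 <= v <= 90.0 for v in lats):
        raise ValueError("latitude outside -90 to 90 degrees")
    if not all(-180.0 <= v <= 180.0 for v in lons):
        raise ValueError("longitude outside -180 to 180 degrees")


# Illustrative values only: longitudes given on a 0 to 360 grid.
lons_0_360 = [275.5, 10.0, 180.0]
lons = [wrap_longitude(x) for x in lons_0_360]
check_points([33.1, 34.0, 35.2], lons, [950.0, 800.0, 650.0])
print(lons)
```

The same functions can be applied to the arrays loaded in the next cell before they are passed to `make_planeflight_inputs()`.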

This notebook uses example SENEX campaign data shipped with the repository.

In [None]:
# Load the example SENEX campaign dataset:
senex_pth = path_to_examples + '/datafiles_for_examples/SENEX.nc'
ds = xr.open_dataset(senex_pth)

# Use only the first 2 days so we generate just 2 input files:
unq_dates = np.unique(ds.time.dt.date)
ds = ds.where(((ds.time.dt.date == unq_dates[0]) | (ds.time.dt.date == unq_dates[1])), drop=True)

# 1.1 Times: must be a pd.Series of pd.Timestamps.
senex_time = pd.to_datetime(ds.time.values).to_series().reset_index(drop=True)
print(type(senex_time[0]))  # Should be <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# 1.2 Latitude and longitude:
senex_lat = ds.GpsLat.values
senex_lon = ds.GpsLon.values
print(type(senex_lat), type(senex_lon))

# 1.3 Pressure (hPa): use static pressure measured during the flight.
senex_pres = ds.StaticPrs
print(type(senex_pres))
print('Units:', ds.StaticPrs.attrs['Units'])  # Confirm hPa / mbar
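
Because each input file is named `Planeflight.dat.YYYYMMDD`, the number of files follows from the number of distinct dates in the timestamps. The sketch below previews those names using only the standard library. `expected_planeflight_files` is a hypothetical helper and the sample timestamps are made up; it does not call `planeflight_io`, and it assumes one file per UTC calendar day, as the two-day subset above implies.

```python
from datetime import datetime


def expected_planeflight_files(times):
    """Return sorted 'Planeflight.dat.YYYYMMDD' names, one per distinct date."""
    dates = sorted({t.strftime("%Y%m%d") for t in times})
    return [f"Planeflight.dat.{d}" for d in dates]


# Illustrative timestamps (not taken from the SENEX file).
sample_times = [
    datetime(2013, 6, 3, 15, 0),
    datetime(2013, 6, 3, 18, 30),
    datetime(2013, 6, 10, 14, 45),
]
print(expected_planeflight_files(sample_times))
```

Since `pd.Timestamp` also provides `strftime`, the same function can be called on `senex_time`.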

## Step 2: Choose What to Sample

Beyond advected tracers, GEOS-Chem planeflight can output a variety of optional diagnostics (meteorological fields, aerosol optical depths, chemical families, etc.).

Use `pln.get_compatible_input_diags()` to see which diagnostics are compatible with your simulation type. You can request:
- All of them: `diags='?ALL?'`
- A specific subset by collection name via `these_collections`
- A hand-picked list: `diags=['NOy', 'RO2']`

Valid collection names include: `'aer_uptake'`, `'aodb'`, `'aodc'`, `'aq_aer'`, `'chem_fams'`, `'defaults'`, `'gmao_ice'`, `'gmao_met'`, `'hg'`, `'htep'`, `'isor'`, `'tomas'`.

The function also needs your `geoschem_config.yml` to determine the simulation type, list all advected species, and validate optional diagnostic compatibility.

In [None]:
# See all optional diagnostics compatible with a full-chemistry simulation:
diags = pln.get_compatible_input_diags(simtype='fullchem', display=True)

# Or retrieve only specific collections:
met_diags = pln.get_compatible_input_diags(simtype='fullchem',
                                            these_collections=['gmao_met', 'chem_fams'],
                                            display=True)

# Point to the geoschem_config.yml file for your GEOS-Chem run:
gc_config = path_to_examples + '/datafiles_for_examples/geoschem_config.yml'

## Example 1: Specific Tracers + Specific Diagnostics

Request a short list of advected species (`NO`, `O3`, `CO`) and two optional diagnostics (`NOy`, `RO2`). This produces the smallest input files and is the fastest option when you only need a few variables.

By default (`use_tracer_names=False`), advected species are written as tracer **numbers** in the input file. This causes GEOS-Chem to output concentrations in `mol/mol dry`, the most convenient unit for comparison with observations.

In [None]:
ex1_dir = path_to_examples + '/example1/'
os.makedirs(ex1_dir, exist_ok=True)  # Create the output folder if it does not exist

pln.make_planeflight_inputs(
    savedir=ex1_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],   # List of specific tracers to sample
    diags=['NOy', 'RO2'],          # List of optional diagnostics to sample
    username='me',
    overwrite=True,
    use_tracer_names=False,        # Use tracer numbers → outputs in mol/mol (recommended)
)
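
After the call returns, it is worth confirming that the expected files exist. The sketch below lists files matching `Planeflight.dat.*` in a directory. `list_planeflight_files` is a hypothetical helper, and it assumes the files are written directly into `savedir` under the naming pattern described at the top of this notebook.

```python
from pathlib import Path


def list_planeflight_files(savedir):
    """Return sorted names of files in savedir matching 'Planeflight.dat.*'."""
    return sorted(
        p.name for p in Path(savedir).glob("Planeflight.dat.*") if p.is_file()
    )
```

For example, `list_planeflight_files(ex1_dir)` would be expected to return two names for the two-day subset used here.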

## Example 2: All Tracers + All Diagnostics (Wildcards)

Pass `tracers='?ALL?'` and `diags='?ALL?'` to request every advected species listed in your `geoschem_config.yml` and every compatible optional diagnostic. The resulting input files are larger, but this approach ensures nothing is missed.

In [None]:
ex2_dir = path_to_examples + '/example2/'
os.makedirs(ex2_dir, exist_ok=True)  # Create the output folder if it does not exist

pln.make_planeflight_inputs(
    savedir=ex2_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers='?ALL?',               # Wildcard: request all advected species
    diags='?ALL?',                 # Wildcard: request all compatible optional diagnostics
    username='me',
    overwrite=True,
    use_tracer_names=False,
)

## Example 3: All Tracers/Diagnostics Minus Exclusions

Use wildcards but explicitly exclude certain species or diagnostics with `tracers_minus` and `diags_minus`. This is useful when you want nearly everything but need to omit a few variables that are irrelevant to your study or that cause issues.

In this example, we also set `use_tracer_names=True` to write tracer **names** instead of numbers. Note: this causes GEOS-Chem to output advected species in `molec/cm³` rather than `mol/mol`, which requires a conversion step when reading the output.
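
The conversion mentioned above amounts to dividing the species number density by the number density of air, which the ideal gas law gives as $n_\mathrm{air} = P / (k_B T)$. The sketch below is a minimal illustration, not a reader provided by `planeflight_io`: `air_number_density` and `molec_cm3_to_mol_mol` are hypothetical names, and the code assumes you have pressure (hPa) and temperature (K) at each sample point, for instance from meteorological diagnostics if you requested them. Using total pressure gives a mixing ratio relative to moist air; a strict `mol/mol dry` value would also need the water vapor content.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def air_number_density(pres_hpa, temp_k):
    """Ideal-gas number density of air in molec/cm3."""
    pres_pa = pres_hpa * 100.0               # hPa -> Pa
    per_m3 = pres_pa / (BOLTZMANN * temp_k)  # molecules per cubic metre
    return per_m3 * 1.0e-6                   # m^-3 -> cm^-3


def molec_cm3_to_mol_mol(conc, pres_hpa, temp_k):
    """Convert a number density (molec/cm3) to a mixing ratio (mol/mol)."""
    return conc / air_number_density(pres_hpa, temp_k)


# Illustrative values: 1.0e12 molec/cm3 sampled at 850 hPa and 285 K.
print(molec_cm3_to_mol_mol(1.0e12, 850.0, 285.0))
```

Both functions use only arithmetic, so they also work element-wise on NumPy arrays of matching shape.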

In [None]:
ex3_dir = path_to_examples + '/example3/'
os.makedirs(ex3_dir, exist_ok=True)  # Create the output folder if it does not exist

tracers_minus = ['ClNO2', 'Cl2', 'ClO', 'HOCl', 'HCl', 'BrCl']

diags_minus = [
    "AODC_SULF", "AODC_BLKC", "AODC_ORGC", "AODC_SALA", "AODC_SALC",
    "AODC_DUST", "AODB_SULF", "AODB_BLKC", "AODB_ORGC", "AODB_SALA", "AODB_SALC",
    "AODB_DUST", "GMAO_ICE00", "GMAO_ICE10", "GMAO_ICE20",
    "GMAO_ICE30", "GMAO_ICE40", "GMAO_ICE50", "GMAO_ICE60", "GMAO_ICE70",
    "GMAO_ICE80", "GMAO_ICE90",
]

pln.make_planeflight_inputs(
    savedir=ex3_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers='?ALL?',
    tracers_minus=tracers_minus,   # Exclude these tracers
    diags='?ALL?',
    diags_minus=diags_minus,       # Exclude these diagnostics
    username='me',
    overwrite=True,
    use_tracer_names=True,         # Outputs in molec/cm3 (requires conversion when reading)
)
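
The `diags_minus` list in this example follows a regular naming pattern, so it can also be generated rather than typed out. The sketch below reproduces exactly the names shown above and nothing more; `aerosol_types` and `generated_diags_minus` are variable names introduced here for illustration.

```python
# Suffixes shared by the AODC_* and AODB_* names used in this example.
aerosol_types = ["SULF", "BLKC", "ORGC", "SALA", "SALC", "DUST"]

generated_diags_minus = (
    [f"AODC_{a}" for a in aerosol_types]
    + [f"AODB_{a}" for a in aerosol_types]
    + [f"GMAO_ICE{n:02d}" for n in range(0, 100, 10)]  # GMAO_ICE00 ... GMAO_ICE90
)
print(len(generated_diags_minus))
```

The result can be passed as `diags_minus=generated_diags_minus` in place of the hand-written list.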