# Creating PlaneFlight Input Files for GEOS-Chem

This notebook demonstrates how to use `planeflight_io` to create `Planeflight.dat.YYYYMMDD` input files for the GEOS-Chem planeflight diagnostic. These input files tell GEOS-Chem when, where, and what to sample along an aircraft flight track (or any set of observation points).

## Step 0: Learn About the Planeflight Diagnostic

Before creating input files, it's helpful to understand what the planeflight diagnostic does. Visit the GEOS-Chem documentation for a full description:

🔗 https://geos-chem.readthedocs.io/en/stable/gcclassic-user-guide/planeflight.html

In this notebook, we use `pln.make_planeflight_inputs()` to create properly-formatted `Planeflight.dat` input files from some sample campaign data. See the function's docstring (or the companion `.py` script) for a full description of all arguments.

In [1]:
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
import planeflight_io as pln

# Set this to the path of the 'examples/' folder in your local clone of the planeflight_io repo.
path_to_examples = '/home/jhask/Code/planeflight_io/examples'  # <-- Update this!

## Step 1: Load Your Flight Data

`make_planeflight_inputs()` requires four arrays describing the observation points. Match these types, units, and conventions exactly, because the function does not silently re-project or re-scale inputs:

| Argument | Required type | Units / convention |
|---|---|---|
| `datetimes` | `pd.Series` of `pd.Timestamp` | UTC (not local time) |
| `lat_arr` | 1-D `np.ndarray` | degrees North (−90 to 90) |
| `lon_arr` | 1-D `np.ndarray` | degrees East (−180 to 180, **not** 0–360) |
| `vert_arr` | 1-D `np.ndarray` | pressure in **hPa** (preferred) or altitude in **meters above ground** |

All four arrays must be the same length and NaN-free.
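
A quick pre-flight check can catch these problems before any files are written. The helper below is a minimal sketch, not part of `planeflight_io`: the name `check_planeflight_arrays` is hypothetical, and the 1100 hPa threshold is only a heuristic for spotting pressures given in Pa.

```python
import numpy as np
import pandas as pd


def check_planeflight_arrays(datetimes, lat_arr, lon_arr, vert_arr, vert_is_pres=True):
    """Return a list of problems found; an empty list means every check passed."""
    lat = np.asarray(lat_arr, dtype=float)
    lon = np.asarray(lon_arr, dtype=float)
    vert = np.asarray(vert_arr, dtype=float)
    problems = []

    # All four inputs must describe the same number of points.
    lengths = {"datetimes": len(datetimes), "lat_arr": lat.size,
               "lon_arr": lon.size, "vert_arr": vert.size}
    if len(set(lengths.values())) != 1:
        problems.append(f"length mismatch: {lengths}")

    # No missing values are allowed in any input.
    if pd.isna(datetimes).any():
        problems.append("datetimes contains NaT")
    for name, arr in (("lat_arr", lat), ("lon_arr", lon), ("vert_arr", vert)):
        if np.isnan(arr).any():
            problems.append(f"{name} contains NaN")

    # Range checks (comparisons with NaN are False, so NaNs are not counted twice).
    if ((lat < -90) | (lat > 90)).any():
        problems.append("lat_arr outside -90 to 90")
    if ((lon < -180) | (lon > 180)).any():
        problems.append("lon_arr outside -180 to 180 (0-360 convention?)")
    # Heuristic only: pressures above ~1100 hPa suggest the values are in Pa.
    if vert_is_pres and (vert > 1100).any():
        problems.append("vert_arr exceeds 1100; pressure may be in Pa rather than hPa")
    return problems
```

Calling it on the arrays you plan to pass in and confirming that the result is an empty list is a cheap way to rule out the most common input mistakes.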

> **Common pitfalls:**
> - Longitudes must be in the range −180 to 180 (not 0–360).
> - Pressure must be in **hPa** (not Pa). 1 hPa = 1 mbar.
> - Timestamps must be **UTC** (not local time).
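
If your campaign data uses other conventions, each pitfall above has a short fix. The helpers below are a sketch with hypothetical names (`wrap_longitude`, `pa_to_hpa`, `local_to_utc`). The last one assumes the function expects timezone-naive timestamps that represent UTC, which matches the timestamps printed later in this notebook.

```python
import numpy as np
import pandas as pd


def wrap_longitude(lon_0_360):
    """Map longitudes from the 0-360 convention onto [-180, 180)."""
    lon = np.asarray(lon_0_360, dtype=float)
    return ((lon + 180.0) % 360.0) - 180.0


def pa_to_hpa(pressure_pa):
    """Convert pressure from Pa to hPa (1 hPa = 100 Pa)."""
    return np.asarray(pressure_pa, dtype=float) / 100.0


def local_to_utc(local_times, tz_name):
    """Interpret naive local times in `tz_name` and return naive UTC timestamps."""
    local = pd.Series(pd.to_datetime(local_times))
    aware_utc = local.dt.tz_localize(tz_name).dt.tz_convert("UTC")
    return aware_utc.dt.tz_localize(None)  # drop the tz label, keep the UTC clock time
```

Note that a longitude of exactly 180 maps to −180 with this wrapping, and that `tz_localize` can raise for local times that are ambiguous or nonexistent around daylight-saving transitions.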

**`vert_is_pres` (pressure vs. altitude):** The planeflight diagnostic natively supports altitude input only for CCGG/tower-type observations. All aircraft data should use pressure (`vert_is_pres=True`). Using altitude for aircraft is technically possible (this code sets the TYPE string accordingly so that GEOS-Chem knows the vertical coordinate is altitude), but it is not advisable because of ambiguity between "above ground" and "above sea level" conventions. See [GH issue #320](https://github.com/geoschem/geos-chem/issues/320) for the full discussion.
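
If an aircraft dataset provides only altitude, one option is to estimate pressure before calling the function. The sketch below uses the International Standard Atmosphere (ISA) troposphere formula. It is an approximation, not something `planeflight_io` does for you: it assumes altitude in meters above sea level, holds only below roughly 11 km, and will differ from the real pressure on any given day, so measured static pressure is preferable whenever it exists. The function name is hypothetical.

```python
import numpy as np


def isa_pressure_hpa(alt_m_asl):
    """Approximate pressure (hPa) from altitude above sea level (m), ISA troposphere only."""
    h = np.asarray(alt_m_asl, dtype=float)
    # Standard sea-level pressure of 1013.25 hPa scaled by the ISA lapse-rate term.
    return 1013.25 * (1.0 - 2.25577e-5 * h) ** 5.25588
```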

This notebook uses example SENEX campaign data shipped with the repository to walk you through different ways to use these functions.

In [15]:
# Load the example SENEX campaign dataset:
senex_pth = path_to_examples + '/datafiles_for_examples/SENEX.nc'
ds = xr.open_dataset(senex_pth)

# Use only the first 2 days so we generate just 2 input files
unq_dates = np.unique(ds.time.dt.date)
ds = ds.where(((ds.time.dt.date == unq_dates[0]) | (ds.time.dt.date == unq_dates[1])), drop=True)

# Extract times: must be a pd.Series of pd.Timestamps
senex_time = pd.to_datetime(ds.time.values).to_series().reset_index(drop=True)
print(f'Time:      [{senex_time[0]}, {senex_time[1]}, ... {senex_time.iloc[-1]}]')
print(f'           dtype: {type(senex_time[0]).__name__}\n')

# Extract Latitude and longitude as np arrays:
senex_lat = ds.GpsLat.values
senex_lon = ds.GpsLon.values
print(f'Latitude:  [{senex_lat[0]:.4f}, {senex_lat[1]:.4f}, ... {senex_lat[-1]:.4f}] ({ds.GpsLat.attrs["Units"]})')
print(f'           dtype: {senex_lat.dtype}, type: {type(senex_lat).__name__}\n')

print(f'Longitude: [{senex_lon[0]:.4f}, {senex_lon[1]:.4f}, ... {senex_lon[-1]:.4f}] ({ds.GpsLon.attrs["Units"]})')
print(f'           dtype: {senex_lon.dtype}, type: {type(senex_lon).__name__}\n')

# Extract Pressure (hPa) as np-array:
senex_pres = ds.StaticPrs.values
print(f'Pressure:  [{senex_pres[0]:.4f}, {senex_pres[1]:.4f}, ... {senex_pres[-1]:.4f}] ({ds.StaticPrs.attrs["Units"]})')
print(f'           dtype: {senex_pres.dtype}, type: {type(senex_pres).__name__}\n')

Time:      [2013-06-03 14:14:30, 2013-06-03 14:15:30, ... 2013-06-10 21:29:30]
           dtype: Timestamp

Latitude:  [27.8445, 27.8144, ... 36.0138] (deg)
           dtype: float64, type: ndarray

Longitude: [-82.5259, -82.5399, ... -86.5245] (deg)
           dtype: float64, type: ndarray

Pressure:  [1012.1700, 989.6800, ... 994.1100] (mb)
           dtype: float64, type: ndarray



## Step 2: Choose What to Sample

Beyond the advected tracers, the GEOS-Chem planeflight module can output a variety of optional diagnostics (meteorological fields, aerosol optical depths, chemical families, etc.).

Use `pln.get_compatible_input_diags()` to see which diagnostics are compatible with your simulation type. You can request:
- All of them: `diags='?ALL?'`
- A specific subset by collection name via `these_collections`
- A hand-picked list: `diags=['NOy', 'RO2']`

Valid collection names include: `'aer_uptake'`, `'aodb'`, `'aodc'`, `'aq_aer'`, `'chem_fams'`, `'defaults'`, `'gmao_ice'`, `'gmao_met'`, `'hg'`, `'htep'`, `'isor'`, `'tomas'`.

The function also needs your `geoschem_config.yml` to determine the simulation type, list all advected species, and validate optional diagnostic compatibility.

In [16]:
# See all optional diagnostics compatible with a full-chemistry simulation:
diags = pln.get_compatible_input_diags(simtype='fullchem', display=True)

--------------------------------------------------
Aerosol Uptake Diagnostics Collection
--------------------------------------------------
Diagnostics:
	GAMM_DHDN  = Uptake coefficient for DHDN
	GAMM_EPOX  = Uptake coefficient for EPOX
	GAMM_GLYX  = Uptake coefficient for GLYX
	GAMM_IMAE  = Uptake coefficient for IMAE
	GAMM_ISOPN = Uptake coefficient for ISOPN
Notes:
	Untested diagnostics. Will not work w/ v<12.8.0 prior to Bates et al.,2019 Isoprene Chemistry updates.
--------------------------------------------------
Column Aerosol Optical Depth (below aircraft) Diagonstics Collection
--------------------------------------------------
Diagnostics:
	AODB_BLKC  = Column aerosol optical depth for black carbon *below aircraft*
	AODB_DUST  = Column aerosol optical depth for dust *below aircraft*
	AODB_ORGC  = Column aerosol optical depth for organic carbon *below aircraft*
	AODB_SALA  = Column aerosol optical depth for accumulation mode sea salt *below aircraft*
	AODB_SALC  = Column aero

If you are only interested in the meteorology outputs or in specific collections, pass their names via `these_collections`:

In [18]:
# Or retrieve only specific collections:
met_diags = pln.get_compatible_input_diags(simtype='fullchem',
                                            these_collections=['gmao_met', 'chem_fams'],
                                            display=True)

--------------------------------------------------
GMAO Meterology Diagnostics Collection
--------------------------------------------------
Diagnostics:
	GMAO_ABSH  = Absolute humidity
	GMAO_PSFC  = Surface pressure
	GMAO_PSLV  = Sea level pressure
	GMAO_SURF  = Aerosol surface area
	GMAO_THTA  = Potential temperature
	GMAO_UWND  = Zonal winds
	GMAO_VWND  = Meridional winds
--------------------------------------------------
Chemical Family Diagnostics Collection
--------------------------------------------------
Diagnostics:
	AN         = Concentration of AN family
	NOy        = Concentration of NOy family
	RO2        = Concentration of RO2 family


## Why Does `make_planeflight_inputs()` Need Your `geoschem_config.yml`?

`make_planeflight_inputs()` reads your `geoschem_config.yml` for three reasons:

1. **Simulation type**: The file identifies whether your run is `fullchem`, `CH4`, `CO2`, etc. This determines which optional diagnostics (`diags`) are compatible with your simulation.
2. **Advected species list**: The full list of advected species in your run. This is what enables `tracers='?ALL?'` to work correctly without you having to enumerate every species by hand.
3. **Name → number mapping**: `make_planeflight_inputs()` maps each species name to its tracer number so the input file is written with numbers (not names) by default. This is what causes the output to be in `mol/mol` rather than `molec/cm³` (see the section below).

The file lives in your GEOS-Chem run directory (the directory where you ran GEOS-Chem). The examples in this notebook use a copy shipped with the repository data.
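
To make the three roles concrete, the sketch below pulls the same information out of an already-parsed config. It rests on assumptions rather than on the library's actual implementation: the dictionary is a hypothetical, heavily shortened stand-in for what a YAML parser would return, the key layout (`simulation.name` and `operations.transport.transported_species`) is assumed, and the tracer number is assumed to be the 1-based position in the advected species list.

```python
# Hypothetical, shortened stand-in for a parsed geoschem_config.yml.
config = {
    "simulation": {"name": "fullchem"},
    "operations": {"transport": {"transported_species": ["ACET", "CO", "NO", "O3"]}},
}


def summarize_config(cfg):
    """Return (simulation type, advected species list, name -> number mapping)."""
    simtype = cfg["simulation"]["name"]
    species = list(cfg["operations"]["transport"]["transported_species"])
    # ASSUMPTION: tracer number = 1-based position in the advected species list.
    name_to_number = {name: i for i, name in enumerate(species, start=1)}
    return simtype, species, name_to_number


simtype, species, name_to_number = summarize_config(config)
```

With a mapping like this, a request such as `tracers=['NO', 'O3']` can be written to the input file as numbers, and `'?ALL?'` expands to the full `species` list.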

## Tracer Numbers vs. Tracer Names

The `use_tracer_names` argument controls both how advected species are listed in the `Planeflight.dat` input file and the units of the resulting output:

| `use_tracer_names` | What goes in the input file | Units in `plane.log` output |
|---|---|---|
| `False` (default, **recommended**) | Tracer numbers (e.g. `1`, `2`, `3`) | **mol/mol dry**: directly comparable to observations |
| `True` | Tracer names (e.g. `'NO'`, `'O3'`) | **molec/cm³**: requires a conversion step before comparing |

Using tracer **numbers** (the default) produces output in `mol/mol dry`, which can be compared directly to aircraft observations without any further conversion. Using tracer **names** is more human-readable in the input file, but GEOS-Chem will output advected species concentrations in `molec/cm³`, requiring an extra unit conversion step when reading. See [GH issue #796](https://github.com/geoschem/geos-chem/issues/796) for the full explanation of why this unit difference exists.

**Recommendation:** Leave `use_tracer_names=False` (the default) unless you have a specific reason to use names. If you do, pass `convert2_molmol=True` when reading output with `pln.read_and_concat_planelogs()`.

Examples 1 to 3 below demonstrate the three selection approaches (explicit lists, wildcards, and wildcards minus exclusions); Example 4 covers the case where no config file is available.
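
For intuition about what such a conversion involves, a number density can be turned into a mixing ratio by dividing by the number density of air from the ideal gas law, n_air = P / (k_B T). The sketch below is an illustration under stated assumptions, not necessarily what `convert2_molmol=True` does internally: the function name is hypothetical, it needs pressure and temperature at each sample point, and it uses total (moist) air, so the result is not strictly a dry mixing ratio.

```python
import numpy as np

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def molec_cm3_to_mol_mol(conc_molec_cm3, pressure_hpa, temperature_k):
    """Convert number density (molec/cm3) to a mixing ratio (mol/mol of total air)."""
    pressure_pa = np.asarray(pressure_hpa, dtype=float) * 100.0   # hPa -> Pa
    temperature = np.asarray(temperature_k, dtype=float)
    # Ideal gas law gives molec/m3; the factor 1e-6 converts to molec/cm3.
    air_molec_cm3 = pressure_pa / (BOLTZMANN * temperature) * 1.0e-6
    return np.asarray(conc_molec_cm3, dtype=float) / air_molec_cm3
```

As a sanity check, at 1013.25 hPa and 273.15 K the air number density is about 2.69e19 molec/cm³, so a species at 1.07e12 molec/cm³ corresponds to roughly 40 ppb.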

## Example 1: Specific Tracers + Specific Diagnostics

Request a short list of advected species (`NO`, `O3`, `CO`) and two optional diagnostics (`NOy`, `RO2`). This produces the smallest input files and is the fastest option when you only need a few variables.

By default (`use_tracer_names=False`), advected species are written as tracer **numbers** in the input file. This causes GEOS-Chem to output concentrations in `mol/mol dry`, the most convenient unit for comparison with observations.

In [19]:

# Point to the geoschem_config.yml file for your GEOS-Chem run:
gc_config = path_to_examples + '/datafiles_for_examples/geoschem_config.yml'

ex1_dir = path_to_examples + '/example1/'
os.makedirs(ex1_dir, exist_ok=True)  # Create the output folder if it does not exist yet

pln.make_planeflight_inputs(
    savedir=ex1_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],   # List of specific tracers to sample
    diags=['NOy', 'RO2'],          # List of optional diagnostics to sample
    username='me',
    overwrite=True,
    use_tracer_names=False,        # Use tracer numbers → outputs in mol/mol (recommended)
)

Output saved at: /home/jhask/Code/planeflight_io/examples/example1/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example1/Planeflight.dat.20130610
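
The two files listed above correspond to the two UTC dates present in the input timestamps. If you want to know ahead of time which files to expect, the sketch below derives the names from the timestamps. It assumes one `Planeflight.dat.YYYYMMDD` file per unique UTC date, which is consistent with the output shown here but is not taken from the library's code; the function name is hypothetical.

```python
import pandas as pd


def expected_planeflight_files(datetimes):
    """List the Planeflight.dat.YYYYMMDD names implied by a set of UTC timestamps."""
    days = pd.Series(pd.to_datetime(datetimes)).dt.strftime("%Y%m%d")
    return ["Planeflight.dat." + day for day in sorted(days.unique())]


# Three sample timestamps spanning the two dates used in this notebook.
times = pd.Series(pd.to_datetime(
    ["2013-06-03 14:14:30", "2013-06-03 14:15:30", "2013-06-10 21:29:30"]
))
print(expected_planeflight_files(times))
# ['Planeflight.dat.20130603', 'Planeflight.dat.20130610']
```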


## Example 2: All Tracers + All Diagnostics (Wildcards)

Pass `tracers='?ALL?'` and `diags='?ALL?'` to request every advected species listed in your `geoschem_config.yml` and every compatible optional diagnostic. The resulting input files are larger, but this approach ensures nothing is missed.

In [20]:
ex2_dir = path_to_examples + '/example2/'
os.makedirs(ex2_dir, exist_ok=True)  # Create the output folder if it does not exist yet

pln.make_planeflight_inputs(
    savedir=ex2_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers='?ALL?',               # Wildcard: request all advected species
    diags='?ALL?',                 # Wildcard: request all compatible optional diagnostics
    username='me',
    overwrite=True,
    use_tracer_names=False,
)

Output saved at: /home/jhask/Code/planeflight_io/examples/example2/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example2/Planeflight.dat.20130610


## Example 3: All Tracers/Diagnostics Minus Exclusions

Use wildcards but explicitly exclude certain species or diagnostics with `tracers_minus` and `diags_minus`. This is useful when you want nearly everything but need to omit a few variables that are irrelevant to your study or that cause issues.

In this example, we also set `use_tracer_names=True` to write tracer **names** instead of numbers. Note: this causes GEOS-Chem to output advected species in `molec/cm³` rather than `mol/mol`, which requires a conversion step when reading the output.
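
Exclusion lists that follow a naming pattern (for example, names beginning with `AODB_`, `AODC_`, or `GMAO_ICE`) do not have to be typed by hand. The sketch below builds one by prefix and then mimics the "everything minus exclusions" selection with plain lists. Both helper names are hypothetical, the list of diagnostic names is shortened for illustration, and the filtering shown is only a model of the selection logic, not the library's implementation.

```python
def names_with_prefix(names, prefixes):
    """Return the names that start with any of the given prefixes, in original order."""
    return [name for name in names if name.startswith(tuple(prefixes))]


def apply_minus(all_names, minus):
    """Model of '?ALL?' minus exclusions: keep every name not listed in `minus`."""
    drop = set(minus)
    return [name for name in all_names if name not in drop]


# Hypothetical, shortened list of diagnostic names for illustration only.
all_diags = ["AODB_DUST", "AODC_SULF", "GMAO_ICE00", "GMAO_PSFC", "NOy", "RO2"]

diags_minus = names_with_prefix(all_diags, ["AODB_", "AODC_", "GMAO_ICE"])
kept = apply_minus(all_diags, diags_minus)
print(kept)
# ['GMAO_PSFC', 'NOy', 'RO2']
```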

In [21]:
ex3_dir = path_to_examples + '/example3/'
os.makedirs(ex3_dir, exist_ok=True)  # Create the output folder if it does not exist yet

tracers_minus = ['ClNO2', 'Cl2', 'ClO', 'HOCl', 'HCl', 'BrCl']

diags_minus = [
    "AODC_SULF", "AODC_BLKC", "AODC_ORGC", "AODC_SALA", "AODC_SALC",
    "AODC_DUST", "AODB_SULF", "AODB_BLKC", "AODB_ORGC", "AODB_SALA", "AODB_SALC",
    "AODB_DUST", "GMAO_ICE00", "GMAO_ICE10", "GMAO_ICE20",
    "GMAO_ICE30", "GMAO_ICE40", "GMAO_ICE50", "GMAO_ICE60", "GMAO_ICE70",
    "GMAO_ICE80", "GMAO_ICE90",
]

pln.make_planeflight_inputs(
    savedir=ex3_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers='?ALL?',
    tracers_minus=tracers_minus,   # Exclude these tracers
    diags='?ALL?',
    diags_minus=diags_minus,       # Exclude these diagnostics
    username='me',
    overwrite=True,
    use_tracer_names=True,         # Outputs in molec/cm3 (requires conversion when reading)
)

Output saved at: /home/jhask/Code/planeflight_io/examples/example3/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example3/Planeflight.dat.20130610


## Example 4: No `geoschem_config.yml` Available

Use `gc_config=None` when you don't have the GEOS-Chem run directory available but
know your simulation type and the species you want to sample. You must supply
`simtype=` explicitly and provide an explicit list of tracer names.

**Limitations vs. supplying a config file:**
- `tracers='?ALL?'` is not available, because there is no species list to expand it.
- Tracer names (not numbers) are always written to the input file, so GEOS-Chem
  outputs advected species in `molec/cm³`. Pass `convert2_molmol=True` when
  reading the output with `pln.read_and_concat_planelogs()` to convert to `mol/mol`.
- Optional diagnostics (`diags`) still work, including `'?ALL?'`, since those
  only require the simulation type.

In [None]:
ex4_dir = path_to_examples + '/example4/'
os.makedirs(ex4_dir, exist_ok=True)  # Create the output folder if it does not exist yet

pln.make_planeflight_inputs(
    savedir=ex4_dir,
    gc_config=None,                  # No config file: supply simtype manually
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],      # Must be an explicit list: '?ALL?' is not available
    diags=['NOy', 'RO2'],            # diags still work: only need simtype
    simtype='fullchem',              # Required when gc_config=None
    username='me',
    overwrite=True,
)