# Creating PlaneFlight Input Files for GEOS-Chem

This notebook shows how to use `pln.make_planeflight_inputs()` to create `Planeflight.dat.YYYYMMDD` input files for the GEOS-Chem planeflight diagnostic.

The only required inputs are your flight data (time, lat, lon, pressure) and an explicit list of the species you want sampled. If you know your simulation type (e.g. 'fullchem', 'Hg'), you can also request additional compatible diagnostics, such as meteorology or chemical family concentrations. If you have already made a run directory and have a `geoschem_config.yml`, you can also request that the files output all advected species.

The four examples below start with the simplest possible call and progressively add capability.

## Background: What Is the Planeflight Diagnostic?

The GEOS-Chem planeflight diagnostic samples the model at arbitrary locations and times during a simulation — following an aircraft track, ship cruise, or any set of observation points you define. The sampled values are written to `plane.log` files (one per simulation day) that you can read back and compare directly to your observations.

You define *where*, *when*, and *what* to sample in `Planeflight.dat.YYYYMMDD` input files. This notebook creates those files.
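
As a rough illustration of the one-file-per-day layout, the sketch below groups a handful of made-up UTC timestamps by calendar day and lists the `Planeflight.dat.YYYYMMDD` file each group would belong to. The helper name `planeflight_filenames` is hypothetical and is not part of `planeflight_io`; the module does this grouping for you.

```python
import pandas as pd

# Hypothetical observation times (UTC) spanning two calendar days.
times = pd.Series(pd.to_datetime([
    "2013-06-03 15:00:00",
    "2013-06-03 15:00:10",
    "2013-06-10 16:30:00",
]))


def planeflight_filenames(datetimes):
    """Map each UTC calendar day in `datetimes` to its input file name and point count."""
    counts = datetimes.dt.strftime("%Y%m%d").value_counts().sort_index()
    return {f"Planeflight.dat.{day}": int(n) for day, n in counts.items()}


files = planeflight_filenames(times)
print(files)
```

Each key corresponds to one input file, and the count is the number of observation points that fall on that UTC day.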

Full documentation: https://geos-chem.readthedocs.io/en/stable/gcclassic-user-guide/planeflight.html

### What do you actually need?

| What you have | What you can do |
|---|---|
| Flight data + explicit tracer list | ✅ Create input files (Examples 1 & 2) |
| + simulation type string | ✅ Also add optional diagnostics (Example 2) |
| + `geoschem_config.yml` | ✅ Also get outputs in `mol/mol` (not `molec/cm3`), use `tracers='?ALL?'`, validate species (Examples 3 & 4) |

In [1]:
import xarray as xr
import pandas as pd
import os
import numpy as np
import planeflight_io as pln

path_to_examples = os.getcwd()

## Step 1: Load Your Flight Data

`make_planeflight_inputs()` requires four arrays describing your observation points. These conventions must be matched exactly — the function does not silently re-project or re-scale inputs:

| Argument | Required type | Units / convention |
|---|---|---|
| `datetimes` | `pd.Series` of `pd.Timestamp` | UTC (not local time) |
| `lat_arr` | 1-D array-like | degrees North (−90 to 90) |
| `lon_arr` | 1-D array-like | degrees East (−180 to 180, **not** 0–360) |
| `vert_arr` | 1-D array-like | pressure in **hPa** (preferred) or altitude in **meters above ground** |

All four must be the same length and NaN-free.

> **Common pitfalls:** Longitudes must be −180 to 180. Pressure must be in **hPa** (not Pa). Timestamps must be **UTC**.

**Why pressure instead of altitude?** For aircraft data, always use pressure (`vert_is_pres=True`). The planeflight diagnostic only natively handles altitude for CCGG/tower-type sites. Using altitude for aircraft introduces ambiguity between "above ground" and "above sea level" conventions. See [GH #320](https://github.com/geoschem/geos-chem/issues/320).
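
If your raw data do not already follow these conventions, a small pre-processing step can bring them into line. The sketch below is one possible approach and is not part of `planeflight_io`: the helper name `normalize_flight_inputs` and its `pres_in_pa` and `local_tz` parameters are hypothetical. It assumes naive timestamps and returns naive UTC timestamps, which matches how the times are built later in this notebook.

```python
import numpy as np
import pandas as pd


def normalize_flight_inputs(times, lat, lon, pres, pres_in_pa=False, local_tz=None):
    """Return (datetimes, lat, lon, pres_hPa) with NaN rows dropped."""
    times = pd.Series(pd.to_datetime(times)).reset_index(drop=True)
    if local_tz is not None:
        # Treat naive timestamps as local time, convert to UTC, drop the tz tag.
        times = times.dt.tz_localize(local_tz).dt.tz_convert("UTC").dt.tz_localize(None)

    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    pres = np.asarray(pres, dtype=float)

    # Wrap 0..360 longitudes into -180..180 (values already in range are unchanged).
    lon = ((lon + 180.0) % 360.0) - 180.0

    if pres_in_pa:
        pres = pres / 100.0  # Pa -> hPa

    # Keep only rows where all four inputs are present.
    good = times.notna().to_numpy() & ~(np.isnan(lat) | np.isnan(lon) | np.isnan(pres))
    return times[good].reset_index(drop=True), lat[good], lon[good], pres[good]


t, la, lo, p = normalize_flight_inputs(
    ["2013-06-03 10:00", "2013-06-03 10:01"],   # local (Central) time
    [30.0, np.nan],                             # second point has a missing latitude
    [265.0, 266.0],                             # 0..360 convention
    [90000.0, 89000.0],                         # Pa
    pres_in_pa=True,
    local_tz="America/Chicago",
)
print(t[0], la, lo, p)
```

The second point is dropped because its latitude is NaN, so all four returned arrays have length one.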

This notebook uses SENEX campaign data shipped with the repository.

In [2]:
senex_pth = path_to_examples + '/datafiles_for_examples/SENEX.nc'
ds = xr.open_dataset(senex_pth)

# Use only the first 2 days so we generate just 2 input files:
unq_dates = np.unique(ds.time.dt.date)
ds = ds.where(((ds.time.dt.date == unq_dates[0]) | (ds.time.dt.date == unq_dates[1])), drop=True)

# Times — pd.Series of pd.Timestamps in UTC:
senex_time = pd.to_datetime(ds.time.values).to_series().reset_index(drop=True)
print(f'Time:  {type(senex_time[0])}')  # must be pandas Timestamp

# Lat/lon — 1-D arrays, lon must be in range -180 to 180:
senex_lat = ds.GpsLat.values
senex_lon = ds.GpsLon.values
print(f'Lat:   {senex_lat.dtype}, range [{senex_lat.min():.2f}, {senex_lat.max():.2f}] deg')
print(f'Lon:   {senex_lon.dtype}, range [{senex_lon.min():.2f}, {senex_lon.max():.2f}] deg')

# Pressure in hPa — confirm units before passing:
senex_pres = ds.StaticPrs.values
print(f'Pres:  {senex_pres.dtype}, units = {ds.StaticPrs.attrs["Units"]} (must be hPa / mb)')

Time:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>
Lat:   float64, range [27.59, 36.17] deg
Lon:   float64, range [-95.01, -82.53] deg
Pres:  float64, units = mb (must be hPa / mb)


## Example 1: The Minimum Viable Call — No `geoschem_config.yml` Needed

You don't need your GEOS-Chem run directory at all for this. The call below only requires your flight data and an explicit list of tracer names. This is useful when you haven't set up your run directory yet and only want to output a few specific tracers that are easy to list by hand.

**One consequence of not passing a `geoschem_config.yml`:** tracer *names*, rather than tracer *numbers*, are written to the `Planeflight.dat` input file, and GEOS-Chem then outputs advected species concentrations in `molec/cm³` rather than `mol/mol`, which makes them harder to compare to output in SpeciesConc files. For details on why this happens, see GitHub issue #796: https://github.com/geoschem/geos-chem/issues/796. This module includes functionality to work around this: if you pass tracer *names*, pass `convert2_molmol=True` to `pln.read_and_concat_planelogs()` when you read in your output files to get the output back in `mol/mol`. Example 3 shows how to get `mol/mol` directly if you do have `gc_config` available.
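
To make the unit difference concrete, the sketch below converts a number density in molec/cm³ to a mole fraction using the ideal gas law. It is only an illustration, not how `convert2_molmol=True` is implemented: the function names are hypothetical, the air density is for total (moist) air rather than dry air, and in practice the pressure and temperature would come from model fields sampled along the track.

```python
K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def air_number_density(pres_hpa, temp_k):
    """Number density of air in molec/cm3 from the ideal gas law, n = P / (k_B T)."""
    pres_pa = pres_hpa * 100.0                      # hPa -> Pa
    per_m3 = pres_pa / (K_BOLTZMANN * temp_k)       # molecules per m3
    return per_m3 * 1.0e-6                          # m-3 -> cm-3


def to_mol_per_mol(conc_molec_cm3, pres_hpa, temp_k):
    """Convert a species number density (molec/cm3) to a mole fraction (mol/mol)."""
    return conc_molec_cm3 / air_number_density(pres_hpa, temp_k)


# Made-up example: 1e12 molec/cm3 of O3 at 1013.25 hPa and 288.15 K.
n_air = air_number_density(1013.25, 288.15)
o3_vmr = to_mol_per_mol(1.0e12, 1013.25, 288.15)
print(f"n_air = {n_air:.4e} molec/cm3, O3 = {o3_vmr * 1e9:.1f} ppb")
```

Because the result depends on the local pressure and temperature, the conversion has to be applied point by point along the track.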


                  

In [3]:
ex1_dir = path_to_examples + '/example1/'
if not os.path.isdir(ex1_dir):
    os.mkdir(ex1_dir)

pln.make_planeflight_inputs(
    savedir=ex1_dir,
    gc_config=None,              # No config file needed for a basic explicit-tracer call
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],  # Explicit list — required when gc_config=None
    username='me',
    overwrite=True,
)

Output saved at: /home/jhask/Code/planeflight_io/examples/example1/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example1/Planeflight.dat.20130610


## Example 2: Adding Optional Diagnostics — Still No Config File Needed

Beyond advected tracers, the planeflight diagnostic can output a variety of **optional diagnostics**: meteorological fields, aerosol optical depths, chemical family concentrations (`NOy`, `RO2`, `AN`), and more. These are requested via the `diags` argument.

Crucially, optional diagnostics **do not require `gc_config`** — they're organised by simulation type, not species list. You only need to tell the function what kind of simulation you're running via `simtype=`.

Use `pln.get_compatible_input_diags()` to explore what's available for your simulation type. You can request all of them with `diags='?ALL?'`, a named subset by collection, or a hand-picked list.

In [4]:
# See every optional diagnostic available for a fullchem simulation:
all_diags = pln.get_compatible_input_diags(simtype='fullchem', display=True)

# Or filter by collection to get just the ones you care about.
# Valid names: 'aer_uptake', 'aodb', 'aodc', 'aq_aer', 'chem_fams',
#              'defaults', 'gmao_ice', 'gmao_met', 'hg', 'htep', 'isor', 'tomas'
met_and_fam = pln.get_compatible_input_diags(
    simtype='fullchem',
    these_collections=['gmao_met', 'chem_fams'],
    display=True,
)

ex2_dir = path_to_examples + '/example2/'
if not os.path.isdir(ex2_dir):
    os.mkdir(ex2_dir)

pln.make_planeflight_inputs(
    savedir=ex2_dir,
    gc_config=None,              # Still no config file needed
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],
    diags=['NOy', 'RO2'],        # Optional diagnostics — only need simtype, not gc_config
    simtype='fullchem',          # Required when requesting optional diagnostics
    username='me',
    overwrite=True,
)

--------------------------------------------------
Aerosol Uptake Diagnostics Collection
--------------------------------------------------
Diagnostics:
	GAMM_DHDN  = Uptake coefficient for DHDN
	GAMM_EPOX  = Uptake coefficient for EPOX
	GAMM_GLYX  = Uptake coefficient for GLYX
	GAMM_IMAE  = Uptake coefficient for IMAE
	GAMM_ISOPN = Uptake coefficient for ISOPN
Notes:
	Untested diagnostics. Will not work w/ v<12.8.0 prior to Bates et al.,2019 Isoprene Chemistry updates.
--------------------------------------------------
Column Aerosol Optical Depth (below aircraft) Diagonstics Collection
--------------------------------------------------
Diagnostics:
	AODB_BLKC  = Column aerosol optical depth for black carbon *below aircraft*
	AODB_DUST  = Column aerosol optical depth for dust *below aircraft*
	AODB_ORGC  = Column aerosol optical depth for organic carbon *below aircraft*
	AODB_SALA  = Column aerosol optical depth for accumulation mode sea salt *below aircraft*
	AODB_SALC  = Column aero

## Example 3: With `geoschem_config.yml` — `mol/mol` Output and Species Validation

If you have your `geoschem_config.yml`, passing it as `gc_config=` unlocks three things that aren't possible without it:

1. **Tracer numbers instead of names** — the function reads the config to map each species name to its tracer number and writes those numbers to the input file. GEOS-Chem then outputs advected species in `mol/mol` dry, which is directly comparable to observations with no further conversion needed. (Without `gc_config`, names are written and output is in `molec/cm³`.)

2. **`tracers='?ALL?'` wildcard** — the config contains the full list of advected species in your run. Passing `'?ALL?'` samples all of them automatically, so you never have to maintain an explicit list. See Example 4.

3. **Species validation** — every tracer name you pass is checked against the config's species list, catching typos before they silently produce empty columns in your `plane.log` output.

If you don't have `gc_config` available, stay with Examples 1 & 2 and convert units when reading with `convert2_molmol=True`.
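
The species validation idea in point 3 can be sketched in a few lines. This is not the module's actual implementation: the `check_species` helper is hypothetical, and the `advected` list below is a made-up stand-in for the species list that would be read from your `geoschem_config.yml`.

```python
import difflib


def check_species(requested, available):
    """Return `requested` unchanged, or raise if any name is not in `available`."""
    known = set(available)
    unknown = [name for name in requested if name not in known]
    if unknown:
        # Offer near-matches so a typo is easy to spot.
        hints = {name: difflib.get_close_matches(name, available, n=3) for name in unknown}
        raise ValueError(f"Species not found in this run (with suggestions): {hints}")
    return list(requested)


# Hypothetical list of advected species for illustration only.
advected = ["NO", "NO2", "O3", "CO", "HNO3"]

ok = check_species(["NO", "O3", "CO"], advected)
print(ok)

try:
    check_species(["NO", "O33"], advected)
except ValueError as err:
    print(err)
```

Failing loudly at file-creation time is cheaper than discovering an empty column after a full simulation.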

In [5]:
gc_config = path_to_examples + '/datafiles_for_examples/geoschem_config.yml'

ex3_dir = path_to_examples + '/example3/'
if not os.path.isdir(ex3_dir):
    os.mkdir(ex3_dir)

pln.make_planeflight_inputs(
    savedir=ex3_dir,
    gc_config=gc_config,         # Unlocks mol/mol output, species validation, and '?ALL?'
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers=['NO', 'O3', 'CO'],
    diags=['NOy', 'RO2'],
    username='me',
    overwrite=True,
    use_tracer_names=False,      # Default: write tracer numbers → output in mol/mol
)
# Open the files from this example and Example 1 side by side — the tracer
# entries will look different (numbers vs. names), and when you run GEOS-Chem
# the concentrations will be in different units as a result.

Output saved at: /home/jhask/Code/planeflight_io/examples/example3/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example3/Planeflight.dat.20130610


## Example 4: Wildcards with `geoschem_config.yml` — Everything, Minus Exclusions

With `gc_config` available you can use `'?ALL?'` to request every advected species and/or every compatible optional diagnostic at once. This is the most comprehensive option — useful for exploratory analysis or when you don't want to maintain a long explicit list.

Combining `'?ALL?'` with `*_minus` lists lets you say "everything except..." which is often more concise than a long include list when you only want to drop a handful of irrelevant variables. Here we exclude some halogen tracers (not relevant to this study) and aerosol/ice diagnostics (to keep file size manageable).

Note: `use_tracer_names=True` is set here so you can open these files alongside those from Example 3 and directly compare the tracer-number format (→ `mol/mol`) vs. the tracer-name format (→ `molec/cm³`).
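
Conceptually, the "everything except..." selection is an ordered set difference. The sketch below shows that logic under the assumption that the wildcard expands to the full available list before exclusions are applied. The `resolve_selection` helper and the short species list are hypothetical and only illustrate the idea; they are not taken from `planeflight_io`.

```python
def resolve_selection(requested, available, minus=()):
    """Expand '?ALL?' to `available`, then drop anything listed in `minus`."""
    pool = list(available) if requested == "?ALL?" else list(requested)
    drop = set(minus)
    # Preserve the original ordering of the pool.
    return [name for name in pool if name not in drop]


# Made-up list of advected species for illustration only.
available = ["NO", "O3", "CO", "ClO", "HCl", "BrCl"]

selected = resolve_selection("?ALL?", available, minus=["ClO", "HCl", "BrCl"])
print(selected)

explicit = resolve_selection(["NO", "ClO"], available, minus=["ClO"])
print(explicit)
```

With only a few exclusions, the minus list stays short while the include list would grow with every species in the run.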

In [6]:
tracers_minus = ['ClNO2', 'Cl2', 'ClO', 'HOCl', 'HCl', 'BrCl']  # exclude halogen tracers

diags_minus = [
    "AODC_SULF", "AODC_BLKC", "AODC_ORGC", "AODC_SALA", "AODC_SALC",
    "AODC_DUST", "AODB_SULF", "AODB_BLKC", "AODB_ORGC", "AODB_SALA", "AODB_SALC",
    "AODB_DUST", "GMAO_ICE00", "GMAO_ICE10", "GMAO_ICE20",
    "GMAO_ICE30", "GMAO_ICE40", "GMAO_ICE50", "GMAO_ICE60", "GMAO_ICE70",
    "GMAO_ICE80", "GMAO_ICE90",
]

ex4_dir = path_to_examples + '/example4/'
if not os.path.isdir(ex4_dir):
    os.mkdir(ex4_dir)

pln.make_planeflight_inputs(
    savedir=ex4_dir,
    gc_config=gc_config,
    datetimes=senex_time,
    lat_arr=senex_lat,
    lon_arr=senex_lon,
    vert_arr=senex_pres,
    vert_is_pres=True,
    tracers='?ALL?',              # Sample every advected species in the run
    tracers_minus=tracers_minus,  # ...except these
    diags='?ALL?',                # Sample every compatible optional diagnostic
    diags_minus=diags_minus,      # ...except these
    username='me',
    overwrite=True,
    use_tracer_names=True,        # Write names → output in molec/cm3 (compare to Example 3!)
)

Output saved at: /home/jhask/Code/planeflight_io/examples/example4/Planeflight.dat.20130603
Output saved at: /home/jhask/Code/planeflight_io/examples/example4/Planeflight.dat.20130610
