# How to Subset L1/L2 AIRS Data Using OPeNDAP and Python

### Authors: Owen Smith, Jennifer Adams, Chris Battisto
### Date Authored: 10-15-2025

### Timing

Exercise: 5 minutes

### Overview

The retired SUBSET_AIRS_L1L2 service provided dimension-based channel
subsetting, individual variable subsetting, and variable subsetting
based on thematic groups.

This notebook demonstrates subsetting operations for all AIRS
collections using xarray and the curated subsetting groups from
`ancillary/subset_airs.json`. The groups and variables in the JSON
file provide 1:1 functionality with the retired SUBSET_AIRS_L1L2
service.

All subsetting is performed via xarray. For more information on the xarray datatree structure please see [the xarray documentation on hierarchical data](https://docs.xarray.dev/en/latest/user-guide/hierarchical-data.html).

### Prerequisites

This notebook was written using Python 3.10, and requires:
- Valid [Earthdata Login credentials](https://urs.earthdata.nasa.gov), and the generation of [Earthdata Prerequisite Files](https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files) including the `.netrc` and `.dodsrc` files.
- [Xarray](https://docs.xarray.dev/en/stable/)
- [earthaccess](https://earthaccess.readthedocs.io/en/latest/)
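
Before running the notebook, it can help to confirm that the `.netrc` file actually contains an Earthdata Login entry. The sketch below uses only the Python standard library; the helper name `has_earthdata_entry` is hypothetical, and it assumes the file lives at `~/.netrc` unless another path is passed in.

```python
import netrc
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"

def has_earthdata_entry(netrc_path=None):
    """Return True if the netrc file holds credentials for Earthdata Login."""
    # Assumption: the default location is ~/.netrc
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        entry = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file
        return False
    return entry is not None
```

Calling `has_earthdata_entry()` with no argument checks the default location. A `False` result suggests revisiting the prerequisite-file instructions linked above.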

#### Optional Anaconda Environment YAML:

This notebook can be run using the ['nasa-gesdisc-opendap' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/nasa-gesdisc-opendap.yml) provided in the 'environments' subfolder.

Please follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment. 



---

## Section 1: Setup

### 1. Import Required Libraries

In [None]:
import xarray as xr
import earthaccess
import json
import urllib.request
import os

auth = earthaccess.login()

### 2. Load `subset_airs.json`

The `subset_airs.json` file is stored in the `gesdisc-tutorials` repository, and is streamed to the `subset_airs` variable in the current notebook session.

The file contains channel and frequency values, and their associated dimensional indices, to assist with creating our subsets.

The cell below loads the subsetting information for each AIRS collection:

In [None]:
url = 'https://raw.githubusercontent.com/nasa/gesdisc-tutorials/refs/heads/194-add-subset_airs_l1l2-opendap-notebook/ancillary/subset_airs.json'
with urllib.request.urlopen(url) as response:
    subset_airs = json.load(response)

print(subset_airs)

{'AIRABRAD_005': {'channels': {'1': {'value': 1,
    'label': 'channel 1, frequency 23.800 ghz'},
   '2': {'value': 2, 'label': 'channel 2, frequency 31.400 ghz'},
   '3': {'value': 3, 'label': 'channel 3, frequency 50.300 ghz'},
   '4': {'value': 4, 'label': 'channel 4, frequency 52.800 ghz'},
   '5': {'value': 5, 'label': 'channel 5, frequency 53.596 ± 0.115 ghz'},
   '6': {'value': 6, 'label': 'channel 6, frequency 54.400 ghz'},
   '7': {'value': 7, 'label': 'channel 7, frequency 54.940 ghz'},
   '8': {'value': 8, 'label': 'channel 8, frequency 55.500 ghz'},
   '9': {'value': 9, 'label': 'channel 9, frequency 57.290344 ghz'},
   '10': {'value': 10, 'label': 'channel 10, frequency 57.290344 ± 0.217 ghz'},
   '11': {'value': 11,
    'label': 'channel 11, frequency 57.290344 ± 0.3224 ± 0.048 ghz'},
   '12': {'value': 12,
    'label': 'channel 12, frequency 57.290344 ± 0.3224 ± 0.022 ghz'},
   '13': {'value': 13,
    'label': 'channel 13, frequency 57.290344 ± 0.3224 ± 0.010 ghz'},
   ...
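
Because the printed dictionary is long, a compact summary of its top-level layout can be easier to read. The following is a minimal sketch: `summarize_map` is a hypothetical helper, and `example_map` is a small stand-in that only mirrors the key layout used in this notebook; in practice you would pass `subset_airs` instead.

```python
def summarize_map(mapping):
    """Return {collection name: sorted list of its top-level keys}."""
    return {name: sorted(entry) for name, entry in mapping.items()}

# Stand-in data for illustration only; pass `subset_airs` in the notebook
example_map = {
    "AIRABRAD_005": {
        "channels": {
            "1": {"value": 1, "label": "channel 1, frequency 23.800 ghz"},
        }
    },
    "AIRS2CCF_006": {"channel ranges": {}},
}

for name, keys in summarize_map(example_map).items():
    print(f"{name}: {', '.join(keys)}")
```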

### 3. Define Helper Functions

We use the OPeNDAP protocol to stream the data via `Xarray` and its `open_datatree` function. This is
preferred over reading the AIRS granules directly, because the HDF4-EOS2
format currently cannot be read by an `Xarray`-compatible engine.

In [3]:
def parse_opendap(results):
    """Parse OPeNDAP URLs from earthaccess search results"""
    od_files = []
    for item in results:
        for urls in item['umm']['RelatedUrls']:
            if 'OPENDAP' in urls.get('Description', '').upper():
                url = urls['URL']
                url = url.replace("https://", "dap4://", 1)  # swap only the scheme
                od_files.append(url)
    print(f"Found {len(od_files)} OPeNDAP files")
    return od_files

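def print_channel_ranges(ranges):
    """Print the key, value, and label of every entry in a channel mapping"""
    # Iterate over the mapping's own keys, which may start at "0" or "1"
    for key, entry in ranges.items():
        print(f"Channel {key}")
        print(f"Range: {entry['value']}")
        print(f"Description: {entry['label']}")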
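
To see what the URL filtering and scheme swap do without contacting CMR, the sketch below runs the same idea on a mock record. The record, its URLs, and the helper name `to_dap4` are all hypothetical; only the `umm`/`RelatedUrls`/`Description` layout is taken from the function above.

```python
from urllib.parse import urlsplit, urlunsplit

def to_dap4(url):
    """Return the URL with its scheme replaced by dap4, leaving the rest untouched."""
    parts = urlsplit(url)
    return urlunsplit(("dap4", parts.netloc, parts.path, parts.query, parts.fragment))

# Hypothetical record mimicking the 'umm' layout that parse_opendap reads
mock_result = {
    "umm": {
        "RelatedUrls": [
            {"URL": "https://example.gov/data/granule.hdf",
             "Description": "Download granule.hdf"},
            {"URL": "https://example.gov/opendap/https_granule.hdf",
             "Description": "OPeNDAP request URL"},
        ]
    }
}

# Keep only the OPeNDAP entries and convert their scheme
opendap_urls = [
    to_dap4(entry["URL"])
    for entry in mock_result["umm"]["RelatedUrls"]
    if "OPENDAP" in entry.get("Description", "").upper()
]
print(opendap_urls)
```

Parsing the URL rather than using a plain string replacement keeps any later occurrence of `https` in the path intact.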

---

## Section 2: Channel Range Subsetting Examples

Collections with channel ranges can be subset using index slicing on the
channel dimension. All AIRS collections that support channel range subsetting share the same
channels.

The example below demonstrates channel range subsetting on the [AIRS2CCF_006](https://disc.gsfc.nasa.gov/datasets/AIRS2CCF_006/summary) collection, and can be adapted for the following collections, when a channel range has been identified:
- [AIRIBRAD_005](https://disc.gsfc.nasa.gov/datasets/AIRIBRAD_005/summary?keywords=AIRIBRAD_005)
- [AIRS2CCF_7.0](https://disc.gsfc.nasa.gov/datasets/AIRS2CCF_7.0/summary?keywords=AIRS2CCF_7.0)
- [AIRH2CCF_006](https://disc.gsfc.nasa.gov/datasets/AIRH2CCF_006/summary?keywords=AIRH2CCF_006)


---

### Example 1. Subset [AIRS2CCF_006](https://disc.gsfc.nasa.gov/datasets/AIRS2CCF_006/summary)

**Channel Ranges Available:** 17 ranges (0-16)

#### Search for OPeNDAP URLs

In [8]:
# Search and retrieve data
results = earthaccess.search_data(
    short_name="AIRS2CCF",
    version='006',
    temporal=('2003-01-02', '2003-01-02'),
    bounding_box=(-180, 0, 180, 90)
)
od_files = parse_opendap(results)

Found 139 OPeNDAP files


#### Subset Water Vapor Channels (Channel Range 1263-1368, Frequency Range 1216.97/cm - 1272.59/cm)

In [9]:
# Open dataset
fp = od_files[0]
dt = xr.open_datatree(fp)

AIRS2CCF_006_channels = subset_airs["AIRS2CCF_006"]["channel ranges"]
channel_range_8 = AIRS2CCF_006_channels["8"]["value"]
dt_subset = dt.isel(Channel=slice(*channel_range_8))

print(f"Subset to: {AIRS2CCF_006_channels['8']['label']}")
print(dt_subset.dims)

Subset to: channel range 1263-1368, frequency range 1216.97/cm - 1272.59/cm (water vapor)
Frozen(ChainMap({'GeoTrack': 45, 'GeoXTrack': 30, 'Channel': 106, 'AIRSTrack': 3, 'AIRSXTrack': 3, 'Module': 17}))
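
The label describes an inclusive, 1-based channel range (1263-1368), while `isel` expects 0-based positions with an exclusive end. The sketch below shows the arithmetic that yields the 106 channels reported above. The helper `inclusive_range_to_slice` is hypothetical, and the source does not show whether the JSON `value` is already stored in slice form, so inspect it before applying any conversion.

```python
def inclusive_range_to_slice(first, last):
    """Convert a 1-based inclusive channel range to a 0-based Python slice."""
    if first < 1 or last < first:
        raise ValueError(f"Invalid channel range: {first}-{last}")
    return slice(first - 1, last)

TOTAL_AIRS_CHANNELS = 2378  # size of the AIRS infrared channel dimension

water_vapor = inclusive_range_to_slice(1263, 1368)
n_selected = len(range(*water_vapor.indices(TOTAL_AIRS_CHANNELS)))
print(water_vapor, n_selected)  # slice(1262, 1368, None) 106
```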


#### Subset with Multiple Channel Ranges

We can also select multiple contiguous channel ranges with a single slice.

In [10]:
ranges_to_combine = [
    AIRS2CCF_006_channels["8"]["value"],
    AIRS2CCF_006_channels["9"]["value"]
]

# Create combined slice (note: this assumes contiguous ranges)
combined_start = ranges_to_combine[0][0]
combined_end = ranges_to_combine[1][1]
dt_combined = dt.isel(Channel=slice(combined_start, combined_end))

print(f"Subset to: {AIRS2CCF_006_channels['8']['label']}")
print(f"      and: {AIRS2CCF_006_channels['9']['label']}")
print(dt_combined.dims)
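
When the ranges of interest are not contiguous, a single slice would also pull in the channels between them. One alternative is to expand each range into explicit indices and pass the list to `isel`. This is a minimal sketch: `indices_from_ranges` is a hypothetical helper, and it assumes each range is a `(start, stop)` pair with an exclusive stop, matching how `slice(*value)` is used above.

```python
def indices_from_ranges(ranges):
    """Expand (start, stop) pairs into a sorted list of unique indices."""
    indices = set()
    for start, stop in ranges:
        indices.update(range(start, stop))
    return sorted(indices)

# Two made-up, non-contiguous ranges for illustration
print(indices_from_ranges([(0, 3), (10, 12)]))  # [0, 1, 2, 10, 11]
```

The resulting list can be used as `dt.isel(Channel=indices)`, in the same way the individual channel examples pass a list of positions.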


---

## Section 3: Variable Group Subsetting Examples

Collections with variable groups allow subsetting by selecting specific
variables from thematic groups.

Available groups

-  Geolocation Fields
-  Attributes
-  Ancillary Per-Granule Data Fields
-  Ancillary Along-Track Data Fields
-  Ancillary Full-Swath Geolocation Fields
-  Ancillary Full-Swath Surface Information from Geolocation
-  Quality Indicator Pressure Boundary Variables
-  Surface Property Variables
-  Air Temperature Variables
-  Water Vapor Saturation Variables Derived from Temperature
-  Tropopause Variables Derived from Temperature
-  Water Vapor Variables
-  Relative Humidity and Geopotential Height Variables from Temperature
    and Water Vapor
-  Cloud Formation Variables on 3 by 3 AIRS Fields of View
-  Ozone Variables
-  Carbon Monoxide Variables
-  Methane Variables
-  Outgoing Longwave Radiation Variables
-  Geolocation QA Variables
-  Miscellaneous Variables
-  Microwave Dependent Variables

The example below uses [AIRH2RET_006](https://disc.gsfc.nasa.gov/datasets/AIRH2RET_006/summary?keywords=AIRH2RET_006) and can be adapted to the following collections once a group from the mapping has been identified:

- [AIRS2RET_006](https://disc.gsfc.nasa.gov/datasets/AIRS2RET_006/summary?keywords=AIRS2RET_006)
- [AIRS2RET_7.0](https://disc.gsfc.nasa.gov/datasets/AIRS2RET_7.0/summary?keywords=AIRS2RET_7.0)
- [AIRX2RET_006](https://disc.gsfc.nasa.gov/datasets/AIRX2RET_006/summary?keywords=AIRX2RET_006)
- [AIRX2RET_7.0](https://disc.gsfc.nasa.gov/datasets/AIRX2RET_7.0/summary?keywords=AIRX2RET_7.0)

### Example 1. Subset [AIRH2RET_006](https://cmr.earthdata.nasa.gov/search/concepts/C1243477376-GES_DISC/9)

#### Search for OPeNDAP URLs

In [15]:
results = earthaccess.search_data(
    short_name="AIRH2RET",
    version='006',
    temporal=('2002-08-30', '2002-08-30'),
    bounding_box=(-180, 0, 180, 90)
)
od_files = parse_opendap(results)

Found 9 OPeNDAP files


#### Open First File and Get Variable Group Configuration

In [16]:
# Open full datatree
AIRH2RET_006_fp = od_files[0]
AIRH2RET_006_dt = xr.open_datatree(AIRH2RET_006_fp)

# Get variable group configuration
AIRH2RET_006_map = subset_airs["AIRH2RET_006"]

#### Subset Relative Humidity Variables

In [17]:
rh_vars = list(AIRH2RET_006_map["Relative Humidity and Geopotential Height Variables from Temperature and Water Vapor"].keys())

# Navigate to the data group in the datatree
data_path = AIRH2RET_006_dt.L2_Standard_atmospheric_surface_product.Data_Fields

# Create subset - this subsets data variables but preserves coordinates
dt_rh_subset = data_path.copy()
dt_rh_subset.ds = data_path.ds[rh_vars]

print("Relative Humidity subset variables:")
print(list(dt_rh_subset.ds.data_vars))

Relative Humidity subset variables:
['RelHum', 'RelHum_QC', 'RelHumSurf', 'RelHumSurf_QC', 'RelHum_liquid', 'RelHum_liquid_QC', 'RelHumSurf_liquid', 'RelHumSurf_liquid_QC', 'GP_Tropopause', 'GP_Tropopause_QC', 'GP_Height', 'GP_Height_QC', 'GP_Surface', 'GP_Surface_QC']


#### Subset Methane Variables

In [18]:
methane_vars = list(AIRH2RET_006_map["Methane Variables"].keys())
dt_methane_subset = data_path.copy()
dt_methane_subset.ds = data_path.ds[methane_vars]
print("Methane subset variables:")
print(list(dt_methane_subset.ds.data_vars))


#### Subset Multiple Variable Groups

In [19]:
ozone_vars = list(AIRH2RET_006_map["Ozone Variables"].keys())
co_vars = list(AIRH2RET_006_map["Carbon Monoxide Variables"].keys())
combined_vars = ozone_vars + co_vars

dt_combined_subset = data_path.copy()
dt_combined_subset.ds = data_path.ds[combined_vars]

print(f"Combined subset has {len(dt_combined_subset.ds.data_vars)} variables")

Combined subset has 18 variables
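
Concatenating lists by hand works for two groups, but a small helper can make it easier to combine any number of groups and to catch misspelled group names early. The sketch below is illustrative: `variables_for_groups` is a hypothetical helper, and the group and variable names in `example_map` are placeholders rather than entries from `subset_airs.json`.

```python
def variables_for_groups(collection_map, group_names):
    """Collect variable names from several groups, without duplicates, in order."""
    selected = []
    for group in group_names:
        if group not in collection_map:
            raise KeyError(f"Unknown group: {group!r}")
        for name in collection_map[group]:
            if name not in selected:
                selected.append(name)
    return selected

# Placeholder groups and variables for illustration only
example_map = {
    "Group A": {"var_1": {}, "var_2": {}},
    "Group B": {"var_2": {}, "var_3": {}},
}
print(variables_for_groups(example_map, ["Group A", "Group B"]))
# ['var_1', 'var_2', 'var_3']
```

In the notebook, the returned list could be used in place of `combined_vars`, for example `data_path.ds[variables_for_groups(AIRH2RET_006_map, [...])]`.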


---

## Section 4: Individual Channel Subsetting

Collections with discrete channel definitions can be subset by selecting
specific channels.

### Example 1. Subset [AIRABRAD_005](https://disc.gsfc.nasa.gov/datasets/AIRABRAD_005/summary?keywords=AIRABRAD_005)

#### Search for OPeNDAP URLs

In [32]:
results = earthaccess.search_data(
    short_name="AIRABRAD",
    version='005',
    temporal=('2002-08-30', '2002-08-30'),
    bounding_box=(-180, 0, 180, 90)
)
od_files = parse_opendap(results)

Found 131 OPeNDAP files


#### Open First File

In [33]:
fp = od_files[0]
dt = xr.open_datatree(fp)

#### Subset to channel 5 (53.596 ± 0.115 GHz)

In [34]:
AIRABRAD_005_channels = subset_airs["AIRABRAD_005"]["channels"]

channel_5 = AIRABRAD_005_channels["5"]["value"]
dt_subset = dt.isel(Channel=channel_5 - 1)  # Convert to 0-based index

print(f"Selected: {AIRABRAD_005_channels['5']['label']}")

Selected: channel 5, frequency 53.596 ± 0.115 ghz


#### Subset to multiple specific channels

In [35]:
channels_to_select = [1, 5, 9]  # Select channels 1, 5, and 9
channel_indices = [AIRABRAD_005_channels[str(ch)]["value"] - 1 for ch in channels_to_select]
dt_multi_channel = dt.isel(Channel=channel_indices)

print(f"Selected {len(channels_to_select)} channels")

Selected 3 channels
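
If you know the frequency of interest but not the channel number, the labels can be searched directly. The sketch below is one possible approach: `find_channels` is a hypothetical helper, and `example_channels` copies three entries from the `AIRABRAD_005` output printed in Section 1.

```python
def find_channels(channels, text):
    """Return the channel numbers whose label contains the given text."""
    text = text.lower()
    return sorted(
        entry["value"]
        for entry in channels.values()
        if text in entry["label"].lower()
    )

# Three entries copied from the AIRABRAD_005 mapping shown earlier
example_channels = {
    "8": {"value": 8, "label": "channel 8, frequency 55.500 ghz"},
    "9": {"value": 9, "label": "channel 9, frequency 57.290344 ghz"},
    "10": {"value": 10, "label": "channel 10, frequency 57.290344 ± 0.217 ghz"},
}

matches = find_channels(example_channels, "57.290344")
indices = [ch - 1 for ch in matches]  # convert to 0-based positions for isel
print(matches, indices)  # [9, 10] [8, 9]
```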


### Example 2. Subset AIRVBRAD_005


In [39]:
# Search and retrieve data
results = earthaccess.search_data(
    short_name="AIRVBRAD",
    version='005',
    temporal=('2002-08-30', '2002-08-30'),
    bounding_box=(-180, 0, 180, 90)
)
od_files = parse_opendap(results)

Found 1 OPeNDAP files


In [40]:
# Open dataset
fp = od_files[0]
dt = xr.open_datatree(fp)

#### Subset to channel 2 (0.58-0.68 µm)

In [41]:
# Get channel configuration
AIRVBRAD_005_channels = subset_airs["AIRVBRAD_005"]["channels"]
channel_2 = AIRVBRAD_005_channels["2"]["value"]
dt_subset = dt.isel(Channel=channel_2 - 1)

print(f"Selected: {AIRVBRAD_005_channels['2']['label']}")

Selected: channel 2, wavelength 0.58-0.68 µm


#### Subset multiple channels

In [42]:
visible_channels = [1, 2]
channel_indices = [AIRVBRAD_005_channels[str(ch)]["value"] - 1 for ch in visible_channels]
dt_visible = dt.isel(Channel=channel_indices)
print(dt_visible)

<xarray.DataTree>
Group: /
    Dimensions:                             (GeoTrack: 135, GeoXTrack: 90,
                                             Channel: 2, SubTrack: 9, SubXTrack: 8,
                                             GeoLocationsPerSpot: 4)
    Coordinates:
        xtrack_err                          (Channel) float32 8B ...
        Latitude                            (GeoTrack, GeoXTrack) float64 97kB ...
        Longitude                           (GeoTrack, GeoXTrack) float64 97kB ...
      * SubTrack                            (SubTrack) int32 36B 0 1 2 3 4 5 6 7 8
      * SubXTrack                           (SubXTrack) int32 32B 0 1 2 3 4 5 6 7
      * GeoLocationsPerSpot                 (GeoLocationsPerSpot) int32 16B 0 1 2 3
    Dimensions without coordinates: GeoTrack, GeoXTrack, Channel
    Data variables: (12/199)
        radiances                           (GeoTrack, GeoXTrack, Channel, SubTrack, SubXTrack) float32 7MB ...
        offset                        