# How to Download a Spatial and Variable Subset of Level 1B Data using OPeNDAP

### Date Authored: 04-25-2024

### Timing

Exercise: 5 minutes

### Overview:

Downloading one full orbit of [OCO-2 Level 1B calibrated, geolocated science spectra, Retrospective Processing V11r (OCO2_L1B_Science)](https://disc.gsfc.nasa.gov/datasets/OCO2_L1B_Science_11r/summary?keywords=OCO2_L1B_Science.11r) data from the GES DISC can take more than 10 minutes even over a fast internet connection. However, a small spatial and variable subset of OCO-2 radiances can take seconds to download rather than minutes. This notebook describes how to access granules in a region of interest and read just the Longitude and Latitude from the file to find the indices for the desired geographic region, using OPeNDAP and Python.

### Prerequisites:

This notebook was written using Python 3.10, and requires:
- Valid [Earthdata Login credentials](https://urs.earthdata.nasa.gov)
- [Pydap >=3.5](https://github.com/pydap/pydap)
- [NumPy](https://numpy.org/)

#### Optional Anaconda Environment YAML:

This notebook can be run using the ['nasa-gesdisc' YAML file](https://github.com/nasa/gesdisc-tutorials/tree/main/environments/nasa-gesdisc.yml) provided in the 'environments' subfolder.

Please follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) to install and activate this environment. 


### 1. Import libraries

The first step is to import the required Python libraries. If any of the following import commands fail, check the local Python environment and install any missing packages. These lines will be necessary to run the rest of the cells:

In [1]:
from pydap.client import open_url
import numpy as np
import requests
import earthaccess
import os

### 2. Obtaining the geolocation data:

- A **"?"** can be appended to the OPeNDAP url which should be followed by a comma separated list of the variables to include in the subset. The indices to include must be specified for each dimension of each variable.  
- Each dimension requires a beginning index (starting from 0), a stride, and an ending index between square brackets (e.g., [beginning index:stride:ending index]). Downloading just the longitude and latitude is much faster than downloading the entire file.  

- The OPeNDAP url to obtain just the Longitude and Latitude in a compressed NetCDF-4 file is given below. Since this example returns the entire variable, the index ranges are optional. 

- A stride will define the subsampling along the corresponding dimension. A stride of 1 gets all the elements of the hyperslab/subset, and a stride of 2 gets every other element. Also, if the stride is omitted, it is assumed to be one. Thus, the following three urls will return equivalent subsetted files.

With stride 1:
`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.nc4?SoundingGeometry_sounding_latitude[0:1:8363][0:1:7],SoundingGeometry_sounding_longitude[0:1:8363][0:1:7]`

Default to stride 1:
`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.nc4?SoundingGeometry_sounding_latitude[0:8363][0:7],SoundingGeometry_sounding_longitude[0:1:8363][0:1:7]`

Full size data:
`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.nc4?SoundingGeometry_sounding_latitude,SoundingGeometry_sounding_longitude`

### 3. Reading Spatial Subset into Python:

- Rather than downloading a file containing the longitude and latitude it is possible to read just the Longitude and Latitude directly into a program (e.g., Python) which can be used to find the indices in the region of interest. 

- The following steps specifies the indices of all of the OCO-2 footprints in a box centered near Mauna Loa and reads in the longitude, latitude, and radiances in the specified region. This example can be modified to extract more complicated spatial selections for other variables also. 

- Note that one is added to the ending index when reading a variable directly into Python because of the Pythonic convention of terminating arrays at the beginning of the given index rather than including that index. Python also does not require the stride.

### 4. Open Earthdata Login Token

The Pydap library can use a pre-generated token to authenticate with Earthdata Login servers.

To create the `.edl_token`, please follow the steps in this how-to, or else you will experience an error when running the next cell: https://disc.gsfc.nasa.gov/information/howto?keywords=prerequisite&title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files 

In [2]:
# Delete or comment the following 12 lines to use the username/password prompt instead
# Set file path to root
token_file_path = os.path.join(os.path.expanduser("~"), ".edl_token")

# Read the token from the .edl_token file
with open(token_file_path, 'r') as token_file:
    token = token_file.read().strip()  # Ensure to strip any newlines or extra spaces

# Enter the token into the request header
my_session = requests.Session()
my_session.headers = {
    'Authorization': f'Bearer {token}'
}

### 5. Identify the file(s) of interest:

Before accessing data at GES DISC, a  user must first register with Earthdata Login, then be authorized to access data at GES DISC by following steps at:  [data-access.](https://disc.gsfc.nasa.gov/data-access)

This can be done in a number of ways either by using OpenSearch or navigating the OPeNDAP directories of particular data sets. It is currently also possible to browse the http directories, however, the path to the file should be modified to match the OPeNDAP url which is unique for each data granule. One sample OPeNDAP url is given below:

`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.html`

Pasting this url into a web browser will download the full file in HDF5 format. This file can be downloaded in NetCDF-4 format by using the ".nc4" suffix as shown below:

`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.nc4`

To learn more about OPeNDAP constraint expressions and protocols, please visit: https://disc.gsfc.nasa.gov/information/tools?title=OPeNDAP%20In%20The%20Cloud#on-prem-cloud-differences

In [3]:
# Create search query for 1980-01-01 Cloud OPeNDAP URL
results = earthaccess.search_data(
    short_name="OCO2_L1B_Science",
    version='11r',
    temporal=('2015-01-15', '2015-01-15'), # This will stream one granule, but can be edited for a longer temporal extent
    bounding_box=(-175, -90, 180, 0)
)

# Parse out URL from request, add to OPeNDAP URLs list for querying multiple granules with constraint expressions
opendap_urls = []
for item in results:
    for urls in item['umm']['RelatedUrls']:  # Iterate over RelatedUrls in each request step
        if 'OPENDAP' in urls.get('Description', '').upper():  # Check if 'OPENDAP' is in the Description
            # Extract OPeNDAP URL, use DAP4 protocol
            url = urls['URL'].replace('https://', 'dap4://')
            # Add URL to list
            opendap_urls.append(url)

opendap_urls[0]

'dap4://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5'

### 6. Access one of the granules using Pydap

Here, we will access a single OPeNDAP URL, without subsetting constraint expressions, or file type suffixes. We will use the token stored in our session to access the granule, and then use Pydap to handle the file request.

Please note that when accessing via Python, you must use `dap4://` instead of `https://`.

In [4]:
dataset = open_url(opendap_urls[0], session=my_session)

Print the list of attribute names. From this list 'SoundingGeometry_sounding_latitude' and 'SoundingGeometry_sounding_longitude' will be used in the following cells:

In [5]:
dataset

<DatasetType with children 'FootprintGeometry_footprint_altitude', 'FootprintGeometry_footprint_altitude_uncert', 'FootprintGeometry_footprint_aspect', 'FootprintGeometry_footprint_azimuth', 'FootprintGeometry_footprint_land_fraction', 'FootprintGeometry_footprint_latitude', 'FootprintGeometry_footprint_latitude_geoid', 'FootprintGeometry_footprint_longitude', 'FootprintGeometry_footprint_longitude_geoid', 'FootprintGeometry_footprint_los_surface_bidirectional_angle', 'FootprintGeometry_footprint_num_topo_points', 'FootprintGeometry_footprint_o2_qual_flag', 'FootprintGeometry_footprint_plane_fit_quality', 'FootprintGeometry_footprint_polarization_angle', 'FootprintGeometry_footprint_slope', 'FootprintGeometry_footprint_solar_azimuth', 'FootprintGeometry_footprint_solar_surface_bidirectional_angle', 'FootprintGeometry_footprint_solar_zenith', 'FootprintGeometry_footprint_stokes_coefficients', 'FootprintGeometry_footprint_strong_co2_qual_flag', 'FootprintGeometry_footprint_surface_roughn

Read data from the 'SoundingGeometry_sounding_latitude' and 'SoundingGeometry_sounding_longitude' attributes, and subset the data within a set of Longitude and Latitude bounds (in this example we use a box of coordinates centered over Mauna Loa).

**Note:** If you have not created the token in your current notebook session, you will experience an access error during this step. 

In [6]:
sounding_latitude = dataset['SoundingGeometry_sounding_latitude']
sounding_longitude = dataset['SoundingGeometry_sounding_longitude']
location = [-158,17,-153,22]
ialongtrack,iacrosstrack = np.where((sounding_longitude.data[:] > location[0]) & (sounding_latitude.data[:] < location[1]) & (sounding_longitude.data[:] < location[2]) & (sounding_latitude.data[:] < location[3]))

Read in spatial subsets directly from the Pydap dataset:

In [7]:
sounding_latitude_sel=dataset['SoundingGeometry_sounding_latitude'].data[ialongtrack.min():ialongtrack.max()+1,iacrosstrack.min():iacrosstrack.max()+1]
sounding_longitude_sel=dataset['SoundingGeometry_sounding_longitude'].data[ialongtrack.min():ialongtrack.max()+1,iacrosstrack.min():iacrosstrack.max()+1]

Note that “:” is given for the spectral dimension:

In [8]:
radiance_o2_sel=dataset['SoundingMeasurements_radiance_o2'].data[ialongtrack.min():ialongtrack.max()+1,iacrosstrack.min():iacrosstrack.max()+1,:]
radiance_strong_co2_sel=dataset['SoundingMeasurements_radiance_strong_co2'].data[ialongtrack.min():ialongtrack.max()+1,iacrosstrack.min():iacrosstrack.max()+1,:]
radiance_weak_co2_sel=dataset['SoundingMeasurements_radiance_weak_co2'].data[ialongtrack.min():ialongtrack.max()+1,iacrosstrack.min():iacrosstrack.max()+1,:]

Preview the radiance_o2_sel data:

In [9]:
print(radiance_o2_sel.shape)
print(radiance_o2_sel[0][0])
print(radiance_o2_sel[8][0])

(712, 8, 1016)
[1.7377791e+19 1.7470743e+19 1.7500829e+19 ... 1.5983857e+19 1.6740403e+19
 1.6812959e+19]
[5.4536375e+19 5.4693733e+19 5.4410481e+19 ... 4.9029805e+19 5.0783940e+19
 5.1521193e+19]


Preview the radiance_strong_co2_sel data:

In [10]:
print(radiance_o2_sel.shape)
print(radiance_o2_sel[0][0])
print(radiance_o2_sel[8][0])

(712, 8, 1016)
[1.7377791e+19 1.7470743e+19 1.7500829e+19 ... 1.5983857e+19 1.6740403e+19
 1.6812959e+19]
[5.4536375e+19 5.4693733e+19 5.4410481e+19 ... 4.9029805e+19 5.0783940e+19
 5.1521193e+19]


Preview the radiance_weak_co2_sel data:

In [11]:
print(radiance_weak_co2_sel.shape)
print(radiance_weak_co2_sel[0][0])
print(radiance_weak_co2_sel[8][0])

(712, 8, 1016)
[4.5389055e+18 4.3843411e+18 4.6281105e+18 ... 4.3437466e+18 4.2317193e+18
 4.7516995e+18]
[2.4819458e+19 2.4360206e+19 2.5948312e+19 ... 2.2327963e+19 2.1542456e+19
 2.4490685e+19]


### 7. Constraint expressions for subsetting

Rather than reading the spatially subsetted variables into a program, the indices can be used to construct a url to download a subsetted file containing just the selected portion of the given orbit. Using the previously mentioned file as an example, the OPeNDAP url to download a spatial and variable subset as a NetCDF4 file consisting of the radiances and coordinates near Mauna Loa is shown below, referred to as ["constraint expressions"](https://opendap.github.io/documentation/UserGuideComprehensive.html#Constraint_Expressions). Note that the stride, which is omitted, is assumed to be one:

`https://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5.nc4?SoundingGeometry_sounding_latitude[5993:6242][0:7],SoundingGeometry_sounding_longitude[5993:6242][0:7],SoundingGeometry_sounding_time_tai93[5993:6242][0:7],SoundingMeasurements_radiance_o2[5993:6242][0:7][0:1015],SoundingMeasurements_radiance_strong_co2[5993:6242][0:7][0:1015],SoundingMeasurements_radiance_weak_co2[5993:6242][0:7][0:1015]`


The following Python code shows how this url could be constructed using Python. Note that the following snippet of code is just a string definition that could be performed in one line. The "\\" joins several lines of Python code to make each variable specification more readable. Also, note that the spectral dimension for the radiance variables is hard coded to 1015 (0 is the first index) in this example. A spectral subset could also be obtained by specifying the indices of the spectral range as was done in this recipe for the spatial range.

In [12]:
subset_url = opendap_urls[0] + "?dap4.ce="\
+"/SoundingGeometry_sounding_latitude"+"[{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max())\
+"%3B/SoundingGeometry_sounding_longitude"+"[{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max())\
+"%3B/SoundingGeometry_sounding_time_tai93"+"[{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max())\
+"%3B/SoundingMeasurements_radiance_o2"+"[{:d}:{:d}][{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max(),0,1015)\
+"%3B/SoundingMeasurements_radiance_strong_co2"+"[{:d}:{:d}][{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max(),0,1015)\
+"%3B/SoundingMeasurements_radiance_weak_co2"+"[{:d}:{:d}][{:d}:{:d}][{:d}:{:d}]".format(ialongtrack.min(),ialongtrack.max(),iacrosstrack.min(),iacrosstrack.max(),0,1015) 

print(subset_url)

dap4://oco2.gesdisc.eosdis.nasa.gov/opendap/OCO2_L1B_Science.11r/2015/015/oco2_L1bScND_02865a_150114_B11006r_230217185540.h5?dap4.ce=/SoundingGeometry_sounding_latitude[2591:3302][0:7]%3B/SoundingGeometry_sounding_longitude[2591:3302][0:7]%3B/SoundingGeometry_sounding_time_tai93[2591:3302][0:7]%3B/SoundingMeasurements_radiance_o2[2591:3302][0:7][0:1015]%3B/SoundingMeasurements_radiance_strong_co2[2591:3302][0:7][0:1015]%3B/SoundingMeasurements_radiance_weak_co2[2591:3302][0:7][0:1015]


Finally, we will open this subsetted URL using Pydap, and print its metadata:

In [13]:
dataset = open_url(subset_url, session=my_session)

dataset

<DatasetType with children 'SoundingGeometry_sounding_latitude', 'SoundingGeometry_sounding_longitude', 'SoundingGeometry_sounding_time_tai93', 'SoundingMeasurements_radiance_o2', 'SoundingMeasurements_radiance_strong_co2', 'SoundingMeasurements_radiance_weak_co2'>

## Additional Info:

The Python programs described here have been tested using Python 3.10 and can be modified to create similar spatial and variable subsets for other regions and data sets that are available through OPeNDAP

This data recipe is created by Thomas Hearty with contribution from Andrey Savtchenko, Fan Fang, Paul Huwe, Kyle MacRitchie, Tatiana DaSilva, Dana Ostrenga, Richard Strube, Chung-Lin Shie. It was edited by Chris Battisto in April 2024.

<font size="1">THE SUBJECT FILE IS PROVIDED "AS IS" WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SUBJECT FILE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR FREEDOM FROM INFRINGEMENT, ANY WARRANTY THAT THE SUBJECT FILE WILL BE ERROR FREE, OR ANY WARRANTY THAT DOCUMENTATION, IF PROVIDED, WILL CONFORM TO THE SUBJECT FILE. THIS AGREEMENT DOES NOT, IN ANY MANNER, CONSTITUTE AN ENDORSEMENT BY GOVERNMENT AGENCY OR ANY PRIOR RECIPIENT OF ANY RESULTS, RESULTING DESIGNS, HARDWARE, SOFTWARE PRODUCTS OR ANY OTHER APPLICATIONS RESULTING FROM USE OF THE SUBJECT FILE. FURTHER, GOVERNMENT AGENCY DISCLAIMS ALL WARRANTIES AND LIABILITIES REGARDING THIRD-PARTY SOFTWARE, IF PRESENT IN THE SUBJECT FILE, AND DISTRIBUTES IT "AS IS."