
List available Harmony services for a dataset #447

Closed
nikki-t opened this issue Feb 6, 2024 · 5 comments


@nikki-t
Collaborator

nikki-t commented Feb 6, 2024

As a first step to facilitating the use of services in earthaccess, we should modify earthaccess so that it can list the available services for a collection.

Link to Harmony Documentation: https://harmony.earthdata.nasa.gov/docs
Link to CMR API documentation on services: https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service

This would allow earthaccess to return a list of services for a collection so that we can integrate future work on service usage into the codebase. Related issue: #328
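A minimal sketch of what listing services for a collection could look like against the CMR Search API, using only the standard library. It assumes that a collection record in UMM JSON form lists its associated service concept IDs under `meta["associations"]["services"]`, that the CMR `services.umm_json` endpoint accepts repeated `concept_id[]` parameters, and that each returned item carries `Name` and `Type` in its `umm` block. All of these should be checked against the CMR API documentation linked above. The function names are hypothetical and not part of earthaccess.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search"


def service_concept_ids(collection: dict) -> list[str]:
    """Return the service concept IDs associated with a UMM JSON collection record."""
    # Assumption: CMR lists associated services under meta.associations.services
    return list(collection.get("meta", {}).get("associations", {}).get("services", []))


def services_query_url(concept_ids: list[str]) -> str:
    """Build a CMR services search URL for the given concept IDs."""
    params = [("concept_id[]", cid) for cid in concept_ids]
    # Request one page large enough to hold every requested service
    params.append(("page_size", str(max(len(concept_ids), 1))))
    return f"{CMR_SEARCH}/services.umm_json?{urlencode(params)}"


def summarize_services(response: dict) -> list[dict]:
    """Keep the fields a user is most likely to want from each UMM-S record."""
    return [
        {
            "concept_id": item["meta"]["concept-id"],
            "name": item["umm"].get("Name"),
            "type": item["umm"].get("Type"),
        }
        for item in response.get("items", [])
    ]


def list_services(collection: dict) -> list[dict]:
    """Look up every service associated with a collection (makes a network call)."""
    ids = service_concept_ids(collection)
    if not ids:
        return []  # no associated services, so skip the request entirely
    with urlopen(services_query_url(ids)) as resp:
        return summarize_services(json.load(resp))
```

Splitting the lookup into pure helpers keeps the parsing testable without network access. Because earthaccess `DataCollection` objects are dictionaries built from UMM JSON, `list_services` might be called directly on a search result, provided the associations are actually present in the record (an assumption worth verifying).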

@asteiker
Member

This is a (painful) way of determining available services for a given collection using graphql: https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/07_Harmony_Subsetting.html#discover-service-options-for-a-given-data-set

Harmony also provides a capabilities endpoint to determine available services: https://harmony.earthdata.nasa.gov/docs#available-services
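A hedged sketch of using the capabilities endpoint for one collection. It assumes the endpoint is `/capabilities` on the Harmony root, that it accepts a `collectionId` query parameter, and that the JSON it returns includes boolean flags such as `bboxSubset` and `temporalSubset`; these names come from a reading of the Harmony docs linked above and should be verified there. Fetching the document is left out because Harmony requests generally need an authenticated Earthdata Login session.

```python
from urllib.parse import urlencode

HARMONY_ROOT = "https://harmony.earthdata.nasa.gov"

# Assumed flag names in the capabilities response; verify against the Harmony docs.
SUBSET_FLAGS = ("bboxSubset", "shapeSubset", "temporalSubset", "variableSubset")


def capabilities_url(collection_id: str) -> str:
    """Build a capabilities request URL for one collection concept ID."""
    return f"{HARMONY_ROOT}/capabilities?{urlencode({'collectionId': collection_id})}"


def supported_operations(capabilities: dict) -> list[str]:
    """Return the subsetting flags that the capabilities document marks as True."""
    return [flag for flag in SUBSET_FLAGS if capabilities.get(flag) is True]
```

Compared with the GraphQL route, a single request per collection would answer "can Harmony subset this?" directly, which is the question a user in a notebook is most likely to ask.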

@JessicaS11
Collaborator

I was looking through python_cmr for something else and came across a set of functions for "Tool and Variable Service CMR Queries" starting on this line. Not sure if they'll help but figured it's worth a check to see if they've already done some of the work for us...

@andypbarrett
Collaborator

I wonder if we should think about how this kind of query might be used, and by what or whom.

My current thinking is this: if a Python tool returns information about services, then a tool (the same one or a different one) should be able to use that information. I'm thinking of a pipeline like this:

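```python
import earthaccess

collection = earthaccess.search_datasets(...)[0]  # search_datasets returns a list of collections
# Proposed, not yet existing: `services` and `harmony` attributes on a collection
if "harmony" in collection.services:
    subset = collection.harmony.subsetter().to_file(name_of_file)  # uses spatial and temporal bounds from query for subsetting
else:
    granules = earthaccess.search_data(concept_id=collection.concept_id())
    earthaccess.download(granules, local_path)
```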

I can also see a case for a user querying services from a notebook: for example, you have found the dataset you want and you want to know whether you have to download or access a complete file, or whether you can use a service.

I think beyond that, a lot of discovery for services and options would be done via user guides and other web-hosted information.

@JessicaS11
Collaborator

> I think beyond that, a lot of discovery for services and options would be done via user guides and other web-hosted information.

This was part of our hope for a plugin interface (#328). Just as Xarray can discover and use whatever backends are installed in your environment, earthaccess could discover and use whatever services or subsetters your installed libraries provide, as long as those libraries implement the required plugin functionality. This takes the onus off earthaccess to implement and maintain specific interfaces (except, perhaps, for a system like Harmony) while making it easy for users to reach those other tools through earthaccess in a predictable way.

@nikki-t
Collaborator Author

nikki-t commented Mar 4, 2024

@andypbarrett and @JessicaS11 - I think you both raise great points about the use of services, so I put together a mini roadmap for implementing service information with an eye toward a plugin interface.

Requirements analysis: How would a user approach searching for a service?

  1. Would a user want to pull the available services from the earthaccess.results.DataCollection object?
    • A user would search a collection for a specific service and then decide to use that service to subset data, passing it the inputs it needs to operate on the data and return results.
  2. Would a user want to pull the available services from the earthaccess.results.DataGranule object?
    • A user would search for granules and then decide to use a service to subset them, passing the granule data to the service.
  3. Would a user want to be able to search across all services?
    • A user can search for a service by name and return data about that service. What data would that user be interested in?

Proposed code design, focusing on question 1

  1. Create an earthaccess.results.DataCollection.service method that returns the services associated with a collection.
    • Retrieve the service concept IDs from the results of the CMR collection query. This requires modifying the earthaccess.results.DataCollection class to include service results.
    • Search for each service with a CMR query by concept ID to retrieve its name and any other information that may be useful to know about the service.
    • Return the results of the service query.
  2. Create a plugin interface. Work belongs to Issue #328.
    • Create a plugin directory that holds a Plugin abstract class to serve as the parent for child classes that implement the use of various collection services like Harmony, OPeNDAP, HyP3, etc.
    • earthaccess can automatically discover and load plugins found in this directory. Here is a method that may prove useful.
    • This way, anyone can add a plugin, and earthaccess can use it through the Plugin abstract class methods.
  3. Integrate the service results with a plugin interface.
    • The earthaccess.results.DataCollection.service method can be modified to search the names of available plugins.
    • The method can also check a list of plugins that do not have UMM-S records but are associated with collections.
    • The method can then return the plugins available for the specific dataset.
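Items 2 and 3 could fit together roughly as follows. This is a sketch under assumptions: the `Plugin` base class, its `name` and `uses_umm_s` attributes, the `supports` and `subset` methods, and the `available_plugins` helper are all hypothetical names chosen for illustration. Matching plugins to a collection by the UMM-S `Type` of its associated services is one possible rule, not a settled design.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical parent class for service plugins (Harmony, OPeNDAP, HyP3, ...)."""

    # Name matched (case-insensitively) against the UMM-S Type of associated services.
    name: str = ""
    # False for services with no UMM-S record that still apply to some collections.
    uses_umm_s: bool = True

    @abstractmethod
    def supports(self, collection: dict) -> bool:
        """Return True if this plugin can operate on the collection."""

    @abstractmethod
    def subset(self, collection: dict, **query) -> list[str]:
        """Run the service and return paths or URLs to the outputs."""


def available_plugins(collection: dict, service_types: list[str], plugins: list[Plugin]) -> list[str]:
    """Step 3: names of the given plugins that are usable with this collection."""
    wanted = {t.lower() for t in service_types}
    names = []
    for plugin in plugins:
        if plugin.uses_umm_s:
            # Plugins backed by UMM-S records must match an associated service type.
            if plugin.name.lower() in wanted:
                names.append(plugin.name)
        elif plugin.supports(collection):
            # Plugins without UMM-S records decide for themselves.
            names.append(plugin.name)
    return names
```

Keeping the UMM-S check in `available_plugins` rather than in each plugin means services registered in CMR behave consistently, while the `supports` hook covers the second bullet of item 3 (services without UMM-S records).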

Nice to have or future work (based on user need)

  • Implement similar operations for the earthaccess.results.DataGranule class so that users can retrieve service info from granules and ultimately submit granule data to the plugin.
  • Create an earthaccess.search.Service child class from cmr.queries.ServiceQuery (python_cmr) that can return a list of services.

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

4 participants