<a id="top"></a>
# Searching for Mission-Specific Data with Astroquery
***
## Learning Goals

By the end of this tutorial, you will:

- Understand how to use the `astroquery.mast` module to access mission dataset metadata from MAST.
- Run metadata queries based on coordinates, an object name, or non-positional criteria.
- Filter and download data products associated with datasets of interest.
- Search for datasets from multiple missions and among [High Level Science Products (HLSPs)](https://outerspace.stsci.edu/display/MASTDOCS/About+HLSPs).

<font color="red">Automated testing has found an error in this Notebook. The authors have been notified and are working on the issue; in the meantime, please use this as a reference only.</font>

## Table of Contents
* [Introduction](#introduction)

* [Imports](#imports)

* [Querying for Datasets from Missions-MAST](#querying-for-datasets-from-missions-mast)

  * [Search Parameters](#search-parameters)

  * [Query by Object Name](#query-by-object-name)

  * [Query by Region](#query-by-region)

  * [Query by Criteria](#query-by-criteria)

* [Getting Data Products](#getting-data-products)

  * [Performing a Product Query](#performing-a-product-query)

  * [Filtering Data Products](#filtering-data-products)

* [Downloading Products](#downloading-products)

  * [Exclusive Data Access](#exclusive-data-access)

* [Switching Missions](#switching-missions)

* [Exercises](#exercises)

* [Exercise Solutions](#exercise-solutions)

* [Additional Resources](#additional-resources)

## Introduction

Welcome! This tutorial explores the capabilities of the `astroquery.mast.MastMissions` class, a versatile tool for accessing and working with datasets hosted by the [Mikulski Archive for Space Telescopes (MAST)](https://archive.stsci.edu/). `MastMissions` is a Python wrapper for the [MAST Search API](https://mast.stsci.edu/search/docs/), which allows you to search for mission-specific dataset metadata and data products. This data can also be found through the [MAST Search UI](https://mast.stsci.edu/search/ui/#/).

The following missions/products are available for search as of January 2025:

- [Hubble Space Telescope](https://www.stsci.edu/hst) (`hst`)
- [James Webb Space Telescope](https://www.stsci.edu/jwst) (`jwst`)
- [High Level Science Products](https://outerspace.stsci.edu/display/MASTDOCS/About+HLSPs)
  - [COS Legacy Archive Spectroscopic SurveY](https://archive.stsci.edu/hlsp/classy) (`classy`)
  - [Hubble UV Legacy Library of Young Stars as Essential Standards](https://archive.stsci.edu/hlsp/ullyses) (`ullyses`)

In this notebook, we will walk through the basic workflow for searching datasets, retrieving data products, and downloading data products. This workflow will look very similar to the one used with the [`astroquery.mast.Observations`](https://astroquery.readthedocs.io/en/latest/mast/mast_obsquery.html) class, detailed in our ["Searching MAST using astroquery.mast" notebook](https://spacetelescope.github.io/mast_notebooks/notebooks/multi_mission/beginner_search/beginner_search.html). There are a few key differences to note, and you should use the class that is best suited for your unique goals:

* *API*: `MastMissions` uses the [MAST Search API](https://mast.stsci.edu/search/docs/), while `Observations` uses the [MAST Portal API](https://mast.stsci.edu/api/v0/).
* *Collection*: `MastMissions` can only perform queries on a single collection, or "mission", at a time. `Observations` uses the [Common Archive Observation Model (CAOM)](https://mast.stsci.edu/vo-tap/api/v0.1/caom/) and can run queries across every available observational collection at the same time.
* *Filter Keywords*: `MastMissions` has an extensive selection of mission-specific keywords to use while writing queries. `Observations` is limited to the [fields described by the CAOM](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html) and has no criteria with mission-specific meaning.

In summary, `MastMissions` is well-suited for fast, mission-specific queries that might require a more extensive selection of filter keywords. `Observations` is better for broader, multi-mission searches.

## Imports
This notebook uses the following packages:

- *astropy* to handle astronomical units and coordinate systems
- *astroquery.mast* to query the MAST Archive

In [1]:
import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.mast import MastMissions

***

## Querying for Datasets from Missions-MAST

Before we can query Missions-MAST metadata, we need a little setup: we will initialize an object of the `astroquery.mast.MastMissions` class and assign its `mission` attribute. The object can then be used to search mission dataset metadata by object name, sky position, or other criteria.

The default value for `mission` is `hst`, meaning that queries will be run on Hubble dataset metadata. The searchable metadata for Hubble encompasses all information that was previously accessible through the original HST web search form. The metadata for Hubble and all other available missions is also available through the [MAST Search UI](https://mast.stsci.edu/search/ui/#/).

Later in the tutorial, we will learn how to change the `mission` attribute to make queries on other missions.

In [None]:
# Create MastMissions object to search for Hubble datasets
missions = MastMissions(mission='hst')
missions.mission

### Search Parameters

When writing queries, keyword arguments can be used to specify output characteristics and filter on fields like instrument, exposure type, and proposal ID. The available column names for a mission are returned by the `get_column_list` function. Below, we will print out the name, data type, and description for the first 10 columns in HST metadata.

In [None]:
# Get available columns for HST mission
columns = missions.get_column_list()
columns[:10]

In addition to column criteria, the query functions accept the following optional keyword arguments for refining results:

- `radius`: For positional searches only. Only return results within a certain distance from an object or set of coordinates. Default is 3 arcminutes.
- `limit`: The maximum number of results to return. Default is 5000.
- `offset`: Skip the first ***n*** results. Useful for paging through results.
- `sort_by`: A list of field names to sort by.
- `sort_desc`: A list of booleans (one for each field specified in `sort_by`), describing if each field should be sorted in descending order (`True`) or ascending order (`False`).
- `select_cols`: A list of columns to be returned in the response.
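Because `limit` caps the size of each response and `offset` skips rows already seen, paging through a large result set means advancing `offset` by `limit` until a page comes back short. The loop below is a minimal sketch of that pattern; `fetch_all` is a hypothetical helper (not part of `astroquery`), and it is exercised here with an in-memory list instead of a live MAST query.

```python
def fetch_all(fetch_page, page_size):
    """Collect rows by requesting successive pages until a short page comes back.

    fetch_page(offset, limit) must return a sequence of rows. This is a
    hypothetical helper for illustration, not part of astroquery.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return rows
        offset += page_size  # skip the rows we already have


# Stand-in for a query: 12 fake rows served 5 at a time
fake_rows = list(range(12))


def fake_page(offset, limit):
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_page, page_size=5)
print(len(all_rows))  # 12

# Against MAST, fetch_page might wrap a query such as
# lambda offset, limit: missions.query_criteria(sci_pep_id=15879, offset=offset, limit=limit)
```

With real result tables, the collected pages would more naturally be kept as a list of tables and combined with `astropy.table.vstack`.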

As we walk through different types of queries, we will see these parameters in action!

### Query by Object Name

We've reached our first query! We can use object names to perform metadata queries using the `query_object` function.

To start, let's query for the [Messier 1](https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-1/) object, a supernova remnant in the Taurus constellation. You may know it better as the Crab Nebula!

In [None]:
# Query for Messier 1 ('M1')
results = missions.query_object('M1')

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

There were over 600 total results, meaning that hundreds of HST datasets were targeting the Crab Nebula. Now, let's try refining our search a bit more.

- Each dataset is associated with a celestial coordinate, given by `sci_ra` (right ascension) and `sci_dec` (declination). By default, the query returns all datasets that fall within 3 arcminutes from the object's coordinates. Let's set the `radius` parameter to be 1 arcminute instead.
- Say that we're not interested in the first 4 results. We can assign `offset` to skip a certain number of rows.
- By default, a subset of recommended columns is returned for each query. However, we can specify exactly which columns to return using the `select_cols` keyword argument. Certain columns are included automatically, depending on the mission.

In [None]:
# Refined query for Messier 1 ('M1')
results = missions.query_object('M1',
                                radius=1,  # Search within a 1 arcminute radius
                                offset=4,  # Skip the first 4 results
                                select_cols=['sci_start_time', 'sci_pi_last_name'])  # Select certain columns

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

### Query by Region

The `missions` object also allows us to query by a region in the sky. By passing in a set of coordinates to the `query_region` function, we can return datasets that fall within a certain `radius` value of that point. This type of search is also known as a cone search.

In [None]:
# Create coordinate object
coords = SkyCoord(210.80227, 54.34895, unit='deg')

# Query for results within 10 arcseconds of coordinates
results = missions.query_region(coords,
                                radius=10 * u.arcsec)

# Display results
print(f'Total number of results: {len(results)}')
results[:5]

The above datasets fall within our cone search. In other words, their target coordinates are within 10 arcseconds of the coordinates that we defined.
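A cone search keeps every dataset whose target lies within `radius` of the center. That membership test can be sketched locally with the haversine formula for angular separation; in practice, `SkyCoord.separation` from `astropy.coordinates` does the same job. The helper names and the two target positions below are made up for illustration and are not MAST results.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two (RA, Dec) points, all in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing the argument slightly above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(hav))))


def in_cone(center, positions, radius_arcsec):
    """Return the (RA, Dec) positions that fall within radius_arcsec of center."""
    radius_deg = radius_arcsec / 3600.0
    return [p for p in positions
            if angular_separation_deg(center[0], center[1], p[0], p[1]) <= radius_deg]


center = (210.80227, 54.34895)  # same center as the query above
# Hypothetical target positions (degrees)
positions = [
    (210.80227, 54.34895 + 5 / 3600),   # 5 arcsec north of center
    (210.80227, 54.34895 + 20 / 3600),  # 20 arcsec north of center
]
print(in_cone(center, positions, radius_arcsec=10))  # only the first position
```

Note that this sketch takes the radius in arcseconds, whereas a bare number passed as `radius` to the query functions is interpreted as arcminutes.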

### Query by Criteria

In some cases, we may want to run queries with non-positional parameters. To accomplish this, we use the `query_criteria` function.

For any of our query functions, we can filter our results by the value of columns in the dataset.

Let's say that we want observations from [HST's Wide Field Camera 3 (WFC3)](https://www.stsci.edu/hst/instrumentation/wfc3) instrument that use the F555W filter, and we are only interested in datasets connected to [proposal number 15879](https://www.stsci.edu/hst-program-info/program/?program=15879).

In [None]:
# Query with column criteria
results = missions.query_criteria(sci_instrume='WFC3',  # From Wide Field Camera 3
                                  sci_spec_1234='F555W',  # Uses F555W filter
                                  sci_pep_id=15879,  # Proposal number 15879
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

To exclude and filter out a certain value from the results, we can prepend the value with `!`.

Let's run the same query as above, but this time, we will filter out datasets that use the F555W filter.

In [None]:
# Filtered query, excluding datasets using F555W filter
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='!F555W',  # Excludes datasets that use F555W filter
                                  sci_pep_id=15879,
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

We can also use wildcards on string criteria for more advanced filtering. Wildcards are special characters in search patterns that stand in for unknown characters, allowing for flexible matching of strings. The wildcard character is `*`: it matches any number of characters preceding, following, or in between the existing characters, depending on its placement.
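To build intuition for how the placement of `*` changes what matches, Python's standard-library `fnmatch` module implements the same shell-style `*` semantics. This is only a local analogy: it says nothing about how the MAST server treats case or other special characters. The target names below are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical target names, used only to illustrate '*' matching
names = ["GEMINGA", "NGC-GEM-1", "M1", "OBJ_GEM"]

contains_gem = [n for n in names if fnmatchcase(n, "*GEM*")]  # GEM anywhere
starts_gem = [n for n in names if fnmatchcase(n, "GEM*")]      # GEM at the start
ends_gem = [n for n in names if fnmatchcase(n, "*GEM")]        # GEM at the end

print(contains_gem)  # ['GEMINGA', 'NGC-GEM-1', 'OBJ_GEM']
print(starts_gem)    # ['GEMINGA']
print(ends_gem)      # ['OBJ_GEM']
```

Note that `*GEM*` also matches names that begin or end with "GEM", because `*` can match zero characters.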

Let's use the same query from above, but we will add the condition that the target name must contain the string "GEM".

In [None]:
# Filtered query with wildcard
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='!F555W',
                                  sci_pep_id=15879,
                                  sci_targname='*GEM*',  # Must contain the string 'GEM'
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

To filter by multiple values for a single column, we use a string of the values delimited by commas.

To illustrate this, we will use a slightly different query. We query for WFC3 datasets from proposal 15879 that use either the F153M filter or the F160W filter.

In [None]:
# Filtered query with multiple values
results = missions.query_criteria(sci_instrume='WFC3', 
                                  sci_spec_1234='F153M, F160W',  # Uses either F153M filter OR F160W filter
                                  sci_pep_id=15879,
                                  select_cols=['sci_instrume', 'sci_spec_1234', 'sci_pep_id', 'sci_pi_last_name'])

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

For columns with numeric or date values, we can filter using comparison operators:

- `<`: Return values less than or before the given number/date
- `>`: Return values greater than or after the given number/date
- `<=`: Return values less than or equal to the given number/date
- `>=`: Return values greater than or equal to the given number/date
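The difference between strict and inclusive operators matters at the boundary: `'>1'` keeps only values strictly greater than 1, so a value of exactly 1 is dropped, while `'>=1'` keeps it. The sketch below mimics the documented meaning of each prefix on a list of invented counts. `parse_comparison` and `apply_criterion` are hypothetical helpers, not how MAST evaluates criteria, and they handle numbers only (ISO dates such as `1990-05-01` would need separate parsing).

```python
import operator

# Map each documented prefix to the matching Python comparison.
# Two-character operators come first so '<=' is not mistaken for '<'.
_OPERATORS = [("<=", operator.le), (">=", operator.ge),
              ("<", operator.lt), (">", operator.gt)]


def parse_comparison(criterion):
    """Split a numeric criterion such as '>=1' into (comparison function, threshold)."""
    for prefix, func in _OPERATORS:
        if criterion.startswith(prefix):
            return func, float(criterion[len(prefix):])
    raise ValueError(f"no comparison operator in {criterion!r}")


def apply_criterion(values, criterion):
    """Keep the values that satisfy the criterion."""
    func, threshold = parse_comparison(criterion)
    return [v for v in values if func(v, threshold)]


counts = [0, 1, 2, 3]  # hypothetical per-dataset counts
print(apply_criterion(counts, ">1"))   # [2, 3]
print(apply_criterion(counts, ">=1"))  # [1, 2, 3]
```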

As an example, let's write a query to return all datasets with an observation date before May 1, 1990. These were some of Hubble's first observations! We'll use the optional `sort_by` and `sort_desc` keywords to sort our results in reverse chronological order.

In [None]:
# Query using comparison operator
results = missions.query_criteria(sci_start_time='<1990-05-01',  # Must be observed before May 1, 1990
                                  select_cols=['sci_start_time', 'sci_pep_id'],
                                  sort_by=['sci_start_time'],  # Sort by observation start time
                                  sort_desc=[True])  # Sort in descending order

# Display the first 10 results
print(f'Total number of results: {len(results)}')
results[:10]

For numeric or date data types, we can also filter with ranges. This requires the following syntax: `'#..#'`.
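A range such as `'5000..5005'` is a lower bound and an upper bound joined by `..`. The hypothetical helpers below parse that syntax and filter a list of made-up exposure times. They assume both endpoints are inclusive; that is an assumption worth confirming against the MAST Search API documentation before relying on boundary values.

```python
def parse_range(criterion):
    """Parse a 'low..high' criterion into a (low, high) tuple of floats."""
    low, sep, high = criterion.partition("..")
    if not sep:
        raise ValueError(f"expected 'low..high', got {criterion!r}")
    return float(low), float(high)


def in_range(values, criterion):
    """Keep the values inside the range (assumption: both endpoints inclusive)."""
    low, high = parse_range(criterion)
    return [v for v in values if low <= v <= high]


durations = [4999.0, 5000.0, 5002.5, 5005.0, 5010.0]  # hypothetical exposure times (s)
print(in_range(durations, "5000..5005"))  # [5000.0, 5002.5, 5005.0]
```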

Let's write a query that uses range syntax to return datasets that have an exposure time between 5000 and 5005 seconds.

In [None]:
# Query using range operator
results = missions.query_criteria(sci_actual_duration='5000..5005',  # Exposure duration is between 5000 and 5005 seconds
                                  select_cols=['sci_pep_id', 'sci_actual_duration'])

# Display results
print(f'Total number of results: {len(results)}')
results[:10]

Wow, there are a lot of tips and tricks for writing queries! Here's a quick summary:

* To exclude and filter out a certain value from the results, prepend the value with `!`.

* Wildcards are special characters used in search patterns to stand in for unknown characters, allowing for flexible matching of strings. The wildcard character is `*`, and it matches any number of characters preceding, following, or in between existing characters, depending on its placement.

* To filter by multiple values for a single column, use a string of values delimited by commas.

* For columns with numeric or date data types, filter using comparison operators (`<`, `>`, `<=`, `>=`).

* For columns with numeric or date data types, select a range with the syntax `'#..#'`.
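Because every criterion above is ultimately a string, small helpers can make queries easier to read and harder to mistype. The functions below are hypothetical conveniences, not part of `astroquery`; they only produce the string syntaxes summarized above, and the resulting dictionary could be unpacked into `query_criteria` with `**`.

```python
def exclude(value):
    """'!value': exclude a value from the results."""
    return f"!{value}"


def any_of(*values):
    """'a, b, c': match any one of several values."""
    return ", ".join(str(v) for v in values)


def contains(text):
    """'*text*': the string appears anywhere in the field."""
    return f"*{text}*"


def compare(op, value):
    """'<value', '>=value', ...: comparison for numeric or date columns."""
    if op not in ("<", ">", "<=", ">="):
        raise ValueError(f"unsupported operator {op!r}")
    return f"{op}{value}"


def between(low, high):
    """'low..high': range for numeric or date columns."""
    return f"{low}..{high}"


# Same criteria as the wildcard query earlier in this notebook
criteria = {
    "sci_instrume": "WFC3",
    "sci_spec_1234": exclude("F555W"),
    "sci_pep_id": 15879,
    "sci_targname": contains("GEM"),
}
print(criteria)

# The other helpers produce strings like these
print(any_of("F153M", "F160W"))    # F153M, F160W
print(compare("<", "1990-05-01"))  # <1990-05-01
print(between(5000, 5005))         # 5000..5005

# With astroquery installed, the dictionary could be unpacked into a query:
# results = missions.query_criteria(**criteria)
```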

## Getting Data Products

### Performing a Product Query

Each observation returned from a MAST query can have one or more associated data products. For example, a JWST observation might return an [uncalibrated file](https://outerspace.stsci.edu/display/MASTDOCS/Supplemental+Products), [a guide-star file](https://jwst-docs.stsci.edu/jwst-observatory-characteristics/jwst-guide-stars), and the actual science data.

To demonstrate, we'll run another criteria query for datasets that use Hubble's [Advanced Camera for Surveys (ACS)](https://www.stsci.edu/hst/instrumentation/acs) instrument. We are interested in datasets connected to [proposal number 12451](https://www.stsci.edu/hst-program-info/program/?program=12451) that are associated with more than one High Level Science Product.

In [None]:
# Query using comparison operator
datasets = missions.query_criteria(sci_pep_id=12451,  # Proposal number 12451
                                   sci_instrume='ACS',  # Use ACS instrument
                                   sci_hlsp='>1')  # Associated with more than one HLSP

# Display results
print(f'Total number of results: {len(datasets)}')
datasets[:5]

The `get_product_list` function accepts a table of datasets or a list of dataset IDs and returns a table containing the associated data products. Let's fetch the data products for the first three datasets in the table above.

In [None]:
# Get a list of data products
products = missions.get_product_list(datasets[:3])

# Display results
print(f'Total number of products: {len(products)}')
products[:5]

Some products can be associated with multiple datasets, and this table may contain duplicates. To return a list of products with only unique filenames, use the `get_unique_product_list` function.
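Conceptually, reducing to unique filenames keeps one row per filename. The sketch below does this with a plain dictionary over a hypothetical list of product records, keeping the first occurrence of each filename; with a real `astropy` table, `astropy.table.unique(products, keys='filename')` performs a similar reduction. The record fields are placeholders and may not match the exact columns MAST returns.

```python
def unique_by(records, key):
    """Keep the first record for each distinct value of `key`, preserving order."""
    seen = {}
    for record in records:
        seen.setdefault(record[key], record)  # later duplicates are ignored
    return list(seen.values())


# Hypothetical product rows; the same file is linked to two datasets
products = [
    {"dataset": "DATASET_A", "filename": "shared_file.fits"},
    {"dataset": "DATASET_B", "filename": "shared_file.fits"},
    {"dataset": "DATASET_A", "filename": "file_a.fits"},
]
unique = unique_by(products, "filename")
print([p["filename"] for p in unique])  # ['shared_file.fits', 'file_a.fits']
```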

In [None]:
# Get products with unique filenames
unique_products = missions.get_unique_product_list(datasets[:3])

# Display results
unique_products[:5]

### Filtering Data Products

These datasets returned quite a few products! We are not interested in all of them, and luckily, we have a handy function to filter them for us. `filter_products` allows you to filter based on file extension (`extension`) and any of the other product fields.

A quick note on filtering: the **AND** operation is applied across different filter keywords, and the **OR** operation is applied among the values listed for a single keyword. For example, the filter below will return FITS products that are "science" type **and** have a `file_suffix` of ["ASN" (association files)](https://hst-docs.stsci.edu/acsdhb/chapter-2-acs-data-structure/2-1-types-of-acs-files#id-2.1TypesofACSFiles-2.1.22.1.2AssociationTables) **or** ["JIF" (jitter information files)](https://www.stsci.edu/hst/instrumentation/focus-and-pointing/pointing/jitter-file-format-definition).
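The AND/OR rule can be sketched in a few lines of plain Python: a product passes only if it satisfies every keyword, and for a keyword given as a list it needs to match just one entry. `filter_rows` is a hypothetical stand-in written to illustrate the rule, not `astroquery`'s implementation; it uses exact equality (including case), where the real function may be more forgiving.

```python
def filter_rows(rows, **filters):
    """Keep rows that match every filter (AND); a list value matches any entry (OR)."""
    def matches(row):
        for key, wanted in filters.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(key) not in allowed:
                return False  # failing any one keyword rejects the row
        return True
    return [row for row in rows if matches(row)]


# Hypothetical product rows
rows = [
    {"type": "science", "file_suffix": "ASN"},
    {"type": "science", "file_suffix": "JIF"},
    {"type": "science", "file_suffix": "FLT"},
    {"type": "auxiliary", "file_suffix": "ASN"},
]
kept = filter_rows(rows, type="science", file_suffix=["ASN", "JIF"])
print(kept)  # the first two rows only
```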

In [None]:
# Filter products 
filtered = missions.filter_products(products,
                                    extension='fits',  # FITS file extension
                                    type='science',  # Science data
                                    file_suffix=['ASN', 'JIF'])  # Association files OR jitter information files

# Display results
filtered

## Downloading Products

The `download_products` function accepts a table of products like the one above and will download the products to your local machine. By default, products will be downloaded into the current working directory, in a subdirectory called `mastDownload`. The full local filepaths will have the form `mastDownload/<mission>/<Dataset ID>/<filename>`. You can change the download directory using the `download_dir` parameter.
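The local layout described above can be reproduced with `pathlib` to predict where a file will land, for example to check whether it was already downloaded. `expected_path` is a hypothetical helper that simply follows the `mastDownload/<mission>/<Dataset ID>/<filename>` pattern; the exact directory names (such as the capitalization of the mission) are an assumption, so compare against the paths reported in the returned manifest.

```python
from pathlib import Path


def expected_path(mission, dataset_id, filename, download_dir="."):
    """Build <download_dir>/mastDownload/<mission>/<dataset_id>/<filename> (assumed layout)."""
    return Path(download_dir) / "mastDownload" / mission / dataset_id / filename


p = expected_path("hst", "JBTAA0010", "jbtaa0010_asn.fits", download_dir="data")
print(p.as_posix())  # data/mastDownload/hst/JBTAA0010/jbtaa0010_asn.fits
print(p.exists())    # False unless the file has already been downloaded there
```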

In [None]:
# Download products using filtered product Table
manifest = missions.download_products(filtered[:2])

# Display results
manifest

For a more streamlined workflow, the function also accepts dataset IDs and product filters.

In [None]:
# Download products using dataset IDs and product filters
manifest = missions.download_products(['JBTAA0010', 'JBTAA0020'],
                                      extension='fits',
                                      type='science',
                                      file_suffix=['ASN', 'JIF'])

# Display results
manifest

To download a single data product file, use the `download_file` function with a MAST URI as input. The default is to download the file to the current working directory, but you can specify the download directory or filepath with the `local_path` keyword argument.

In [None]:
# Download a single data product
result = missions.download_file('JBTAA0010/jbtaa0010_asn.fits')

# Display result
result

### Exclusive Data Access

Some data may not be publicly available and will require [authentication and authorization](https://outerspace.stsci.edu/display/MASTDOCS/Using+MAST+APIs#UsingMASTAPIs-authAuth.MAST). To [download proprietary data with Astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html#accessing-proprietary-data), you will need a [MyST Account](https://proper.stsci.edu/proper/authentication/auth) with proper permissions. You will also need to provide an [API token](https://auth.mast.stsci.edu/info). 

You can use the `login` function to authenticate yourself. After uncommenting and executing the following cell, you should be prompted to enter your token.

In [20]:
# missions.login()

You can also provide a token to a `MastMissions` object upon initialization using the `mast_token` parameter. However, remember to be cautious with your API token. You should not share the token or check it into source control. For the best security, we recommend using the `login` method to authenticate yourself.

## Switching Missions

As mentioned previously, each `MastMissions` object can only make queries and download products from a single collection at a time. This collection can be changed through the `mission` attribute, which is case-insensitive, so the same object can be used to query multiple collections.

To demonstrate, we'll create a new `MastMissions` object and initialize the `mission` to be `'JWST'`. This will perform queries on dataset metadata from the James Webb Space Telescope.

In [None]:
multi_mission = MastMissions(mission='JWST')
multi_mission.mission

Next, we'll query for JWST datasets around [NGC 346](https://science.nasa.gov/image-detail/young-stars-sculpt-gas-with-powerful-outflows/), a young star cluster in the [Small Magellanic Cloud](https://www.nasa.gov/image-article/taken-under-wing-of-small-magellanic-cloud/). We'll use a radius of 0.2 arcminutes.

In [None]:
# Query JWST for NGC 346
results = multi_mission.query_object('NGC 346',
                                     radius=0.2)  # Search within a 0.2 arcminute radius

# Display results
print(f'Total number of datasets: {len(results)}')
results[:5]

This query returned over 160 JWST datasets. Now, let's try it with a different data collection. We'll reassign the `mission` attribute on the `multi_mission` object to be `'ullyses'` and run the same query.

In [None]:
multi_mission.mission = 'ullyses'
multi_mission.mission

In [None]:
# Query ULLYSES for NGC 346
results = multi_mission.query_object('NGC 346',
                                     radius=0.2)  # Search within a 0.2 arcminute radius

# Display results
print(f'Total number of datasets: {len(results)}')
results[:5]

Notice that this query returned only a few datasets. The result tables also look very different in terms of data and column keywords. This is because each query is being performed on a different data collection!

## Exercises

**Exercise 1**: It's time to apply all that you've learned and try your hand at writing a `MastMissions` query! Write a non-positional query based on the following:

- Image observations
- Instrument should NOT include the [Cosmic Origins Spectrograph (COS)](https://www.stsci.edu/hst/instrumentation/cos)
- Filter used is F150W, F105W, or F110W
- Declination is greater than 0 degrees
- Exposure time is between 1000 and 2000 seconds
- Target name contains the string "GAL"
- Skip the first 5 entries
- Sort by exposure time in descending order
- Limit the results to 3 datasets

In [25]:
# A non-positional query with column criteria
# results = missions.query_criteria(...)  # Write your query here!

# Display results
# results

**Exercise 2**: Using your results from Exercise 1, download the association table data products for the 3 datasets (HINT: `file_suffix = 'ASN'`). You can fetch, filter, and download the products as three separate steps, or use the streamlined workflow built into `download_products`.

In [26]:
# Fetch products from 3 datasets
# products = missions.get_product_list(...)

# Filter products
# filtered = missions.filter_products(...)

# Download products
# missions.download_products(...)

**Exercise 3**: Use a new `MastMissions` object and the `mission` attribute to search for datasets around the coordinate "22h57m39s -29d37m20s" from both HST and JWST. Use a radius of 0.1 arcminutes.

In [27]:
# Create new MastMissions object
#m = MastMissions()

# Create sky coordinate object
#coord = SkyCoord(...)

# Query HST metadata for region
#results = m.query_region(...)

# Display the first 5 results
#print(f'Total number of datasets: {len(results)}')
#results[:5]

In [28]:
# Switch mission to JWST
# ...

# Query JWST metadata for region
#results = m.query_region(...)

# Display the first 5 results
#print(f'Total number of datasets: {len(results)}')
#results[:5]

## Exercise Solutions

**Exercise 1:**

In [None]:
# A non-positional query with column criteria
results = missions.query_criteria(sci_obs_type='IMAGE',
                                  sci_instrume='!COS',
                                  sci_spec_1234='F150W, F105W, F110W',
                                  sci_dec='>0',
                                  sci_actual_duration='1000..2000',
                                  sci_targname='*GAL*',
                                  offset=5,
                                  sort_by=['sci_actual_duration'],
                                  sort_desc=[True],
                                  limit=3)

# Display results
results

**Exercise 2:**

In [None]:
# As 3 separate steps
# Fetch products from first 3 datasets
products = missions.get_product_list(results)

# Filter products
filtered = missions.filter_products(products,
                                    file_suffix='ASN')

# Download products
missions.download_products(filtered)

In [None]:
# Streamlined
missions.download_products(results['sci_data_set_name'].tolist(),
                           file_suffix='ASN')

**Exercise 3:**

In [None]:
# Create new MastMissions object
m = MastMissions()

# Create sky coordinate object
coord = SkyCoord('22h57m39s -29d37m20s')

# Query HST metadata for region
results = m.query_region(coord,
                         radius=0.1)

# Display the first 5 results
print(f'Total number of datasets: {len(results)}')
results[:5]

In [None]:
# Switch mission to JWST
m.mission = 'JWST'

# Query JWST metadata for region
results = m.query_region(coord,
                         radius=0.1)

# Display the first 5 results
print(f'Total number of datasets: {len(results)}')
results[:5]

## Additional Resources

- [MAST Search Form UI](https://mast.stsci.edu/search/ui/#/)
- [MAST Search API](https://mast.stsci.edu/search/docs/)
- [`astroquery.mast` Documentation for Mission Searches](https://astroquery.readthedocs.io/en/latest/mast/mast_missions.html#mission-specific-search-queries)

## Citations

If you use `astroquery` for published research, please cite the
authors. Follow these links for more information about citing `astroquery`:

* [Citing Astroquery](https://github.com/astropy/astroquery/blob/main/astroquery/CITATION)

## About this Notebook

**Author(s):** Sam Bianco <br>
**Keyword(s):** Tutorial, Astroquery, MastMissions <br>
**First published:** January 2025 <br>
**Last updated:** January 2025 <br>

***
[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/style-guides/master/guides/images/stsci-logo.png" alt="Space Telescope Logo" width="200px"/> 