# Challenge #2: Querying MAST Data with Astroquery

---

## Learning Goals

- Use `astroquery.mast` to search for observations across mission collections by coordinates, object name, and other criteria.
- Fetch and filter the data products associated with observations.
- Locate and access data products on the cloud.

## Table of Contents

- [Introduction](#Introduction)
- [Imports and Setup](#Imports-and-Setup)
- [Querying for MAST Observations](#Querying-for-MAST-Observations)
  - [Query for Observations](#Step-1.-Query-for-Observations)
  - [Fetch and Filter Product Files](#Step-2.-Fetch-and-Filter-Product-Files)
  - [Access Data on the Cloud](#Step-3.-Access-Data-on-the-Cloud)
  - [Streamlined Query](#Streamlined-Query)
- [Exercise](#Exercise)

## Introduction

Dr. Nefarious has scrambled the index to the MAST Archive! Without it, astronomers can't locate critical datasets, and the longer it stays scrambled, the closer he gets to wiping the archive forever. Your next objective is clear: restore access to the Archive's data using powerful querying tools.
Your goal is to track down the exact files needed to uncover Dr. Nefarious’s next move and send him running back to his data void.

Let’s bring the archive back online. Query away, agent.

## Imports and Setup

This notebook uses the following packages:
- `astropy` to handle coordinates, time, and units.
- `astroquery.mast` to search for and select data from the MAST archive.

In [None]:
from astropy.coordinates import SkyCoord
from astropy.time import Time
import astropy.units as u
from astroquery.mast import Observations, discovery_portal

We will also enable cloud data access in `astroquery.mast`. This will allow us to fetch the cloud URIs for data products and access files directly without downloading them.

In [None]:
Observations.enable_cloud_dataset()

---

## Querying for MAST Observations

In this section, we will use the following workflow to access MAST data:

1. Query for observations in MAST using metadata criteria.
2. Fetch and filter the product files associated with each observation.
3. Access the data directly by fetching the location of each product file in S3 cloud storage.
   
### Step 1. Query for Observations

The [`astroquery.mast.Observations`](https://astroquery.readthedocs.io/en/latest/mast/mast_obsquery.html) class allows direct programmatic access to the [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) and is used to query MAST observational data.

Metadata queries can be done with three different functions:
- `query_region()`: Performs a cone search given target coordinates and a radius (default = 0.2 degrees).
- `query_object()`: Performs a cone search around an object by resolving the name of the object to coordinates.
- `query_criteria()`: Returns a list of observations that meet a given set of criteria. 

#### Region Search

First, let's write a query to search for observations in a certain region of the sky. We will pass a set of coordinates and a search radius to the `query_region` function. If no radius is specified, the default value is 0.2 degrees.

In [None]:
# Query for observations from any mission around a coordinate
coordinates = SkyCoord('21h57m39.04625s -29d37m20.0533s')
obs = Observations.query_region(coordinates=coordinates,
                                radius='1 second')
print(f'Total number of observations: {len(obs)}')
obs[:5]
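
To make the coordinate string above more concrete, here is a minimal standalone sketch of the sexagesimal-to-decimal conversion that `SkyCoord` performs for us. The helper name `hms_dms_to_degrees` is hypothetical and only handles this one string layout; in practice, `SkyCoord` is the right tool.

```python
import re


def hms_dms_to_degrees(coord_string):
    """Convert an 'XhXmXs +/-XdXmXs' string to (ra_deg, dec_deg)."""
    pattern = (r'(\d+)h(\d+)m([\d.]+)s\s+'
               r'([+-]?)(\d+)d(\d+)m([\d.]+)s')
    match = re.fullmatch(pattern, coord_string.strip())
    if match is None:
        raise ValueError(f'Unrecognized coordinate string: {coord_string!r}')
    ra_h, ra_m, ra_s, sign, dec_d, dec_m, dec_s = match.groups()
    # One hour of right ascension spans 15 degrees
    ra_deg = 15 * (int(ra_h) + int(ra_m) / 60 + float(ra_s) / 3600)
    dec_deg = int(dec_d) + int(dec_m) / 60 + float(dec_s) / 3600
    if sign == '-':
        dec_deg = -dec_deg
    return ra_deg, dec_deg


ra_deg, dec_deg = hms_dms_to_degrees('21h57m39.04625s -29d37m20.0533s')
print(f'RA = {ra_deg:.5f} deg, Dec = {dec_deg:.5f} deg')

# The default search radius of 0.2 degrees, expressed in arcminutes
default_radius_arcmin = 0.2 * 60
print(f'Default radius = {default_radius_arcmin:.0f} arcmin')
```

Decimal degrees are also the units used by the `s_ra` and `s_dec` fields in the observation tables returned by these queries.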

#### Object Search

We can query for observations around a certain object using the `query_object` function. We will query for observations around [M11](https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-11/), an open star cluster in the Scutum constellation. We will set the search radius to 2 arcseconds.

In [None]:
obs = Observations.query_object(objectname='M11',
                                radius=2 * u.arcsec)
print(f'Total number of observations: {len(obs)}')
obs[:5]

#### Criteria Search

To search for observations based on additional parameters, you can use the `query_criteria` function. `query_criteria()` is the most versatile of the three query functions. You can still search by `coordinates` or `objectname`, but you can also query by additional desired criteria. Keep in mind, however, that at least one non-positional criterion must be supplied to `query_criteria()`. Otherwise, you should use one of the other query functions.

To perform a search with `query_criteria()`, provide your criteria as keyword arguments. Valid criteria and their descriptions are provided as [CAOM Field Descriptions](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html), or you can use the `get_metadata()` function:

In [None]:
Observations.get_metadata('observations')

Let's make another search around [Betelgeuse](https://science.nasa.gov/universe/what-is-betelgeuse-inside-the-strange-volatile-star/), a red supergiant star in the constellation of Orion.  This time, we will limit our results to timeseries observations from the [TESS mission](https://science.nasa.gov/mission/tess/).

In [None]:
obs = Observations.query_criteria(objectname='Betelgeuse',
                                  radius=2 * u.arcsec,
                                  obs_collection='TESS',
                                  dataproduct_type='timeseries')
obs

#### Other Tips and Tricks

`query_criteria()` is a powerful function, and there are some neat tricks you can use to perform more advanced queries.

##### Value Range Search

To query on fields that have a float data type, the keyword argument value should be in the form `[minVal, maxVal]`. Let's perform a rectangular region search of TESS timeseries observations by providing a range of values to the `s_ra` and `s_dec` keywords.

In [None]:
# Perform a rectangular region search
obs = Observations.query_criteria(obs_collection='TESS',
                                  dataproduct_type='timeseries',
                                  s_ra=[320, 320.5],
                                  s_dec=[22, 22.2])
obs

We can also use ranges to query on time-based keywords, prefixed by `t_`. The `astropy.time.Time` class can be used to convert dates into Modified Julian Date (MJD) format. Let's query for all TESS timeseries that were released after May 1, 2025 by providing a value range to the `t_obs_release` keyword.

In [None]:
# Get dates in Modified Julian Date (MJD) format
may_time = Time('2025-05-01').mjd
curr_time = Time.now().mjd

# Query for all observations released since May 1, 2025
obs = Observations.query_criteria(obs_collection='TESS',
                                  dataproduct_type='timeseries',
                                  t_obs_release=[may_time, curr_time])
print(f'Number of Observations: {len(obs)}')
obs[:5]
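
If MJD is unfamiliar, the conversion is simply a count of days since midnight on November 17, 1858. The following standalone sketch reproduces the May 1, 2025 value using only the standard library. The helper `date_to_mjd` is a hypothetical name, and it ignores time scales and fractional days, which `astropy.time.Time` handles properly.

```python
from datetime import date

# MJD counts days elapsed since midnight on November 17, 1858
MJD_EPOCH = date(1858, 11, 17)


def date_to_mjd(year, month, day):
    """Return the Modified Julian Date at midnight for a calendar date."""
    return (date(year, month, day) - MJD_EPOCH).days


may_mjd = date_to_mjd(2025, 5, 1)
print(may_mjd)
```

A quick check like this can help confirm that the bounds passed to a `t_` keyword fall where you expect them to.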

##### Multi-Valued Search

For non-float criteria, you can pass in more than one value for a keyword by supplying the arguments as a list. The `sequence_number` keyword corresponds to the TESS sector. Below is a query that returns the TESS timeseries observations around the star Fomalhaut for sectors 2 and 29.

In [None]:
obs = Observations.query_criteria(objectname='Fomalhaut',
                                  obs_collection='TESS',
                                  dataproduct_type='timeseries',
                                  sequence_number=[2, 29])
obs

##### Wildcard Search

Non-float criteria also support the use of wildcards. These are special characters used in search patterns to represent one or more unknown characters, allowing for flexible matching of strings. The available wildcards are `%` and `*`: each replaces any number of characters preceding, following, or in between the existing characters, depending on its placement. Wildcards can only be used in Python `str` values, so to apply one to an integer criterion, pass the argument as a string. Remember this important caveat: only one wildcarded value can be processed per criterion.

The following query demonstrates the use of wildcards. It returns TESS timeseries observations around Fomalhaut where the proposal ID string starts with "G0" and contains the character "5". It also selects for observations where the sector number begins with "2". Note that although `sequence_number` is an integer field, it accepts a wildcarded string.

In [None]:
obs = Observations.query_criteria(objectname='Fomalhaut',
                                  obs_collection='TESS',
                                  dataproduct_type='timeseries',
                                  proposal_id='G0*5*',
                                  sequence_number='2*')
obs
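
To build intuition for which values a pattern selects, here is a small local sketch that uses the standard library's `fnmatchcase`, which treats `*` in a similar way. This is only an approximation: the real matching happens on the MAST server, `fnmatchcase` does not understand `%`, and the proposal IDs and sector numbers below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical proposal IDs, made up for illustration only
proposal_ids = ['G011155', 'G022253', 'G03012', 'GO5001', 'DDT005']

# Starts with "G0" and contains a "5" somewhere after it
id_matches = [pid for pid in proposal_ids if fnmatchcase(pid, 'G0*5*')]
print(id_matches)

# Integer values must be converted to strings before pattern matching
sectors = [2, 12, 20, 29]
sector_matches = [s for s in sectors if fnmatchcase(str(s), '2*')]
print(sector_matches)
```

Note that `'GO5001'` is rejected because its second character is the letter "O" rather than a zero, a reminder that wildcard patterns match literal characters exactly.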

### Step 2. Fetch and Filter Product Files

Each observation returned by a MAST query can have one or more associated data products. For reproducibility, let's query the TESS timeseries observations for the target `375422201` in the [TESS Input Catalog (TIC)](https://tess.mit.edu/science/tess-input-catalogue/). We will also select for sectors 15, 16, and 17.

In [None]:
TESS_table = Observations.query_criteria(target_name=375422201,
                                         obs_collection="TESS",
                                         dataproduct_type='timeseries',
                                         sequence_number=[15, 16, 17]) 
TESS_table

We can use the `Observations.get_product_list()` function to return the underlying product files for the observations returned above. As input, the function takes a table of observations or a list of observation IDs (`obs_id` column).

In [None]:
data_products = Observations.get_product_list(TESS_table)

print(f'Total Products: {len(data_products)}')
data_products[:5]

#### Filtering Products

This returned quite a few products! We are not interested in all of them, and luckily, we have a handy function to filter them for us. `Observations.filter_products` allows you to filter based on minimum recommended products (`mrp_only`), file extension (`extension`), and any other of the [product fields](https://mast.stsci.edu/api/v0/_productsfields.html).

A quick note on filtering: the **AND** operation is performed for a list of filters, and the **OR** operation is performed within a filter set. For example, the filter below will return products that are "SCIENCE" type **and** have a `productSubGroupDescription` of "LC" (light curves) **or** "TP" (target pixel files).

In [None]:
science_products = Observations.filter_products(data_products,
                                                productType='SCIENCE',
                                                productSubGroupDescription=['LC', 'TP'])
science_products
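
The AND/OR behavior described above can be sketched in plain Python. This is an illustration of the logic only, not how `filter_products` is implemented; the helper `matches_filters` and the product rows are hypothetical.

```python
def matches_filters(row, **filters):
    """Return True if the row satisfies every filter (AND across keywords).

    A filter value may be a single value or a list; a list matches if the
    row's value equals any item in it (OR within a keyword).
    """
    for column, allowed in filters.items():
        if not isinstance(allowed, (list, tuple, set)):
            allowed = [allowed]
        if row[column] not in allowed:
            return False
    return True


# Hypothetical product rows, made up for illustration only
products = [
    {'productType': 'SCIENCE', 'productSubGroupDescription': 'LC'},
    {'productType': 'SCIENCE', 'productSubGroupDescription': 'TP'},
    {'productType': 'SCIENCE', 'productSubGroupDescription': 'DVT'},
    {'productType': 'INFO', 'productSubGroupDescription': 'LC'},
]

kept = [row for row in products
        if matches_filters(row, productType='SCIENCE',
                           productSubGroupDescription=['LC', 'TP'])]
print(kept)
```

Only the first two rows survive: the third fails the subgroup filter and the fourth fails the product type filter.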

### Step 3. Access Data on the Cloud

Certain mission data is publicly available for free on [Amazon Web Services](https://registry.opendata.aws/collab/stsci/). Now that we have a table of filtered products, we can use the `Observations.get_cloud_uris` function to locate these product files in the S3 bucket.

In [None]:
Observations.get_cloud_uris(science_products)

The output is a list of S3 URIs: one for each product in the table that we passed into the function. We can now use these URIs to open the files and stream their data directly into system memory. No expensive downloading required!

Details on accessing cloud data are out of scope for this tutorial, but there are multiple methods available to you as a MAST user:
- Use `astropy.io.fits` to open FITS files directly from the cloud.
- Use the `s3fs` package to browse and access files in S3 buckets.
- Use the `lightkurve` package to read TESS/Kepler/K2 data products directly from the cloud.
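
Whichever method you choose, it can help to know how an S3 URI is put together: a bucket name followed by an object key. The sketch below splits a URI into those two parts using only the standard library. The helper `split_s3_uri` and the example URI are hypothetical; substitute a URI returned by `get_cloud_uris`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into its bucket name and object key."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(f'Not an S3 URI: {uri!r}')
    return parsed.netloc, parsed.path.lstrip('/')


# Hypothetical URI, made up for illustration only
example_uri = 's3://example-bucket/mission/path/to/product_lc.fits'
bucket, key = split_s3_uri(example_uri)
print(bucket)
print(key)
```

Some S3 clients expect the bucket and key as separate arguments rather than a single URI, in which case a split like this is a convenient first step.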

### Streamlined Query

In this tutorial, we walked through a 3-step workflow to query MAST observations and locate data products on the cloud. You can streamline this process by providing query criteria and product filters directly to the `get_cloud_uris()` function! Query criteria are supplied as keyword arguments, and filters are supplied through the `filter_products` parameter.

Below is the streamlined version of the walkthrough example:

In [None]:
Observations.get_cloud_uris(target_name=375422201,
                            obs_collection="TESS",
                            dataproduct_type='timeseries',
                            sequence_number=[15, 16, 17],
                            filter_products={'productType': 'SCIENCE',
                                             'productSubGroupDescription': ['LC', 'TP']})

## Exercise

To unlock your clue for this challenge, you'll need to use the workflow described in this notebook to query for observations, fetch product files, and locate data on the cloud.

First, write a query that satisfies the following criteria:
- Cone search around the M11 object with a radius of 0.2 degrees.
- Image observations intended for science.
- Wavelength region is "OPTICAL" or "INFRARED".
- Observation description (title) contains the word "survey".
- Observation exposure time is between 785 and 794 seconds.

In [None]:
# Write your criteria query
   

Next, you'll need to get the product files for your query results and filter them to only include the following:
- FITS files.
- Publicly available files (HINT: `dataRights` field).

In [None]:
# Get data products for your observations


In [None]:
# Filter data products


Finally, you should get S3 URIs for your list of filtered products.

In [None]:
# Get cloud URIs for the filtered products

Excellent job, agent! Your clue is hidden in the final column of your filtered product results.

Continue on to the next challenge to gather more clues and thwart Dr. Nefarious's plans!

## Additional Resources

- [`astroquery.mast` Documentation](https://astroquery.readthedocs.io/en/latest/mast/mast.html)
- [MAST Portal API](https://mast.stsci.edu/api/v0/)
- [MAST Portal UI](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html)

## About this Notebook

**Author:** Sam Bianco <br>
**Keywords:** AAS 246, Astroquery, Cloud, Observations <br>
**Last Updated:** June 2025 <br>
***
[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 