# Querying for TESS Data in MAST with Astroquery

---

## Learning Goals

- Use `astroquery.mast` to search for observations across mission collections by coordinates, object name, and other criteria.
- Fetch and filter the data products associated with observations.
- Locate and access data products on the cloud.

## Table of Contents

- [Introduction](#Introduction)
- [What is Astroquery?](#What-is-Astroquery?)
- [Imports and Setup](#Imports-and-Setup)
- [Querying for MAST Observations](#Querying-for-MAST-Observations)
  - [Query for Observations](#Step-1.-Query-for-Observations)
  - [Fetch and Filter Data Products](#Step-2.-Fetch-and-Filter-Data-Products)
  - [Access Data on the Cloud](#Step-3.-Access-Data-on-the-Cloud)
  - [Streamlined Query](#Streamlined-Query)
- [Exercise](#Exercise)

## Introduction

Dr. Nefarious, an evil mastermind and lover of jazz, has scrambled the index to the MAST Archive! Without it, astronomers can't locate critical datasets, and the longer it stays scrambled, the closer he gets to wiping the archive forever. Your next objective is clear: restore access to the Archive's data using powerful querying tools.
Your goal is to track down the exact files needed to uncover Dr. Nefarious’s next move and send him running back to his data void.

Let’s bring the Archive back online. Query away, agent.

In this tutorial, we will learn how to search for, filter, and access TESS data on the cloud with the ``astroquery.mast`` module.

## What is Astroquery?

The [`astroquery`](https://astroquery.readthedocs.io/en/latest/) package is an astronomer-friendly way to programmatically query online astronomical data sources. It contains many modules, each designed to access a different data source. The [`astroquery.mast`](https://astroquery.readthedocs.io/en/latest/mast/mast.html) module can be used to query the [MAST Archive](https://archive.stsci.edu/).

## Imports and Setup

We will import the following packages:
- `astropy` to handle units and coordinates
- `astroquery.mast` to search for and select data

In [None]:
import astropy.units as u
from astropy.coordinates import SkyCoord
from astropy.time import Time
from astroquery.mast import Observations

We will also enable cloud data access in `astroquery.mast`. This will allow us to fetch the cloud URIs for data products and access files directly without downloading them.

In [None]:
Observations.enable_cloud_dataset()

## Querying for MAST Observations

In this section, we will use the following workflow to access TESS data:

1. Query for TESS observations in MAST using metadata criteria.
2. Fetch and filter the product files associated with each observation.
3. Access the data directly by fetching the location of each product file in S3 cloud storage.

### Step 1. Query for Observations

The [`astroquery.mast.Observations`](https://astroquery.readthedocs.io/en/latest/mast/mast_obsquery.html) class allows direct programmatic access to the [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) and is used to query MAST observational data.

Metadata queries can be done with three different functions:
- `query_region()`: Performs a cone search given target coordinates and a radius (default = 0.2 degrees).
- `query_object()`: Performs a cone search around an object by resolving the name of the object to coordinates.
- `query_criteria()`: Returns a list of observations that meet a given set of criteria. 
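A cone search keeps every observation whose angular separation from the search position is at most the radius. The standard-library sketch below illustrates that idea with the Vincenty great-circle formula; it is not how MAST implements its search, and the helper names `angular_separation_deg` and `in_cone` are hypothetical. The center is the position used in the region search below, converted to decimal degrees.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (Vincenty formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = math.hypot(
        math.cos(dec2) * math.sin(dra),
        math.cos(dec1) * math.sin(dec2) - math.sin(dec1) * math.cos(dec2) * math.cos(dra),
    )
    den = math.sin(dec1) * math.sin(dec2) + math.cos(dec1) * math.cos(dec2) * math.cos(dra)
    return math.degrees(math.atan2(num, den))

def in_cone(center, point, radius_deg=0.2):
    """True if point (ra, dec) lies within radius_deg of center.

    The 0.2 degree default mirrors the default search radius described above.
    """
    return angular_separation_deg(*center, *point) <= radius_deg

# 22h57m39.04625s -29d37m20.0533s in decimal degrees (1h of RA = 15 deg)
center = (15 * (22 + 57 / 60 + 39.04625 / 3600), -(29 + 37 / 60 + 20.0533 / 3600))

print(in_cone(center, (center[0], center[1] + 0.1)))  # True: 0.1 deg away
print(in_cone(center, (center[0], center[1] + 0.5)))  # False: 0.5 deg away
# A 1 arcsec radius is 1/3600 deg, so the same 0.1 deg offset falls outside it
print(in_cone(center, (center[0], center[1] + 0.1), radius_deg=1 / 3600))  # False
```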

`query_criteria()` is the most versatile of the three functions, so we will use it to query for TESS observations. You can still search by `coordinates` or `objectname`, and you can also filter on any other supported criteria. Keep in mind, however, that `query_criteria()` requires at least one non-positional criterion; for a purely positional search, use `query_region()` or `query_object()` instead.

To perform a search with `query_criteria()`, provide your criteria as keyword arguments. Valid criteria and their descriptions are provided as [CAOM Field Descriptions](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html), or you can use the `get_metadata()` function:

In [None]:
Observations.get_metadata('observations')

#### Region Search

First, let's write a query to search for TESS observations in a certain region of the sky. We will pass a set of coordinates and a search radius to the `query_criteria()` function. If no radius is specified, the default value is 0.2 degrees.

To return only observations from the TESS mission, we will set the keyword `obs_collection` to `'TESS'`. We will also select timeseries observations by setting the keyword `dataproduct_type` to `'timeseries'`. Finally, we will restrict the search to a circular region with a radius of 1 arcsecond around the coordinates.

In [None]:
# Query for TESS observations around a coordinate
coordinates = SkyCoord('22h57m39.04625s -29d37m20.0533s')
obs = Observations.query_criteria(coordinates=coordinates,
                                  radius='1 arcsec', 
                                  obs_collection='TESS',
                                  dataproduct_type='timeseries')
obs

The results are returned as an `astropy.table.Table` object. From here, you can view the table and its structure, access data, modify data, perform filtering, sort by a column, and more.

#### Object Search

We can query for observations around a certain object using the `objectname` keyword argument. We will query for observations around [Fomalhaut](https://www.stsci.edu/contents/media/images/2013/01/3130-Image?news=true), the brightest star in the southern constellation of [Piscis Austrinus](https://www.stsci.edu/contents/media/images/2005/10/1666-Image?client=ctns&news=true). We will set the search radius to 1 arcsecond, and we will limit our results to TESS timeseries observations.

In [None]:
obs = Observations.query_criteria(objectname='Fomalhaut',
                                  radius=___,
                                  obs_collection=___,
                                  dataproduct_type=___)
obs 

#### Other Tips and Tricks

`query_criteria()` is a powerful function, and there are some neat tricks you can use to perform more advanced queries.

##### Value Range Search

To query on fields that have a float data type, the keyword argument value should be in the form [minVal, maxVal]. Let's perform a rectangular region search of TESS timeseries observations by providing a range of values to the `s_ra` and `s_dec` keywords.
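Conceptually, a `[minVal, maxVal]` pair acts as a range test on one column, and separate criteria must all be satisfied. The sketch below mimics a rectangular `s_ra`/`s_dec` search on a few hypothetical rows. The helper names are made up, and the sketch assumes both bounds are inclusive, which this notebook does not state.

```python
def in_range(value, bounds):
    """Assumed inclusive [minVal, maxVal] test for one continuous criterion."""
    lo, hi = bounds
    return lo <= value <= hi

def rectangular_search(rows, ra_range, dec_range):
    """Keep rows whose s_ra AND s_dec both fall inside their ranges."""
    return [r for r in rows
            if in_range(r["s_ra"], ra_range) and in_range(r["s_dec"], dec_range)]

# Hypothetical observation rows, for illustration only
rows = [
    {"obs_id": "obs-a", "s_ra": 320.10, "s_dec": 22.05},
    {"obs_id": "obs-b", "s_ra": 320.60, "s_dec": 22.05},  # RA outside the range
    {"obs_id": "obs-c", "s_ra": 320.40, "s_dec": 22.30},  # Dec outside the range
]

matches = rectangular_search(rows, [320, 320.5], [22, 22.2])
print([r["obs_id"] for r in matches])  # ['obs-a']
```

Note that a box in RA/Dec is not a true rectangle on the sky: the same RA span covers a smaller angle at higher declinations.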

In [None]:
# Perform a rectangular region search
ra_range = [320, 320.5]   # [minVal, maxVal] in degrees
dec_range = [22, 22.2]    # [minVal, maxVal] in degrees
obs = Observations.query_criteria(obs_collection='TESS',
                                  dataproduct_type='timeseries',
                                  s_ra=___,
                                  s_dec=___)
obs

We can also use ranges to query on time-based keywords, which are prefixed by `t_`. Date fields such as `t_obs_release` are expressed in Modified Julian Date (MJD), and the `astropy.time.Time` class can convert dates into MJD format.

Let's query for TESS timeseries observations for the target `375422201` in the [TESS Input Catalog (TIC)](https://tess.mit.edu/science/tess-input-catalogue/). We'll also provide a range of values to the `t_obs_release` keyword to return only observations that were released after August 1st, 2024.
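An MJD is simply a day count since midnight on 1858 November 17 (MJD = JD - 2400000.5). As a cross-check on the values `Time` produces, the sketch below computes the same kind of `[minVal, maxVal]` range using only the standard library. `date_to_mjd` is a hypothetical helper, and the sketch ignores leap seconds, so treat it as a sanity check rather than a replacement for `astropy.time`.

```python
import time
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0

def date_to_mjd(d):
    """Whole-day MJD at 00:00 UTC of a calendar date (leap seconds ignored)."""
    return (d - MJD_EPOCH).days

UNIX_EPOCH_MJD = date_to_mjd(date(1970, 1, 1))  # 40587

past_mjd = date_to_mjd(date(2024, 8, 1))
# time.time() counts seconds since 1970-01-01 UTC; divide by 86400 to get days
now_mjd = UNIX_EPOCH_MJD + time.time() / 86400

# A [minVal, maxVal] pair suitable for a range criterion such as t_obs_release
t_obs_release_range = [past_mjd, now_mjd]
print(past_mjd)  # 60523, matching Time('2024-08-01').mjd
```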

In [None]:
# Get dates in Modified Julian Date (MJD) format
past_time = Time('2024-08-01').mjd
curr_time = Time.now().mjd

# Query for all observations released since a specific date
obs = Observations.query_criteria(target_name=375422201,
                                  obs_collection="TESS",
                                  dataproduct_type='timeseries',
                                  t_obs_release=___)
print(f'Number of Observations: {len(obs)}')
obs

##### Multi-Valued Search

For non-float criteria, you can pass in more than one value for a keyword by supplying the values as a list; observations matching any of the values are returned. The `sequence_number` keyword corresponds to the TESS sector. Below is a query that returns the TESS timeseries observations for TIC ID `375422201` in sectors 15 and 83.

In [None]:
obs = Observations.query_criteria(target_name=375422201,
                                  obs_collection="TESS",
                                  dataproduct_type='timeseries',
                                  sequence_number=___)
obs

##### Wildcard Search

Non-float criteria also support wildcards: special characters in a search pattern that stand for unknown characters, allowing flexible string matching. The available wildcards are `%` and `*`; each matches any number of characters before, after, or between the other characters, depending on where it is placed. Wildcards only work in string values, but you can still use them with integer criteria by passing the value as a string. Keep one important caveat in mind: only one wildcarded value can be processed per criterion.

The following query demonstrates the use of wildcards. It returns TESS timeseries observations for TIC ID `375422201` whose proposal ID starts with "G0" and contains a "6" somewhere after that prefix. It also selects observations whose sector number contains a "7". Note that although `sequence_number` is an integer field, it accepts a wildcarded string.
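To make the matching rules concrete, the sketch below translates a wildcard pattern into a regular expression, treating `*` and `%` as "any run of characters" and comparing integers as strings. It is an illustration under assumptions, not MAST's server-side matcher: the helper names and proposal IDs are hypothetical, and the match here is case-sensitive, which MAST's may not be.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern ('*' or '%' = any run of characters) to a regex."""
    literal_parts = re.split(r"[*%]", pattern)
    return re.compile("^" + ".*".join(re.escape(p) for p in literal_parts) + "$")

def wildcard_match(pattern, value):
    """Match a value against a pattern; integers are compared as strings."""
    return wildcard_to_regex(pattern).match(str(value)) is not None

# Hypothetical proposal IDs
print(wildcard_match("G0*6*", "G06123"))   # True: starts with G0, then a 6
print(wildcard_match("G0*6*", "G011234"))  # False: no 6 after the G0 prefix

# Integer sector numbers whose string form contains a "5"
print([s for s in range(1, 30) if wildcard_match("*5*", s)])  # [5, 15, 25]
```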

In [None]:
obs = Observations.query_criteria(target_name=375422201,
                                  obs_collection="TESS",
                                  dataproduct_type='timeseries',
                                  proposal_id='G0*6*',
                                  sequence_number=___)
obs

### Step 2. Fetch and Filter Data Products

Each TESS observation returned by a MAST query can have one or more [associated data products](https://archive.stsci.edu/missions-and-data/tess/data-products). As a working example, let's query the TESS timeseries observations for the target `375422201` in the [TESS Input Catalog (TIC)](https://tess.mit.edu/science/tess-input-catalogue/), selecting sectors 15, 16, and 17.

In [None]:
TESS_table = Observations.query_criteria(target_name=375422201,
                                         obs_collection="TESS",
                                         dataproduct_type='timeseries',
                                         sequence_number=[15, 16, 17]) 
TESS_table

We can use the `Observations.get_product_list()` function to return the underlying product files for the four observations above. As input, the function takes a table of observations or a list of observation IDs (`obs_id` column).

In [None]:
data_products = Observations.get_product_list(___)

print(f'Total Products: {len(data_products)}')
data_products[:5]

#### Filtering Products

This returned quite a few products! We are not interested in all of them, and luckily, we have a handy function to filter them for us. `Observations.filter_products` allows you to filter based on minimum recommended products (`mrp_only`), file extension (`extension`), and any of the other [product fields](https://mast.stsci.edu/api/v0/_productsfields.html).

A quick note on filtering: filters on different fields are combined with **AND**, while multiple values supplied for the same field are combined with **OR**. For example, the filter below returns products that are of type "SCIENCE" **and** have a `productSubGroupDescription` of either "LC" (light curves) **or** "TP" (target pixel files).
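The AND/OR rule can be sketched in a few lines of plain Python. This is not astroquery's implementation: `filter_rows` and the product rows are hypothetical, and a single value is treated as a one-element list.

```python
def filter_rows(rows, **filters):
    """Keep rows that satisfy every filter (AND); a list of values means any may match (OR)."""
    def matches(row):
        for field, allowed in filters.items():
            values = allowed if isinstance(allowed, list) else [allowed]
            if row[field] not in values:  # OR within one field
                return False              # AND across fields
        return True
    return [row for row in rows if matches(row)]

# Hypothetical product rows, for illustration only
products = [
    {"productFilename": "a_lc.fits", "productType": "SCIENCE", "productSubGroupDescription": "LC"},
    {"productFilename": "a_tp.fits", "productType": "SCIENCE", "productSubGroupDescription": "TP"},
    {"productFilename": "a_other.fits", "productType": "SCIENCE", "productSubGroupDescription": "OTHER"},
    {"productFilename": "a_aux.fits", "productType": "AUXILIARY", "productSubGroupDescription": "LC"},
]

kept = filter_rows(products, productType="SCIENCE", productSubGroupDescription=["LC", "TP"])
print([p["productFilename"] for p in kept])  # ['a_lc.fits', 'a_tp.fits']
```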

In [None]:
science_products = Observations.filter_products(data_products,
                                                productType='SCIENCE',
                                                productSubGroupDescription=[___, ___])
science_products

### Step 3. Access Data on the Cloud

TESS data is publicly available for free on [Amazon Web Services](https://registry.opendata.aws/collab/stsci/). Now that we have a table of filtered products, we can use the `Observations.get_cloud_uris` function to locate these product files in the S3 bucket.

In [None]:
Observations.get_cloud_uris(science_products)

The output is a list of S3 URIs: one for each product in the table that we passed into the function. We can now use these URIs to open the files and stream their data directly into system memory. No expensive downloading required!
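Each S3 URI names a bucket and an object key. The sketch below splits a URI with the standard library and builds a virtual-hosted-style HTTPS URL from it. The URI is hypothetical, and the bucket name `stpubdata` and the default `s3.amazonaws.com` endpoint are assumptions about where MAST's public TESS data lives; rely on the values `get_cloud_uris()` actually returns.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

def s3_to_https(uri):
    """Build a virtual-hosted-style HTTPS URL (assumes the default S3 endpoint)."""
    bucket, key = split_s3_uri(uri)
    return f"https://{bucket}.s3.amazonaws.com/{key}"

# Hypothetical URI for illustration
uri = "s3://stpubdata/tess/public/example/example_lc.fits"
print(split_s3_uri(uri))  # ('stpubdata', 'tess/public/example/example_lc.fits')
print(s3_to_https(uri))   # https://stpubdata.s3.amazonaws.com/tess/public/example/example_lc.fits
```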

We will explore some different ways to access cloud data in the afternoon session.

### Streamlined Query

In this tutorial, we walked through a 3-step workflow to query MAST observations and locate data products on the cloud. You can streamline this process by providing query criteria and product filters directly to the `get_cloud_uris()` function! Query criteria are supplied as keyword arguments, and filters are supplied through the `filter_products` parameter.

Below is the streamlined version of the walkthrough example:

In [None]:
Observations.get_cloud_uris(target_name=375422201,
                            obs_collection='TESS',
                            dataproduct_type='timeseries',
                            sequence_number=[15, 16, 17],
                            filter_products={'productType': 'SCIENCE',
                                             'productSubGroupDescription': ['LC', 'TP']})

## Exercise

To unlock your clue for this challenge, you'll need to use the workflow described in this notebook to query for observations, then fetch and filter their product files.

First, write a query that satisfies the following criteria:
- Cone search around M11 with a radius of 0.2 degrees.
- Image observations.
- Wavelength region is "OPTICAL" or "INFRARED".
- Observation description (title) contains the word "survey".
- Observation exposure time is between 785 and 794 seconds.

Since this is a more complicated query, it may take up to a minute to return results.

In [None]:
# Write your criteria query
obs = Observations.query_criteria(objectname=___,
                                  radius=___,
                                  dataproduct_type=___,
                                  wavelength_region=[___, ___],
                                  obs_title=___,
                                  t_exptime=[___, ___])
obs

Next, you'll need to get the product files for your query results and filter them to only include the following:
- FITS files.
- Publicly available files (HINT: `dataRights` field).

In [None]:
# Get data products for your observations
prods = Observations.___(___)
prods

In [None]:
# Filter data products to get only FITS files with public access ("PUBLIC")
filtered = Observations.filter_products(prods, 
                                        extension=___, 
                                        dataRights=___)
filtered

Excellent job, agent! **Your clues are hidden in the second and final columns of your filtered product results. You should collect three different letters.**

If you finish this challenge early, feel free to experiment a bit more with the `astroquery.mast` module and your queries! See if you can locate any interesting datasets or data that would be relevant to your research. You can also visit our [MAST Notebooks GitHub page](https://spacetelescope.github.io/mast_notebooks/notebooks/multi_mission/astroquery.html) for more examples and inspiration.

## Additional Resources

- [`astroquery.mast` Documentation](https://astroquery.readthedocs.io/en/latest/mast/mast.html)
- [MAST Portal API](https://mast.stsci.edu/api/v0/)
- [MAST Portal UI](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html)

## About this Notebook:
If you have comments or questions on this notebook, please open a [GitHub issue on tike_content](https://github.com/spacetelescope/tike_content/issues/new) or contact us through the [Archive Help Desk e-mail](mailto:archive@stsci.edu).

**Author:** Sam Bianco

**Last Updated:** September 2025

---

[Top of Page](#top)

<img style=float:right; src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>