<a id="top"></a>
# Astroquery: Exploring Metadata from the Nancy Grace Roman Space Telescope
***
## Learning Goals

By the end of this tutorial, you will:

- Understand how to use the `astroquery.mast` module to access metadata from the Roman Space Telescope.
- Run metadata queries based on coordinates, an object name, or non-positional criteria.
- Use optional search parameters to further refine query results.

## Table of Contents
* [Introduction](#Introduction)
* [Imports](#Imports)
* [Querying MAST for Roman Metadata](#Querying-MAST-for-Roman-Metadata)
    * [Setup](#Setup)
    * [Optional Search Parameters](#Optional-Search-Parameters)
    * [Query by Criteria](#Query-by-Criteria)
    * [Query by Region](#Query-by-Region)
    * [Query by Object Name](#Query-by-Object-Name)
    * [Roman Product Downloads](#Downloading-and-Opening-Roman-Data-Products)
* [Additional Resources](#Additional-Resources)

## Introduction

Welcome! This tutorial focuses on using the `astroquery.mast` module to search for metadata from the [Nancy Grace Roman Space Telescope](https://roman.gsfc.nasa.gov/). Roman is an advanced survey telescope designed to observe at infrared wavelengths.

The [Mikulski Archive for Space Telescopes (MAST)](https://archive.stsci.edu/) hosts publicly accessible data products from space telescopes like Roman. `astroquery.mast` provides access to a broad set of Roman metadata, including header keywords, proposal information, and observational parameters. The available metadata can also be found using the [MAST Roman Search](https://mast.stsci.edu/search/ui/#/roman) interface.

<div class="alert alert-info">
Please note that pre-launch, <b><code>astroquery.mast.MastMissions</code> and the MAST Roman Search API require authorization to search and download Roman data products.</b> Before we get started, please ensure that:
    
- ***you are authorized to search and download Roman engineering data from MAST.*** If you are not authorized but you think you should be, email the helpdesk at archive@stsci.edu
- ***you have a [MAST token](https://auth.mast.stsci.edu/token) set to the environment variable `MAST_API_TOKEN`***
</div>

<div class="alert alert-warning" style="color:black; background-color:#ffc5c5; border-color:red;">
<b>Note</b> that at this time, Roman data are not accessible from the cloud with <code>astroquery.mast</code>. Downloads will come from MAST servers and may be large. Download with caution.
</div>

## Imports

This notebook uses the following packages:
- `os` to get the `MAST_API_TOKEN` from the environment variables
- `astroquery.mast` to query the MAST Archive
- `astropy.coordinates` for astronomical coordinates
- `roman_datamodels` to read and interact with Roman data files
- `matplotlib` to visualize Roman WFI image data

In [None]:
import os
from astroquery.mast import MastMissions
from astropy.coordinates import SkyCoord

import roman_datamodels as rdm
import matplotlib.pyplot as plt

***

## Querying MAST for Roman Metadata

### Setup

To query Roman metadata, we first need to perform some setup. We will instantiate an object of the `MastMissions` class, set its `mission` to `'roman'`, and log in with our MAST token. Its `service` is left at the default of `'search'`.

In [None]:
# Create MastMissions object and assign mission to 'roman'
missions = MastMissions(mission='roman')

# Login to search and retrieve Roman data
missions.login(token=os.getenv("MAST_API_TOKEN"))
               
print(f'Mission: {missions.mission}')
print(f'Service: {missions.service}')
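Because `os.getenv` returns `None` when a variable is unset, it can help to check for the token before logging in. The sketch below is one possible guard; `require_mast_token` is a hypothetical helper name and not part of `astroquery`.

```python
import os


def require_mast_token(env=None, var_name="MAST_API_TOKEN"):
    """Return the token stored in `var_name`, or raise a descriptive error."""
    # Hypothetical helper: falls back to the real environment by default
    env = os.environ if env is None else env
    token = env.get(var_name)
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; create a token at "
            "https://auth.mast.stsci.edu/token and export it first."
        )
    return token


# Demonstration with a stand-in mapping instead of the real environment
demo_token = require_mast_token({"MAST_API_TOKEN": "example-token"})
print(f"Token found ({len(demo_token)} characters)")
```

With a guard like this, the login call in this notebook could be written as `missions.login(token=require_mast_token())`.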

When writing queries, keyword arguments can be used to specify output characteristics (see the following section) and filter on values like instrument, exposure type, and proposal ID. The available column names for a mission are returned by the `get_column_list` function. Below, we will print out the name, data type, and description for the first 10 columns in Roman metadata.

In [None]:
# Get available columns for Roman mission
columns = missions.get_column_list()
columns[:10]

### Optional Search Parameters

Before we dive into the actual queries, it's important to know how we can refine our results with optional keyword arguments. The following parameters are available:

- `limit`: The maximum number of results to return. Default is 5000.
- `offset`: Skip the first ***n*** results. Useful for paging through results.
- `select_cols`: A list of columns to be returned in the response.
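To illustrate how `limit` and `offset` work together for paging, here is a minimal sketch that collects results page by page. It runs against a local stand-in rather than MAST; `fetch_all`, `fetch_page`, and `fake_query` are hypothetical names introduced only for this example.

```python
def fetch_all(fetch_page, page_size=5000, max_pages=100):
    """Collect rows page by page until a short or empty page is returned.

    `fetch_page(limit, offset)` is a hypothetical callable that returns a
    sequence of at most `limit` rows, starting `offset` rows into the results.
    """
    rows = []
    for page in range(max_pages):
        batch = fetch_page(limit=page_size, offset=page * page_size)
        rows.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is nothing left to fetch
    return rows


# Stand-in for a remote query: 12 fake rows served from a local list
fake_rows = [f"row{i:02d}" for i in range(12)]


def fake_query(limit, offset):
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_query, page_size=5)
print(len(all_rows))
```

With a logged-in `missions` object, `fetch_page` could plausibly be a small function that forwards `limit` and `offset` to `missions.query_criteria`; in that case `rows` would hold the individual table rows.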

### Query by Criteria

In some cases, we may want to run queries with non-positional parameters. To accomplish this, we use the `query_criteria` function.

For any of our query functions, we can filter our results by the value of columns in the dataset.

Let's say that we only want observations from the test Galactic Bulge Time Domain Survey (GBTDS), which was executed during Mission Readiness Test (MRT) 8. The test GBTDS run corresponds to program 163.

In [None]:
# Query with column criteria
results = missions.query_criteria(
    program=163,
    select_cols=[
        'fileSetName', 'detector', 'productLevel', 
        'product_type', 'exposure_type', 'instrument_name', 
        'optical_element', 'exposure_time', 'program',
    ],
)

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

To exclude a certain value from the results, we can prepend the value with `!`.

Let's run the same query as above, but this time, we will filter out datasets coming from the WFI02 detector.

In [None]:
# Query with exclude criteria
results = missions.query_criteria(
    program=163,
    detector='!WFI02',
    select_cols=[
        'fileSetName', 'detector', 'productLevel', 
        'product_type', 'exposure_type', 'instrument_name', 
        'optical_element', 'exposure_time', 'program',
    ],
)

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

For columns with numeric or date values, we can filter using comparison operators:

- `<`: Return values less than or before the given number/date
- `>`: Return values greater than or after the given number/date
- `<=`: Return values less than or equal to the given number/date
- `>=`: Return values greater than or equal to the given number/date
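Since these operators are simply prefixes on a string value, criteria can be assembled programmatically. The following is a minimal sketch; `comparison` is a hypothetical helper, and the date layout and spacing mirror the strings used in this notebook rather than a documented requirement.

```python
from datetime import datetime

VALID_OPERATORS = ("<", ">", "<=", ">=")


def comparison(operator, value):
    """Build a criteria string such as '> 2025-09-12 12:00:00' or '>=100'."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    if isinstance(value, datetime):
        # Assumed layout: 'YYYY-MM-DD HH:MM:SS', as in this notebook's date filter
        return f"{operator} {value.strftime('%Y-%m-%d %H:%M:%S')}"
    return f"{operator}{value}"


start_after = comparison(">", datetime(2025, 9, 12, 12, 0, 0))
min_exposure = comparison(">=", 100)
print(start_after)
print(min_exposure)
```

The resulting strings can be passed as keyword values, for example `exposure_start_time=start_after`.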

As an example, let's write a query to return all Program 163 datasets with an exposure start time after 12 PM UTC on September 12, 2025.

In [None]:
# Query using comparison operator
results = missions.query_criteria(
    program=163,
    exposure_start_time='> 2025-09-12 12:00:00',
    select_cols=[
        'fileSetName', 'detector', 'productLevel', 
        'product_type', 'optical_element', 'exposure_start_time',
    ],
)

# Display the first 5 results
print(f'Total number of results: {len(results)}')
results[:5]

For numeric or date data types, we can also filter with ranges. This requires the following syntax: `'#..#'`.
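As a small illustration of the range syntax, the sketch below builds a range string and gathers it with another filter into a dictionary of criteria. `value_range` is a hypothetical helper, and only numeric bounds are handled here.

```python
def value_range(low, high):
    """Build a range criteria string in the '#..#' form, e.g. '161..164'."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{low}..{high}"


# Criteria collected in a dict so they can be unpacked as keyword arguments
criteria = {
    "program": value_range(161, 164),
    "exposure_time": ">=100",
}
print(criteria)
```

A dictionary like this could then be unpacked into a query with `missions.query_criteria(**criteria)`.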

Let's write a query that uses range syntax to return datasets that belong to a program number between 161 and 164. We will also select for exposure durations that are greater than or equal to 100 seconds.

In [None]:
# Query using range operator
results = missions.query_criteria(program='161..164',  # Program number between 161 and 164
                                  exposure_time='>=100',  # Exposure duration is greater than or equal to 100 seconds
                                  select_cols=['fileSetName', 'program', 'exposure_time'])

# Display results
print(f'Total number of results: {len(results)}')
results

### Query by Region

The `missions` object also allows us to query by a region in the sky. By passing in a set of coordinates to the `query_region` function, we can return datasets that fall within a certain `radius` value of that point. This type of search is also known as a cone search.

In [None]:
# Create coordinate object
coords = SkyCoord(0.72104898, -0.02830701, unit='deg')

# Query for results within 1 arcminute of coords
results = missions.query_region(coords, radius=1)

# Display results
print(f'Total number of results: {len(results)}')
results[:5]

369 Roman datasets fall within our cone search. In other words, their detector footprints fall within 1 arcminute of the coordinate that we defined.
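To make the cone geometry concrete, the sketch below computes the angular separation between two sky positions with the haversine formula and tests whether a point lies inside a cone. This is a simplified point-to-point check in pure Python with hypothetical helper names; the archive matches on detector footprints, and in practice `SkyCoord.separation` from `astropy` would be the more natural tool.

```python
import math


def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcminutes between two (RA, Dec) points in degrees."""
    ra1, dec1, ra2, dec2 = (math.radians(x) for x in (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 60


def in_cone(center, point, radius_arcmin):
    """Return True if `point` lies within `radius_arcmin` of `center`."""
    return angular_separation_arcmin(*center, *point) <= radius_arcmin


center = (0.72104898, -0.02830701)              # cone center used in this notebook
nearby = (0.72104898, -0.02830701 + 0.5 / 60)   # hypothetical point 0.5 arcmin north
distant = (0.72104898, -0.02830701 + 2.0 / 60)  # hypothetical point 2 arcmin north

print(in_cone(center, nearby, radius_arcmin=1))
print(in_cone(center, distant, radius_arcmin=1))
```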

### Query by Object Name

While not yet useful for Roman, the `MastMissions` class also allows you to query by object name, e.g., "M31". The example below is shown for illustration only and is not executed in this notebook.

```python
# query a radius of 1 arcminute around the position of M31
result = missions.query_object(
    "M31",
    radius=1,
    select_cols=[
        'fileSetName', 'detector', 'productLevel', 
        'product_type', 'exposure_type', 'instrument_name', 
        'optical_element', 'exposure_time', 'program',
    ])
```

For more information on the kinds of searches you can do with the `MastMissions` class, see the [`astroquery.mast` documentation for Mission-Specific Searches](https://astroquery.readthedocs.io/en/latest/mast/mast_missions.html#mission-specific-search-queries).

### Downloading and Opening Roman Data Products

We can also use the `MastMissions` class to download data products. Note that during science operations, downloads are discouraged in favor of running notebooks on the Roman Research Nexus and streaming data directly from the cloud. For now we will demonstrate data downloads, but:

<div class="alert alert-warning" style="color:black; background-color:#ffc5c5; border-color:red;">
Roman files can be large, ranging from hundreds of MB to GB in size. <b>Proceed with caution.</b>
</div>

First, we will reuse a query from earlier, limiting it to the first 5 results under program 163 (the test GBTDS).

In [None]:
# Query with column criteria
results = missions.query_criteria(
    program=163,
    select_cols=[
        'fileSetName', 'detector', 'productLevel', 
        'product_type', 'exposure_type', 'instrument_name', 
        'optical_element', 'exposure_time', 'program',
    ],
    limit=5,
)

results

Each row in a Roman `MastMissions` query result corresponds to a *dataset*, which is a set of files associated with a single Roman visit. Each dataset is expected to have several files across different Roman calibration levels.

Let's query the data products associated with the first result of our query.

In [None]:
# List the data products (files) that belong to the first dataset
products = missions.get_product_list(results[0])
products

This one dataset corresponds to 8 files, including:
- one Level 1 uncalibrated single-detector WFI image in ASDF format
- four Level 2 files:
  - a calibrated single-detector WFI image in ASDF format
  - the world coordinate system (WCS) information in ASDF format
  - preview and thumbnail images in PNG format
- two Level 4 files:
  - a source catalog, derived from the WFI image, in parquet format
  - a segmentation map in ASDF format
- one calibration reference file in ASDF format

As an example, let's take a look at the Level 2 calibrated image, which has the suffix `_cal`. We can filter the data products using `MastMissions.filter_products`.

In [None]:
# Keep only the products whose file suffix is "_cal"
filtered_products = missions.filter_products(products, file_suffix="_cal")
filtered_products

This table now contains the single file that we want. The file is 260 MB. We can download it using `MastMissions.download_products`, but note that **this will eventually be replaced with streaming directly from cloud storage.**

In [None]:
files = missions.download_products(filtered_products)
files
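Before downloading a larger product table, it may be worth checking the combined size first. The sketch below shows one way to do that with plain byte counts; the helper names and the 1000 MB default are hypothetical, and the name of the size column in the product table is not assumed here (inspecting `filtered_products.colnames` should reveal it).

```python
def total_size_mb(sizes_in_bytes):
    """Sum file sizes given in bytes and return the total in megabytes."""
    return sum(sizes_in_bytes) / 1e6


def check_download_size(sizes_in_bytes, max_mb=1000):
    """Raise if the combined size exceeds `max_mb`; otherwise return the total."""
    total = total_size_mb(sizes_in_bytes)
    if total > max_mb:
        raise RuntimeError(f"Download would be {total:.0f} MB (limit {max_mb} MB)")
    return total


# Illustrative number only: a single 260 MB file, like the one in this notebook
print(check_download_size([260_000_000]))
```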

We can now use `roman_datamodels` to open the file and show its structure.

In [None]:
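# Path of the downloaded file on disk
filepath = files[0]["Local Path"]

# Open the file as a Roman data model and print its structure
imfile = rdm.open(filepath)
imfile.info()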

Finally, we can plot the WFI image data using `matplotlib`.

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))
sc = ax.imshow(imfile.data, origin='lower', vmin=0.5, vmax=1)
ax.set_xlabel('X Axis (pixels)')
ax.set_ylabel('Y Axis (pixels)')
ax.set_title(os.path.basename(filepath))
plt.colorbar(sc, ax=ax)
plt.tight_layout()

MRT8 was performed on the live observatory during Thermal Vacuum testing, so this is a REAL Roman image! This also means that the image does not contain on-sky data, so we don't expect to see any astronomical objects.

Note that the `jdaviz` package has much fancier data visualization utilities that are designed to work with Roman and other STScI-hosted missions! See [STScI Roman Notebooks](https://github.com/spacetelescope/roman_notebooks/tree/main/notebooks/data_visualization) for examples.

## Additional Resources

- [MAST Roman Search Form](https://mast.stsci.edu/search/ui/#/Roman)
- [MAST Roman Search API](https://mast.stsci.edu/search/docs/?urls.primaryName=roman_api)
- [`astroquery.mast` Documentation for Mission-Specific Searches](https://astroquery.readthedocs.io/en/latest/mast/mast_missions.html#mission-specific-search-queries)

## Citations

If you use `astroquery` for published research, please cite the
authors. Follow this link for more information about citing `astroquery`:

* [Citing `astroquery`](https://github.com/astropy/astroquery/blob/main/astroquery/CITATION)

## About this Notebook

**Authors:** Zach Claytor and Sedona Price, adapted from [JWST MAST Metadata Search](https://spacetelescope.github.io/mast_notebooks/notebooks/JWST/MAST_metadata_search/MAST_metadata_search.html) by Sam Bianco <br>
**Keywords:** Roman, Astroquery, MastMissions <br>

***
[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/style-guides/master/guides/images/stsci-logo.png" alt="Space Telescope Logo" width="200px"/> 