# **MAST Data Bulk Download through AWS**
This notebook aims to make bulk data downloads from MAST targeted and seamless for astronomers and scientists, mission by mission.

# Learning Goals
By using this notebook, an astronomer/scientist will:
* Understand that downloading data and files in bulk from AWS is feasible.
* Make targeted queries to MAST using parameters such as: `right ascension`, `declination`, `observation` and more.
* Filter the resulting products by using parameters such as: `productType`, `productSubGroupDescription`, `productGroupDescription`, `mrp_only`, and more.
* Use this notebook to programmatically download *.fits* files locally, which may be much easier than the equivalent UI web tool.

# Table of Contents
* Introduction
* Imports
* Two Core Functions from astroquery: `query_criteria()` and `filter_products()`
* The 3-Step Data Download Process

# Introduction
This notebook contains sample code to bulk download files from MAST, with examples provided for `GALEX` and `Pan-STARRS (PS1)`. It can be generalized to query data from other missions too, such as `SWIFT`, `HST`, or `IUE`. Please feel free to modify the code for your particular use case! If you have any questions, please don't hesitate to reach out to archive@stsci.edu.

Other links that may be useful:
- [MAST Homepage](https://archive.stsci.edu/)
- [MAST Notebook Repository](https://spacetelescope.github.io/mast_notebooks/intro.html)
- [GALEX Homepage](https://galex.stsci.edu/GR6/)
- [Pan-STARRS Homepage](https://archive.stsci.edu/panstarrs/)


# Using **Observations** and the [Common Archive Observation Model (CAOM)](https://mast.stsci.edu/vo-tap/api/v0.1/caom/)
* The `Observations` API from *astroquery.mast* can be used to query the Barbara A. Mikulski Archive for Space Telescopes (MAST).

In [4]:
from astroquery.mast import Observations

import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.mast import MastMissions

# Turning on access to the cloud dataset.
Observations.enable_cloud_dataset()

INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]


# Two Core Functions from astroquery: `query_criteria()` and `filter_products()`

`query_criteria()` and `filter_products()` are two methods of the `Observations` class in *astroquery.mast* that let us make queries and then filter the corresponding products.

All the parameters that we can use in `query_criteria()` are shown below.

In [3]:
Observations.get_metadata("observations").pprint(max_lines=-1, max_width=-1)

     Column Name             Column Label       Data Type   Units                                  Description                                                                               Examples/Valid Values                                                
--------------------- ------------------------- --------- ---------- ------------------------------------------------------------------------ --------------------------------------------------------------------------------------------------------------------
           intentType          Observation Type    string                                  Whether observation is for science or calibration.                                                                                   Valid values: science, calibration
       obs_collection                   Mission    string                                                                          Collection                                                                                  

All the fields that we can filter by in `filter_products()` are listed in the **[MAST API](https://masttest.stsci.edu/api/v0/_productsfields.html)** documentation.

# The 3-Step Data Download Process
* **STEP 1**: Get the products after making a specific query.
* **STEP 2**: Filter the products based on specific parameters.
* **STEP 3**: Download the files locally via Python.
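The three steps above can be chained into a single helper. This is only a minimal sketch: the function name `bulk_download` and its `max_files` cap are hypothetical, and the `observations` argument is assumed to be `astroquery.mast.Observations` (with `enable_cloud_dataset()` already called). It is passed in as an argument so the helper can be tried out without network access.

```python
def bulk_download(observations, criteria, filters, max_files=5, mrp_only=False):
    """Query, filter and download products in one call (hypothetical helper).

    `observations` is assumed to be `astroquery.mast.Observations`;
    `criteria` and `filters` are plain dicts of keyword arguments.
    """
    # STEP 1: query the archive, then list the products of the matches.
    obs = observations.query_criteria(**criteria)
    products = observations.get_product_list(obs)

    # STEP 2: keep only the products that match the filters.
    selected = observations.filter_products(products, mrp_only=mrp_only, **filters)

    # STEP 3: download a capped number of files, cloud copies only.
    subset = selected[:max_files]
    manifest = observations.download_products(subset, cloud_only=True)
    return subset, manifest
```

With the real class this could be called as `bulk_download(Observations, {"obs_collection": "GALEX", "s_ra": [30.2, 31.2], "s_dec": [-10.25, -9.25]}, {"productType": "SCIENCE"})`. Each step is explained on its own below.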

**STEP 1**: Query for observations with `query_criteria()`. In this example, we give two values each for the right ascension (`s_ra`) and the declination (`s_dec`), in degrees. Each pair is a [minimum, maximum] range, and together the two ranges form a box that limits the search area. We also supply the mission to search with `obs_collection`, such as 'GALEX' or 'PS1'.
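If you know the center of your field rather than its edges, the two ranges can be computed. The helper below is a minimal sketch: the name `sky_box` is hypothetical, and it assumes a plain range in coordinate degrees, with no cos(declination) correction and no handling of the right ascension wrap at 0/360 degrees.

```python
def sky_box(ra_center, dec_center, half_width_deg):
    """Return s_ra / s_dec ranges for a box centered on a position (degrees).

    Assumptions: a simple range in coordinate degrees, no cos(dec)
    correction, and no support for boxes that cross RA = 0/360.
    """
    if half_width_deg <= 0:
        raise ValueError("half_width_deg must be positive")
    ra_min = ra_center - half_width_deg
    ra_max = ra_center + half_width_deg
    if ra_min < 0 or ra_max > 360:
        raise ValueError("box crosses RA = 0/360; split it into two queries")
    # Clip the declination range at the poles.
    dec_min = max(dec_center - half_width_deg, -90.0)
    dec_max = min(dec_center + half_width_deg, 90.0)
    return {"s_ra": [ra_min, ra_max], "s_dec": [dec_min, dec_max]}


# A 1 x 1 degree box, matching the ranges used in this notebook.
box = sky_box(30.7, -9.75, 0.5)
print(box)
```

The resulting dictionary can be unpacked straight into the query, for example `Observations.query_criteria(obs_collection="GALEX", **box)`.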

If you would like to filter by other parameters, see the other filter parameters above. Please modify this code for your specific use case!

In [None]:
#    - Ex.: s_ra: 30.2,31.2
#           s_dec: -10.25,-9.25
#           obs_collection: GALEX, PS1, SWIFT, etc.

obs = Observations.query_criteria(s_ra=[30.2,31.2], s_dec=[-10.25,-9.25], obs_collection="GALEX")
prod = Observations.get_product_list(obs)
len(prod)


175841

**STEP 2**: Now we can use `filter_products()` to select specific products. The code below filters on *productType*, *productSubGroupDescription*, *productGroupDescription*, and *mrp_only*. The valid values of these parameters for GALEX and Pan-STARRS are listed below as examples. Other filter parameters are listed in the documentation linked above. Valid values differ between missions, so please check that documentation when you adapt the filters to your mission!

**GALEX Example**
* productType: *AUXILIARY*, *CATALOG*, *INFO*, *PREVIEW*, *SCIENCE*, *THUMBNAIL*
* productSubGroupDescription: *Catalog Only*, *Imaging Only*, *Spectra Only*, *Spectral Image Strips Only*, *Whole Field Images Only*
* productGroupDescription: *Minimum Recommended Products*
* mrp_only: *True*, *False*.

**Pan-STARRS (PS1) Example**
* productType: *AUXILIARY*, *CATALOG*, *INFO*, *SCIENCE*
* productSubGroupDescription: - 
* productGroupDescription: *Minimum Recommended Products*
* mrp_only: *True*, *False*

Note that *productSubGroupDescription* and *productGroupDescription* may not be needed when filtering for Pan-STARRS products. An example for 'GALEX' is provided below as well as an example for PS1. Please modify this code for your specific use case!
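When several filters are given at once, it helps to know how they combine. The astroquery documentation describes the behavior as AND between different filters and OR within one filter that is given several values. The sketch below reproduces that rule on plain dictionaries. It is an illustration only, not astroquery's own code, and the function name `matches` and the file names are made up.

```python
def matches(row, **filters):
    """True if `row` passes every filter (AND); a list of values means OR."""
    for column, allowed in filters.items():
        if isinstance(allowed, str):
            allowed = [allowed]  # a single value is a one-element list
        if row.get(column) not in allowed:
            return False
    return True


# Made-up product rows, using column names from the product table.
rows = [
    {"productFilename": "a.fits", "productType": "SCIENCE",
     "productGroupDescription": "Minimum Recommended Products"},
    {"productFilename": "b.jpg", "productType": "PREVIEW",
     "productGroupDescription": "Minimum Recommended Products"},
    {"productFilename": "c.fits", "productType": "AUXILIARY",
     "productGroupDescription": ""},
]

kept = [
    r["productFilename"]
    for r in rows
    if matches(
        r,
        productType=["SCIENCE", "PREVIEW"],  # SCIENCE or PREVIEW
        productGroupDescription="Minimum Recommended Products",  # and MRP
    )
]
print(kept)  # ['a.fits', 'b.jpg']
```

In the same spirit, passing `productType=["SCIENCE", "PREVIEW"]` to `filter_products()` should keep products of either type.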


In [None]:
#    - Ex. (GALEX): productType: SCIENCE
#           productSubGroupDescription: Imaging Only
#           productGroupDescription: Minimum Recommended Products
#           mrp_only: True

#    - Ex. (PS1): productType: <skip>
#                 productSubGroupDescription: <skip>
#                 productGroupDescription: <skip>
#                 mrp_only: True

# Use this for the 'GALEX' example.
filt_prod = Observations.filter_products(
    prod,
    productType="SCIENCE",
    productSubGroupDescription="Imaging Only",
    productGroupDescription="Minimum Recommended Products",
    mrp_only=True
)

# Shows how many files are left after applying the filter.
display(len(filt_prod))

# Shows the first 5 files from the filtered table.
display(filt_prod[0:5])

1785

obsID,obs_collection,dataproduct_type,obs_id,description,type,dataURI,productType,productGroupDescription,productSubGroupDescription,productDocumentationURL,project,prvversion,proposal_id,productFilename,size,parent_obsid,dataRights,calib_level,filters
str7,str3,str5,str43,str32,str1,str71,str9,str28,str1,str1,str3,str3,str1,str54,int64,str7,str10,int64,str1
1971976,PS1,image,rings.v3.skycell.1062.040.stk.g,stack data image,C,mast:PS1/product/rings.v3.skycell.1062.040.stk.g.unconv.fits,SCIENCE,Minimum Recommended Products,--,--,3PI,pv3,--,rings.v3.skycell.1062.040.stk.g.unconv.fits,66807360,1971976,PUBLIC,3,g
1971977,PS1,image,rings.v3.skycell.1062.040.stk.i,stack data image,C,mast:PS1/product/rings.v3.skycell.1062.040.stk.i.unconv.fits,SCIENCE,Minimum Recommended Products,--,--,3PI,pv3,--,rings.v3.skycell.1062.040.stk.i.unconv.fits,67435200,1971977,PUBLIC,3,i
1971978,PS1,image,rings.v3.skycell.1062.040.stk.r,stack data image,C,mast:PS1/product/rings.v3.skycell.1062.040.stk.r.unconv.fits,SCIENCE,Minimum Recommended Products,--,--,3PI,pv3,--,rings.v3.skycell.1062.040.stk.r.unconv.fits,67639680,1971978,PUBLIC,3,r
1971979,PS1,image,rings.v3.skycell.1062.040.stk.y,stack data image,C,mast:PS1/product/rings.v3.skycell.1062.040.stk.y.unconv.fits,SCIENCE,Minimum Recommended Products,--,--,3PI,pv3,--,rings.v3.skycell.1062.040.stk.y.unconv.fits,67884480,1971979,PUBLIC,3,y
1971980,PS1,image,rings.v3.skycell.1062.040.stk.z,stack data image,C,mast:PS1/product/rings.v3.skycell.1062.040.stk.z.unconv.fits,SCIENCE,Minimum Recommended Products,--,--,3PI,pv3,--,rings.v3.skycell.1062.040.stk.z.unconv.fits,67216320,1971980,PUBLIC,3,z


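**STEP 3**: Download the files to your local computer. The line below downloads only the first five files. With `cloud_only=True`, files are fetched from the cloud (AWS) copy, and products that are not available there are skipped. Please modify this code for your specific use case, especially if you need to download more than five files!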

In [None]:
Observations.download_products(filt_prod[0:5], cloud_only=True)
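Before downloading more than a handful of files, it can be worth checking the total size and splitting the download into batches. The helpers below are a minimal sketch: the names `batch_slices` and `total_bytes` are hypothetical, and the batch size of 50 in the closing comment is an arbitrary choice. The sizes are those of the five PS1 products in the Step 2 output table.

```python
def batch_slices(n_items, batch_size):
    """Yield slice objects that cover range(n_items) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        yield slice(start, min(start + batch_size, n_items))


def total_bytes(sizes):
    """Sum file sizes in bytes, e.g. the `size` column of a product table."""
    return sum(int(s) for s in sizes)


# Sizes (bytes) of the five PS1 products shown in the Step 2 output.
sizes = [66807360, 67435200, 67639680, 67884480, 67216320]
print(total_bytes(sizes) / 1e6, "MB")

# How 12 files would be split into batches of at most 5.
print(list(batch_slices(12, 5)))

# With the real table, the same idea would look like this (not run here):
# for s in batch_slices(len(filt_prod), 50):
#     Observations.download_products(filt_prod[s], cloud_only=True)
```

Downloading in batches means an interruption affects only the batch in progress rather than the whole request.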

# Using **MastMissions** and the [MAST Search API](https://mast.stsci.edu/search/docs/)
We can also query via `MastMissions`, which provides a mission-specific way to query MAST data.

In [None]:
# Modify this example from hst -> galex, ps1.

# First select the mission that you are interested in.
missions = MastMissions(mission='hst')
missions.mission

'hst'

# Query by Object Name (Target Search)

In [15]:
# Let's take a look at 'M16', which is the Eagle Nebula.
results = missions.query_object('M16')

display(len(results))

display(results[0:5])

265

search_pos,sci_data_set_name,sci_targname,sci_hapnum,sci_haspnum,sci_instrume,sci_aper_1234,sci_spec_1234,sci_actual_duration,sci_start_time,sci_pep_id,sci_pi_last_name,sci_ra,sci_dec,sci_refnum,sci_central_wavelength,sci_release_date,sci_stop_time,sci_preview_name,scp_scan_type,sci_hlsp,ang_sep
str15,str9,str16,int64,int64,str6,str10,str14,float64,str26,int64,str14,float64,float64,int64,float64,str26,str26,str9,str1,int64,str19
274.688 -13.792,N9CS020E0,NGC6611-NIC13,0,0,NICMOS,NIC2-FIX,F160W,127.83688,2006-05-18T10:06:48.817000,10533,OLIVEIRA,274.6806591846,-13.79206426926,5,16030.4,2007-05-18T18:56:50.210000,2006-05-18T10:13:50.777000,N9CS020E0,--,--,0.4277670361974681
274.688 -13.792,N9CS020D0,NGC6611-NIC13,0,0,NICMOS,NIC2-FIX,F110W,127.83688,2006-05-18T10:05:55.817000,10533,OLIVEIRA,274.6806591778,-13.79206427886,5,11234.7,2007-05-18T18:56:35.373000,2006-05-18T10:12:57.777000,N9CS020D0,--,--,0.4277674376002503
274.688 -13.792,N9CS02060,NGC6611-NIC13,0,0,NICMOS,NIC2-FIX,F160W,127.83688,2006-05-18T08:43:10.817000,10533,OLIVEIRA,274.6820655545,-13.79948205104,5,16030.4,2007-05-18T18:37:41.323000,2006-05-18T08:50:12.777000,N9CS02060,--,--,0.566662194223732
274.688 -13.792,N9CS02050,NGC6611-NIC13,0,0,NICMOS,NIC2-FIX,F110W,127.83688,2006-05-18T08:42:17.817000,10533,OLIVEIRA,274.6820628711,-13.79948025557,5,11234.7,2007-05-18T18:37:15.127000,2006-05-18T08:49:19.780000,N9CS02050,--,--,0.5666722973050055
274.688 -13.792,N9CSA2030,NGC6611-NIC13,0,0,NICMOS,NIC2-FIX,F110W,127.83688,2006-05-18T14:50:39.817000,10533,OLIVEIRA,274.6865879409,-13.80250798823,5,11234.7,2007-05-18T20:11:02.347000,2006-05-18T14:57:41.780000,N9CSA2030,--,--,0.6358254152213503


# Query by Region (Cone Search)

In [16]:
# Let's do a cone search centered on RA = 0, Dec = 0 (in degrees).
coords = SkyCoord(0, 0, unit='deg')

# All results within 5 arcseconds of the (0, 0) coordinate.
results = missions.query_region(coords, radius=5 * u.arcsec)

display(len(results))

display(results[0:5])

5000

search_pos,sci_data_set_name,sci_targname,sci_hapnum,sci_haspnum,sci_instrume,sci_aper_1234,sci_spec_1234,sci_actual_duration,sci_start_time,sci_pep_id,sci_pi_last_name,sci_ra,sci_dec,sci_refnum,sci_central_wavelength,sci_release_date,sci_stop_time,sci_preview_name,scp_scan_type,sci_hlsp,ang_sep
str3,str9,str11,int64,int64,str6,str15,str17,float64,str26,int64,str9,float64,float64,int64,float64,str26,str26,str9,str1,int64,str3
0 0,W1460G01T,53W045A,0,0,WFPC,W1,F555W,1400.0,1992-09-28T04:29:18.040000,3545,WINDHORST,0.0,0.0,4,5479.0,1993-09-28T13:54:30,1992-09-28T04:52:38.040000,W1460G01T,--,--,0.0
0 0,W1460G02T,53W045A,0,0,WFPC,W1,F555W,1400.0,1992-09-28T06:05:18.040000,3545,WINDHORST,0.0,0.0,4,5479.0,1993-09-28T14:06:07,1992-09-28T06:28:38.040000,W1460G02T,--,--,0.0
0 0,W1460G03T,53W045A,0,0,WFPC,W1,F555W,1400.0,1992-09-28T07:41:18.040000,3545,WINDHORST,0.0,0.0,4,5479.0,1993-09-28T14:14:37,1992-09-28T08:04:38.040000,W1460G03T,--,--,0.0
0 0,W1460G04T,53W045A,0,0,WFPC,W1,F555W,1400.0,1992-09-28T09:18:18.040000,3545,WINDHORST,0.0,0.0,4,5479.0,1993-09-28T14:23:42,1992-09-28T09:41:38.040000,W1460G04T,--,--,0.0
0 0,W1460H01T,53W045A,0,0,WFPC,W1,F785LP,1400.0,1992-09-28T12:31:18.040000,3545,WINDHORST,0.0,0.0,4,8958.0,1993-09-28T14:33:28,1992-09-28T12:54:38.040000,W1460H01T,--,--,0.0
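To make the cone-search radius concrete, the sketch below computes the angular separation between two sky positions and checks whether a position falls inside a cone. It illustrates the geometry only and is not necessarily how the MAST service evaluates the search. The function names are hypothetical. In practice, `SkyCoord.separation()` from astropy does the same job.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation, in arcseconds, of two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for very small angles.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


def in_cone(ra, dec, center_ra, center_dec, radius_arcsec):
    """True if (ra, dec) lies within radius_arcsec of the cone center."""
    return angular_separation_arcsec(ra, dec, center_ra, center_dec) <= radius_arcsec


# A position 0.01 degrees (36 arcseconds) from (0, 0) is outside a
# 5-arcsecond cone, while the center itself is inside.
print(in_cone(0.01, 0.0, 0.0, 0.0, 5))
print(in_cone(0.0, 0.0, 0.0, 0.0, 5))
```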


# About this Notebook

* **Authors**: Yingquan Li, Bernie Shao
* **Keywords**: GALEX, Pan-STARRS, Bulk Download, Python, AWS
* **Updated On**: 2024-11-08
* **References**: [Missions Mast Search (Sam Bianco)](https://github.com/spacetelescope/mast_notebooks/blob/main/notebooks/multi_mission/missions_mast_search/missions_mast_search.ipynb)

For support, please contact the Archive HelpDesk at archive@stsci.edu.