<a id="top"></a>
# How do I work with data in the cloud?
***
This notebook answers some "first-time" questions about working with cloud data. We'll then walk through a basic example of cloud access syntax that you can copy for your own use.

By the end of this tutorial, you will be able to:

- Describe the basic workflow for accessing data in the cloud
- Apply this cloud workflow to your own data queries

## Notebook TOC
- [Introduction](#Introduction)
- [Imports and Setup](#Imports-and-Setup)
- [A Quick Query](#A-Quick-Query)

## Introduction

### What is "the cloud"? 

In this case, "the cloud" means the Amazon Web Services (AWS) US East data centers in northern Virginia. By storing a cloud copy of MAST data here, we're able to offer our data in a new, highly accessible, highly available format. Cloud-hosted data also lets users interact with our data in new ways, as we'll see in the example below.

### What datasets are available?

The [MAST Archive](https://archive.stsci.edu/) offers a cloud copy of several mission datasets, including data from TESS, HST, GALEX, and more. They are generally cataloged in full on the [MAST Public Datasets](https://registry.opendata.aws/collab/stsci/) page, with a more condensed listing available on the [Public AWS Data](https://outerspace.stsci.edu/display/MASTDOCS/Public+AWS+Data) page.

### How can I access cloud-hosted data?

There are two approaches to accessing cloud-hosted data:
1. While on TIKE, loading files directly into memory (recommended)
2. A traditional download to your local machine from the cloud-hosted copy of MAST

Whenever possible, it's best to use the first method. The vast majority of users, with small tweaks to existing code, should be able to access data this way.

## Imports and Setup

We'll use the standard tools to open and plot a FITS file:
- `astropy.io.fits` to read in the FITS file
- `matplotlib` to create the plot
- `numpy` to automatically set brightness limits in the plot

To access the cloud data, we need:
- `astroquery.mast` to search for and select data
- `s3fs` to access cloud files as though they were local

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import s3fs

from astropy.io import fits
from astroquery.mast import Observations

The most important step in this process is to enable cloud data access. Once we do, we can:
- Get cloud URIs
- Download from the cloud

In [None]:
Observations.enable_cloud_dataset()

## A Quick Query

Now we can begin our query. This is not a particularly interesting query, but it makes for a nice, quick example. We'll look for a particular HST observation, then keep only the minimum recommended science files from that observation.

In [None]:
# You likely wouldn't search on obs_id, but it makes this example reproducible
obs = Observations.query_criteria(obs_id="ibxl50020")

# Get the products, then filter to keep science and MRP (minimum recommended products)
prod = Observations.get_product_list(obs)
filtered = Observations.filter_products(prod, mrp_only=True, productType='SCIENCE')

filtered

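The filtered table now contains only the minimum recommended science products for this observation. Each row is a file we can request from the cloud.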

Now let's get a cloud URI. `get_cloud_uris` returns one URI per product, so we take the first one.

In [None]:
c_uri = Observations.get_cloud_uris(filtered)[0]
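
A cloud URI is an ordinary `s3://` address made up of a bucket name and an object key. The sketch below splits one apart using only the standard library. The URI shown is a hypothetical example of the general shape, not output captured from this query, and `split_s3_uri` is a helper name invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URI for illustration; the real value comes from get_cloud_uris()
example_uri = "s3://stpubdata/hst/public/ibxl/ibxl50020/ibxl50020_drz.fits"


def split_s3_uri(uri):
    """Return (bucket, key) for an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # The bucket is the "host" part; the key is the path without its leading slash
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(example_uri)
print(bucket)
print(key)
```

Knowing the bucket and key can be handy if you later want to use other S3 tools on the same file.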

## Loading files directly into memory
To read cloud files from Python, we first set up an S3 filesystem object. Note that the connection must be anonymous (`anon=True`) to access the data.

In [None]:
fs = s3fs.S3FileSystem(anon=True)

We use the `with` syntax to avoid timeouts on the S3 file: the file is opened, read, and closed within a single block.

In [None]:
with fs.open(c_uri, 'rb') as f:
    with fits.open(f, 'readonly') as ff:
        ff.info()
        sci = ff[1].data
        plt.imshow(sci, cmap='gray', norm='log', vmin=np.nanpercentile(sci, 1), vmax=np.nanpercentile(sci, 99))
        plt.colorbar()

We just plotted the data without downloading anything! The file was read directly from the AWS filesystem.
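
The plot sets its brightness limits from the 1st and 99th percentiles of the image. The sketch below shows why that helps, using a synthetic array as a stand-in for real data (the array, the outlier value, and the variable names are assumptions made for illustration). A single very bright pixel barely moves the percentile limits, and `np.nanpercentile` ignores missing pixels.

```python
import numpy as np

# Synthetic stand-in for an image: faint background, one bright pixel, one NaN
rng = np.random.default_rng(0)
image = rng.uniform(1.0, 2.0, size=(50, 50))
image[10, 10] = 1.0e4   # bright outlier, such as a cosmic ray hit
image[20, 20] = np.nan  # missing pixel

# Percentile-based limits skip NaNs and are not dragged up by the outlier
vmin = np.nanpercentile(image, 1)
vmax = np.nanpercentile(image, 99)

print("percentile limits:", vmin, vmax)
print("raw maximum:", np.nanmax(image))
```

Scaling to the raw minimum and maximum would instead stretch the color range to cover the outlier and wash out the background.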

## Downloading directly from the cloud
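
If you do need a local copy, one option is to copy the file out of the bucket with `s3fs`. This is a minimal sketch rather than a recommended workflow: `local_path_for` and `download_from_cloud` are hypothetical helper names, and the download step assumes network access and an anonymous S3 connection like the one used for in-memory access.

```python
from pathlib import Path, PurePosixPath


def local_path_for(uri, download_dir="."):
    """Pick a local filename for a cloud URI: the last component of its key."""
    return Path(download_dir) / PurePosixPath(uri).name


def download_from_cloud(uri, download_dir="."):
    """Copy one cloud-hosted file to local disk and return its local path."""
    import s3fs  # imported here so local_path_for works without s3fs installed

    fs = s3fs.S3FileSystem(anon=True)  # anonymous access, as above
    target = local_path_for(uri, download_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    fs.get(uri, str(target))  # copy the remote object to the local path
    return target


# Usage (requires network access):
# download_from_cloud(c_uri, download_dir="data")
```

With cloud access enabled, `Observations.download_products(filtered)` should offer the astroquery route to the same result.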

## Other methods and notes/caveats

### AWS command-line tool
### Integrated methods
For example, `lightkurve` will download files, which can fill up your storage on TIKE.
status of astrocut

***

## About this Notebook

If you have questions or suggestions, open a pull request or issue, or send an email.


**Author:** Thomas Dutkiewicz <br>
**Keyword(s):** TIKE, AWS, Cloud <br>
**Last Updated:** Dec 2023 <br>
***
[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 