<a id=top></a>
# Lesson 0: Working on TIKE, with Cloud Data

## Learning Goals: 
- Learn about TIKE and working on a cloud platform.
- Define cloud terminology: what is a “bucket” or a server? For that matter, what is the “cloud”?
- Access data through astroquery by name, region, or criteria
- Download TESS data and show an image

<!-- ## Lesson Outline:
- Go over TIKE: what can it do? What are its limits (memory, processing power, etc.)?
- Cloud overview. What does it mean to work “on the cloud”? How is this different from working on a laptop? [Provide motivation: why are we doing this?]
- Demo cloud data access using astroquery. Meaningfully interact with data (maybe just read FITS headers) so people see they don’t need to perform a download.
 -->

## What is TIKE?

TIKE stands for the *Timeseries Integrated Knowledge Engine*.

TIKE uses a web-based platform, called JupyterHub, to allow you to run [Jupyter Notebooks](https://jupyterlab.readthedocs.io/en/latest/) and other software "on the cloud" using your web browser: no need to install anything on your local computer. TIKE has access to a cloud copy of the [MAST Archive](https://archive.stsci.edu), enabling researchers (or students!) to access and analyze data from NASA's [TESS mission](https://archive.stsci.edu/missions-and-data/tess). 

TIKE is continually maintained and updated by humans, so if you run into issues please let us know. Don't hesitate to send us your suggestions for packages and tutorials, either through the [MAST help desk](mailto:archive@stsci.edu) or [Project TIKEBook repository](https://github.com/spacetelescope/tike_content).

## What is the "cloud"?

The "cloud" describes a global network of remote servers that are linked together and operate as a single data ecosystem. It is not a single physical entity: the term refers to servers that are accessed over the internet, along with the software and databases that run on them.

In our case, TIKE is a cloud service that runs "in proximity to" MAST data. In practice, this means that the data is not transmitted over the internet, but rather within a data center. This leads to faster access since you don't need to perform a traditional download to move the data to your machine.

### Why would I want to work on the cloud?
Using the cloud has several benefits; principally, as mentioned above, there's no need to download data to your local machine. This saves time, and allows you to perform analyses that wouldn't be possible without a major upgrade to your hard drive capacity. You can access data whenever and wherever you want to, from any device, as long as you have an internet connection.

### What's the difference between working on the cloud and working on TIKE?
Although you might choose to work directly with data stored on the cloud, it can be complex to configure such a system. TIKE handles this complexity, making it as easy as opening a Jupyter Notebook.

## How do I access the cloud?
There are many options for accessing data on the cloud, including command line tools and various Python packages. For this lesson, we will be using a Python data query package called [Astroquery](https://astroquery.readthedocs.io/en/latest/). 

Astroquery is a Python package within the ecosystem of the [Astropy Project](https://www.astropy.org/), a community-driven effort built around a "core package" for doing astronomy with Python. Astroquery allows users to access a variety of astronomical data archives. We will use it to access TESS data on the cloud hosted by MAST.

## Astroquery demonstration

### Imports
The following cell holds the imported packages. These packages are necessary for running the rest of the cells in this notebook, and you can expect to use some of them almost every time you do astronomical research. A description of each import is as follows:

* `numpy` to handle array mathematics
* `pandas` to handle date conversions
* `fits` from astropy.io for accessing FITS files
* `Table` from astropy.table for creating tidy tables of the data
* `WCS` from astropy.wcs for storing World Coordinate Systems information 
* `SkyCoord` from astropy.coordinates for defining RA and Dec for targets
* `matplotlib.pyplot` for plotting data and images
* `Mast` and `Observations` from astroquery.mast for querying data and observations from the MAST archive

In [None]:
from astropy.io import fits
from astropy.table import Table
from astropy.wcs import WCS
from astropy.coordinates import SkyCoord
from astroquery.mast import Mast, Observations
from astropy.io import ascii
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

### Querying TESS data from MAST
We will be using the `Observations` class from the `astroquery.mast` subpackage. We will show you a few different ways to find the TESS data products for an object.
Accessing data can be done with the following functions:
- `query_object()`
- `query_region()`
- `query_criteria()`

We'll discuss some of their differences and similarities below. But first, we need to enable cloud data access. Fortunately, this is a one-line command.

In [None]:
Observations.enable_cloud_dataset()

### Choosing a target
First, we have to choose a star from which we'd like to get TESS data. It is best practice to reduce the size of the data you are accessing by selecting only data within a certain radius of the target; by default, this is 0.2 degrees.

Let's start with the star Fomalhaut, the brightest star in the southern constellation of Piscis Austrinus, the "Southern Fish", and one of the brightest stars in the night sky.

<img src="https://upload.wikimedia.org/wikipedia/commons/a/ae/Heic0821f.jpg" width="300">

It is located at right ascension 22h 57m 39.0465s, declination −29° 37′ 20.050″, with an apparent magnitude in the V-band of 1.16.

### query_region( )
This function allows us to choose a specific region of the sky and query all the data that TESS has inside that region, defined by some radius from a set of coordinates. So, we will use the coordinates for Fomalhaut with a 2 arcsecond radius around it. 

The query below takes coordinates in decimal degrees. For the right ascension, we know that there are 15° in each hour (24 hr a day to rotate 360° = 15° per hour), so let's convert everything to hours and then multiply by 15. We can do this by dividing our seconds by 3600, dividing our minutes by 60, leaving the hours alone, and adding them all up. For the declination, we can convert to degrees by dividing our arcseconds by 3600, dividing our arcminutes by 60, leaving the degrees alone, and then adding all three values together. The minus sign applies to the whole angle, so for a negative declination we add the three magnitudes first and then negate the sum. So, for Fomalhaut this would be:

$ RA = (22h + 57m/60 + 39.0465s/3600) \times 15 \approx 344.41269° $

$ DEC = -(29° + 37'/60 + 20.050"/3600) \approx -29.62224° $
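
The arithmetic above can be sketched in plain Python. This is a minimal illustration rather than part of the original lesson: the helper names `hms_to_degrees` and `dms_to_degrees` are made up for this example, and the sign of the declination is passed separately so that it applies to the whole angle.

```python
def hms_to_degrees(hours, minutes, seconds):
    """Convert a right ascension in hours/minutes/seconds to decimal degrees."""
    # 24 hours of right ascension span 360 degrees, so one hour is 15 degrees
    return (hours + minutes / 60 + seconds / 3600) * 15


def dms_to_degrees(sign, degrees, arcminutes, arcseconds):
    """Convert a declination in degrees/arcmin/arcsec to decimal degrees.

    `sign` is +1 or -1 and applies to the whole angle, not only to the degrees.
    """
    return sign * (degrees + arcminutes / 60 + arcseconds / 3600)


# Fomalhaut, using the sexagesimal values quoted in this lesson
ra_deg = hms_to_degrees(22, 57, 39.0465)
dec_deg = dms_to_degrees(-1, 29, 37, 20.050)

print(f"RA  = {ra_deg:.5f} deg")
print(f"Dec = {dec_deg:.5f} deg")
```

Keeping the sign separate avoids the common mistake of adding positive arcminutes and arcseconds to a negative number of degrees, which would shift the declination toward zero.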

#### Packages for unit conversion
With all that being said, there are robust packages out there for unit conversions. `astropy.units` can attach units directly to a data object and provides simple conversion options. We will provide an example of this down the line.

In [None]:
coordinates = SkyCoord(344.41269272, -29.62223703, unit="deg")  # defaults to the ICRS frame

# Query everything MAST holds within 2 arcseconds of these coordinates
observations = Observations.query_region(coordinates, radius="2 arcsec")

# Build a mask that selects only the TESS observations
# (add "& (observations['dataproduct_type'] == 'timeseries')" to keep only time series)
obs_wanted = observations['obs_collection'] == 'TESS'

# Let's take a peek at the data
print(observations[obs_wanted]['obs_collection', 'project', 'obs_id'])

### query_object()
This type of query is the "simplest", in the sense that we only need to give the name of our target. Let's try it.

In [None]:
# Query MAST for all observations within 1 arcsecond of Fomalhaut (every mission, not only TESS)
obs = Observations.query_object("Fomalhaut", radius="1 arcsec")

# Print out the observations we have queried, with limited columns
obs['obs_collection', 'project','obs_id']

We limit the columns in the `obs` output above for simplicity. Feel free to print the full table if you're curious.

Now let's limit the results to those from the TESS mission.

In [None]:
#Let's parse these observations for just those from TESS
want = (obs['obs_collection'] == "TESS")

# Print out the observations we have queried 
obs[want]['obs_collection', 'project','obs_id']

<div class="alert-success">
    <b>Using this query method, how many TESS observations are within 1 arcsecond of Trappist-1?</b>
</div>

ANS: 
<div class="alert-danger">
    <b></b>
</div>

 <div class="alert-success">
    <b>Try querying a larger region. Does TESS have more observations when the search radius is larger?</b>
</div>

ANS: 
<div class="alert-danger">
    No. TESS has only 4 observations of this region of the sky; even when the search radius around these coordinates is made larger, no other observations show up.
</div>

 <div class="alert-success">
    <b>In what scenarios would query_region be a better choice than query_object?</b>
</div>

ANS: 
 <div class="alert-danger">
    query_region lets you search by coordinates and a search radius, which is useful when you don't have an object name or would rather not use one. query_object is a wrapper around the same search that accepts an object name instead.
</div>

### query_criteria( )
This function allows us to give astroquery a larger number of parameters to refine the data to exactly what we want in just one line of code. 

In [None]:
# Optionally add dataproduct_type="timeseries" to restrict the results further
TESS_table = Observations.query_criteria(objectname="Fomalhaut", radius="2 arcsec", obs_collection="TESS")

# Let's print out some relevant columns of this table
columns = ["instrument_name","filters","target_name","obs_id","calib_level","t_exptime"]
TESS_table[columns].show_in_notebook(display_length=10)

<div class="alert-success">
    <b>How could you tell if the query produced no data?</b>
</div>

ANS: 
<div class="alert-danger">
    The returned table would have a length of 0.
</div>

<div class="alert-success">
    <b>Which query function is the most general? How can you tell?</b>
</div>

ANS: 
<div class="alert-danger">
    query_object is the most general. Unless you filter the results, it returns every observation
    that MAST holds for that target name, from any mission (not just TESS).
</div>

### Plot a Full Frame Image
We can see from our observation table that some of the data TESS has on Fomalhaut are Full Frame Images. These will be stored in FITS files which can be downloaded and viewed on TIKE. 

#### What is a FITS file?
Flexible Image Transport System (FITS) is a file format designed to store, transmit, and manipulate scientific images and associated data. It is the most widely used file type in astronomical research. A FITS file consists of one or more Header + Data Units (HDUs), where the first HDU is called the primary HDU, or primary array. The primary array may be empty or contain an N-dimensional array of pixels, such as a 1-D spectrum, a 2-D image, or a 3-D data cube.

For more info on FITS files, see the [NASA FITS Support Office](https://fits.gsfc.nasa.gov/). 
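
To make the HDU structure concrete, here is a small sketch that builds a two-HDU FITS file in memory and reads it back. It is an illustration under stated assumptions, not a real TESS product: the `TELESCOP` value, the extension name `IMAGE`, and the tiny 3 x 4 array are all invented for this example.

```python
import io

import numpy as np
from astropy.io import fits

# Primary HDU with a header but no data array
primary = fits.PrimaryHDU()
primary.header["TELESCOP"] = "DEMO"  # made-up value for illustration

# One image extension holding a small 2-D array
pixels = np.arange(12, dtype=np.float32).reshape(3, 4)
image_hdu = fits.ImageHDU(data=pixels, name="IMAGE")

# Write to an in-memory buffer instead of the disk, then rewind it
buffer = io.BytesIO()
fits.HDUList([primary, image_hdu]).writeto(buffer)
buffer.seek(0)

with fits.open(buffer) as hdulist:
    for index, hdu in enumerate(hdulist):
        # The primary HDU has no data, so its data attribute is None
        data_shape = None if hdu.data is None else hdu.data.shape
        print(index, hdu.name, type(hdu).__name__, data_shape)
    shape = hdulist["IMAGE"].data.shape
    telescope = hdulist[0].header["TELESCOP"]

print(shape, telescope)
```

Extensions can be addressed either by index (`hdulist[1]`) or by name (`hdulist["IMAGE"]`); this lesson uses the index form.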

In [None]:
# Query the observations from MAST to get a list of products for our selected observations
data_products = Observations.get_product_list(TESS_table)

# Keep only the image products
filtered = Observations.filter_products(data_products, dataproduct_type ='image')

#check out what this filter returned
filtered

Let's plot one of the calibrated full frame images of Fomalhaut from TESS. We can choose a calibrated FFI data product at random; how about obsID 60895527? Let's see how that one looks!

We need to filter our data again so that we don't have to download all 9760 data products. We just want the data for obsID 60895527 with a description of "Calibrated full frame image".

In [None]:
filtered2 = Observations.filter_products(data_products, dataproduct_type ='image', obsID = '60895527', description = 'Calibrated full frame image')

#Print out the data products we have filtered for
filtered2

Download the FITS file.

In [None]:
data = Observations.download_products(filtered2)
# Store the local path of the FITS file we downloaded
fits_file = data['Local Path'][0]

### Understanding the FITS FFI structure

TESS FFI FITS files contain a primary HDU with metadata stored in the header. The first extension HDU contains more metadata in the header, and stores the full frame image. The second extension HDU contains the uncertainty values for the image. Let's examine the structure of the FITS file using the `info` function from `astropy.io.fits`, which shows the FITS file format in more detail.

In [None]:
fits.info(fits_file)

<div class="alert-success">
    <b>What are the dimensions of the Full Frame Image?</b>
</div>

ANS:
<div class="alert-danger">
    <b>2136 pixels by 2078 pixels</b>
</div>

### Reading the WCS and Calibrated Image

Now that we have the file, let's store the World Coordinate System (WCS) information for later use. A WCS describes the geometric transformation between one set of coordinates and another (here, between image pixels and sky coordinates). Almost every FITS file you come across will have a WCS in its metadata. We can use the `WCS` class from `astropy.wcs` to store the information from the image extension HDU's header.

In [None]:
# open the file, extract the WCS and Image data, and then close the file
with fits.open(fits_file, mode = "readonly") as hdulist:
    wcs = WCS(hdulist[1].header)
    image = hdulist[1].data
    header = hdulist[1].header
    
#Take a peek at the header
header

### Display the Image

Finally, let's plot our full frame image of Fomalhaut. 

In [None]:
plt.figure(figsize=(12, 12))
# Optional: plt.subplot(projection=wcs) would draw RA/Dec axes from the WCS; it is left
# disabled here because it did not work with the author's matplotlib/astropy versions
plt.imshow(image, vmin=np.percentile(image, 4), vmax=np.percentile(image, 98), origin="lower", cmap='inferno')
plt.ylim(1100, 1900)
plt.xlim(200, 1000)
# Without the WCS projection, the axes are in pixel coordinates rather than RA/Dec
plt.xlabel('Pixel column')
plt.ylabel('Pixel row')
plt.show()

## Homework
1. Choose a star from this list:
    - Proxima Centauri
    - Procyon (* alf CMi)
    - Aldebaran (* alf Tau)
    - Polaris (* alf UMi)
    - Rigel (* bet Ori)
    
Next, query TESS observations for it, download a FITS file for one calibrated FFI, and then print out the FITS header. Answer the following questions using the code above as an example. HINT: you may need to find the coordinates for your chosen star; [SIMBAD](https://simbad.u-strasbg.fr/simbad/sim-fid) is a reliable source.


- What is the date of this observation?
- What are CRVAL1 and CRVAL2?
- How many TESS observations of this star are there?
- What are the exposure times of these observations?

2. Using the data you queried, plot a FITS FFI for the star. 

## Additional Resources
Can't get enough? Here are some links to more information!

If you need an introduction (or a refresher!) to basic Python syntax, there are several great resources available online. [Codecademy](https://www.codecademy.com/learn/learn-python-3) is a great service with a totally free option for getting started with Python; note that you will have to create an account to use it. Additionally, the YouTube channel freeCodeCamp.org has a great [video tutorial](https://www.youtube.com/watch?v=rfscVS0vtbw) on everything you need to get started programming in Python. Another good resource is the [Python for Everybody](https://www.py4e.com/) book.

The full astropy documentation can be found [here](https://docs.astropy.org/en/stable/index.html).

For more info on FITS files, here is a link to the [FITS NASA site](https://fits.gsfc.nasa.gov/). 

SIMBAD is a web-based query service from the University of Strasbourg. It is a great resource for getting quick info on stars and other astronomical targets. Here is the link to [Fomalhaut's SIMBAD page](https://simbad.u-strasbg.fr/simbad/sim-id?Ident=fomalhaut&NbIdent=1&Radius=2&Radius.unit=arcmin&submit=submit+id).


## What's next?

Next week we will use what we have learned about gathering TESS data to plot a light curve of an exoplanetary system, and we will explore time series data and their applications. This will involve learning how exoplanets are detected and discovering what other types of systems can be analyzed using light curves. Stay tuned!


## Acknowledgements

If you write a paper using TESS data from MAST, please acknowledge this using the following template:

> This paper includes data collected with the TESS mission, obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the TESS mission is provided by the NASA Explorer Program. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5–26555.

Any published work that uses Astroquery should include a citation which can be found at [this link](https://github.com/astropy/astroquery/blob/main/astroquery/CITATION), or can be printed out in a code cell with: `astroquery.__citation__` as long as the astroquery package is imported. 

### About this Notebook:
If you have comments or questions on this notebook, please contact us through the [Archive Help Desk e-mail](mailto:archive@stsci.edu).

**Author:** Emma Lieb

**Last Updated:** June 2023

[Top of Page](#top)

<img style="float:right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>