# TAURUS + NSF REU 2022
# Introduction to Python Day 4
# Astropy, FITS files & Imaging

Congrats on making it to the last day of this seminar! You've learned a lot in three days, so now let's put it all together to do some astronomy! 

A package I highly recommend you become familiar with is astropy. It includes a library of astronomical constants and unit conversions, coordinate conversions, cosmological calculations, image processing, and much more! There are many components of astropy that we won't be able to cover, but as a sneak peek, you can also work with tables of data, measure photometry, and process PSFs in images. See more here: https://docs.astropy.org/en/stable/index.html

## Pip install

This leads us to a great opportunity to show you how to install additional packages into your Python distribution. If it is a large, well-known package (i.e., not developed by one person on GitHub), you can usually install it using pip. Go ahead and open a terminal session. Once you are in bash and your Python 3.5 environment, type

    pip install astropy
    
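This will automatically download and install astropy from the Internet! Run this command in the terminal rather than typing it as ordinary Python code in a Jupyter notebook cell. (Recent versions of Jupyter also accept the magic command %pip install astropy inside a notebook.)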

(Depending on the distribution of Python you have installed on your computer, it may or may not have come with astropy preloaded. If it's already there, then pip install will stop and say 'requirement already satisfied.')
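If you aren't sure whether a package is already there, you can also check from Python itself before reaching for pip. This is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find an importable package with this name."""
    return importlib.util.find_spec(package_name) is not None

# Check the two packages used in today's notebook
print("astropy installed:", is_installed("astropy"))
print("aplpy installed:", is_installed("aplpy"))
```

If either line prints False, that's your cue to run pip install for that package.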


## Astropy: Units & Constants

The most basic use for Astropy is units and constants. I highly recommend using the constants within Astropy so that a) you're not constantly Googling them, and b) they are precise and accurate. The constants are loaded with units as well, making conversions between them quite easy.

One quick disclaimer is that the solar mass constant is slightly out of date, so researchers doing high-precision stellar work may need to keep that in mind.

Load up the constants as well as our other usual packages:


In [None]:
from astropy import constants as const
import numpy as np
import matplotlib.pyplot as plt

To call a constant, simply type const.(constant) where the symbols are generally the same as what's in your textbooks. For example, G is Newton's gravitational constant:

In [None]:
G = const.G

print(G)
const.G

Calling the constant will give you a bunch of information about it, including its name, value, uncertainty, and unit. Astropy constants are given in SI units by default rather than cgs, but it's easy to convert by adding .cgs to the end of the constant:

In [None]:
G_cgs = const.G.cgs

print(G_cgs)

You can also convert to specific units as you choose, by doing .to('new units'). For example:

In [None]:
G_weirdunits = const.G.to('kpc3 / (Msun Gyr2)')

print(G_weirdunits)

This can be very helpful when using normalized equations. Note that astropy will not always simplify units automatically: for example, it doesn't cancel out Hz and seconds on its own (calling .decompose() on the result will do it). Also, if there are units inside a log expression, it will complain. So if you already know your units work out and you just want the value, you can type .value!

These can also be combined, like G.cgs.value.

In [None]:
G_value = const.G.value

print(G_value)

If you do want to keep track of units, you can assign them by multiplying a float by the unit.

In [None]:
from astropy import units as u

R = 1.0 * u.kpc

print(R)

## Astropy: Cosmological Calculations

There's a good chance you will need to calculate things like luminosity distance, angular diameter distance, or age of the Universe, especially if you are doing high-redshift work. To do cosmology in astropy, first set the cosmological parameters, like so:

In [None]:
from astropy.cosmology import FlatLambdaCDM

cosmo=FlatLambdaCDM(H0=70., Om0=0.3)

This sets the Hubble constant to 70 km/s / Mpc and the matter density parameter, $\Omega_M$, to 0.3. The cosmological calculations are methods of the FlatLambdaCDM object, which we've named "cosmo." Here are a couple of examples:

In [None]:
LD4 = cosmo.luminosity_distance(4).to('kpc').value
ADD4 = cosmo.angular_diameter_distance(4)

print(LD4, ADD4)

## Astropy: Sky Coordinate Systems

Another fundamental part of astronomy is celestial coordinate systems. You will quickly learn that there are many ways to present the locations of objects in the sky: through right ascension and declination vs. galactic longitude and latitude for instance, or in degrees vs. hourangle, or in different frames of reference. With astropy.coordinates it is easy to convert between these.

In [None]:
from astropy.coordinates import SkyCoord

#Say you want to define a star at 150.00 degrees, +2.00 degrees

coords = SkyCoord(ra=150.0, dec=2.00, unit='deg', frame='icrs')

#ICRS is the default frame, but you might also use others such as 'fk5' or 'galactic'

Now we can access other information or convert it to another format.

In [None]:
print(coords.ra.hms)
print(coords.dec.degree)
print(coords.to_string('hmsdms'))

# Can convert to galactic coordinates l, b
coords_galactic = coords.galactic
print(coords_galactic)

# Matches and Separations

SkyCoord objects have some pretty awesome methods that allow you to match between two different catalogs and give a quick and easy way to compute the separation between different objects. This comes in handy when you need to find an imaging counterpart for an emission line, or when you are checking the astrometry (coordinates) of an image.

For more information click the link here: https://docs.astropy.org/en/stable/coordinates/matchsep.html

The first thing we will cover is separation. It takes two SkyCoord objects and gives you back the angular separation between them.

In [None]:
coords = SkyCoord(ra=150.0, dec=2.00, unit='deg', frame='icrs')
counterpart = SkyCoord(ra=150.000056, dec=2.000043, unit='deg', frame='icrs')

sep = coords.separation(counterpart)

print(f'The separation of the counterpart is: {sep}')

In [None]:
#You can also change the units of the separation with ease using the following

#give separation in arcseconds
print(f'Separation in arcseconds is: {sep.arcsecond:.3f}')

#give separation in arcminutes
print(f'Separation in arcminutes is: {sep.arcminute:.5f}')

#gives separation in degrees
print(f'Separation in degrees is: {sep.degree:.6f}')

#gives separation in radians
print(f'Separation in radians is: {sep.radian:.8f}')
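As a sanity check on what `separation` is doing, here is the small-angle approximation worked by hand with numpy for the same two positions. This is only a sketch of the flat-sky formula $\Delta \approx \sqrt{(\Delta\alpha \cos\delta)^2 + (\Delta\delta)^2}$; it is not how astropy computes the result internally, and it only holds for small separations.

```python
import numpy as np

# Same two positions as above, in degrees
ra1, dec1 = 150.0, 2.0
ra2, dec2 = 150.000056, 2.000043

# RA differences shrink by cos(dec) because meridians converge toward the poles
dra = (ra2 - ra1) * np.cos(np.radians(0.5 * (dec1 + dec2)))
ddec = dec2 - dec1

sep_deg = np.hypot(dra, ddec)   # flat-sky Pythagoras
sep_arcsec = sep_deg * 3600.0
print(f'Approximate separation: {sep_arcsec:.3f} arcsec')
```

The answer should come out to roughly a quarter of an arcsecond, in line with what SkyCoord reports.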

In [None]:
#You can also find the separation between one coordinate and an array of SkyCoord positions

skycoord_arrays = SkyCoord(ra = np.random.uniform(100, 200, 10000) * u.degree, 
                           dec = np.random.uniform(0, 10, 10000) * u.degree)

sep = coords.separation(skycoord_arrays)

In [None]:
#you can then impose conditions to find candidate counterparts by using a search radius

#makes a boolean mask for us to then apply to sep variable to filter only the 
#sources that satisfy this condition
possible_match_mask = sep.degree < 1 #searching around a 1 degree search radius

print('Matches in skycoords array within 1 degree of coords:')
skycoord_arrays[possible_match_mask]

# Matching Catalogs

The next very useful feature of SkyCoord is its ability to quickly find nearby matches between two different catalogs.

In [None]:
catalog1 = SkyCoord(ra = np.random.uniform(0, 360, 500) * u.degree, 
                    dec = np.random.uniform(-90, 90, 500) * u.degree)

catalog2 = SkyCoord(ra = np.random.uniform(0, 360, 1000) * u.degree, 
                    dec = np.random.uniform(-90, 90, 1000) * u.degree)

#the way to find matches between catalog is by using the following
idx, sep2d, sep3d = catalog1.match_to_catalog_sky(catalog2)

#idx is the index in catalog 2 that was the closest match to the coordinates in catalog1
#sep2d is the separation between the closest match
#sep3d the 3d separation only useful if you have distance in SkyCoord, not useful here

matches_to_cat1 = catalog2[idx]

close_matches_mask = sep2d.degree < 1 # checking which coordinates are w/in 1 deg of one another

#applying mask to catalog 1
print('Matches in Catalog 1 within 1 degree: ')
print(catalog1[close_matches_mask])
print()

#applying mask to matches in catalog2
print('Matches in Catalog 2 within 1 degree: ')
print(matches_to_cat1[close_matches_mask])
print()

# Astropy Tables

The next major thing to know about is astropy tables. Tables like these are one of the kinds of data that may be stored in FITS files, which we will cover in the next section. This section will teach you how to make an astropy table and add values to it.

In [None]:
from astropy.table import QTable, Table, Column

In [None]:
#Creating a Table Numero 1

#creates an empty table
t = Table()

#fills in the table with data with the columns being 'a', 'b', and 'c'
# with the corresponding data for each column
t['a'] = [1, 4]
t['b'] = [2.0, 5.0]
t['c'] = ['x', 'y']

In [None]:
#Creating a Table Numero 2

#we make our three columns into arrays
a = np.array([1, 4], dtype=np.int32)
b = [2.0, 5.0]
c = ['x', 'y']

#generate the table below with the arrays as the columns and the names of the columns
#in the names 
t = Table([a, b, c], names=('a', 'b', 'c'))

In [None]:
#Creating a Table Numero 3

#this method uses a dictionary, which is a type of container
#that uses keys to access data: the first entry is the key and the next value is the data
#here the keys become the column names and the values become
#the data in the columns
arr = {'a': np.array([1, 4], dtype=np.int32),
        'b': [2.0, 5.0],
        'c': ['x', 'y']}

Table(arr)

There is no right or wrong way to generate an astropy table; it all comes down to personal preference and what data you are handling. Sometimes you do not know what data you will store, so method 1 may be best; other times you may have data in the form of dictionaries, so method 3 would work best.

# Getting values from the astropy table

In [None]:
arr = np.arange(15).reshape(5, 3)
t = Table(arr, names=('a', 'b', 'c'), meta={'keywords': {'key1': 'val1'}})

In [None]:
t['a']       # Column 'a'
t['a'][1]    # Row 1 of column 'a'
t[1]         # Row 1
t[1]['a']    # Column 'a' of row 1
t[2:5]       # Table object with rows 2:5
t[[1, 3, 4]]  # Table object with rows 1, 3, 4 (copy)
t[np.array([1, 3, 4])]  # Table object with rows 1, 3, 4 (copy)
t[[]]        # Same table definition but with no rows of data
t['a', 'c']  # Table with cols 'a', 'c' (copy)
dat = np.array(t)  # Copy table data to numpy structured array object
t['a'].unit = 'm'  # give column 'a' a unit so the two conversions below work
t['a'].quantity  # an astropy.units.Quantity for Column 'a'
t['a'].to('km')  # an astropy.units.Quantity for Column 'a' in units of kilometers
t.columns[1]  # Column 1 (which is the 'b' column)
t.columns[0:2]  # New table with columns 0 and 1

# FITS Files

Astropy can read in FITS files, a typical format for astronomical images. FITS stands for Flexible Image Transport System, and most basic image viewers can't do anything with these files since they aren't like JPGs or GIFs. There are a few programs that will read them, including CASA if you're a radio astronomer, or DS9.

FITS is a useful format for astronomical data because it contains a lot of behind the scenes information. The header in particular will usually give you information about the telescope the data is from, the reference position in the sky for the data, the pixel scale, the size of the image, and more! First, let's import the package.

In [None]:
from astropy.io import fits

Let's step through how to work with the files I've given you. First, open the file using fits.open(). To get some basic information about the file, call .info() on the result (here, hdulist.info()).

In [None]:
hdulist = fits.open('f001a066.fits') #B filter image of Andromeda
hdulist.info()

The actual information of interest is contained in the header. FITS files can contain many images (or tables) in the same file, each stored in its own header/data unit (HDU). This file is a single 2D image, so it only contains one HDU. We still have to specify that we want the first one, so we call "hdulist[0]". Note that you can take a shortcut and place the [0] directly at the end of fits.open().

Let's read out the header:

In [None]:
hdulist[0].header

Now we know a little bit about what we're working with. You'll have to check headers pretty frequently so I'd commit this step to memory!

Now we can access the data contained in the FITS file. The values are typically in units of brightness, and the positions in the matrix are the pixel coordinates, with (0, 0) in the bottom left. Note that the array is indexed as [row, column], i.e. [y, x]. You can manipulate this matrix like you would any array.

In [None]:
M51B = hdulist[0].data
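To see how the header and the data array relate without needing any particular file, here is a sketch that writes a tiny FITS image to a temporary folder and reads it back. The file name, the OBJECT value, and the pixel values are all made up for illustration.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# A toy image with 3 rows (y) and 4 columns (x)
image = np.arange(12, dtype=np.float32).reshape(3, 4)

hdu = fits.PrimaryHDU(data=image)
hdu.header['OBJECT'] = 'demo'   # add our own header keyword

path = os.path.join(tempfile.mkdtemp(), 'demo_image.fits')
hdu.writeto(path)

# The "with" block closes the file for us when we're done
with fits.open(path) as hdul:
    header = hdul[0].header
    data = hdul[0].data.copy()   # copy so the array outlives the open file

# NAXIS1 is the x size (columns), NAXIS2 is the y size (rows)
print(header['NAXIS1'], header['NAXIS2'], header['OBJECT'])

# The numpy shape is (rows, columns), i.e. (NAXIS2, NAXIS1)
print(data.shape, data[0, 1])
```

Header values are read like a dictionary, and the order of the numpy shape is the reverse of the NAXIS numbering, which is a common source of confusion.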

# Aplpy

My favorite way to work with images in Python is the package aplpy (pronounced Apple Pie) which allows you to make lovely publication-worthy image plots. 

#### Pip install aplpy, and import it here.

In [None]:
import aplpy

To use aplpy, you don't need to go through the process of opening the FITS file and grabbing the data as in astropy. You will just open the file using FITSFigure. Let's check out this mysterious image:

In [None]:
galaxy = aplpy.FITSFigure('f001a066.fits')
galaxy.show_colorscale(cmap='hot')
galaxy.show_colorbar()

plt.show()

This is the tutorial included in the Aplpy documentation that nicely shows how you can overplot contours and include multiple layers. Take some time to play with this:

In [None]:
galaxy = aplpy.FITSFigure('aplpy_tutorial/fits/2MASS_k.fits')

galaxy.show_rgb('aplpy_tutorial/graphics/2MASS_arcsinh_color.png')

galaxy.show_contour('aplpy_tutorial/fits/mips_24micron.fits')

data = np.loadtxt('aplpy_tutorial/data/yso_wcs_only.txt')
ra, dec = data[:, 0], data[:, 1]

galaxy.show_markers(ra, dec, layer='marker_set_1', edgecolor='red', facecolor='none', 
                    marker='o', s=10, alpha=0.5)

plt.show()

# Exercises

## Exercise 1
Let's practice some more plotting skills, now incorporating units.

A. Write a function that takes an array of frequencies and spits out the Planck distribution. That's this equation:

$ B(\nu, T) = \frac{2h\nu^3/c^2}{e^{\frac{h\nu}{k_B T}} - 1}$
 
 
This requires you to use the Planck constant, the Boltzmann constant, and the speed of light from astropy. Make sure they are all in cgs.

B. Plot your function in log-log space for T = 25, 50, and 300 K. The most sensible frequency range is about $10^5$ to $10^{15}$ Hz. Hint: if your units are correct, your peak values of B(T) should be on the order of $10^{-10}$. Make sure everything is labelled.

## Exercise 2

Let's put everything together now! Here's a link to the full documentation for FITSFigure, which will tell you all of the customizable options: http://aplpy.readthedocs.io/en/stable/api/aplpy.FITSFigure.html. Let's create a nice plot of M51 with a background optical image and X-ray contours overplotted.

The data came from here if you're interested: http://chandra.harvard.edu/photo/openFITS/multiwavelength_data.html

A. Using astropy, open the X-RAY data (m51_xray.fits). Flatten the data array and find its standard deviation, and call it sigma.

B. Using aplpy, plot a colorscale image of the OPTICAL data. Choose a colormap that is visually appealing (list of them here: https://matplotlib.org/2.0.2/examples/color/colormaps_reference.html). Show the colorbar. 

C. Plot the X-ray data as contours above the optical image. Make the contours spring green with 80% opacity and dotted lines. Make the levels go from 2$\sigma$ to 10$\sigma$ in steps of 2$\sigma$. (It might be easier to define the levels array before show_contour, and set levels=levels.)

## Exercise 3: Spectral Analysis

Code and Exercise provided to us by Olivia Cooper

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
%matplotlib inline
import pandas as pd
import astropy
from astropy.io import fits
from astropy.visualization import ZScaleInterval
from astropy.stats import sigma_clip
from scipy.optimize import curve_fit
plt.style.use('cooper-paper.mplstyle')

# Plotting spectra in 1D and 2D

In this example, we'll be looking at a source from a MOSFIRE program called The MOSFIRE Deep Evolution Field Survey (MOSDEF). This data set contains hundreds of galaxies (find the data here: https://mosdef.astro.berkeley.edu/for-scientists/data-releases/), but we'll look at just one.

In [None]:
# load in the spectrum in 1D and 2D for object 2335 in the MOSDEF catalog

#This is the one that has 2335.2d.fits extension
hdu2 = fits.open('data/co2_01.H.2335.2d.fits') # 2d spectrum

#This is the one that has 2335.ell.1d.fits extension
hdu1 = fits.open('data/co2_01.H.2335.ell.1d.fits') # 1d spectrum

Let's first look at the 2D spectrum. Based on the header, we want to look at the science frame, extension 1. (If you want to see the noise frames etc, type in the appropriate index for the extension noted in the header!) I wrote a quick function to help us plot it.

In [None]:
# it's a good idea to check out the header to see how the data are set up, 
#Open up the header for the 2D file

#Code here


In [None]:
def show_2d(image, lower=-1, upper=2):
    """
    Plot grayscaled 2D spectrum

    Parameters
    ----------
    image : fits data
        Fits data for 2D image.
        
    lower : float
        Lower value to dictate vmin for grayscale. The default is -1.
    
    upper : float
        Upper value to dictate vmax for grayscale. The default is 2.

    Returns
    -------
    Plot of 2D spectrum.

    """

    sample = sigma_clip(image) 
    vmin = sample.mean() + lower * sample.std()
    vmax = sample.mean() + upper * sample.std()
    plt.figure(figsize=(15, 3))
    plt.imshow(image, origin='lower', cmap='gray', aspect='auto', vmin=vmin, vmax=vmax)
    plt.xlabel('Column Number')
    plt.ylabel('Row Number')
    plt.grid(False)

In [None]:
# displays the 2D spectrum

show_2d(hdu2[1].data) 

You'll notice a few things:

1) Going across the image there is a white line -- this is the continuum! That is the signal or light coming from the galaxy, all spread out in wavelength space.

2) Above and below the continuum there are black lines going across. This is the negative of the signal, and is a feature we expect to see in this type of data if the signal is real (it's a product of the ABBA set-up...ask us if you want to know more!).

3) There are TV static noisy vertical features -- these are the residuals of sky lines, of which there are a ton in the near-infrared! These sky lines have to be removed (as they are here) so we can extract the underlying signal from the galaxy.

4) Along the continuum, there are a couple bright spots. Finding and identifying these features is our goal! These are emission lines, and based on the exact positions of these we can measure a spectroscopic redshift :-) More on this later...

The rows and columns are showing the pixel scale, but there is information in the header of the FITS file that tells us how to convert from pixels to wavelength so we can precisely analyze the data. Let's do this with the 1D spectrum...

In [None]:
# Lets compare the header of the 2D spectrum with the header of the 1D spectrum

#Code for 1D header here



That's a lot of info! Take a gander and see what you can find. (Remind us to talk about optimal vs boxcar extraction sometime if we haven't already) 

Of interest to us at the moment is our wavelength scaling info. We will use the information in the 1D spectrum header to make our wavelength array, which will be the first extension in hdu1.

Of particular interest to us are the following keywords in the header:

CRVAL1: This tells us the starting wavelength for our array \
CDELT1: This tells us the spacing between wavelengths (i.e., the $\Delta \lambda$ between neighboring pixels)\
NAXIS1: The length of the wavelength array
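Here is how those three keywords combine into an evenly spaced array, sketched with a plain dictionary standing in for the header. The values are made up, and the sketch assumes the first pixel sits exactly at CRVAL1 (that is, the reference pixel CRPIX1 is 1), which you should confirm in the real header.

```python
import numpy as np

# Stand-in for a FITS header; these numbers are invented for illustration
fake_header = {'CRVAL1': 14500.0, 'CDELT1': 1.5, 'NAXIS1': 5}

# Pixel indices 0, 1, 2, ... up to NAXIS1 - 1
pixel_index = np.arange(fake_header['NAXIS1'])

# Each pixel is one CDELT1 step further along than the one before it
wavelength_demo = fake_header['CRVAL1'] + fake_header['CDELT1'] * pixel_index
print(wavelength_demo)
```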

In [None]:
spec_header = hdu1[].header

In [None]:
# Translate the header data into a wavelength array
# Generate a wavelength array using the optimal extraction extension
# The starting value of the wavelength array is located at
# CRVAL1 and goes the length of the array using NAXIS1 in steps of CDELT1:
# wavelength = CRVAL1, CRVAL1 + CDELT1, CRVAL1 + 2*CDELT1, ..., CRVAL1 + (NAXIS1 - 1)*CDELT1


wavelength = 

#Use your knowledge of Astropy hdu objects to get the spectral data
spec1d = 

Great! Now we have the data for the x-axis (wavelength) and for the y-axis (spectrum). Let's plot!

In [None]:
# plot the 1D spectrum

plt.plot(wavelength, spec1d)
plt.xlabel(r'Wavelength $[\AA]$')
plt.ylabel(r'Signal [$erg\, s^{-1}\, cm^{-2}\, \AA^{-1}$]')

Uh...what does it mean? Basically, the way `matplotlib` has scaled this data is terrible. But that's fair, because spectra have huge noise spikes that we don't care about. So, we gotta scale better!

In [None]:
# plot the 1D spectrum, now with a better scale
### note, another way to do this is to multiply the signal by a factor
### here I would use 1e17 and then set ylim from -1 to 1

plt.plot(wavelength, spec1d)
plt.xlabel(r'Wavelength $[\AA]$')
plt.ylabel(r'Signal [$erg\, s^{-1}\, cm^{-2}\, \AA^{-1}$]')

#Play around with a scaling below so that you can see the emission features of the spectrum 
#very prominently 



There's our signal!! And you may start to notice some features...it'll get easier with practice looking at spectroscopic data. And using the 1D and 2D together is essential to pick out the features in 1D!

# Fitting for a redshift

Our next task is to figure out what lines we're seeing. Load in the provided line list and let's see if we can figure it out.

The following cell loads in common emission, absorption, and sky lines at various rest-frame wavelengths, taken from the Sloan Digital Sky Survey (SDSS).

In [None]:
# list of spectral lines (originally from SDSS)
### shows the vacuum (rest) wavelength, the species, and the type of line

lines = pd.read_csv('linelist.csv',delimiter=",",comment='#') # all lines
em = lines[lines['type']=='Emission'] # emission lines
ab = lines[lines['type']=='Absorption'] # absorption lines
sky = lines[lines['type']=='Sky'] # sky lines

em

We will use these lines to figure out the redshift of the galaxy we are looking at. Redshift is an astronomical term that basically tells us how much the light from a galaxy has been stretched due to the expansion of the universe. A more nearby galaxy will have little stretch and a low redshift, whereas a galaxy farther away will have its light stretched more and will have a higher redshift.

Our task is to take the lines that we loaded in and determine the redshift of the source. We will do this by guessing where the lines should be for a given redshift.

Your task is to input a guess for the redshift as zguess below and try to match up line features in the spectrum, matching the emission features with spikes and absorption features with dips.
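The relation behind this guessing game is $\lambda_{obs} = \lambda_{rest}(1 + z)$. Here is a quick worked sketch using the vacuum rest wavelength of the [OIII] line at 5008.24 Angstroms and a purely hypothetical redshift of 2.0 (not the answer for this galaxy!).

```python
rest_wavelength = 5008.24   # [OIII] rest wavelength in vacuum, Angstroms
z_example = 2.0             # hypothetical redshift, for illustration only

# Redshift stretches every wavelength by the same factor (1 + z)
observed = rest_wavelength * (1 + z_example)
print(f'Observed wavelength: {observed:.2f} Angstroms')

# Going the other way: measure the observed wavelength, solve for z
recovered = observed / rest_wavelength - 1
print(f'Recovered redshift: {recovered:.4f}')
```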

In [None]:
# pay attention to the scaling here!
# your task: change zguess until the lines match up with the features :-) 

zguess =  ## adjust the redshift!

plt.plot(wavelength, spec1d*1e17)
plt.xlabel(r'Wavelength $[\AA]$')
plt.ylabel(r'Signal [$10^{-17} erg\, s^{-1}\, cm^{-2}\, \AA^{-1}$]')

plt.xlim(15500,17000)
plt.ylim(-0.5,1.5)

#Plotting the lines
plt.plot(ab['lambda']*(1+zguess),0*np.ones_like(ab['lambda']),'kv',ms=3,label='absorption')
plt.plot(em['lambda']*(1+zguess),0*np.ones_like(em['lambda']),'k^',ms=3,label='emission')
plt.plot(sky['lambda']*(1+zguess),0*np.ones_like(sky['lambda']),'kd',ms=3,label='sky')

#this is code that draws an arrow for us to visually see them lining up
for i in range(len(lines['lambda'])):
    plt.annotate(str(lines['species'][i]+', '+str(lines['lambda'][i])),xy=(lines['lambda'][i]*(1+zguess), 0),\
                xytext=(lines['lambda'][i]*(1+zguess)-10, 1),arrowprops=dict(arrowstyle="-",),size=7,rotation=90)
plt.legend()

# More Exact Redshift Estimate

Doing things by eye is a great first start, but our eyes cannot see down to very fine details. For exact science and a more robust estimate of the redshift, we need to use fitting techniques to fit the emission line and get a better measurement of the peak emission wavelength. For that we will use **model fitting**.

Your goal is to fit an emission line of your choosing with a Gaussian and provide us with the wavelength and the spectroscopic redshift of the source.

In [None]:
# First step is to make the model of our Gaussian 
# Complete the function definition of this Gaussian
# Look back to Day 3 for definition of a Gaussian or feel free to google it
# Just note what each parameter in this gaussian is in the context of the spectra

# x = wavelength
# sigma = broadness of the emission line
# A = amplitude of the emission line
# mu = central wavelength of the gaussian

def gaussian():
    '''
    Function declaration for a Gaussian
    
    Input(s)
    ----------
    
    
    Output(s)
    -------------
    
    '''
    
    
    return 

In [None]:
# To fit a gaussian to find line center
# we need to make a cut so the fit only uses data around the line we want to fit
# As an example you may want to fit the emission line near 16400 Angstroms and select 
# a window of 100 angstroms on either side
# Hint: use np.where() here and apply it to both wavelength and spectrum array

wav = wavelength[]
spec = spec1d[] * 1e17

In [None]:
#Next step is to fit the line and for that we need you to provide us with a best guess 
#for every parameter

#      Guess_amplitude, Guess_center, Guess_sigma 
# (Note that the order of guesses will depend on how you defined your gaussian function)

guess = 

#       provide some bounds for your fit. 
# Note that this will need to be in the same order as your guesses
bounds = ((, , ), (, , ))

#fitting spectrum with a gaussian
popt, pcov = curve_fit(gaussian, wav, spec, p0=guess, bounds=bounds)


obswav = popt[]
obswav_error = np.sqrt(pcov[, ])
print(f'Observed wavelength of [OIII] 5008 is {obswav:.3f} +/- {obswav_error:.3f}')
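If the mechanics of `curve_fit` are unfamiliar, here is a sketch of the same workflow on a deliberately different model, a straight line fit to fake data, so the Gaussian is still yours to write. The function `line`, the fake data, and the guess and bounds values are all invented for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    """Model to fit: the first argument is x, the rest are the free parameters."""
    return slope * x + intercept

# Fake data: a line with slope 2 and intercept 1, plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = line(x, 2.0, 1.0) + rng.normal(0, 0.1, x.size)

# Guesses and bounds follow the parameter order in the function: (slope, intercept)
guess = [1.0, 0.0]
bounds = ([0.0, -5.0], [10.0, 5.0])   # (lower limits, upper limits)

popt, pcov = curve_fit(line, x, y, p0=guess, bounds=bounds)

# The diagonal of the covariance matrix holds the variance of each parameter
perr = np.sqrt(np.diag(pcov))

print(f'slope     = {popt[0]:.3f} +/- {perr[0]:.3f}')
print(f'intercept = {popt[1]:.3f} +/- {perr[1]:.3f}')
```

The order of the entries in `popt` and along the diagonal of `pcov` always matches the order of the parameters in your model function, which is what you need to pick out the line center and its error in the Gaussian case.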

In [None]:
# measure the redshift

restwav = 
redshift = obswav / restwav - 1
print(f'The redshift is {redshift:.4f} ')