# Using Spectroscopy to Study Galaxies

## What is a Spectrum?
Light, similar to sound, is energy that travels as a wave (a wave of electric and magnetic fields in the case of light). The *wavelength* of light just describes the distance between peaks in a traveling light wave. The figure below shows a wave with its wavelength labeled.

![figure1](images/wavelength_6.png)

It is rare for anything to emit light at just *one* wavelength, however. Almost everything that we see is made up of light at many different wavelengths. Let's imagine that we wanted to know how much light of each wavelength is being emitted by a light source, say the Sun. To accomplish this, we can take a *spectrum*, meaning that we split the light into its component colors or wavelengths. This is what a prism does when it makes a rainbow out of sunlight: it splits up light from the Sun into all of the visible colors from red to violet, as shown below. 

![figure2](images/prismcolors.png)

In the same way, when we collect the light from a star or a galaxy and spread it out, we can measure how bright the object is at each wavelength. That measurement is its spectrum. Analyzing the spectrum of an object tells us a lot: its temperature, what it's made of, how fast it's moving, and more.

## How Do We Get So Much Information from a Spectrum?
When we split up the light coming from a light bulb (for example) and look at its spectrum, it looks like a smooth rainbow, like the top panel of the figure below, where the x-axis is wavelength in nanometers. But if you take hydrogen gas, heat or shock it into a glowing plasma, and take a spectrum of the glowing hydrogen, it will look like the bottom panel of the figure below, with light only showing up at very specific wavelengths.

![figure3](images/spectrum.png)

These features at specific wavelengths are called *emission lines*. Each chemical element gives off light at particular wavelengths when its atoms get excited (like in a hot plasma), so these lines act like a fingerprint that tells us which elements are present. Below, you can see the emission-line "fingerprints" of several other elements.

![figure4](images/element_spectra.png)

## What Produces Emission Lines in a Galaxy?
Hot plasma may not be common here on Earth, but it is quite common in galaxies! Emission lines in galaxies are mostly produced by clouds of gas and plasma that are energized by nearby sources, usually through ultraviolet light, X-rays, or collisions. These gas clouds are part of the interstellar medium (ISM), which fills the space between stars. We will elaborate on this in the next section.

Young, hot stars can heat the surrounding hydrogen gas and cause it to glow, producing strong hydrogen emission lines.

Exploding stars (supernovae), stellar winds, and supermassive black holes (AGN) can also energize the gas and influence the shape of the spectrum.

## What is the Interstellar Medium (ISM)?
You may have heard that space is a vacuum, but this isn't completely true. People say this because, compared to Earth's atmosphere, the matter in space is *extremely* diffuse. To give you an idea of how diffuse this gas is, Earth's atmosphere has roughly $10^{19}$ particles per cubic centimeter. In many regions of the ISM, you will typically find $10^2$ to $10^4$ particles per cubic centimeter. In other words, the Earth's atmosphere can be up to **one hundred thousand trillion** times denser than the ISM! And yet, even though it is so thinly spread out, there is so much material in the ISM that it emits enormous amounts of light when it is energized.
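
As a quick check of that comparison, dividing the approximate particle densities quoted above gives the range of density ratios (these are order-of-magnitude estimates, not precise measurements):

$$
\frac{10^{19}}{10^{4}} = 10^{15}
\qquad \text{to} \qquad
\frac{10^{19}}{10^{2}} = 10^{17}
$$

The upper end, $10^{17}$, is one hundred thousand ($10^5$) times one trillion ($10^{12}$).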

The interstellar medium is made of gas (mostly hydrogen and helium) and dust that fills the space between stars in a galaxy, and it plays a very important role in galaxy evolution. New stars form out of this gas, and the deaths of stars return energy and elements back into the medium. When the gas in the ISM is heated or ionized, it produces *emission lines*, just like the ones shown in the images above.

## Redshift ($z$) and the Expanding Universe
We have known since the 1920s that the universe is expanding, which means that distant galaxies appear to be moving away from us. As light from a distant galaxy travels toward Earth through the expanding universe, its waves get stretched out, like a slinky. This stretching shifts the entire spectrum toward redder (longer) wavelengths and is known as **redshift**. The more distant a galaxy is, the more its light is "redshifted". You can see this effect in the image below. By measuring how much the light is redshifted, astronomers can determine how far away a galaxy is and learn about the history and expansion of the universe.

![figure5](images/Measuring_Distance_With_Redshift.jpg)

## Observed Wavelength and Redshift

When we observe a distant, "redshifted" galaxy's spectrum, we see that the wavelengths of its emission lines are at redder (larger) wavelengths compared to where they would appear in a laboratory. This **redshift** ($z$) is calculated by relating the **observed wavelength** $\lambda_{\text{obs}}$ to the **rest-frame (original/laboratory) wavelength** $\lambda_{\text{rest}}$ using the equation:

$$
\lambda_{\text{obs}} = \lambda_{\text{rest}} (1 + z)
$$
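
As a quick worked example of this equation, the sketch below shifts one rest-frame wavelength to a few different redshifts. The rest wavelength of roughly 6563 Å (the hydrogen H-alpha line) and the redshift values are illustrative choices, not values from the project data, and `observed_wavelength` is a hypothetical helper name.

```python
def observed_wavelength(rest_wavelength, z):
    """Apply lambda_obs = lambda_rest * (1 + z)."""
    return rest_wavelength * (1 + z)

# Illustrative rest-frame wavelength in Angstroms (roughly the H-alpha line)
rest = 6563.0

# Illustrative redshifts; z = 0 means no shift at all
for z in [0.0, 0.05, 0.1, 0.5]:
    obs = observed_wavelength(rest, z)
    print(f"z = {z:4.2f}  ->  observed wavelength = {obs:7.1f} Å")
```

Notice that at $z = 0$ the observed and rest-frame wavelengths are identical, and the line moves further to the red as $z$ grows.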

Below, we see an image showing how a galaxy's spectrum changes depending on its redshift, or how far away it is. See any emission lines that look familiar in the top panel? (*Hint: Look at the hydrogen emission line spectrum in one of the images above.*)

![figure6](images/galaxy_redshift_illustration.png)

# The goal of this project:

Now that you know what redshift is and how it affects spectra, you're ready to apply that knowledge to real data.

In this project, you'll measure the observed wavelength of an emission line in several galaxy spectra and match each one to its known redshift. Then, you'll perform a linear fit to determine the line's rest-frame wavelength and figure out which element it comes from!

---

# Important emission lines in galaxy spectra

> Coding learning goal: 
*   Read and understand what a function does and how to use it
*   Plot a spectrum

Let's look at a nice spectrum of a galaxy!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from astropy.table import Table
from helper_functions import *


In [None]:
# Define the data directory

# example: raw_data_dir = 'data/raw_spectra/'
raw_data_dir = 'FIXME_the_name_of_the_folder/'  # Where the spectra are stored
figure_dir = 'FIXME_the_name_of_the_folder/'    # Where you want to save plots

In [None]:
def read_spectrum(filename):
    """
    Reads a FITS file from SDSS and returns the spectrum's wavelength and flux.
    
    Parameters:
    filename (str): The path to the FITS file.
    
    Returns:
    wavelength: The wavelength of each pixel in Ångströms (converted from log10).
    flux: The flux measured at each wavelength.
    """
    
    t = Table.read(filename, hdu='COADD')

    wavelength = 10**t['loglam']
    flux = t['flux']

    return wavelength, flux

In [None]:
# Pre-made function to plot a spectrum
def plot_spectrum(filename, wave_range=None, fig=None, ax=None):
    """
    Plots the spectrum from the given filename.

    Parameters:
        filename (str): Path to the spectrum file.
        wave_range (tuple, optional): (wave_min, wave_max) for x-axis limits.
        fig (matplotlib.figure.Figure, optional): Existing figure to use.
        ax (matplotlib.axes.Axes, optional): Existing axes to use.

    Returns:
        fig, ax: The figure and axis containing the plot.
    """
    wave, flux = read_spectrum(filename)

    # Apply wavelength range if specified
    if wave_range is not None:
        wave_min, wave_max = wave_range

        # Create a mask to isolate the wavelength region
        mask = (wave >= wave_min) & (wave <= wave_max)

        # Apply the mask to the wavelength and flux arrays
        wave = wave[mask]
        flux = flux[mask]

    # Create figure and axis if not provided
    if ax is None or fig is None:
        fig, ax = plt.subplots(figsize=(8, 4))

    # Plot the spectrum
    ax.step(wave, flux, where='mid', color='black', lw=1)
    ax.set_xlabel('Wavelength [Å]')
    ax.set_ylabel('Flux')

    if wave_range is not None:
        ax.set_xlim(wave_min, wave_max)

    return fig, ax


## Plot a spectrum

In [None]:
# Read in a spectrum and plot it!

# Specify the file name of the spectrum
spectrum_fn = 'FIXME_spectrum1.txt'

# Plot the spectrum!
# First, let's define a figure and axis
fig, ax = plt.subplots(1,1, figsize=(8,4))

# Now, let's use the plot_spectrum() function we saw above
fig, ax = plot_spectrum(raw_data_dir+spectrum_fn, fig=fig, ax=ax)

Awesome! You just plotted a spectrum of a galaxy! Notice that we did this by using a function called "plot_spectrum" that we created above. Make sure you understand what is happening in the function.
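
One step inside plot_spectrum() that is worth a closer look is the "mask" that selects a wavelength range. The small sketch below shows the same idea on made-up arrays (the numbers are not from any real spectrum), so you can see exactly which values are kept.

```python
import numpy as np

# Made-up wavelength (Å) and flux values, just to show how masking works
wave = np.array([4000.0, 4500.0, 5000.0, 5500.0, 6000.0])
flux = np.array([1.0, 1.2, 5.0, 1.1, 0.9])

wave_min, wave_max = 4400, 5600

# The comparison produces one True/False value per wavelength
mask = (wave >= wave_min) & (wave <= wave_max)
print(mask)        # [False  True  True  True False]

# Indexing with the mask keeps only the entries where it is True
print(wave[mask])  # [4500. 5000. 5500.]
print(flux[mask])  # [1.2 5.  1.1]
```

Because the same mask is applied to both arrays, each remaining flux value still lines up with its wavelength.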

What kind of features can you see in the spectrum? Continuum? Emission lines? Absorption lines? 

## Zoom in on some emission lines

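Emission lines act like fingerprints for the chemical elements in galaxies: once you learn a pattern, you start spotting it in most other spectra. Let's zoom in on an easily recognizable group of 3 emission lines around 500 nm. Note that the spectra in this notebook store wavelength in ångströms (Å), where 1 nm = 10 Å, so 500 nm corresponds to 5000 Å.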

In [None]:
# Define the range of wavelengths to zoom in on
min_wave = 4000 # Å (400 nm); the spectra store wavelength in Å
max_wave = 7000 # Å (700 nm)
test_wave_range = [min_wave, max_wave]   # Create a list with the min and max wavelength

fig, ax = plt.subplots(1,1, figsize=(8,4))

# Input the additional argument "wave_range" to plot_spectrum() 
# to zoom in on a specific wavelength range
fig, ax = plot_spectrum(raw_data_dir+spectrum_fn, fig=fig, ax=ax, wave_range=test_wave_range)

If the spectrum is not zoomed in enough, try changing min_wave and max_wave (both in Å) to capture the 3 emission lines better.

## Find the same group of emission lines in a different spectrum

Now, let's plot another spectrum. Can you find this group of emission lines in the new spectrum? Try zooming in on the group of lines. Are they still at around 500 nm (5000 Å)?

In [None]:
# Specify another file name 
spectrum_fn = 'FIXME_spectrum2.txt'

# Define a new wavelength range
min_wave = FIXME # Å
max_wave = FIXME # Å
test_wave_range = [min_wave, max_wave]

fig, ax = plt.subplots(1,1, figsize=(8,4))
fig, ax = plot_spectrum(raw_data_dir+spectrum_fn, fig=fig, ax=ax, wave_range=test_wave_range)

What is the approximate wavelength of the rightmost emission line?

This emission line usually falls at around 500 nm (5000 Å). Is the one you located bluer or redder than that? What happened to this galaxy?

The emission lines of far-away galaxies are often redshifted away from what they would look like in a laboratory. Since the wavelength of an emission line of a redshifted galaxy isn't its true wavelength, we call it the "observed wavelength".

# Using code to find the wavelength of an emission line

> Coding learning goal: 
*   Overlay a curve on a plot

While we can try to figure out the observed wavelength of emission lines by eye and hope for the best, that's not very scientific... Is the emission line exactly at one single wavelength? 

What we can do instead is write some code to find the "center" of an emission line. The function "fit_gaussian" builds a model emission line (a bell-shaped Gaussian curve) that best matches the one in the spectrum. Once it has found the best model, it tells us where the "center" of that model is.
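
To see the idea before using real data, here is a minimal sketch that fits a Gaussian to a fake emission line. Everything about the fake line (its center, height, width, and noise level) is made up for illustration; the point is only to show that `curve_fit` can recover the center we put in.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, cen, wid, offset):
    return amp * np.exp(-(x - cen)**2 / (2 * wid**2)) + offset

# Build a fake emission line: all of these numbers are made up
rng = np.random.default_rng(seed=0)
wave = np.linspace(4980, 5040, 120)          # wavelengths in Å
true_params = [10.0, 5010.0, 3.0, 2.0]       # amp, cen, wid, offset
flux = gaussian(wave, *true_params) + rng.normal(0, 0.3, wave.size)

# Start the fit from a rough guess read off the data itself
p0 = [flux.max() - np.median(flux),   # line height above the continuum
      wave[np.argmax(flux)],          # wavelength of the brightest pixel
      5.0,                            # rough width in Å
      np.median(flux)]                # continuum level
fit, cov = curve_fit(gaussian, wave, flux, p0=p0)

print(f"True center: {true_params[1]:.2f} Å, fitted center: {fit[1]:.2f} Å")
```

The fitted center should land very close to the true one even though the data are noisy, which is why fitting is more reliable than reading the peak off by eye.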

In [None]:
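# Define a Gaussian function for fitting
def gaussian(x, amp, cen, wid, offset):
    return amp * np.exp(-(x - cen)**2 / (2 * wid**2)) + offset

# Pre-made function to fit a Gaussian profile to an emission line
def fit_gaussian(filename, wave_range=(4980, 5020)):
    wave, flux = read_spectrum(filename)

    wave_min, wave_max = wave_range   # in Å, like the spectrum itself
    # Create a mask to isolate the wavelength region
    mask = (wave >= wave_min) & (wave <= wave_max)
    # Apply the mask and convert to plain NumPy arrays for fitting
    wave = np.asarray(wave[mask], dtype=float)
    flux = np.asarray(flux[mask], dtype=float)

    # Initial guess for the parameters, estimated from the data itself
    offset0 = np.median(flux)            # continuum level
    amp0 = flux.max() - offset0          # line height above the continuum
    cen0 = wave[np.argmax(flux)]         # wavelength of the brightest pixel
    wid0 = (wave_max - wave_min) / 10    # rough line width in Å
    p0 = [amp0, cen0, wid0, offset0]

    # Fit a Gaussian profile to the emission line
    fit, cov = curve_fit(gaussian, wave, flux, p0=p0)
    # fit contains the best-fit parameters (amp, cen, wid, offset)
    # cov is the covariance matrix; the square roots of its diagonal
    # are the error estimates for the parameters

    return fit, cov
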


In [None]:
# Run fit_gaussian() on the first spectrum

spectrum_fn = 'FIXME_spectrum1.txt'

# Define a new wavelength range (good practice to re-set it here)
min_wave = FIXME # Å
max_wave = FIXME # Å
test_wave_range = [min_wave, max_wave]

fit, cov = fit_gaussian(raw_data_dir+spectrum_fn, wave_range=test_wave_range)


In [None]:
# Let's plot the spectrum and the best-fit Gaussian together

fig, ax = plt.subplots(1,1, figsize=(8,4))
fig, ax = plot_spectrum(raw_data_dir+spectrum_fn, fig=fig, ax=ax, wave_range=test_wave_range)

# Use read_spectrum() to get the wavelength array for plotting
wave, flux = read_spectrum(raw_data_dir+spectrum_fn)

# Apply the gaussian function to the wavelength array
flux_fit = gaussian(wave, *fit)  
# Note: *fit unpacks the fit array into individual parameters
# This is the same as gaussian(wave, fit[0], fit[1], fit[2], fit[3])

# Plot the best-fit Gaussian
ax.plot(wave, flux_fit, color='red', lw=2, label='Best-fit Gaussian')
ax.set_title(f'Spectrum with Best-fit Gaussian for {spectrum_fn}')
ax.legend()
plt.show()

In [None]:
# Print out the best-fit parameters
print(f"Best-fit parameters for spectrum in {spectrum_fn}")
print(f"Amplitude: {fit[0]:.3f}") # :.3f means print 3 decimal places
print(f"Center: {fit[1]:.3f}")
print(f"Width: {fit[2]:.3f}")
print(f"Offset: {fit[3]:.3f}")


What is the observed wavelength of the emission line?  
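
The fit also comes with an uncertainty on that wavelength. With `curve_fit`, the usual recipe is to take the square root of the diagonal of the covariance matrix `cov`. The sketch below uses a made-up covariance matrix (not one from a real fit) so it runs on its own; with your real fit you would pass in the `cov` returned by fit_gaussian().

```python
import numpy as np

# Hypothetical covariance matrix for (amp, cen, wid, offset); values are made up
cov = np.array([
    [ 0.040, 0.000, -0.008, -0.002],
    [ 0.000, 0.010,  0.000,  0.000],
    [-0.008, 0.000,  0.012, -0.003],
    [-0.002, 0.000, -0.003,  0.004],
])

# The diagonal holds the variance of each parameter;
# its square root is the 1-sigma uncertainty
errors = np.sqrt(np.diag(cov))

for name, err in zip(["amp", "cen", "wid", "offset"], errors):
    print(f"Uncertainty on {name}: {err:.3f}")
```

The second entry is the uncertainty on the line center, in the same units as the wavelength array.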

## Do the same process for the second spectrum

In [None]:
# Now, let's repeat the fitting for the second spectrum
spectrum_fn = 'FIXME_spectrum2.txt'

FIXME
FIXME
FIXME



In [None]:
FIXME
FIXME
FIXME
FIXME


How different are the observed wavelengths of this emission line in spectrum 1 and spectrum 2? 

# Fitting emission lines in many spectra

Now that we've found the wavelength of an emission line in one spectrum, let's do the same procedure for a whole bunch of spectra!

## Crop the spectra

> Coding learning goal: 
*   Read in file names from a directory
*   Save a list of information to a .txt file
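
As a starting point for both goals, here is a minimal sketch that lists the files in a folder and saves a small table to a .txt file. It creates a throwaway folder so that it runs anywhere; the file names, the `.fits` filter, the output name `wave_ranges.txt`, and the wavelength ranges are all hypothetical and should be swapped for your own.

```python
import os
import tempfile

# Make a throwaway folder with a few empty files so the example runs anywhere.
# The file names here are hypothetical, not the project's real spectra.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["spec_b.fits", "spec_a.fits", "notes.md"]:
        open(os.path.join(demo_dir, name), "w").close()

    # os.listdir returns names in arbitrary order, so sort them
    filenames_arr = sorted(f for f in os.listdir(demo_dir) if f.endswith(".fits"))

    # Hypothetical wavelength ranges (Å), one per file
    wave_range_list = [[4990, 5030], [5100, 5150]]

    # Write one line per file: name, wave_min, wave_max
    out_path = os.path.join(demo_dir, "wave_ranges.txt")
    with open(out_path, "w") as f:
        for fn, (wmin, wmax) in zip(filenames_arr, wave_range_list):
            f.write(f"{fn} {wmin} {wmax}\n")

    # Read the file back to check what was saved
    with open(out_path) as f:
        saved_text = f.read()

print(filenames_arr)   # ['spec_a.fits', 'spec_b.fits']
print(saved_text)
```

Sorting the file names matters here: the wavelength ranges are matched to files by position, so the order has to be the same every time you run the code.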


In [None]:
# approx 10 spectra
# use os.listdir to get the filenames, save in filenames_arr

In [None]:
file_no = 0    # Change this number from 0 to 9

# show the full spectrum



# list of wave_range for each file_no
wave_range_list = [[FIXME,FIXME],   # 0
                   [FIXME,FIXME],   # 1
]

# example:
# wave_range_list = [[100,150],   # 0
#                    [200,230],   # 1
#                    [200,230],   # 2 ........
#                              ]  # 9


In [None]:
# save filename and wave_range_list into a .txt file


Just to save you some time, we helped you crop about 100 other spectra!

## For Loop to apply the procedure on many spectra

> Coding learning goal: 
*   Write for loops to apply a procedure to many files
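
The pattern for this step is: start with an empty list, loop over the spectra, measure each one, and append the result. The sketch below shows that pattern on made-up spectra. To keep it short, it uses a hypothetical stand-in called `find_line_center` (which just returns the wavelength of the brightest pixel) in place of fit_gaussian(); in your own loop you would call fit_gaussian() and keep the fitted center instead.

```python
import numpy as np

def find_line_center(wave, flux):
    """Hypothetical stand-in for fit_gaussian(): wavelength of the brightest pixel."""
    return wave[np.argmax(flux)]

# Three made-up spectra, each with one line at a different wavelength (Å)
wave = np.linspace(4900, 5600, 701)     # 1 Å steps
line_positions = [5010.0, 5260.0, 5510.0]
spectra = [np.exp(-(wave - cen)**2 / (2 * 3.0**2)) for cen in line_positions]

# Loop over the spectra and collect one measurement per spectrum
fitted_centers = []
for flux in spectra:
    fitted_centers.append(find_line_center(wave, flux))

# Convert the list to an array so it is easy to save or plot later
fitted_centers = np.array(fitted_centers)
print(fitted_centers)
```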



In [None]:
# write a for loop to use fit_gaussian() on all our files
# and save the fitted wavelength in an array

In [None]:
# show a few fits to make sure the code is okay

In [None]:
# save array to the .txt file

# Adding redshift information to our catalog

> Coding learning goal: 
*   Transfer information from one file into our main file
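
One simple way to do this, sketched below, is to match entries by file name using a dictionary, so that each wavelength is paired with the right redshift even if the two files list the galaxies in different orders. All file names and numbers here are made up for illustration; your real catalog and redshift file will look different.

```python
# Hypothetical main catalog: file names and fitted wavelengths (Å)
filenames = ["spec_a.fits", "spec_b.fits", "spec_c.fits"]
obs_wavelengths = [1050.0, 1100.0, 1020.0]

# Hypothetical second file, in a different order: file name -> redshift
redshift_lookup = {
    "spec_c.fits": 0.02,
    "spec_a.fits": 0.05,
    "spec_b.fits": 0.10,
}

# Match by file name so every wavelength is paired with the right redshift
redshifts = [redshift_lookup[fn] for fn in filenames]

for fn, lam, z in zip(filenames, obs_wavelengths, redshifts):
    print(f"{fn}: observed wavelength = {lam} Å, z = {z}")
```

Matching by name rather than by position is the safer choice: if a file is missing from the lookup, Python raises a `KeyError` instead of silently pairing the wrong values.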



# Plotting our results

## Observed wavelength vs. redshift

In [None]:
# Populate a redshift array (x)

# Populate an observed wavelength array (y)



In [None]:
# Plot

## Fit a model to the data

$\lambda_{\mathrm{obs}} = \lambda_{\mathrm{rest}} (1+z)$ \\
where ....

This looks like a straight line: $y = mx+b$ \\
What is $x$? What is $y$?
What are $m$ and $b$?
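
Once you have your own answers, it may help to multiply out the right-hand side: $\lambda_{\mathrm{obs}} = \lambda_{\mathrm{rest}}\, z + \lambda_{\mathrm{rest}}$. The sketch below checks this on made-up data using `np.polyfit`, which is a different tool from the `curve_fit` approach used in this notebook. The "rest wavelength" of 1000 Å is a pretend value chosen for illustration, not the answer for the real emission line.

```python
import numpy as np

# Made-up data: a pretend line with rest wavelength 1000 Å (not a real answer)
rest_true = 1000.0
z = np.array([0.01, 0.05, 0.10, 0.20, 0.30])
lam_obs = rest_true * (1 + z)

# Fit y = m*x + b with x = z and y = observed wavelength
m, b = np.polyfit(z, lam_obs, deg=1)
print(f"slope m = {m:.1f} Å, intercept b = {b:.1f} Å")
```

With noise-free data like this, the slope and the intercept both come out equal to the pretend rest wavelength. With real, noisy measurements they will differ slightly from each other.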

In [None]:
# Define the equation

def straight_line(x, FIXME):

  return FIXME


In [None]:
# Use scipy.optimize.curve_fit()



## Plot our best-fit line

In [None]:
# Plot data and overlay best-fit

# What is the rest wavelength of this emission line??