# <p style="background-color: #f5df18; padding: 10px;">Programming & Plotting in Python | **Plotting** </p>

<div style="display: flex;">
    <div style="flex: 1; margin-right: 20px;">
        <h2>Questions</h2>
        <ul>
            <li>How can I plot my data?</li>
            <li>How can I save my plot for publishing?</li>
        </ul>
    </div>
    <div style="flex: 1;">
        <h2>Learning Objectives</h2>
        <ul>
            <li>Create a scatter plot showing the relationship between two data sets.</li>
            <li>Create a time series plot showing a single data set.</li>
        </ul>
    </div>
</div>

## [`matplotlib`](https://matplotlib.org/) is the most widely used scientific plotting library in Python.
![](https://upload.wikimedia.org/wikipedia/en/thumb/5/56/Matplotlib_logo.svg/540px-Matplotlib_logo.svg.png)
- We commonly use a sub-library called [`matplotlib.pyplot`](https://matplotlib.org/stable/tutorials/introductory/pyplot.html).
- The Jupyter Notebook will render plots inline by default.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

- Simple plots are then (fairly) simple to create.
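
As a minimal sketch of such a plot (the `time` and `position` values are made up for illustration, not taken from any data set in this lesson):

```python
import matplotlib.pyplot as plt

# Illustrative values only, not real measurements
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

# plt.plot returns a list of Line2D objects, one per curve drawn
lines = plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
```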

## 🔔 Display All Open Figures

In our Jupyter Notebook example, running the cell should generate the figure directly below the code.
The figure is also included in the Notebook document for future viewing.
However, other Python environments like an interactive Python session started from a terminal
or a Python script executed via the command line require an additional command to display the figure.

Instruct `matplotlib` to show a figure:

```python
plt.show()
```

This command can also be used within a Notebook - for instance, to display multiple figures
if several are created by a single cell.


Here are some additional arguments that we can pass to the `plot` function:

* **`linestyle` or `ls`** takes the type of line you want (`'dashed'`, `'solid'`, `'dashdot'`, `'dotted'`), or you can "draw" it with the shorthand symbols (`'--'`, `'-'`, `'-.'`, `':'`). Either way, the argument must be given as a string (in quotes).

* **`linewidth` or `lw`** makes your line thinner or thicker, and takes a number as input.

* **`color` or `c`** takes a color's name as input, which also needs to be in quotes.

If you want to look closer at the default options for these parameters, or see what else is available, [here is the documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D).
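
A short sketch of these arguments in use, with two made-up curves. The first call uses the full argument names and the second uses the short aliases:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)   # 50 evenly spaced points from 0 to 10

plt.figure()
# Named line style, explicit width, named color
line_a, = plt.plot(x, x, linestyle='dashed', linewidth=3, color='green')
# Short aliases, with the line style "drawn" as a symbol string
line_b, = plt.plot(x, 2 * x, ls='-.', lw=1, c='red')
```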


## **Plot labeling and saving**

While our plot does show the data we gave, it doesn't really tell us anything. We want to know what is being plotted! Here are important functions for labeling our plot and saving it to a file once it is done:

- `plt.xlabel(input)`: Creates a label for the x axis
- `plt.ylabel(input)`: Creates a label for the y axis
- `plt.title(input)`: Creates a title for the plot
- `plt.legend()`: Creates a legend for the figure. By default matplotlib will attempt to place the legend in a suitable position, but you can override this with the `loc=` argument, e.g., `loc='upper left'` places the legend in the upper left corner of the plot.
- `plt.savefig(savename)`: Saves the figure to the file named `savename`

Note that the `legend` function is really useful when you have multiple curves to plot **but** only works if you have given each curve a `label` argument. Check it out in this example:
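
A minimal sketch of labeling and adding a legend, using two made-up curves (the curve names and title are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 20)

plt.figure()
# The label given here is the text that plt.legend() will display
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')

plt.xlabel('x')
plt.ylabel('y')
plt.title('Two labeled curves')
legend = plt.legend(loc='upper left')
```

If the `label` arguments are left out, `plt.legend()` has nothing to list and matplotlib issues a warning instead of drawing entries.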

## **Subplots**

Sometimes, plotting two lines on the same figure doesn't look appealing, and you might prefer to display them separately but side by side. For this, we can use subplots.

First, we will create a figure, then add subplots to it and plot within each subplot separately. Here are some commands we will use:

- `plt.figure()`: Generates an empty figure to build upon

- `plt.subplot(XYZ)`: Sets the current subplot (and generates it if necessary). X specifies the number of rows, Y the number of columns, and Z the index of the subplot you want to work on, counting from 1 along each row, left to right and top to bottom. A grid of X * Y subplot spaces will be created, so plan accordingly.

- `plt.tight_layout(pad = X)`: Adjusts the spacing so that subplots and their labels do not overlap. `pad` sets the padding as a fraction of the font size.

Now, let's see these commands in action.
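
One possible sketch, assuming two made-up curves (sine and cosine) placed side by side in a grid of 1 row and 2 columns:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

plt.figure(figsize=(8, 3))   # figsize (width, height in inches) is an illustrative choice

plt.subplot(121)             # 1 row, 2 columns, work on subplot 1 (left)
plt.plot(x, np.sin(x))
plt.title('sin(x)')

plt.subplot(122)             # same grid, switch to subplot 2 (right)
plt.plot(x, np.cos(x))
plt.title('cos(x)')

plt.tight_layout(pad=2)      # adjust spacing so titles and labels do not overlap
fig = plt.gcf()              # keep a reference to the figure that holds both subplots
```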

## **Other types of 2D plots**

But what if the data we're plotting doesn't look good on a line plot and would be better represented as a scatter plot, bar graph, or log plot? You can use the following commands for different types of plots:

- `plt.semilogy(data)`: Creates a log plot, where only the Y axis is log scaled
- `plt.semilogx(data)`: Creates a log plot, where only the X axis is log scaled
- `plt.scatter(x,y)`: Creates a scatter plot, requires both X and Y input
- `plt.bar(x,y)`: Creates a bar plot, requires both X and Y input
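
As a rough sketch, the same made-up data can be drawn three ways for comparison. The values here are chosen only because they span several orders of magnitude:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)          # 1, 2, 3, 4, 5
y = 10.0 ** x                # spans several orders of magnitude

plt.figure(figsize=(10, 3))

plt.subplot(131)
plt.scatter(x, y)            # individual points on linear axes
plt.title('scatter')

plt.subplot(132)
plt.bar(x, y)                # one bar per x value
plt.title('bar')

plt.subplot(133)
plt.semilogy(x, y)           # x values may be passed before the y data
plt.title('semilogy')

plt.tight_layout()
axes = plt.gcf().axes        # list of the three subplot axes
```

On the linear axes the smallest values are squashed against zero, while the log-scaled Y axis shows all five points clearly.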

In [None]:
#Create data
x1 = np.linspace(0,10,11)
y1 = x1 * 1000 + 100

#Plotting


## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Multiple plots exercise </p>

Make a figure with three side-by-side plots: the first a straight line, the second $x^2$, and the third $x^3$. Make sure to label each one.



In [None]:
# your solution here

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Plotting the Expansion Rate of the Universe Part I </p>


Read in the file `Hubble-1929.csv` that is stored in the `data/` directory. The file contains three columns (Object, Distance, and Velocity), which represent the galaxy name, distance in units of 1e6 pc (or Mpc), and velocity in km/s. Recreate the scatter plot from [Hubble's 1929 paper](https://www.pnas.org/doi/10.1073/pnas.15.3.168), which provided evidence that the universe is expanding.

In [None]:
# your answer here

hubble_data = #



## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Plotting the Expansion Rate of the Universe Part II </p>

The data from Hubble 1929 is consistent with the idea that the farther a galaxy is, the faster it is moving away from us. This observation was one of the first pieces of evidence that the universe is expanding.

We want to fit a line to this data assuming the relation:

$$
v = H_0 \times d
$$

where $H_0$ (the Hubble parameter) is the slope.

You can derive the best-fit slope by minimizing the sum of squared residuals, which leads to the formula:

$$
H_0 = \frac{\sum_i v_i \, d_i}{\sum_i d_i^2}
$$

This formula gives the value of $H_0$ that best fits the data, assuming the line passes through the origin (i.e., zero velocity at zero distance).
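
For readers who want the intermediate step, here is a short sketch of where the formula comes from. The symbol $S$ (the sum of squared residuals) is introduced only for this derivation:

$$
S(H_0) = \sum_i \left( v_i - H_0 \, d_i \right)^2
$$

Setting the derivative with respect to $H_0$ to zero gives:

$$
\frac{dS}{dH_0} = -2 \sum_i d_i \left( v_i - H_0 \, d_i \right) = 0
\quad \Rightarrow \quad
H_0 \sum_i d_i^2 = \sum_i v_i \, d_i
$$

Dividing both sides by $\sum_i d_i^2$ recovers the expression for $H_0$ given above.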

---

### Step 1: Write a function to compute $H_0$

- Use the formula above to calculate $H_0$ from the observed distances ($d_i$) and velocities ($v_i$).

### Step 2: Plot the data and the best-fit line

- Create an array of distances from 0 to 2.25 Mpc.
- Use the value of $H_0$ that you inferred to calculate the corresponding velocities using the formula $v = H_0 \times d$.
- Plot the original data points.
- Overplot the best-fit line (velocity vs. distance) as a solid line on the same plot.

In [None]:
### your solution here ##


# create a function to infer H0 using the formula shown above


# overplot a line with slope = H0 and b = 0.



## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Estimate the Age of the Universe from $H_0$ </p>

The Hubble constant $H_0$ describes the rate of expansion of the universe in units of kilometers per second per megaparsec (km/s/Mpc).

Because $H_0$ has units of inverse time (after appropriate conversion), you can use it to estimate the age of the universe by calculating the time it would take for galaxies to reach their current distances assuming a constant expansion rate.

#### Why does this work?

- If the universe has been expanding at a roughly constant rate, the age of the universe is approximately the inverse of the expansion rate.
- In other words, the age is approximately:

$$
\text{Age} \approx \frac{1}{H_0}
$$

#### Your task:

1. Convert $H_0$ from units of km/s/Mpc to units of $\text{s}^{-1}$.
   - For reference:  
     - 1 Mpc = $3.086 \times 10^{19}$ km  
     - 1 year = $3.154 \times 10^{7}$ seconds
2. Compute the age of the universe in years using the inverse of $H_0$.
3. Compare your estimate to the currently accepted age of the universe (~13.8 billion years).
4. Reflect on why Hubble's original estimate of $H_0$ implied a much younger universe.


In [None]:
### your answer here ###

Interpretation:

- Hubble's original estimate of $H_0$ implied an age of only about 2 billion years, much younger than the currently accepted age of about 13.8 billion years.
- This discrepancy arose because of:
  - Overestimated $H_0$ due to errors in distance measurements and calibration.
  - Limitations in technology and understanding at the time.
  - Assumptions of constant expansion rate, ignoring cosmic acceleration or deceleration.


This estimate gives a rough age of the universe assuming a constant expansion rate. Historically, Hubble's original value of $H_0 \approx 500$ km/s/Mpc implied an age of only about 2 billion years, which was already known to be too short, even in the 1920s, because it conflicted with geological and stellar estimates of Earth's and stars' ages.

Today, more accurate measurements give:

$$
H_0 \approx 70 \, \text{km/s/Mpc}
$$

This leads to an inferred age of the universe of about 13.8 billion years, which is consistent with:

- **Stellar evolution models**, which estimate the ages of the oldest stars and globular clusters.
- **Radiometric dating** of Earth and Moon rocks, which place the age of the solar system at about 4.6 billion years.
- **Cosmic Microwave Background (CMB)** measurements, such as those from the Planck satellite, which provide precise constraints on cosmological parameters including the age of the universe.
- **Nucleosynthesis** predictions, which relate the abundance of light elements like helium and deuterium to conditions in the early universe.

These independent lines of evidence converge around an age of 13.7–13.8 billion years, lending confidence to modern values of $H_0$.
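
The two ages quoted in this section can be checked with a short calculation. This is only a sketch of the constant-expansion estimate $1/H_0$, using the conversion factors listed in the task. The function name `hubble_time_years` is a hypothetical helper, not part of any library:

```python
# Conversion factors quoted in the exercise above
KM_PER_MPC = 3.086e19        # kilometers in one megaparsec
SECONDS_PER_YEAR = 3.154e7   # seconds in one year


def hubble_time_years(h0_km_s_mpc):
    """Return 1/H0 in years for H0 given in km/s/Mpc (constant-expansion estimate)."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC   # km cancels, leaving units of 1/s
    age_seconds = 1.0 / h0_per_second
    return age_seconds / SECONDS_PER_YEAR


age_hubble_1929 = hubble_time_years(500)   # Hubble's original value
age_modern = hubble_time_years(70)         # modern value quoted above

print(f"H0 = 500 km/s/Mpc -> {age_hubble_1929:.2e} yr")
print(f"H0 =  70 km/s/Mpc -> {age_modern:.2e} yr")
```

With these constants the first value comes out near 2 billion years and the second near 14 billion years. The simple $1/H_0$ estimate lands close to, but not exactly on, 13.8 billion years because it ignores changes in the expansion rate over time.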

## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Plotting a Color–Color Diagram: Galaxies vs. Quasars</p>

Load the file `SDSS_2020.csv` as a Pandas DataFrame. This file contains data on astronomical objects observed by the Sloan Digital Sky Survey (SDSS), including their brightness in different filters and classifications as stars, galaxies, or quasars.

You will create a **color–color diagram**, a plot of one color (the difference in brightness between two filters) against another, to visualize the distribution of **galaxies** and **quasars** in color space. These diagrams are useful in astronomy because different types of objects (like stars, galaxies, and quasars) often occupy distinct regions of color–color space based on their physical properties, such as temperature, age, and emission mechanisms.

### Instructions

1. **Load the SDSS data** into a DataFrame.  
2. **Compute two colors** for each object:  
   - `u - g` (difference between the u-band and g-band magnitudes)  
   - `g - r` (difference between the g-band and r-band magnitudes)  
3. **Filter the data** by creating masks to select only objects where the `class` column equals `'GALAXY'` or `'QSO'` (quasars).  
4. **Create a scatter plot**:  
   - **X-axis:** `u - g`  
   - **Y-axis:** `g - r`  
   - Use different colors or symbols to distinguish **galaxies** and **quasars**

---

### Questions

- Based on your plot, how are quasars and galaxies distributed in color-color space?

## Background Information

- **Colors** in astronomy are differences between magnitudes in two filters. A smaller color value means the object is **<font color="blue"><b>bluer</b></font>**, which typically indicates it's **hotter** or **younger**. A larger color value means the object is **<font color="red"><b>redder</b></font>**, which typically indicates it's **cooler** or **older**.
- **SDSS filter wavelengths**:
  - `u` (ultraviolet): ~300–400 nm
  - `g` (blue/green): ~400–550 nm
  - `r` (red): ~550–700 nm
- **Quasars (QSOs)** are **active galactic nuclei (AGN)**, powered by material falling into a supermassive black hole. They emit strong UV light from very hot gas, making them **bluer** than most galaxies.
- **Galaxies** contain a mix of old and young stars, but many (especially inactive ones) are **redder**, as older stars dominate their light.
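
To make the idea of a color and a boolean mask concrete, here is a small sketch on an invented three-row table. The numbers are not SDSS measurements, and the column names `u`, `g`, and `r` are an assumption about how the magnitude columns are labeled:

```python
import pandas as pd

# Tiny invented table for illustration only; these are NOT SDSS measurements,
# and the column names 'u', 'g', 'r' are assumed rather than taken from the file.
toy = pd.DataFrame({
    'u': [19.5, 18.9, 20.1],
    'g': [17.8, 18.7, 18.5],
    'r': [17.0, 18.5, 17.6],
    'class': ['GALAXY', 'QSO', 'GALAXY'],
})

ug = toy['u'] - toy['g']          # u - g color
gr = toy['g'] - toy['r']          # g - r color

is_qso = (toy['class'] == 'QSO')  # boolean mask, True where the row is a quasar

# In this invented table the quasar has the smaller u - g value, i.e. it is bluer
print(ug[is_qso])
print(ug[~is_qso])
```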

---

In [None]:
# Load the data using Pandas
data = #

# This line keeps every 100th row, reducing the number of objects a hundredfold to improve visualization clarity
data = data[::100]

# Extract colors and spectral class
ug = # compute u - g color
gr = # compute g - r color
spec_class = # access 'class' column

# Create boolean masks to select galaxies and quasars only
galaxies = (spec_class == 'GALAXY')
qsos = (spec_class == 'QSO')

# Prepare the plot
fig, ax = #

# Set limits for x and y axes
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 1.5)

# Plot color distribution for galaxies
ax.scatter(#

# Plot color distribution for quasars
ax.scatter(#

# Add legend and axis labels
ax.legend(loc=2)
ax.set_xlabel('$u-g$')
ax.set_ylabel('$g-r$')

plt.show()


## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Visualizing the Whirlpool Galaxy Part I </p>

In this exercise, you'll practice loading astronomical images and visualizing them using `matplotlib.pyplot.imshow()`.

### 🌌 Background

We'll use real **Hubble Space Telescope** data of the **Whirlpool Galaxy (M51)** taken in three different filters. Each filter name follows a standard format: `F###X`, where `###` is the central wavelength in **nanometers (nm)**, and `X` indicates the **bandwidth** (e.g., `W` for **wide**).

- **F435W (B)**: captures blue light  
- **F555W (V)**: captures green/yellow light  
- **F814W (I)**: captures red/near-infrared light  

These images are stored as `.fits` files in a local `data/` folder, following the naming pattern `h_m51_filter_s20_drz_sci.fits`, where `filter` can be `b`, `v`, or `i`.


### Instructions:

1. Choose one of the three images and load it using the provided `astropy.io.fits` code to extract the image data. We'll discuss `astropy` in more detail on Thursday and Friday.
2. The images are composed of arrays of pixel values with a scale of 0.20 arcseconds per pixel. Take the **logarithm** of the pixel values using `np.log10(image_data)` to enhance faint features.
3. Use `plt.imshow()` to visualize the image.
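
To illustrate why the logarithm helps, here is a sketch on a synthetic image (a bright blob on a faint background) rather than the real FITS data. The `cmap` and `origin` settings are illustrative choices, not requirements:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for image data: a bright Gaussian blob on a faint background.
# Values are strictly positive, so np.log10 is defined everywhere.
yy, xx = np.mgrid[0:100, 0:100]
image_data = 1.0 + 1000.0 * np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / 50.0)

log_image = np.log10(image_data)

plt.figure(figsize=(5, 5))
# origin='lower' puts row 0 at the bottom of the displayed image
im = plt.imshow(log_image, cmap='gray', origin='lower')
plt.colorbar(label='log10(pixel value)')
plt.title('Log-scaled synthetic image')
```

Real images may contain zero or negative pixel values, for which `np.log10` returns `-inf` or `nan` along with a warning, so some care may be needed there.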


In [None]:
from astropy.io import fits

# Load the FITS file
fits_file = # your solution here
with fits.open(fits_file) as hdul: # this opens the FITS (Flexible Image Transport System) file
    image_b = hdul[0].data # this extracts the pixel data

# Display the image
plt.figure(figsize=(6, 6)) # creates a figure 6 in by 6 in
plt.imshow( # your solution here
plt.title(# your solution here
plt.show()


## <p style="background-color: #f5df18; padding: 10px;"> 🛑 Visualizing the Whirlpool Galaxy Part II </p>


Now, you'll use the `make_lupton_rgb` function from the `astropy.visualization` module to create a true-color composite (RGB) image of the Whirlpool Galaxy (M51). For convenience, a helper function has been provided to load images from FITS files.


### Instructions

1. Use the helper function `load_fits` to load the three FITS files:
   - `"h_m51_b_s20_drz_sci.fits"` (F435W, blue)
   - `"h_m51_v_s20_drz_sci.fits"` (F555W, green)
   - `"h_m51_i_s20_drz_sci.fits"` (F814W, red)

2. Call `make_lupton_rgb(image_r, image_g, image_b, stretch=..., Q=...)` to combine the three filter images into one RGB image.

   Assign the filters to RGB channels as follows:
   - Red channel → F814W
   - Green channel → F555W
   - Blue channel → F435W

 **Suggestion:** If one color (like red) appears too strong in your RGB image, try adjusting the relative contribution of each channel by multiplying or dividing the image arrays before passing them to `make_lupton_rgb()`.  
   Example:  
   ```python
   image_r_adjusted = image_r / 2.0
   ```

3. Use the `stretch` and `Q` parameters to control the brightness and contrast of the image. Use tab completion or `help(make_lupton_rgb)` to learn more about these parameters.

4. Display the final RGB image using `plt.imshow()`.

In [None]:
def load_fits(fits_file):
    """
    Loads a .fits file and extracts the image data.

    Parameters:
        fits_file (str): File path to the .fits image.

    Returns:
        image (2D array): The pixel data array.
    """
    with fits.open(fits_file) as hdul:
        image = hdul[0].data
    return image

In [None]:
from astropy.visualization import #

# Load images using the helper function
image_b =  #
image_v =  #
image_i =  #

## suggestion: try rescaling the pixel values

# Create RGB composite image
rgb_image = make_lupton_rgb( #

# Plot the image
plt.figure(figsize=(8, 8))
plt.imshow(#
plt.title(#
plt.show()

## 🔔 Saving your plot to a file
---

If you are satisfied with the plot you see, you may want to save it to a file,
perhaps to include it in a publication. There is a function in the
`matplotlib.pyplot` module that accomplishes this:
[savefig](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html).
Calling this function, e.g. with

```python
plt.savefig('my_figure.png')
```

will save the current figure to the file `my_figure.png`. The file format
will automatically be deduced from the file name extension (other formats
are pdf, ps, eps and svg).

Note that functions in `plt` refer to a global figure variable
and after a figure has been displayed to the screen (e.g. with `plt.show`)
matplotlib will make this variable refer to a new empty figure.
Therefore, make sure you call `plt.savefig` before the plot is displayed to
the screen, otherwise you may find a file with an empty plot.

When using dataframes, data is often generated and plotted to screen in one line.
In addition to using `plt.savefig`, we can save a reference to the current figure
in a local variable (with `plt.gcf`) and call the `savefig` method of
that variable to save the figure to file.

```python
data.plot(kind='bar')
fig = plt.gcf() # get current figure
fig.savefig('my_figure.png')
```
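
The same idea works without dataframes. The sketch below keeps a reference to the figure and saves it before anything is displayed. The curve is made up, and the output goes to the system temporary directory only to keep the example tidy:

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 10)

fig = plt.figure()           # keep a reference to the figure we are drawing on
plt.plot(x, x**2)
plt.xlabel('x')
plt.ylabel('x squared')

# Written to the system temp directory here only to keep the sketch tidy;
# any writable path works
out_path = os.path.join(tempfile.gettempdir(), 'my_figure.png')

# Save before showing; dpi and bbox_inches are optional keyword arguments
fig.savefig(out_path, dpi=150, bbox_inches='tight')
```

A higher `dpi` gives a sharper raster image, and `bbox_inches='tight'` trims surplus whitespace around the plot.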

## 🔔 Making your plots accessible
---

Whenever you are generating plots to go into a paper or a presentation, there are a few things you can do to make sure that everyone can understand your plots.

- Always make sure your text is large enough to read. Use the `fontsize` parameter in `xlabel`, `ylabel`, `title`, and `legend`, and [`tick_params` with `labelsize`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.tick_params.html) to increase the text size of the numbers on your axes.
- Similarly, you should make your graph elements easy to see. Use `s` to increase the size of your scatterplot markers and `linewidth` to increase the sizes of your plot lines.
- Using color (and nothing else) to distinguish between different plot elements will make your plots unreadable to anyone who is colorblind, or who happens to have a black-and-white office printer. For lines, the `linestyle` parameter lets you use different types of lines. For scatterplots, `marker` lets you change the shape of your points. If you're unsure about your colors, you can use [Coblis](https://www.color-blindness.com/coblis-color-blindness-simulator/) or [Color Oracle](https://colororacle.org/) to simulate what your plots would look like to those with colorblindness.
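
A sketch that combines these suggestions, using made-up curves and labels. The specific sizes are illustrative starting points rather than recommended values:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)

fig, ax = plt.subplots()

# Distinguish the curves by line style and marker shape, not by color alone
line_a, = ax.plot(x, x, linestyle='-', linewidth=3, label='model A')
line_b, = ax.plot(x, 1.5 * x, linestyle='--', linewidth=3, label='model B')
points = ax.scatter(x[::5], x[::5] + 2, s=80, marker='^', label='data')

# Larger text for labels, title, legend, and tick numbers
ax.set_xlabel('x', fontsize=16)
ax.set_ylabel('y', fontsize=16)
ax.set_title('Readable plot', fontsize=18)
ax.legend(fontsize=14)
ax.tick_params(labelsize=14)
```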

# <p style="background-color: #f5df18; padding: 10px;"> 🗝️ Key points</p>
---

- [`matplotlib`](https://matplotlib.org/) is the most widely used scientific plotting library in Python.
- Plot data directly from a Pandas dataframe.
- Select and transform data, then plot it.
- Many styles of plot are available: see the [Python Graph Gallery](https://python-graph-gallery.com/matplotlib/) for more options.
- Can plot many sets of data together.