### ***Names: [Insert Your Names Here]***

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas
%matplotlib inline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Lab 4 - Plotting and Fitting with Hubble's Law

This lab builds directly on Prelab 4. In it, you'll manipulate and fit both Hubble's original data and the modern data for the same galaxies. The cells below pull the original and modern values of the relevant quantities into this notebook and set them up for the exercises. Take a moment to make sure you understand what they're doing. Before running them, make sure you have mounted your Google Drive, and point the `file_path` variable in the second cell below to where you've stored your cat.txt file from the prelab.

In [None]:
dists = np.array([0.032,0.034,0.214,0.263,0.275,0.275,0.45,0.5,0.5,0.63,0.8,0.9,0.9,
         0.9,0.9,1.0,1.1,1.1,1.4,1.7,2.0,2.0,2.0,2.0])#Mpc
vels = np.array([170.,290,-130,-70,-185,-220,200,290,270,200,300,-30,650,150,500,920,450,500,500,960,500,850,800,1000]) #km/sec

In [None]:

cols = ['Obj Name', 'Redshift', 'Redshift Uncert', 'Dist Mean (Mpc)', 'Dist Std Dev (Mpc)', 'Num Obs']
file_path = '/content/drive' ##replace/add the path to your cat.txt file
df = pandas.read_csv(file_path, delimiter ='|', skiprows=3, header = 0, names = cols, skipinitialspace=True)
redshift = df["Redshift"].tolist()
redshift_uncert = df["Redshift Uncert"].tolist()
dists2 = df["Dist Mean (Mpc)"].tolist()
dists2_uncert = df["Dist Std Dev (Mpc)"].tolist()

In [None]:
#display table (python "data frame" object)
df

---
## Exercise 1 
---
The conversion between a "[redshift](https://en.wikipedia.org/wiki/Redshift)" (z) as provided in the database and a galaxy's "recessional velocity", which is how this quantity appears in Hubble's original paper, is given by the formula below. 
$$1+z=\sqrt{\frac{1+\beta}{1-\beta}}$$
where $\beta=v/c$ and $c$ is the speed of light, so the quantity $\beta$ is just a fractional measure of the galaxy's speed relative to the speed of light. The laws of physics tell us that $v$ cannot exceed $c$, so for a receding galaxy this quantity should always be between 0 and 1. The formula can be rewritten in a form more useful for solving for $v$ as:
$$\beta=\frac{(z+1)^2-1}{(z+1)^2+1}$$
or
$$v=\frac{(z+1)^2-1}{(z+1)^2+1}\times c$$
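
The rearrangement can be checked step by step. This is a sketch that assumes the standard relativistic Doppler relation $1+z=\sqrt{(1+\beta)/(1-\beta)}$ as the starting point: square both sides, clear the denominator, and collect the terms in $\beta$.

$$
\begin{aligned}
(1+z)^2 &= \frac{1+\beta}{1-\beta} \\
(1+z)^2\,(1-\beta) &= 1+\beta \\
(1+z)^2-1 &= \beta\left[(1+z)^2+1\right] \\
\beta &= \frac{(z+1)^2-1}{(z+1)^2+1}
\end{aligned}
$$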

a) Write a function with an appropriate docstring that applies this formula to an input array of redshifts. Your function should return an array of velocities in units of km/sec.  
b) Apply your new function to your redshift and redshift uncertainty arrays from the modern Hubble's law data to translate them to "recessional velocities", as in Hubble's original plot.  

\* Note that technically we should do some more complicated error propagation here, and we will discuss this later in this class. Luckily, though, for small redshifts like the ones in this dataset the formula is roughly equivalent to z = v/c, which means that errors in z can be translated directly into errors in v. 
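
The claim in the note can be checked numerically. The sketch below is one possible way to do it, not the required solution: the function name `redshift_to_velocity`, the constant `C_KM_S`, and the demo redshifts are all illustrative choices that do not come from the lab or the catalog.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/sec


def redshift_to_velocity(z):
    """Convert redshift(s) to recessional velocity in km/sec.

    Uses v = c * ((z+1)^2 - 1) / ((z+1)^2 + 1). Accepts a scalar,
    list, or numpy array and returns a numpy array (or numpy scalar).
    """
    z = np.asarray(z, dtype=float)
    ratio = (z + 1.0) ** 2
    return (ratio - 1.0) / (ratio + 1.0) * C_KM_S


# Illustrative redshifts only (assumed values, NOT catalog data)
z_demo = np.array([0.001, 0.01, 0.1])
v_exact = redshift_to_velocity(z_demo)  # full formula
v_approx = C_KM_S * z_demo              # small-z approximation v = c*z
frac_diff = (v_approx - v_exact) / v_exact

for z, ve, va, fd in zip(z_demo, v_exact, v_approx, frac_diff):
    print(f"z={z:<6} exact={ve:10.2f} km/s  c*z={va:10.2f} km/s  diff={100 * fd:.2f}%")
```

The fractional difference between the two expressions grows roughly as z/2, so the approximation is very good for nearby galaxies and degrades as the redshift increases.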

---

In [None]:
#part a here

In [None]:
#part b here

---

## Exercise 2
---

Make the following plots, with appropriate axis labels and titles. 

a) A plot of the new data similar to the one you made in exercise 5b of the prelab, only with error bars this time. Use the function plt.errorbar and inflate the errors in the modern recessional velocities by a factor of 10. With today's measurement techniques, the errors for these very nearby galaxies are so small that we can't even see them unless we blow them up.  
b) A plot showing both the new and old data overplotted on the same graph, with different colors for the original and modern data and a legend.   
c) A plot showing Hubble's distances vs. the new distances, with a "1 to 1" (slope =1) line overplotted  
d) A plot showing Hubble's recessional velocities vs. the new velocities, with a "1 to 1" line overplotted  
e) Discuss at least two trends that you see in the graphs and make a data-driven argument for how they might explain the discrepancy between the modern values and Hubble's. Additionally, make an argument for what kind of error you think is at play here (random or systematic?). As always, your explanations need not be lengthy, but they should be ***clear and specific***, pointing to precise statistics and features of the plots and data arrays. 
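
A minimal sketch of the two plotting ingredients named above: `plt.errorbar` with inflated error bars, and a "1 to 1" reference line. The arrays `x_demo`, `y_demo`, and `y_err_demo` are made-up placeholder values, not measurements, and the axis labels are generic; swap in your own arrays and labels.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder values (NOT real data), just to show the calls
x_demo = np.array([0.5, 1.0, 1.5, 2.0])
y_demo = np.array([0.9, 2.1, 2.8, 4.3])
y_err_demo = np.array([0.01, 0.02, 0.01, 0.03])
inflate = 10  # error inflation factor

# Points with (inflated) vertical error bars; fmt sets marker and color
plt.errorbar(x_demo, y_demo, yerr=inflate * y_err_demo, fmt="cs",
             capsize=3, label=f"demo data (errors x{inflate})")

# A 1 to 1 line spanning the full range covered by both axes
lims = np.array([min(x_demo.min(), y_demo.min()),
                 max(x_demo.max(), y_demo.max())])
one_to_one, = plt.plot(lims, lims, "k--", label="1 to 1 line")

plt.xlabel("x quantity (units)")
plt.ylabel("y quantity (units)")
plt.title("Error bars and a 1 to 1 line")
leg = plt.legend(loc="upper left")
```

Points that fall above the dashed line have larger y values than x values, and points below it have smaller ones, which makes systematic offsets between two sets of measurements easy to spot.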

In [None]:
#Plot a here

In [None]:
#Plot b here

In [None]:
#Plot c here

In [None]:
# Plot d here

***Part e explanations here***

---

## Exercise 3
---

Time for fitting! Use the ideas you encountered in the Prelab to complete this exercise. The function below is a good place to start. 

In [None]:
#line with an intercept
def slopeintfunc(x,sl,incpt):
    return sl*x+incpt


***I strongly encourage you to avoid copy/pasting from one notebook to the other. Rather, try to understand what the Model Fitting Intro in the prelab is demonstrating and then apply it in this notebook. This helps to ensure that you understand what everything is doing.*** 

a) Fit a linear model to Hubble's data and one to the modern data. Make a plot showing both datasets and both fit lines. The plot should include a legend with both the points and the lines. The lines should be labeled in the legend with their y=mx+b equations. 

b) Now, let's fit a linear model ***to the modern data only*** that takes the error bars in the recessional velocities into account in the fit. The problem here though is that the uncertainties in redshifts/recessional velocities are VERY small for these galaxies. So small in fact that when you overplot error bars on the data points you can't even see them (you can do this to verify). So to demonstrate differences between weighted and unweighted fits here, let's inflate them by a factor of 50. Overplot both the unweighted and weighted lines together with the modern data (with y error bars) and an appropriate legend. 

c) Discuss at least one trend or effect that you see in each graph. As always, your explanations need not be lengthy, but they should be ***clear, supported with references to the plot, and specific***. 

d) We won't do fitting with x and y error bars here, but you can easily make a plot that shows errors in both quantities using plt.errorbar. Do this using the TRUE errors in velocity and distance (not the inflated values), compute the fit, and use your plot to make an argument for whether the "Hubble's Law" line is a good fit to the data. 
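
As a hedged sketch of how weighted and unweighted fits differ, the example below calls `curve_fit` from `scipy.optimize` with and without its `sigma` argument. The import alias `optimization` is an assumption chosen to match the hint in this notebook, and the `*_demo` arrays are made-up placeholder values, not the lab data.

```python
import numpy as np
import scipy.optimize as optimization  # alias assumed to match the prelab


def slopeintfunc(x, sl, incpt):
    """Line with an intercept (same form as the function given above)."""
    return sl * x + incpt


# Hypothetical placeholder data (NOT real measurements).
# The last point sits off the trend and has a much larger uncertainty.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 12.0])
y_err_demo = np.array([0.1, 0.1, 0.1, 0.1, 2.0])

# Unweighted fit: every point counts equally
popt_unw, pcov_unw = optimization.curve_fit(slopeintfunc, x_demo, y_demo)

# Weighted fit: points with small sigma pull harder on the line
popt_w, pcov_w = optimization.curve_fit(slopeintfunc, x_demo, y_demo,
                                        sigma=y_err_demo, absolute_sigma=True)

# Legend-ready labels in y = mx + b form (the + flag keeps the sign of b)
label_unw = f"unweighted: y = {popt_unw[0]:.2f}x {popt_unw[1]:+.2f}"
label_w = f"weighted: y = {popt_w[0]:.2f}x {popt_w[1]:+.2f}"
print(label_unw)
print(label_w)
```

In this toy dataset the weighted slope comes out smaller than the unweighted one, because the uncertain outlier is down-weighted. The label strings can be passed directly to the `label` argument of `plt.plot` when overplotting the fit lines.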

In [None]:
#calculate the best fits (hint: use optimization.curve_fit, as in the model fitting intro)

In [None]:
#A plot for you to use as a starting point for fitting
#(vels2 is the array of modern recessional velocities that you computed in Exercise 1b)
#overplot your fit lines and label them
f = plt.plot(dists,vels, 'mo', label="Hubble's data")
plt.plot(dists2,vels2, 'cs', label="modern data")
plt.xlabel("Distance in Mpc")
plt.ylabel("Recessional Velocity in km/sec")
plt.title("Hubble's Law")
l = plt.legend(loc="lower right")

In [None]:
#calculate a fit with (inflated) errors in velocity accounted for

In [None]:
#plot with (inflated) y error bars and best fit line

***Part c: document at least one trend or effect that you note in each of the graphs above here***

In [None]:
#plot the true error bars in velocity AND distance.

***Make a data-driven argument about the quality of your fit to the data***

---

# Submitting Prelabs and Labs for Grading

Before submitting any Google Colab notebook for grading, please complete the following steps:

**1) Try running everything in one go (Runtime menu --> Restart and run all).**

Make sure the entire notebook runs from start to finish. If necessary, comment out any un-executable cells from the instructions portion of the lab so the whole notebook will execute in one go. 

**2) Restart the kernel (Runtime menu --> Restart Runtime).**

**3) Clear all output (Edit --> clear all outputs).**

**4) Make sure the names of all group members are in a markdown cell at the top of the file and submit the notebook through the Moodle link for this Lab.**