# *YOUR NAME HERE*

# In-Class Model Fitting Challenge

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas
import scipy.optimize as optimization

## Introduction

This exercise builds on Lab 4, where you compared Hubble's original 1929 data on the distances and recessional velocities of nearby galaxies with the modern data for those galaxies. These data demonstrate a relationship known as "Hubble's Law": the more distant a galaxy is from us, the greater the recessional velocity with which it appears to be moving away. 

The slope of the best-fit line to these data, known as the "Hubble Constant", is a very important and hotly debated quantity that tells us about the expansion rate of the universe as well as its age.

The exercise below requires knowledge that was introduced in the ModelFitting_Intro notebook that you were asked to look over before class today. If you have not yet reviewed it, please do so before beginning this exercise. In that case you may not finish the exercise during today's class period; complete it after class and submit it before next Monday. 

The cells below pull the original and modern values of the relevant quantities into this notebook and set them up appropriately for the exercise. Take a moment to make sure you understand what they're doing. 

***You may discuss your methods with your peers during the class period and ask questions of me, but this exercise should be completed individually.***

In [None]:
dists = np.array([0.032,0.034,0.214,0.263,0.275,0.275,0.45,0.5,0.5,0.63,0.8,0.9,0.9,
         0.9,0.9,1.0,1.1,1.1,1.4,1.7,2.0,2.0,2.0,2.0])#Mpc
vels = np.array([170.,290,-130,-70,-185,-220,200,290,270,200,300,-30,650,150,500,920,450,500,500,960,500,850,800,1000]) #km/sec

In [None]:
cols = ['Obj Name', 'Redshift', 'Redshift Uncert', 'Dist Mean (Mpc)', 'Dist Std Dev (Mpc)', 'Num Obs']
df = pandas.read_csv('cat.txt', delimiter ='|', skiprows=2, header = 0, names = cols, skipinitialspace=True)
redshift = df["Redshift"].tolist()
redshift_uncert = df["Redshift Uncert"].tolist()
dists2 = df["Dist Mean (Mpc)"].tolist()
dists2_uncert = df["Dist Std Dev (Mpc)"].tolist()

In [None]:
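def z_to_v(z):
    """Convert a list of redshifts to velocities in km/sec (relativistic Doppler formula)."""
    vels = []
    c = 3e5  # speed of light in km/sec (rounded)
    for entry in z:
        # v = c * ((1+z)^2 - 1) / ((1+z)^2 + 1)
        v = ((entry+1)**2-1)/((entry+1)**2+1)*c
        vels.append(v)
    return np.array(vels)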
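
For reference, the expression inside `z_to_v` matches the special-relativistic Doppler relation between redshift $z$ and line-of-sight velocity $v$, where $\beta = v/c$ and $c$ is the speed of light (rounded to $3\times10^5$ km/sec in the code):

$$
1 + z = \sqrt{\frac{1+\beta}{1-\beta}}
\quad\Longrightarrow\quad
v = \beta c = c\,\frac{(1+z)^2 - 1}{(1+z)^2 + 1}
$$

For $|z| \ll 1$ the numerator is approximately $2z$ and the denominator approximately $2$, so the relation reduces to $v \approx cz$.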

In [None]:
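vels2 = z_to_v(redshift)
#approximation: for small redshifts v ~ c*z, so converting the redshift uncertainty the same way gives ~ c*sigma_z
vels2_uncert = z_to_v(redshift_uncert)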

In [None]:
#line with an intercept
def slopeintfunc(x,sl,incpt):
    return sl*x+incpt
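
A model function written this way (independent variable first, then one argument per free parameter) is the form that `scipy.optimize.curve_fit` expects. The sketch below shows the call pattern on made-up data only. The synthetic points, the initial guess `p0`, and the names `x_demo`, `y_demo`, `perr`, and `label` are assumptions for illustration and are not part of the exercise data.

```python
import numpy as np
import scipy.optimize as optimization

# same form as slopeintfunc above: x first, then the free parameters
def slopeintfunc(x, sl, incpt):
    return sl * x + incpt

# synthetic (made-up) data scattered around y = 3x + 5
rng = np.random.default_rng(42)
x_demo = np.linspace(0.0, 10.0, 30)
y_demo = slopeintfunc(x_demo, 3.0, 5.0) + rng.normal(0.0, 1.0, size=x_demo.size)

# curve_fit returns the best-fit parameters and their covariance matrix
popt, pcov = optimization.curve_fit(slopeintfunc, x_demo, y_demo, p0=[1.0, 0.0])
sl_fit, incpt_fit = popt
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on slope and intercept

# a legend-ready label in y = mx + b form (the + flag prints the sign of b)
label = "y = {:.2f}x {:+.2f}".format(sl_fit, incpt_fit)
print(label, "| parameter uncertainties:", perr)
```

The square roots of the diagonal of `pcov` give a quick sense of how well the slope and intercept are constrained, which can be useful when judging fit quality later on.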

<div class=hw>

## Exercise 

Time for fitting! Use the lecture notes on Model fitting as a guide to help you complete the exercises below. 
    
***I strongly encourage you to avoid copy/pasting from one notebook to the other. Rather, try to understand what the Model Fitting Intro is demonstrating and then apply it in this notebook. This helps to ensure that you understand what everything is doing.*** 

a) Fit a linear model to Hubble's data and to the modern data. Make a plot showing both datasets and both fit lines. The plot should include a legend with both the points and the lines. The lines should be labeled in the legend with their y=mx+b equations. 

b) Now fit a linear model ***to the modern data only*** that takes the error bars on the recessional velocities into account. The problem is that the uncertainties in the redshifts/recessional velocities are VERY small for these galaxies, so small that you can't even see the error bars when you overplot them on the data points (you can do this to verify). To demonstrate the difference between weighted and unweighted fits, inflate the velocity uncertainties by a factor of 50. Overplot both the unweighted and weighted lines together with the modern data (with y error bars) and an appropriate legend. 

c) Discuss at least one trend or effect that you see in each graph. As always, your explanations need not be lengthy, but they should be ***clear, supported with references to the plot, and specific***. 

d) We won't do fitting with x and y error bars, but you can easily make a plot that shows errors in both quantities using plt.errorbar. Do this using the TRUE errors in velocity and distance (not the inflated values), and use your plot to make an argument for whether the "Hubble's Law" line is a good fit to the data. 
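
Parts (b) and (d) both come down to passing per-point uncertainties: `sigma` in `curve_fit` for a weighted fit, and `xerr`/`yerr` in `errorbar` for the plot. The sketch below shows both on synthetic data, not on the galaxy data. Every array here (`x_demo`, `y_demo`, `x_err`, `y_err`) and the helper `line` are made up for illustration, and `absolute_sigma=True` is one reasonable choice, not necessarily the one used in the Model Fitting Intro.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimization

def line(x, sl, incpt):
    return sl * x + incpt

# synthetic data: the points at large x are much noisier than the rest
rng = np.random.default_rng(0)
x_demo = np.linspace(1.0, 10.0, 20)
y_err = np.where(x_demo > 5.0, 4.0, 0.5)  # per-point y uncertainties
x_err = np.full_like(x_demo, 0.2)          # per-point x uncertainties (used for plotting only)
y_demo = line(x_demo, 2.0, 1.0) + rng.normal(0.0, y_err)

# unweighted fit: every point counts equally
popt_u, pcov_u = optimization.curve_fit(line, x_demo, y_demo, p0=[1.0, 0.0])
# weighted fit: points with small y_err pull harder on the line
popt_w, pcov_w = optimization.curve_fit(line, x_demo, y_demo, p0=[1.0, 0.0],
                                        sigma=y_err, absolute_sigma=True)

# data with error bars in both directions, plus both fit lines
fig, ax = plt.subplots()
ax.errorbar(x_demo, y_demo, xerr=x_err, yerr=y_err, fmt='cs', label="synthetic data")
ax.plot(x_demo, line(x_demo, *popt_u), 'k--',
        label="unweighted: y = {:.2f}x {:+.2f}".format(*popt_u))
ax.plot(x_demo, line(x_demo, *popt_w), 'k-',
        label="weighted: y = {:.2f}x {:+.2f}".format(*popt_w))
ax.legend(loc="lower right")
```

With data like these, the weighted line tends to follow the low-uncertainty points more closely, while the unweighted line can be pulled around by the noisy ones. Whether the same holds for the galaxy data is for you to check.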

In [None]:
#calculate the best fits (hint: use optimization.curve_fit, as in the model fitting intro)

In [None]:
#A plot for you to use as a starting point for fitting
#overplot your fit lines and label them
f = plt.plot(dists,vels, 'mo', label="Hubble's data")
plt.plot(dists2,vels2, 'cs', label="modern data")
plt.xlabel("Distance in Mpc")
plt.ylabel("Recessional Velocity in km/sec")
plt.title("Hubble's Law")
l = plt.legend(loc="lower right")

In [None]:
#calculate a fit with (inflated) errors in velocity accounted for

In [None]:
#plot with (inflated) y error bars and best fit line

***Document at least two trends or effects that you note in the graph above here***

In [None]:
#plot the true error bars in velocity AND distance.

***Make a data-driven argument about the quality of your fit to the data***

In [None]:
from IPython.display import HTML
def css_styling():
    #load the course stylesheet so the notebook renders with the custom formatting
    with open("../../custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()