# Comparison of pW Values in CSP DR1

This notebook plots the pseudo-equivalent width (pW) values published in Bronder 2008 against the pW values measured in this experiment. It also fits a linear regression to each feature to estimate how closely the manual measurements track the published ones. First, the necessary packages are imported and the measured data are read in.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
from sndata.csp import DR1

dr1 = DR1() #interface to the CSP DR1 data release

In [None]:
#reads in the table of measurements saved by the measurement scripts
output_file_data = pd.read_csv("../SN-Spectral-Evolution/results/anish_csp.csv")
output_file_data #displays the table of saved data


In [None]:
pew_df = output_file_data['pew']
pew_sorted = output_file_data.set_index(['obj_id', 'feat_name']) #indexes rows by object and feature for convenience
pew_sorted = pew_sorted['pew']
pew_sorted #the measured pW values, as a Series indexed by object and feature


The next step is to build a dataframe of the pW values published in the Bronder 2008 study. For the two sets of values to be plotted against each other, the published dataframe must be matched up with the measured one. This is done with a pandas join that relates the rows of both dataframes through their object IDs. Rows containing NaN values are dropped so that they do not appear in the final result.

In [None]:
published = dr1.get_available_tables()
dr1_table = dr1.load_table(published[3]) #download relevant table of pWs from DR1

pub_frame = dr1_table.to_pandas()
pub_frame.rename(columns = {'SN' : 'obj_id'}, inplace = True) #renames the 'SN' column so it matches the measured data
pub_frame.set_index(['obj_id'], inplace = True)
pub_frame
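
The matching step described above can be sketched on toy data. This is a minimal illustration, not the notebook's actual tables: the object IDs (`sn_a`, `sn_b`) and all numbers are hypothetical. It relies on `DataFrame.join` aligning a frame indexed by `(obj_id, feat_name)` with a frame indexed by `obj_id` through the shared index level name.

```python
import numpy as np
import pandas as pd

# Hypothetical measured values, indexed by object and feature
measured = pd.DataFrame(
    {
        "obj_id": ["sn_a", "sn_a", "sn_b"],
        "feat_name": ["pW1", "pW2", "pW1"],
        "pew": [101.0, 18.5, 95.2],
    }
).set_index(["obj_id", "feat_name"])

# Hypothetical published values, one row per object
published = pd.DataFrame(
    {"obj_id": ["sn_a", "sn_b"], "pW1": [110.0, np.nan], "pW2": [20.0, 15.0]}
).set_index("obj_id")

# join() aligns the two frames on the shared index level "obj_id"
combined = measured.join(published)

# Drop every row that has a missing value in any column
complete = combined.dropna()
print(complete)
```

Each measured row picks up the published columns for its object, and the `sn_b` row is removed because its published `pW1` is missing.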


In [None]:
output = output_file_data.set_index(['obj_id', 'feat_name'])
group_data = output.groupby('feat_name')
print(group_data) #prints only the GroupBy object; its per-feature groups are combined with the published data below
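
Printing a GroupBy object shows only its type, so a short sketch of what it yields when iterated may help. The toy table below is hypothetical and only mirrors the `obj_id`, `feat_name`, and `pew` layout used in this notebook.

```python
import pandas as pd

# Hypothetical measurements with the same index layout as the notebook's table
toy = pd.DataFrame(
    {
        "obj_id": ["sn_a", "sn_a", "sn_b"],
        "feat_name": ["pW1", "pW2", "pW1"],
        "pew": [101.0, 18.5, 95.2],
    }
).set_index(["obj_id", "feat_name"])

# Group by the index level named "feat_name"
groups = toy.groupby("feat_name")

# Iterating a GroupBy yields (group key, sub-frame) pairs
group_sizes = {}
for feat_name, feat_data in groups:
    group_sizes[feat_name] = len(feat_data)
    print(feat_name, list(feat_data["pew"]))
```

Each sub-frame holds all objects measured for one feature, which is the unit that gets plotted in a single subplot.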
            

The final figure is shown below, with one subplot for each spectral feature. A for loop iterates over the feature groups, joins each group's measured values with the published ones, and plots the two against each other. A linear regression is also fitted for each pW, and the fits suggest a roughly linear relation between the measured and published values.

In [None]:
fig, axes = plt.subplots(2, 4, figsize = (10, 6))
pw_list = ['pW1', 'pW2', 'pW3', 'pW4', 'pW5', 'pW6', 'pW7', 'pW8', 'pew']

for (feat_name, feat_data), axis in zip(group_data, axes.flatten()): #zip pairs each feature group with its own subplot
    data_final = feat_data.join(pub_frame) #combines measured and published dataframes on obj_id
    nonan_frame = data_final[pw_list].dropna() #keeps only objects that have every pW value present
    #x values are the measured pWs ('pew'), y values are the published pWs for this feature
    axis.scatter(nonan_frame['pew'], nonan_frame[feat_name], marker = "x", color = "darkorange", label = feat_name)
    slope, intercept, r, p, s = stats.linregress(nonan_frame['pew'], nonan_frame[feat_name])
    axis.plot(nonan_frame['pew'], nonan_frame['pew']*slope + intercept, 'r-', label = f'{slope:.2f}x{intercept:+.2f}')
    axis.tick_params(top = True, right = True)
    axis.legend(loc = 'upper left')
    
#identical x and y limits for each subplot, listed in row-major order
limits = [(50, 200), (0, 40), (50, 200), (50, 250),
          (20, 110), (0, 75), (50, 160), (0, 310)]
for axis, (low, high) in zip(axes.flatten(), limits):
    axis.set_xlim(low, high)
    axis.set_ylim(low, high)

#the published pWs are on the y axis and the measured pWs on the x axis; raw strings keep "\AA" intact
fig.text(-0.03, 0.5, r"Published pW [$\AA$]", va = 'center', rotation = 'vertical', fontsize = 14)
fig.text(0.52, -0.03, r"Measured pW [$\AA$]", ha = 'center', fontsize = 14)
fig.text(0.23, 1, "Published vs. Measured Results for Features in CSP", fontsize = 16)
plt.tight_layout()
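
To show how the regression output can be read, here is a minimal sketch on synthetic data. The points are generated from an assumed relation (slope 1.1, intercept 3, Gaussian scatter) and are not CSP measurements. It uses the named attributes of the `stats.linregress` result instead of tuple unpacking.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for measured (x) and published (y) pW values
rng = np.random.default_rng(0)
x = np.linspace(50, 200, 30)
y = 1.1 * x + 3 + rng.normal(0, 5, size=x.size)

# Fit y = slope * x + intercept
result = stats.linregress(x, y)

print(f"slope     = {result.slope:.2f}")
print(f"intercept = {result.intercept:.2f}")
print(f"r         = {result.rvalue:.3f}")  # correlation coefficient
print(f"stderr    = {result.stderr:.3f}")  # standard error of the slope
```

A slope near 1 and an intercept near 0 would indicate that the measured values track the published ones closely, while `rvalue` summarizes how tight the linear relation is.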