
# READ THESE INSTRUCTIONS CAREFULLY AND IN FULL BEFORE YOU PROCEED TO WORK!

### CONTEXT :  Investigation of the nature of stellar explosions

10 stars explode every second in the sky, most of them unseen because they are too far away for their light to reach us. When the light does reach us, we can measure the properties of the explosion, and we find a diversity of properties. Through the study of their spectra, we can measure the chemical composition of the progenitor star and the energy released in the explosion. We will focus here on 3 types of stellar explosions, or Supernovae (shortened to SN for the singular, SNe for the plural): SNe Ic, SNe Ic-BL (which stands for "broad-lined", describing the broad quality of their spectral features), and SNe Ic-BL accompanied by a Gamma Ray Burst (GRB), which is an emission at higher electromagnetic frequencies (gamma rays). The details of this taxonomy are not important at this point (you can read about them [here](https://ned.ipac.caltech.edu/level5/March03/Filippenko/frames.html) if you wish, but you should really wait until _after_ the midterm!); suffice it to say that these explosions have increasing energy release: Ic < Ic-BL < Ic-BLGRB.

We do not know why the energy released is different in spite of similar observed chemical properties. In a recent paper, we asked whether the *environment* in which they explode is the same. From an astrophysical perspective, the environment means the chemical environment of the region of the galaxy where the explosion happens, which we measure relative to the chemical environment near our Sun: the *Metallicity*.




In the files below, we report the metallicity measured in several ways for the environment of several stellar explosions.

The specific formulae used to measure metallicity are called "metallicity scales", and the ones used in the dataset are:

`D31, KD02comb, PP04, M08, Emv, M13`

***Focusing on the metallicity measured as "KD02comb", find a statistical answer to the question: do the three kinds of explosions, SN Ic, SN Ic-BL, SN Ic-BLGRB, come from the same distribution of chemical properties?***


Note: "KD02comb" stands for [Kewley and Dopita 2002](https://www2.scopus.com/record/display.uri?eid=2-s2.0-0013467486&origin=inward&txGid=fcab97eb7a9f1602e0512094f61c91ed), combined scale.

# Objective: assessing differences in the chemical environment of stellar explosions

PLEASE SEE THE HINTS TO EACH TASK [here](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit?usp=sharing) (no points loss for hints!)

You will receive points for each of the marked deliverables, *and for your discussion of your findings*.

## 1-6 are the data wrangling portion of the exercise, and are worth up to 60% of the points
*In the data wrangling portion points are awarded for data processed so that they support the analysis in parts 7-11. This portion is likely going to be the most time consuming.*

## 7-11 are the analysis portion of the exercise (including exploratory and visual analysis), and are worth up to 50% of the points
*In the data analysis portion points are awarded for correct analysis, conclusions supported by the analysis* ***and expressed in full sentences***, *and plots that support the analysis and conclusions.*

*That sums up to 110%, but the maximum grade is reached at 100%. This means you have a better chance to get 100% on your test, and if you exceed 100% the rest will be accounted for as extra credit and help your cumulative score at the end of the semester.*

## Data Wrangling (each task is 10 points)
    1. Read in the data for SN metallicity

    2. Read in the GRB metallicity file
    
    3. Read in the classification of Ic and Ic-BL explosions
    
    4. Merge the metallicity and classification files for SNe
    
    5. Separate the samples of SN Ic and SN IcBL
    
    6. Create a second "abridged" GRB sample by removing uncertain classifications


## Analysis (visual and quantitative) (each task is 10 points)

    7. Plot the distribution of metallicities (KD02comb) for each of the (4) samples
    
    8. Calculate the central tendency of the metallicity for each of the samples and its uncertainty. Discuss if the means are statistically different based on their uncertainties (feel free to add plots, and choose your preferred way to measure the central tendency and spread)
    
    9. Compare the distribution of Ic with that of IcBL. You want to determine if they are the same (or, more exactly, "extracted from the same population")
   
    10. Compare the distribution of SNe Ic with that of GRBs and that of SNe IcBL with GRBs using, in both cases, the full GRB sample
    
    11. Repeat the comparison of SNe IcBL with GRBs using now the abridged GRB sample
    
**Extra Credit**. The metallicity measurements include uncertainties which are asymmetric. Do you think these are included in the analysis I outlined above? If not, how would you incorporate these uncertainties in your analysis? (Describe what you would do and, of course, if you have time, include this step in the analysis.)

    
     



### DATA WRANGLING

# 1 Read in the SN metallicity file

You can find the data in the github repository

https://github.com/fedhere/DSPS_2023_midterm

Clean your dataset by removing all the columns you will not need in the analysis and all invalid values in the resulting dataframe.

_the deliverable in this step is a table showing the first 7 rows of the dataframe_


[hints](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.wqv9503ir8mr)

You can find the solution at this URL: https://github.com/fedhere/scratch/blob/main/data/SNmetallicity_final.csv - if you use this solution you will not get the points for this deliverable (but you can take the shortcut now and come back to this later: if you work out the solution later, you will still get the points!)
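As a rough illustration of the cleaning step (not the official solution), the sketch below keeps only the name and `KD02comb` columns and drops rows with missing values. It reads a few rows copied from the table printed further down from an in-memory string rather than from the repository; the variable name `sne_demo` and the reduced set of columns are assumptions made for this example.

```python
import io

import pandas as pd

# A few rows copied from the table shown in this notebook (subset of columns).
raw = io.StringIO(
    "SN,D31,D13m,D13p,KD02comb,KD02m,KD02p\n"
    "09ps,8.309,0.088,0.073,8.347,0.131,0.116\n"
    "10bip,8.378,0.05,0.042,,,\n"
    "10gvb,7.945,0.231,0.191,8.28,0.135,0.129\n"
)

sne_demo = pd.read_csv(raw)  # empty fields are read as NaN

# Keep only the columns needed for the analysis, then drop invalid (NaN) rows.
sne_demo = sne_demo[["SN", "KD02comb"]].dropna()
print(sne_demo)
```

Selecting the columns before calling `dropna()` matters: a row is then discarded only when the quantity you actually need is missing, not because some unused scale has no measurement.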


In [None]:
# imports here
...

In [None]:
#delete
...
sne


Populating the interactive namespace from numpy and matplotlib


Unnamed: 0,SN,D31,D13m,D13p,KD02comb,KD02m,KD02p,PP04,PP04m,PP04p,M08,M08m,M08p,Emv,mvm,mvp,M13,M13m,M13p
0,09ps,8.309,0.088,0.073,8.347,0.131,0.116,8.362,0.025,0.026,8.472,0.05,0.042,0.292,0.078,0.083,8.287,0.016,0.017
1,10bip,8.378,0.05,0.042,,,,8.302,0.013,0.013,8.58,0.027,0.027,0.504,0.061,0.065,8.246,0.01,0.009
2,10gvb,7.945,0.231,0.191,8.28,0.135,0.129,8.102,0.062,0.045,8.161,0.185,0.118,0.329,0.058,0.061,8.175,0.008,0.009
3,10svt,8.296,0.079,0.07,8.479,0.061,0.049,8.218,0.016,0.015,8.331,0.042,0.037,0.032,0.02,0.021,8.191,0.013,0.012
4,12gzk,8.048,0.091,0.072,7.943,0.053,0.046,8.071,0.017,0.015,7.941,0.052,0.046,0.081,0.022,0.022,,,
5,12hni,8.373,0.024,0.025,8.457,0.007,0.004,8.211,0.006,0.006,8.451,0.013,0.013,0.091,0.024,0.025,8.186,0.008,0.008
6,09sk,8.368,0.007,0.006,8.486,0.008,0.009,8.335,0.002,0.002,8.522,0.004,0.004,0.111,0.007,0.007,8.269,0.003,0.003
7,10aavz,8.516,0.193,0.151,8.631,0.133,0.078,8.481,0.051,0.044,8.728,0.127,0.095,0.0,0.0,0.0,8.366,0.034,0.029
8,10bzf,8.152,0.27,0.188,8.625,0.218,0.146,8.333,0.064,0.052,8.451,0.161,0.117,0.0,0.0,0.0,8.269,0.039,0.033
9,10ciw,8.545,0.07,0.073,8.572,0.084,0.074,8.4,0.06,0.096,8.708,0.041,0.037,0.282,0.083,0.092,8.312,0.04,0.063


In [None]:
...


In [None]:
sne.head()

Unnamed: 0,SN,KD02comb
0,09ps,8.347
2,10gvb,8.28
3,10svt,8.479
4,12gzk,7.943
5,12hni,8.457


# 2.  Read in the GRB metallicity file

You can find the data in the github repository

https://github.com/fedhere/DSPS_2023_midterm

Clean your dataset of all the columns you will not need and all invalid values in the remaining dataframe.

_the deliverable is a table like the one below_

[hints](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.wqv9503ir8mr)

You can find the solution at https://github.com/fedhere/scratch/blob/main/data/grb_clean.csv - if you use this solution you will not get the 10 points for this deliverable (but you can go back later to this! and if you do get the solution later you will get the points!)


In [None]:
...

In [None]:
grb.head()

Unnamed: 0,SN,KD02comb
0,GRB980425/SN1998bw,8.485
1,XRF020903,8.183
2,GRB030329/SN2003dh,8.073
3,GRB031203/SN2003lw,8.647
4,GRB/XRF060218/SN2006aj,8.125


In [None]:
grb.shape

(11, 2)

# 3 Read in the classification of Ic and Ic-BL explosions and process it to extract the SN name

Modify the SN name column to be consistent with the name column in the SNe file

[hint](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.isi30wsuhrhc)

You can get the solution at https://github.com/fedhere/scratch/blob/main/data/SNtype.csv
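One possible way to extract the SN name, sketched here under the assumption that every entry follows the `PTF<name>-host-<region>` pattern visible in the example output, is a regular expression applied with the pandas string accessor. The dataframe `types_demo` is built by hand from the five rows shown below and is only illustrative.

```python
import pandas as pd

# Rows copied from the example output of SNtypes.head() in this notebook.
types_demo = pd.DataFrame(
    {
        "PTFname_region": [
            "PTF10xem-host-SNsite",
            "PTF10wal-host-SNsite",
            "PTF09iqd-host-HII3",
            "PTF10xik-host-SNsite",
            "PTF10ood-host-SNsite",
        ],
        "SNtype": ["Ic-BL", "Ic", "Ic", "Ic", "Ic"],
    }
)

# Capture whatever sits between the leading "PTF" and the first hyphen.
types_demo["name"] = types_demo["PTFname_region"].str.extract(
    r"^PTF([^-]+)-", expand=False
)
types_demo = types_demo[["SNtype", "name"]]
print(types_demo)
```

Entries that do not match the assumed pattern would come out as NaN, which makes them easy to spot and handle separately.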


In [None]:
...
SNtypes.head()

Unnamed: 0_level_0,PTFname_region,SNtype
galnum,Unnamed: 1_level_1,Unnamed: 2_level_1
1,PTF10xem-host-SNsite,Ic-BL
2,PTF10wal-host-SNsite,Ic
3,PTF09iqd-host-HII3,Ic
4,PTF10xik-host-SNsite,Ic
5,PTF10ood-host-SNsite,Ic


In [None]:
...

In [None]:
SNtypes.head()

Unnamed: 0_level_0,SNtype,name
galnum,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Ic-BL,10xem
2,Ic,10wal
3,Ic,09iqd
4,Ic,10xik
5,Ic,10ood


# 4 Merge the metallicity and classification files

Merge the SN names and SN metallicity files based on the SN name

[hints](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.vu3ga15gdvc0)

You can get the solution at this url: https://github.com/fedhere/scratch/blob/main/data/SNmetallicity_final.csv
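A minimal sketch of the merge, assuming a left join on the SN name and assuming that SNe absent from the classification table should be labelled `uncertain` (as the expected output below suggests). The two small dataframes are assembled by hand from values shown elsewhere in this notebook, and the names `sne_demo`, `types_demo` and `sneall_demo` are illustrative.

```python
import pandas as pd

# Metallicities and types copied from tables shown in this notebook.
sne_demo = pd.DataFrame(
    {"SN": ["09ps", "09sk", "09iqd"], "KD02comb": [8.347, 8.486, 8.802]}
)
types_demo = pd.DataFrame(
    {"name": ["10xem", "09sk", "09iqd"], "SNtype": ["Ic-BL", "Ic-BL", "Ic"]}
)

# Left join: keep every SN with a metallicity, attach its type when known.
sneall_demo = sne_demo.merge(
    types_demo, left_on="SN", right_on="name", how="left"
).drop(columns="name")

# SNe without a classification end up with NaN; label them explicitly.
sneall_demo["SNtype"] = sneall_demo["SNtype"].fillna("uncertain")
print(sneall_demo)
```

A left join keeps the metallicity table as the reference, so no measurement is lost silently; an inner join would instead drop the unclassified SNe.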

In [None]:
...

In [None]:
sneall.head()

Unnamed: 0,SN,KD02comb,SNtype
0,09ps,8.347,uncertain
1,10gvb,8.28,uncertain
2,10svt,8.479,uncertain
3,12gzk,7.943,uncertain
4,12hni,8.457,uncertain


# 5 Define the 3 samples GRB, IcBL, and Ic

Split the SN sample so that you have 2 dataframes: snIc and snIcBL

[hint](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.bojijnl6aiq5)
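The split can be sketched with boolean masks on the `SNtype` column. The small dataframe below is a hand-built stand-in for the merged table (values copied from this notebook), and the `_demo` names are assumptions for this example.

```python
import pandas as pd

# Hand-built stand-in for the merged table.
sneall_demo = pd.DataFrame(
    {
        "SN": ["09ps", "09sk", "09iqd"],
        "KD02comb": [8.347, 8.486, 8.802],
        "SNtype": ["uncertain", "Ic-BL", "Ic"],
    }
)

# Exact string comparison, so "Ic" does not also select "Ic-BL".
snIc_demo = sneall_demo[sneall_demo["SNtype"] == "Ic"]
snIcBL_demo = sneall_demo[sneall_demo["SNtype"] == "Ic-BL"]
print(snIc_demo)
print(snIcBL_demo)
```

Using `==` rather than a substring match avoids mixing the two classes, since the string `Ic` is contained in `Ic-BL`.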

In [None]:
...

In [None]:
snIc.head()

Unnamed: 0,SN,KD02comb,SNtype
19,09iqd,8.802,Ic
20,10bhu,8.751,Ic
21,10fmx,8.812,Ic
22,10hfe,8.787,Ic
23,10hie,8.108,Ic


In [None]:
snIcBL.head()

Unnamed: 0,SN,KD02comb,SNtype
5,09sk,8.486,Ic-BL
6,10aavz,8.631,Ic-BL
7,10bzf,8.625,Ic-BL
8,10ciw,8.572,Ic-BL
9,10qts,8.033,Ic-BL


# 6 Create a second "abridged" GRB sample by removing uncertain classifications

The classification of SN 2013dx ('13dx') is uncertain: remove it from the grb sample to create an "abridged" sample (saving the original sample as well)

You can get help with this [here](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.e1jtfsuajcdv)
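A possible sketch of the abridged sample, assuming the uncertain object can be identified by the substring `13dx` in the name column. The first two rows are copied from the GRB table above; the third row (both its name format and its metallicity) is a hypothetical placeholder added only so that there is something to remove, not a value from the dataset.

```python
import pandas as pd

grb_demo = pd.DataFrame(
    {
        # Last entry: hypothetical placeholder name, not taken from the data file.
        "SN": ["GRB980425/SN1998bw", "XRF020903", "GRB130702A/SN2013dx"],
        # Last value: placeholder, not a measurement.
        "KD02comb": [8.485, 8.183, 8.0],
    }
)

# Flag the uncertain object with a literal substring match, keep the rest.
is_13dx = grb_demo["SN"].str.contains("13dx", regex=False)
grbabridged_demo = grb_demo[~is_13dx].copy()

print(grb_demo.shape, grbabridged_demo.shape)
```

Assigning the filtered result to a new variable leaves the original dataframe untouched, so both the full and the abridged samples remain available.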


In [None]:
...

In [None]:
grbabridged.shape

(10, 2)

#### ANALYSIS

# 7 Plot the distribution of metallicities for each of the samples
    
   

[hint](https://docs.google.com/document/d/1V9PVb6tK0yuCSTyCILkKXZD9--5PVIvhKDPZIiREdtQ/edit#heading=h.pwkb8bo4nrla)

In [None]:
...

# 8 Calculate the mean metallicity for each of the samples and its uncertainty

Don't just spit out numbers: state what each number is in a well-formatted sentence.
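As one hedged example of a central tendency and its uncertainty, the sketch below computes the mean and the standard error of the mean, and expresses the difference between two means in units of their combined uncertainty. The input values are only the first five rows of the Ic and Ic-BL samples as printed above, not the full samples, and the helper `mean_and_sem` is a name invented for this illustration. Other choices (for example the median with a bootstrap uncertainty) are equally legitimate.

```python
import numpy as np


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(values.size)  # sample std / sqrt(N)
    return mean, sem


# First five KD02comb values of each sample, as shown in this notebook.
ic_head = [8.802, 8.751, 8.812, 8.787, 8.108]
icbl_head = [8.486, 8.631, 8.625, 8.572, 8.033]

ic_mean, ic_sem = mean_and_sem(ic_head)
icbl_mean, icbl_sem = mean_and_sem(icbl_head)

# Difference of the means in units of the uncertainties added in quadrature.
diff_sigma = abs(ic_mean - icbl_mean) / np.hypot(ic_sem, icbl_sem)

print(f"SN Ic (first 5 rows): mean KD02comb = {ic_mean:.3f} +/- {ic_sem:.3f}")
print(f"SN Ic-BL (first 5 rows): mean KD02comb = {icbl_mean:.3f} +/- {icbl_sem:.3f}")
print(f"The means differ by {diff_sigma:.1f} combined standard errors.")
```

Printing through formatted strings like these is one way to satisfy the request for a readable statement instead of bare numbers.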


In [None]:
...

# 9 Compare the distribution of Ic and IcBL

Describe the test you are about to run and why you chose it, state its Null Hypothesis, and say whether it is rejected or not.

**Extra Credit**: define a function that, given the 2 samples, runs and interprets the test, reporting the result automatically. That way you can use it for all sample comparisons.
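A sketch of such a helper is given below. It assumes the two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`) and a significance threshold of 0.05; both are choices made for this illustration, not requirements of the exercise, and the function name `compare_samples` and its arguments are hypothetical. The Null Hypothesis of this test is that the two samples are drawn from the same parent distribution.

```python
import numpy as np
from scipy import stats


def compare_samples(a, b, alpha=0.05, labels=("sample A", "sample B")):
    """Run a two-sample KS test and describe the outcome in a sentence."""
    statistic, pvalue = stats.ks_2samp(a, b)
    rejected = bool(pvalue < alpha)
    verdict = "rejected" if rejected else "not rejected"
    message = (
        f"KS test, {labels[0]} vs {labels[1]}: statistic = {statistic:.3f}, "
        f"p-value = {pvalue:.3g}. The Null Hypothesis that both samples come "
        f"from the same population is {verdict} at alpha = {alpha}."
    )
    return statistic, pvalue, rejected, message


# Synthetic check: identical samples versus clearly separated samples.
same = np.arange(20.0)
shifted = same + 100.0

_, p_same, rejected_same, msg_same = compare_samples(same, same)
_, p_diff, rejected_diff, msg_diff = compare_samples(same, shifted)
print(msg_same)
print(msg_diff)
```

Note that failing to reject the Null Hypothesis does not prove the two populations are identical, especially with samples as small as these.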



In [None]:
...

# 10 Compare the distributions of Ic and of IcBL with the full GRB sample


In [None]:
...

# 11. Repeat the comparison of IcBL and GRB with the abridged GRB sample
    


In [None]:
...

   
# Extra Credit. The metallicity measurements include uncertainties which are asymmetric. How would you incorporate them in your analysis? (Describe what you would do and, of course, if you have time, include this step in the analysis.)
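One possible approach, sketched here as an illustration and not as the expected answer, is a Monte Carlo resampling: redraw each metallicity many times from a two-piece ("split") normal distribution whose lower and upper widths are the reported minus and plus uncertainties, then repeat the analysis on each redrawn sample. The sketch assumes that `KD02m` and `KD02p` are the lower and upper 1-sigma uncertainties and that both are strictly positive; the function name `draw_asymmetric` is hypothetical.

```python
import numpy as np


def draw_asymmetric(value, err_minus, err_plus, n_draws, rng):
    """Draw from a split normal centred on `value` for each measurement."""
    value = np.asarray(value, dtype=float)
    err_minus = np.asarray(err_minus, dtype=float)
    err_plus = np.asarray(err_plus, dtype=float)
    shape = (n_draws,) + value.shape
    # Probability of landing below the central value in a split normal.
    p_lower = err_minus / (err_minus + err_plus)
    lower = rng.random(shape) < p_lower
    z = np.abs(rng.standard_normal(shape))  # half-normal deviates
    return np.where(lower, value - z * err_minus, value + z * err_plus)


# KD02comb, KD02m, KD02p for 09ps, 10gvb and 10svt, copied from the table above.
values = np.array([8.347, 8.28, 8.479])
minus = np.array([0.131, 0.135, 0.061])
plus = np.array([0.116, 0.129, 0.049])

rng = np.random.default_rng(0)
draws = draw_asymmetric(values, minus, plus, n_draws=1000, rng=rng)
print(draws.shape)  # one row per Monte Carlo realisation of the sample
```

Each row of `draws` is one plausible realisation of the sample; running the comparison test on every realisation gives a distribution of p-values instead of a single number.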
