# Physics 129 Final Project #2

## Parity Violation In Electron Scattering

### Learning objectives
In this question you will:

- Be introduced to the concept of a parity-violating asymmetry
- Gain experience in data analysis using a data sample from the E158 experiment at SLAC
- Explore strategies for correcting measurements for systematic variations
- Use the measured correlations to translate between a "raw" (aka measured) asymmetry and a corrected (aka "true") asymmetry
- Determine confidence intervals for measured quantities


The E158 experiment at SLAC measured a parity-violating asymmetry in Møller (electron-electron) scattering. This was a fixed-target experiment, which scattered longitudinally-polarized electrons off atomic (unpolarized)
electrons in a 1.5 m liquid hydrogen target. A schematic view of the experiment can be found below.

<img src="E158.png" alt="Drawing" style="width: 600px;"/>



More details on E158 are available at https://www-project.slac.stanford.edu/e158/experiment.html
The paper describing the final E158 measurement of the parity-violating asymmetry (Phys. Rev. Lett. 95:081601, 2005) is available at https://arxiv.org/pdf/hep-ex/0504049.pdf



The file provided here contains a snapshot of 10,000 "events" from this experiment (overall, the experiment collected almost 400 million such events over the course of about 4 months).
Each event actually records a pair of pulses: one for the right-handed electron and one for the left-handed electron. For each event, 4 variables are recorded:

- $\bf \text{Counter}$: A unique number labeling the event

- $\bf \text{ Asym:}$ The "raw" cross section asymmetry from one of the detector channels (there are 50 of these overall). The cross section asymmetry is defined as:

$$ A_{raw} = \frac{\sigma_R -\sigma_L}{\sigma_R +\sigma_L} $$

The asymmetry is recorded in units of PPM (parts per million). It is called "raw" because corrections due to the difference in beam properties at the target are not yet applied (see below). 

- $\bf \Delta X$:  The beam position at the target in the $x$-direction in microns (with the convention that the beam is traveling along Z).

- $\bf \Delta Y$: The beam position at the target in the $y$-direction in microns
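
As a small illustration of the asymmetry definition above, the sketch below evaluates $A_{raw}$ in PPM for a single pulse pair. The function name `raw_asymmetry_ppm` and the input values are hypothetical and chosen only for illustration; they are not taken from the E158 data.

```python
def raw_asymmetry_ppm(sigma_r, sigma_l):
    """Return (sigma_R - sigma_L) / (sigma_R + sigma_L) in parts per million."""
    return 1e6 * (sigma_r - sigma_l) / (sigma_r + sigma_l)

# Hypothetical right- and left-handed pulse values in arbitrary units
# (illustration only, NOT real E158 data)
example_ppm = raw_asymmetry_ppm(1.000001, 0.999999)
print(f"A_raw = {example_ppm:.3f} ppm")
```

A difference of two parts per million between the two pulses corresponds to an asymmetry of about 1 PPM, because the denominator is the sum of the two pulses.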



### Note:  
This file contains only a small subset of the data that E158 collected. It is NOT sufficient to measure the parity violation signal expected in the Standard Model. The goal of this part of the project is to understand how E158 removes systematic biases from the data. Later in this notebook you will access a summary file that will allow you to calculate the asymmetry from the full 2003 E158 data run (which corresponds to about 60% of the full E158 data sample).

### Part 1. Importing and Analyzing the Raw Data

First, import the data as three numpy arrays: one for the asymmetry, one for the $x$ coordinate, and one for the $y$ coordinate. Plot the distribution of each of these three variables. Compute the mean of the raw asymmetry (Asym) distribution and its statistical uncertainty.

In [5]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

# Read in the data
with open("asymdata.txt","r") as file:
    lines = file.readlines()

# Column containers; the loop below skips any token with no digits (e.g. header text)
count = []
A = []
Dx = []
Dy = []
for line in lines:
    a = line.split(" ")
    k = 0
    for i in range(len(a)):
        for j in a[i]:
            if j.isdigit():
#                 print(j)
#                 print(np.float64(a[i]))
#                 print(k)
#                 print("\n")
                if k == 0: count.append(int(a[i]))
                if k == 1: A.append(np.float64(a[i]))
                if k == 2: Dx.append(np.float64(a[i]))
                if k == 3: Dy.append(np.float64(a[i]))
                k += 1
                break
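
The mean and its statistical uncertainty requested in Part 1 can be sketched as follows. This is a minimal example under the assumption that the events are independent, so the uncertainty on the mean is the sample standard deviation divided by $\sqrt{N}$. The helper name `mean_and_error` and the synthetic array `fake_asym` are hypothetical stand-ins and do not come from the data file.

```python
import numpy as np

def mean_and_error(values):
    """Return the sample mean and its statistical uncertainty (standard error)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.sum() / n
    # Unbiased sample standard deviation (n - 1 in the denominator)
    width = np.sqrt(((values - mean) ** 2).sum() / (n - 1))
    # Assumes independent events: error on the mean = width / sqrt(N)
    return mean, width / np.sqrt(n)

# Synthetic stand-in for the Asym column (NOT real E158 data)
rng = np.random.default_rng(0)
fake_asym = rng.normal(loc=0.0, scale=200.0, size=10_000)
mean, err = mean_and_error(fake_asym)
print(f"mean = {mean:.2f} +/- {err:.2f} ppm")
```

With the real data, the list `A` filled by the parsing loop above would be passed in place of `fake_asym`.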


### Part 2. Exploring Correlations 

Compute the correlation coefficients: ${\rm Corr}(Asym,\Delta X)$, ${\rm Corr}(Asym,\Delta Y)$, and ${\rm Corr}(\Delta X,\Delta Y)$. Do NOT use built-in numerical functions to do this. Instead, write a helper function `GetCorrelation(A,B)` which takes in 2 vectors, `A` and `B`, and returns the correlation coefficient. You may need to look up the definition of the correlation coefficient to do this. Which variables are approximately independent of each other?

In [6]:
from mpl_toolkits.mplot3d import Axes3D
def GetCorrelation(A,B):
    N = len(A)
    if (N != len(B)):
        print("Error. Lengths of vectors don't match")
        return
    # preset correlation to zero
    corr = 0.0
    # add your code here to find the correlation between A and B and fill the variable corr with that value
    return corr
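
One possible way to fill in the helper, shown as a hedged sketch rather than the expected solution, uses the Pearson definition ${\rm Corr}(A,B) = \frac{\sum_i (A_i-\bar A)(B_i-\bar B)}{\sqrt{\sum_i (A_i-\bar A)^2 \sum_i (B_i-\bar B)^2}}$. The name `pearson_corr` is hypothetical, and only elementwise arithmetic and sums are used (no built-in correlation routine).

```python
import numpy as np

def pearson_corr(a, b):
    """Pearson correlation coefficient computed from its definition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size != b.size:
        raise ValueError("Lengths of vectors don't match")
    # Deviations from the sample means
    da = a - a.sum() / a.size
    db = b - b.sum() / b.size
    # Covariance over the product of the widths; the 1/N factors cancel
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

print(pearson_corr([1, 2, 3], [2, 4, 6]))
```

A value near $\pm 1$ indicates a nearly linear relationship, while a value near 0 indicates no linear correlation (which is weaker than full statistical independence).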


Now make scatter plots comparing $(Asym,\Delta X)$, $(Asym,\Delta Y)$, and $(\Delta X,\Delta Y)$. Do the variables seem correlated in the ways you expect?

### Part 3. Linear Regression

A better estimator of the parity-violating asymmetry is the quantity that corrects for possible differences in the right- and left-handed electron pulses. We want to define a quantity:

  $$  A_{PV} = A_{raw} - (a_x \Delta X + b_x) - (a_y \Delta Y + b_y)$$
  with coefficients $a_i$ and $b_i$ defined in such a way that
  $A_{PV}$ is independent of $\Delta X$ and $\Delta Y$.
  This is called a linear regression. The lines $A_{raw} = a_y \Delta Y + b_y$ and $A_{raw} = a_x \Delta X + b_x$ are the lines of best fit to the data for our two spatial parameters. For now, we will assume that $a_x$ and $b_x$ are negligible and focus only on $a_y$ and $b_y$. Compute these two fit parameters and their uncertainties using a least squares fit. Here you may use pre-built functions to build the correlation and covariance matrices. Using your values for $a_y$ and $b_y$, plot the line of best fit on top of the data and see how well it matches the data.
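
A minimal sketch of such a straight-line fit is given below, assuming `numpy.polyfit` with `cov=True` is an acceptable pre-built function. The parameter uncertainties are taken as the square roots of the diagonal of the returned covariance matrix, which numpy scales by the fit residuals. The helper name `fit_line` and the synthetic arrays are hypothetical and are not E158 data.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y = a*x + b; returns (a, b, sigma_a, sigma_b)."""
    coeffs, cov = np.polyfit(x, y, deg=1, cov=True)
    a, b = coeffs                      # highest power first: slope, then intercept
    sigma_a, sigma_b = np.sqrt(np.diag(cov))
    return a, b, sigma_a, sigma_b

# Synthetic stand-in data with a known slope of 3 and offset of 5 (NOT real data)
rng = np.random.default_rng(1)
fake_dy = rng.normal(0.0, 10.0, size=5000)
fake_asym = 3.0 * fake_dy + 5.0 + rng.normal(0.0, 50.0, size=5000)

a_y, b_y, sa, sb = fit_line(fake_dy, fake_asym)
print(f"a_y = {a_y:.3f} +/- {sa:.3f}, b_y = {b_y:.2f} +/- {sb:.2f}")
```

The fitted line `a_y * fake_dy + b_y` can then be drawn on top of the scatter plot with `plt.plot`.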

In [7]:
# put code here

Now perform the regression by subtracting off the line of best fit from every event in the data. Create another scatter plot of $Asym$ versus $\Delta Y$ to check that the $\Delta Y$ dependence has been removed, then compute the mean of the regressed asymmetry distribution $A_{PV}$ and its statistical uncertainty. Then, plot the histogram of the raw asymmetry distribution on top of a histogram of the regressed distribution. Is there an improvement in the resolution? By what factor has the uncertainty decreased?
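
The subtraction step might look like the sketch below, again on synthetic stand-in data. The helper `regress_out` and its `subtract_intercept` flag are hypothetical. With the default setting the full line $a_y \Delta Y + b_y$ is removed, as in the formula above, and the residual mean is then zero by construction for a least-squares fit; setting the flag to `False` removes only the slope term and leaves the offset in the result. Which convention suits the analysis is left as an assumption for the reader to check.

```python
import numpy as np

def regress_out(asym, delta, subtract_intercept=True):
    """Subtract the best-fit line in `delta` from `asym`; return the residuals."""
    slope, intercept = np.polyfit(delta, asym, deg=1)
    correction = slope * delta
    if subtract_intercept:
        correction = correction + intercept
    return asym - correction

# Synthetic stand-in data (NOT real E158 data)
rng = np.random.default_rng(2)
fake_dy = rng.normal(0.0, 10.0, size=5000)
fake_asym = 3.0 * fake_dy + rng.normal(0.0, 50.0, size=5000)

fake_apv = regress_out(fake_asym, fake_dy)
# Ratio of distribution widths before and after the regression
improvement = fake_asym.std(ddof=1) / fake_apv.std(ddof=1)
print(f"width ratio raw/regressed = {improvement:.3f}")
```

Because the number of events is unchanged, the ratio of the widths is also the factor by which the statistical uncertainty on the mean decreases.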

In [8]:
# put code here

Compute the 90% confidence interval for the true value of $A_{PV}$.
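
A hedged sketch of a two-sided confidence interval follows. It assumes that the sample mean is Gaussian-distributed (reasonable for a large number of events by the central limit theorem), so that the 90% interval is the mean plus or minus about 1.645 standard errors. The helper name `confidence_interval` and the input numbers are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(mean, sigma_mean, level=0.90):
    """Two-sided central interval for a Gaussian-distributed estimate."""
    # For level = 0.90 this is the 95th percentile of the unit normal
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sigma_mean, mean + z * sigma_mean

# Hypothetical mean and standard error in ppm (illustration only)
low, high = confidence_interval(-1.2, 2.0)
print(f"90% CI: [{low:.2f}, {high:.2f}] ppm")
```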

Now, repeat the same regression technique for $\Delta X$, using the data you've already regressed as the raw data; i.e., find the line of best fit $A_{PV} = a_x \Delta X + b_x$ and regress away the dependence on $\Delta X$. By how much does the resolution improve this time? Based on the correlation coefficients you calculated in Part 2, does this make sense?

In [9]:
# put code here

### Part 4. Multivariate Regression

The last type of analysis you will do takes into account the fact that $\Delta X$ and $\Delta Y$ are not independent, contrary to what you implicitly assumed in Part 3. Calculate the correlation matrix $C_{ij}$ in the 2-dimensional basis $\{\Delta X, \Delta Y \}$. Next, diagonalize this matrix. The change-of-basis matrix you get out is the matrix which "rotates out" all of the $\Delta X, \Delta Y$ dependence from $A_{raw}$. Now, take every event from $A_{raw}$ and write it as a vector in the aforementioned basis. Apply the rotation to this vector and call the result $A_{PV}$. Compute the mean of your final new distribution and its statistical uncertainty. How does this uncertainty compare to those from Part 3?
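
One way to read the procedure above is sketched here, with several assumptions stated plainly: the covariance matrix (rather than the normalized correlation matrix) of the centered beam positions is diagonalized, the data are rotated into the eigenvector basis where the two beam coordinates are uncorrelated, and the asymmetry is then regressed against each rotated coordinate independently. Because the beam coordinates are centered, the mean of the asymmetry is left unchanged. The function name `decorrelate_and_regress` and the synthetic inputs are hypothetical.

```python
import numpy as np

def decorrelate_and_regress(asym, dx, dy):
    """Remove linear dependence on correlated beam positions dx, dy."""
    beam = np.vstack([dx - dx.mean(), dy - dy.mean()])  # shape (2, N), centered
    cov = np.cov(beam)                                  # 2x2 covariance matrix
    _, rot = np.linalg.eigh(cov)                        # columns are eigenvectors
    u = rot.T @ beam                                    # uncorrelated coordinates
    a0 = asym - asym.mean()
    # Independent one-variable slopes, valid because the rows of u are uncorrelated
    slopes = (u @ a0) / (u ** 2).sum(axis=1)
    return asym - slopes @ u

# Synthetic, deliberately correlated beam positions (NOT real E158 data)
rng = np.random.default_rng(3)
fake_dx = rng.normal(0.0, 10.0, size=5000)
fake_dy = 0.6 * fake_dx + rng.normal(0.0, 8.0, size=5000)
fake_asym = 2.0 * fake_dx - 3.0 * fake_dy + rng.normal(0.0, 50.0, size=5000)

fake_apv = decorrelate_and_regress(fake_asym, fake_dx, fake_dy)
print(f"raw width = {fake_asym.std(ddof=1):.1f}, regressed width = {fake_apv.std(ddof=1):.1f}")
```

In this sketch the result is the same as a simultaneous least-squares fit in $\Delta X$ and $\Delta Y$; the rotation only makes the two corrections independent of each other.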

In [10]:
#put code here

Again, compute the 90% confidence interval for the true value of $A_{PV}$. Using the formulae given in the Phys. Rev. Lett. cited above, explain why the data you have analyzed so far is not sufficient to measure a significant asymmetry if the Standard Model is correct.

### Part 5. Measuring the Asymmetry for the full Run 3 dataset

The E158 experiment performed a multivariate regression similar to the one you performed above. Because the beam conditions and detector conditions vary with time, this regression is done separately for each "sub-run" (corresponding to a period of time over which conditions are stable). The cell below imports the asymmetry data obtained for each sub-run after the regression is done. The data fields are:
- subrun: a number labeling the time period during which the data were taken. Larger numbers are later times
- Npairs:  Number of pulse pairs in the subrun
- MeasuredAsym: The asymmetry for this subrun
- UncertAsym: The statistical uncertainty on the asymmetry from this subrun
- RMSAsym: The RMS of the asymmetry data for this subrun

In [11]:
import numpy as np
from matplotlib import pyplot as plt

# Read in the data
with open("MollerM_MollerEStat.csv","r") as file:
    lines = file.readlines()

# Column containers; the loop below skips any token with no digits (e.g. header text)
Subrun = []
Npairs = []
MeasuredAsym = []
UncertAsym = []
RMSAsym = []
for line in lines:
    a = line.split(",")
    k = 0
    for i in range(len(a)):
        for j in a[i]:
            if j.isdigit():
                if k == 0: Subrun.append(int(a[i]))
                if k == 1: Npairs.append(int(a[i]))
                if k == 2: MeasuredAsym.append(np.float64(a[i]))
                if k == 3: UncertAsym.append(np.float64(a[i]))
                if k == 4: RMSAsym.append(np.float64(a[i]))
                k += 1
                break

In [12]:
# Uncomment to test the parsing of the csv
#print(MeasuredAsym)
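
A common way to combine the per-sub-run results into a single asymmetry is an inverse-variance weighted average. This is an assumption about the intended analysis, not a description of what E158 did. The sketch also returns $\chi^2$ and the number of degrees of freedom as a rough check of whether the scatter between sub-runs is consistent with the quoted uncertainties. The helper name `weighted_average` and the example numbers are hypothetical.

```python
import numpy as np

def weighted_average(values, errors):
    """Inverse-variance weighted mean, its uncertainty, chi2, and ndf."""
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(errors, dtype=float) ** 2
    mean = (weights * values).sum() / weights.sum()
    sigma = 1.0 / np.sqrt(weights.sum())
    # chi2 of the individual values about the weighted mean
    chi2 = (weights * (values - mean) ** 2).sum()
    return mean, sigma, chi2, values.size - 1

# Hypothetical sub-run asymmetries and uncertainties (illustration only)
mean, sigma, chi2, ndf = weighted_average([1.0, 3.0], [1.0, 1.0])
print(f"mean = {mean:.3f} +/- {sigma:.3f}, chi2/ndf = {chi2:.2f}/{ndf}")
```

With the real summary file, the `MeasuredAsym` and `UncertAsym` lists filled by the parsing loop above would be passed in place of the example numbers.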

### Extensions

Here are some ideas of how you could explore the data further, but you are welcome to take your project in a different direction.

- **Time Dependence** One worry when experiments are making sensitive measurements is that the detector response might change with time. Check to see if there is any time dependence in the measured cross section or asymmetry.
- **Consistency of the uncertainties** Is the scatter in the data consistent with what you expect from statistical uncertainties?  
- **Visualization**  Think about ways to display the data to make your analysis strategy clear to the reader.  Provide several different ways of showing the same result and explain which is your favorite.
- **Determination of the weak mixing angle** Use the formulae in the Phys. Rev. Lett. to turn your asymmetry measurement into a measurement of the weak mixing angle.