# Correlation & Statistical Significance

## Pearson Correlation

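Correlation coefficient (*r*): a value in [-1, 1]. Values near ±1 indicate a strong linear relationship; values near 0 indicate little or none.<br>

*p*-value (how strong the evidence is that the correlation is real, by a common rule of thumb):
<br>
<code>p < 0.001</code> strong evidence<br>
<code>p < 0.05</code> moderate evidence<br>
<code>p < 0.1</code> weak evidence<br>
<code>p > 0.1</code> no evidence of correlation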
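
To make the two numbers concrete, here is a minimal sketch using `scipy.stats.pearsonr`. The data are made up for illustration (a hypothetical `y` that depends linearly on `x` plus noise), so only the shape of the result matters, not the exact values.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: y depends linearly on x, plus noise (fixed seed for repeatability)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# pearsonr returns the correlation coefficient and a two-sided p-value
r, p_value = pearsonr(x, y)

print("Pearson r:", r)
print("p-value:", p_value)
```

With a relationship this clean, *r* should land close to 1 and the *p*-value far below 0.001, which is the "strong evidence" row of the scale above.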

### *p*-value

A quick note on *p*-value. I feel like I finally understand it.

*p* is the probability of getting results at least as extreme as the ones we observed, ASSUMING the null hypothesis is true. Thus, a lower *p*-value indicates that either we were **insanely lucky** to have seen what we've seen, or the **null hypothesis is false.** And we never assume we are insanely lucky.

The "branch of the multiverse" phrasing is really helpful here. Either something really unlikely occurred on this branch of the multiverse -- OR -- we are on a different branch of the multiverse, one where the null hypothesis does not hold. Being on a different branch is the more plausible reading, so it is the preferred explanation for a low *p*-value.

This metaphor is also helpful for my paradigm because I understand that the probabilities of the set of multiversal branches (universes) sum to 1. Thus, the relationships among the branches are *multiplicative/fractional/scalable*, which plays nicely into probability.
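
The multiverse picture can be simulated directly. This is a minimal sketch under made-up assumptions: the null hypothesis is "the coin is fair", the hypothetical observation is 60 heads in 100 flips, and each simulated experiment stands in for one branch where the null hypothesis is true. The *p*-value is then just the fraction of those branches that look at least as extreme as ours.

```python
import numpy as np

rng = np.random.default_rng(42)

n_flips = 100          # flips per experiment
observed_heads = 60    # hypothetical observed result
n_branches = 100_000   # simulated "branches" in which the null hypothesis is true

# In every simulated branch the coin is fair (the null hypothesis holds).
heads = rng.binomial(n=n_flips, p=0.5, size=n_branches)

# Two-sided p-value: share of null branches at least as far from 50 heads as we observed.
observed_gap = abs(observed_heads - n_flips / 2)
p_value = np.mean(np.abs(heads - n_flips / 2) >= observed_gap)

print("Estimated p-value:", p_value)
```

The estimate should come out somewhere near 0.06: only a small fraction of fair-coin branches produce a result this lopsided, which is why a low *p*-value pushes us toward "we are not on a fair-coin branch."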

<font color=#ff4d4d>This text is red!</font>

I also learned how to do this ... which is not about math, but could come in handy later.

## Chi-squared Test

Let $\chi^2$ be the sum of squared deviations from the mean, each scaled relative to the mean:

$$\chi^2 = \sum_i \frac{(O_i-E_i)^2}{E_i}$$

I am saying "mean" for short, but it is really the "expected value." $O_i$ is the $i^{\text{th}}$ observed value and $E_i$ is the corresponding expected value.

We can imagine that if there is a LOT of deviation from the mean, $\chi^2$ gets really big. And if every observed value equals its expected value, that is, there is NO deviation from the mean, then $\chi^2 = 0$. Since the deviations are being squared, $\chi^2$ is always non-negative, without upper bound.
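
The formula is short enough to compute by hand. Here is a minimal sketch with made-up counts (60 hypothetical die rolls, 10 expected per face); `chi_squared` is my own helper name, and `scipy.stats.chisquare` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import chisquare

def chi_squared(observed, expected):
    """Sum of squared deviations from the expected values, each scaled by the expected value."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

# Hypothetical data: 60 rolls of a die, 10 expected per face
observed = [8, 12, 10, 9, 11, 10]
expected = [10, 10, 10, 10, 10, 10]

stat = chi_squared(observed, expected)
zero_stat = chi_squared(expected, expected)  # no deviation at all

# Cross-check against SciPy's goodness-of-fit test
scipy_stat, scipy_p = chisquare(f_obs=observed, f_exp=expected)

print("By hand:", stat)
print("No deviation:", zero_stat)
print("SciPy:", scipy_stat, "p-value:", scipy_p)
```

The no-deviation case comes out to exactly 0, and the hand-rolled statistic should agree with SciPy's.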

### Example

Here is an example of a $\chi^2$ test. In this context, we may want to see if men are more likely to react positively to a change than women, for example.

Our null hypothesis is that there is no association: liking or disliking the change is no more common among men than among women. If that is true, the observed counts should stay close to the expected counts, so $\chi^2$ will be small and the *p*-value large. A large $\chi^2$ (small *p*-value) is evidence against the null hypothesis.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Create the contingency table
data = [[20, 30],  # Male: [Like, Dislike]
        [25, 25]]  # Female: [Like, Dislike]

# Create a DataFrame for clarity
df = pd.DataFrame(data, columns=["Like", "Dislike"], index=["Male", "Female"])

# Perform the Chi-Square Test
# Note: for a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = chi2_contingency(df)

# Display results
print("Chi-square Statistic:", chi2)
print("Degrees of Freedom:", dof)
print("P-value:", p)
print("Expected Frequencies:\n", expected)

df
```

```
Chi-square Statistic: 0.6464646464646464
Degrees of Freedom: 1
P-value: 0.4213795037428696
Expected Frequencies:
 [[22.5 27.5]
 [22.5 27.5]]
```


|        | Like | Dislike |
|--------|------|---------|
| Male   | 20   | 30      |
| Female | 25   | 25      |
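
To see where the expected frequencies and the statistic above come from, here is a sketch that rebuilds them by hand. Each expected count is (row total × column total) / grand total. One detail I had to account for: for a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default, so the plain formula gives a different number unless `correction=False` is passed.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 30],   # Male:   [Like, Dislike]
                     [25, 25]])  # Female: [Like, Dislike]

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Plain chi-squared statistic
chi2_plain = np.sum((observed - expected) ** 2 / expected)

# Yates' correction: shrink each absolute deviation by 0.5 before squaring
chi2_yates = np.sum((np.abs(observed - expected) - 0.5) ** 2 / expected)

# Cross-check against SciPy, with and without the correction
chi2_scipy_plain = chi2_contingency(observed, correction=False)[0]
chi2_scipy_default = chi2_contingency(observed)[0]

print("Expected:\n", expected)
print("Plain:", chi2_plain, "SciPy (correction=False):", chi2_scipy_plain)
print("Yates:", chi2_yates, "SciPy (default):", chi2_scipy_default)
```

The Yates-corrected value should match the 0.646 reported above, while the uncorrected statistic is a bit larger (about 1.01). Either way the *p*-value is far from significant for this table.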


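The great thing about a $\chi^2$ test is that one test covers all the variations of the relationship with the same data. Whether men are more likely to react positively to the change than women, women are more likely to react negatively, men are more likely to react negatively, etc., any of these shows up as a departure from the expected counts. The flip side is that the test only says whether reaction and gender are dependent, not which direction the dependence runs; for that, we compare the observed counts with the expected ones.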

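This is where degrees of freedom come into play. For a contingency table, the degrees of freedom are (rows − 1) × (columns − 1): the number of cells that are free to vary once the row and column totals are fixed. A larger table has more ways to deviate from independence, so $\chi^2$ tends to be larger by chance alone, and the degrees of freedom tell us which $\chi^2$ distribution to compare our statistic against. In the 2×2 example above there is 1 degree of freedom, which matches the output.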
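
As a quick check on the degrees of freedom, here is a small sketch that compares the (rows − 1) × (columns − 1) rule with what `chi2_contingency` reports. `contingency_dof` is my own helper name, and the 3×4 table is made-up data used only to get a second shape.

```python
import numpy as np
from scipy.stats import chi2_contingency

def contingency_dof(table):
    """Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)."""
    rows, cols = np.asarray(table).shape
    return (rows - 1) * (cols - 1)

two_by_two = [[20, 30],
              [25, 25]]

# Hypothetical 3x4 table, for illustration only
three_by_four = [[10, 12, 9, 14],
                 [11, 8, 13, 10],
                 [9, 15, 10, 12]]

for table in (two_by_two, three_by_four):
    _, _, dof, _ = chi2_contingency(table)
    print("By rule:", contingency_dof(table), "SciPy:", dof)
```

The two numbers should agree for both tables: 1 for the 2×2 case and 6 for the 3×4 case.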