<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Everything-you-need-to-know-about-Variance" data-toc-modified-id="Everything-you-need-to-know-about-Variance-1">Everything you need to know about Variance</a></span><ul class="toc-item"><li><span><a href="#Variance-for-beginners" data-toc-modified-id="Variance-for-beginners-1.1">Variance for beginners</a></span></li></ul></li><li><span><a href="#Introduction" data-toc-modified-id="Introduction-2">Introduction</a></span></li><li><span><a href="#Sample-variance" data-toc-modified-id="Sample-variance-3">Sample variance</a></span></li><li><span><a href="#Standard-variance" data-toc-modified-id="Standard-variance-4">Standard variance</a></span><ul class="toc-item"><li><span><a href="#TI-nspire" data-toc-modified-id="TI-nspire-4.1">TI-nspire</a></span></li></ul></li><li><span><a href="#Covariance" data-toc-modified-id="Covariance-5">Covariance</a></span></li><li><span><a href="#Correlation" data-toc-modified-id="Correlation-6">Correlation</a></span></li><li><span><a href="#ddof-in-Pandas-and-Numpy-are-different" data-toc-modified-id="ddof-in-Pandas-and-Numpy-are-different-7">ddof in Pandas and Numpy are different</a></span></li><li><span><a href="#Investigating-Pandas-and-Numpy-ddof" data-toc-modified-id="Investigating-Pandas-and-Numpy-ddof-8">Investigating Pandas and Numpy ddof</a></span></li></ul></div>

# Everything you need to know about Variance
## Variance for beginners

# Introduction



# Sample variance

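Python's `statistics.variance` returns the sample variance. It divides the sum of squared deviations from the sample mean $\bar{x}$ by $n-1$:

$$
s^2_{n-1} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$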
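Here is a minimal from-scratch sketch of the formula above. It uses the same `sample` list as the cells below. `sample_variance` is a hypothetical helper written for this illustration; it is not a library function.

```python
import statistics

def sample_variance(xs):
    """Sample variance: squared deviations from the mean, summed and divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
manual = sample_variance(sample)
print(manual)
# Should agree with the standard library up to floating-point rounding
print(abs(manual - statistics.variance(sample)) < 1e-12)
```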


In [1]:
from scipy.stats import chi2_contingency
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
from scipy.stats import pearsonr

In [2]:
import statistics
import math
 
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98] 

var1 = statistics.variance(sample)
print(var1)
std1 = math.sqrt(var1)
print(std1)

0.40924
0.639718688174732


In [3]:
import numpy as np
var2 = np.var(sample, ddof=1)  # ddof=1: divide by n - 1, same as statistics.variance
print(var2)
std2 = math.sqrt(var2)
print(std2)

0.40924000000000005
0.639718688174732


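# Standard variance

With its default `ddof=0`, `np.var` divides the sum of squared deviations by $n$ rather than $n-1$. In other words, it returns the population variance, which is smaller than the sample variance for the same data.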

In [4]:
var3 = np.var(sample)  # default ddof=0: divide by n (population variance)
print(var3)
std3 = math.sqrt(var3)
print(std3)

0.34103333333333335
0.5839805932848569
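The standard library has population counterparts as well: `statistics.pvariance` and `statistics.pstdev` divide by $n$. The sketch below checks that they match NumPy's defaults. It also checks the rescaling between the two estimators, $\sigma^2 = s^2 (n-1)/n$.

```python
import math
import statistics

import numpy as np

sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]

# Population variance / std: divide by n, like NumPy's default ddof=0
pvar = statistics.pvariance(sample)
pstd = statistics.pstdev(sample)
print(pvar, np.var(sample))
print(pstd, np.std(sample))

# Converting the sample variance (n - 1) into the population variance (n)
n = len(sample)
print(math.isclose(statistics.variance(sample) * (n - 1) / n, pvar))
```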


## TI-nspire

![tinspire1](./image/tinspire1.png)

![tinspire2](./image/tinspire2.png)

https://corporatefinanceinstitute.com/resources/knowledge/finance/covariance/


# Covariance

# Correlation

The cells below compute Spearman's rank correlation coefficient for two sets of data with `scipy.stats.spearmanr`.

In [8]:


df = pd.DataFrame(
    [
        [7,3],
        [6,4],
        [5,4],
        [3,2],
        [6,4],
        [8,9],
        [9,7]
    ],
    columns=['Set of A','Set of B'])

correlation, pval = spearmanr(df)
print(f'correlation={correlation:.6f}, p-value={pval:.6f}')


correlation=0.710560, p-value=0.073530


In [9]:
from scipy.stats import rankdata

r_x=rankdata(df.iloc[:,0])
print(r_x)
r_y=rankdata(df.iloc[:,1])
print(r_y)

[5.  3.5 2.  1.  3.5 6.  7. ]
[2. 4. 4. 1. 4. 7. 6.]
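Spearman's ρ is Pearson's correlation coefficient applied to the ranks. The short check below confirms this with the same two columns; the ranks are recomputed so the block runs on its own. Both columns contain ties, so the shortcut $1 - 6\sum d_i^2 / (n(n^2-1))$ does not apply here. That shortcut assumes no ties, so it is deliberately not used.

```python
from scipy.stats import pearsonr, rankdata, spearmanr

a = [7, 6, 5, 3, 6, 8, 9]  # 'Set of A'
b = [3, 4, 4, 2, 4, 9, 7]  # 'Set of B'

# Tied values get the average of their ranks (rankdata's default method='average')
r_x = rankdata(a)
r_y = rankdata(b)

# Pearson correlation of the ranks ...
rho_pearson_ranks, _ = pearsonr(r_x, r_y)
# ... equals scipy's Spearman coefficient on the raw values
rho_spearman, _ = spearmanr(a, b)
print(rho_pearson_ranks, rho_spearman)
```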


In [10]:
print(f'Pandas std default: {df.std()}')
print(f'Pandas std ddof=0: {df.std(ddof=0)}')
print(f'Pandas std ddof=1: {df.std(ddof=1)}')

Pandas std default: Set of A    1.976047
Set of B    2.429972
dtype: float64
Pandas std ddof=0: Set of A    1.829464
Set of B    2.249717
dtype: float64
Pandas std ddof=1: Set of A    1.976047
Set of B    2.429972
dtype: float64
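The `ddof=0` and `ddof=1` results differ only in the divisor. You can therefore recover one from the other with the factor $\sqrt{n/(n-1)}$. Here is a quick check on the same DataFrame, rebuilt so the block is self-contained:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[7, 3], [6, 4], [5, 4], [3, 2], [6, 4], [8, 9], [9, 7]],
    columns=['Set of A', 'Set of B'])

n = len(df)
std_pop = df.std(ddof=0)     # divides by n
std_sample = df.std(ddof=1)  # divides by n - 1 (pandas' default)

# Rescaling the population std by sqrt(n / (n - 1)) gives the sample std
rescaled = std_pop * np.sqrt(n / (n - 1))
print(rescaled)
print(np.allclose(rescaled, std_sample))
```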


# ddof in Pandas and Numpy are different


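NumPy and pandas use different default `ddof` values for variance and standard deviation. `np.var` and `np.std` default to `ddof=0`, which divides by $n$ and gives the population value. pandas' `Series`/`DataFrame` `.var()` and `.std()` default to `ddof=1`, which divides by $n-1$ and gives the sample value.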
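The following sketch shows the two defaults side by side on one column of the data above. Passing `ddof` explicitly makes the libraries agree. The same applies to `np.var` and `.var()`.

```python
import numpy as np
import pandas as pd

values = [7, 6, 5, 3, 6, 8, 9]  # 'Set of A' from the DataFrame above
s = pd.Series(values)

# Defaults differ: NumPy uses ddof=0 (population), pandas uses ddof=1 (sample)
print(np.std(values), s.std())

# Setting ddof explicitly on either side makes the results match
print(np.isclose(np.std(values, ddof=1), s.std()))
print(np.isclose(np.std(values), s.std(ddof=0)))
```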



# Investigating Pandas and Numpy ddof




In [11]:
# np.std defaults to ddof=0 (divide by n); ddof=1 divides by n - 1
s_x=np.std(r_x)
s_y=np.std(r_y)
s_x0=np.std(r_x,ddof=0)
s_y0=np.std(r_y,ddof=0)
s_x1=np.std(r_x,ddof=1)
s_y1=np.std(r_y,ddof=1)

print(f'Numpy std default (r_x): {s_x}')
print(f'Numpy std default (r_y): {s_y}')
print(f'Numpy std ddof=0 (r_x): {s_x0}')
print(f'Numpy std ddof=0 (r_y): {s_y0}')
print(f'Numpy std ddof=1 (r_x): {s_x1}')
print(f'Numpy std ddof=1 (r_y): {s_y1}')

Numpy std default (r_x): 1.9820624179302297
Numpy std default (r_y): 1.927248223318863
Numpy std ddof=0 (r_x): 1.9820624179302297
Numpy std ddof=0 (r_y): 1.927248223318863
Numpy std ddof=1 (r_x): 2.140872096444188
Numpy std ddof=1 (r_y): 2.0816659994661326
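Here is one reason the `ddof` choice matters for the ranks. Suppose Spearman's ρ is computed by hand as the covariance of the ranks divided by the product of their standard deviations. This sketch assumes that route. `corr_with_ddof` is a hypothetical helper written only for this illustration. If the covariance and both standard deviations use the same `ddof`, the $n - \text{ddof}$ divisors cancel. Mixing conventions scales the result by $n/(n-1)$.

```python
import numpy as np
from scipy.stats import rankdata

r_x = rankdata([7, 6, 5, 3, 6, 8, 9])
r_y = rankdata([3, 4, 4, 2, 4, 9, 7])

def corr_with_ddof(x, y, ddof):
    """Pearson correlation from a covariance and standard deviations sharing one ddof."""
    n = len(x)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - ddof)
    return cov / (np.std(x, ddof=ddof) * np.std(y, ddof=ddof))

rho0 = corr_with_ddof(r_x, r_y, ddof=0)
rho1 = corr_with_ddof(r_x, r_y, ddof=1)
print(rho0, rho1)  # equal up to rounding: the divisors cancel

# Mixing conventions (ddof=1 covariance, NumPy's default ddof=0 std)
n = len(r_x)
cov1 = np.sum((r_x - r_x.mean()) * (r_y - r_y.mean())) / (n - 1)
mixed = cov1 / (np.std(r_x) * np.std(r_y))
print(mixed, rho1 * n / (n - 1))  # inflated by n / (n - 1)
```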
