<a href="https://colab.research.google.com/github/naomori/codexa_Statistics_1st/blob/master/Chapter6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Variance and Standard Deviation

Both measure how widely the data are spread around the mean.

* Population
  - All of the data
* Sample
  - A subset of the data drawn from the population
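
The distinction matters in practice because Python's `statistics` module has separate functions for the two cases: `pvariance` divides by N (population), while `variance` divides by n - 1 (sample). The following is a minimal sketch; the values and the choice of "sample" are hypothetical and only illustrate the two calls.

```python
import statistics as stats

# Hypothetical illustration data: treat these five values as the whole population.
population = [1.0, 2.0, 2.0, 5.0, 6.0]

# Pretend only the first three values were observed (an arbitrary, non-random sample).
sample = population[:3]

# Population variance divides the sum of squared deviations by N.
pop_var = stats.pvariance(population)

# Sample variance divides by n - 1 instead.
samp_var = stats.variance(sample)

print("population variance: %.2f" % pop_var)
print("sample variance: %.2f" % samp_var)
```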







In [0]:
import statistics as stats

import numpy as np
import matplotlib.pyplot as plt

In [9]:
years_of_experience = np.array([1.,2.,2.,5.,6.])

# Population mean, denoted μ
population_mean = stats.mean(years_of_experience)
u = population_mean
print("population mean: %.2f" %(u))

population mean: 3.20


In [10]:
# Population variance:
# the average squared distance of each data point from the mean

population_variance = stats.pvariance(years_of_experience)
print("population variance: %.2f" % (population_variance))

population variance: 3.76


In [11]:
def find_mean(data):
  s = sum(data)
  n = len(data)
  mean = s / n
  return mean

print(find_mean(years_of_experience))

3.2


In [12]:
def find_diff(data):
  mean = find_mean(data)
  diff = []
  for num in data:
    diff.append(num - mean)
  return diff

diff = find_diff(years_of_experience)
print(diff)

[-2.2, -1.2000000000000002, -1.2000000000000002, 1.7999999999999998, 2.8]


In [15]:
def find_variance(data):
  diff = find_diff(data)
  sq_diff = []
  for d in diff:
    sq_diff.append(d**2)
  sum_sq_diff = sum(sq_diff)
  variance = sum_sq_diff / len(data)
  return variance

variance = find_variance(years_of_experience)
print("variance: %.2f" % (variance))

variance: 3.76
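
As a cross-check, the same calculation can be sketched with NumPy array operations instead of explicit loops. This assumes `np.var` is called with its default `ddof=0`, which divides by N and therefore gives the population variance.

```python
import numpy as np

years_of_experience = np.array([1., 2., 2., 5., 6.])

# Mean of the squared deviations from the mean (divides by N).
deviations = years_of_experience - years_of_experience.mean()
manual_variance = np.mean(deviations ** 2)

# np.var defaults to ddof=0, so it also returns the population variance.
numpy_variance = np.var(years_of_experience)

print("manual: %.2f, np.var: %.2f" % (manual_variance, numpy_variance))
```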


# Reviewing the Formulas

#### Population Mean

$\mu = \frac{1}{N} \times \sum_{i=1}^{N}x_i$

#### Population Variance

$\sigma^2 = \frac{1}{N} \times \sum_{i=1}^{N}(x_i - \mu)^2$
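
As a worked example, substituting the `years_of_experience` data (1, 2, 2, 5, 6; N = 5) into both formulas reproduces the values printed earlier:

$\mu = \frac{1}{5} \times (1 + 2 + 2 + 5 + 6) = \frac{16}{5} = 3.2$

$\sigma^2 = \frac{1}{5} \times \left( (-2.2)^2 + (-1.2)^2 + (-1.2)^2 + 1.8^2 + 2.8^2 \right) = \frac{18.8}{5} = 3.76$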

The formulas are written in TeX notation.

[A summary of TeX math notation, since it is easy to forget (in Japanese)](https://qiita.com/shepabashi/items/27b7284d1f0007af533b)


# Standard Deviation



In [24]:
height = np.array([1.6, 1.9, 1.5, 1.8, 1.7])

u = stats.mean(height)
print("population mean: %.2f [m]" % (u))
pv = stats.pvariance(height)
print("population variance: %.2f [m^2]" % (pv))

population mean: 1.70 [m]
population variance: 0.02 [m^2]


In [25]:
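# The standard deviation is the square root of the variance,
# so it is in the same unit as the data ([m] rather than [m^2]).
sigma = np.sqrt(pv)
print("population standard deviation: %.2f [m]" % (sigma))

population standard deviation: 0.14 [m]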


In [28]:
pstdev = stats.pstdev(height)
print("pstdev: %.2f" % (pstdev))

pstdev: 0.14
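
The population standard deviation is the square root of the population variance, $\sigma = \sqrt{\sigma^2}$. The sketch below checks that three routes to it agree on the height data: taking the square root of `stats.pvariance`, calling `stats.pstdev`, and calling `np.std`. It assumes `np.std` is left at its default `ddof=0`, which divides by N. The variable names are illustrative.

```python
import math
import statistics as stats

import numpy as np

heights = [1.6, 1.9, 1.5, 1.8, 1.7]  # heights in metres

# Route 1: square root of the population variance.
sigma_from_variance = math.sqrt(stats.pvariance(heights))

# Route 2: population standard deviation directly.
sigma_stats = stats.pstdev(heights)

# Route 3: NumPy, whose default ddof=0 divides by N.
sigma_numpy = np.std(heights)

print("sqrt(pvariance): %.4f [m]" % sigma_from_variance)
print("pstdev:          %.4f [m]" % sigma_stats)
print("np.std:          %.4f [m]" % sigma_numpy)
```

Passing `ddof=1` to `np.std` would instead divide by N - 1, giving the sample standard deviation that `stats.stdev` computes.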
