<a href="https://colab.research.google.com/github/kangwonlee/nmisp/blob/dependabot/pip/tests/requests-2.31.0/20_probability/15_generating_random_numbers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In [None]:
import random

import matplotlib.pyplot as plt
import numpy as np
import numpy.random as nr
import scipy.stats



In [None]:
random.seed()



# Generating random numbers following probability distributions



## Histogram



The following video shows an example of plotting a histogram.



[![How to create a histogram | Data and statistics | Khan Academy](https://i.ytimg.com/vi/gSEYtAjuZ-Y/hqdefault.jpg)](https://www.youtube.com/watch?v=gSEYtAjuZ-Y)



Let's plot one with Python.



Consider the following data.



In [None]:
data = [1, 3, 27, 32, 5, 63, 26, 25, 18, 16,
        4, 45, 29, 19, 22, 51, 58, 9, 42, 6]



Let's prepare a list of bin edges for the histogram, from 0 to 70 in steps of 10.



In [None]:
bins_list = list(range(0, 70+1, 10))
bins_list



`numpy` has a function that computes the histogram.



In [None]:
hist_result = np.histogram(data, bins=bins_list)
hist_result
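
To make the return value easier to read, here is a minimal sketch that unpacks it. The names `edges`, `counts`, and `returned_edges` are illustrative and not part of the notebook. `np.histogram()` returns the counts and the bin edges, so there is always one more edge than there are bins.

```python
import numpy as np

data = [1, 3, 27, 32, 5, 63, 26, 25, 18, 16,
        4, 45, 29, 19, 22, 51, 58, 9, 42, 6]
edges = list(range(0, 70 + 1, 10))

# np.histogram returns a pair: (counts per bin, bin edges)
counts, returned_edges = np.histogram(data, bins=edges)

# There is one more edge than there are bins
print(len(counts), len(returned_edges))

# Every bin is half-open [left, right) except the last one, which also includes its right edge
for left, right, count in zip(returned_edges[:-1], returned_edges[1:], counts):
    print(f'{left} to {right}: {count}')
```

The counts add up to the number of data points because every value falls between the first and last edges.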



`matplotlib` also has a function that plots the histogram.



In [None]:
plt.hist(data, bins=bins_list)
plt.grid(True)
plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');



One may also let the function choose the bin edges automatically.



In [None]:
plt.hist(data, bins='auto')
plt.grid(True)
plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');
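
To see which edges the `'auto'` strategy would pick, one option is `np.histogram_bin_edges()`. This sketch assumes that `plt.hist()` hands the string over to `numpy`, as the `matplotlib` documentation describes; the name `auto_edges` is illustrative.

```python
import numpy as np

data = [1, 3, 27, 32, 5, 63, 26, 25, 18, 16,
        4, 45, 29, 19, 22, 51, 58, 9, 42, 6]

# Ask numpy for the edges without computing the histogram itself
auto_edges = np.histogram_bin_edges(data, bins='auto')

# The edges run from the smallest to the largest value in the data
print(auto_edges)
```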



The `bar()` function of `matplotlib` can also plot it.



In [None]:
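def bar(bins, result_0):
    # Each bar spans one bin: its width is the gap between consecutive edges
    width_list = [b1 - b0 for b0, b1 in zip(bins[:-1], bins[1:])]
    # align='edge' puts the left side of each bar on the left edge of its bin
    return plt.bar(bins[:-1], result_0, width=width_list, align='edge')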
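
As a side note, the width computation can also be written with `np.diff()`. The sketch below uses made-up, unevenly spaced edges to show that both forms agree; the names `edges`, `widths_loop`, and `widths_diff` are illustrative.

```python
import numpy as np

# Hypothetical bin edges with uneven spacing
edges = np.array([0.0, 1.0, 3.0, 6.0])

# Width of each bin as in the helper above
widths_loop = [b1 - b0 for b0, b1 in zip(edges[:-1], edges[1:])]

# The same widths as differences of consecutive edges
widths_diff = np.diff(edges)

print(widths_loop, widths_diff)
```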



In [None]:
bar(bins_list, hist_result[0])
plt.grid(True)

plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');



## Uniform distribution



Let's generate $n$ random numbers between zero and one following the uniform distribution.



In [None]:
n = 10000
x_min = 0.0
x_max = 1.0



### Standard library



Among the Python standard libraries, one can use the `random` module.



In [None]:
import random



Always initialize the `random` module by calling `seed()` before using it. Passing a fixed value, as in `seed(1)`, makes the sequence reproducible.



In [None]:
random.seed(1)



`random.uniform()` generates a random `float` following the uniform distribution.



In [None]:
uniform_random_numbers_list = []

for i in range(n):
    uniform_random_numbers_list.append(random.uniform(x_min, x_max))
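
The Python documentation describes `random.uniform(a, b)` in terms of the expression `a + (b-a) * random()`. The sketch below checks that relation by restarting the generator with the same seed; the bounds `a` and `b` and the list names are illustrative.

```python
import math
import random

a, b = 2.0, 5.0

random.seed(1)
direct = [random.uniform(a, b) for _ in range(5)]

# Restart the same pseudorandom sequence
random.seed(1)
scaled = [a + (b - a) * random.random() for _ in range(5)]

# Both lists should hold the same values, all between a and b
print(direct)
print(scaled)
```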



Let's prepare bin edges at intervals of 0.1.



In [None]:
bin_interval = 0.1
bins_array = np.arange(x_min, x_max+0.5*bin_interval, bin_interval)
bins_array
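
The half-interval added to the upper limit keeps `np.arange()` from dropping the last edge because of floating-point rounding. An alternative sketch uses `np.linspace()`, which takes the number of edges instead of the step; `n_bins`, `edges_arange`, and `edges_linspace` are illustrative names.

```python
import numpy as np

x_min, x_max, bin_interval = 0.0, 1.0, 0.1

# Same construction as above: overshoot the upper limit by half a step
edges_arange = np.arange(x_min, x_max + 0.5 * bin_interval, bin_interval)

# Alternative: fix the number of bins and let linspace place the edges
n_bins = round((x_max - x_min) / bin_interval)
edges_linspace = np.linspace(x_min, x_max, n_bins + 1)

print(edges_arange)
print(edges_linspace)
```

With `np.linspace()` the first and last edges are exactly `x_min` and `x_max`.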



Let's plot the histogram.



In [None]:
hist_uniform = np.histogram(uniform_random_numbers_list, bins=bins_array)



In [None]:
bar(bins_array, hist_uniform[0])
plt.grid(True)
plt.title('Histogram, Uniform distribution : Standard library')
plt.xlabel('value')
plt.ylabel('frequency');



Let's calculate the probabilities by dividing the count in each bin by $n$.



In [None]:
probability_uniform = hist_uniform[0] / n



In [None]:
bar(bins_array, probability_uniform)
plt.grid(True)
plt.title('Probability, Uniform distribution : Standard library')
plt.xlabel('value')
plt.ylabel('probability');
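
As a quick sanity check, the per-bin probabilities should add up to one when every sample falls inside the bins. The sketch below also compares them with the `density=True` option of `np.histogram()`, which divides further by the bin width. The variable names and the seed value are illustrative.

```python
import numpy as np

np.random.seed(1)
n = 10000
samples = np.random.uniform(0.0, 1.0, n)
edges = np.linspace(0.0, 1.0, 11)

# Probability of each bin: count divided by the number of samples
counts, _ = np.histogram(samples, bins=edges)
probability = counts / n

# Density of each bin: probability divided by the bin width
density, _ = np.histogram(samples, bins=edges, density=True)

print(probability.sum())
print(density)
```

For the uniform distribution on zero to one, each of the ten probabilities is expected to be near 0.1 and each density near 1.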



### `numpy.random`



One can also use `numpy.random`, a submodule of `numpy`.



In [None]:
import numpy.random as nr



`numpy.random.uniform()` generates an array of random `float`s following the uniform distribution in a single call.



In [None]:
uniform_random_numbers_array = nr.uniform(x_min, x_max, n)
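
Recent `numpy` versions also provide a `Generator` interface created by `np.random.default_rng()`. The notebook does not use it; the following is only a sketch of the equivalent call, with an illustrative seed value.

```python
import numpy as np

# A Generator object carries its own state instead of using the global one
rng = np.random.default_rng(seed=1)

# Same arguments as numpy.random.uniform: low, high, size
samples = rng.uniform(0.0, 1.0, size=10000)

print(samples.shape, samples.min(), samples.max())
```

A second generator created with the same seed would reproduce the same samples.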



Let's plot the histogram, reusing the bin edges.



In [None]:
hist_uniform_nr = np.histogram(uniform_random_numbers_array, bins=bins_array)



In [None]:
bar(bins_array, hist_uniform_nr[0])
plt.grid(True)
plt.title('Histogram, Uniform distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('frequency');



Let's calculate the probabilities, too.



In [None]:
probability_uniform_nr = hist_uniform_nr[0] / n



In [None]:
bar(bins_array, probability_uniform_nr)
plt.grid(True)
plt.title('Probability, Uniform distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



## Normal distribution



Now, let's generate $n$ random numbers following a normal distribution with a mean of zero and a standard deviation of one.



In [None]:
n = 10000
x_ave = 0.0
x_std = 1.0



### Standard library



Either `random.normalvariate()` or `random.gauss()` can be used.



In [None]:
normal_random_numbers_list = [random.normalvariate(x_ave, x_std) for i in range(n)]
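
One way to check the generated numbers, sketched here with the standard `statistics` module, is to compare the sample mean and sample standard deviation with the requested values. The seed and the variable names are illustrative.

```python
import random
import statistics

random.seed(1)
n = 10000
samples = [random.normalvariate(0.0, 1.0) for _ in range(n)]

# With many samples these should be close to 0 and 1 respectively
sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)

print(sample_mean, sample_std)
```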



Let's plot the histogram.



In [None]:
bin_interval = 0.1
bins_array = np.arange(x_ave + (-3)*x_std, x_ave + (+3)*x_std + 0.5*bin_interval, bin_interval)



In [None]:
hist_normal = np.histogram(normal_random_numbers_list, bins=bins_array)



In [None]:
bar(bins_array, hist_normal[0])
plt.grid(True)
plt.title('Normal distribution : Standard library')
plt.xlabel('value')
plt.ylabel('frequency');



Probabilities:



In [None]:
probability_normal = hist_normal[0] / n



In [None]:
bar(bins_array, probability_normal)
plt.grid(True)
plt.title('Probability, Normal distribution : Standard library')
plt.xlabel('value')
plt.ylabel('probability');



### `numpy.random`



One can use the `numpy.random.normal()` function.



In [None]:
normal_random_numbers_nr = nr.normal(x_ave, x_std, n)



Let's plot the histogram.



In [None]:
hist_normal_nr = np.histogram(normal_random_numbers_nr, bins=bins_array)



In [None]:
bar(bins_array, hist_normal_nr[0])
plt.grid(True)
plt.title('Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('frequency');



Probabilities:



In [None]:
probability_normal_nr = hist_normal_nr[0] / n



In [None]:
bar(bins_array, probability_normal_nr)
plt.grid(True)
plt.title('Probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



Cumulative probability:



In [None]:
norm_cp = np.cumsum(probability_normal_nr)
bar(bins_array, norm_cp)
plt.grid(True)
plt.title('Cumulative probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



Comparison with the cumulative distribution function (cdf):



In [None]:
norm_cdf = scipy.stats.norm.cdf(bins_array)

bar(bins_array, norm_cp)
plt.plot(bins_array, norm_cdf, 'r-')
plt.grid(True)
plt.title('Cumulative probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');
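
The same comparison can be sketched numerically without plotting. The cdf of the standard normal distribution can be written with the error function as $\Phi(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(x/\sqrt{2}\right)\right]$. The helper `standard_normal_cdf()`, the seed, and the other names below are illustrative and not part of the notebook.

```python
import math

import numpy as np


def standard_normal_cdf(x):
    # cdf of the standard normal distribution via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


np.random.seed(1)
n = 10000
samples = np.random.normal(0.0, 1.0, n)
edges = np.linspace(-3.0, 3.0, 61)
counts, _ = np.histogram(samples, bins=edges)

# Running sum of the bin probabilities, one value per bin, up to its right edge
cumulative = np.cumsum(counts) / n

# Exact cdf evaluated at the right edge of each bin
exact = np.array([standard_normal_cdf(x) for x in edges[1:]])

# Largest gap between the empirical and the exact values
max_gap = np.max(np.abs(cumulative - exact))
print(max_gap)
```

Samples below -3 are not counted by the histogram, so the running sum is expected to sit slightly below the exact cdf, by roughly $\Phi(-3) \approx 0.0013$.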



Inverse of the cumulative distribution function:



In [None]:
normal_random_variable = scipy.stats.norm()
ppf = normal_random_variable.ppf


Let's call the inverse of the cumulative distribution function with the uniform random numbers generated earlier as the argument.


In [None]:
ppf_uniform = ppf(uniform_random_numbers_array)
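
The same idea, often called inverse transform sampling, can be sketched with the standard library alone, assuming Python 3.8 or later for `statistics.NormalDist`. The helper `draw_open_unit()` and the seed are illustrative; the helper only guards against the unlikely value 0.0, which `inv_cdf()` does not accept.

```python
import random
import statistics

random.seed(1)
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)


def draw_open_unit():
    # Uniform number strictly between 0 and 1, as inv_cdf requires
    u = random.random()
    while u == 0.0:
        u = random.random()
    return u


n = 10000

# Push uniform numbers through the inverse cdf of the normal distribution
samples = [standard_normal.inv_cdf(draw_open_unit()) for _ in range(n)]

print(statistics.fmean(samples), statistics.stdev(samples))
```

The printed mean and standard deviation should be close to zero and one.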



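Because the inputs are uniformly distributed between zero and one, the outputs of the inverse cdf follow the normal distribution, so the histogram should resemble the normal distribution histograms above.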



In [None]:
hist_normal_inv_cdf = np.histogram(ppf_uniform, bins=bins_array)

bar(bins_array, hist_normal_inv_cdf[0])
plt.grid(True)
plt.title('Histogram, uniform random numbers through inverse of cdf')
plt.xlabel('value')
plt.ylabel('frequency');



## References



[[ref0](https://docs.python.org/3/library/random.html)]
[[ref1](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html)]
[[ref2](https://stackoverflow.com/a/33372888)]
[[ref3](https://numpy.org/doc/stable/reference/random/index.html)]



## `seed` of the pseudorandom number generator



Functions such as `py.random()` are pseudorandom number generators.



They generate a sequence of numbers with characteristics similar to those of random numbers, but the numbers are not truly random. [[wikipedia](https://en.wikipedia.org/wiki/Pseudorandom_number_generator)]



We can control random number generation using `seed`.



In [None]:
import pylab as py



The following two cells would show different results.



In [None]:
py.seed()
py.random([5,])



In [None]:
py.seed()
py.random([5,])



The following two cells would show the same results.



In [None]:
seed = 2038011903
py.seed(seed)
py.random([5,])



In [None]:
py.seed(seed)
py.random([5,])
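
The behaviour above can be checked in a single cell. This sketch calls `numpy.random` directly, on the assumption that the `seed()` and `random()` of `pylab` are the `numpy.random` functions of the same names; `first`, `second`, and `third` are illustrative names.

```python
import numpy as np

seed = 2038011903

# The same seed restarts the same sequence
np.random.seed(seed)
first = np.random.random(5)

np.random.seed(seed)
second = np.random.random(5)

# A different seed starts a different sequence
np.random.seed(seed + 1)
third = np.random.random(5)

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```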



## Final Bell



In [None]:
# stackoverflow.com/a/24634221
import os
os.system("printf '\a'");

