<a href="https://colab.research.google.com/github/kangwonlee/nmisp/blob/uv/20_probability/15_random_variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In [None]:
import random

import matplotlib.pyplot as plt
import numpy as np
import numpy.random as nr
import scipy.stats



In [None]:
random.seed()



In [None]:
def bar(bins, result_0):
    width_list = [b1 - b0 for b0, b1 in zip(bins[:-1], bins[1:])]
    return plt.bar(bins[:-1], result_0, width=width_list, align='edge')



# Random Variables and Probability Distributions<br>확률변수와 확률분포



* A random variable is a variable whose value is a number determined by the uncertain outcome of an event or experiment.<br>확률변수란 어떤 사건의 확정적이지 않은 결과를 수치로 나타내는 변수를 말한다.



## Random Variables<br>확률변수



### Discrete and continuous random variables<br>이산 확률변수와 연속 확률변수



* Examples of discrete random variables<br>이산확률변수의 예
    * Number of heads in 3 coin flips<br>동전을 세 번 던졌을 때 앞면이 나온 횟수
    * Outcome of rolling a die<br>주사위를 던졌을 때 나온 눈
* Examples of continuous random variables<br>연속확률변수의 예
    * Height of a person<br>사람들의 키
    * Time it takes for a component to fail<br>부품 수명


|  | discrete<br>이산 | continuous<br>연속
:-----:|:-----:|:-----:
value range<br>값의 범위 | finite or countably infinite set of values<br>유한하거나 셀 수 있는 값 | uncountable set of values within a range<br>어떤 범위 안의 셀 수 없는 값
probability distribution<br>확률분포| probability mass function (PMF)<br>확률질량함수 | probability density function (PDF)<br>확률밀도함수
cumulative probability<br>누적확률| cumulative distribution function (CDF)<br>누적분포함수 | cumulative distribution function (CDF)<br>누적분포함수
examples of probability distributions<br>확률분포 사례 | Bernoulli, binomial, Poisson<br>베르누이분포, 이항분포, 푸아송분포 | normal, uniform, exponential<br>정규분포, 균일분포, 지수분포



### Representing Probability Distributions<br>확률분포의 표현



#### Probability Mass Functions<br>확률질량함수



* The probability mass function (PMF) assigns a probability to each possible discrete outcome of a random variable.<br>확률질량함수는 어떤 확률변수의 개별적으로 떨어져 있는 각각의 가능한 결과의 확률을 나타낸다.


##### Bernoulli distribution<br>베르누이 분포
* The Bernoulli distribution models the probability of a single experiment with two possible outcomes (success or failure), where the probability of success remains constant.<br>베르누이분포는 가능한 결과가 성공과 실패 두가지 있고, 성공 확률이 정해져 있을 경우, 한번의 실험의 결과의 확률을 보여 준다.



In [None]:
# Define the success probability for our Bernoulli trial
p = 0.7

# Possible outcomes of a Bernoulli trial (success = 1, failure = 0)
outcomes = [0, 1]

# Calculate probabilities for each outcome using the Bernoulli PMF
pmf_values = [1 - p, p]

# Create a stem plot of the PMF
plt.stem(outcomes, pmf_values)
plt.xlabel('Outcome (0 = failure, 1 = success)')
plt.ylabel('Probability')
plt.title('Probability Mass Function (PMF) of Bernoulli Distribution (p = 0.7)')
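As a cross-check, `scipy.stats.bernoulli` provides the same PMF. The sketch below reuses p = 0.7 from the cell above and compares the hand-written probabilities with scipy's, together with the known mean $p$ and variance $p(1-p)$ of a Bernoulli random variable.

```python
import numpy as np
import scipy.stats

p = 0.7  # success probability, same value as in the cell above
outcomes = np.array([0, 1])  # failure = 0, success = 1

# PMF from scipy.stats vs. the hand-written [1 - p, p]
pmf_scipy = scipy.stats.bernoulli.pmf(outcomes, p)
pmf_manual = np.array([1 - p, p])
print(pmf_scipy, np.allclose(pmf_scipy, pmf_manual))

# Mean p and variance p * (1 - p) of a Bernoulli(p) random variable
mean, var = scipy.stats.bernoulli.stats(p, moments='mv')
print(float(mean), float(var))
```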



#### Probability Density Functions<br>확률밀도함수



* The probability density function (PDF) describes the relative likelihood of a continuous random variable taking on a particular value within a certain range.<br>확률밀도함수는 연속적인 확률변수가 어떤 범위 안에서 특정 값을 가질 상대적인 가능성을 나타낸다.


##### Normal distribution<br>정규분포




In [None]:
average = 0.0
std_dev = 1.0

x = np.linspace(average + (- 4.0) * std_dev, average + (+ 4.0) * std_dev)
pdf_values = scipy.stats.norm.pdf(x, average, std_dev)
plt.plot(x, pdf_values)
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Normal Distribution (Mean = 0, Standard Deviation = 1)')
plt.grid(True)
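A value of the PDF is a density, not a probability: the probability that $X$ falls in an interval is the area under the PDF over that interval. A minimal sketch for the standard normal (mean 0, standard deviation 1, as above) integrates the PDF over $[-1, 1]$ with `scipy.integrate.quad` and compares the result with the CDF difference $F(1) - F(-1)$. It also shows that a density can exceed 1 for a narrow distribution (standard deviation 0.1, chosen only for illustration).

```python
import scipy.integrate
import scipy.stats

average, std_dev = 0.0, 1.0
a, b = -1.0, 1.0

# Probability as the area under the PDF between a and b
area, abs_err = scipy.integrate.quad(
    lambda t: scipy.stats.norm.pdf(t, average, std_dev), a, b
)

# The same probability from the CDF: P(a < X <= b) = F(b) - F(a)
cdf_diff = (scipy.stats.norm.cdf(b, average, std_dev)
            - scipy.stats.norm.cdf(a, average, std_dev))

print(area, cdf_diff)  # both about 0.6827

# A density value can be larger than 1 when the distribution is narrow
print(scipy.stats.norm.pdf(0.0, 0.0, 0.1))  # about 3.99
```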



#### Cumulative Distribution Functions<br>누적분포함수



* The cumulative distribution function (CDF) $F(x) = P(X \le x)$ of a random variable $X$ gives the probability that $X$ takes a value less than or equal to $x$.<br>어떤 확률변수 $X$의 누적분포함수 $F(x) = P(X \le x)$는 $X$가 $x$ 이하의 값을 가질 확률을 나타낸다.



In [None]:
# Discrete (Binomial)
n_trials = 10
p_success = 0.3

# Continuous (Normal)
mean = 0
std_dev = 1

# Discrete
outcomes = np.arange(0, n_trials + 1)
pmf_values = scipy.stats.binom.pmf(outcomes, n_trials, p_success)
cdf_discrete = scipy.stats.binom.cdf(outcomes, n_trials, p_success)

# Continuous
x_vals = np.linspace(-4, 4, 200)
pdf_values = scipy.stats.norm.pdf(x_vals, mean, std_dev)
cdf_continuous = scipy.stats.norm.cdf(x_vals, mean, std_dev)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# PMF
axes[0, 0].stem(outcomes, pmf_values)
axes[0, 0].set_xlabel('Outcome')
axes[0, 0].set_ylabel('Probability')
axes[0, 0].set_title(f'PMF (Binomial, n = {n_trials})')
axes[0, 0].grid(True)

# PDF
axes[0, 1].plot(x_vals, pdf_values)
axes[0, 1].set_xlabel('Value')
axes[0, 1].set_ylabel('Probability Density')
axes[0, 1].set_title('PDF (Normal)')
axes[0, 1].grid(True)

# CDF (Discrete)
axes[1, 0].step(outcomes, cdf_discrete, where='post', marker='o')  # discrete CDF is a right-continuous step function
axes[1, 0].set_xlabel('Outcome')
axes[1, 0].set_ylabel('Cumulative Probability')
axes[1, 0].set_title(f'CDF (Binomial, n = {n_trials})')
axes[1, 0].grid(True)

# CDF (Continuous)
axes[1, 1].plot(x_vals, cdf_continuous)
axes[1, 1].set_xlabel('Value')
axes[1, 1].set_ylabel('Cumulative Probability')
axes[1, 1].set_title('CDF (Normal)')
axes[1, 1].grid(True)

fig.tight_layout()
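For a discrete random variable, the CDF is the running sum of the PMF, $F(k) = \sum_{j \le k} P(X = j)$, and interval probabilities are CDF differences. A short check with the same binomial parameters as above:

```python
import numpy as np
import scipy.stats

n_trials, p_success = 10, 0.3
outcomes = np.arange(0, n_trials + 1)

pmf_values = scipy.stats.binom.pmf(outcomes, n_trials, p_success)
cdf_values = scipy.stats.binom.cdf(outcomes, n_trials, p_success)

# For a discrete variable, F(k) = P(X <= k) is the running sum of the PMF
print(np.allclose(np.cumsum(pmf_values), cdf_values))

# Interval probability: P(2 < X <= 4) = F(4) - F(2) = P(X = 3) + P(X = 4)
print(cdf_values[4] - cdf_values[2], pmf_values[3] + pmf_values[4])

# The CDF reaches 1 at the largest possible outcome
print(cdf_values[-1])
```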



### Common Distributions<br>자주 사용되는 확률분포



#### Uniform distribution<br>균일분포



* The uniform distribution is a probability distribution in which all outcomes within a specified range are equally likely; on an interval $[a, b]$ its PDF is the constant $1/(b-a)$.<br>균일분포는 어떤 특정 범위 안의 모든 결과가 일어날 가능성이 같은 확률분포이다. 구간 $[a, b]$에서 확률밀도함수는 상수 $1/(b-a)$이다.



In [None]:
# Set parameters for the distribution
lower_bound = 0
upper_bound = 10
x_array = np.linspace(lower_bound, upper_bound, 21)

# Evaluate the probability density function (PDF) on x_array
uniform_pdf = scipy.stats.uniform.pdf(x_array, loc=lower_bound, scale=upper_bound-lower_bound)

# Plot the PDF
plt.plot(x_array, uniform_pdf)
plt.xlabel('Value')
plt.ylabel('Probability Density')

plt.ylim(bottom=0)
plt.title('Uniform Distribution')
plt.grid(True)
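For the continuous uniform distribution on $[a, b]$, `scipy.stats.uniform` uses `loc = a` and `scale = b - a`, as in the cell above. The sketch below evaluates a frozen distribution at a few points inside and outside the interval and checks the CDF, the mean $(a+b)/2$, and the variance $(b-a)^2/12$.

```python
import scipy.stats

lower_bound, upper_bound = 0, 10
dist = scipy.stats.uniform(loc=lower_bound, scale=upper_bound - lower_bound)

print(dist.pdf([-1.0, 2.0, 5.0, 11.0]))  # 0 outside, 1 / (b - a) = 0.1 inside
print(dist.cdf(2.5))                     # (2.5 - a) / (b - a) = 0.25
print(dist.mean(), dist.var())           # (a + b) / 2 = 5, (b - a)**2 / 12
```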



#### Normal distribution<br>정규 분포



* The normal distribution (also known as the Gaussian distribution) is a bell-shaped probability distribution that is symmetrical around its mean, with most values clustering around the center and decreasing in frequency as they move away from the mean.<br>가우스 분포라고도 알려진 정규분포는 평균을 중심으로 좌우대칭인 종모양의 확률분포로, 대부분의 값은 가운데에 모여 있고 평균으로부터 멀어질수록 빈도가 줄어든다.
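* Many physical properties and measurement results are well approximated by a normal distribution, especially when they arise from many small, independent random factors (e.g., material strength, dimensional variations in manufacturing).<br>다수의 물리적 특성값과 측정 결과는 정규분포로 잘 근사된다. 특히 작고 독립적인 여러 무작위 요인이 겹쳐 생기는 경우에 그렇다 (예: 재료 강도, 제조 치수 편차).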



In [None]:
# Set parameters for the distribution
mean = 0
std_dev = 1
x_array = np.linspace(-3.0, 3.0, 21)

# Evaluate the probability density function (PDF) on x_array
norm_pdf = scipy.stats.norm.pdf(x_array, loc=mean, scale=std_dev)

# Plot the PDF
plt.plot(x_array, norm_pdf)
plt.xlabel('Value')
plt.ylabel('Probability Density')

plt.title('Normal (Gaussian) Distribution')
plt.grid(True)
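A practical consequence of the bell shape is the 68-95-99.7 rule: about 68%, 95%, and 99.7% of the values lie within 1, 2, and 3 standard deviations of the mean. These numbers can be computed from the normal CDF; a minimal sketch for the standard normal used above:

```python
import scipy.stats

mean, std_dev = 0.0, 1.0
within = {}
for k in (1, 2, 3):
    # P(mean - k*sd < X <= mean + k*sd) = F(mean + k*sd) - F(mean - k*sd)
    within[k] = (scipy.stats.norm.cdf(mean + k * std_dev, mean, std_dev)
                 - scipy.stats.norm.cdf(mean - k * std_dev, mean, std_dev))
    print(f'within {k} std dev: {within[k]:.4f}')
```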



#### Binomial distribution<br>이항분포



* The binomial distribution models the probability of getting a certain number of successes in a fixed number of independent trials, each with a constant probability of success.<br>이항분포는 성공 확률이 같은 일정 횟수의 독립 시행에서 성공 횟수의 확률을 수학적으로 묘사한다.
    * Probability of a component failing within a certain number of cycles.<br>어떤 부품이 특정 횟수의 작동 사이클 안에 고장날 확률.



In [None]:
# Parameters
n_trials = 10      # Number of trials
p_success = 0.3     # Probability of success on each trial

# Possible outcomes (number of successes)
outcomes = np.arange(0, n_trials + 1)

# Calculate probabilities for each outcome
pmf_values = scipy.stats.binom.pmf(outcomes, n_trials, p_success)

# Plot the distribution (stem plot)
plt.stem(outcomes, pmf_values)
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title(f'Binomial Distribution (n = {n_trials}, p = {p_success})')
plt.grid(True)
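`scipy.stats.binom.pmf` evaluates the closed-form binomial PMF $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. A sketch that recomputes it with `math.comb` (Python 3.8 or later) and checks that the probabilities sum to 1 and that the mean is $np$:

```python
import math
import numpy as np
import scipy.stats

n_trials, p_success = 10, 0.3
outcomes = np.arange(0, n_trials + 1)

# P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
pmf_formula = np.array([
    math.comb(n_trials, k) * p_success**k * (1 - p_success)**(n_trials - k)
    for k in range(n_trials + 1)
])
pmf_scipy = scipy.stats.binom.pmf(outcomes, n_trials, p_success)

print(np.allclose(pmf_formula, pmf_scipy))
print(pmf_formula.sum())               # probabilities add up to 1
print((outcomes * pmf_formula).sum())  # mean, about n * p = 3
```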



#### Poisson distribution<br>푸아송 분포



* The Poisson distribution models the probability of a given number of events occurring within a fixed interval of time or space, given an average rate of occurrence.<br>푸아송 분포는 어떤 사건의 평균 발생률이 주어졌을 때, 고정된 시간 또는 공간의 구간 내에서, 해당 사건이 주어진 횟수 만큼 발생할 확률을 수학적으로 묘사한다.
    * The number of customers visiting a bank branch in one hour<br>한 시간 동안 은행 지점을 방문하는 손님의 수



In [None]:
# Parameter
average_rate = 3  # e.g., average number of customers per hour

# Possible outcomes (number of events)
outcomes = np.arange(0, 15)

# Calculate probabilities for each outcome
pmf_values = scipy.stats.poisson.pmf(outcomes, average_rate)

# Plot the distribution (stem plot)
plt.stem(outcomes, pmf_values)
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.title(f'Poisson Distribution (average rate = {average_rate})')
plt.grid(True)
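Similarly, the Poisson PMF is $P(X = k) = \lambda^k e^{-\lambda} / k!$ with rate $\lambda$, and both its mean and its variance equal $\lambda$. A short check against `scipy.stats.poisson`; note that the plotted outcomes stop at 14, so their probabilities add up to slightly less than 1:

```python
import math
import numpy as np
import scipy.stats

average_rate = 3
outcomes = np.arange(0, 15)

# P(X = k) = rate**k * exp(-rate) / k!
pmf_formula = np.array([
    average_rate**k * math.exp(-average_rate) / math.factorial(k)
    for k in range(15)
])
pmf_scipy = scipy.stats.poisson.pmf(outcomes, average_rate)
print(np.allclose(pmf_formula, pmf_scipy))

mean, var = scipy.stats.poisson.stats(average_rate, moments='mv')
print(float(mean), float(var))  # both equal the rate

print(pmf_formula.sum())  # slightly below 1: outcomes >= 15 are cut off
```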



#### Exponential distribution<br>지수 분포



* The exponential distribution models the waiting time until the next event in a continuous process where events occur randomly and independently at a constant average rate.<br>지수 분포는 사건이 일정한 평균 발생률로 서로 독립적이고 무작위하게 일어날 때, 다음 사건이 일어날 때까지 걸리는 시간의 확률분포이다.
    * The waiting times between arrivals of customers<br>손님 방문 간격



In [None]:
# Parameter
rate = 2.0  # e.g., average number of customer arrivals per hour
average_hr = 1 / rate  # mean waiting time between arrivals (hours)
x_array = np.linspace(0, 3.0, 41)

# Evaluate the PDF and the CDF (scipy.stats.expon uses scale = 1 / rate)
expo_pdf = scipy.stats.expon.pdf(x_array, scale=average_hr)
expo_cdf = scipy.stats.expon.cdf(x_array, scale=average_hr)

_, axs = plt.subplots(2, 1, sharex=True)

# Plot the PDF
axs[0].plot(x_array, expo_pdf)
axs[0].set_ylabel('Probability Density')
axs[0].set_title(f'Exponential Distribution (mean wait time = {average_hr} hr)')
axs[0].grid(True)

# Plot the CDF
axs[1].plot(x_array, expo_cdf)
axs[1].set_xlabel('Time until event (hours)')
axs[1].set_ylabel('Cumulative Probability')
axs[1].grid(True)
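With rate $\lambda$ (events per unit time), the exponential CDF is $F(t) = 1 - e^{-\lambda t}$, and `scipy.stats.expon` expresses the same distribution through `scale` $= 1/\lambda$. The sketch below checks the formula for $\lambda = 2$ per hour, as in the cell above, and illustrates the memoryless property $P(T > s + t \mid T > s) = P(T > t)$ using the survival function `sf` (1 minus the CDF). The values of $s$ and $t$ are arbitrary choices for illustration.

```python
import math
import scipy.stats

rate = 2.0             # events per hour, as above
average_hr = 1 / rate  # scipy's expon uses scale = 1 / rate

t = 1.0
cdf_scipy = scipy.stats.expon.cdf(t, scale=average_hr)
cdf_formula = 1 - math.exp(-rate * t)
print(cdf_scipy, cdf_formula)  # P(T <= 1 hr), about 0.8647

# Memoryless property: P(T > s + t | T > s) == P(T > t)
s = 0.5
sf = scipy.stats.expon.sf  # survival function, 1 - CDF
cond = sf(s + t, scale=average_hr) / sf(s, scale=average_hr)
print(cond, sf(t, scale=average_hr))  # both about e**-2 = 0.1353
```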



#### Weibull distribution<br>Weibull 분포



* The Weibull distribution is a versatile probability distribution used to model the time-to-failure of components or systems, where the failure rate can change over time.<br>Weibull 분포는 부품 또는 시스템의 고장 확률이 시간에 따라 변화할 때 해당 고장이 발생할 때 까지의 시간을 모델링하는데 매우 널리 사용된다.



In [None]:
# Parameters (shape and scale)
# shape < 1 : the failure rate decreases over time
# shape = 1 : the failure rate stays constant over time (exponential distribution)
# shape > 1 : the failure rate increases over time
# scale     : time by which about 63.2% of units have failed (e.g., hours of operation)

scale = 1000
x_array = np.linspace(0, 2500, 21)

_, axs = plt.subplots(2, 1, sharex=True)

for shape in (0.5, 1.0, 2.0):
    # Evaluate the PDF and the CDF for this shape
    weibull_pdf = scipy.stats.weibull_min.pdf(x_array, shape, scale=scale)
    weibull_cdf = scipy.stats.weibull_min.cdf(x_array, shape, scale=scale)

    axs[0].plot(x_array, weibull_pdf, label=f'shape={shape}')
    axs[1].plot(x_array, weibull_cdf, label=f'shape={shape}')

# Labels, title, and legends are set once, after all curves are drawn
axs[0].set_ylabel('Probability Density')
axs[0].set_title(f'Weibull Distribution (scale = {scale})')
axs[0].legend(loc=0)
axs[0].grid(True)

axs[1].set_xlabel('Time to event')
axs[1].set_ylabel('Cumulative Probability')
axs[1].legend(loc=0)
axs[1].grid(True)
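The 63% figure in the comment above follows from the Weibull CDF $F(t) = 1 - \exp(-(t/\lambda)^k)$ with scale $\lambda$ and shape $k$: at $t = \lambda$ it equals $1 - e^{-1} \approx 0.632$ for every shape. A short check with `scipy.stats.weibull_min`, which also confirms that shape $k = 1$ reduces to the exponential distribution with the same scale:

```python
import math
import numpy as np
import scipy.stats

scale = 1000
t = np.linspace(0, 2500, 11)

for shape in (0.5, 1.0, 2.0):
    # CDF at t = scale is 1 - exp(-1) for every shape
    print(shape, scipy.stats.weibull_min.cdf(scale, shape, scale=scale))

print(1 - math.exp(-1))  # about 0.632

# With shape = 1 the Weibull distribution reduces to the exponential distribution
print(np.allclose(scipy.stats.weibull_min.cdf(t, 1.0, scale=scale),
                  scipy.stats.expon.cdf(t, scale=scale)))

# Closed-form CDF: F(t) = 1 - exp(-(t / scale)**shape)
shape = 2.0
print(np.allclose(scipy.stats.weibull_min.cdf(t, shape, scale=scale),
                  1 - np.exp(-(t / scale) ** shape)))
```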



## Generating Random Numbers in Python<br>파이썬에서의 확률변수 생성



### Pseudorandom Number Generators<br>의사(유사)난수 생성기


Functions such as `py.random()` are pseudorandom number generators.<br>`py.random()` 등은 유사 난수 발생기이다.



They generate a sequence of numbers that shares many statistical properties of random numbers, but the sequence is not truly random: it is completely determined by the generator's initial state.[[wikipedia](https://en.wikipedia.org/wiki/Pseudorandom_number_generator)]<br>난수, 즉 임의의 숫자와 비슷한 특징을 보이는 일련의 숫자열을 발생시키지만, 이 숫자열은 생성기의 초기 상태에 따라 완전히 결정되므로 정말로 무작위인 것은 아니다.



`seed`로 난수 발생을 통제할 수 있다.<br>We can control random number generation using `seed`.



In [None]:
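import pylab as py  # pylab re-exports numpy.random functions such as seed() and random(); matplotlib discourages pylab in new code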



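다음 두 셀의 결과는 다를 것이다. 인자 없이 `seed()`를 호출하면 매번 새로운 무작위 상태로 초기화되기 때문이다.<br>The following two cells should show different results, because calling `seed()` without an argument reinitializes the generator from a fresh source of randomness.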



In [None]:
py.seed()
py.random([5,])



In [None]:
py.seed()
py.random([5,])



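다음 두 셀의 결과는 같을 것이다. 두 셀 모두 같은 seed 값으로 생성기를 초기화하기 때문이다.<br>The following two cells should show the same results, because both reset the generator with the same seed value.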



In [None]:
seed = 2038011903
py.seed(seed)
py.random([5,])



In [None]:
py.seed(seed)
py.random([5,])
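`py.seed()` and `py.random()` act on a single global generator. For new code, NumPy's documentation recommends the `Generator` API: `numpy.random.default_rng(seed)` returns a generator object with its own state, so two generators created with the same seed produce the same sequence. A minimal sketch reusing the seed value above; the numbers differ from the `py.random()` output because `default_rng` uses a different underlying algorithm (PCG64 instead of the legacy Mersenne Twister).

```python
import numpy as np

seed = 2038011903

rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

sample_a = rng_a.random(5)  # five floats in [0, 1)
sample_b = rng_b.random(5)
print(np.array_equal(sample_a, sample_b))  # same seed, same sequence

rng_c = np.random.default_rng()  # no seed: fresh entropy from the OS
print(rng_c.random(5))

# The Generator also draws from the distributions used in this notebook
print(rng_a.normal(0.0, 1.0, 3), rng_a.uniform(0.0, 1.0, 3))
```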



### Standard Library<br>표준 라이브러리



#### Uniform distribution<br>균일분포



$n$개의 난수를 0 과 1 사이에서 균일 분포에 따라 발생시켜 보자.<br>Let's generate $n$ random numbers between zero and one following the uniform distribution.



In [None]:
n = 10000
x_min = 0.0
x_max = 1.0



파이썬 표준 라이브러리 가운데서는 `random` 모듈을 사용할 수 있다.<br>From the Python standard library, one can use the `random` module.



In [None]:
import random



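`random` 모듈은 import 될 때 자동으로 초기화되지만, 결과를 재현할 수 있도록 사용 전에 `seed()` 함수를 고정된 값으로 호출하여 초기화하자.<br>
The `random` module seeds itself automatically on import, but calling `seed()` with a fixed value before using it makes the results reproducible.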



In [None]:
random.seed(1)



`random.uniform()` 함수는 균일분포를 따르는 임의의 `float` 실수를 생성할 수 있다.<br>
`random.uniform()` can generate random `float`s following the uniform distribution.



In [None]:
uniform_random_numbers_list = []

for i in range(n):
    uniform_random_numbers_list.append(random.uniform(x_min, x_max))



0.1 간격으로 칸의 경계를 준비하자.<br>Let's prepare the bin edges at intervals of 0.1.



In [None]:
bin_interval = 0.1
bins_array = np.arange(x_min, x_max+0.5*bin_interval, bin_interval)
bins_array



히스토그램을 그려 보자.<br>Let's plot the histogram.



In [None]:
hist_uniform = np.histogram(uniform_random_numbers_list, bins=bins_array)



In [None]:
bar(bins_array, hist_uniform[0])
plt.grid(True)
plt.title('Histogram, Uniform distribution : Standard library')
plt.xlabel('value')
plt.ylabel('frequency');



확률을 계산해 보자.<br>Let's calculate the probabilities.



In [None]:
probability_uniform = hist_uniform[0] / n



In [None]:
bar(bins_array, probability_uniform)
plt.grid(True)
plt.title('Probability, Uniform distribution : Standard library')
plt.xlabel('value')
plt.ylabel('probability');
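With $n = 10000$ samples, the sample mean and variance should be close to the theoretical values $(a+b)/2 = 0.5$ and $(b-a)^2/12 \approx 0.0833$, and each 0.1-wide bin should hold roughly 10% of the samples. A self-contained sketch using only the standard library (`random` and `statistics`):

```python
import random
import statistics

random.seed(1)
n = 10000
x_min, x_max = 0.0, 1.0
samples = [random.uniform(x_min, x_max) for _ in range(n)]

print(statistics.mean(samples), (x_min + x_max) / 2)
print(statistics.pvariance(samples), (x_max - x_min) ** 2 / 12)

# Fraction of samples in each 0.1-wide bin (each should be near 0.1)
counts = [0] * 10
for x in samples:
    counts[min(int(x * 10), 9)] += 1  # clamp x == 1.0 into the last bin
print([c / n for c in counts])
```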



#### Normal distribution<br>정규분포



이번에는 $n$개의 난수를 평균은 0, 표준편차는 1인 정규 분포를 따르도록 발생시켜 보자.<br>Now, let's generate $n$ random numbers following a normal distribution with mean 0 and standard deviation 1.



In [None]:
n = 10000
x_ave = 0.0
x_std = 1.0



`random.normalvariate()` 또는 `random.gauss()` 함수를 사용할 수 있다.<br>
`random.normalvariate()` or `random.gauss()` functions are available.



In [None]:
normal_random_numbers_list = [random.normalvariate(x_ave, x_std) for i in range(n)]



히스토그램을 그려 보자.<br>Let's plot the histogram.



In [None]:
bin_interval = 0.1
bins_array = np.arange(x_ave + (-3)*x_std, x_ave + (+3)*x_std + 0.5*bin_interval, bin_interval)



In [None]:
hist_normal = np.histogram(normal_random_numbers_list, bins=bins_array)



In [None]:
bar(bins_array, hist_normal[0])
plt.grid(True)
plt.title('Normal distribution : Standard library')
plt.xlabel('value')
plt.ylabel('frequency');



확률:<br>Probabilities:



In [None]:
probability_normal = hist_normal[0] / n



In [None]:
bar(bins_array, probability_normal)
plt.grid(True)
plt.title('Probability, Normal distribution : Standard library')
plt.xlabel('value')
plt.ylabel('probability');
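The per-bin probabilities above depend on the bin width. Dividing each one by the bin width gives an estimate of the probability density that can be compared directly with the normal PDF. (`np.histogram(..., density=True)` applies a similar normalization, but divides by the number of samples inside the bins rather than by $n$.) A minimal sketch that regenerates the samples so it runs on its own:

```python
import random
import numpy as np
import scipy.stats

random.seed(1)
n = 10000
x_ave, x_std = 0.0, 1.0
samples = [random.normalvariate(x_ave, x_std) for _ in range(n)]

bin_interval = 0.1
bins = np.arange(x_ave - 3 * x_std, x_ave + 3 * x_std + 0.5 * bin_interval, bin_interval)
counts, edges = np.histogram(samples, bins=bins)

# Probability per bin divided by the bin width -> estimated density
density_estimate = counts / n / bin_interval
centers = 0.5 * (edges[:-1] + edges[1:])
pdf_at_centers = scipy.stats.norm.pdf(centers, x_ave, x_std)

# Largest gap between estimate and PDF; it tends to shrink as n grows
print(np.max(np.abs(density_estimate - pdf_at_centers)))
```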



### `numpy.random`



#### Uniform distribution<br>균일분포



`numpy`의 부 모듈 가운데 `numpy.random` 모듈을 이용할 수도 있다.<br>
One can also use `numpy.random`, a submodule of `numpy`.



In [None]:
import numpy.random as nr



`numpy.random.uniform()` 함수는 균일분포를 따르는 임의의 `float` 실수를 생성할 수 있다.<br>
`numpy.random.uniform()` can generate random `float`s following the uniform distribution.



In [None]:
uniform_random_numbers_array = nr.uniform(x_min, x_max, n)



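히스토그램을 그려 보자. 칸의 경계는 재사용하자. 현재 칸의 경계는 위의 정규분포용으로 정한 -3 부터 3 까지이므로, 0 과 1 사이의 칸에만 값이 들어간다.<br>
Let's plot the histogram, reusing the current bin edges. These are the edges from -3 to 3 defined above for the normal distribution, so only the bins between 0 and 1 are filled.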



In [None]:
hist_uniform_nr = np.histogram(uniform_random_numbers_array, bins=bins_array)



In [None]:
bar(bins_array, hist_uniform_nr[0])
plt.grid(True)
plt.title('Histogram, Uniform distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('frequency');



확률도 계산해 보자.<br>Let's calculate the probabilities, too.



In [None]:
probability_uniform = hist_uniform_nr[0] / n



In [None]:
bar(bins_array, probability_uniform)
plt.grid(True)
plt.title('Probability, Uniform distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



#### Normal distribution<br>정규분포



`numpy.random.normal()` 함수를 쓸 수 있다.<br>
One can use the `numpy.random.normal()` function.



In [None]:
normal_random_numbers_nr = nr.normal(x_ave, x_std, n)  # loc = mean, scale = standard deviation



히스토그램을 그려 보자.<br>Let's plot the histogram.



In [None]:
hist_normal_nr = np.histogram(normal_random_numbers_nr, bins=bins_array)



In [None]:
bar(bins_array, hist_normal_nr[0])
plt.grid(True)
plt.title('Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('frequency');



확률:<br>Probabilities:



In [None]:
probability_normal_nr = hist_normal_nr[0] / n



In [None]:
bar(bins_array, probability_normal_nr)
plt.grid(True)
plt.title('Probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



누적확률:<br>Cumulative probability:



In [None]:
norm_cp = np.cumsum(probability_normal_nr)
bar(bins_array, norm_cp)
plt.grid(True)
plt.title('Cumulative probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



누적 분포 함수와의 비교<br>Comparing with the cumulative distribution function (cdf)



In [None]:
norm_cdf = scipy.stats.norm.cdf(bins_array)

bar(bins_array, norm_cp)
plt.plot(bins_array, norm_cdf, 'r-')
plt.grid(True)
plt.title('Cumulative probability, Normal distribution : numpy.random')
plt.xlabel('value')
plt.ylabel('probability');



누적분포 함수의 역함수:<br>Inverse of the cumulative distribution function (called the percent point function, `ppf`, in `scipy.stats`):



In [None]:
normal_random_variable = scipy.stats.norm()  # standard normal: mean 0, standard deviation 1
ppf = normal_random_variable.ppf  # inverse of the CDF


균일 분포로 발생시켰던 난수로 누적분포함수의 역함수를 호출해 보자.<br>Let's call the inverse CDF with the uniformly distributed random numbers generated earlier as the argument.


In [None]:
ppf_uniform = ppf(uniform_random_numbers_array)



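$U$가 (0, 1) 구간의 균일분포를 따르면 $F^{-1}(U)$는 누적분포함수가 $F$인 분포를 따른다 (역변환 샘플링). 따라서 이 히스토그램은 표준정규분포 모양일 것이다.<br>If $U$ is uniformly distributed on (0, 1), then $F^{-1}(U)$ follows the distribution whose CDF is $F$ (inverse transform sampling). Therefore, this histogram should look like the standard normal distribution.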



In [None]:
hist_normal_inv_cdf = np.histogram(ppf_uniform, bins=bins_array)

bar(bins_array, hist_normal_inv_cdf[0])
plt.grid(True)
plt.title('Histogram, uniform samples through inverse of normal cdf')
plt.xlabel('value')
plt.ylabel('frequency');
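The same inverse transform idea works whenever the inverse CDF has a closed form. For the exponential distribution with rate $\lambda$, $F(t) = 1 - e^{-\lambda t}$, so $F^{-1}(u) = -\ln(1 - u)/\lambda$. A minimal sketch with the standard library, assuming $\lambda = 2$ per hour as in the exponential example earlier; in practice `random.expovariate()` already provides exponential samples.

```python
import math
import random
import statistics

random.seed(1)
rate = 2.0  # events per hour (assumed, as in the exponential example)
n = 10000

# Inverse CDF of the exponential distribution: F^-1(u) = -ln(1 - u) / rate
uniform_samples = [random.random() for _ in range(n)]  # u in [0, 1)
exp_samples = [-math.log(1.0 - u) / rate for u in uniform_samples]

print(statistics.mean(exp_samples))  # should be near 1 / rate = 0.5
# Fraction of waits longer than 1 hour vs. theory exp(-rate * 1), about 0.135
print(sum(t > 1.0 for t in exp_samples) / n, math.exp(-rate))
```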



## 히스토그램 그리기<br>Plotting a Histogram



다음 비디오는 히스토그램을 그리는 예를 보여준다.<br>The following video shows an example of plotting a histogram.



[![How to create a histogram | Data and statistics | Khan Academy](https://i.ytimg.com/vi/gSEYtAjuZ-Y/hqdefault.jpg)](https://www.youtube.com/watch?v=gSEYtAjuZ-Y)



파이썬으로 한번 그려보자.<br>Let's plot it with python.



다음 데이터를 생각해 보자.<br>Consider the following data:



In [None]:
data = [1, 3, 27, 32, 5, 63, 26, 25, 18, 16,
        4, 45, 29, 19, 22, 51, 58, 9, 42, 6]



0 부터 70 까지 히스토그램 칸의 경계를 준비해 보자.<br>Let's prepare a list of bin edges for the histogram, from 0 to 70 in steps of 10.



In [None]:
bins_list = list(range(0, 70+1, 10))
bins_list



`numpy`에는 히스토그램을 계산하는 함수가 있다.<br>`numpy` has a function calculating the histogram.



In [None]:
hist_result = np.histogram(data, bins=bins_list)
hist_result
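`np.histogram()` counts the values falling into each half-open bin $[e_i, e_{i+1})$, except that the last bin also includes its right edge. A plain-Python sketch of the same counting for the data and bin edges above:

```python
import numpy as np

data = [1, 3, 27, 32, 5, 63, 26, 25, 18, 16,
        4, 45, 29, 19, 22, 51, 58, 9, 42, 6]
bins_list = list(range(0, 70 + 1, 10))

counts = [0] * (len(bins_list) - 1)
for value in data:
    for i in range(len(counts)):
        left, right = bins_list[i], bins_list[i + 1]
        is_last = (i == len(counts) - 1)
        # half-open [left, right), but the last bin is closed [left, right]
        if left <= value < right or (is_last and value == right):
            counts[i] += 1
            break

print(counts)  # [6, 3, 5, 1, 2, 2, 1]
print(np.histogram(data, bins=bins_list)[0].tolist() == counts)
```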



`matplotlib`에는 히스토그램을 그려주는 함수도 있다.<br>`matplotlib` also has a function that plots a histogram.



In [None]:
plt.hist(data, bins=bins_list)
plt.grid(True)
plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');



칸 경계는 자동으로 정할 수도 있다.<br>One may let the function choose the bins.



In [None]:
plt.hist(data, bins='auto')
plt.grid(True)
plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');



`matplotlib`의 `bar()` 함수로 그릴 수도 있다.<br>The histogram can also be drawn with the `bar()` function of `matplotlib`, using the bin edges and the counts from `np.histogram()`.



In [None]:
def bar(bins, result_0):
    width_list = [b1 - b0 for b0, b1 in zip(bins[:-1], bins[1:])]
    return plt.bar(bins[:-1], result_0, width=width_list, align='edge')



In [None]:
bar(bins_list, hist_result[0])
plt.grid(True)

plt.title('Histogram')
plt.xlabel('value')
plt.ylabel('frequency');






## Final Bell<br>마지막 종



In [None]:
# stackoverflow.com/a/24634221
import os
os.system("printf '\a'");

