# A Brief Introduction to Statistics Features

References:
* McKinney, Python for Data Analysis, O'Reilly, 2012, ISBN 978-14-4931-979-3 ([Code and data](https://github.com/wesm/pydata-book/tree/1st-edition)). Korean edition: translated by 김영근, 한빛미디어, 2013, ISBN 978-89-6848-047-8.
* Varoquaux, Statistics in Python, Scipy lecture notes, 2018 Sept 01, [Online] Available: http://www.scipy-lectures.org/packages/statistics/index.html.

In [None]:
# NumPy & matplotlib
import pylab as py

# Data table
import pandas as pd


Creating data arrays

In [None]:
t_deg = py.linspace(-360, 360, 24+1)
t_rad = py.deg2rad(t_deg)
sin_t = py.sin(t_rad)
cos_t = py.cos(t_rad)

Creating the data table

In [None]:
df = pd.DataFrame(
    {
        't_rad': t_rad,
        'sin': sin_t,
        'cos': cos_t,
    },
    index=t_deg,
    columns=['t_rad', 'sin', 'cos']
)

Contents of the data table

In [None]:
# https://www.shanelynn.ie/using-pandas-dataframe-creating-editing-viewing-data-in-python/
# set maximum number of rows to display
pd.options.display.max_rows = 10
df

Data table info

In [None]:
print(f'df.shape = {df.shape}')
print(f'df.columns = {df.columns}')


Selecting a column by name

In [None]:
print(f'df["sin"] = \n{df["sin"]}')


Selecting rows with a boolean expression

In [None]:
print(f"df[abs(df.sin)<1e-3] = \n{df[abs(df.sin)<1e-3]}")
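
Several conditions can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. A minimal sketch, assuming the same table as above but rebuilt with NumPy directly so that it runs on its own; the names `mask` and `selected` are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the table from the cells above so this example is self-contained
t_deg = np.linspace(-360, 360, 24 + 1)
t_rad = np.deg2rad(t_deg)
df = pd.DataFrame(
    {'t_rad': t_rad, 'sin': np.sin(t_rad), 'cos': np.cos(t_rad)},
    index=t_deg,
)

# Rows where sin is numerically zero AND cos is positive
mask = (df['sin'].abs() < 1e-3) & (df['cos'] > 0)
selected = df[mask]
print(selected)
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.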


Summary statistics

In [None]:
df.describe()
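
For numeric columns, `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum. The same numbers are available one at a time; the sketch below rebuilds the table with NumPy so that it runs on its own and compares a few rows of the summary with the corresponding direct calls:

```python
import numpy as np
import pandas as pd

# Rebuild the table from the cells above so this example is self-contained
t_deg = np.linspace(-360, 360, 24 + 1)
t_rad = np.deg2rad(t_deg)
df = pd.DataFrame(
    {'t_rad': t_rad, 'sin': np.sin(t_rad), 'cos': np.cos(t_rad)},
    index=t_deg,
)

summary = df.describe()

col_mean = df.mean()      # matches the 'mean' row
col_std = df.std()        # sample standard deviation (ddof=1), matches 'std'
col_median = df.median()  # matches the '50%' row

print(summary.loc[['mean', 'std', '50%']])
```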

Scatter matrix

In [None]:
import pandas.plotting as plotting
plotting.scatter_matrix(df[['t_rad', 'cos','sin']])

## Linear Regression Example

### Prepare data

Linear regression finds the equation of the straight line that best represents $n$ data points $(x, y)$.
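
One common way to make "best represents" precise is ordinary least squares: choose the slope and intercept that minimize the sum of squared vertical distances between the line and the points. The text above does not name a criterion, so treat this as an assumption. A minimal sketch using `numpy.polyfit` on hypothetical points that lie exactly on $y = 2x + 1$:

```python
import numpy as np

# Hypothetical sample points lying exactly on y = 2 x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; coefficients are returned highest power first
slope, intercept = np.polyfit(x, y, 1)
print(f'slope = {slope:.3f}, intercept = {intercept:.3f}')
```

Because these points contain no noise, the fit recovers the slope and intercept up to floating-point rounding.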

Let's prepare some data points.

Assume that the following cell generates the true values.

In [None]:
import pylab as py

a = 0.5
b = 2.0

x_array = py.linspace(0, 5, 20+1)
y_true = a * x_array + b

In [None]:
py.plot(x_array, y_true, label='true')
py.grid(True)
py.ylim(ymin=0)
py.legend(loc=0)
py.xlabel('x')
py.ylabel('y')

Let's also prepare measurements contaminated by some noise.


In [None]:
import numpy.random as nr

nr.seed()  # no argument: seeded from fresh entropy, so the noise differs on each run

w_array = nr.normal(0, 0.25, size=x_array.shape)
y_measurement = y_true + w_array
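
If repeatable noise is wanted, for example to compare results between runs, one option is a seeded generator. A sketch assuming NumPy's `default_rng` (available from NumPy 1.17) and an arbitrary seed of 0; neither is part of the cell above:

```python
import numpy as np

# Same true line as in the cells above
x_array = np.linspace(0, 5, 20 + 1)
y_true = 0.5 * x_array + 2.0

# Seeded generator; the seed value 0 is arbitrary
rng = np.random.default_rng(0)
w_array = rng.normal(0, 0.25, size=x_array.shape)
y_measurement = y_true + w_array

# A second generator with the same seed reproduces the same noise
w_again = np.random.default_rng(0).normal(0, 0.25, size=x_array.shape)
print(np.array_equal(w_array, w_again))
```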


Let's plot these too.

In [None]:
py.plot(x_array, y_true, label='true')
py.plot(x_array, y_measurement, '.', label='measurements')

py.grid(True)
py.ylim(ymin=0)
py.legend(loc=0)
py.xlabel('x')
py.ylabel('y')
