

# Curve Fitting by Optimization



This time, let's assume that $x$ and $y$ are related by the following polynomial.



$$
y = ax^2 + bx + c
$$



Now we can adjust $a$, in addition to $b$ and $c$, to fit the curve to the data.



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as so



For details on `scipy.optimize.leastsq()`, delete the `#` in the following cell and press <kbd>Shift</kbd>+<kbd>Enter</kbd>.



In [None]:
# help(so.leastsq)



This notebook fits the curve with three SciPy functions:

* `scipy.optimize.minimize()`
* `scipy.optimize.leastsq()`
* `scipy.optimize.curve_fit()`



Generating data



In [None]:
x_data = np.linspace(-10, 10)
y_true = (x_data - 1.0) * (x_data - 2.0)
noise = np.random.normal(0, 10, y_true.shape)
y_measure = y_true + noise



Visualizing the data



In [None]:
plt.plot(x_data, y_true, label="true")
plt.plot(x_data, y_measure, '.', label="with noise")
plt.grid(True)
plt.legend(loc=0)
plt.show()
plt.close();



## Mathematical Model



Let's assume that the relationship between `x` and `y` can be represented as a second-order polynomial.



Let's implement the second-order polynomial as a function that accepts the coefficients as parameters.



In [None]:
def model(x, a, b, c):
    result = x * x * a + x * b + c
    # or
    # result = np.polyval((a, b, c), x)
    return result



In [None]:
def curve_fitting(coefs):
    a, b, c = coefs
    return model(x_data, a, b, c)



Plot the function above



In [None]:
def plot_curve_fitting(coefs):
    plt.plot(x_data, np.polyval(coefs, x_data), 'o', label="math model in polyval()")
    plt.plot(x_data, y_measure, '.', label="noisy measurements")
    plt.plot(x_data, curve_fitting(coefs), label="math model")

    plt.title(f"coefs = {coefs}")
    plt.legend(loc=0)
    plt.grid(True)
    plt.show()
    plt.close();



In [None]:
plot_curve_fitting([1, 2, 3])



Calculate the root mean square (RMS) error



In [None]:
def rms(coefs):
    y_fit = curve_fitting(coefs)
    error = y_measure - y_fit
    error_square = error * error
    result = error_square.mean() ** 0.5
    return result
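
Written out as a formula, `rms()` above computes the following quantity for the quadratic model. The symbols $N$ (the number of data points) and the index $i$ are notation introduced here for illustration; $y_i$ stands for the noisy measurement at $x_i$.

$$
\mathrm{RMS}(a, b, c) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \left( a x_i^2 + b x_i + c \right) \right)^2}
$$

Minimizing this value over $a$, $b$, and $c$ gives the same coefficients as minimizing the sum of squared errors, because the square root and the division by $N$ do not move the location of the minimum.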



Declare a cost function that visualizes intermediate steps



In [None]:
def rms_plot(coefs:np.ndarray) -> float:
    result = rms(coefs)

    plot_curve_fitting(coefs)

    return result



Using `scipy.optimize.minimize()`, find the optimal coefficients



In [None]:
result = so.minimize(rms_plot, [-1, 2, 30], method="Nelder-Mead")
result
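
Because `rms_plot()` draws a figure at every evaluation, the search above can produce a large number of plots. The following is a minimal sketch of the same search without plotting, assuming freshly generated data with a fixed seed; the names `x_demo`, `y_demo`, `rms_demo`, and `result_demo` are hypothetical and exist only in this example.

```python
import numpy as np
import scipy.optimize as so

# Hypothetical demo data, generated the same way as above but with a fixed seed
rng = np.random.default_rng(0)
x_demo = np.linspace(-10, 10)
y_demo = (x_demo - 1.0) * (x_demo - 2.0) + rng.normal(0, 10, x_demo.shape)


def rms_demo(coefs):
    # Root mean square error of the polynomial with coefficients coefs
    error = y_demo - np.polyval(coefs, x_demo)
    return np.sqrt(np.mean(error ** 2))


initial_guess = [-1, 2, 30]
result_demo = so.minimize(rms_demo, initial_guess, method="Nelder-Mead")

print("coefficients :", result_demo.x)
print("final RMS    :", result_demo.fun)
print("converged    :", result_demo.success)
print("evaluations  :", result_demo.nfev)
```

The `nfev` field reports how many times the cost function was called, which is also how many figures the plotting version would have drawn.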



In [None]:
plot_curve_fitting(result.x)



### `scipy.optimize.curve_fit()`



In [None]:
popt, pcov = so.curve_fit(model, x_data, y_measure, (1, 2, 3))
result = popt
result
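
Besides the coefficients, `curve_fit()` also returns the estimated covariance matrix `pcov`. As a rough sketch, the square roots of its diagonal can be read as one-standard-deviation uncertainties of the coefficients. The data and the names `x_demo`, `y_demo`, `popt_demo`, `pcov_demo`, and `perr_demo` below are assumptions made for this example only.

```python
import numpy as np
import scipy.optimize as so


def model(x, a, b, c):
    # Second-order polynomial, same form as the model above
    return a * x * x + b * x + c


# Hypothetical demo data with a fixed seed
rng = np.random.default_rng(0)
x_demo = np.linspace(-10, 10)
y_demo = (x_demo - 1.0) * (x_demo - 2.0) + rng.normal(0, 10, x_demo.shape)

popt_demo, pcov_demo = so.curve_fit(model, x_demo, y_demo, (1, 2, 3))

# One-standard-deviation uncertainty of each coefficient
perr_demo = np.sqrt(np.diag(pcov_demo))

for name, value, err in zip("abc", popt_demo, perr_demo):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

A large uncertainty relative to the coefficient itself suggests that the data do not constrain that coefficient well.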



In [None]:
plot_curve_fitting(result)



### `scipy.optimize.leastsq()`



In [None]:
def polynomial_error(param, x_i, y_i, model=model):
    y_i_estimation = model(x_i, *param)

    return (y_i_estimation - y_i)



In [None]:
any_initial_guess = (1, 1, 1)

polynomial_regression_param = so.leastsq(
    polynomial_error, 
    any_initial_guess, 
    args=(x_data, y_measure)
)

polynomial_regression_param
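
Called this way, `leastsq()` returns a tuple of the solution and an integer status flag, which is why the coefficients are taken from index `[0]` afterwards. Since a polynomial is linear in its coefficients, the result can be cross-checked against `np.polyfit()`, which solves the same problem as a linear least-squares fit. This is a sketch with assumed demo data; `residual`, `x_demo`, `y_demo`, `param_leastsq`, and `param_polyfit` are hypothetical names.

```python
import numpy as np
import scipy.optimize as so


def residual(param, x_i, y_i):
    # Difference between the polynomial estimate and the data
    return np.polyval(param, x_i) - y_i


# Hypothetical demo data with a fixed seed
rng = np.random.default_rng(0)
x_demo = np.linspace(-10, 10)
y_demo = (x_demo - 1.0) * (x_demo - 2.0) + rng.normal(0, 10, x_demo.shape)

# Unpack the solution and the status flag
param_leastsq, ier = so.leastsq(residual, (1, 1, 1), args=(x_demo, y_demo))

# Linear least-squares fit of a second-order polynomial
param_polyfit = np.polyfit(x_demo, y_demo, 2)

print("leastsq :", param_leastsq, "status flag:", ier)
print("polyfit :", param_polyfit)
```

A status flag of 1, 2, 3, or 4 means that a solution was found.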



We could use the coefficients as follows.



In [None]:
a_reg, b_reg, c_reg = polynomial_regression_param[0]

y_reg_leastsq = np.polyval(polynomial_regression_param[0], x_data)



Let's plot this result.



In [None]:
plt.plot(x_data, y_true, label='true', alpha=0.3)
plt.plot(x_data, y_measure, '.', label='measurements')
plt.plot(x_data, y_reg_leastsq, '.', label='leastsq()')

plt.grid(True)
plt.ylim(bottom=0)  # the old `ymin` keyword is no longer accepted by recent Matplotlib
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y');



## Overfitting



Let's think about a more general case.



What if the highest order of the polynomial is larger than two?



In [None]:
def polynomial_model_n(*x_param):
    return np.polyval(x_param[1:], x_param[0])



In [None]:
# n is the number of coefficients, so the fitted polynomial has degree n - 1
n = 10

popt, pcov = so.curve_fit(polynomial_model_n, x_data, y_measure, (1,) * n)

result = popt
result



Let's plot this result.



In [None]:
plt.plot(x_data, y_true, label='true', alpha=0.3)
plt.plot(x_data, y_measure, '.', label='measurements')
plt.plot(x_data, np.polyval(polynomial_regression_param[0], x_data), '.', label='n=2')
plt.plot(x_data, np.polyval(popt, x_data), 'x', label='n=10')

plt.grid(True)
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y');



What about `x` outside the range of the data?



In [None]:
x_min, x_max = x_data.min(), x_data.max()
x_range = x_max - x_min

x_array2 = np.linspace(x_min + (-0.1)*x_range, x_max + 0.1*x_range)

x_detailed = np.linspace(x_array2.min(), x_array2.max(), len(x_data) * 10)

plt.plot(x_detailed, np.polyval(polynomial_regression_param[0], x_detailed), '.', label='n=2')
plt.plot(x_detailed, np.polyval(popt, x_detailed), 'x', label='n=10')

plt.plot(x_data, y_true, label='true', alpha=0.3)
plt.plot(x_data, y_measure, 'o', label='measurements')

plt.grid(True)
plt.legend(loc=0)
plt.xlabel('x')
plt.ylabel('y');
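
To put a number on what the plot shows, we can compare the RMS error inside the data range with the error just outside it. The sketch below is one possible check and not part of the original analysis. It uses `np.polyfit()` instead of `curve_fit()` to stay short, and the names `true_curve`, `rms_error`, `x_train`, `y_train`, `x_outside`, `y_outside`, and `errors` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_curve(x):
    # The noise-free relationship used to generate the data above
    return (x - 1.0) * (x - 2.0)


def rms_error(coefs, x, y):
    # Root mean square difference between the polynomial and y
    return np.sqrt(np.mean((np.polyval(coefs, x) - y) ** 2))


# Noisy data inside the range, as in the cells above but with a fixed seed
x_train = np.linspace(-10, 10)
y_train = true_curve(x_train) + rng.normal(0, 10, x_train.shape)

# Noise-free reference points just outside the range (assumed for this check)
x_outside = np.linspace(10, 12, 20)
y_outside = true_curve(x_outside)

errors = {}
for degree in (2, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (
        rms_error(coefs, x_train, y_train),
        rms_error(coefs, x_outside, y_outside),
    )
    print(f"degree {degree}: inside {errors[degree][0]:.2f}, "
          f"outside {errors[degree][1]:.2f}")
```

The higher-degree fit cannot have a larger error on the points it was fitted to, because it contains the quadratic as a special case. Its error outside the range is the value to watch when judging overfitting.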



References:

* Ayush Pant, "Introduction to Linear Regression and Polynomial Regression", Towards Data Science, Medium, Jan 13, 2019, [Online](https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb).
* Nikolay Mayorov, "Robust nonlinear regression in scipy", SciPy Cookbook, Aug 17, 2018, [Online](https://scipy-cookbook.readthedocs.io/items/robust_regression.html).



## Final Bell



In [None]:
# stackoverflow.com/a/24634221
import os
os.system("printf '\a'");

