## Non-Linear Regression


$$
y = \beta_0 + \beta_1 f_1(x) + \beta_2 f_2(x) + \dots + \epsilon
$$
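
Although the model above is non-linear in $x$, it is linear in the coefficients $\beta_j$, so ordinary least squares applies once each $f_j(x)$ has been evaluated as a column of the design matrix. The following is a minimal sketch of that idea; `design_matrix`, `x_demo` and `X_demo` are hypothetical names introduced only for illustration and are not part of the notebook.

```python
import numpy as np

def design_matrix(x, basis_functions):
    """Stack a column of ones and one column per basis function f_j(x)."""
    columns = [np.ones_like(x, dtype=float)]
    columns += [f(x) for f in basis_functions]
    return np.column_stack(columns)

# Small illustrative grid (assumption: 5 points on [0, 10])
x_demo = np.linspace(0, 10, 5)

# Example basis: f_1(x) = x and f_2(x) = x^2
X_demo = design_matrix(x_demo, [lambda t: t, lambda t: t**2])
print(X_demo.shape)
```

Any other choice of basis (for example `np.sin` or `np.log1p`) would slot into the same list without changing the fitting step.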

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std



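Create a data set for a quadratic model. Together with the constant term, the regressors $x$ and $x^2$ give a design matrix with three columns.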

$$
y = \beta_0 + \beta_1 x+ \beta_2 x^2+ \epsilon
$$


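To generate the data, we assign values to the coefficients ($\beta_0 = 50$, $\beta_1 = 15$, $\beta_2 = -2$) and draw the noise as twice a standard normal variable, that is, with standard deviation 2:
$$
y = 50 + 15 x - 2 x^2 + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 2^2)
$$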

In [None]:
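nsample = 200
x = np.linspace(0, 10, nsample)           # 200 evenly spaced points on [0, 10]
X = np.column_stack((x, x**2))            # regressors: x and x^2
beta = np.array([50, 15, -2])             # true coefficients (beta_0, beta_1, beta_2)
e = 2*np.random.normal(size=nsample)      # Gaussian noise with standard deviation 2
X = sm.add_constant(X)                    # prepend the column of ones for beta_0
y = np.dot(X, beta) + e                   # noisy response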
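
Because the noise above is drawn without a seed, the data and the fitted values change on every run. If repeatable results are wanted, one option is a seeded NumPy `Generator`. This is only a sketch: the seed value `0` and the `_seeded` variable names are arbitrary assumptions, and the numbers it produces will not match the outputs shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seed 0, chosen arbitrarily

nsample = 200
x_seeded = np.linspace(0, 10, nsample)
# Design matrix with columns 1, x, x^2 (same layout as sm.add_constant produces above)
X_seeded = np.column_stack((np.ones(nsample), x_seeded, x_seeded**2))
beta_true = np.array([50, 15, -2])

# Noise with standard deviation 2, drawn from the seeded generator
e_seeded = 2 * rng.normal(size=nsample)
y_seeded = X_seeded @ beta_true + e_seeded
```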

Inspect the generated data before using statsmodels.api OLS to fit the model

In [None]:
print(beta)
print(X)
print(y)
import plotly.express as px
# 3D scatter of y against the two regressors x and x^2, coloured by y
fig = px.scatter_3d(x=X[:, 1], y=X[:, 2], z=y, color=y)
fig.show()

[50 15 -2]
[[1.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.00000000e+00 5.02512563e-02 2.52518876e-03]
 [1.00000000e+00 1.00502513e-01 1.01007550e-02]
 [1.00000000e+00 1.50753769e-01 2.27266988e-02]
 [1.00000000e+00 2.01005025e-01 4.04030201e-02]
 [1.00000000e+00 2.51256281e-01 6.31297189e-02]
 [1.00000000e+00 3.01507538e-01 9.09067953e-02]
 [1.00000000e+00 3.51758794e-01 1.23734249e-01]
 [1.00000000e+00 4.02010050e-01 1.61612081e-01]
 [1.00000000e+00 4.52261307e-01 2.04540289e-01]
 [1.00000000e+00 5.02512563e-01 2.52518876e-01]
 [1.00000000e+00 5.52763819e-01 3.05547840e-01]
 [1.00000000e+00 6.03015075e-01 3.63627181e-01]
 [1.00000000e+00 6.53266332e-01 4.26756900e-01]
 [1.00000000e+00 7.03517588e-01 4.94936997e-01]
 [1.00000000e+00 7.53768844e-01 5.68167471e-01]
 [1.00000000e+00 8.04020101e-01 6.46448322e-01]
 [1.00000000e+00 8.54271357e-01 7.29779551e-01]
 [1.00000000e+00 9.04522613e-01 8.18161158e-01]
 [1.00000000e+00 9.54773869e-01 9.11593142e-01]
 [1.00000000e+00 1.00502513e+

In [None]:
model = sm.OLS(y, X)
results = model.fit()


Print the fitted coefficients and $R^2$. The estimates should be close to the true values 50, 15 and -2.

In [None]:
print('Parameters: ', results.params)
print('R2: ', results.rsquared)

Parameters:  [49.95015781 14.89308238 -1.98987895]
R2:  0.9898151772387199
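
The fitted coefficients are the least-squares solution of $X\beta \approx y$, and $R^2 = 1 - SS_{res}/SS_{tot}$. As a cross-check, both can be computed with plain NumPy. The sketch below regenerates similar data with a fixed seed so that it runs on its own; the seed and the `_chk` names are assumptions, so its numbers will differ slightly from the output above.

```python
import numpy as np

# Regenerate data with the same recipe as above; the seed is an assumption for repeatability
rng = np.random.default_rng(0)
x_chk = np.linspace(0, 10, 200)
X_chk = np.column_stack((np.ones_like(x_chk), x_chk, x_chk**2))
y_chk = X_chk @ np.array([50, 15, -2]) + 2 * rng.normal(size=x_chk.size)

# Least-squares solution (minimises the residual sum of squares)
beta_hat, *_ = np.linalg.lstsq(X_chk, y_chk, rcond=None)

# R^2 = 1 - SS_res / SS_tot
residuals = y_chk - X_chk @ beta_hat
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y_chk - y_chk.mean())**2)
r_squared = 1 - ss_res / ss_tot

print("Parameters:", beta_hat)
print("R2:", r_squared)
```

On the notebook's own `X` and `y`, the same two computations should agree with `results.params` and `results.rsquared` up to floating-point error.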


Print the full regression summary, including the goodness-of-fit statistics

In [None]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.990
Method:                 Least Squares   F-statistic:                     9573.
Date:                Fri, 25 Sep 2020   Prob (F-statistic):          6.07e-197
Time:                        11:12:09   Log-Likelihood:                -433.73
No. Observations:                 200   AIC:                             873.5
Df Residuals:                     197   BIC:                             883.4
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         49.9502      0.448    111.526      0.0