# LOWESS (Locally Weighted Scatterplot Smoothing)
---
LOWESS is a ***non-parametric*** fitting technique, which means that we do not need to assume that the data follows any specific distribution.

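One important difference between LOWESS and OLS (ordinary least squares) is that it uses **weights**: each fitted value comes from a local regression in which nearby points count more than distant ones.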

The algorithm uses a tri-cube weight function:

$W(d) = (1 - |d|^{3})^{3}$

where $d$ is the distance of a given data point from the point on the curve being fitted, scaled to lie in the range between 0 and 1.
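
To make the weight function concrete, here is a minimal NumPy sketch. The function name `tricube` and the sample distances are made up for illustration, and setting the weight to zero for scaled distances of 1 or more is an assumption based on the usual definition of the tri-cube kernel.

```python
import numpy as np

def tricube(d):
    """Tri-cube weight for scaled distances d (assumed zero once |d| >= 1)."""
    d = np.abs(np.asarray(d, dtype=float))
    return np.where(d < 1.0, (1.0 - d**3) ** 3, 0.0)

# Hypothetical scaled distances from the point being fitted
d = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
weights = tricube(d)
print(weights)
```

The weight is 1 at the point itself and falls smoothly to 0 at the edge of the window, so the closest observations dominate each local fit.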

***LOWESS is computationally expensive because it fits a separate weighted regression around each point, so it should be reserved for problems that need it rather than used for every regression task.***
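
The cost is easier to see in a simplified sketch of the core step. The code below is not the statsmodels implementation: `local_fit` is a hypothetical helper, it fits a straight line in each window, and it leaves out the robustness iterations that a full LOWESS adds. It only shows that every smoothed value needs its own weighted least-squares fit.

```python
import numpy as np

def local_fit(x, y, x0, frac=2/3):
    """Smoothed value at x0 from one tri-cube weighted straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(2, int(np.ceil(frac * len(x))))   # number of neighbours in the window
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                # indices of the k nearest points
    h = dist[idx].max()                       # window half-width used for scaling
    d = dist[idx] / h if h > 0 else np.zeros(k)
    w = np.clip(1.0 - d**3, 0.0, None) ** 3   # tri-cube weights
    # Weighted least squares: scale rows of the design matrix by sqrt(weight)
    A = np.column_stack([np.ones(k), x[idx] - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return beta[0]                            # intercept is the fitted value at x0

# Synthetic, exactly linear data: the local fits should reproduce the line
x_demo = np.linspace(0, 10, 50)
y_demo = 2 * x_demo + 1
smoothed_demo = np.array([local_fit(x_demo, y_demo, xi) for xi in x_demo])
```

With n observations this loop runs n separate regressions, each over roughly `frac * n` points, which is why the method gets slow on large datasets.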

In [11]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go 
import plotly.express as px
import statsmodels.api as sm
from scipy.interpolate import interp1d
from statsmodels.nonparametric.smoothers_lowess import lowess

In [7]:
df = pd.read_csv('../data/Real_estate.csv', encoding='utf-8')

In [8]:
fig = px.scatter(df, x = df['X3 distance to the nearest MRT station'], y = df['Y house price of unit area'],
                    opacity=0.8, color_discrete_sequence=['black'])

fig.update_layout(dict(plot_bgcolor = 'white'))

fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
                 showline=True, linewidth=1, linecolor='black')

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
                 showline=True, linewidth=1, linecolor='black')


fig.update_layout(title = dict(text="House Price Based on Distance from the Nearest MRT", 
                             font=dict(color='black')))

fig.update_traces(marker = dict(size = 3))

fig.show()

In [13]:
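# Feature as a 2-D array for scikit-learn and as a 1-D array for LOWESS
X = df['X3 distance to the nearest MRT station'].values.reshape(-1,1)
x = df['X3 distance to the nearest MRT station'].values
y = df['Y house price of unit area']

# Fit an ordinary linear regression as a baseline
model1 = LinearRegression()
LR = model1.fit(X, y)

# Evaluate the fitted line on an evenly spaced grid for plotting
x_range = np.linspace(X.min(), X.max(), 20)
y_range = model1.predict(x_range.reshape(-1,1))

# lowess takes y first, then x; it returns an array sorted by x
# with x in column 0 and the smoothed y in column 1
y_hat1 = lowess(y, x)              # default frac=2/3
y_hat2 = lowess(y, x, frac = 1/5)  # smaller window, follows the data more closely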

In [14]:
# Create a scatter plot
fig = px.scatter(df, x=df['X3 distance to the nearest MRT station'], y=df['Y house price of unit area'], 
                 opacity=0.8, color_discrete_sequence=['black'])

# Add the prediction line
fig.add_traces(go.Scatter(x=x_range, y=y_range, name='Linear Regression', line=dict(color='limegreen')))
fig.add_traces(go.Scatter(x=y_hat1[:,0], y=y_hat1[:,1], name='LOWESS, frac=2/3', line=dict(color='red')))
fig.add_traces(go.Scatter(x=y_hat2[:,0], y=y_hat2[:,1], name='LOWESS, frac=1/5', line=dict(color='orange')))

# Change chart background color
fig.update_layout(dict(plot_bgcolor = 'white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
                 showline=True, linewidth=1, linecolor='black')

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
                 showline=True, linewidth=1, linecolor='black')

# Set figure title
fig.update_layout(title=dict(text="House Price Based on Distance from the Nearest MRT with Model Predictions", 
                             font=dict(color='black')))

# Update marker size
fig.update_traces(marker=dict(size=3))

fig.show()

## Predicting new values
---
The `lowess` function from statsmodels does not provide us with a `predict()` method.
Fortunately, we can use the SciPy library as a workaround by interpolating between the fitted points.

In [15]:
# Build interpolators over the LOWESS output (x in column 0, smoothed y in column 1)
f_linear = interp1d(y_hat1[:,0], y=y_hat1[:,1], bounds_error=False, kind='linear', fill_value='extrapolate')
f_nearest = interp1d(y_hat1[:,0], y=y_hat1[:,1], bounds_error=False, kind='nearest', fill_value='extrapolate')

# Distances at which we want a smoothed price
xnew = [300, 600, 900, 1200, 1500, 1800, 2100, 6400]
ynew_linear = f_linear(xnew)
ynew_nearest = f_nearest(xnew)
print(ynew_linear)
print(ynew_nearest)

[45.17484583 38.88067785 33.63954152 30.70005122 28.90428712 27.31620311
 26.02059902 11.5419846 ]
[45.02258129 38.86385487 33.43419447 31.09566559 28.91596696 27.30837281
 26.0121316  11.55394747]
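
As a possible variation on the interpolation above, the sketch below uses only NumPy. The `smoothed` array is a made-up stand-in for `y_hat1`, and `predict` is a hypothetical helper. It first collapses repeated x values, which can occur when several houses share the same distance, and then interpolates linearly. Unlike `fill_value='extrapolate'`, `np.interp` does not extrapolate: outside the fitted range it returns the nearest end value.

```python
import numpy as np

# Hypothetical stand-in for LOWESS output: column 0 is x (sorted), column 1 is smoothed y
smoothed = np.array([
    [100.0, 48.0],
    [500.0, 40.0],
    [500.0, 40.0],    # repeated x value
    [1500.0, 29.0],
    [4000.0, 18.0],
])

# Collapse repeated x values by averaging their smoothed y
xs, inverse = np.unique(smoothed[:, 0], return_inverse=True)
ys = np.bincount(inverse, weights=smoothed[:, 1]) / np.bincount(inverse)

def predict(xnew):
    """Linear interpolation inside the fitted range, end values outside it."""
    return np.interp(xnew, xs, ys)

pred = predict([300, 1000, 6400])
print(pred)
```

Holding the end value is a deliberately conservative choice, since a LOWESS curve says little about the data beyond the range it was fitted on.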
