# BASICS OF SCIKIT-LEARN
* Scikit-learn is a dedicated machine-learning library that provides a rich collection of algorithms and tools for classification, regression, clustering, dimensionality reduction, model selection and preprocessing. The library is simple to use and, most importantly, efficient, as it is built on NumPy, SciPy and matplotlib.
* We will use scikit-learn's stochastic gradient descent regression model, **sklearn.linear_model.SGDRegressor**. The model performs best with normalized inputs.
* **sklearn.preprocessing.StandardScaler** performs z-score normalization to produce the normalized inputs.
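
To make the z-score normalization step concrete, here is a minimal sketch comparing a manual computation with `StandardScaler`. The toy matrix `x` is a made-up assumption for illustration and is not the housing data used in the cell that follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix (3 samples, 2 features); not the housing data.
x = np.array([[1000.0, 2.0],
              [1500.0, 3.0],
              [2000.0, 4.0]])

# Manual z-score: subtract each column's mean and divide by its standard deviation.
mu = x.mean(axis=0)
sigma = x.std(axis=0)  # population standard deviation (ddof=0)
x_manual = (x - mu) / sigma

# StandardScaler learns the same per-column mean and scale during fit.
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

print(np.allclose(x_manual, x_scaled))
print(scaler.mean_, scaler.scale_)
```

After scaling, every column has zero mean and unit standard deviation, so no single feature dominates the gradient updates just because of its units.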

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

np.set_printoptions(precision=2)
# Load the data: the first four columns are the features, the fifth is the target
data = np.loadtxt("assets/data/houses.txt", delimiter=',')
x_train, y_train = data[:, 0:4], data[:, 4]

# Preprocessing, normalization
scaler = StandardScaler()
x_norm = scaler.fit_transform(x_train)

# Create the regression model and fit it with stochastic gradient descent
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(x_norm, y_train)
print(sgdr)
print(f"completed iterations: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")

b_optim = sgdr.intercept_
w_optim = sgdr.coef_
print(f"model parameters->w: {w_optim}, b:{b_optim}")

# There are two equivalent ways of making predictions
# 1) make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(x_norm)
# 2) make a prediction using w, b and a matrix-vector product
y_pred = np.dot(x_norm, w_optim) + b_optim

print(f"Prediction on training set using np.dot():\n{y_pred[:4]}")
print(f"Prediction on training set using sgdr.predict():\n{y_pred_sgd[:4]}")
print(f"Target values \n{y_train[:4]}")

SGDRegressor()
completed iterations: 141, number of weight updates: 14101.0
model parameters->w: [110.37 -21.35 -32.5  -37.83], b:[362.27]
Prediction on training set using np.dot():
[248.64 295.59 485.83 389.71]
Prediction on training set using sgdr.predict():
[248.64 295.59 485.83 389.71]
Target values 
[271.5 300.  509.8 394. ]
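
Because the model is trained on normalized inputs, any new sample has to be scaled with the statistics learned from the training set before prediction. The following is a minimal sketch of that step. It uses synthetic data and hypothetical names (`x_demo`, `y_demo`, `scaler_demo`, `sgdr_demo`, `x_new`) rather than the housing file, so the numbers it produces are not comparable to the output above.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 100 samples, 4 features on very different scales.
rng = np.random.default_rng(0)
x_demo = rng.uniform(low=[800, 1, 1, 5], high=[3000, 5, 2, 60], size=(100, 4))
y_demo = 0.2 * x_demo[:, 0] + 10 * x_demo[:, 1] + rng.normal(0, 5, size=100)

# Fit the scaler and the model on the training data only.
scaler_demo = StandardScaler()
x_demo_norm = scaler_demo.fit_transform(x_demo)
sgdr_demo = SGDRegressor(max_iter=1000, random_state=0)
sgdr_demo.fit(x_demo_norm, y_demo)

# A new, unseen sample: reuse the fitted scaler with transform(), not fit_transform().
x_new = np.array([[1200.0, 3.0, 1.0, 40.0]])
x_new_norm = scaler_demo.transform(x_new)

# Both prediction routes give the same result, as on the training set.
y_new = sgdr_demo.predict(x_new_norm)
y_new_manual = x_new_norm @ sgdr_demo.coef_ + sgdr_demo.intercept_
print(y_new, y_new_manual)
```

Calling `fit_transform` on the new sample instead would recompute the mean and standard deviation from that single row, and the model would receive inputs on a different scale from the one it was trained on.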
