## Linear Regression using Scikit-Learn

[scikit-learn](https://scikit-learn.org/stable/index.html) is an open-source, commercially usable machine learning toolkit. It contains implementations of many of the algorithms seen in other notebooks of this project.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

The following helper function loads the house features and prices dataset.

In [4]:
def load_house_dataset():
    '''
    Loads both features and target values from the raw dataset, as specified in the path below.
    Example of data in the file:

    1.244000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.400000000000000000e+01,3.000000000000000000e+02
    1.947000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.700000000000000000e+01,5.098000000000000114e+02

        Returns: X matrix and y vector, numpy structures.

    '''
    data = np.loadtxt("./data/raw/houses_features_prices.txt", delimiter=',', skiprows=1)
    """
    This line selects all rows (indicated by the : before the first comma) and all columns except the last one (indicated by :-1). 
    The :-1 means "from the beginning to the second last element" (or "from the beginning to the last element minus one"). 
    
    This creates a new numpy array X containing all the features (i.e., all columns except the last one).
    """
    X = data[:, :4]
    """
    This line selects all rows (indicated by the : before the comma) and only the last column (indicated by -1). 
    In numpy, -1 refers to the last index. This creates a new numpy array y containing only the target variable (i.e., the last column).
    """
    y = data[:, 4]
    return X, y 
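
The column slicing used by the helper can be tried without the data file. The following is a minimal sketch that parses only the two sample rows quoted in the docstring from an in-memory buffer (the `io.StringIO` buffer is an illustrative stand-in for the real file, and the values are written in shortened notation):

```python
import io

import numpy as np

# The two sample rows from the docstring, held in memory instead of on disk.
sample = io.StringIO(
    "1.244e+03,3.0e+00,1.0e+00,6.4e+01,3.0e+02\n"
    "1.947e+03,3.0e+00,2.0e+00,1.7e+01,5.098e+02\n"
)
data = np.loadtxt(sample, delimiter=",")

X = data[:, :-1]  # all rows, every column except the last (features)
y = data[:, -1]   # all rows, last column only (target)

print(X.shape)  # (2, 4)
print(y.shape)  # (2,)
```

With five columns per row, `data[:, :-1]` and `data[:, :4]` select the same features; the negative index simply keeps working if the number of feature columns changes.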

### Gradient Descent in Scikit-Learn
Scikit-learn provides a gradient descent regression model, [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like the previous implementations of gradient descent in this project, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) performs z-score normalization, which the library refers to as the 'standard score'.
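
To make the two pieces concrete, here is a minimal sketch of how normalization and the regressor might be combined. It runs on synthetic stand-in data rather than the house dataset, and the names `X_train`, `y_train`, `X_norm`, and `sgdr`, as well as the `max_iter` and `random_state` values, are illustrative assumptions, not settings taken from this project:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the house dataset): 100 samples, 4 features
# on very different scales, with a made-up linear target.
rng = np.random.default_rng(0)
X_train = rng.uniform(low=[500, 1, 1, 0], high=[3000, 5, 2, 100], size=(100, 4))
y_train = X_train @ np.array([0.2, 10.0, 5.0, -1.0]) + 50.0

# z-score normalization: each column ends up with mean 0 and standard deviation 1.
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

# StandardScaler matches the manual formula (x - mean) / std, computed per column.
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_norm, manual))  # True

# Fit the gradient descent regressor on the normalized features.
sgdr = SGDRegressor(max_iter=1000, random_state=0)
sgdr.fit(X_norm, y_train)

print(sgdr.coef_.shape)  # (4,)
y_pred = sgdr.predict(X_norm)
```

Because the model is fit on normalized features, any new input would need to go through `scaler.transform` before being passed to `sgdr.predict`.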

The code below loads the dataset.

In [5]:
data = load_house_dataset()

FileNotFoundError: ./data/raw/houses_features_prices.txt not found.