## Simple application of the Linear Regression API

In [6]:
from sklearn.linear_model import LinearRegression

# load data
x = [[80, 86],
     [82, 80],
     [85, 78],
     [90, 90],
     [86, 82],
     [82, 90],
     [78, 80],
     [92, 94]]

y = [84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4]

# create model
estimator = LinearRegression()

# train model
estimator.fit(x, y)

# print coef
print(estimator.coef_)

# use model to predict
estimator.predict([[100, 80]])

[0.3 0.7]


array([86.])
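For a data set this small, the fitted coefficients can also be recovered by solving the normal equation directly. The following is a minimal sketch with NumPy, assuming an intercept column of ones is prepended to the features; the names `X_b` and `theta` are illustrative and do not come from scikit-learn.

```python
import numpy as np

# Same toy data as in the cell above.
X = np.array([[80, 86], [82, 80], [85, 78], [90, 90],
              [86, 82], [82, 90], [78, 80], [92, 94]], dtype=float)
y = np.array([84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4])

# Prepend a column of ones so the first parameter acts as the intercept.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: solve (X^T X) theta = X^T y for theta.
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
intercept, coef = theta[0], theta[1:]

# A prediction is the intercept plus the weighted sum of the features.
prediction = intercept + np.array([100.0, 80.0]) @ coef

print('coefficients:', coef)
print('intercept:', intercept)
print('prediction:', prediction)
```

Up to floating-point rounding, the coefficients and the prediction should agree with the `estimator.coef_` and `estimator.predict` output shown above.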

## Boston house price prediction

In [7]:
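# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older release.
from sklearn.datasets import load_boston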
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import SGDRegressor

def linear_model_normal_equation():
    """METHOD1: NORMAL EQUATION
    """
    # 1. get data
    data = load_boston()

    # 2. data preparation
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. standardization (zero mean, unit variance)
    transformer = StandardScaler()
    x_train = transformer.fit_transform(x_train)
    x_test = transformer.transform(x_test)

    # 4. create model
    estimator = LinearRegression()
    # train model
    estimator.fit(x_train, y_train)

    # 5. evaluation
    y_predict = estimator.predict(x_test)

    print('predict result:', y_predict)
    print('coefficients:', estimator.coef_)
    print('intercept in the model:', estimator.intercept_)

    error = mean_squared_error(y_test, y_predict)
    print('error:', error)

linear_model_normal_equation()

predict result: [28.22944896 31.5122308  21.11612841 32.6663189  20.0023467  19.07315705
 21.09772798 19.61400153 19.61907059 32.87611987 20.97911561 27.52898011
 15.54701758 19.78630176 36.88641203 18.81202132  9.35912225 18.49452615
 30.66499315 24.30184448 19.08220837 34.11391208 29.81386585 17.51775647
 34.91026707 26.54967053 34.71035391 27.4268996  19.09095832 14.92742976
 30.86877936 15.88271775 37.17548808  7.72101675 16.24074861 17.19211608
  7.42140081 20.0098852  40.58481466 28.93190595 25.25404307 17.74970308
 38.76446932  6.87996052 21.80450956 25.29110265 20.427491   20.4698034
 17.25330064 26.12442519  8.48268143 27.50871869 30.58284841 16.56039764
  9.38919181 35.54434377 32.29801978 21.81298945 17.60263689 22.0804256
 23.49262401 24.10617033 20.1346492  38.5268066  24.58319594 19.78072415
 13.93429891  6.75507808 42.03759064 21.9215625  16.91352899 22.58327744
 40.76440704 21.3998946  36.89912238 27.19273661 20.97945544 20.37925063
 25.3536439  22.18729123 31.13342301 


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

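The `fit_transform()` method computes the mean and standard deviation of the feature data and then standardizes that data. It combines two steps: `fit()` computes the mean and standard deviation of each feature, and `transform()` applies the standardization. Calling `fit_transform()` on the training set computes these statistics from the training data and then standardizes the training data with them.

For the test set, we only need to apply the same standardization using the mean and standard deviation obtained from the training set; the statistics are not recomputed. We therefore call only `transform()`, which reuses the mean and standard deviation computed by the earlier `fit_transform()` call on the training set, so both sets end up on the same scale.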
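The difference between the two calls can be checked on a tiny example. This is a sketch on made-up numbers (the arrays below are hypothetical and are not part of the Boston data); it shows that `transform()` on the test set reuses the `mean_` and `scale_` attributes learned from the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: four training rows and two test rows, two features.
x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
x_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # fit on train, then transform train
x_test_scaled = scaler.transform(x_test)        # reuse the training statistics

print('training mean:', scaler.mean_)
print('training std: ', scaler.scale_)

# transform() is equivalent to (x - training mean) / training std.
manual = (x_test - scaler.mean_) / scaler.scale_
print('matches manual computation:', np.allclose(manual, x_test_scaled))
```

The first test row equals the training mean, so it maps to zeros; the second row lies outside the training range and maps to values well above 1.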

In [8]:
def linear_model2():
    """ METHOD2:SGDRegressor
    """
    # 1. get data
    data = load_boston()

    # 2. data preparation
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. standardization (zero mean, unit variance)
    transformer = StandardScaler()
    x_train = transformer.fit_transform(x_train)
    x_test = transformer.transform(x_test)

    # 4. create Model
    estimator = SGDRegressor(max_iter=1000)
    estimator.fit(x_train, y_train)

    # 5. model evaluation
    y_predict = estimator.predict(x_test)
    print('predict result:', y_predict)
    print('coefficients:', estimator.coef_)
    print('intercept in the model:', estimator.intercept_)

    error = mean_squared_error(y_test, y_predict)
    print('error:', error)

linear_model2()

predict result: [28.24148337 31.60628939 21.38679428 32.65086631 20.09356296 19.03278764
 21.33491509 19.43174612 19.61939941 32.84028009 21.32033085 27.37369297
 15.60597634 19.88232337 36.9160298  18.81111025  9.50079384 18.55538839
 30.68198512 24.30430341 19.0206954  34.06115355 29.56248987 17.47183193
 34.80385339 26.59624302 34.47816845 27.37269689 19.08375715 15.42714316
 30.82125655 14.90392722 37.41315497  8.34928826 16.31477472 17.00379992
  7.60795117 19.86905989 40.45794344 29.02054052 25.23365367 17.75890517
 38.94674495  6.80588288 21.67732106 25.14824719 20.66784488 20.56910675
 17.11253759 26.18763432  9.39491428 27.28763897 30.57427586 16.62694159
  9.50285435 35.4821122  31.81566213 22.60460191 17.58336501 21.85055174
 23.64355677 24.03295133 20.24924818 38.25135451 25.39307792 19.6920338
 14.02491124  6.76086093 42.20920966 21.89133297 16.89574689 22.47557293
 40.77315033 21.6466213  36.86132536 27.19851606 21.48770503 20.70434434
 25.31096125 23.23034823 31.37903854


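The two methods can be compared side by side on the same split. Because `load_boston` is no longer available in scikit-learn 1.2 and later, this sketch uses synthetic data from `make_regression` as a stand-in; the data set, its size, and the noise level are assumptions for illustration, not the notebook's data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 500 samples, 5 features, Gaussian noise.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=22)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=22)

# Standardize with statistics from the training split only.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Method 1: direct least-squares solve.
ols = LinearRegression().fit(x_train, y_train)
mse_ols = mean_squared_error(y_test, ols.predict(x_test))

# Method 2: stochastic gradient descent; random_state fixes the shuffling.
sgd = SGDRegressor(max_iter=1000, random_state=22).fit(x_train, y_train)
mse_sgd = mean_squared_error(y_test, sgd.predict(x_test))

print('LinearRegression MSE:', mse_ols)
print('SGDRegressor MSE:', mse_sgd)
```

On standardized features the two errors are usually close. `SGDRegressor` is iterative, so its result depends on the learning-rate schedule, the stopping tolerance, and the random seed.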