## scikit-learn docs

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

### Linear regression

In [20]:
dataset = datasets.load_boston()  # deprecated in scikit-learn 1.0, removed in 1.2
attributes = dataset.keys()  # reuse the loaded Bunch rather than calling load_boston() again
feature_names = dataset.feature_names

In [15]:
attributes

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

In [19]:
print(dataset.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town

In [25]:
reg = linear_model.LinearRegression(fit_intercept=True, n_jobs=3)  # normalize=True dropped: deprecated in 1.0, removed in 1.2, and it leaves OLS coef_ unchanged
reg.fit(dataset.data, dataset.target)
reg.coef_

array([-1.08011358e-01,  4.64204584e-02,  2.05586264e-02,  2.68673382e+00,
       -1.77666112e+01,  3.80986521e+00,  6.92224640e-04, -1.47556685e+00,
        3.06049479e-01, -1.23345939e-02, -9.52747232e-01,  9.31168327e-03,
       -5.24758378e-01])
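`LinearRegression` fits ordinary least squares: it chooses the intercept and `coef_` that minimize the sum of squared residuals. The sketch below checks this against NumPy's least-squares solver and scores the fit with `mean_squared_error` and `r2_score`, which the imports cell loads but never uses. It assumes synthetic data with made-up coefficients in place of the Boston set, since `load_boston` is not available in scikit-learn 1.2 or later.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the Boston data (an assumption): 200 samples,
# 3 features, known coefficients and intercept plus small Gaussian noise.
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = 3.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

reg = LinearRegression().fit(X, y)

# Same least-squares problem solved directly: prepend a column of ones
# so the first entry of beta plays the role of the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(reg.intercept_, reg.coef_)  # close to 3.0 and [1.5, -2.0, 0.5]
print(beta)                       # [intercept, coef_0, coef_1, coef_2]

# In-sample error metrics on the fitted values.
y_pred = reg.predict(X)
print("MSE:", mean_squared_error(y, y_pred))
print("R^2:", r2_score(y, y_pred))
```

These metrics are in-sample; on real data you would normally score a held-out split (for example one made with `sklearn.model_selection.train_test_split`) instead.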

In [29]:
dict(zip(feature_names, reg.coef_))

{'CRIM': -0.10801135783679654,
 'ZN': 0.04642045836688112,
 'INDUS': 0.020558626367071745,
 'CHAS': 2.686733819344878,
 'NOX': -17.76661122830015,
 'RM': 3.8098652068092136,
 'AGE': 0.0006922246403444446,
 'DIS': -1.475566845600255,
 'RAD': 0.306049478985174,
 'TAX': -0.01233459391657445,
 'PTRATIO': -0.9527472317072883,
 'B': 0.00931168327379387,
 'LSTAT': -0.5247583778554885}
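These coefficients are in each feature's own units, so their sizes are not directly comparable: NOX gets by far the largest magnitude partly because it is measured in parts per 10 million and spans only a narrow range of values. A minimal sketch of one common remedy, assuming two hypothetical synthetic features rather than the Boston columns, standardizes the features before fitting. For ordinary least squares each standardized coefficient is simply the raw coefficient times that feature's standard deviation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical features on very different scales (think NOX vs TAX).
X = np.column_stack([
    rng.uniform(0.4, 0.9, size=300),   # small-range feature
    rng.uniform(200, 700, size=300),   # large-range feature
])
y = -18.0 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.5, size=300)

raw = LinearRegression().fit(X, y)

# Standardize each column to zero mean and unit variance, then fit.
std_model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
std_coef = std_model[-1].coef_   # coefficients of the final LinearRegression step

names = ["small_range", "large_range"]  # hypothetical feature names
print(dict(zip(names, raw.coef_)))      # raw units: the first looks far bigger
print(dict(zip(names, std_coef)))       # effect per standard deviation

# For OLS: standardized coefficient = raw coefficient * feature std (ddof=0).
print(np.allclose(std_coef, raw.coef_ * X.std(axis=0)))
```

Here the raw coefficients suggest the first feature dominates, while the standardized ones reverse that ordering.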

In [31]:
print(datasets.load_boston.__doc__)

Load and return the boston house-prices dataset (regression).

    Samples total               506
    Dimensionality               13
    Features         real, positive
    Targets           real 5. - 50.

    Read more in the :ref:`User Guide <boston_dataset>`.

    Parameters
    ----------
    return_X_y : boolean, default=False.
        If True, returns ``(data, target)`` instead of a Bunch object.
        See below for more information about the `data` and `target` object.

        .. versionadded:: 0.18

    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:
        'data', the data to learn, 'target', the regression targets,
        'DESCR', the full description of the dataset,
        and 'filename', the physical location of boston
        csv dataset (added in version `0.20`).

    (data, target) : tuple if ``return_X_y`` is True

        .. versionadded:: 0.18

    Notes
    -----
        .. versionchanged:: 0.20
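The `return_X_y` parameter described above skips the Bunch and returns the `(data, target)` arrays directly. Because `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, this sketch swaps in `load_diabetes`, another small regression dataset bundled with the library (no download needed). The substitute dataset is an assumption, not part of the original notebook; `fetch_california_housing` is another common replacement but downloads its data on first use.

```python
from sklearn import datasets, linear_model

# Bunch form: a dict-like object whose keys are also attributes.
bunch = datasets.load_diabetes()
print(sorted(bunch.keys()))

# return_X_y=True gives the (data, target) tuple directly.
X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape, y.shape)  # (442, 10) (442,)

# Same workflow as the Boston cells: fit, then label coefficients by feature.
reg = linear_model.LinearRegression().fit(X, y)
print(dict(zip(bunch.feature_names, reg.coef_)))
```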
           