In [0]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

In [0]:
data = load_diabetes()

In [3]:
# fit a linear regression model to the data
lr_model = LinearRegression()
lr_model.fit(data.data, data.target)
lr_model.coef_

array([ -10.01219782, -239.81908937,  519.83978679,  324.39042769,
       -792.18416163,  476.74583782,  101.04457032,  177.06417623,
        751.27932109,   67.62538639])

The objective is to find the parameters (w1, w2, …, wn) that bring the predictions, ŷ, as close as possible to the actual target values, y. Once you have trained your model and it achieves good predictive performance without much overfitting, you can use these parameters (or coefficients) to understand which variables had the greatest impact on the predictions.
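To make the role of the parameters concrete, the following minimal sketch rebuilds the predictions by hand as the intercept plus the weighted sum of the features, and compares them with what predict() returns. The names manual_pred, matches_predict and train_mse are illustrative only, and the model is refit here so that the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
lr_model = LinearRegression().fit(data.data, data.target)

# Rebuild the predictions by hand: y_hat = w0 + w1*x1 + ... + wn*xn
manual_pred = data.data @ lr_model.coef_ + lr_model.intercept_

# The hand-computed values should agree with predict() up to floating-point error
matches_predict = np.allclose(manual_pred, lr_model.predict(data.data))

# Mean squared error on the training data, the quantity ordinary least squares minimizes
train_mse = np.mean((data.target - manual_pred) ** 2)
print(matches_predict, train_mse)
```

Note that this error is measured on the training data itself, so it says nothing about overfitting; a held-out set would be needed for that.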

A large positive or large negative coefficient means the feature has a strong influence on the outcome, provided the features are on comparable scales. Conversely, a coefficient close to 0 means the variable has little impact on the prediction.
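One simple way to compare influence is to order the coefficients by absolute value, ignoring the sign. The sketch below does this with NumPy; the names order and ranked are illustrative. The comparison is reasonable here because the features returned by load_diabetes() are already mean-centered and scaled to a common scale; with unscaled features, the magnitudes would not be directly comparable.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
lr_model = LinearRegression().fit(data.data, data.target)

# Indices of the coefficients, ordered from largest to smallest magnitude
order = np.argsort(np.abs(lr_model.coef_))[::-1]

# Pair each feature name with its signed coefficient, in that order
ranked = [(data.feature_names[i], float(lr_model.coef_[i])) for i in order]
for name, coef in ranked:
    print(f"{name:>4} {coef:10.3f}")
```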

In [4]:
import pandas as pd
coeff_df = pd.DataFrame()
coeff_df['feature'] = data.feature_names
coeff_df['coefficient'] = lr_model.coef_
coeff_df.head()

  feature  coefficient
0     age   -10.012198
1     sex  -239.819089
2     bmi   519.839787
3      bp   324.390428
4      s1  -792.184162

From this table, we can see that s1 has a large negative coefficient (-792.184162), so it pushes the prediction down: if s1 increases by one unit while the other features stay fixed, the predicted value decreases by 792.184162. On the other hand, bmi has a large positive coefficient (519.839787), so it is strongly linked to the target: in this model, a higher body mass index (BMI) means a significantly higher predicted value. For this dataset, the target is a quantitative measure of disease progression one year after baseline, not the risk of developing diabetes.
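The "one unit" reading of a coefficient can be checked directly. The sketch below takes a single sample, raises only s1 by 1, and compares the change in the prediction with the s1 coefficient. The names s1_idx, x_plus and delta are illustrative. Keep in mind that the features in this dataset are scaled (typical values are around 0.05 in magnitude), so a full unit step is far larger than any change seen in the data; the example only illustrates the arithmetic.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
lr_model = LinearRegression().fit(data.data, data.target)

# Column position of the s1 feature
s1_idx = list(data.feature_names).index("s1")

# Take one sample and a copy in which only s1 is raised by 1
x = data.data[:1].copy()
x_plus = x.copy()
x_plus[0, s1_idx] += 1.0

# Change in the prediction caused by the one-unit increase in s1
delta = lr_model.predict(x_plus)[0] - lr_model.predict(x)[0]
print(delta, lr_model.coef_[s1_idx])
```

Because the model is linear, the same shift applies whichever sample is used as the starting point.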