### Linear Regression for Feature Ranking

Linear regression is a commonly used model for **interpreting and ranking features**, especially when the goal is to understand the relationship between input variables and a continuous target.

---

### Linear Regression Model

Linear regression assumes a linear relationship between the input features and the target:

$$
y = X\beta + \varepsilon
$$

where:
- $X \in \mathbb{R}^{n \times p}$ is the feature matrix,
- $\beta \in \mathbb{R}^{p}$ is the vector of coefficients,
- $y \in \mathbb{R}^{n}$ is the target variable,
- $\varepsilon$ is the noise term.

The model is fitted by solving the ordinary least squares (OLS) problem:

$$
\min_{\beta} \; \|y - X\beta\|_2^2
$$
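As a minimal sketch of this objective, the snippet below solves the least-squares problem three ways on synthetic data and checks that the answers agree. The data, the coefficient vector `beta_true`, and the noise level are hypothetical values chosen only for illustration. The intercept is switched off so that the fit matches the formula above exactly. When $X$ has full column rank, the minimizer also satisfies the normal equations $X^\top X \beta = X^\top y$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: hypothetical coefficients and noise level, for illustration only
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# 1) Generic least-squares solver for min ||y - X beta||_2^2
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Normal equations X^T X beta = X^T y (valid here because X has full column rank)
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 3) scikit-learn, with the intercept disabled to match the formula
beta_sklearn = LinearRegression(fit_intercept=False).fit(X, y).coef_

print(np.allclose(beta_lstsq, beta_normal), np.allclose(beta_lstsq, beta_sklearn))
```

In practice `LinearRegression` keeps `fit_intercept=True` by default, which adds a constant term to the model without changing the idea.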

---

### Regularization and Linear Regression

Unlike Ridge or Lasso regression, **linear regression does not include a regularization term**. There is no penalty added to the objective function, and therefore no coefficient shrinkage or sparsity is imposed.

This is important when the goal is **feature ranking rather than automatic feature elimination**.

In practice, this means:
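- We **do not want a penalty term to shrink coefficients or set them to zero** before we compare them.
- We want each coefficient to reflect the **estimated contribution of its feature** to the target, holding the other features fixed. Magnitudes are comparable only when the features share a common scale, which is why the features are standardized before fitting. Strongly correlated features can still make individual coefficients unstable.
- Feature selection is performed **afterward**, by thresholding the absolute coefficient values.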
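To illustrate how a penalty changes coefficient magnitudes, the sketch below fits OLS, Ridge, and Lasso on the same standardized synthetic data. The ground-truth coefficients and the `alpha` values are hypothetical and deliberately large so that the effect is visible. Ridge shrinks all coefficients toward zero, and the Lasso typically drives the weakest one to exactly zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: the last feature has a small but nonzero effect (hypothetical values)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.1]) + rng.normal(size=300)

# Standardize so that coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(X)

ols = LinearRegression().fit(X_scaled, y)      # no penalty
ridge = Ridge(alpha=100.0).fit(X_scaled, y)    # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X_scaled, y)      # L1 penalty: shrinks and can zero out

for name, model in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    print(f"{name:5s}", np.round(model.coef_, 3))
```

Because the penalized fits change the magnitudes themselves, a ranking built on them mixes the data signal with the choice of `alpha`; the unpenalized fit avoids that extra knob.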

---

### How do we reduce regularization?

In linear regression, **there is nothing to reduce**.

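By simply using:

```python
LinearRegression()
```

we fit the plain OLS objective with no penalty term, which is equivalent to a regularization strength of zero. The cells below apply this to the diabetes dataset.
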
In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import load_diabetes

In [3]:
diabetes = load_diabetes()  # load the dataset once and reuse it
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.Series(diabetes.target)

In [4]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

In [5]:
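# Standardize using statistics from the training split only, so that
# coefficient magnitudes are comparable across features.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)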

In [6]:
# threshold is left at None, so the mean absolute coefficient is the cutoff
selector = SelectFromModel(
    LinearRegression()       # plain OLS: no penalty term, i.e. no regularization
)

In [7]:
selector.fit(X_train_scaled, y_train)

SelectFromModel(estimator=LinearRegression())

In [8]:
X_train.columns[selector.get_support()]

Index(['bmi', 'bp', 's1', 's2', 's5'], dtype='object')

In [9]:
selector.estimator_.coef_

array([  1.35246724, -12.45426893,  26.21004615,  18.61443344,
       -43.26039442,  24.2556288 ,   5.73862584,  13.96342685,
        31.57521526,   1.98339354])
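To turn these coefficients into an explicit ranking, one option is to sort the features by absolute coefficient value. The sketch below repeats the steps above in a self-contained form and also checks that the selection matches a manual cutoff at the mean absolute coefficient, which is the default `SelectFromModel` threshold for an unpenalized estimator. The names `importance`, `ranking`, and `manual_mask` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the cells above
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.Series(diabetes.target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

selector = SelectFromModel(LinearRegression()).fit(X_train_scaled, y_train)

# Rank features by the absolute value of their standardized coefficient
importance = pd.Series(np.abs(selector.estimator_.coef_), index=X_train.columns)
ranking = importance.sort_values(ascending=False)
print(ranking)

# Reproduce the selection by hand: keep features at or above the mean importance
manual_mask = importance.to_numpy() >= importance.mean()
print("threshold:", selector.threshold_)
print("selected:", list(X_train.columns[manual_mask]))
```

Note that `s1` and `s2` receive large coefficients of opposite sign; when features are strongly correlated, individual coefficients can be unstable, so the ranking is best read with that caveat in mind.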