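Least Angle Regression (LARS) is a regression algorithm for high-dimensional data. LARS is similar to forward stepwise regression: at each step it picks the feature most correlated with the current residual, but it moves the coefficients gradually instead of fully fitting each new feature.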
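
A minimal sketch of LARS with scikit-learn's `Lars` estimator may help make this concrete. The synthetic data, the variable names (`X_demo`, `y_demo`, `true_coef`) and the choice of `n_nonzero_coefs=3` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lars

# Synthetic data (assumed for illustration): 20 features, only 3 carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 20
X_demo = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[[2, 7, 11]] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_coef + 0.01 * rng.normal(size=n_samples)

# Stop the LARS path once 3 features have entered the model.
lars = Lars(n_nonzero_coefs=3)
lars.fit(X_demo, y_demo)

# Indices of the features that received a non-zero coefficient.
active = np.flatnonzero(lars.coef_)
print('Active features:', active)
print('Their coefficients:', lars.coef_[active])
```

Capping `n_nonzero_coefs` plays a role similar to `k_features` in forward selection: it limits how many features enter the model.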

In [3]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# 1. Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

print(X)

# 2. Initialize the estimator (classifier)
knn = KNeighborsClassifier(n_neighbors=3)

# 3. Set up the Sequential Feature Selector
#    - k_features=2: tells the selector to find the two best features.
#    - forward=True: specifies forward selection (as opposed to backward elimination).
#    - scoring='accuracy': sets the evaluation metric for the model.
#    - cv=5: uses 5-fold cross-validation to score each candidate subset,
#            so the score reflects held-out data rather than the training fit.
sfs = SFS(knn, 
          k_features=2, 
          forward=True, 
          floating=False, 
          scoring='accuracy',
          cv=5)

# 4. Perform the feature selection
#    Fitting on a DataFrame lets the selector report column names in step 5.
X_df = pd.DataFrame(X, columns=feature_names)
sfs = sfs.fit(X_df, y)

# 5. Review the results
# The 'sfs.k_feature_idx_' attribute contains the indices of the selected features.
print('Selected feature indices:', sfs.k_feature_idx_)

# The 'sfs.k_feature_names_' attribute returns the names of the selected features.
selected_features = sfs.k_feature_names_
print('Selected feature names:', selected_features)

# The 'sfs.k_score_' attribute holds the mean cross-validated score of the final feature subset.
print('Cross-validated accuracy of the selected subset:', sfs.k_score_)

# 6. Transform the data to keep only the selected features
X_selected = sfs.transform(X_df)


[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 ...

LARS Lasso is a lasso model fitted with the LARS algorithm. Unlike the implementation based on coordinate descent, it yields the exact solution, which is piecewise linear as a function of the norm of its coefficients.

Orthogonal Matching Pursuit (OMP) approximates the fit of a linear model subject to a constraint on the number of non-zero coefficients.
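
The constraint can be illustrated with scikit-learn's `OrthogonalMatchingPursuit`. This is only a sketch: the synthetic dataset and the budget of 5 non-zero coefficients are assumptions, not values taken from these notes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import OrthogonalMatchingPursuit

# Synthetic data (assumed): 30 features, 5 of them informative.
X_omp, y_omp = make_regression(n_samples=100, n_features=30, n_informative=5,
                               noise=0.5, random_state=0)

# Allow at most 5 non-zero coefficients in the fitted model.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(X_omp, y_omp)

n_nonzero = np.count_nonzero(omp.coef_)
print('Non-zero coefficients:', n_nonzero)
print('Selected feature indices:', np.flatnonzero(omp.coef_))
```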

Bayesian regression includes the regularization parameters in the estimation process: they are not fixed in advance but estimated from the data at hand. Bayesian Ridge Regression is one such model. It estimates two precision parameters, alpha (the precision of the noise) and lambda (the precision of the weights).
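
A short sketch with scikit-learn's `BayesianRidge` shows the two estimated parameters. The synthetic data, the true weights `w_true` and the noise level `noise_std` are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data (assumed): known weights plus Gaussian noise.
rng = np.random.default_rng(0)
X_bayes = rng.normal(size=(200, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
noise_std = 0.5
y_bayes = X_bayes @ w_true + noise_std * rng.normal(size=200)

# No regularization strength is passed in: it is estimated during fit.
br = BayesianRidge().fit(X_bayes, y_bayes)

print('Estimated noise precision (alpha_):', br.alpha_)
print('Estimated weight precision (lambda_):', br.lambda_)

# The model also returns a predictive standard deviation per sample.
y_mean, y_std = br.predict(X_bayes[:3], return_std=True)
print('Predictions:', y_mean)
print('Predictive std:', y_std)
```

Since the noise variance here is `noise_std ** 2`, `alpha_` should land in the neighbourhood of `1 / noise_std ** 2`.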

Automatic Relevance Determination (ARD) is a linear model that is very similar to Bayesian Ridge Regression but leads to sparser coefficients, because each coefficient gets its own precision instead of sharing a single one.
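
To see the difference in sparsity, the sketch below fits `ARDRegression` and `BayesianRidge` on the same data and counts coefficients that are close to zero. The synthetic data and the `1e-3` cutoff used to call a coefficient "near zero" are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

# Synthetic data (assumed): 20 features, only 3 with a true effect.
rng = np.random.default_rng(42)
X_ard = rng.normal(size=(100, 20))
w_sparse = np.zeros(20)
w_sparse[[0, 5, 10]] = [2.0, -3.0, 1.5]
y_ard = X_ard @ w_sparse + 0.5 * rng.normal(size=100)

ard = ARDRegression().fit(X_ard, y_ard)
ridge = BayesianRidge().fit(X_ard, y_ard)

# Count coefficients below an (assumed) cutoff of 1e-3 in absolute value.
cutoff = 1e-3
ard_near_zero = int(np.sum(np.abs(ard.coef_) < cutoff))
ridge_near_zero = int(np.sum(np.abs(ridge.coef_) < cutoff))

print('Near-zero coefficients, ARD:', ard_near_zero)
print('Near-zero coefficients, Bayesian Ridge:', ridge_near_zero)
```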

Logistic regression, despite its name, is a linear model for classification rather than regression.
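
A minimal sketch on the iris data used earlier shows the classifier predicting class labels and class probabilities. The 70/30 split, the `random_state` and `max_iter=1000` are assumptions picked for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)

# Hold out 30% of the samples, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris)

# A larger max_iter gives the solver room to converge on unscaled features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

pred = clf.predict(X_test)         # discrete class labels (0, 1 or 2)
proba = clf.predict_proba(X_test)  # one probability per class, rows sum to 1
test_accuracy = clf.score(X_test, y_test)

print('First predictions:', pred[:5])
print('First probability rows:', proba[:2])
print('Test accuracy:', test_accuracy)
```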