# LarsCV

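Least-angle regression (LARS) is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. LARS is similar to forward stepwise regression. At each step, it finds the feature most correlated with the current residual (at the first step, the target itself) and moves that feature's coefficient in the direction of the correlation. When several features have equal correlation with the residual, instead of continuing along a single feature, it proceeds in a direction equiangular between them. `LarsCV` wraps LARS in cross-validation: it computes the LARS path on each fold and keeps the point on the path with the lowest mean held-out squared error.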
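To see the stepwise behavior described above, scikit-learn exposes the raw path through `lars_path`. The sketch below runs it on synthetic data from `make_regression` (the data and its parameters are assumptions chosen for illustration, not part of this notebook) and prints the order in which features enter the model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Synthetic data for illustration only: 10 features, 5 of them informative
X_path, y_path = make_regression(n_samples=100, n_features=10, n_informative=5,
                                 noise=1.0, random_state=0)

# method="lar" runs plain least-angle regression and returns the full path
alphas, active, coefs = lars_path(X_path, y_path, method="lar")

# `active` lists feature indices in the order they joined the model
print("Order in which features entered:", active)

# Each column of `coefs` is the coefficient vector at one breakpoint;
# the first column is all zeros because no feature has entered yet
print("Breakpoints on the path:", coefs.shape[1])
print("Coefficients at the start:", coefs[:, 0])
print("Non-zero coefficients at the end:", np.count_nonzero(coefs[:, -1]))
```

The `alphas` array decreases along the path: it tracks the largest remaining correlation with the residual at each breakpoint.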

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# List the dataset files available in the read-only /kaggle/input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_csv('/kaggle/input/mushroom-classification/mushrooms.csv')
df

# Checking for null values

In [None]:
df.isnull().sum()

# Taking only the first 500 rows of the dataset

In [None]:
# Features: every column except the target 'habitat', first 500 rows
x = df.iloc[:500].drop(columns=['habitat'])
x

In [None]:
# Target: the 'habitat' column for the same 500 rows
y = df.iloc[:500]['habitat']
y

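# Label Encoding

All columns in this dataset are categorical strings, so they are converted to integer codes before fitting.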

In [None]:
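from sklearn import preprocessing

# apply() calls fit_transform separately on each column,
# so every column gets its own category-to-integer mapping
label_encoder = preprocessing.LabelEncoder()
x = x.apply(label_encoder.fit_transform)
print(x)
# Encode the target with its own encoder so the habitat mapping stays available
target_encoder = preprocessing.LabelEncoder()
y = target_encoder.fit_transform(y)
print(y)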
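`LabelEncoder` assigns codes in sorted order of the category labels. The toy example below (the labels are hypothetical and not taken from the mushroom data) shows the mapping and how `inverse_transform` recovers the original labels. The integer codes imply an order (`d` < `g` < `u`) that the categories themselves do not have, yet a linear model such as LARS will use that order anyway.

```python
from sklearn.preprocessing import LabelEncoder

# Toy habitat-style labels (hypothetical, for illustration only)
labels = ["g", "d", "u", "g", "d"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(encoder.classes_.tolist())                   # ['d', 'g', 'u'] (sorted)
print(codes.tolist())                              # [1, 0, 2, 1, 0]
print(encoder.inverse_transform([2, 0]).tolist())  # ['u', 'd']
```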

# LarsCV Algorithm

In [None]:
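from sklearn.linear_model import LarsCV

# Fit LARS with 5-fold cross-validation to pick the stopping point (alpha) on the path.
# Note: y holds integer habitat codes, so the model treats a categorical target as a number.
reg = LarsCV(cv=5).fit(x, y)
# score() returns R^2 on the same rows used for fitting, i.e. an in-sample figure
reg.score(x, y)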
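The fitted `LarsCV` object also records which point on the path cross-validation selected. A minimal sketch on synthetic `make_regression` data (an assumption for illustration; the mushroom data is not used here) inspects `alpha_`, the candidate `cv_alphas_`, and the resulting sparse `coef_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LarsCV

# Synthetic regression data: 200 samples, 100 features (10 informative by default)
X_demo, y_demo = make_regression(n_samples=200, noise=4.0, random_state=0)

reg_demo = LarsCV(cv=5).fit(X_demo, y_demo)

# alpha_: the path point with the lowest mean held-out MSE across the 5 folds
print("Selected alpha:", reg_demo.alpha_)
print("Candidate alphas evaluated:", len(reg_demo.cv_alphas_))
print("Non-zero coefficients:", np.count_nonzero(reg_demo.coef_))
print("In-sample R^2:", reg_demo.score(X_demo, y_demo))
```

For an honest estimate of predictive quality, the score would be computed on rows held out from fitting rather than on the training data.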

# Pros

* It is numerically efficient in contexts where the number of features is significantly greater than the number of samples.
* It is easily modified to produce solutions for other estimators, like the Lasso.
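The Lasso modification can be tried through the same `lars_path` function by switching `method` from `"lar"` to `"lasso"`. In the Lasso variant, a coefficient that would cross zero is dropped from the active set, so the path may contain extra steps. The data below is synthetic and chosen only for illustration; for a cross-validated Lasso counterpart of `LarsCV`, scikit-learn also provides `LassoLarsCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Synthetic data for illustration only
X_cmp, y_cmp = make_regression(n_samples=60, n_features=30, n_informative=5,
                               noise=5.0, random_state=1)

# Plain LARS versus the Lasso modification of the same path algorithm
alphas_lar, _, coefs_lar = lars_path(X_cmp, y_cmp, method="lar")
alphas_lasso, _, coefs_lasso = lars_path(X_cmp, y_cmp, method="lasso")

# The Lasso path can be longer because variables may leave the active set
print("LARS breakpoints:", coefs_lar.shape[1])
print("Lasso breakpoints:", coefs_lasso.shape[1])
```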

# Cons

* Because LARS is based upon an iterative refitting of the residuals, it would appear to be especially sensitive to the effects of noise.

# References

* https://docs.w3cub.com/scikit_learn/modules/generated/sklearn.linear_model.larscv/
* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LarsCV.html