<h2>Function Body</h2>

In [1]:
# importing libraries
from sklearn.model_selection import train_test_split, RandomizedSearchCV # train-test split and randomized hyperparameter search
from sklearn.linear_model import Lars                                    # Least Angle Regression model
from sklearn.metrics import r2_score, mean_squared_error, make_scorer    # performance metrics for regression
from sklearn.preprocessing import StandardScaler                         # preprocessing utility for scaling features
import numpy as np                                                       # numerical Python library for array operations
import pandas as pd                                                      # data science essentials
import scipy.stats                                                       # statistical functions and distributions from SciPy
import warnings                                                          # warning control
from sklearn.exceptions import FitFailedWarning                          # specific warning for fit failures during model training

# suppressing warnings
warnings.filterwarnings('ignore')

# defining the function
def lars_model(x_data, y_data, seed=1, tuning=True):
    """
 | Description of the Least Angle Regression (Lars) model. 
 | 
 | Model Details:
 |  
 |  Lars was designed primarily to address regression problems in high-dimensional data scenarios, where the number of predictors significantly exceeds the number of observations. 
 |  Traditional regression models often struggle with the "curse of dimensionality," but Lars is specifically tailored to manage such conditions efficiently[1][2].
 | 
 |  Lars can be likened to human decision-making, particularly how an individual would assess multiple choices sequentially, adjusting their preferences as more information is gathered.
 |  Initially, Lars starts without any predictors, similar to a decision-making process with no biases, and then incrementally adds variables, mirroring the way humans refine their judgments progressively[1][2].
 | 
 |  The advantages of Lars are evident in its capability to process high-dimensional data and handle multicollinearity with stability and efficiency. 
 |  Its unique approach to weighting predictors provides valuable insights into the importance of each variable, aiding in variable selection and enhancing the interpretability of complex datasets[1][2].
 |  Nonetheless, Lars also comes with disadvantages. Due to its sequential processing nature, it can be computationally demanding when dealing with an extensive number of predictors. 
 |  While it manages multicollinearity effectively, its performance might degrade due to an overwhelming number of correlated variables. 
 |  Lars may not be suitable for datasets with a substantial number of irrelevant features and operates under the presumption of a linear relationship between predictors and the response, 
 |  which is a limitation in scenarios where this assumption does not apply[1][2].
 | 
 | Model Calculations:
 |  Lars builds a least-squares fit in stages rather than minimizing the residual sum of squares (RSS) in a single step. 
 |  All coefficients start at zero. At each step, Lars selects the predictor most correlated with the current residual and moves its coefficient toward its least-squares value. 
 |  It continues only until a second predictor is equally correlated with the residual, at which point that predictor joins the active set. 
 |  From then on, the coefficients of all active predictors move together along the equiangular direction, which forms equal angles with every active predictor. 
 |  This is why the method is called "Least Angle Regression." If the path runs to completion with fewer predictors than observations, it ends at the ordinary least-squares solution[1][2][3][4].
 |  
 |  Lars has an automated feature selection mechanism. It adds predictors to the model one by one, based on their correlation with the residual. 
 |  This stepwise addition of predictors continues until a stopping criterion is met, which could be a predefined number of non-zero coefficients or no further reduction in RSS. 
 |  This method of feature selection allows for an interpretable model that highlights the most significant predictors[1][2].
 | 
 |  Standardizing or normalizing data is not strictly necessary before using Lars, but doing so can improve the model's interpretability and stability. 
 |  Standardization ensures that each feature contributes equally to the model, preventing variables with larger scales from having a disproportionate influence on the model's behavior,
 |  which is especially important when the model includes regularization terms that assume features are on the same scale[5].
 | 
 |  Key hyperparameters: 
 |   •	eps: This is the machine-precision regularization applied when computing the Cholesky diagonal factors. Raising it can help with very ill-conditioned systems; it is not a convergence tolerance.
 |   •	fit_intercept: This decides whether an intercept should be included in the model, which can alter the bias of the predictions.
 |   •	normalize: This indicates whether the regressors are normalized before fitting, which matters when variables are on different scales[1][2][5]. Newer scikit-learn releases deprecated and then removed this parameter, so there the features are scaled beforehand (for example with StandardScaler).
 |   •	n_nonzero_coefs: This sets the target number of non-zero coefficients and therefore caps how many predictors enter the model.
 | 
 | Model Results:
 |  Upon completion, Lars exposes a coefficient for each predictor in the model. 
 |  These coefficients elucidate the strength and direction of the relationship between predictors and the response variable. 
 |  The fitted model also stores an array of alphas (alphas_), the maximum absolute covariance between the predictors and the residual at each step of the path, which indicates when each predictor enters[7]. 
 |  This function reports R-squared values for both training and testing datasets, illustrating the model's explanatory power.
 |  It also reports AIC and BIC for both sets, each computed from the mean squared error (MSE) and the number of estimated parameters, to help evaluate fit and generalization[8]. 
 |  The absolute difference between training and testing R-squared values, known as the train-test gap, serves as an indicator of potential overfitting. 
 |  Additionally, the intercept and the full coefficient path (coef_path_, stored when fit_path=True) enrich the model's interpretability, 
 |  assisting in informed decision-making regarding predictor selection[1][9].
 | 
 | 
 | References:
 |  1.	Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). "Least Angle Regression." Annals of Statistics, 32(2), 407-499. 
 | https://www-jstor-org.hult.idm.oclc.org/stable/3448465
 |  2.	Hastie, T., Tibshirani, R., & Friedman, J. (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." Springer. 
 | https://link.springer.com/book/10.1007/978-0-387-84858-7
 |  3.	Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288. 
 | https://webdoc.agsci.colostate.edu/koontz/arec-econ535/papers/Tibshirani%20(JRSS-B%201996).pdf
 |  4.	Hoerl, A.E., & Kennard, R.W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems." Technometrics, 12(1), 55-67. 
 | https://homepages.math.uic.edu/~lreyzin/papers/ridge.pdf
 |  5.	Friedman, J., Hastie, T., & Tibshirani, R. (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, 33(1), 1-22. 
 | https://www.jstatsoft.org/article/view/v033i01
 |  6.	Zou, H., & Hastie, T. (2005). "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320. 
 | https://web-p-ebscohost-com.hult.idm.oclc.org/ehost/pdfviewer/pdfviewer?vid=0&sid=47bff9ae-303b-48e9-8647-48d621896f8d%40redis
 |  7.	Meinshausen, N., & Bühlmann, P. (2010). "Stability Selection." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.
 | https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2010.00740.x
 |  8.	Breiman, L. (1996). "Heuristics of Instability and Stabilization in Model Selection." Annals of Statistics, 24(6), 2350-2383. 
 | https://www-jstor-org.hult.idm.oclc.org/stable/2242688
 |  9.	Tibshirani, R., & Taylor, J. (2012). "Degrees of Freedom in Lasso Problems." The Annals of Statistics, 40(2), 1198-1232.
 | https://www-jstor-org.hult.idm.oclc.org/stable/41713670
    
    """

    # train-test split (80% train / 20% test; no stratification for a continuous response)
    X_train, X_test, y_train, y_test = train_test_split(
             x_data, 
             y_data, 
             test_size=0.2, 
             random_state=seed)

    # instantiating model object
    lars = Lars()

    # performing hyperparameter tuning 
    if tuning:
        param_grid = { 'fit_intercept': [True, False],               # whether to calculate the intercept for this model
                       'normalize': [True, False],                   # whether to normalize the regressors (X) before fitting (only in scikit-learn versions that still accept it)
                       'eps': scipy.stats.reciprocal(1e-5, 1e-2)}    # log-uniform range for eps, the Cholesky regularization term (not a stopping tolerance)

        scorer = make_scorer(r2_score)
       
        # RandomizedSearchCV object
        search = RandomizedSearchCV(estimator           = lars,
                                    param_distributions = param_grid,
                                    scoring             = scorer,
                                    cv                  = 5,
                                    n_iter              = 100,
                                    random_state        = seed,
                                    n_jobs              = -1)
        # fit the hyperparameter search model using the training data
        search.fit(X_train, y_train)
       
        # printing the optimal parameters and number of models evaluated
        best_params = search.best_params_
        best_params['eps'] = round(best_params['eps'], 7)
        print('Best Parameters: ', best_params)
        print("Number of Models Evaluated:", search.n_iter)
        lars.set_params(**best_params)

        
    # training the model
    lars.fit(X_train, y_train)

    # generating predictions on the training and testing sets
    y_train_pred = lars.predict(X_train)
    y_test_pred = lars.predict(X_test)
    
    # pairing each feature name with its rounded coefficient
    model_coefficients = zip(X_train.columns, lars.coef_.round(decimals = 4))
    # starting the coefficient list with the intercept
    # (np.round also works if intercept_ is a plain Python float, e.g. when fit_intercept=False)
    coefficient_lst = [('intercept', np.round(lars.intercept_, decimals = 4))]
    # appending each feature-coefficient pair
    coefficient_lst.extend(model_coefficients)
    
    # computing each metric once
    r2_train  = r2_score(y_train, y_train_pred)
    r2_test   = r2_score(y_test, y_test_pred)
    mse_train = mean_squared_error(y_train, y_train_pred)
    mse_test  = mean_squared_error(y_test, y_test_pred)
    n_train, n_test = len(y_train), len(y_test)
    k = len(lars.coef_) + 1   # number of estimated parameters (coefficients + intercept)

    # dynamically printing results
    # every numeric result is rounded (for the end user)
    results = f"""\
    
    R-Square (training set): {np.round(r2_train, 2)}
    R-Square (testing set): {np.round(r2_test, 2)}
    Train-Test Gap: {np.round(abs(r2_train - r2_test), 2)}
    AIC (training set): {np.round(n_train * np.log(mse_train) + 2 * k, 0)}
    AIC (testing set): {np.round(n_test * np.log(mse_test) + 2 * k, 0)}
    BIC (training set): {np.round(n_train * np.log(mse_train) + np.log(n_train) * k, 0)}
    BIC (testing set): {np.round(n_test * np.log(mse_test) + np.log(n_test) * k, 0)}
    
    Coefficients
    ------------
    {pd.DataFrame(data = coefficient_lst, columns = ["Feature", "Coefficient"])}"""

    print(results)
       
        
    return 
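
The AIC and BIC lines in `lars_model` use the form n·ln(MSE) + penalty·k, where k is the number of coefficients plus one for the intercept. A minimal sketch of that calculation as a standalone helper follows. The helper name `gaussian_aic_bic` and the toy arrays are assumptions made for illustration; they are not part of the original function.

```python
import numpy as np

def gaussian_aic_bic(y_true, y_pred, n_params):
    """Return (AIC, BIC) in the n*ln(MSE) form used by lars_model.

    Hypothetical helper for illustration; values are defined only up to
    an additive constant, so compare them across models fit to the same data.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    mse = np.mean((y_true - y_pred) ** 2)
    aic = n * np.log(mse) + 2 * n_params           # penalty of 2 per parameter
    bic = n * np.log(mse) + np.log(n) * n_params   # penalty of ln(n) per parameter
    return aic, bic

# toy values, made up so that MSE = 1 and ln(MSE) = 0
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 6.0, 6.0, 10.0])
aic, bic = gaussian_aic_bic(y_true, y_pred, n_params=2)
print(aic, bic)
```

Because ln(n) exceeds 2 once n is larger than about 7, BIC penalizes each extra parameter more heavily than AIC on any realistically sized dataset.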


<h2>Model Testing</h2>

In [2]:
help(lars_model)

Help on function lars_model in module __main__:

lars_model(x_data, y_data, seed=1, tuning=True)
    | Description of the Least Angle Regression (Lars) model. 
    | 
    | Model Details:
    |  What type of problem(s) was Lars originally designed to solve?
    |  The Lars model was designed primarily to address regression problems in high-dimensional data scenarios, where the number of predictors significantly exceeds the number of observations. 
    |  Traditional regression models often struggle with the "curse of dimensionality," but Lars is specifically tailored to manage such conditions efficiently[1][2].
    | 
    |  The Lars model can be likened to human decision-making, particularly how an individual would assess multiple choices sequentially, adjusting their preferences as more information is gathered.
    |  Initially, Lars starts without any predictors, similar to a decision-making process with no biases, and then incrementally adds variables, mirroring the way humans refi
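
The incremental entry of predictors described in the docstring can be inspected with scikit-learn's `lars_path` function. The sketch below uses synthetic data from `make_regression` (an assumption for illustration, not the housing data) and centers it first, since `lars_path` does not fit an intercept.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# synthetic data (illustrative assumption, not the housing dataset)
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=5.0, random_state=1)

# centering, because lars_path itself does not estimate an intercept
X = X - X.mean(axis=0)
y = y - y.mean()

# alphas: maximum absolute covariance with the residual at each step
# active: indices of the predictors in the active set at the end of the path
# coefs : coefficients along the path, one column per step
alphas, active, coefs = lars_path(X, y, method="lar")

entry_order = list(active)
print("Predictors in the active set, in order of entry:", entry_order)
print("Coefficient path shape (features x steps):", coefs.shape)
```

The first column of `coefs` is all zeros (no predictors yet), and each later column corresponds to one more step along the path.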

In [3]:
# response variable (y) is Sale_Price

# specifying the path and file name
file = 'housing_feature_rich.xlsx'

# reading the file into Python
housing = pd.read_excel(io     = file,
                        header = 0   )

# remove the 'property_id' column
housing.drop(labels  = ['property_id'], axis    = 1, inplace = True)

# setting columns
x_original = list(housing.loc[ : , 'Lot_Area' : 'Porch_Area' ])
original_y = 'Sale_Price'

# preparing x-data & y-data
x_data = housing[ x_original ]
y_data = housing[ original_y ]

lars_model(x_data, y_data, seed=1, tuning=True)

Best Parameters:  {'eps': 0.0001783, 'fit_intercept': True, 'normalize': True}
Number of Models Evaluated: 100
    
    R-Square (training set): 0.69
    R-Square (testing set): 0.76
    Train-Test Gap: 0.07
    AIC (training set): 50234.0
    AIC (testing set): 12400.0
    BIC (training set): 50286.0
    BIC (testing set): 12440.0
    
    Coefficients
    ------------
             Feature  Coefficient
0      intercept  -12393.1613
1       Lot_Area       0.0930
2   Mas_Vnr_Area      62.4633
3  Total_Bsmt_SF      48.5242
4   First_Flr_SF     100.1824
5  Second_Flr_SF     105.6911
6    Gr_Liv_Area     -44.9923
7    Garage_Area      93.4549
8     Porch_Area      35.8498


In [4]:
# response variable (y) is log_Sale_Price

# specifying the path and file name
file = 'housing_feature_rich.xlsx'

# reading the file into Python
housing = pd.read_excel(io     = file,
                        header = 0   )

# remove the 'property_id' column
housing.drop(labels  = ['property_id'], axis    = 1, inplace = True)

# setting columns
x_original = list(housing.loc[ : , 'Lot_Area' : 'Porch_Area' ])
original_y = 'log_Sale_Price'

# preparing x-data & y-data
x_data = housing[ x_original ]
y_data = housing[ original_y ]

lars_model(x_data, y_data, seed=1, tuning=True)

Best Parameters:  {'eps': 2.76e-05, 'fit_intercept': True, 'normalize': False}
Number of Models Evaluated: 100
    
    R-Square (training set): 0.67
    R-Square (testing set): 0.72
    Train-Test Gap: 0.05
    AIC (training set): -6822.0
    AIC (testing set): -1790.0
    BIC (training set): -6770.0
    BIC (testing set): -1751.0
    
    Coefficients
    ------------
             Feature  Coefficient
0      intercept      11.0219
1       Lot_Area      -0.0000
2   Mas_Vnr_Area       0.0001
3  Total_Bsmt_SF       0.0003
4   First_Flr_SF       0.0005
5  Second_Flr_SF       0.0006
6    Gr_Liv_Area      -0.0003
7    Garage_Area       0.0006
8     Porch_Area       0.0002
