
Key Error when using bidirectional feature selector. #842

Closed
joeanton719 opened this issue Aug 21, 2021 · 2 comments
@joeanton719

Describe the bug

I am trying to select the best features for a regression problem using bidirectional feature selection (mlxtend's SequentialFeatureSelector with forward=True and floating=True). When I run the code below, I get the error shown under Actual Results.

Steps/Code to Reproduce


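%%time
from catboost import CatBoostRegressor
# The encoders take variables=, so they are assumed to be feature_engine's, not scikit-learn's.
from feature_engine.encoding import MeanEncoder, OrdinalEncoder, RareLabelEncoder
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.pipeline import make_pipeline

# high_card_cols, cat_cols, seed, X_train and y_train are defined earlier in the notebook.
cat_full = make_pipeline(
    RareLabelEncoder(0.002, variables=['brand', 'model']),  # group labels rarer than 0.2%
    MeanEncoder(variables=high_card_cols),                  # replace categories with the target mean
    OrdinalEncoder(variables=cat_cols),                     # integer-encode the remaining categoricals
    CatBoostRegressor(learning_rate=0.1, depth=6, random_seed=seed, silent=True),
)

# Floating forward selection: features can be added and later removed again.
bi = SFS(cat_full,
         k_features="best",
         scoring='neg_root_mean_squared_error',
         forward=True,
         floating=True,
         cv=10)

bi.fit(X_train, y_train)

print()
print(f'Best RMSE: {bi.k_score_*-1:.3f}')  # the score is negated RMSE, so flip the sign

bi.k_feature_names_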

Expected Results

Actual Results

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<timed exec> in <module>

~\anaconda3\lib\site-packages\mlxtend\feature_selection\sequential_feature_selector.py in fit(self, X, y, custom_feature_names, groups, **fit_params)
    566                     best_subset = k
    567             k_score = max_score
--> 568             k_idx = self.subsets_[best_subset]['feature_idx']
    569 
    570             if self.k_features == 'parsimonious':

KeyError: None
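The traceback stops at line 568, where self.subsets_[best_subset] is looked up while best_subset is still None. A plausible reading, not confirmed in this thread, is that every candidate subset ended up with a NaN average score during cross-validation (for example, because each pipeline fit failed), so no score ever compared greater than the starting value of -inf. The sketch below is a simplified, hypothetical re-creation of that kind of selection loop, not mlxtend's actual source; the function name pick_best_subset and the toy scores are made up for illustration.

```python
def pick_best_subset(subsets):
    """Hypothetical stand-in for a 'best subset' loop: keep the key
    whose avg_score is highest, starting from -inf and None."""
    best_subset = None
    max_score = float("-inf")
    for k, info in subsets.items():
        # Any comparison involving NaN is False, so a NaN score never wins.
        if info["avg_score"] > max_score:
            max_score = info["avg_score"]
            best_subset = k
    return best_subset, max_score


# Assumed failure mode: every candidate scored NaN during cross-validation.
failed = {1: {"avg_score": float("nan")}, 2: {"avg_score": float("nan")}}
best, score = pick_best_subset(failed)
print(best)  # None, so failed[best] would raise KeyError: None

# With ordinary (negated RMSE) scores, the loop picks the highest one.
healthy = {1: {"avg_score": -3.0}, 2: {"avg_score": -2.5}}
print(pick_best_subset(healthy))  # (2, -2.5)
```

If this reading is right, the KeyError is a symptom: the underlying fit errors are hidden because failed folds are recorded as NaN instead of being raised.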

Versions

MLxtend 0.18.0
Windows-10-10.0.19041-SP0
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Scikit-learn 0.24.2
NumPy 1.19.5
SciPy 1.5.2

@joeanton719
Author

I found a workaround. I think using feature-engineering transformers inside the model pipeline was causing the problem. To solve it, I created a separate dataset, applied the feature-engineering transformations to it, and used that dataset as the SFS input.

@rasbt
Owner

rasbt commented Aug 22, 2021

Glad you solved the issue and found the root cause. If there's still an issue with the SFS, please don't hesitate to reopen.
