
Fixed #971 turned off joblib when n_jobs == 1 #985

Merged
merged 2 commits into rasbt:master on Nov 12, 2022

Conversation

NimaSarajpoor
Contributor

This PR fixes issue #971

Performance Code

import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

seed = 0
np.random.seed(seed)
X = np.random.rand(10000, 10)  # 10k samples, with 10 features
y = np.random.choice([0, 1], size=10000)

lst = []
for i in range(5):
    tic = time.time()
    efs = EFS(RandomForestClassifier()).fit(X, y)  # EFS: ExhaustiveFeatureSelector
    toc = time.time()
    lst.append(toc - tic)

np.mean(lst)  # average wall-clock time over the 5 runs

Computing Time

  • branch main: 103 sec
  • this branch: 93 sec
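
For context, a minimal sketch of the idea behind the change (illustrative names only, not the actual mlxtend internals): when n_jobs == 1, the candidate feature subsets are scored in a plain Python loop instead of going through joblib, which avoids the dispatch/serialization overhead and makes each result available as soon as it is computed.

from joblib import Parallel, delayed

def _evaluate_candidates(candidates, score_fn, n_jobs=1):
    # Illustrative helper: score each candidate feature subset.
    if n_jobs == 1:
        # Plain loop: no joblib dispatch overhead, results arrive one by one.
        return [score_fn(c) for c in candidates]
    # Parallel path: hand the work to joblib as before.
    return Parallel(n_jobs=n_jobs)(delayed(score_fn)(c) for c in candidates)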

@codecov

codecov bot commented Nov 8, 2022

Codecov Report

Base: 77.43% // Head: 77.43% // No change to project coverage 👍

Coverage data is based on head (7599ebf) compared to base (423d217).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 7599ebf differs from pull request most recent head e912885. Consider uploading reports for the commit e912885 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #985   +/-   ##
=======================================
  Coverage   77.43%   77.43%           
=======================================
  Files         198      198           
  Lines       11165    11165           
  Branches     1406     1406           
=======================================
  Hits         8646     8646           
  Misses       2305     2305           
  Partials      214      214           
Impacted Files         Coverage Δ
mlxtend/__init__.py    100.00% <100.00%> (ø)



@NimaSarajpoor
Contributor Author

I will fix this in the coming days.

@NimaSarajpoor
Contributor Author

@rasbt
I think it is ready. If there is something I missed, please let me know.

@rasbt rasbt left a comment
Owner

Thanks a lot, that looks great! Neat & clean solution!

@rasbt
Owner

rasbt commented Nov 12, 2022

Was just testing the code and it definitely improved the startup time. When I am trying an example like

import numpy as np
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

seed = 0
np.random.seed(seed)
X = np.random.rand(10000, 10)  # 10k samples, with 10 features
y = np.random.choice([0, 1], size=10000)

model = LogisticRegression()

efs1 = EFS(model, 
           min_features=1,
           max_features=10,
           scoring='accuracy',
           print_progress=True,
           n_jobs=1,
           cv=5)

efs1 = efs1.fit(X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset (indices):', efs1.best_idx_)
print('Best subset (corresponding names):', efs1.best_feature_names_)

it still seems to be a bit stuck, though. I.e., it does not show any output for about 2-3 minutes and then iterates through the ~1k possibilities in about 1 second.

I wonder if that's an issue with the verbose display functionality though 🤔

EDIT: No worries, it was a computer issue. It works perfectly now. Actually, it solves the problem. Before, a user could not see the progress printed to the command line until all combinations were evaluated. Now, you get the feedback immediately if n_jobs==1.
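
To illustrate the progress point (a sketch with toy stand-ins, not the actual selector code): a plain loop can report after every iteration, whereas Parallel(...) only returns once all dispatched tasks have finished, so any feedback derived from its results shows up in a single burst at the end.

import time
from joblib import Parallel, delayed

def score(subset):
    time.sleep(0.5)  # stand-in for fitting/cross-validating one feature subset
    return len(subset)

subsets = [(0,), (0, 1), (0, 1, 2)]

# n_jobs == 1 path: one progress line roughly every 0.5 s
for s in subsets:
    print("done:", s, score(s))

# joblib path: nothing is printed until Parallel(...) has returned,
# so all three lines appear at once
results = Parallel(n_jobs=2)(delayed(score)(s) for s in subsets)
for s, res in zip(subsets, results):
    print("done:", s, res)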

@rasbt rasbt merged commit 55359c7 into rasbt:master Nov 12, 2022
@NimaSarajpoor
Contributor Author

EDIT: No worries, it was a computer issue. It works perfectly now. Actually, it solves the problem. Before, a user could not see the progress printed to the command line until all combinations were evaluated. Now, you get the feedback immediately if n_jobs==1.

Thanks for the info :)
