Joblib makes callstacks impossible to decipher #12322

Open
amueller opened this issue Oct 8, 2018 · 6 comments
Comments

@amueller
Member

amueller commented Oct 8, 2018

So I'm trying to work with a script of mine which is a pipeline containing a ColumnTransformer in cross-validation. I'm trying to do some profiling, but each of these three things calls into joblib, which adds 3-4 layers each to the stack. Similarly if I want to do interactive debugging, this is pretty hard to manage.

This is for n_jobs=None. I'm not sure if n_jobs=1 would make it better?

I suggest we either hard-code a separate path for n_jobs=1 into scikit-learn, or there could be a shortcut in joblib with a single level on the callstack.
Either way, right now this is very difficult to work with.
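For illustration, the "separate path for n_jobs=1" idea could look roughly like the sketch below. It is a minimal, hypothetical dispatcher: `run_tasks` is not a scikit-learn or joblib function, and the standard library's `ThreadPoolExecutor` is used as a stand-in for the parallel path rather than joblib itself.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, n_jobs=None):
    """Hypothetical dispatcher with a hard-coded sequential fast path.

    `tasks` is an iterable of (func, args) pairs. This is an assumption made
    for the sketch, not how scikit-learn or joblib represent tasks.
    """
    results = []
    if n_jobs == 1:
        # Fast path: call each function directly, so only this one
        # frame sits between the caller and the task on the call stack.
        for func, args in tasks:
            results.append(func(*args))
        return results

    # General path: hand the tasks to a thread pool (stand-in for joblib).
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(func, *args) for func, args in tasks]
        for future in futures:
            results.append(future.result())
    return results


tasks = [(pow, (2, 3)), (pow, (3, 2))]
print(run_tasks(tasks, n_jobs=1))  # [8, 9]
print(run_tasks(tasks, n_jobs=2))  # [8, 9]
```

Both branches have to keep returning the same results in the same order, which is the maintenance cost of having two paths.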

@rth
Member

rth commented Oct 8, 2018

> This is for n_jobs=None. I'm not sure if n_jobs=1 would make it better?
> I suggest we either hard-code a separate path for n_jobs=1 into scikit-learn,

It probably would, since one would then not have to call the functions that determine the effective n_jobs in a given context. However, I have a hard time seeing how we could hard-code n_jobs=1 and at the same time support the use cases of n_jobs=None with context managers.

> or there could be a shortcut in joblib with a single level on the callstack.

Maybe it could be worth opening an issue about it at joblib?

Another imperfect solution could be to make joblib 0.11, which did have a fast path for n_jobs=1 if I understand correctly, compatible with scikit-learn 0.20, and then to use a site joblib 0.11. See joblib/joblib#786.

@GaelVaroquaux
Member

It's fun, because many years ago I wrote joblib because I kept having such pairs of paths in my code, and it turned out that it was a continuous source of bugs.

I would advise avoiding that, and trying to find a fix in joblib.

@ogrisel
Member

ogrisel commented Oct 9, 2018

n_jobs=1 should behave exactly as n_jobs=None by default. We could probably make some effort to flatten the call trace when the SequentialBackend is active, though (which is the case by default).

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 9, 2018 via email

@amueller
Member Author

amueller commented Oct 9, 2018

I'm not sure there's a way around the nesting, though?

So with the call to delayed there will always be at least 2 levels, right?
So if I have cross-validation and pipeline and column transformer I will have 6 levels from Parallel, and probably two more levels from joblib.memory, and then some sklearn indirections (_fit_transform_one etc)...

hrm...
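To make that arithmetic concrete, here is a toy sketch in plain Python. `ToyParallel` and `toy_delayed` are hypothetical stand-ins that assume each parallel layer adds exactly two frames; they are not joblib's implementation, and the real frame counts will differ.

```python
import inspect


def stack_depth():
    """Count the frames currently on the call stack."""
    frame = inspect.currentframe()
    depth = 0
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def toy_delayed(func):
    """Hypothetical delayed-style wrapper: capture a call instead of running it."""
    def capture(*args, **kwargs):
        return func, args, kwargs
    return capture


class ToyParallel:
    """Hypothetical sequential runner that adds two frames per task."""

    def __call__(self, tasks):  # extra frame 1
        results = []
        for task in tasks:
            results.append(self._run_one(task))
        return results

    def _run_one(self, task):  # extra frame 2
        func, args, kwargs = task
        return func(*args, **kwargs)


def leaf():
    return stack_depth()


# Three nested layers, each dispatching through the toy runner.
def column_transformer_layer():
    return ToyParallel()([toy_delayed(leaf)()])[0]


def pipeline_layer():
    return ToyParallel()([toy_delayed(column_transformer_layer)()])[0]


def cross_validation_layer():
    return ToyParallel()([toy_delayed(pipeline_layer)()])[0]


# The same three layers with plain function calls, for comparison.
def direct_column_transformer():
    return leaf()


def direct_pipeline():
    return direct_column_transformer()


def direct_cross_validation():
    return direct_pipeline()


extra_frames = cross_validation_layer() - direct_cross_validation()
print(extra_frames)  # 6: three layers times two frames each
```

Under these assumptions the overhead grows linearly with the nesting depth, so any extra indirection per layer is multiplied by the number of nested estimators.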

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 9, 2018 via email


5 participants