Option for caching to fail silently in event of pickling failure #473

Closed
tyarkoni opened this issue Jan 1, 2017 · 3 comments

Comments

@tyarkoni

tyarkoni commented Jan 1, 2017

At the moment, caching relies on pickle, and consequently it's not possible to cache functions that contain lambdas, return generators, etc. I'm aware of the long-term goal to replace pickle with dill (e.g., #240), but in the interim, it would be nice to have an option to let caching fail silently rather than raising an exception. As far as I can tell, there's currently no way to ignore caching failures.

The context here is that I have a decorated function, embedded within a fairly complex pipeline, that usually returns serializable classes, but sometimes needs to return generators. In the latter case, pickling fails, and hence so does the entire pipeline. In such cases, it would be helpful if the cache decorator took an optional argument that suppressed exceptions generated by pickle--effectively doing no caching, but allowing the Memorized function to proceed normally. I understand that this kind of thing is not ideal, as it makes it difficult to know whether or not a particular result was cached--and one certainly wouldn't want this behavior as a default. But in my case (and I suspect in many other applications), the caching is more of a nice feature than a mission-critical one. I.e., I would rather 90% of my calls get cached properly, and have 10% see no benefit at all, than have to radically refactor my code to make sure that cached functions can't return generators.

As far as I can tell, this could be handled by (i) adding an argument like Memory.cache(...., fail_silently=False), (ii) catching pickling errors around the _persist_output call, and (iii) re-raising them conditionally. I'm happy to submit a PR if there's interest in something like this. Alternatively, if I've missed something obvious, any suggestions would be much appreciated.
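To make the three steps concrete, here is a minimal sketch using only the standard library. It is not joblib's implementation: `disk_cache` is a hypothetical toy stand-in for `Memory.cache`, the `pickle.dumps` call plays the role of the persistence step, and `fail_silently` is the proposed (not existing) argument. The set of exceptions treated as pickling failures is also an assumption.

```python
import functools
import hashlib
import os
import pickle
import tempfile

# Exceptions pickle commonly raises for unpicklable objects (assumed set):
# generators raise TypeError, some local objects raise AttributeError.
PICKLE_ERRORS = (pickle.PicklingError, TypeError, AttributeError)


def disk_cache(cache_dir, fail_silently=False):
    """Hypothetical toy disk cache; step (i) is the fail_silently argument."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a cache file path from the function name and arguments.
            key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            try:
                # Step (ii): catch pickling errors around the persistence step.
                payload = pickle.dumps(result)
            except PICKLE_ERRORS:
                # Step (iii): re-raise unless the caller opted in to silence.
                if not fail_silently:
                    raise
                return result  # not cached, but the call still succeeds
            with open(path, "wb") as fh:
                fh.write(payload)
            return result

        return wrapper

    return decorator


cache_dir = tempfile.mkdtemp()


@disk_cache(cache_dir, fail_silently=True)
def numbers(n, lazy=False):
    # Usually returns a picklable list, sometimes a generator.
    return (i for i in range(n)) if lazy else list(range(n))
```

With `fail_silently=True`, `numbers(3)` is written to disk and `numbers(3, lazy=True)` simply skips caching; with the default `fail_silently=False`, the generator case raises as it does today.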

@lesteve
Member

lesteve commented Jan 19, 2017

Sorry, I forgot to answer this one. This is certainly doable and your use case seems reasonable. Any strong opinions on this @ogrisel @GaelVaroquaux?

Other than that, a few possible names for the parameter, off the top of my head: ignore_pickling_error or ignore_result_persistence_error.

@ogrisel
Contributor

ogrisel commented Jan 19, 2017

I am not sure this use case is shared enough to justify this extension of the joblib public API.

I think in your case it should be quite easy to subclass the Memory class and override the cache method to wrap it with suitable try / except logic, no?
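A user-side workaround along these lines might look like the sketch below. It is written as a small helper rather than a subclass so that it does not depend on joblib internals: `tolerant_cache` is a hypothetical name, `memory` is assumed to be any object whose `cache(func)` method returns a callable (as `joblib.Memory` does), and the sketch assumes, as described in this issue, that the pickling error propagates out of the cached call.

```python
import functools
import pickle

# Exceptions treated as pickling failures (assumed set, adjust as needed).
PICKLE_ERRORS = (pickle.PicklingError, TypeError, AttributeError)


def tolerant_cache(memory, func):
    """Cache func through memory.cache, falling back to a plain call.

    Hypothetical helper, not part of joblib.
    """
    cached = memory.cache(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return cached(*args, **kwargs)
        except PICKLE_ERRORS:
            # The result could not be persisted; call the undecorated
            # function instead. Note that func runs a second time here.
            return func(*args, **kwargs)

    return wrapper
```

Usage would be something like `load = tolerant_cache(memory, load)` with an existing `Memory` instance. Two trade-offs are worth noting: the function body executes twice whenever persistence fails, and a `TypeError` or `AttributeError` raised by the function itself also triggers the fallback (it is then raised again by the second call).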

@tyarkoni
Author

Yes, I can certainly do that. I guess my feeling was that I'm probably not the only person to run into such a situation, and it would be good to have the option to fail gracefully, in general. But I can appreciate that my use case might be less common than I think, so I'll close the issue. Thanks for the response!
