Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 set mappings_provided to list type so object is picklable #258

Conversation

davidhopkinson26
Copy link
Contributor

fix for issue #257

Changes mappings_provided attribute to a list type rather than dict.keys() as the latter causes an error when pickling the transformer.

I have added tests that object is serialisable both to base_tests.py and the to test_DateTimeInfoExtractor.py (as this module has not yet been updated to the new testing framework)

These currently use a slightly hacky clean up using os.remove and a try except block to avoid persisting pickle files. I tried using the pytest tmpdir fixture instead but it didn't play well with joblib and kept giving me unicode errors when calling joblib.dump:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Another approach to this test would be to directly check the dtypes of each of the transformers attributes to check they are serialisable objects i.e. one of number/ list/ string/ dict which i think should catch the issue too. Please mention in review if you think this would be better.

At on point i had the test loading the pickle and comparing to original object, however i removed this as comparing transformer objects with eq doesn't work and also because it was straying into unit testing joblib! If we really wanted to do this we would need to implement something to iterate over attributes and compare one by one.

Copy link

@Decima2014 Decima2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the general approach of not testing the dtype of every single attribute here and comparing transformers between joblib dump and load. Would like to play around a bit more with using tmp path but if that doesnt work either then happy to move on

Thank you for picking up these changes

tests/base_tests.py Outdated Show resolved Hide resolved
CHANGELOG.rst Show resolved Hide resolved
CHANGELOG.rst Show resolved Hide resolved
tests/dates/test_DateTimeInfoExtractor.py Outdated Show resolved Hide resolved
@Decima2014
Copy link

Happy with the changes, have approved, thanks again for picking this up :)

@davidhopkinson26 davidhopkinson26 merged commit 5546831 into main Jul 1, 2024
12 checks passed
@davidhopkinson26 davidhopkinson26 deleted the 257-bug-datetimeinfoextractor-prevents-pipeline-from-being-saved-using-joblib-or-pickle branch July 1, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: DatetimeInfoExtractor prevents pipeline from being saved using joblib or pickle
2 participants