Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dill instead of pickle for processing .pkl files #354

Merged
merged 18 commits into from
Apr 23, 2024

Conversation

maronuu
Copy link
Contributor

@maronuu maronuu commented Feb 29, 2024

Introduce dill library as a serializer instead of pickle for all .pkl files.

gokart has its own file processors for various file formats. For .pkl files, we have used standard pickle library. However, it cannot handle a class or function whose metadata is dynamically determined when initialization.

For example, the following code is a class that update its own method run when initialization by using wrapper plus1. pickle library cannot handle such cases. Thus we introduce dill, which is built on pickle and can handle more various objects.

def plus1(func: Callable[[], int]) -> Callable[[], int]:
    @functools.wraps(func)
    def wrapped() -> int:
        ret = func()
        return ret + 1
    
    return wrapped

class A:
    run: Callable[[], int]
    
    def __init__(self) -> None:
        self.run = plus1(self.run)
    
    def run(self) -> int:
        return 1

cloudpickle is also another potential candidate, but in terms of longer history and more users, we adopt dill. Note that objects that can be serialized by pickle are also serialized by dill (https://dill.readthedocs.io/en/latest/#basic-usage ).

Compatibility

dill is a drop-in replacement for pickle. Existing code can be updated to allow complete pickling using:

As mentioned in doc, objects that can be serialized by pickle are serialized by dill. Additionally, we confirm the objects dumped by pickle are loaded via dill.load.

For the storage size, we confirm that the sizes of objects serialized by pickle or dill are the same.

@maronuu maronuu changed the title [Draft] use dill instead of pickle for file processor [Draft] Use dill instead of pickle for processing .pkl files Feb 29, 2024
@maronuu maronuu marked this pull request as ready for review February 29, 2024 06:38
@maronuu maronuu changed the title [Draft] Use dill instead of pickle for processing .pkl files Use dill instead of pickle for processing .pkl files Feb 29, 2024
@maronuu maronuu changed the title Use dill instead of pickle for processing .pkl files [Draft] Use dill instead of pickle for processing .pkl files Feb 29, 2024
@maronuu maronuu marked this pull request as draft February 29, 2024 06:39
@maronuu maronuu changed the title [Draft] Use dill instead of pickle for processing .pkl files Use dill instead of pickle for processing .pkl files Feb 29, 2024
@maronuu maronuu marked this pull request as ready for review February 29, 2024 06:44
@kitagry
Copy link
Member

kitagry commented Feb 29, 2024

@maronuu Thank you for the suggestion!

I have some questions.

  • How much the storage usage will increase?
  • Can dill load the object dumped by pickle library? I think it is very important for compatibility

@maronuu
Copy link
Contributor Author

maronuu commented Mar 12, 2024

@kitagry Thank you for the comment! I added some notes about the storage usage and compatibility in the PR description.

Copy link
Member

@kitagry kitagry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

poetry.lock Outdated Show resolved Hide resolved
Copy link
Collaborator

@hirosassa hirosassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@kitagry
Copy link
Member

kitagry commented Apr 23, 2024

Thank you!

@kitagry kitagry merged commit 7a00e34 into m3dev:master Apr 23, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants