Add not-in-place implementations for several dataset transforms #1883
Nice, thanks!
I think it would indeed be cool to deprecate the in-place functions, for consistency.
Also, I think dictionary_encode_column_ is in-place as well.
src/datasets/arrow_dataset.py
Outdated
"""
dataset = copy.deepcopy(self)
dataset._fingerprint = new_fingerprint
dataset.flatten_(max_depth=max_depth)
flatten_ already updates the fingerprint, so we're updating the fingerprint twice here.
We could simply copy-paste the code from flatten_. I think it's OK since we may deprecate flatten_ at some point, and the code is short and straightforward.
This also applies to the other transforms. Let me know what you think.
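A minimal sketch of the suggestion above, using a toy class rather than the real `datasets.Dataset`. The names `ToyDataset`, `_set_fingerprint` and `fingerprint_updates` are hypothetical and exist only to make the single fingerprint write visible; the point is that the not-in-place method repeats the short transform logic on a copy instead of delegating to the in-place method.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a dataset; not the real datasets.Dataset."""

    def __init__(self, columns, fingerprint="initial"):
        self.columns = dict(columns)
        self._fingerprint = fingerprint
        self.fingerprint_updates = 0  # counts fingerprint writes, for illustration only

    def _set_fingerprint(self, value):
        self._fingerprint = value
        self.fingerprint_updates += 1

    def rename_column_(self, old, new, new_fingerprint):
        """In-place variant: mutates self and updates the fingerprint."""
        self.columns[new] = self.columns.pop(old)
        self._set_fingerprint(new_fingerprint)

    def rename_column(self, old, new, new_fingerprint):
        """Not-in-place variant: the transform is inlined on a deep copy,
        so the fingerprint is written exactly once."""
        dataset = copy.deepcopy(self)
        dataset.columns[new] = dataset.columns.pop(old)
        dataset._set_fingerprint(new_fingerprint)
        return dataset
```

With this layout the original object is left untouched, and removing the in-place method later would not affect the not-in-place one.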
I also added a @deprecated decorator that emits a DeprecationWarning when calling the methods.
Nice thanks !
However, deprecation warnings are usually emitted only once. Maybe we can give an id parameter to the deprecated decorator, so that the second time a deprecated function is called we know the warning has already been emitted and keep it silent? We can use a dictionary to keep track of already emitted deprecation warnings, as in transformers.
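The idea could look roughly like the sketch below. This is not the code from this PR or from transformers; the registry name `_emitted_deprecation_warnings`, the `warning_id` parameter and the message text are assumptions made for illustration.

```python
import functools
import warnings

# Assumed module-level registry: ids for which a warning was already emitted.
_emitted_deprecation_warnings = {}


def deprecated(warning_id=None):
    """Decorator factory that warns only on the first call for a given id."""

    def decorator(func):
        # Fall back to the function's qualified name when no id is given.
        key = warning_id or func.__qualname__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not _emitted_deprecation_warnings.get(key):
                warnings.warn(
                    f"{func.__name__} is deprecated and will be removed in a future version.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                _emitted_deprecation_warnings[key] = True
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated()
def flatten_():
    # Placeholder body standing in for an in-place transform.
    return "flattened in place"
```

Keeping the registry outside the decorator means the "already warned" state does not depend on Python's warning filters, which may be configured to show every occurrence.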
Sounds good!
@lhoestq I am not sure how to test
I can take a look at dictionary_encode_column tomorrow.
Thanks for making the warning emit only once !
Warn only once and add a replaced_by arg Refactor Use logger from logging utils
47f3495 to 0a3b53a
That looks all good now !
I added some tests (especially about pickling) and they're all passing :)
Thank you so much !
Now let's update the documentation to use the new methods x)
Should we deprecate in-place versions of such methods?