Add not-in-place implementations for several dataset transforms #1883

Merged (18 commits) on Feb 24, 2021

Contributor

@SBrandeis SBrandeis commented Feb 15, 2021

Should we deprecate in-place versions of such methods?

Member

@lhoestq lhoestq left a comment

Nice, thanks!
I think it would indeed be cool to deprecate the in-place functions, for consistency.

Also, I think dictionary_encode_column_ is in-place as well.

"""
dataset = copy.deepcopy(self)
dataset._fingerprint = new_fingerprint
dataset.flatten_(max_depth=max_depth)
Member

flatten_ already updates the fingerprint, so we're updating the fingerprint twice here.
We could simply copy-paste the code from flatten_. I think that's OK since we may deprecate flatten_ at some point, and the code is short and straightforward.

This also applies to the other transforms. Let me know what you think.
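
To illustrate the point about the double fingerprint update, here is a minimal sketch on a toy class. `ToyDataset`, `_flatten_once`, `_set_fingerprint` and the `fingerprint_updates` counter are hypothetical names made up for this example; this is not the code from the PR nor the real `datasets.Dataset` API. The not-in-place `flatten` inlines the transform logic on a deep copy instead of delegating to `flatten_`, so the fingerprint is set exactly once.

```python
import copy


def _flatten_once(columns):
    """Flatten one level of nested dict columns (hypothetical helper)."""
    flat = {}
    for name, value in columns.items():
        if isinstance(value, dict):
            for sub_name, sub_value in value.items():
                flat[f"{name}.{sub_name}"] = sub_value
        else:
            flat[name] = value
    return flat


def _flatten(columns, max_depth):
    """Repeat the one-level flattening until nothing is nested or max_depth is hit."""
    for _ in range(max_depth):
        if not any(isinstance(v, dict) for v in columns.values()):
            break
        columns = _flatten_once(columns)
    return columns


class ToyDataset:
    """Toy stand-in for a dataset; NOT the real `datasets.Dataset`."""

    def __init__(self, columns, fingerprint="initial"):
        self.columns = dict(columns)
        self._fingerprint = fingerprint
        self.fingerprint_updates = 0  # only here to make double updates visible

    def _set_fingerprint(self, new_fingerprint):
        self._fingerprint = new_fingerprint
        self.fingerprint_updates += 1

    def flatten_(self, new_fingerprint, max_depth=16):
        """In-place version: mutates self and updates the fingerprint."""
        self.columns = _flatten(self.columns, max_depth)
        self._set_fingerprint(new_fingerprint)

    def flatten(self, new_fingerprint, max_depth=16):
        """Not-in-place version: the logic is inlined on a copy, so the
        fingerprint is updated once rather than once here and once in flatten_."""
        dataset = copy.deepcopy(self)
        dataset.columns = _flatten(dataset.columns, max_depth)
        dataset._set_fingerprint(new_fingerprint)
        return dataset
```

With this shape, the original object is left untouched and the returned copy records a single fingerprint update.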

Contributor Author

I also added a @deprecated decorator that emits a DeprecationWarning when the methods are called.

Member

Nice, thanks!
Deprecation warnings are usually emitted only once, though. Maybe we can give an id parameter to the deprecated decorator, so that the second time a deprecated function is called we know the warning has already been emitted and keep it silent? We can use a dictionary to keep track of already-emitted deprecation warnings, as in transformers.
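
A minimal sketch of the warn-once idea, under a few assumptions: the registry name `_emitted_deprecation_warnings` and the message wording are made up for illustration, and the id is derived from the function's qualified name rather than passed in explicitly. This is not necessarily what ended up in `deprecation_utils.py`.

```python
import functools
import warnings

# Hypothetical module-level registry: warning id -> already emitted?
_emitted_deprecation_warnings = {}


def deprecated(replaced_by=None):
    """Decorator factory that emits a DeprecationWarning only on the first call."""

    def decorator(func):
        # Assumption: the function's qualified name serves as the warning id.
        warning_id = f"{func.__module__}.{func.__qualname__}"
        message = f"{func.__qualname__} is deprecated."
        if replaced_by is not None:
            message += f" Use {replaced_by} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not _emitted_deprecation_warnings.get(warning_id):
                # stacklevel=2 attributes the warning to the caller's line
                warnings.warn(message, category=DeprecationWarning, stacklevel=2)
                _emitted_deprecation_warnings[warning_id] = True
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(replaced_by="flatten")
def flatten_(value):
    """Toy deprecated function used only to exercise the decorator."""
    return value
```

Keeping the registry in the decorator's own module means the "warn once" behaviour does not depend on the user's warning filters, which may be set to show every occurrence.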

Contributor Author

Sounds good!

@SBrandeis
Contributor Author

@lhoestq I am not sure how to test dictionary_encode_column (the in-place version was not tested before).

Member

lhoestq commented Feb 18, 2021

I can take a look at dictionary_encode_column tomorrow.
It's likely that it doesn't work, though. It was added at the beginning of the lib and has never been tested or used, afaik.

Member

@lhoestq lhoestq left a comment

Thanks for making the warning emit only once !

src/datasets/utils/deprecation_utils.py: 3 outdated review threads (resolved)
Warn only once and add a replaced_by arg

Refactor

Use logger from logging utils
Member

@lhoestq lhoestq left a comment

That all looks good now!
I added some tests (especially about pickling) and they're all passing :)

Thank you so much!

@lhoestq lhoestq merged commit 7072e1b into huggingface:master Feb 24, 2021
Member

lhoestq commented Feb 24, 2021

Now let's update the documentation to use the new methods x)


2 participants