Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DatetimeEncoder fixes #743

Merged
merged 8 commits into from
Oct 2, 2023

Conversation

LeoGrin
Copy link
Contributor

@LeoGrin LeoGrin commented Sep 20, 2023

@jeromedockes

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LeoGrin for this bug fix. This might help fix #741 afterward.
Overall, I'm concerned with the quality of this estimator as we have other bugs like #746. This means that we didn't test it enough.

@LeoGrin LeoGrin changed the title Remove millisecond extraction in DatetimeEncoder DatetimeEncoder fixes Sep 22, 2023
@Vincent-Maladiere
Copy link
Member

Hey @LeoGrin, I'd advocate not to change the scope of PRs once started because it's confusing to review. To fix new issues, let's open new PRs instead :)

@Vincent-Maladiere Vincent-Maladiere dismissed their stale review September 27, 2023 09:13

Dismissing my approval since the scope of the PR has changed.

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning the total number of seconds since Epoch when self.extract_until=None is misleading for the user.
I'd rather have self.extract_until=None as a no-op (not returning anything), and have an additional keyword init parameter like self.seconds_since_epoch.
What do you both think?

Otherwise, I haven't specific remarks on the PR itself, but I plan to refactorize a lot of the DatetimeEncoder implementation that I'm not super happy with in a subsequent PR.

@jeromedockes
Copy link
Contributor

jeromedockes commented Sep 28, 2023 via email

@LeoGrin
Copy link
Contributor Author

LeoGrin commented Sep 28, 2023

I think returning the total number of seconds since Epoch when self.extract_until=None is misleading for the user.
I'd rather have self.extract_until=None as a no-op (not returning anything), and have an additional keyword init parameter like self.seconds_since_epoch.
What do you both think?

Yes, I think that's a good idea. Thinking about it a bit more, it would better for the shape of the output to not depend on the data, for instance if we want to use this inside a grid search, so it may be better to get rid of the logic we have now (check if columns are constant).

@jeromedockes
Copy link
Contributor

jeromedockes commented Sep 28, 2023 via email

@Vincent-Maladiere
Copy link
Member

I guess the idea was to avoid adding eg an hour column full of 0 when
the column we are transforming contains dates not times. But for that it
may be better to rely on the date format detected when parsing the dates
than to check the variance a posterior

Alternatively, we could let the user switch between these two behaviors with an additional drop_if_constant keyword argument.

I think that's a good idea. Dropping constant columns can be done later
in the pipeline with a VarianceThreshold transformer from scikit-learn.

We definitely need to add the VarianceThreshold estimator to the example!

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the behavior of extract_until=None to be a no-op and add an extra add_total_seconds_since_epoch or add_seconds_since_epoch keyword parameter (default True) as discussed :)

@@ -137,6 +137,10 @@ Minor changes
which provides some more information about the job title.
:pr:`581` by :user:`Lilian Boulard <LilianBoulard>`

* Fix bugs which was triggered when `extract_until` was "year", "month", "microseconds"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Fix bugs which was triggered when `extract_until` was "year", "month", "microseconds"
* Fix bugs which where triggered when `extract_until` was "year", "month", "microseconds"

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge so that we can do the refactoring soon.

@Vincent-Maladiere Vincent-Maladiere merged commit 2c70ae5 into skrub-data:main Oct 2, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants