Replace missing values with random values from dataset #715
Codecov Report
All modified lines are covered by tests ✅

@@            Coverage Diff            @@
##              main     #715   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           17       17
  Lines         1791     1799    +8
=========================================
+ Hits          1791     1799    +8
@@ -1353,8 +1379,9 @@ def test_random_seed(self):

         # Run
         ht.fit(data)
         transformed = ht.transform(data)
         reversed1 = ht.reverse_transform(transformed)
         ht.reset_randomization()
I think this should be solved before going any further: this should be called after `fit` on the HyperTransformer, as per #716.
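As a rough illustration of what the test above relies on, here is a minimal sketch with a toy class. `ToyRandomTransformer` and its seed handling are hypothetical and are not the HyperTransformer's implementation; the only point is that resetting the generator makes a later call repeat an earlier one.

```python
import numpy as np


class ToyRandomTransformer:
    """Hypothetical stand-in for a transformer that draws random values."""

    def __init__(self, seed=42):
        self._seed = seed
        self._rng = np.random.default_rng(seed)

    def reset_randomization(self):
        # Recreate the generator so the next draws start over from the seed.
        self._rng = np.random.default_rng(self._seed)

    def reverse_transform(self, size):
        # Stand-in for a reverse transform that needs random numbers.
        return self._rng.uniform(0.0, 1.0, size=size)


toy = ToyRandomTransformer()
first = toy.reverse_transform(3)
second = toy.reverse_transform(3)  # continues the random stream
toy.reset_randomization()
third = toy.reverse_transform(3)   # starts over, so it repeats `first`
```

Without the reset, consecutive calls keep consuming the same random stream, which is why where the reset happens relative to `fit` matters for reproducibility.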
Looks good!
@@ -73,7 +75,7 @@ def _get_missing_value_replacement(self, data):
         if self._missing_value_replacement is None:
             return None

-        if self._missing_value_replacement in {'mean', 'mode'} and pd.isna(data).all():
+        if self._missing_value_replacement in {'mean', 'mode', 'random'} and pd.isna(data).all():
Update the docstring to include `random` (l73, l21).
rdt/transformers/null.py
Outdated
        self._min_value = data.min()
        self._max_value = data.max()
Kind of an edge case, but what happens if all the values are NaN? I think the min and max would be set to NaN as well.
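A quick way to check the edge case raised here, using plain pandas rather than the transformer itself (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# A column in which every value is missing.
all_nan = pd.Series([np.nan, np.nan, np.nan])

# With the default skipna=True there is nothing left to compare,
# so both reductions return NaN.
min_value = all_nan.min()
max_value = all_nan.max()

both_nan = bool(pd.isna(min_value) and pd.isna(max_value))
```

So the stored bounds would indeed be NaN unless the all-NaN case is handled before they are used.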
If all values are NaN, we return 0:
if self._missing_value_replacement in {'mean', 'mode', 'random'} and pd.isna(data).all():
    msg = (
        f"'missing_value_replacement' cannot be set to '{self._missing_value_replacement}'"
        ' when the provided data only contains NaNs. Using 0 instead.'
    )
    LOGGER.info(msg)
    return 0
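For readers following the thread, a minimal standalone sketch of the behavior being discussed. It assumes that `'random'` means drawing uniformly between the observed minimum and maximum, which this thread suggests but does not show; `fill_missing_with_random` is a hypothetical helper and not part of RDT.

```python
import numpy as np
import pandas as pd


def fill_missing_with_random(data, seed=None):
    """Fill NaNs with uniform draws between the observed min and max."""
    if pd.isna(data).all():
        # Same fallback as the snippet above: nothing to sample from, use 0.
        return data.fillna(0)

    rng = np.random.default_rng(seed)
    min_value, max_value = data.min(), data.max()

    filled = data.copy()
    mask = filled.isna()
    # Draw one replacement per missing entry.
    filled[mask] = rng.uniform(min_value, max_value, size=int(mask.sum()))
    return filled


example = fill_missing_with_random(pd.Series([1.0, np.nan, 5.0, np.nan]), seed=0)
empty = fill_missing_with_random(pd.Series([np.nan, np.nan]))
```

Checking for the all-NaN case first means the NaN bounds never reach the sampling step.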
LGTM!
Resolves #606