[MRG+1] MissingIndicator transformer #8075
MissingIndicator transformer for the missing values indicator mask.
What does this implement/fix? Explain your changes.
The current implementation returns an indicator mask for the missing values.
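For readers new to the idea, here is a minimal sketch of what an indicator mask looks like. It uses plain NumPy; `missing_mask` is a hypothetical helper written for illustration, not the code in this PR.

```python
import numpy as np


def missing_mask(X, missing_values=np.nan):
    """Return a boolean array that is True where X holds the missing marker.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    X = np.asarray(X, dtype=float)
    # NaN never compares equal to itself, so it needs np.isnan.
    if isinstance(missing_values, float) and np.isnan(missing_values):
        return np.isnan(X)
    return X == missing_values


X = np.array([[1.0, np.nan],
              [np.nan, 4.0]])
mask = missing_mask(X)
```

The NaN branch is kept separate because an equality test against NaN is always False.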
Any other comments?
This is a very early attempt, and no tests are present yet. Please have a look and give suggestions on the design. Thanks!
Hi, I am not entirely aware of the
Thanks. This is heading the right direction.
Once it's looking good, we'll talk about integrating it into Imputer, and adding a summary feature which indicates the presence of any missing values in a row.
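One way to picture the per-row summary feature mentioned here is to reduce the mask along the feature axis. This is only a sketch under that assumption; the variable names are made up for the example.

```python
import numpy as np

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0]])

# Element-wise indicator of missing entries.
mask = np.isnan(X)

# Summary feature: True for every row that has at least one missing value.
row_has_missing = mask.any(axis=1)
```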
Also, it might be a good idea to add something to the narrative documentation at this point, explaining the motivation for such features, and briefly describing the operation.
It would be good to add an example here in the docstring too.
Perhaps you should add a task list to the PR description
Yes, I guess so.…
On 28 December 2016 at 15:51, Maniteja Nandana wrote: Thanks @jnothman, just one more clarification. In case of sparse matrix and missing values = 0 currently a dense matrix is returned. Should it be the same even when sparse='auto'?
Yes, the return type should be a numpy array, except where a sparse matrix is appropriate. In that case the returned format should be specified in the docstring, but it need not be the same as the input type.
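The rule discussed here could be sketched roughly as follows, assuming SciPy is available. `indicator_mask` is a hypothetical function, and choosing CSC as the returned sparse format is an assumption for the example, not something decided in this thread.

```python
import numpy as np
from scipy import sparse


def indicator_mask(X, missing_values):
    """Hypothetical sketch: dense mask by default, CSC mask when it stays sparse."""
    is_nan_marker = bool(np.isnan(missing_values))
    if sparse.issparse(X) and missing_values != 0:
        # Only explicitly stored entries can equal a non-zero marker,
        # so the mask can reuse the sparsity structure of X.
        X = X.tocsc()
        data_mask = np.isnan(X.data) if is_nan_marker else X.data == missing_values
        mask = sparse.csc_matrix(
            (data_mask, X.indices.copy(), X.indptr.copy()),
            shape=X.shape, dtype=bool)
        mask.eliminate_zeros()  # drop stored False entries
        return mask
    if sparse.issparse(X):
        # missing_values == 0: every implicit zero is "missing",
        # so the mask would be mostly True and is returned dense.
        X = X.toarray()
    X = np.asarray(X, dtype=float)
    return np.isnan(X) if is_nan_marker else X == missing_values


X_sp = sparse.csc_matrix(np.array([[1.0, 0.0], [-1.0, 3.0]]))
sparse_out = indicator_mask(X_sp, missing_values=-1)
dense_out = indicator_mask(X_sp, missing_values=0)
```

With a non-zero marker the result stays sparse; with a marker of 0 it comes back as a plain array, matching the dense behaviour described in the question.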
check_array will convert to acceptable types.
I'll have a look over the transformation and tests once you're happy you know how to determine handling of transform output types etc.
Non zero missing values
Zero missing values*
Hi @jnothman, sorry for the delay. Could you look at the above return types and let me know if they work? Thanks.
@@            Coverage Diff             @@
##           master    #8075      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%
==========================================
  Files         342      342
  Lines       60987    61096     +109
==========================================
+ Hits        58233    58339     +106
- Misses       2754     2757       +3
Please implement fit_transform to avoid duplicating the work.
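A sketch of what this request might mean in practice: the default fit-then-transform path computes the mask twice, whereas a dedicated `fit_transform` can reuse the mask computed during fitting. The class below is a simplified, hypothetical stand-in (NumPy only, dense input only), not the estimator in this PR; `n_mask_calls_` exists only to make the saving visible.

```python
import numpy as np


class SketchMissingIndicator:
    """Hypothetical, simplified stand-in used to illustrate fit_transform."""

    def __init__(self, missing_values=np.nan):
        self.missing_values = missing_values
        self.n_mask_calls_ = 0  # demonstration-only counter

    def _get_mask(self, X):
        self.n_mask_calls_ += 1
        X = np.asarray(X, dtype=float)
        if np.isnan(self.missing_values):
            return np.isnan(X)
        return X == self.missing_values

    def _fit(self, X):
        # Compute the mask once and remember which features had missing values.
        mask = self._get_mask(X)
        self.features_ = np.flatnonzero(mask.any(axis=0))
        return mask

    def fit(self, X, y=None):
        self._fit(X)
        return self

    def transform(self, X):
        return self._get_mask(X)[:, self.features_]

    def fit_transform(self, X, y=None):
        # Reuse the mask from fitting instead of recomputing it in transform.
        mask = self._fit(X)
        return mask[:, self.features_]


X = [[1.0, np.nan, 3.0],
     [np.nan, 5.0, 6.0]]

one_pass = SketchMissingIndicator()
out = one_pass.fit_transform(X)

two_pass = SketchMissingIndicator()
out_two_pass = two_pass.fit(X).transform(X)
```

Both paths give the same output, but `one_pass` builds the mask once while `two_pass` builds it twice.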
On 16 July 2018 at 10:05, Alexandre Gramfort wrote: @maniteja123 can you rebase now that #11391 is merged?
-- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/