[FEA] Expose isnull/isnan #976

Closed
quasiben opened this issue Feb 18, 2019 · 6 comments · Fixed by #1508
Labels
cuDF (Python) Affects Python cuDF API. feature request New feature or request

Comments

@quasiben
Member

Data is often missing and messy. It would be nice to expose isna and isnull methods on series objects. These methods are often used during filter operations like those below:

cdf = cudf.read_csv(...)
cdf[cdf.isna()]

cdf[cdf.isnull()]
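
For reference, the pandas behavior this request targets can be sketched roughly as below. It uses pandas as a stand-in, on the assumption that a cuDF series would mirror it once the methods exist; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value.
ser = pd.Series([1.0, np.nan, 3.0])

# In pandas, isna() and isnull() are aliases; both return a boolean mask.
mask = ser.isna()

# Boolean indexing with the mask keeps only the rows with missing values.
missing_rows = ser[mask]
```
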
@quasiben quasiben added Needs Triage Need team to review and classify feature request New feature or request labels Feb 18, 2019
@quasiben quasiben changed the title [FEA] [FEA] Expose Isnull/isnan Feb 18, 2019
@quasiben quasiben changed the title [FEA] Expose Isnull/isnan [FEA] Expose isnull/isnan Feb 18, 2019
@kkraus14 kkraus14 added cuDF (Python) Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Feb 19, 2019
@kkraus14 kkraus14 added this to Issue-Needs prioritizing in v0.6 Release via automation Feb 19, 2019
@kkraus14 kkraus14 moved this from Issue-Needs prioritizing to Issue-P1 in v0.6 Release Feb 20, 2019
@kkraus14
Collaborator

kkraus14 commented Mar 7, 2019

@dillon-cullinan You should be able to tackle this as part of #1126 since I think you already generate the boolean mask.

@kkraus14 kkraus14 added this to Needs prioritizing in Feature Planning via automation Mar 8, 2019
@kkraus14 kkraus14 removed this from Issue-P1 in v0.6 Release Mar 8, 2019
@kkraus14 kkraus14 added this to Issue-Needs prioritizing in v0.7 Release via automation Mar 8, 2019
@randerzander randerzander moved this from Issue-Needs prioritizing to Issue-P1 in v0.7 Release Mar 11, 2019
@randerzander randerzander moved this from Issue-P1 to Issue-P0 in v0.7 Release Mar 15, 2019
@randerzander randerzander moved this from Issue-P0 to Issue-P1 in v0.7 Release Apr 2, 2019
@beckernick
Member

It would also be good (for downstream usage and pandas API compatibility) to expose this functionality as top-level functions. In pandas, isna and isnull can be accessed via pd.isna and pd.isnull.
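
As a quick illustration of the pandas side of that compatibility, the sketch below compares the top-level functions with the series methods. It only shows pandas; whether cuDF would offer matching top-level functions is exactly what is being requested, so nothing here should be read as existing cuDF API.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value.
ser = pd.Series([1.0, np.nan, 3.0])

# The top-level function returns the same boolean mask as the method.
top_level_mask = pd.isna(ser)
method_mask = ser.isna()

# Top-level functions also accept scalars, which a Series method cannot cover.
scalar_result = pd.isnull(None)
```

Downstream libraries tend to call the top-level form because it works without knowing whether the input is a series, a frame, or a scalar.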

@beckernick
Member

Lack of the isnull method is now partially blocking a dask change to permit cumulative aggregation operations in dask-cudf.

@randerzander
Contributor

@dillon-cullinan are you able to tackle isnull/isna as part of #1126?

@beckernick can you confirm if we need to add special logic for handling strings too?

@beckernick
Member

beckernick commented Apr 24, 2019

In terms of blocking cumulative aggregations: since the cumulative aggregations only operate on numeric columns, these methods could be implemented at the series level and raise NotImplementedError (for now) if the column is a StringColumn.
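
A minimal sketch of that idea, in plain Python. The `Column`, `NumericColumn`, `StringColumn`, and `Series` classes here are hypothetical stand-ins written for this example, not the actual cuDF classes, and `None` is used as an assumed marker for a null value.

```python
class Column:
    """Hypothetical column holding Python values; None marks a null."""

    def __init__(self, values):
        self.values = list(values)


class NumericColumn(Column):
    """Hypothetical stand-in for a numeric column."""


class StringColumn(Column):
    """Hypothetical stand-in for a string column."""


class Series:
    """Hypothetical series wrapping a single column."""

    def __init__(self, column):
        self._column = column

    def isnull(self):
        # Defer string support: fail loudly rather than return a wrong mask.
        if isinstance(self._column, StringColumn):
            raise NotImplementedError(
                "isnull is not yet supported for string columns"
            )
        return [value is None for value in self._column.values]


numeric_mask = Series(NumericColumn([1, None, 3])).isnull()
```
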

In general, I think whether we need different logic than the numba kernel work in #1126 depends on whether the StringColumn null mask behaves the same way as a cudf::column. If it does, I think we're fine.
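
To make the null mask question concrete, here is a rough sketch of deriving an isnull result from a packed validity bitmask. It assumes an Arrow-style layout (one bit per row, least-significant bit first, a set bit meaning "valid"); the function name `isnull_from_validity` is made up for this example. If both column types expose a mask in the same layout, one routine like this could in principle serve both.

```python
def isnull_from_validity(bitmask: bytes, size: int) -> list:
    """Return a list of booleans, True where the row is null.

    Assumes one validity bit per row, least-significant bit first,
    where a set bit marks a valid (non-null) row.
    """
    result = []
    for row in range(size):
        byte = bitmask[row // 8]
        is_valid = (byte >> (row % 8)) & 1
        result.append(is_valid == 0)
    return result


# 0b00000101 marks rows 0 and 2 as valid and row 1 as null.
null_mask = isnull_from_validity(bytes([0b00000101]), 3)
```
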

@beckernick
Member

As an update, this specific issue is not necessarily blocking the dask functionality. Either isnull or notna methods (and top-level functions) would suffice as long as isnull could be used with the negation operator such as ser[~ser.isnull()].
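
The equivalence being relied on can be sketched in pandas as below; this is a stand-in for the requested cuDF behavior, with made-up sample values.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value.
ser = pd.Series([1.0, np.nan, 3.0])

# Negating the isnull mask selects the non-null rows...
non_null = ser[~ser.isnull()]

# ...which is the same selection that notna gives directly.
same_as_notna = ser[ser.notna()]
```
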

Feature Planning automation moved this from Needs prioritizing to Closed May 2, 2019
v0.7 Release automation moved this from Issue-P1 to Done May 2, 2019
5 participants