Simplify pandas Series of strings that contain missing values #1428

Merged
merged 3 commits into from Aug 1, 2023


@dfalbel (Member) commented Jul 28, 2023

Improves the simplification of string arrays by automatically handling the missing-value types commonly used in pandas.
The following is now simplified into an R character vector:

library(reticulate)
pd <- import("pandas", convert = FALSE)
np <- import("numpy", convert = FALSE)
x <- pd$Series(list("a", pd$`NA`, NULL, np$nan))

It is very common for pandas Series that otherwise contain strings to use pd.NA, None, or np.nan as missing-value indicators. Moreover, .isna() reports True for all three:

> x$isna()
0    False
1     True
2     True
3     True
dtype: bool
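A minimal sketch of how the new behavior can be checked from R, assuming a reticulate build that includes this change and an active Python environment with pandas and numpy installed. The expectation that each of the three missing markers becomes `NA` in the resulting character vector is our reading of the description above, not something the PR spells out.

```r
library(reticulate)

# Keep results as Python references so conversion only happens when we ask for it
pd <- import("pandas", convert = FALSE)
np <- import("numpy", convert = FALSE)

# A string Series mixing the three common missing-value markers:
# pandas.NA, None (from R NULL) and numpy.nan
x <- pd$Series(list("a", pd$`NA`, NULL, np$nan))

# Convert explicitly; with this change the result should be an R
# character vector with NA in place of each missing marker
r_x <- py_to_r(x)

str(r_x)
is.na(r_x)
```

Because the modules are imported with `convert = FALSE`, nothing is converted until `py_to_r()` is called, which makes the point where the simplification happens explicit.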

@t-kalinowski (Member) left a comment


Great work!

I really like this pattern of initializing static PyObject/PyObjectPtr! One downside is that if the first initialization fails, every subsequent call to the function will throw until the R session is restarted, and the error message will not be helpful, since py_fetch_error() won't be able to fetch the Python error a second time. It should be a rare occurrence, though, and is easily fixed by restarting the R session.

@t-kalinowski merged commit 90e8a18 into rstudio:main on Aug 1, 2023
12 checks passed