NA_character_ not identified as NaN after importing it into Python #983

Closed
psads-git opened this issue Jan 24, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@psads-git

I am using the following code inside an R magic cell:

%%R -o df

library(tibble)

df <- tibble(x = c("a", "b", NA))

However, when I then run the following in a separate Python cell:

df.isna()

I get

       x
1  False
2  False
3  False

In fact, the imported dataframe is

               x
1              a
2              b
3  NA_character_

The following fixes the problem:

df['x'] = df['x'].map(lambda val: np.nan if isinstance(val, rpy2.rinterface_lib.sexp.NACharacterType) else val)
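The same workaround can be written as a small reusable helper. This is only a minimal sketch under stated assumptions: the names r_na_to_none, clean_columns and R_NA_CLASS_NAMES are hypothetical, and the sentinel is matched by its class name (NACharacterType, taken from the one-liner above) so that the example runs without rpy2 or R installed. With rpy2 available, an isinstance check as in the one-liner is the safer test.

```python
# Sketch: replace R's NA_character_ sentinel with None, column by column.
# Assumption: the sentinel's class is named "NACharacterType", as in
# rpy2.rinterface_lib.sexp.NACharacterType from the one-liner above.
# Matching on the class name keeps this runnable without rpy2 or R.

R_NA_CLASS_NAMES = frozenset({"NACharacterType"})  # hypothetical constant


def r_na_to_none(value):
    """Return None for an R NA sentinel, otherwise the value unchanged."""
    if type(value).__name__ in R_NA_CLASS_NAMES:
        return None
    return value


def clean_columns(columns):
    """Apply r_na_to_none to every value in a {name: list} mapping."""
    return {
        name: [r_na_to_none(v) for v in values]
        for name, values in columns.items()
    }


class NACharacterType:
    """Stand-in for the rpy2 class, used for demonstration only."""

    def __repr__(self):
        return "NA_character_"


demo = {"x": ["a", "b", NACharacterType()]}
print(clean_columns(demo))  # {'x': ['a', 'b', None]}
```

On a pandas column the helper would be applied as df['x'] = df['x'].map(r_na_to_none). None is used rather than np.nan here so the column stays a plain object column whose missing entries df.isna() still reports as True.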

My question is: shouldn't rpy2 do this automatically?

(For more details, please see: https://stackoverflow.com/questions/75223099/na-character-not-identidied-as-nan-after-importing-it-into-python-with-rpy2 )

@psads-git psads-git added the bug Something isn't working label Jan 24, 2023
@ievgennaida

Might be related:
#979

@psads-git psads-git changed the title NA_character_ not identidied as NaN after importing it into Python NA_character_ not identified as NaN after importing it into Python Jan 27, 2023
@lgautier
Member

Thanks. This seems like a mistake in the conversion rules. The column's dtype is object, and the conversion leaves the special R value NA_character_ in place as a Python object, which pandas does not consider a missing value.

In [8]: df.dtypes
Out[8]: 
x    object
dtype: object

lgautier added a commit that referenced this issue Feb 5, 2023
* The numpy converter did not list CHARSXP R objects as vectors.

(issue #983)

Also made the lookup for vector types a set (constant lookup time).
@lgautier
Member

lgautier commented Feb 5, 2023

With PR #989 merged, one now gets:

In [5]: df
Out[5]: 
      x
1     a
2     b
3  None

In [6]: df.isna()
Out[6]: 
       x
1  False
2  False
3  True

@lgautier lgautier closed this as completed Feb 5, 2023
@FedericoCozziUni

Hello,
I get this behavior with Python 3.11.3 and rpy2 3.5.14:

>>> import rpy2.robjects as ro
>>> ro.r("b = c(NA,'def')")
>>> ro.r("df = data.frame(b)")
>>> rdf = ro.r('df')
>>> print(rdf)
     b
1 <NA>
2  def
>>> from rpy2.robjects.conversion import localconverter
>>> from rpy2.robjects import pandas2ri
>>> with localconverter(ro.default_converter + pandas2ri.converter):
...     df = ro.conversion.rpy2py(rdf)
>>> print(df)
               b
1  NA_character_
2            def

I would like to get a Python value (e.g. None) instead of NA_character_.
Which rpy2 version should I use?

@D3SL

D3SL commented Jun 3, 2024

I'm having this bug with rpy2 3.5.16. I was able to get around it briefly by using an older version of rpy2 (due to #1106) and converting through pandas2ri.activate(). In 3.5.16 there doesn't seem to be any way around this.

3.5.16 seems to be a severe regression in general when it comes to conversions. Previously I could run code like foo=ro.r('''RCODE''') or ro.globalenv['bar']=pydata without issue. Now I have to use the undocumented patterns below to convert between R and Python:

with (ro.default_converter + pandas2ri.converter).context():
    ro.r.assign('foo', ro.conversion.py2rpy(data))

with (ro.default_converter + pandas2ri.converter).context():
    foo = ro.conversion.rpy2py(ro.globalenv['bar'])
