Windows/Linux rpy2=3.5.7: Converted pandas dataframe has NA instead of None and throws exception on access the line #979

Open
ievgennaida opened this issue Jan 16, 2023 · 7 comments
Labels: bug (Something isn't working), Windows

Comments

@ievgennaida

ievgennaida commented Jan 16, 2023

Describe the issue or bug

Windows, rpy2==3.5.3: cells that R returns as NA are converted to None.
Windows, rpy2==3.5.7: cells that R returns as NA come back as NA_Character instead of None.
Tested with a single thread.
Attempting to access those values throws the following error:

"R value for missing character value"

pip install rpy2==3.5.7
Converter used:

        with localconverter(ro.default_converter + pandas2ri.converter):
            return ro.conversion.get_conversion().rpy2py(r_df)

Converted pandas grid:
image

pip install rpy2==3.5.3
Converter used:

        with localconverter(ro.default_converter + pandas2ri.converter):
            return ro.conversion.rpy2py(r_df)

NA is properly converted:
image

Test


import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

r_df = ro.DataFrame(
    {
        "str_column": ro.StrVector([ro.NA_Character]),
    }
)
with localconverter(ro.default_converter + pandas2ri.converter):
    pandas_df = ro.conversion.get_conversion().rpy2py(r_df)
# This line fails with 3.5.7: the cell holds NA, which does not support the 'not' operator. "not None" works in Python.
assert not pandas_df["str_column"][0]
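
To illustrate why the assertion fails, here is a minimal sketch that does not need rpy2 or R. The class `MissingCharacter` and the names `NA_STANDIN`, `is_missing`, and `truthiness_fails` are hypothetical stand-ins, not rpy2's API. The only assumption is that the missing-value object raises an error when Python asks for its truth value, as the error message in this report suggests.

```python
class MissingCharacter:
    """Hypothetical stand-in for the NA object in the cell (not rpy2's class)."""

    def __repr__(self):
        return "NA_character_"

    def __bool__(self):
        # Mirrors the message quoted in this report.
        raise ValueError("R value for missing character value")


NA_STANDIN = MissingCharacter()


def is_missing(value):
    """Identity checks never call __bool__, so they are safe on the stand-in."""
    return value is None or value is NA_STANDIN


def truthiness_fails(value):
    """Return True when evaluating `not value` raises ValueError."""
    try:
        not value
    except ValueError:
        return True
    return False
```

With this stand-in, `not None` evaluates normally while `not NA_STANDIN` raises, which matches the difference between the 3.5.3 and 3.5.7 results described above.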

Expected behavior
NA should be converted to None, and accessing the data frame should not raise any errors.

Error
A copy of the error message.

R value for missing character value

Additional context

rpy2 version:
3.5.7
Python version:
3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
Looking for R's HOME:
    Environment variable R_HOME: None
    InstallPath in the registry: C:\Program Files\R\R-4.1.2
    Environment variable R_USER: None
    Environment variable R_LIBS_USER: C:\dev\backend\env\r-libs
R version:
    In the PATH: R version 4.1.2 (2021-11-01) -- "Bird Hippie"
    Loading R library from rpy2: OK


pandas                        1.5.2

@lgautier
Member

Hi, self-contained examples that reproduce the problem are easier on my end. This includes imports, creation of test data or objects, etc. In other words, an example that is as minimal as possible and that I can just copy and paste to see what is happening.

@ievgennaida
Author

@lgautier
Sorry for the vague example and the missing imports.
Please check the "Test" section; it should reproduce the issue.

@ievgennaida
Author

ievgennaida commented Feb 11, 2023

@lgautier Unfortunately, the bug can still be reproduced on Windows with version 3.5.8.
I have also retested 3.5.7 with new versions of the other packages. Steps to reproduce:

Failed on Windows

  1. Install R on Windows (R-4.2.2 for Windows):
    https://cran.r-project.org/bin/windows/base/
  2. Install Python 3.11.0 on Windows: https://www.python.org/downloads/release/python-3110/
  3. Go to the folder C:\python-test and create a Python virtual environment using CMD:
  4. python -m venv env
  5. Activate the Python virtual environment:
    env\Scripts\activate.bat
  6. Install the Python dependencies for the test:
  7. pip install rpy2==3.5.7
  8. pip install pandas
  9. pip install numpy
  10. Create a Python file named test-python.py with the following content:
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

r_df = ro.DataFrame(
        {
            "str_column": ro.StrVector([ro.NA_Character]),
        }
    )
with localconverter(ro.default_converter + pandas2ri.converter):
	pandas_df = ro.conversion.get_conversion().rpy2py(r_df)

print(pandas_df["str_column"])
# line will fail because contains NA that has no support of 'not' operator. "not None" in python works well. 
assert not pandas_df["str_column"][0] 
  11. Run the command in CMD: python test-python.py

See the error in the console log (both for 3.5.8 and 3.5.7):

1    NA_character_
Name: str_column, dtype: object
Traceback (most recent call last):
  File "C:\python-test\test-python.py", line 20, in <module>
    assert not pandas_df["str_column"][0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\python-test\env\Lib\site-packages\rpy2\rinterface_lib\sexp.py", line 986, in __bool__
    raise ValueError('R value for missing character value')
ValueError: R value for missing character value
  12. Run pip install rpy2==3.5.3 to see the expected results.

Expected results: works fine in 3.5.3.
  13. Run the command in CMD: python test-python.py
Console output from the script:

1    None
Name: str_column, dtype: object
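
Until a fix is available, one possible workaround is to replace the NA object with None after conversion. The sketch below shows the idea on a plain list and uses a hypothetical stand-in class (`_MissingCharacter`, `NA_STANDIN`) so that it runs without rpy2. Whether the object in the converted cell is identical to `ro.NA_Character` is an assumption that would need to be verified before applying this to a real data frame.

```python
class _MissingCharacter:
    """Hypothetical stand-in for the NA object; not rpy2's class."""

    def __bool__(self):
        raise ValueError("R value for missing character value")


NA_STANDIN = _MissingCharacter()


def na_to_none(values, sentinel=NA_STANDIN):
    """Return a new list with every occurrence of `sentinel` replaced by None."""
    # `is` compares identity, so the sentinel's __bool__ is never invoked.
    return [None if value is sentinel else value for value in values]


column = ["a", NA_STANDIN, "b"]
cleaned = na_to_none(column)

# The check from the test script now passes on the cleaned value.
assert not cleaned[1]
```

The helper compares with `is` rather than `==` or truthiness, because any operation that asks for the truth value of the stand-in would raise the same ValueError.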

@ievgennaida
Author

ievgennaida commented Feb 11, 2023

@lgautier NOTE: The same bug is reproduced on Linux (R-Base Docker Image) with 3.5.8/3.5.7!

Failed on Linux

The following steps were used:

  • Install Docker on Windows
  • Run the command to create and start the R container:
    docker run -it r-base:4.2.2 bash
  • Install Python by running the following commands:
apt install curl build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev wget libbz2-dev
wget https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tgz
tar -xf Python-3.10.*.tgz
cd Python-3.10.*/
./configure --enable-optimizations
make -j 4
make altinstall
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.10 get-pip.py
  • Install required python packages
pip install wheel
pip install rpy2==3.5.8
pip install pandas
pip install numpy
  • Create a Python file for the test script, e.g. using nano:

apt install nano

  • Run nano and paste the following content:
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

r_df = ro.DataFrame(
        {
            "str_column": ro.StrVector([ro.NA_Character]),
        }
    )
with localconverter(ro.default_converter + pandas2ri.converter):
	pandas_df = ro.conversion.get_conversion().rpy2py(r_df)

print(pandas_df["str_column"])
# line will fail because contains NA that has no support of 'not' operator. "not None" in python works well. 
assert not pandas_df["str_column"][0] 
  • Save the file as test-python.py

  • Run the command:
    python3.10 test-python.py

  • See the exception:

1    NA_character_
Name: str_column, dtype: object
Traceback (most recent call last):
  File "/Python-3.10.0/test-python.py", line 19, in <module>
    assert not pandas_df["str_column"][0]
  File "/usr/local/lib/python3.10/site-packages/rpy2/rinterface_lib/sexp.py", line 986, in __bool__
    raise ValueError('R value for missing character value')
ValueError: R value for missing character value

Version 3.5.3 works fine.
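
As a convenience, the versions mentioned in this report can be checked against the installed package. This is only a sketch: the helper names `reported_status` and `installed_rpy2_version` are made up for illustration, and the lookup covers only the three versions named here (3.5.3, 3.5.7, 3.5.8). No claim is made about any other release.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions explicitly named in this report; nothing else is claimed.
REPORTED_WORKING = {"3.5.3"}
REPORTED_FAILING = {"3.5.7", "3.5.8"}


def reported_status(rpy2_version):
    """Classify a version string using only what this report states."""
    if rpy2_version in REPORTED_WORKING:
        return "reported working"
    if rpy2_version in REPORTED_FAILING:
        return "reported failing"
    return "not covered by this report"


def installed_rpy2_version():
    """Return the installed rpy2 version string, or None if rpy2 is absent."""
    try:
        return version("rpy2")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_rpy2_version()
    print(found, "->", reported_status(found) if found else "rpy2 not installed")
```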

@ievgennaida ievgennaida changed the title Windows rpy2=3.5.7: Converted pandas dataframe has NA instead of None and throws exception on access the line Windows/Linux rpy2=3.5.7: Converted pandas dataframe has NA instead of None and throws exception on access the line Feb 11, 2023
lgautier added a commit that referenced this issue Feb 12, 2023
@lgautier
Member

Looks like there is an issue indeed. A patch should be on the way.

@lgautier lgautier self-assigned this Feb 12, 2023
lgautier added a commit that referenced this issue Feb 13, 2023
* Conversion of NA_character_ to numpy string array.

(issue #979).

* Moved check for NA_character to CharSexp dispatch.

* Fixed issue with Converters are context managers.
@ievgennaida
Author

@lgautier Can the new context, "with conversion_rules.context():", also fix running converters in another thread?
-> #978

@lgautier
Member

@lgautier Can the new context, "with conversion_rules.context():", also fix running converters in another thread? -> #978

Not that exact error. I answered about addressing the error reported in #978 directly in that issue.
