Regression in converting ordered factors R → Python (pandas) #1010

Closed
krassowski opened this issue Mar 31, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@krassowski
Member

Describe the issue or bug

Using pandas 1.5.x

rpy2 3.4.5:

In [1]: %load_ext rpy2.ipython
   ...: %R ordered(c('a', 'b'))
Out[1]: 
['a', 'b']
Categories (2, object): ['a' < 'b']

rpy2 3.5.0 and 3.5.10:

In [1]: %load_ext rpy2.ipython
   ...: %R ordered(c('a', 'b'))
Out[1]: array([1, 2], dtype=int32)

Conversion in the other direction works fine (although there is no conversion rule for a standalone Categorical object, which would be a nice improvement):

In [1]: %load_ext rpy2.ipython
   ...: from pandas import Categorical, DataFrame
   ...: categorical = DataFrame({'col': Categorical(['a', 'b'], ordered=True)})
   ...: %R -i categorical
   ...: %R class(categorical$col)
Out[1]: array(['ordered', 'factor'], dtype='<U7')

To Reproduce

  1. Start IPython from the command line (or a Jupyter notebook with ipykernel)
  2. Copy-paste:
%load_ext rpy2.ipython
%R ordered(c('a', 'b'))

Expected behavior
An ordered factor should be converted to an ordered pandas Categorical, not to a NumPy array containing only the codes.
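
For reference, the expected result can be built by hand with pandas. This is a minimal sketch, not rpy2's conversion code: the `r_codes` and `r_levels` values are written out manually, on the assumption that R stores `ordered(c('a', 'b'))` as 1-based integer codes plus a `levels` attribute.

```python
import pandas as pd

# Assumed internal representation of ordered(c('a', 'b')) in R:
# 1-based integer codes plus the levels they index into.
r_codes = [1, 2]
r_levels = ["a", "b"]

# pandas codes are 0-based, so shift each code down by one.
expected = pd.Categorical.from_codes(
    [code - 1 for code in r_codes],
    categories=r_levels,
    ordered=True,
)

print(expected)
```

The `ordered=True` flag is what distinguishes this from a plain factor; the bare integer array reported above loses both the levels and the ordering.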

Error
None

Additional context

rpy2 version:
3.5.10
Python version:
3.9.5 (default, Jun  9 2021, 20:32:03) 
[GCC 10.3.0]
Looking for R's HOME:
    Environment variable R_HOME: None
    Calling `R RHOME`: /usr/lib/R
    Environment variable R_LIBS_USER: None
R's additions to LD_LIBRARY_PATH:
/usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
R version:
    In the PATH: R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
    Loading R library from rpy2: OK
Additional directories to load R packages from:
None
C extension compilation:
  include:
  ['/usr/share/R/include']
  libraries:
  ['R', 'pcre2-8', 'lzma', 'bz2', 'z', 'tirpc', 'rt', 'dl', 'm', 'icuuc', 'icui18n']
  library_dirs:
  ['/usr/lib/R/lib']
  extra_compile_args:
  ['-std=c99']
  extra_link_args:
  ['-Wl,--export-dynamic', '-fopenmp', '-Wl,-Bsymbolic-functions', '-flto=auto', '-ffat-lto-objects', '-flto=auto', '-Wl,-z,relro']
Directory for the R shared library:
lib
CFFI extension type
  Environment variable: RPY2_CFFI_MODE
  Value: CFFI_MODE.ANY
  ABI: PRESENT
  API: PRESENT
@krassowski krassowski added the bug Something isn't working label Mar 31, 2023
@krassowski
Member Author

Relevant code:

R → pandas

def _to_pandas_factor(obj):
    codes = [x-1 if x > 0 else -1 for x in numpy.array(obj)]
    res = pandas.Categorical.from_codes(
        codes,
        categories=list(obj.do_slot('levels')),
        ordered='ordered' in obj.rclass
    )
    return res


converter._rpy2py_nc_map.update(
    {
        rinterface.IntSexpVector: conversion.NameClassMap(
            numpy2ri.rpy2py,
            {'factor': _to_pandas_factor}
        ),
        rinterface.ListSexpVector: conversion.NameClassMap(
            numpy2ri.rpy2py_list,
            {'data.frame': lambda obj: rpy2py(DataFrame(obj))}
        )
    }
)

Looking at NEWS:

  • Initialization and update steps for rpy2.robjects.conversion.NameClassMap were updated to ensure type hints are correct. This probably solved cryptic bugs with conversion system.

Could it be related? I do not see anything obvious in blame view.

pandas → R (this works, just for context):

def py2rpy_categoryseries(obj):
    for c in obj.cat.categories:
        if not isinstance(c, str):
            raise ValueError('Converting pandas "Category" series to '
                             'R factor is only possible when categories '
                             'are strings.')
    res = IntSexpVector(list(rinterface.NA_Integer if x == -1 else x+1
                             for x in obj.cat.codes))
    res.do_slot_assign('levels', StrSexpVector(obj.cat.categories))
    if obj.cat.ordered:
        res.rclass = StrSexpVector(('ordered', 'factor'))
    else:
        res.rclass = StrSexpVector(('factor',))
    return res
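
The two functions above differ mainly in how they shift indices and encode missing values: R factor codes are 1-based with `NA_integer_` for missing entries, while pandas codes are 0-based with `-1` for missing entries. The sketch below reproduces only that arithmetic in plain Python, without rpy2. The names `R_NA_INTEGER`, `r_to_pandas_codes`, and `pandas_to_r_codes` are hypothetical, and the sentinel value assumes R's usual representation of `NA_integer_` as the smallest 32-bit signed integer.

```python
# Assumption: R stores NA_integer_ as the smallest 32-bit signed integer.
R_NA_INTEGER = -2**31


def r_to_pandas_codes(r_codes):
    """Map 1-based R factor codes to 0-based pandas codes (-1 for missing)."""
    return [code - 1 if code > 0 else -1 for code in r_codes]


def pandas_to_r_codes(pd_codes):
    """Map 0-based pandas codes to 1-based R factor codes (NA for -1)."""
    return [R_NA_INTEGER if code == -1 else code + 1 for code in pd_codes]


r_codes = [1, 2, R_NA_INTEGER]
pd_codes = r_to_pandas_codes(r_codes)
print(pd_codes)
print(pandas_to_r_codes(pd_codes) == r_codes)
```

Because the two mappings are inverses of each other, a round trip should preserve the codes, which is why the regression is in the dispatch to these rules rather than in the arithmetic itself.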

@lgautier
Member

lgautier commented Apr 2, 2023

This seems like a regression in the R magic's default converter.

In [22]: conv = robjects.default_converter+pandas2ri.converter

In [23]: %%R -o foo -c conv
    ...: foo <- ordered(c('a', 'b'))
    ...: 
    ...: 

In [24]: foo
Out[24]: 
['a', 'b']
Categories (2, object): ['a' < 'b']

@lgautier
Member

lgautier commented Apr 2, 2023

Confirmed. The R magic converter is mostly identical to cell 43 below.

In [40]: with robjects.default_converter.context():
    ...:   print(type(robjects.r("ordered(c('a', 'b'))")))
    ...: 
<class 'rpy2.robjects.vectors.FactorVector'>

In [41]: import rpy2.robjects.numpy2ri as numpy2ri

In [42]: with (robjects.default_converter + numpy2ri.converter).context():
    ...:   print(type(robjects.r("ordered(c('a', 'b'))")))
    ...: 
<class 'numpy.ndarray'>

In [43]: with (robjects.default_converter + numpy2ri.converter + pandas2ri.converter).context():
    ...:   print(type(robjects.r("ordered(c('a', 'b'))")))
    ...: 
<class 'numpy.ndarray'>
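
The three cells above are consistent with a lookup in which the rule that ends up handling R integer vectors does not consult the R class. The following is a toy model of that idea in plain Python, not rpy2's actual `NameClassMap` implementation: `ClassMap`, `find`, `to_array`, and `to_categorical` are all hypothetical stand-ins.

```python
class ClassMap:
    """Toy lookup table: a default conversion plus per-R-class overrides."""

    def __init__(self, default, by_class=None):
        self.default = default
        self.by_class = dict(by_class or {})

    def find(self, rclass):
        # Return the conversion for the first R class name that has an
        # override; otherwise fall back to the default conversion.
        for name in rclass:
            if name in self.by_class:
                return self.by_class[name]
        return self.default


def to_array(codes, levels):
    # Stand-in for a conversion that keeps only the integer codes.
    return list(codes)


def to_categorical(codes, levels):
    # Stand-in for a conversion that maps codes back to their levels.
    return [levels[code - 1] for code in codes]


rclass = ("ordered", "factor")
codes, levels = [1, 2], ["a", "b"]

without_factor_rule = ClassMap(to_array)
with_factor_rule = ClassMap(to_array, {"factor": to_categorical})

print(without_factor_rule.find(rclass)(codes, levels))
print(with_factor_rule.find(rclass)(codes, levels))
```

In this model, a map without a `factor` entry returns the bare codes, which mirrors the `numpy.ndarray` result in cells 42 and 43, while a map with the entry recovers the labels.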

lgautier added a commit that referenced this issue Apr 10, 2023
…1011)

* Make the numpy converter use R class mapping for R integer vectors.

* Add conversion from pandas.Categorical to R factor.

Fix for issue #1010.


Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>
@vlulla

vlulla commented Sep 11, 2023

I asked a question here on the merged branch but it appears to have not been picked up. Please pardon the cross posting.
