In [1]:
from functools import partial
from rpy2.ipython import html
html.html_rdataframe=partial(html.html_rdataframe, table_class="docutils")

# `R` and `pandas` data frames

R `data.frame` and :class:`pandas.DataFrame` objects share many
conceptual similarities, and :mod:`pandas` named its class
`DataFrame` after the R object.

In a nutshell, both are sequences of vectors (or arrays) of consistent
length or size for the first dimension (the "number of rows").
If coming from the database world, another way to look at them is
as column-oriented data tables, or as a data table API.
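
The description above can be illustrated with a minimal pure-Python sketch: a table stored as a mapping from column names to equal-length lists. The helpers `make_table` and `row` are hypothetical names introduced only for this illustration. They are not part of pandas, R, or rpy2, and real data frames add typed columns, indexes, and much more.

```python
# A toy column-oriented table: column names mapped to equal-length lists.
# `make_table` and `row` are hypothetical helpers, not pandas or rpy2 API.

def make_table(columns):
    """Return the columns as a dict after checking they share one length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return dict(columns)


def row(table, i):
    """Assemble row `i` by taking the i-th element of every column."""
    return {name: values[i] for name, values in table.items()}


table = make_table({'int_values': [1, 2, 3],
                    'str_values': ['abc', 'def', 'ghi']})
second_row = row(table, 1)
```

Storing the data by column keeps each column homogeneous, and a row is only assembled on demand by indexing every column at the same position.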

rpy2 provides an interface between Python and R, and a convenience
conversion layer between :class:`rpy2.robjects.vectors.DataFrame` and
:class:`pandas.DataFrame` objects, implemented in
:mod:`rpy2.robjects.pandas2ri`.

In [2]:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri

from rpy2.robjects.conversion import localconverter

## From `pandas` to `R`

Pandas data frame:

In [3]:
pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})

pd_df

,int_values,str_values
0,1,abc
1,2,def
2,3,ghi


R data frame converted from a `pandas` data frame:

In [4]:
with localconverter(ro.default_converter + pandas2ri.converter):
  r_from_pd_df = ro.conversion.py2ro(pd_df)

r_from_pd_df

int_values,str_values
1,'abc'
2,'def'


The conversion happens automatically when calling R functions.
For example, when calling the R function `base::summary`:

In [5]:
base = importr('base')

with localconverter(ro.default_converter + pandas2ri.converter):
  df_summary = base.summary(pd_df)
df_summary

0,1,2,3,4,5,6,7,8
'Min. :...,'1st Qu.:...,'Median :...,'Mean :...,...,'Mode :c...,NA_character_,NA_character_,NA_character_


Note that a context manager (`localconverter`) is used to limit the
scope of the conversion. Without it, rpy2 does not know how to convert
a pandas data frame:

In [6]:
try:
  df_summary = base.summary(pd_df)
except NotImplementedError as nie:
  print('NotImplementedError:')
  print(nie)

NotImplementedError:
Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'


## From `R` to `pandas`

Starting from an R data frame this time:

In [7]:
r_df = ro.DataFrame({'int_values': ro.IntVector([1,2,3]),
                     'str_values': ro.StrVector(['abc', 'def', 'ghi'])})

r_df

int_values,str_values
1,'abc'
2,'def'


It can be converted to a pandas data frame using the same converter:

In [8]:
with localconverter(ro.default_converter + pandas2ri.converter):
  pd_from_r_df = ro.conversion.ri2py(r_df)

pd_from_r_df

,int_values,str_values
0,1,abc
1,2,def
2,3,ghi
