# Work with R in Python

### `rpy2`

Python's versatility makes it a common choice for all kinds of analysis, including statistical analysis. When your data is already organized with libraries such as pandas or NumPy but you need an operation that you know is optimized in R, it is very useful to call R packages from Python and get the result back to continue working in Python. If that is not your situation, the official Anaconda documentation describes other ways of working with R: [https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/](https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/).

`rpy2` is an interface that lets us pass data between R and Python and access R functionality from Python: [https://rpy2.github.io/doc/v2.9.x/html/index.html](https://rpy2.github.io/doc/v2.9.x/html/index.html).

For Windows users with Python >=3.8 and R >=4: if R is installed directly (without RStudio, for example), make sure to check "save version number in registry" during installation so that the system recognizes it. Then install rpy2 in your environment from the Anaconda Prompt with `pip install rpy2`. If any DLL-related errors appear, add the R root folder to the environment variables of your OS.

*This did not work for me*: for Windows users, the recommended approach is to create a conda environment with Python 3.5 (latest stable) and install NumPy, pandas, and any other required packages; R version >=3.2 should also be installed. Then run `conda install -c r rpy2` from the Anaconda Prompt.
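
Before importing `rpy2`, it can help to confirm that the pieces described above are visible from Python. The following is a minimal sketch using only the standard library; the helper name `check_r_setup` is hypothetical, and it only reports what it finds (it does not install or configure anything).

```python
import importlib.util
import os
import shutil
import sys


def check_r_setup():
    """Report whether the prerequisites for rpy2 are visible from Python."""
    return {
        # the text above assumes Python >= 3.8
        "python_ok": sys.version_info >= (3, 8),
        # path of the R executable if it is on PATH, else None
        "r_executable": shutil.which("R"),
        # R_HOME is one of the environment variables rpy2 can use to locate R
        "r_home": os.environ.get("R_HOME"),
        # True if the rpy2 package is installed in this environment
        "rpy2_installed": importlib.util.find_spec("rpy2") is not None,
    }


for key, value in check_r_setup().items():
    print(f"{key}: {value}")
```

If `r_executable` and `r_home` are both `None` on Windows, that is a hint that R may only be discoverable through the registry entry or not at all.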

The next step is to import the required functionalities or packages:

In [1]:
import pandas as pd # data management
import numpy as np # operations and data format

import rpy2.robjects as R
# interact with R and create R objects

import rpy2.robjects.numpy2ri
import rpy2.robjects.pandas2ri
# converters between NumPy/pandas objects and rpy2 (R) objects
rpy2.robjects.numpy2ri.activate()
rpy2.robjects.pandas2ri.activate()
# apply those conversions automatically from here on

import matplotlib.pyplot as plt
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# show all outputs of a cell in the notebook, not only the last one
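
Calling `activate()` turns the conversions on globally. As an alternative, rpy2 also provides `localconverter`, which limits a conversion to a `with` block. The following is a hedged sketch, not the approach used in the rest of this notebook: the helper names `demo_prerequisites_available` and `send_dataframe_to_r` are hypothetical, and the rpy2 imports are kept inside the functions so the code can be loaded even where R or rpy2 is missing.

```python
def demo_prerequisites_available():
    """Return True if pandas, rpy2 and a working R installation can be loaded."""
    try:
        import pandas  # noqa: F401
        import rpy2.robjects  # noqa: F401  (this starts the embedded R)
        return True
    except Exception:
        return False


def send_dataframe_to_r(df, name):
    """Copy a pandas DataFrame into R's global environment; return R's row count."""
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # the pandas conversion rules apply only inside this block
    with localconverter(ro.default_converter + pandas2ri.converter):
        ro.globalenv[name] = df
    # ask R how many rows it sees and return that as a Python int
    return int(ro.r(f"nrow({name})")[0])


if demo_prerequisites_available():
    import pandas as pd

    demo = pd.DataFrame({"a": [8, 21, 23], "b": [7, 8, 4]})
    print(send_dataframe_to_r(demo, "demo_df"))
else:
    print("pandas, rpy2 or R is not available; skipping the demo")
```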

Evaluate R code directly by passing it as a string to `R.r`:

In [2]:
R.r.assign('var1',25)
R.r('var2 = var1*4')
var2 = R.r('var2')
# bring the R value back into Python
print(var2[0])
print()

R_code = """
sayhi = function(inhere){
    return(paste("Hi ",inhere))
}
"""
R.r(R_code)
# sayhi is now defined in R's global environment

R_code_py = R.globalenv['sayhi']
# bind the R function to a Python name
print(R_code_py.r_repr())
# show the function's R source
print()

calling = R_code_py('Jeremy')
# call the R function with Python syntax
print(type(calling))
# an rpy2 StrVector (see the output below)
print()

print(calling[0]) # first element, a Python string
print(calling) # the whole vector, printed in R style

array([25], dtype=int32)

array([100.])

100.0



<rpy2.robjects.functions.SignatureTranslatedFunction object at 0x000001DEB0FC0080> [RTYPES.CLOSXP]
R classes: ('function',)

function (inhere) 
{
    return(paste("Hi ", inhere))
}

<class 'rpy2.robjects.vectors.StrVector'>

Hi  Jeremy
[1] "Hi  Jeremy"
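
The double space in `Hi  Jeremy` is expected: R's `paste` joins its arguments with a single space by default (`sep = " "`), and the string `"Hi "` already ends with one. The pure-Python sketch below mimics that behaviour for illustration only; `r_like_paste` is a hypothetical helper, not part of rpy2.

```python
def r_like_paste(*parts, sep=" "):
    """Join the parts with a separator, as R's paste() does for single strings."""
    return sep.join(str(part) for part in parts)


# separator added after the trailing space, so two spaces appear
print(repr(r_like_paste("Hi ", "Jeremy")))
# no separator, similar to R's paste0()
print(repr(r_like_paste("Hi ", "Jeremy", sep="")))
```

In R itself, `paste("Hi", inhere)` or `paste0("Hi ", inhere)` would produce a single space.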



Let's reverse the operation and let R access variables created in Python:

In [4]:
pythonvar = R.FloatVector(np.arange(0,1,0.1))
# create a vector of floats
print(pythonvar)
print()

mat1 = R.r.matrix(np.array(range(9)), nrow=3, ncol=3)
R.r.assign('mat1', mat1)
R.r('print(mat1)')

R.globalenv["pythonvar"] = pythonvar
# bind the Python-created vector to a name in R's global environment
print(R.r("pythonvar"))
# show the value as R sees it

 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9




array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]], dtype=int32)

     [,1] [,2] [,3]
[1,]    0    3    6
[2,]    1    4    7
[3,]    2    5    8


array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]], dtype=int32)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
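
Note that R filled the matrix column by column: the values 0 to 8 run down the first column before moving to the second. NumPy's default `reshape` fills row by row instead. A small NumPy-only sketch of the difference (no R required):

```python
import numpy as np

values = np.arange(9)

# NumPy default (C order): fill row by row
row_major = values.reshape(3, 3)

# Fortran order: fill column by column, which matches the R output above
col_major = values.reshape(3, 3, order="F")

print(row_major)
print(col_major)
```

On the R side, `matrix()` accepts `byrow = TRUE` if row-wise filling is what you want.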


The new variable created in Python is available to R:

In [5]:
print(R.r('mean(pythonvar)'))
print(np.mean(pythonvar))

[0.45]
0.45000000000000007
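
The two results agree up to floating-point rounding; NumPy arrays also print with 8 digits of precision by default, which hides the last digits. When comparing numbers that crossed the R/Python boundary, a tolerance-based comparison is safer than `==`. A minimal NumPy-only sketch, where the R result is hard-coded from the output above rather than computed:

```python
import numpy as np

pythonvar = np.arange(0, 1, 0.1)
python_mean = np.mean(pythonvar)

# value copied from the R output above (an assumption for this sketch)
r_mean = 0.45

# compare with a tolerance instead of exact equality
print(np.isclose(python_mean, r_mean))
```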


### Examples of hybrid working between languages

Any R library you want to use must already be installed in the usual R fashion, for example `install.packages("ggplot2")`. Running the installation through rpy2 may not work, so a safe approach is to do it in the R console.
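
To check from Python whether a package is already available, without loading rpy2, one option is to ask R through the `Rscript` command-line tool. This is a sketch under assumptions: `Rscript` must be on the PATH, and the helper name `r_package_installed` is hypothetical. It returns `None` when R cannot be queried.

```python
import re
import shutil
import subprocess


def r_package_installed(package):
    """Return True/False if R can (or cannot) load the package, None if unknown."""
    # accept only plain R package names, so nothing else reaches the R command
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9.]*", package):
        raise ValueError(f"not a valid R package name: {package!r}")

    rscript = shutil.which("Rscript")
    if rscript is None:
        return None  # R is not on PATH, so we cannot tell

    # requireNamespace() returns TRUE or FALSE without attaching the package
    code = f'cat(requireNamespace("{package}", quietly = TRUE))'
    try:
        result = subprocess.run(
            [rscript, "-e", code], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip().endswith("TRUE")


print(r_package_installed("ggplot2"))
```

If the result is `False`, install the package from the R console as described above before calling `importr`.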

In [7]:
from rpy2.robjects.packages import importr
# function used to import R libraries
ggp = importr('ggplot2')
datab = importr('datasets')
# equivalent to library(ggplot2) and library(datasets) in R

getdata = R.r("iris")
# getdata.head(5)
# data = pd.read_csv('data/wind_data.txt', engine='python',
#     sep=r'\s+', skiprows=1, parse_dates=[[0, 1]],
#     names=['date', 'time', 'wspd'], index_col=0)
# read a sample file: whitespace separator (\s+), skip the first row,
# parse columns 0 and 1 together as one datetime, name the columns,
# and use the first column (0) as the row index
# data.head(5)

df = pd.DataFrame.from_dict({'a': [8, 21, 23], 'b': [7, 8, 4], 'c': [400, 83, 98]})
R.r.assign('df', df)
R.r('print(df)')


    a  b    c
0   8  7  400
1  21  8   83
2  23  4   98


   a b   c
0  8 7 400
1 21 8  83
2 23 4  98


    a  b    c
0   8  7  400
1  21  8   83
2  23  4   98


In [10]:
normR = """
normalR = function(colu){
    return(shapiro.test(colu))
}
"""
R.r(normR)
# normalR is now defined in R's global environment
normRpy = R.globalenv['normalR']

nresults = normRpy(getdata['Sepal.Length'])

<rpy2.robjects.functions.SignatureTranslatedFunction object at 0x000001DEB6A39A40> [RTYPES.CLOSXP]
R classes: ('function',)

Now check the output; if you are interested in the p-value or another statistic, retrieve it by name.

In [15]:
nresults.names # check the output possibilities
mypvalue = nresults.rx2('p.value')[0]
print(f'My p-value {mypvalue:.3f}')

0,1,2,3
'statistic','p.value','method','data.name'


My p-value 0.010
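
A p-value is usually compared with a significance level chosen beforehand. The null hypothesis of the Shapiro-Wilk test is that the sample comes from a normal distribution, so a small p-value is evidence against normality. A minimal sketch, assuming the conventional level of 0.05 (the threshold and the helper name `rejects_normality` are assumptions, not part of the original analysis):

```python
def rejects_normality(p_value, alpha=0.05):
    """Return True if the normality hypothesis is rejected at level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return p_value < alpha


mypvalue = 0.010  # rounded value from the output above

# rejected at the 5% level
print(rejects_normality(mypvalue))
# not rejected at the 1% level, since 0.010 is not below 0.01
print(rejects_normality(mypvalue, alpha=0.01))
```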
