## Rpy2

rpy2 is a Python package for interfacing with R. In your projects you may face situations (depending on your fluency with Python) where an R function seems like the better way of doing things. The following tutorial provides some illustrative examples.

In [8]:
import rpy2.robjects as robjects

### R objects
_robjects_ is a high-level interface that tries to hide R behind Python-like behavior. This will be your go-to module for working with R in Python.

## R packages
### Check for and install packages

In [1]:
# R package names
packnames = ('ggplot2', 'hexbin', 'dplyr')

# import rpy2's package module
import rpy2.robjects.packages as rpackages

# True only if every package is already installed
have_tutorial_packages = all(rpackages.isinstalled(x) for x in packnames)

print(have_tutorial_packages)

False


R packages are usually downloaded from a package repository and installed locally. Start by selecting a repository mirror:

In [2]:
if not have_tutorial_packages:
    # import R's utility package
    utils = rpackages.importr('utils')
    # select a mirror for R packages
    utils.chooseCRANmirror(ind=1) # select the first mirror in the list


Now install the missing packages using R’s own function _install.packages_ (exposed by rpy2 as install_packages, since a dot is not valid in a Python name):

In [9]:
if not have_tutorial_packages:
    # R vector of strings
    from rpy2.robjects.vectors import StrVector
    # keep only the packages that are not installed yet
    packnames_to_install = [x for x in packnames if not rpackages.isinstalled(x)]
    if packnames_to_install:
        utils.install_packages(StrVector(packnames_to_install))

(as ‘lib’ is unspecified)

  res = super(Function, self).__call__(*new_args, **new_kwargs)
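
The check-then-install logic above can be factored into a small helper. The following is a minimal sketch, not part of rpy2: `missing_packages` is a hypothetical name, and the installed-check is passed in as a parameter so that `rpackages.isinstalled` can be supplied when R is available. It also strips stray whitespace from names, since a name with a trailing space would never match an installed package.

```python
from typing import Callable, Iterable, List


def missing_packages(names: Iterable[str],
                     is_installed: Callable[[str], bool]) -> List[str]:
    """Return the cleaned package names for which is_installed() is False.

    Hypothetical helper (not part of rpy2). Names are stripped of
    surrounding whitespace and de-duplicated, preserving their order.
    """
    cleaned = []
    for name in names:
        name = name.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    return [name for name in cleaned if not is_installed(name)]


# Stand-in for rpackages.isinstalled: pretend only ggplot2 is installed.
pretend_installed = {'ggplot2'}
to_install = missing_packages(('ggplot2', 'hexbin', 'dplyr '),
                              lambda name: name in pretend_installed)
print(to_install)
```

With rpy2 available, `missing_packages(packnames, rpackages.isinstalled)` would give the list to wrap in `StrVector` and pass to `utils.install_packages`.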


## Communicate with R
### The _r_ instance

The _r_ instance (accessed as robjects.r) represents the embedded R process and works a little like a communication channel from Python to R.

### Getting R objects

In [6]:
%%R
pi

ERROR: Cell magic `%%R` not found.


In [31]:
# The %%R magic needs the IPython extension (%load_ext rpy2.ipython);
# the r instance works without it
print(robjects.r['pi'])

[1] 3.141593




### Evaluating R code

A simple example:

In [32]:
robjects.r('2 + 2')

<FloatVector - Python:0x7fafbe0feac8 / R:0x3cc7668>
[4.000000]

Something more interesting

In [11]:
robjects.r('''
pi_power <- function(x, verbose=FALSE) {
    if (verbose) {
        cat("I am calling pi_power().\n")
    }
    pi ^ x
}
pi_power(4)
''')

<FloatVector - Python:0x7f3aa88c6808 / R:0x585c638>
[97.409091]

### A subtlety

Notice that the first operation, `robjects.r('pi') + 2`, appends 2 to the vector, while the second, `robjects.r('pi')[0] + 2`, extracts a Python float and performs ordinary arithmetic.

In [33]:
piplus2 = robjects.r('pi') + 2
piplus2.r_repr()


'c(3.14159265358979, 2)'

In [34]:
pi0plus2 = robjects.r('pi')[0] + 2
print(pi0plus2)

5.141592653589793
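
The difference comes from whose semantics `+` follows. On an rpy2 vector, `+` behaves like `+` on a Python sequence and concatenates; R's element-wise arithmetic is reached through the vector's `.ro` attribute. The sketch below shows the two behaviours with plain Python lists, then repeats them with rpy2 only if it can be imported. The list-based part is an analogy, not rpy2 code.

```python
import math

# Python sequence semantics: "+" concatenates, as it does on rpy2 vectors.
pi_vec = [math.pi]
concatenated = pi_vec + [2]

# R semantics: arithmetic is applied element by element.
elementwise = [value + 2 for value in pi_vec]

print(concatenated)
print(elementwise)

# Same comparison with rpy2, skipped when rpy2 (or R) is unavailable.
try:
    import rpy2.robjects as robjects
except Exception:
    robjects = None

if robjects is not None:
    r_pi = robjects.r('pi')
    print((r_pi + 2).r_repr())     # concatenation
    print((r_pi.ro + 2).r_repr())  # element-wise addition performed by R
```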


## Some data structures in R
### Creating Rpy2 vectors

In [13]:
res = robjects.StrVector(['abc', 'def'])
print(res)
print(res.r_repr())
# r_repr() returns the R code that would recreate the object; compare the two outputs below

[1] "abc" "def"

c("abc", "def")


In [48]:
res = robjects.IntVector([1, 2, 3]) 
print(res.r_repr()) 
res = robjects.FloatVector([1.1, 2.2, 3.3]) 
print(res.r_repr())

1:3
c(1.1, 2.2, 3.3)


Recall the _A subtlety_ section above: using + on these vectors appends elements rather than adding numerically.

### Matrices

A matrix is a special case of an array, which in turn is an rpy2 vector. It is essentially just a vector with dimension attributes (number of rows, number of columns).

In [14]:
m = robjects.r.matrix(robjects.IntVector(range(4)), nrow=2)
print(m.ro + 1)

     [,1] [,2]
[1,]    1    3
[2,]    2    4
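
The printed matrix shows that R fills a matrix column by column. As a rough illustration of "a vector with dimension attributes", the sketch below stores the values flat and uses the number of rows to locate an element. `element_at` and `rows_of` are hypothetical helper names, and indices are 0-based as in Python, whereas R's are 1-based.

```python
from typing import List, Sequence


def element_at(values: Sequence[int], nrow: int, row: int, col: int) -> int:
    """Look up (row, col) in a flat, column-major vector (0-based indices)."""
    return values[row + col * nrow]


def rows_of(values: Sequence[int], nrow: int) -> List[List[int]]:
    """Rebuild the row-wise view that R prints for a column-major vector."""
    ncol = len(values) // nrow
    return [[element_at(values, nrow, r, c) for c in range(ncol)]
            for r in range(nrow)]


# Same data as m.ro + 1 above: the values 0..3 plus one, with two rows.
flat = [v + 1 for v in range(4)]
for row in rows_of(flat, nrow=2):
    print(row)
```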



### Data Frames

A DataFrame can be created by:

- Using the constructor for the class
- Creating the data.frame through R
- Reading data from a file using the method from_csvfile()


In [15]:
# empty data frame
dataf = robjects.DataFrame({})
print(dataf)

# data.frame with two columns (note that, on Python versions where dicts are
# unordered, the order of the columns in the resulting DataFrame can differ
# from the order in which they are declared)
d = {'value': robjects.IntVector((1,2,3)), 'letter': robjects.StrVector(('x', 'y', 'z'))}
dataf = robjects.DataFrame(d)
print(dataf)

# an ordered dictionary avoids this issue, if we so wish
import rpy2.rlike.container as rlc
od = rlc.OrdDict([('value', robjects.IntVector((1,2,3))),
                  ('letter', robjects.StrVector(('x', 'y', 'z')))])
dataf = robjects.DataFrame(od)
print(dataf)

data frame with 0 columns and 0 rows

  letter value
1      x     1
2      y     2
3      z     3

  value letter
1     1      x
2     2      y
3     3      z



### Calling R functions

In [16]:
rsum = robjects.r['sum']
print(rsum)
# na.rm is not a valid Python keyword name, so pass it by unpacking a dict
rsum(robjects.IntVector([1, 2, 3, robjects.NA_Integer]), **{'na.rm': True})[0]

function (..., na.rm = FALSE)  .Primitive("sum")
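
R argument names such as `na.rm` contain a dot and so cannot be written directly as Python keyword arguments. One way to bridge the gap is to map Python-safe names back to the R names. The sketch below illustrates that idea under a simple assumption (dots replaced by underscores); it is not rpy2's actual implementation, and `translate_kwargs` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def translate_kwargs(r_arg_names: Iterable[str],
                     kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename Python-safe keywords (na_rm) to R argument names (na.rm).

    Hypothetical helper: only names listed in r_arg_names are translated;
    anything else is passed through unchanged.
    """
    python_to_r = {name.replace('.', '_'): name for name in r_arg_names}
    return {python_to_r.get(key, key): value for key, value in kwargs.items()}


r_kwargs = translate_kwargs(['na.rm'], {'na_rm': True})
print(r_kwargs)
```

The resulting dictionary can be unpacked into a call, for example `rsum(values, **r_kwargs)`, because Python accepts non-identifier string keys when unpacking into a function that takes arbitrary keyword arguments.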



## Conclusion - Online example

I found an excellent [demonstration](http://www.r-bloggers.com/ggplot2-in-python-a-major-barrier-broken/) of rpy2 for a purpose that I believe is better served in R than in Python: plotting. I urge you to read the linked post, as the author succinctly justifies the need to use R.

In [17]:
# Import the necessary modules
import numpy as np
import pandas as pd
import rpy2.robjects as robj
import rpy2.robjects.pandas2ri # for dataframe conversion
from rpy2.robjects.packages import importr
 
# First, make some random data
x = np.random.normal(loc = 5, scale = 2, size = 10)
y = x + np.random.normal(loc = 0, scale = 2, size = 10)
 
# Make these into a pandas dataframe. I do this because
# more often than not, I read in a pandas dataframe, so this
# shows how to use a pandas dataframe to plot in ggplot
testData = pd.DataFrame( {'x':x, 'y':y} )
# it looks just like a dataframe from R
print(testData)
 
# Next, make an R object containing a function that draws the plot.
# The body of the function is pure R, so it can be anything.
# Note that the R environment is blank to start, so ggplot2 has to be
# loaded
plotFunc = robj.r("""
library(ggplot2)

function(df) {
    p <- ggplot(df, aes(x, y)) +
        geom_point()

    print(p)
}
""")
 
# import graphics devices. This is necessary to shut the graph off
# otherwise it just hangs and freezes python
gr = importr('grDevices')
 
# convert the testData to an R dataframe
# (py2ri is the rpy2 2.x name; rpy2 3.x renamed it to py2rpy)
robj.pandas2ri.activate()
testData_R = robj.conversion.py2ri(testData)
 
# run the plot function on the dataframe
plotFunc(testData_R)
 
# ask for input. This requires you to press enter, otherwise the plot
# window closes immediately
input()
 
# shut down the window using dev_off()
gr.dev_off()
 
# you can even save the output once you like it
plotFunc_2 = robj.r("""
library(ggplot2)

function(df) {
    p <- ggplot(df, aes(x, y)) +
        geom_point() +
        theme(
            panel.background = element_rect(fill = NA, color = 'black')
        )

    ggsave('rpy2_magic.pdf', plot = p, width = 6.5, height = 5.5)
}
""")
 
plotFunc_2(testData_R)

          x          y
0  3.295511   3.199273
1  3.204440   3.775582
2  6.538979  10.122847
3  5.481967   1.176281
4  2.126166   1.343560
5  5.450126   5.545861
6  4.552411   6.335435
7  6.066880   7.215155
8  4.411735   3.960410
9  3.717540   1.206754



rpy2.rinterface.NULL