## How to use R with Python in Jupyter notebooks

The setup for this can be a little tricky. 
1. First of all, install Anaconda (installation instructions [here](https://conda.io/docs/user-guide/install/)). 
2. Once Anaconda is installed, set up a conda environment to work in. An environment is like a self-contained coding sandbox: you don't have to make root changes to your system, and if something gets majorly screwed up, you can just delete the environment and start over, with no ill effects on your system. Instructions on setting up conda environments are [here](https://conda.io/docs/user-guide/tasks/manage-environments.html). Call this environment whatever you like (mine is called bayes_env). Make sure to create it with the latest version of Python (3.6 at the time of this writing).
3. Once the environment is created and activated, go ahead and use conda install to install all the major Python packages (like matplotlib, numpy, pandas, jupyter, scipy, etc.).
4. The great thing about Anaconda is that you can install R directly into your conda environment. Make sure your environment is activated and then follow the instructions [here](https://conda.io/docs/user-guide/tasks/use-r-with-conda.html). This R installation, along with all of the packages that it uses, will only be usable within this environment.
5. Next, conda install rpy2 (still in your environment).
6. R uses a FORTRAN compiler to compile many of its packages. Unfortunately, now that R is in a conda environment, it won't use the default FORTRAN compiler already on your system, since Anaconda recently switched to using its own compilers, instead of system-based compilers. Therefore, run [these commands](https://anaconda.org/anaconda/gfortran_linux-64) to get the Anaconda-friendly FORTRAN compiler on your system that R can talk with.
7. Now that gfortran is installed, most R packages should be available to you for installation.
8. Lastly, many R packages can be directly installed using conda - remember that this is an option if installing them directly from R is getting messy (as, for example, the NLoptr library was for me - you can install the NLoptr library by following these instructions [here](https://anaconda.org/conda-forge/r-nloptr)).
9. The rest of this notebook is a rough template for turning R functions into functions you can call from Python, and for converting R dataframes to Python dataframes. You can also find a template for R magics [here](http://simecek.xyz/blog/2017/04/03/r-magic-in-jupyter-notebooks/).
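
Steps 2 through 8 above can be sketched as a small dry-run helper that only prints the conda commands instead of running them. This is a sketch under assumptions, not the exact commands from the linked pages: `build_setup_commands` is a hypothetical helper, the `r-base` and `r-essentials` package names and the `anaconda` and `conda-forge` channels are read off the linked pages, and older conda versions use `source activate` rather than `conda activate`.

```python
def build_setup_commands(env_name="bayes_env", python_version="3.6"):
    """Return the conda commands for steps 2-8 as strings (nothing is executed)."""
    return [
        # Step 2: create and activate the environment
        f"conda create --name {env_name} python={python_version}",
        f"conda activate {env_name}",
        # Step 3: the major Python packages
        "conda install matplotlib numpy pandas jupyter scipy",
        # Step 4: R inside the environment (package names are an assumption)
        "conda install r-base r-essentials",
        # Step 5: the Python-to-R bridge
        "conda install rpy2",
        # Step 6: the conda-provided Fortran compiler (Linux 64-bit package)
        "conda install -c anaconda gfortran_linux-64",
        # Step 8: an R package installed through conda instead of from R
        "conda install -c conda-forge r-nloptr",
    ]


for command in build_setup_commands():
    print(command)
```

Printing the commands first makes it easy to review them, or paste them one at a time into a terminal, before anything touches the environment.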

In [1]:
%load_ext rpy2.ipython

In [2]:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import numpy as np

In [3]:
# This is supposed to convert R dataframes to pandas dataframes, but I haven't quite gotten it to work yet.
# For now, I just cast R output to an np.array.
import pandas as pd
from rpy2.robjects import pandas2ri
pandas2ri.activate()

In [4]:
# How to load R packages directly into python to use their functionality
base = importr('base')

Can use R magic (%R) for one-line R commands, as below:

In [5]:
%R require(mvtnorm)




array([1], dtype=int32)

Can also use R magic (%%R) to run a whole cell in R. This magic needs to be the first thing in the cell, even preceding comments. The general form is %%R -i {input} -o {output}, where -i passes a Python variable into R and -o returns an R variable to Python.

In [6]:
%%R -o example_fit

source('~/Desktop/research/sparsebayes/bcr/BayesPen.R')
library(mvtnorm)
rho = 0.9
sigma = 1
n=2500
p=50
times = 1:p
H = abs(outer(times, times, "-"))
V = sigma * rho^H
set.seed(7)

beta = rep(0,p)
# beta[11:15] = runif(5)
# beta[36:40] = runif(5)
# runif(5) is kept so the random stream is unchanged; only the first draw is stored
beta[11] = runif(5)[1]
beta[3] = runif(5)[1]
beta[20] = runif(5)[1]
beta[49] = runif(5)[1]
beta[9] = runif(5)[1]
x = rmvnorm(n,rep(0,p),V)
y = x%*%beta + rnorm(n)

# Fit the model
prior = list(varE=list(df=3,S=1),varBR=list(df=3,S=1))
example_fit = Bayes.pen(y, x, prior=prior, nIter=10000)
# example_fit










Iter:  200 time/iter:  0.001 varE:  0.97 varB:  0.068
------------------------------------------------------------
Iter:  400 time/iter:  0 varE:  1.009 varB:  0.056
------------------------------------------------------------
Iter:  600 time/iter:  0.001 varE:  0.983 varB:  0.043
------------------------------------------------------------
Iter:  800 time/iter:  0 varE:  0.978 varB:  0.066
------------------------------------------------------------
Iter:  1000 time/iter:  0.001 varE:  0.988 varB:  0.046
------------------------------------------------------------
Iter:  1200 time/iter:  0 varE:  0.948 varB:  0.054
------------------------------------------------------------
Iter:  1400 time/iter:  0.001 varE:  0.987 varB:  0.069
------------------------------------------------------------
Iter:  1600 time/iter:  0.001 varE:  0.984 varB:  0.057
------------------------------------------------------------
Iter:  1800 time/iter:  0.001 varE:  0.983 varB:  0.065
-------------------------

In [7]:
orders = np.array(example_fit.rx('order.joint'))
df_p=pd.DataFrame({'orders':orders.flatten()})
df_p[:5]

|   | orders |
|---|--------|
| 0 | 11 |
| 1 | 3 |
| 2 | 9 |
| 3 | 20 |
| 4 | 49 |


Can also pass in a function as an R string:

In [8]:
rstring = """
function(){
    rm(list=ls(all=TRUE))
    source('~/Desktop/research/sparsebayes/bcr/BayesPen.R')
    library(mvtnorm)
    rho = 0.9
    sigma = 1
    n=2500
    p=50
    times = 1:p
    H = abs(outer(times, times, "-"))
    V = sigma * rho^H
    set.seed(7)

    beta = rep(0,p)
    # beta[11:15] = runif(5)
    # beta[36:40] = runif(5)
    # runif(5) is kept so the random stream is unchanged; only the first draw is stored
    beta[11] = runif(5)[1]
    beta[3] = runif(5)[1]
    beta[20] = runif(5)[1]
    beta[49] = runif(5)[1]
    beta[9] = runif(5)[1]
    x = rmvnorm(n,rep(0,p),V)
    y = x%*%beta + rnorm(n)

    # Fit the model
    prior = list(varE=list(df=3,S=1),varBR=list(df=3,S=1))
    example_fit = Bayes.pen(y, x, prior=prior, nIter=10000)
    # example_fit<-data.frame(example_fit)
    example_fit
    }
"""

In [9]:
# This parses string objects to turn them into R functions
rfunc = robjects.r(rstring)

In [10]:
fit = rfunc()

Iter:  200 time/iter:  0 varE:  0.97 varB:  0.068
------------------------------------------------------------
Iter:  400 time/iter:  0.001 varE:  1.009 varB:  0.056
------------------------------------------------------------
Iter:  600 time/iter:  0.001 varE:  0.983 varB:  0.043
------------------------------------------------------------
Iter:  800 time/iter:  0.001 varE:  0.978 varB:  0.066
------------------------------------------------------------
Iter:  1000 time/iter:  0.001 varE:  0.988 varB:  0.046
------------------------------------------------------------
Iter:  1200 time/iter:  0 varE:  0.948 varB:  0.054
------------------------------------------------------------
Iter:  1400 time/iter:  0 varE:  0.987 varB:  0.069
------------------------------------------------------------
Iter:  1600 time/iter:  0 varE:  0.984 varB:  0.057
------------------------------------------------------------
Iter:
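
The function above hardcodes all of its settings. A variation that may be handy is an R function that takes arguments, so the same parsed function can be called from Python with different values. The sketch below rebuilds only the covariance matrix V from the cell above. `R_COV_SOURCE`, `make_r_function` and `ar1_covariance` are hypothetical names introduced for illustration, and the rpy2 import sits inside the function only so that the definitions load even where rpy2 is not installed.

```python
# R source for a function of (p, rho, sigma), mirroring how V is built above
R_COV_SOURCE = """
function(p, rho, sigma) {
    times <- 1:p
    H <- abs(outer(times, times, "-"))
    sigma * rho^H
}
"""


def make_r_function(source):
    """Parse R source text into an R function callable from Python.

    Requires rpy2 and a working R installation at call time.
    """
    import rpy2.robjects as robjects  # imported lazily, see the note above
    return robjects.r(source)


def ar1_covariance(p, rho, sigma):
    """Call the parsed R function with plain Python numbers as arguments."""
    r_cov = make_r_function(R_COV_SOURCE)
    return r_cov(p, rho, sigma)
```

In the notebook, something like `V = ar1_covariance(50, 0.9, 1)` should give back the same matrix as the hardcoded version, either as an R matrix object or as a numpy array, depending on which rpy2 converters are active.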

In [11]:
fit.names[6]

'order.joint'

In [12]:
# Extract a vector from the fit using .rx, convert it to an np.array, and then to a pandas dataframe
orders = np.array(fit.rx('order.joint'))
df_p=pd.DataFrame({'orders':orders.flatten()})
df_p[:5]

|   | orders |
|---|--------|
| 0 | 11 |
| 1 | 3 |
| 2 | 9 |
| 3 | 20 |
| 4 | 49 |


Look into runtime performance for magic vs rfunc
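
One possible starting point for that comparison is a small timing harness built on the standard library. This is only a sketch: `time_callable`, `summarize` and `placeholder_workload` are hypothetical names, and the placeholder stands in for the real calls, which are not timed here.

```python
import statistics
import time


def time_callable(func, repeats=5):
    """Call func `repeats` times and return each call's wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Format the median and minimum of a list of durations."""
    return (
        f"{label}: median {statistics.median(durations):.4f}s, "
        f"min {min(durations):.4f}s over {len(durations)} runs"
    )


def placeholder_workload():
    """Stand-in for the real call, so the harness runs without R."""
    return sum(i * i for i in range(10_000))


timings = time_callable(placeholder_workload, repeats=3)
print(summarize("placeholder", timings))
```

In the notebook, `rfunc` can be passed to `time_callable` directly. The magic version could be wrapped in a function that calls `get_ipython().run_cell_magic("R", "", r_code)`, where `r_code` is a string holding the cell body. With nIter=10000 each call is slow, so a small `repeats` is probably enough.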