## Calling R from Python

The rpy2 package makes R available from within Python.

Install it with Anaconda:

```bash
conda install -c conda-forge r-base=4.0.3 rpy2
```



On Windows, the location of R has to be set separately, before rpy2 is imported:

```py
import os
os.environ["R_HOME"] = r"C:\Users\windroc\Anaconda3\envs\nwpc-data\lib\R"
os.environ["PATH"]   = r"C:\Users\windroc\Anaconda3\envs\nwpc-data\lib\R\bin\x64" + ";" + os.environ["PATH"]
```
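The two assignments above follow one pattern: `R_HOME` points at `lib\R` inside the conda environment, and `PATH` gets that directory's `bin\x64` prepended. A minimal sketch of that pattern is shown below. The helper name `r_env_for_conda_prefix` is hypothetical, and the directory layout is assumed to match the paths shown above.

```python
import ntpath


def r_env_for_conda_prefix(prefix, current_path):
    """Return R_HOME and PATH values for a conda environment on Windows.

    Assumes the layout used above: R lives in <prefix>/lib/R and its
    64-bit binaries in <prefix>/lib/R/bin/x64 (hypothetical helper).
    """
    # ntpath always joins with backslashes, whatever OS runs this sketch.
    r_home = ntpath.join(prefix, "lib", "R")
    r_bin = ntpath.join(r_home, "bin", "x64")
    # Windows separates PATH entries with ";"; R's directory goes first.
    return {"R_HOME": r_home, "PATH": r_bin + ";" + current_path}


env = r_env_for_conda_prefix(
    r"C:\Users\windroc\Anaconda3\envs\nwpc-data", r"C:\Windows"
)
print(env["R_HOME"])
print(env["PATH"])
```

In a real session, pass `os.environ["PATH"]` as `current_path` and apply the result with `os.environ.update(env)` before importing rpy2.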

The code below is taken from the official rpy2 documentation:

https://rpy2.github.io/doc/latest/html/introduction.html

In [28]:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

### Loading R packages

In [2]:
r_base = importr('base')

### Evaluating R code

Evaluate R code given as a string:

In [3]:
pi = robjects.r("pi")
pi[0]

3.141592653589793

In [4]:
robjects.r('''
    # create a function `f`
    f <- function(r, verbose=FALSE) {
        if (verbose) {
            cat("I am calling f().\n")
        }
        2 * pi * r
    }
    # call the function `f` with argument value 3
    f(3)
''')

0
18.849556
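
As a sanity check on the value returned through rpy2, the R function `f` can be mirrored in plain Python. This is only an illustrative counterpart that uses `math.pi` in place of R's `pi`. It is not part of the rpy2 documentation.

```python
import math


def f(r, verbose=False):
    """Python counterpart of the R function `f`: circumference for radius r."""
    if verbose:
        print("I am calling f().")
    return 2 * math.pi * r


circumference = f(3)
print(round(circumference, 6))
```

Rounded to six decimals, this matches the `18.849556` shown in the R result above.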


In [5]:
r_f = robjects.globalenv['f']
print(r_f.r_repr())

function (r, verbose = FALSE) 
{
    if (verbose) {
        cat("I am calling f().\n")
    }
    2 * pi * r
}


In [6]:
res = r_f(3)
res

0
18.849556


### Vectors

In [7]:
len(robjects.r["pi"])

1

In [8]:
robjects.r["pi"][0]

3.141592653589793

Create vectors:

In [9]:
res = robjects.StrVector([
    "abc",
    "def"
])
print(res.r_repr())

c("abc", "def")


In [10]:
res = robjects.IntVector([
    1, 2, 3
])
print(res.r_repr())

1:3


In [11]:
res = robjects.FloatVector([
    1.1, 2.2, 3.3
])
print(res.r_repr())

c(1.1, 2.2, 3.3)


In [12]:
v = robjects.FloatVector([
    1.1, 2.2, 3.3,
    4.4, 5.5, 6.6,
])
m = robjects.r["matrix"](v, nrow=2)
print(m)

     [,1] [,2] [,3]
[1,]  1.1  3.3  5.5
[2,]  2.2  4.4  6.6
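
The printed matrix shows that R's `matrix` fills column by column by default, so `1.1, 2.2` become the first column, not the first row. The pure-Python sketch below reproduces that layout to make the index arithmetic explicit. The helper `column_major_matrix` is hypothetical, and unlike R it refuses lengths that are not a multiple of `nrow` instead of recycling values.

```python
def column_major_matrix(values, nrow):
    """Arrange a flat list into rows, filling column by column as R does."""
    if len(values) % nrow != 0:
        raise ValueError("length of values must be a multiple of nrow")
    ncol = len(values) // nrow
    # Element (i, j) of the matrix is values[i + j * nrow].
    return [[values[i + j * nrow] for j in range(ncol)] for i in range(nrow)]


rows = column_major_matrix([1.1, 2.2, 3.3, 4.4, 5.5, 6.6], nrow=2)
for row in rows:
    print(row)
```

To fill row by row on the R side instead, R's `matrix` accepts `byrow=TRUE`, which from Python would be passed as `byrow=True`.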



### Calling R functions

In [13]:
rsum = robjects.r["sum"]
rsum(robjects.IntVector([1, 2, 3]))[0]

6

In [14]:
rsort = robjects.r["sort"]
res = rsort(robjects.IntVector([1, 2, 3]), decreasing=True)
print(res.r_repr())

3:1


### Example

In [30]:
from rpy2.robjects import r

In [31]:
stats = importr("stats")

In [48]:
a = robjects.IntVector([5, 12, 13])
b = robjects.IntVector([10, 28, 30])
robjects.globalenv["v1"] = a
robjects.globalenv["v2"] = b
lm_result = stats.lm("v2 ~ v1")
print(r_base.summary(lm_result))


Call:
(function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    [the rest of this output is truncated in the source]

The `Call:` field prints the full source of `lm` instead of a short call because rpy2 hands the function object itself to R.

In [38]:
print(stats.anova(lm_result))

Analysis of Variance Table

Response: v2
          Df Sum Sq Mean Sq F value  Pr(>F)  
v1         1 242.53  242.53    1728 0.01531 *
Residuals  1   0.14    0.14                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
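
The entries of this table can be reproduced by hand, which helps when reading what `anova` reports. The sketch below recomputes the sums of squares, the F value and the p-value for the same data in plain Python. The closed-form p-value holds only because there is a single residual degree of freedom, in which case the t statistic follows a standard Cauchy distribution. It is not a general formula.

```python
import math

x = [5, 12, 13]   # v1
y = [10, 28, 30]  # v2
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares fit of y on x.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Sum of squares explained by v1 (1 df) and left in the residuals (n - 2 df).
ss_model = slope ** 2 * sxx
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
f_value = (ss_model / 1) / (ss_resid / (n - 2))

# Special case: with 1 residual df, p = 1 - (2 / pi) * atan(sqrt(F)).
p_value = 1 - (2 / math.pi) * math.atan(math.sqrt(f_value))

print(f"Sum Sq: {ss_model:.2f} (v1), {ss_resid:.2f} (Residuals)")
print(f"F value: {f_value:.0f}, Pr(>F): {p_value:.5f}")
```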



In [49]:
print(lm_result.names)

 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        



In [50]:
print(lm_result.rx('coefficients'))

$coefficients
(Intercept)          v1 
  -2.596491    2.526316 
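
The coefficients can be cross-checked against the closed-form least-squares solution for a single predictor. The sketch below is plain Python and independent of rpy2. The function name `simple_linear_regression` is made up for this illustration.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Same data as v1 and v2 above.
intercept, slope = simple_linear_regression([5, 12, 13], [10, 28, 30])
print(f"(Intercept) {intercept:.6f}")
print(f"v1          {slope:.6f}")
```

Both values agree with the `$coefficients` output to six decimals.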


