In [1]:
library(trustyai)
library(rJava)

# Simple linear regression

We'll start with a linear regression on the `trees`[^1] dataset, predicting `Volume` from `Girth`, `Height`, and their interaction.

[^1]: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/trees.html

In [27]:
data(trees)
head(trees)

|   | Girth | Height | Volume |
|---|-------|--------|--------|
|   | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 8.3 | 70 | 10.3 |
| 2 | 8.6 | 65 | 10.3 |
| 3 | 8.8 | 63 | 10.2 |
| 4 | 10.5 | 72 | 16.4 |
| 5 | 10.7 | 81 | 18.8 |
| 6 | 10.8 | 83 | 19.7 |


In [28]:
regression <- lm(Volume ~ Girth * Height, data = trees)

In [29]:
summary(regression)


Call:
lm(formula = Volume ~ Girth * Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.5821 -1.0673  0.3026  1.5641  4.6649 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  69.39632   23.83575   2.911  0.00713 ** 
Girth        -5.85585    1.92134  -3.048  0.00511 ** 
Height       -1.29708    0.30984  -4.186  0.00027 ***
Girth:Height  0.13465    0.02438   5.524 7.48e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.709 on 27 degrees of freedom
Multiple R-squared:  0.9756,	Adjusted R-squared:  0.9728 
F-statistic: 359.3 on 3 and 27 DF,  p-value: < 2.2e-16
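
To make the role of the `Girth:Height` interaction concrete, the prediction for the first tree can be rebuilt by hand from the fitted coefficients. This is a minimal sketch that refits the same model; the names `b`, `first`, `by_hand`, and `from_predict` are introduced here for illustration only.

```r
# Refit the model from the cell above on the built-in trees data.
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)

b <- coef(regression)
first <- trees[1, ]

# Intercept + main effects + interaction term for the first tree.
by_hand <- b[["(Intercept)"]] +
  b[["Girth"]] * first$Girth +
  b[["Height"]] * first$Height +
  b[["Girth:Height"]] * first$Girth * first$Height

# The same value as computed by predict().
from_predict <- as.double(predict(regression, first[, c("Girth", "Height")]))
all.equal(by_hand, from_predict)
```

Because of the interaction, the effect of `Girth` on the prediction depends on `Height` (and vice versa), so neither main-effect coefficient should be read in isolation.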


We create an R prediction function which, in this case, takes a data frame as input and returns the model's predictions.

In [30]:
prediction_fn <- function(df) {
  return(predict(regression, df))
}
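
A quick way to sanity-check such a function is to call it on a data frame with several rows and confirm that it returns one numeric prediction per row. The sketch below assumes, without confirmation from the TrustyAI documentation, that an explainer may call the function on a batch of perturbed inputs; the `batch` data frame is a hypothetical example.

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)
prediction_fn <- function(df) {
  return(predict(regression, df))
}

# Hypothetical batch of inputs: the first tree plus two perturbed copies.
batch <- data.frame(
  Girth  = c(8.3, 8.8, 7.8),
  Height = c(70, 68, 72)
)
preds <- prediction_fn(batch)

# One numeric prediction per input row.
stopifnot(is.numeric(preds), length(preds) == nrow(batch))
```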

We create an input from the first row of the dataset, keeping only the two predictor columns, and store its prediction.

In [31]:
input <- trees[1,1:2]
pred <- as.double(prediction_fn(input))
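
The `as.double()` call matters because `predict()` on an `lm` object returns a named numeric vector, with names taken from the row names of the input. A minimal sketch of the difference (the variable `raw` is introduced here for illustration):

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)

raw <- predict(regression, trees[1, 1:2])
names(raw)    # carries the row name of the input

pred <- as.double(raw)
names(pred)   # NULL: attributes such as names are dropped
```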

In [32]:
input

|   | Girth | Height |
|---|-------|--------|
|   | `<dbl>` | `<dbl>` |
| 1 | 8.3 | 70 |


In [33]:
pred

We wrap the prediction function in a TrustyAI `Model`.

In [34]:
model <- Model(prediction_fn)

We convert the input to features and the prediction to an `Output`.

In [37]:
features <- df_to_features(input)

In [38]:
output <- c(create_output("Volume", pred))

Request the saliencies for this input/output pair from the LIME explainer:

In [39]:
saliencies <- lime(features, output, model)

In [40]:
cat(saliencies$asTable())

  Feature      Value |  Saliency  | Confidence
----------------------------------------------
 Girth =       8.300 |     2.833         0.000
----------------------------------------------
          Prediction |     8.231              
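
For intuition about what a LIME-style explainer does, the general idea can be sketched in base R: perturb the input, query the model, and fit a proximity-weighted linear surrogate whose slopes serve as local importances. This is not TrustyAI's implementation; the perturbation spreads, the Gaussian kernel, and the sample size below are arbitrary assumptions, so the numbers are not expected to match the table above.

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)
prediction_fn <- function(df) predict(regression, df)

set.seed(1)
x0 <- trees[1, 1:2]
n  <- 500

# Perturb each feature around the input (the spreads are arbitrary choices).
samples <- data.frame(
  Girth  = x0$Girth  + rnorm(n, sd = 0.5),
  Height = x0$Height + rnorm(n, sd = 2)
)
samples$Volume <- as.double(prediction_fn(samples[, c("Girth", "Height")]))

# Weight samples by closeness to the input, using standardised distances.
d <- sqrt(((samples$Girth - x0$Girth) / 0.5)^2 +
          ((samples$Height - x0$Height) / 2)^2)
w <- exp(-d^2 / 2)

# Weighted linear surrogate: its slopes act as local feature importances.
surrogate <- lm(Volume ~ Girth + Height, data = samples, weights = w)
local_slopes <- coef(surrogate)[c("Girth", "Height")]
local_slopes
```

For this particular model the surrogate's `Girth` slope should land close to the analytic local slope, which is the `Girth` coefficient plus the interaction coefficient times the input's `Height`.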

# Random forests

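Next, we fit a random forest to the Boston housing dataset (from the `MASS` package) and explain one of its predictions in the same way.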

In [41]:
library(randomForest)
library(MASS)

In [42]:
set.seed(23)
head(Boston)

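|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|-------|-------|------|
|   | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |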


In [43]:
train <- sample(1:nrow(Boston), 300)

In [44]:
rf <- randomForest(medv ~ ., data = Boston, subset = train)
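
Since only 300 rows were sampled for training, the remaining rows can serve as a held-out set for a rough check of the forest before explaining it. This is a sketch that repeats the fit above so it runs on its own; the names `test`, `test_pred`, and `rmse` are introduced here for illustration, and the exact error depends on the R and `randomForest` versions in use.

```r
library(randomForest)
library(MASS)

set.seed(23)
train <- sample(1:nrow(Boston), 300)
rf <- randomForest(medv ~ ., data = Boston, subset = train)

# Rows not drawn into the training sample serve as a test set.
test <- Boston[-train, ]
test_pred <- predict(rf, test)

# Root mean squared error, in the units of medv.
rmse <- sqrt(mean((test$medv - test_pred)^2))
rmse
```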

In [45]:
prediction_fn <- function(df) {
  return(predict(rf, df))
}

In [46]:
input <- Boston[1,1:13]

pred <- as.double(prediction_fn(input))

In [47]:
input

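|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|-------|-------|
|   | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 |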


In [48]:
pred

In [49]:
model <- Model(prediction_fn)

In [50]:
features <- df_to_features(input)
output <- c(create_output("medv", pred))

In [51]:
saliencies <- lime(inputs=features, output=output, model=model)

In [52]:
cat(saliencies$asTable())

  Feature      Value |  Saliency  | Confidence
----------------------------------------------
 indus =       2.310 |     8.732         0.000
   nox =       0.538 |     8.732         0.000
 lstat =       4.980 |     8.732         0.000
  crim =       0.006 |     0.000         0.000
    zn =      18.000 |     0.000         0.000
----------------------------------------------
          Prediction |    27.508              
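
LIME saliencies are local: they describe this one input. As a point of comparison, `randomForest` also provides a global importance measure through `importance()`, which for a regression forest fitted with default settings reports the total decrease in node impurity per feature. The sketch below refits the forest so it runs on its own; the two rankings answer different questions and need not agree.

```r
library(randomForest)
library(MASS)

set.seed(23)
train <- sample(1:nrow(Boston), 300)
rf <- randomForest(medv ~ ., data = Boston, subset = train)

# Global importance: total decrease in node impurity per feature.
imp <- importance(rf)

# Sort features from most to least important.
imp[order(imp[, "IncNodePurity"], decreasing = TRUE), , drop = FALSE]
```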