In [1]:
library(trustyai)
library(rJava)

# Simple linear regression

We'll start with a simple linear regression on the `trees`[^1] dataset, predicting timber `Volume` from `Girth` and `Height`.

[^1]: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/trees.html

In [2]:
data(trees)
head(trees)

,Girth,Height,Volume
,<dbl>,<dbl>,<dbl>
1,8.3,70,10.3
2,8.6,65,10.3
3,8.8,63,10.2
4,10.5,72,16.4
5,10.7,81,18.8
6,10.8,83,19.7


In [3]:
regression <- lm(Volume ~ Girth * Height, data = trees)

In [4]:
summary(regression)


Call:
lm(formula = Volume ~ Girth * Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.5821 -1.0673  0.3026  1.5641  4.6649 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  69.39632   23.83575   2.911  0.00713 ** 
Girth        -5.85585    1.92134  -3.048  0.00511 ** 
Height       -1.29708    0.30984  -4.186  0.00027 ***
Girth:Height  0.13465    0.02438   5.524 7.48e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.709 on 27 degrees of freedom
Multiple R-squared:  0.9756,	Adjusted R-squared:  0.9728 
F-statistic: 359.3 on 3 and 27 DF,  p-value: < 2.2e-16


We define an R prediction function that takes a data frame as input and returns the model's predictions for its rows.

In [5]:
prediction_fn <- function(df) {
  return(predict(regression, df))
}
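Explainers such as LIME typically query the model on many perturbed inputs, so it helps if the prediction function handles data frames with more than one row. Whether TrustyAI passes rows one at a time or in batches is not stated here, so the following is only a quick sanity check of the function itself; the `batch` data frame is a hypothetical example, not part of the original notebook.

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)

prediction_fn <- function(df) {
  return(predict(regression, df))
}

# Hypothetical batch of three inputs, one per row
batch <- data.frame(Girth = c(8.3, 12.0, 18.2), Height = c(70, 75, 72))

# predict() returns one value per row of the data frame
batch_pred <- as.double(prediction_fn(batch))
batch_pred
```

Because `predict()` on an `lm` object is vectorised over rows, the same function serves both single inputs and batches.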

Create an input and store the prediction.

In [6]:
input <- data.frame(Girth = 18.2, Height = 72)
pred <- as.double(prediction_fn(input))

In [7]:
input

Girth,Height
<dbl>,<dbl>
18.2,72


In [8]:
pred
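As a sanity check, the prediction can be reproduced by hand from the fitted coefficients. The formula `Girth * Height` expands to `Girth + Height + Girth:Height`, so the prediction is the intercept plus three terms. The sketch below refits the same model so that it runs on its own; `manual` is a name introduced here for illustration.

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)
input <- data.frame(Girth = 18.2, Height = 72)

# Named coefficient vector: (Intercept), Girth, Height, Girth:Height
b <- coef(regression)

# Evaluate the regression equation term by term
manual <- b[["(Intercept)"]] +
  b[["Girth"]] * input$Girth +
  b[["Height"]] * input$Height +
  b[["Girth:Height"]] * input$Girth * input$Height

pred <- as.double(predict(regression, input))

# TRUE if the hand computation matches predict() up to numerical tolerance
all.equal(manual, pred)
```

Both values should come out at roughly 45.88, in line with the prediction reported in the explainer's table.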

We wrap the prediction function in a TrustyAI `Model`.

In [9]:
model <- Model(prediction_fn)

We convert the input to a vector of features and wrap the prediction in an `Output`.

In [10]:
features <- c(
  feature(name="Girth", type="number", value=18.2),
  feature(name="Height", type="number", value=72.0))

In [11]:
output <- c(create_output("Volume", pred))

We request the saliencies for this input/output pair from the LIME explainer:

In [12]:
saliencies <- lime(features, output, model)

In [13]:
cat(saliencies$asTable())

  Feature      Value |  Saliency  | Confidence
----------------------------------------------
 Girth =      18.200 |     0.000         0.000
----------------------------------------------
          Prediction |    45.881              
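
To build some intuition for what a LIME-style explainer computes, the following is a minimal sketch in base R. It is not TrustyAI's implementation: the Gaussian perturbation, the kernel width of 1, the sample size, and all variable names are assumptions made for illustration. The idea is to perturb the input, query the model on the perturbed samples, weight each sample by its proximity to the original input, and fit a weighted linear surrogate whose coefficients act as local feature weights.

```r
data(trees)
regression <- lm(Volume ~ Girth * Height, data = trees)
prediction_fn <- function(df) predict(regression, df)

set.seed(1)
x0 <- data.frame(Girth = 18.2, Height = 72)
n <- 500  # number of perturbed samples (assumed)

# Perturb each feature with Gaussian noise scaled to its spread in the data
sd_girth <- sd(trees$Girth)
sd_height <- sd(trees$Height)
samples <- data.frame(
  Girth  = rnorm(n, mean = x0$Girth,  sd = sd_girth),
  Height = rnorm(n, mean = x0$Height, sd = sd_height)
)

# Query the black-box model on the perturbed samples
samples$Volume <- as.double(prediction_fn(samples))

# Weight samples by proximity to x0 (Gaussian kernel, width 1, assumed)
d <- sqrt(((samples$Girth - x0$Girth) / sd_girth)^2 +
          ((samples$Height - x0$Height) / sd_height)^2)
w <- exp(-d^2 / 2)

# Fit a weighted linear surrogate; its slopes are the local feature weights
surrogate <- lm(Volume ~ Girth + Height, data = samples, weights = w)
local_weights <- coef(surrogate)[c("Girth", "Height")]
local_weights
```

The numbers produced by this sketch are not expected to match the table above, since the sampling scheme, weighting, and scaling of saliencies in a real explainer will differ.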

# Random forests

Next, we fit a random forest to the Boston housing dataset (from the `MASS` package) and explain one of its predictions.

In [14]:
require(randomForest)
require(MASS)

Loading required package: randomForest

randomForest 4.7-1.1

Type rfNews() to see new features/changes/bug fixes.

Loading required package: MASS



In [15]:
set.seed(23)  # make the random split reproducible

# Randomly pick 300 rows of Boston to train on
train <- sample(1:nrow(Boston), 300)

In [16]:
rf <- randomForest(medv ~ ., data = Boston, subset = train)
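
Since only 300 rows are used for training, the remaining rows can serve as a held-out set to check how well the forest generalises before explaining it. This step is not part of the original notebook; it is a small sketch using the same seed and split, with `test_set`, `test_pred`, and `rmse` as illustrative names.

```r
library(randomForest)
library(MASS)

set.seed(23)
train <- sample(1:nrow(Boston), 300)
rf <- randomForest(medv ~ ., data = Boston, subset = train)

# Rows not sampled for training form the held-out set
test_set <- Boston[-train, ]
test_pred <- predict(rf, newdata = test_set)

# Root mean squared error on the held-out rows (in units of medv)
rmse <- sqrt(mean((test_set$medv - test_pred)^2))
rmse
```

A low held-out error gives some reassurance that the explanations describe a model that fits the data reasonably well.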

In [17]:
prediction_fn <- function(df) {
  return(predict(rf, df))
}

In [18]:
input <- data.frame(crim=0.02731, zn=0, indus=7.07, chas=0, nox=0.469,
                    rm=6.421, age=78.9, dis=4.9671, rad=2, tax=242,
                    ptratio=17.8, black=396.9, lstat=9.14)

pred <- as.double(prediction_fn(input))

In [19]:
model <- Model(prediction_fn)

In [20]:
features <- df_to_features(input)
output <- c(create_output("medv", pred))

In [21]:
saliencies <- lime(inputs=features, output=output, model=model)

In [22]:
cat(saliencies$asTable())

  Feature      Value |  Saliency  | Confidence
----------------------------------------------
  crim =       0.027 |     4.713         0.000
 indus =       7.070 |     4.713         0.000
    rm =       6.421 |     4.713         0.000
   dis =       4.967 |     4.713         0.000
   tax =     242.000 |     4.713         0.000
----------------------------------------------
          Prediction |    22.985              