---
title: "Fitting tabnet with tidymodels"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Fitting tabnet with tidymodels}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```
```{r setup}
library(tabnet)
library(tidymodels)
library(modeldata)
```
In this vignette we show how to create a TabNet model using the tidymodels interface.
We are going to use the `lending_club` dataset available
in the `modeldata` package.
First, let's split our dataset into training and testing sets so we can later assess the performance of our model:
```{r}
set.seed(123)
data("lending_club", package = "modeldata")
split <- initial_split(lending_club, strata = Class)
train <- training(split)
test <- testing(split)
```
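As a quick sanity check on the stratified split, the sketch below compares the class proportions of `Class` in the full data, the training set, and the test set. It is self-contained and repeats the split from above; the name `class_balance` is introduced here only for illustration.

```{r}
library(rsample)
library(modeldata)

data("lending_club", package = "modeldata")

# Same seed and split as above, repeated so this chunk runs on its own
set.seed(123)
split <- initial_split(lending_club, strata = Class)

# Proportion of each class in the full data, the training set and the test set.
# `class_balance` is an illustrative name, not part of the original vignette.
class_balance <- rbind(
  full  = prop.table(table(lending_club$Class)),
  train = prop.table(table(training(split)$Class)),
  test  = prop.table(table(testing(split)$Class))
)
round(class_balance, 3)
```

If the stratification worked as intended, the three rows should be close to each other.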
We now define our pre-processing steps. Note that TabNet handles categorical variables, so we don't need to transform them in any way. Normalizing the numeric variables is still a good idea.
```{r}
rec <- recipe(Class ~ ., train) %>%
  step_normalize(all_numeric())
```
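To see what the recipe does to the data, one option is to `prep()` it and `bake()` the result. The sketch below is a simplified, self-contained variant: it assumes the recipe is estimated on the full `lending_club` data rather than on `train`, and the names `rec_check`, `baked` and `num_cols` are illustrative only.

```{r}
library(recipes)
library(modeldata)

data("lending_club", package = "modeldata")

# Same step as above, but estimated on the full data to keep the chunk standalone
rec_check <- recipe(Class ~ ., lending_club) %>%
  step_normalize(all_numeric())

# Estimate the means and standard deviations, then return the processed data
baked <- rec_check %>%
  prep() %>%
  bake(new_data = NULL)

# Numeric columns should now have mean ~0 and standard deviation ~1
num_cols <- vapply(baked, is.numeric, logical(1))
round(colMeans(baked[, num_cols]), 3)
round(apply(baked[, num_cols], 2, sd), 3)

# Non-numeric columns (including the outcome) are left untouched
names(baked)[!num_cols]
```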
Next, we define our model. We are going to train for 50 epochs with a batch size of 128. There are other hyperparameters, but we are going to use the defaults.
```{r}
mod <- tabnet(epochs = 50, batch_size = 128) %>%
  set_engine("torch", verbose = TRUE) %>%
  set_mode("classification")
```
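The other hyperparameters can be set in the same call if the defaults are not suitable. The sketch below sets `learn_rate` and `num_steps` in addition to the values used above. The specific values are arbitrary placeholders chosen for illustration, not recommendations, and `mod_custom` is a hypothetical name.

```{r}
library(tabnet)
library(parsnip)

# Illustrative values only: learn_rate and num_steps are assumptions made
# for this sketch, not tuned settings from the original vignette
mod_custom <- tabnet(
  epochs = 50,
  batch_size = 128,
  learn_rate = 0.02,
  num_steps = 3
) %>%
  set_engine("torch") %>%
  set_mode("classification")

# Printing the specification shows which arguments were overridden
mod_custom
```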
We also define our `workflow` object:
```{r}
wf <- workflow() %>%
  add_model(mod) %>%
  add_recipe(rec)
```
We can now define our cross-validation strategy:
```{r}
folds <- vfold_cv(train, v = 5)
```
We then fit the model on each resample:
```{r}
fit_rs <- wf %>%
  fit_resamples(folds)
```
After a few minutes we can get the results:
```{r}
collect_metrics(fit_rs)
```
```
# A tibble: 2 x 5
  .metric  .estimator  mean     n  std_err
  <chr>    <chr>      <dbl> <int>    <dbl>
1 accuracy binary     0.946     5 0.000713
2 roc_auc  binary     0.732     5 0.00539
```
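To put the accuracy in context, it can help to compare it with a trivial baseline that always predicts the most frequent class. The sketch below computes that baseline from the class frequencies in `lending_club`. The name `baseline_accuracy` is illustrative, and the baseline is computed on the full data as a simplifying assumption.

```{r}
library(modeldata)

data("lending_club", package = "modeldata")

# Accuracy of a classifier that always predicts the most frequent class,
# which equals the share of that class in the data
baseline_accuracy <- max(prop.table(table(lending_club$Class)))
baseline_accuracy
```

When the classes are imbalanced, this baseline can already be high, which is one reason to look at `roc_auc` alongside `accuracy`.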
Finally, we can verify the results on our test set:
```{r}
model <- wf %>% fit(train)
test %>%
  bind_cols(
    predict(model, test, type = "prob")
  ) %>%
  roc_auc(Class, .pred_bad)
```
```
# A tibble: 1 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.710
```