3. Sample Workflow

Steven Paul Sanderson II, MPH edited this page Nov 16, 2023 · 1 revision

Workflow

Here we will go over some sample workflows that will show how things work.

Fast Regression

In its simplest form, the fast_regression() function will create 39 different model specifications (provided the packages are installed and loaded) and make predictions on the data. The function is referred to as fast because all of the model parameters are left at their defaults, so no model tuning takes place.

Let's take a look at a sample fast regression workflow in its simplest form.

library(recipes)
library(dplyr)
library(purrr) # provides map(), used in the broom examples further down
library(tidyAML)

rec_obj <- recipe(mpg ~ ., data = mtcars)
frt_tbl <- fast_regression(
  .data = mtcars, 
  .rec_obj = rec_obj, 
  .parsnip_eng = c("lm","glm","gee"),
  .parsnip_fns = "linear_reg"
)

glimpse(frt_tbl)
#> Rows: 3
#> Columns: 8
#> $ .model_id       <int> 1, 2, 3
#> $ .parsnip_engine <chr> "lm", "gee", "glm"
#> $ .parsnip_mode   <chr> "regression", "regression", "regression"
#> $ .parsnip_fns    <chr> "linear_reg", "linear_reg", "linear_reg"
#> $ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
#> $ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ pred_wflw       <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]

> frt_tbl
# A tibble: 3 × 8
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw       fitted_wflw pred_wflw
      <int> <chr>           <chr>         <chr>        <list>     <list>     <list>      <list>   
1         1 lm              regression    linear_reg   <spec[+]>  <workflow> <workflow>  <tibble> 
2         2 gee             regression    linear_reg   <spec[+]>  <NULL>     <NULL>      <NULL>   
3         3 glm             regression    linear_reg   <spec[+]>  <workflow> <workflow>  <tibble> 

Here we see that nothing was created for the gee parsnip engine: its wflw, fitted_wflw, and pred_wflw entries are all NULL. What this means is that the way the models are built is, in its present state, flawed. Fortunately, these functions use purrr::safely behind the scenes, so when something fails it does so with a modicum of grace instead of stopping the whole run. This does not mean that the lm and glm models are not useful; as we see, they were generated successfully. Given this, let us examine each part of those models, starting with the model specs.
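To make the purrr::safely behavior concrete, here is a minimal sketch of the pattern. It is not the tidyAML source: fit_one() is a hypothetical stand-in for a fitting step that fails for one engine, and the sketch only shows how a wrapped function returns NULL for the failure while the remaining engines still run.

```r
library(purrr)

# Hypothetical fitting step (not a tidyAML function) that fails for one engine.
fit_one <- function(engine) {
  if (engine == "gee") stop("engine not available: ", engine)
  paste("fitted with", engine)
}

# safely() wraps a function so it never throws; the wrapped function always
# returns a list with two elements, `result` and `error` (one of them NULL).
safe_fit <- safely(fit_one, otherwise = NULL, quiet = TRUE)

engines <- c("lm", "gee", "glm")
fits <- map(engines, safe_fit)

# Extract the results; the failed engine yields NULL instead of aborting the loop.
results <- map(fits, "result")
failed <- engines[map_lgl(results, is.null)]
failed
```

The error object for the failed engine is still available under the error element of the corresponding list entry, so nothing is silently lost.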

Model Specifications

> frt_tbl |> pull(model_spec)
[[1]]
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
! parsnip could not locate an implementation for `linear_reg` regression model specifications using
  the `gee` engine. The parsnip extension package multilevelmod implements support for this specification. Please install (if needed) and load to continue.

Linear Regression Model Specification (regression)

Computational engine: gee 


[[3]]
Linear Regression Model Specification (regression)

Computational engine: glm 

The gee engine failed because the multilevelmod library was not loaded. There are a few helper functions that can be used for this, such as load_deps().
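As a rough sketch, the check below uses only base R to see whether the extension package is available before a model run. The helper missing_pkgs() is hypothetical and not part of tidyAML. Note also that loading multilevelmod may not be sufficient on its own, since the gee engine typically expects a clustering variable in the model formula.

```r
# Hypothetical helper (not part of tidyAML): report which packages are missing.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("parsnip", "multilevelmod")
to_install <- missing_pkgs(needed)

if (length(to_install) > 0) {
  message("Install before running: ", paste(to_install, collapse = ", "))
} else {
  # Attaching multilevelmod makes its engines available to parsnip.
  library(multilevelmod)
}
```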

Workflow

> frt_tbl |> pull(wflw)
[[1]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
NULL

[[3]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm 

Again, because of the previous failure for gee, no workflow was created.

Fitted Workflow

> frt_tbl |> pull(fitted_wflw)
[[1]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ──────────────────────────────────────────────────────────────────────────────────────────────

Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt         qsec  
   42.72540     -1.99677     -0.02254      0.03581      1.90888     -0.35753     -0.14563  
         vs           am         gear         carb  
    0.23074      3.58125     -2.93809     -1.26310  


[[2]]
NULL

[[3]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ──────────────────────────────────────────────────────────────────────────────────────────────

Call:  stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt         qsec  
   42.72540     -1.99677     -0.02254      0.03581      1.90888     -0.35753     -0.14563  
         vs           am         gear         carb  
    0.23074      3.58125     -2.93809     -1.26310  

Degrees of Freedom: 23 Total (i.e. Null);  13 Residual
Null Deviance:	    936.9 
Residual Deviance: 59.11 	AIC: 113.7

Again, gee fails for the aforementioned reason.

Predictions

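Let's get the predictions. Each successful model returns 8 predictions; the fitted models report 24 observations (nobs in the broom::glance output), so the remaining 8 of the 32 mtcars rows appear to have been held out for prediction: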

> frt_tbl |> pull(pred_wflw)
[[1]]
# A tibble: 8 × 1
  .pred
  <dbl>
1  30.2
2  18.4
3  28.9
4  16.2
5  17.3
6  14.7
7  27.4
8  29.6

[[2]]
NULL

[[3]]
# A tibble: 8 × 1
  .pred
  <dbl>
1  30.2
2  18.4
3  28.9
4  16.2
5  17.3
6  14.7
7  27.4
8  29.6

Again we see that gee failed.
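Since failed models leave NULL entries in the list columns, it can be handy to drop them before further work. The sketch below assumes a toy tibble (toy_tbl, with made-up prediction values) that mimics the column names of frt_tbl; the same filter should apply to the real table.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for frt_tbl, using the same column names as above.
toy_tbl <- tibble(
  .model_id = 1:3,
  .parsnip_engine = c("lm", "gee", "glm"),
  pred_wflw = list(
    tibble(.pred = c(30.2, 18.4)),
    NULL,
    tibble(.pred = c(30.2, 18.4))
  )
)

# Keep only the rows whose prediction entry is not NULL.
ok_tbl <- toy_tbl |>
  filter(!map_lgl(pred_wflw, is.null))

ok_tbl$.parsnip_engine
```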

Using broom

Since this package is built on top of parsnip, it fits nicely within the tidymodels ecosystem. This means we can use things like broom on the models. Let's take a look:

broom::tidy

> frt_tbl |> pull(fitted_wflw) |> map(broom::tidy)
[[1]]
# A tibble: 11 × 5
   term        estimate std.error statistic p.value
   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)  42.7      20.6        2.07   0.0589
 2 cyl          -2.00      1.20      -1.67   0.120 
 3 disp         -0.0225    0.0174    -1.30   0.218 
 4 hp            0.0358    0.0246     1.46   0.169 
 5 drat          1.91      1.66       1.15   0.272 
 6 wt           -0.358     1.89      -0.189  0.853 
 7 qsec         -0.146     0.773     -0.188  0.853 
 8 vs            0.231     2.02       0.114  0.911 
 9 am            3.58      2.09       1.71   0.111 
10 gear         -2.94      1.66      -1.77   0.100 
11 carb         -1.26      0.738     -1.71   0.111 

[[2]]
# A tibble: 0 × 0

[[3]]
# A tibble: 11 × 5
   term        estimate std.error statistic p.value
   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)  42.7      20.6        2.07   0.0589
 2 cyl          -2.00      1.20      -1.67   0.120 
 3 disp         -0.0225    0.0174    -1.30   0.218 
 4 hp            0.0358    0.0246     1.46   0.169 
 5 drat          1.91      1.66       1.15   0.272 
 6 wt           -0.358     1.89      -0.189  0.853 
 7 qsec         -0.146     0.773     -0.188  0.853 
 8 vs            0.231     2.02       0.114  0.911 
 9 am            3.58      2.09       1.71   0.111 
10 gear         -2.94      1.66      -1.77   0.100 
11 carb         -1.26      0.738     -1.71   0.111 
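The per-model tibbles above can also be stacked into a single table for side-by-side comparison. The sketch below is self-contained and fits lm and glm directly on all of mtcars as stand-ins for the fitted workflows, so its estimates will differ from the output above; the list names and the engine column name are chosen here for illustration.

```r
library(purrr)
library(dplyr)
library(broom)

# Fit the same two model types directly (stand-ins for the fitted workflows).
fits <- list(
  lm  = lm(mpg ~ ., data = mtcars),
  glm = glm(mpg ~ ., data = mtcars)
)

# Tidy each model, then stack the results, keeping the list name
# in an `engine` column so rows can be traced back to their model.
coef_tbl <- fits |>
  map(tidy) |>
  bind_rows(.id = "engine")

coef_tbl
```

With the real table, the same idea would apply to the fitted_wflw column; because the failed entry tidies to an empty tibble, it would contribute no rows.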

broom::glance

> frt_tbl |> pull(fitted_wflw) |> map(broom::glance)
[[1]]
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     0.937         0.888  2.13      19.3   3.35e-6    10  -44.9  114.  128.     59.1          13    24

[[2]]
# A tibble: 0 × 0

[[3]]
# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          937.      23  -44.9  114.  128.     59.1          13    24

broom::augment

> frt_tbl |> pull(fitted_wflw) |> map(\(x) x |> broom::augment(new_data = mtcars))
[[1]]
# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb .pred
 * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  22.0
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  21.8
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  30.2
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  20.9
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2  15.9
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  20.7
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4  16.1
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  22.6
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  23.9
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows

[[2]]
# A tibble: 0 × 0

[[3]]
# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb .pred
 * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  22.0
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  21.8
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  30.2
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  20.9
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2  15.9
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  20.7
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4  16.1
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  22.6
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  23.9
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows