Model tree diagram #673

barrettk · 2024-03-30T18:52:46Z

Introduces new model_tree() function, allowing you to create a tree diagram for your modeling directory

Some Example function calls

model_tree(model_dir)
model_tree(log_df)
model_tree(log_df, add_summary = FALSE)
model_tree(log_df, color_by = "star")
model_tree(log_df, static = TRUE)

log_df %>% add_summary() %>%
  model_tree(
    include_info = c("tags", "param_count", "eta_pval_significant"),
    color_by = "any_heuristics"
  )

log_df %>% add_config() %>%
  model_tree(
    include_info = c("model_has_changed", "data_has_changed", "nm_version"),
    color_by = "data_has_changed"
  )

Example output

model_tree(log_df,  color_by = "run")

Logical values will be colored as white and red, and are referred to as 'border colors'. The more unique values are added, the more gradient colors between those two border colors are generated.

model_tree(log_df, color_by = "star")

This example illustrates how NA values are handled. Here we color by description (which wouldn't make sense to do in practice). The two distinct descriptions are colored white and red, where NA values are colored grey.

run_log(model_dir) %>%
  add_config() %>%
  model_tree(
    include_info =  c("tags", "description", "model_has_changed", "data_has_changed"),
    color_by = "description", add_summary = FALSE
  )

Previous PR for reference: #610

andersone1 · 2024-06-26T16:06:46Z

@barrettk

Can webshot::install_phantomjs() be added to the top of test-model-tree.R

andersone1 · 2024-06-27T15:25:43Z

andersone1 · 2024-06-27T15:58:55Z

R/model-tree.R

+      existing_sum_cols <- sum_cols[sum_cols %in% names(full_log)]
+      sum_log <- sum_log %>% dplyr::select(-all_of(existing_sum_cols))
+    }
+    full_log <- dplyr::left_join(full_log, sum_log, by = req_cols)


@barrettk should n row of full_log always be the same before and after this left_join? if so, can add a check for that?

@andersone1 Shouldn't the join enforce that? If we join by req_cols I cant think of any way the number of rows would change...unless you made the run_log and then added a new model or deleted an old one before running model_tree(), though that seems like an edge case (I think they would still be the same number of rows though). Can you think of any other way that could happen?

@barrettk I can't really think of a way it could happen, but left_join by does not enforce that.

- silence these warnings when creating a run_log or adding columns to it via one of the bbr::add_*() functions. - eliminates the need to suppress these warnings within model_tree() related functions - this warning doesnt provide any value and should be silenced in more places, but those can be done in a separate, more dedicated PR

… column to a run log.

- Defaults to an interactive HTML plot and optionally a static plot - Control tooltip settings and optionally append model_summary() output (defaults to including this) - Tags and other elements are extracted and nicely formatted. - Control how to color coat the nodes. Uses bbr's logo colors and colors by the model run by default, though one could see this being useful for quickly identifying which models are starred, have heuristics, etc. - supports multiple based_on flags per model - supports multiple origin models (models with no based_on flag) - supports broken links (when based_on points to a given model, but the model cannot be found) - Uses S3 methods, and can take in either a run log or a modeling directory

- needed for coloring the tree nodes. Should already be installed with `ggplot2`, which has been a `bbr` suggested package for a while. If we remove `ggplot2` as a suggested package, perhaps we will want to revisit this as well.

- add zoomable argument, allowing you to pan and zoom with the interactive plot - this can be valuable in the event any tooltips are cutoff. Defaults to FALSE because the zoom feature can be annoying if you're not expecting it.

- digits argument for rounding numeric values - works with newly defined columns via mutate calls - Can work with multiple working directories (i.e. if you row binded mutliple run_logs, it can handle the multiple origins) - color_by bug fixes. Can also pass `NULL` as to not color by any column. - more checks for helper functions - better handling for static plots. Instead of plotting it within the function, we now return the data and implement a print method. I referenced what ggplot2 did for this handling and I think its notably more portable now. - fill out remaining tests

- This was somewhat of a large refactor. Before this refactor, nested models could not be properly linked. Nested models have `based_on` values of `../2` or something similar, which broke the ability to map it to its parent model - To address this, we first have to determine which models are stored in a nested fashion by seeing if any of the model directories are subdirectories of another, and extracting the relative file path if so. After doing this, we have to overwrite the `based_on` attribute to just be the model id, and make the `run` column include the nested directory. This will cause the subdirectory to show in the tooltip as part of the model run, and will allow the models to be linked. - Given the complexity of some of the logic here, I opted to write a combined test that simulates a variety of scenarios all happening at once to ensure there are no interactions between them (broken links, recursive model directory, additional based on references, etc.).

- This was only caught with local R CMD Check. The `R_CHECK_LENGTH_1_CONDITION` environment variable controls this, though apparently this was ditched in R 4.2, in favor of making this concern an error instead of a warning. Not sure if either of those things are relevant as to why it would pass on drone and not locally, but just documenting what I found on the subject.

- In making the vignette I found a bug when coloring by a logical column. If all values were `TRUE`, all nodes would be colored white. Logical columns should be colored the same regardless of the unique values, so a refactor was necessary to explicitly set FALSE values to white and TRUE values to red. - Additional tests were added to ensure we catch this in the future.

- also improve one of the examples

- dont call model_summary() when possible - skip tests if required bbi version not met and model_summary() is required

- model type is now displayed - bootstrap runs will show up in the model tree - `--simulation attached--` will appear if an attached simulation is found and `add_summary` is TRUE

- now includes bootstrap and simulation examples

- we could add a step to install this in the test suite, but we havent historically done that, and this aligns more with the current methods of skipping tests if suggested packages aren't installed

- wasnt meant to display in vignette

- skip_if_not_drone_or_metworx --> skip_if_not_ci_or_metworx

kyleam · 2024-06-27T18:01:46Z

Most of the check failures are straightforward. I can help take a look at the oldest job failure:

https://github.com/metrumresearchgroup/bbr/actions/runs/9701317535/job/26774698747

   ℹ Creating lockfile '.github/pkg.lock'
  
  ✔ Updated metadata database: 3.47 MB in 10 files.
  
  ℹ Creating lockfile '.github/pkg.lock'
  ℹ Updating metadata database
  ✔ Updating metadata database ... done
  
  ℹ Creating lockfile '.github/pkg.lock'
  ✖ Creating lockfile '.github/pkg.lock' [6.3s]
  
  Error: 
  ! error in pak subprocess
  Caused by error: 
  ! Could not solve package dependencies:
  * deps::.: Can't install dependency scales (>= 1.3.0)
  * scales: scales conflicts with scales, to be installed
  * any::sessioninfo: dependency conflict
  * any::pkgdown: dependency conflict
  * any::rcmdcheck: dependency conflict

kyleam · 2024-06-27T18:21:16Z

scales (>= 1.3.0)

1.3.0 isn't on MPN 2023-01-25 (it wasn't released until Nov 2023).

@barrettk is this constraint just copied from ggplot2? Or are we using scales directly in a way that requires 1.3.0 or later? If the former, I think we can drop the constraint, and hopefully that will resolve the pak failure.

barrettk · 2024-06-27T18:55:06Z

@kyleam It wasn't copied from ggplot2 and I believe I did that because of scales::pal_gradient_n, and there wasn't really an easy workaround. That being said I do have functions set up to skip tests and not run the functions if that version isnt there, so dropping it may still be a good solution.

See req_tree_pkgs, check_for_model_tree_pkgs, and skip_if_tree_missing_deps for details if interested

kyleam · 2024-06-27T19:02:57Z

I believe I did that because of scales::pal_gradient_n

I see. Then the options I can think of are

remove the constraint and add guards to check it at runtime for the specific spots that need a given version
install a newer scales from elsewhere (as we do for nmrec or mrgmisc)

I think you're going with the first option (no objections from me).

- move to print.R for consistency

kyleam · 2024-06-27T19:53:15Z

Highlighting this note, since it won't fail the build:

https://github.com/metrumresearchgroup/bbr/actions/runs/9702122713/job/26777261543

* checking Rd \usage sections ... NOTE
Documented arguments not in \usage in Rd file 'format_model_type.Rd':
  ‘.mod’ ‘msg’

barrettk · 2024-06-27T20:06:16Z

@kyleam can you take a look at the CI before and after I added collapsibleTree to github actions? A) it seems like it was being installed on mpn-oldest before I added it, and B) clicking the green check is now displaying different/less options.

Edit: the B) I reported was because I accidentally hit 'enter' and didnt realize when I went to commit that change. A) is the only uncertainty

For reference: expected to see collabsibleTree here too

- must have accidentally hit 'enter'

kyleam · 2024-06-27T20:11:23Z

A) it seems like it was being installed on mpn-oldest before I added it

You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL.

barrettk · 2024-06-27T20:12:37Z

You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL.

Would you recommend reverting this change then, or is there potentially a reason to keep it (some form of risk mitigation)?

kyleam · 2024-06-27T20:13:24Z

Would you recommend reverting this change then

yes

- From KyleM: You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL - Note: we could add scales 1.3.0 at some point instead (to run more tests in CI), but we have functions available now to skip these. mpn-latest will execute them too so its not much of a risk

- 0.1.7 is available from the date-pinned RSPM URL in (2023-01-25 cran snapshot). This version also included important bug fixes related to the new model_tree() function, so it makes sense to add before merging

barrettk mentioned this pull request Mar 30, 2024

Add model_tree() function #610

Closed

barrettk requested a review from andersone1 April 1, 2024 21:28

barrettk force-pushed the model-tree-diagram branch from c0748e2 to 3626ef8 Compare June 24, 2024 17:29

barrettk requested review from andersone1 and removed request for andersone1 June 25, 2024 17:20

andersone1 approved these changes Jun 27, 2024

View reviewed changes

barrettk added 22 commits June 27, 2024 13:43

add required dependencies

e4cf4f5

add util helper function for adding the model status as an additional…

ff9241c

… column to a run log.

add scales as a suggested package

11407ff

- needed for coloring the tree nodes. Should already be installed with `ggplot2`, which has been a `bbr` suggested package for a while. If we remove `ggplot2` as a suggested package, perhaps we will want to revisit this as well.

add helper functions for testing

c83dcf2

add examples and improve documentation

421f0cd

- add zoomable argument, allowing you to pan and zoom with the interactive plot - this can be valuable in the event any tooltips are cutoff. Defaults to FALSE because the zoom feature can be annoying if you're not expecting it.

add coloring for the start node and NA values

bd144ef

add model tree vignette

24cd7c6

fix vignette title

07eac57

- also improve one of the examples

vignette fix: replace ABS_MOD_PATH with "absolute_model_path"

11edfb1

vignette: show loading of packages

da77794

load glue library for model-tree vignette

465bf3b

add sizing arguments to model_tree() and specify in vignette

77fa20a

test fixes due to bbi version restrictions:

bff23a0

- dont call model_summary() when possible - skip tests if required bbi version not met and model_summary() is required

set bbi path in model_tree tests

623019f

only build vignette if bbi is greater than 3.3.0

2725798

vignette fix: set minimum bbi version

fe0240c

barrettk added 9 commits June 27, 2024 13:44

specify internal bbr function

020e8b9

add scales version requirement to skip functions

20a2db0

add support for bootstrap and simulation objects

84d300f

- model type is now displayed - bootstrap runs will show up in the model tree - `--simulation attached--` will appear if an attached simulation is found and `add_summary` is TRUE

update tests and vignette

ddeeb07

- now includes bootstrap and simulation examples

fix: remove devtools::load_all() call from vignette

ede63ff

qualify new_sim_model for use in vignette

870430f

qualify make_sim_spec for use in vignette

9e426b0

fix: skip static test if phantomjs is not installed

e12a258

- we could add a step to install this in the test suite, but we havent historically done that, and this aligns more with the current methods of skipping tests if suggested packages aren't installed

vignette: move devtools::load_all() call to setup chunk

95016ff

- wasnt meant to display in vignette

barrettk force-pushed the model-tree-diagram branch from 9a86b0c to 95016ff Compare June 27, 2024 17:45

barrettk added 2 commits June 27, 2024 13:55

test adjustment from rebase

e08e127

- skip_if_not_drone_or_metworx --> skip_if_not_ci_or_metworx

add model_tree to pkgdown site

5f910e5

remove scales version requirement

3dde028

barrettk added 2 commits June 27, 2024 15:50

add model_tree_static print method to print_bbi docs for pkgdown checks

366cae6

- move to print.R for consistency

add collapsibleTree to github actions

ad342e3

format_model_type: fix mismatch in args/docs

919b700

revert actions change

3809eae

- must have accidentally hit 'enter'

barrettk added 2 commits June 27, 2024 16:16

require minimum version of collapsibleTree

e56edb0

- 0.1.7 is available from the date-pinned RSPM URL in (2023-01-25 cran snapshot). This version also included important bug fixes related to the new model_tree() function, so it makes sense to add before merging

barrettk merged commit c2aa415 into main Jun 27, 2024
6 checks passed

barrettk deleted the model-tree-diagram branch June 27, 2024 20:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model tree diagram #673

Model tree diagram #673

barrettk commented Mar 30, 2024 •

edited

Loading

andersone1 commented Jun 26, 2024

andersone1 commented Jun 27, 2024 •

edited

Loading

andersone1 Jun 27, 2024

barrettk Jun 27, 2024 •

edited

Loading

andersone1 Jun 28, 2024

kyleam commented Jun 27, 2024

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024 •

edited

Loading

kyleam commented Jun 27, 2024

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024 •

edited

Loading

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024

kyleam commented Jun 27, 2024

Model tree diagram #673

Model tree diagram #673

Conversation

barrettk commented Mar 30, 2024 • edited Loading

Some Example function calls

Example output

andersone1 commented Jun 26, 2024

andersone1 commented Jun 27, 2024 • edited Loading

andersone1 Jun 27, 2024

Choose a reason for hiding this comment

barrettk Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

andersone1 Jun 28, 2024

Choose a reason for hiding this comment

kyleam commented Jun 27, 2024

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024 • edited Loading

kyleam commented Jun 27, 2024

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024 • edited Loading

kyleam commented Jun 27, 2024

barrettk commented Jun 27, 2024

kyleam commented Jun 27, 2024

barrettk commented Mar 30, 2024 •

edited

Loading

andersone1 commented Jun 27, 2024 •

edited

Loading

barrettk Jun 27, 2024 •

edited

Loading

barrettk commented Jun 27, 2024 •

edited

Loading

barrettk commented Jun 27, 2024 •

edited

Loading