Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model tree diagram #673

Merged
merged 40 commits into from
Jun 27, 2024
Merged

Model tree diagram #673

merged 40 commits into from
Jun 27, 2024

Conversation

barrettk
Copy link
Collaborator

@barrettk barrettk commented Mar 30, 2024

Introduces new model_tree() function, allowing you to create a tree diagram for your modeling directory

Some Example function calls

model_tree(model_dir)
model_tree(log_df)
model_tree(log_df, add_summary = FALSE)
model_tree(log_df, color_by = "star")
model_tree(log_df, static = TRUE)

log_df %>% add_summary() %>%
  model_tree(
    include_info = c("tags", "param_count", "eta_pval_significant"),
    color_by = "any_heuristics"
  )

log_df %>% add_config() %>%
  model_tree(
    include_info = c("model_has_changed", "data_has_changed", "nm_version"),
    color_by = "data_has_changed"
  )

Example output

model_tree(log_df,  color_by = "run")

image

Logical values will be colored as white and red, and are referred to as 'border colors'. The more unique values are added, the more gradient colors between those two border colors are generated.

model_tree(log_df, color_by = "star")

image

This example illustrates how NA values are handled. Here we color by description (which wouldn't make sense to do in practice). The two distinct descriptions are colored white and red, where NA values are colored grey.

run_log(model_dir) %>%
  add_config() %>%
  model_tree(
    include_info =  c("tags", "description", "model_has_changed", "data_has_changed"),
    color_by = "description", add_summary = FALSE
  )

image

Previous PR for reference: #610

@barrettk barrettk requested a review from andersone1 April 1, 2024 21:28
@barrettk barrettk requested review from andersone1 and removed request for andersone1 June 25, 2024 17:20
@andersone1
Copy link

@barrettk

Can webshot::install_phantomjs() be added to the top of test-model-tree.R

@andersone1
Copy link

andersone1 commented Jun 27, 2024

  • model_tree
  • model_tree.character
  • model_tree.bbi_log_df
  • make_tree_data
  • unnest_based_on
  • make_model_network
  • add_log_recurse_dirs
  • check_model_tree
  • make_tree_tooltip
  • color_tree_by
  • style_html
  • format_model_type
  • model_tree_png
  • print.model_tree_static
  • req_tree_pkgs
  • check_for_model_tree_pkgs
  • skip_if_tree_missing_deps
  • stop_if_tree_missing_deps

existing_sum_cols <- sum_cols[sum_cols %in% names(full_log)]
sum_log <- sum_log %>% dplyr::select(-all_of(existing_sum_cols))
}
full_log <- dplyr::left_join(full_log, sum_log, by = req_cols)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barrettk should n row of full_log always be the same before and after this left_join? if so, can add a check for that?

Copy link
Collaborator Author

@barrettk barrettk Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andersone1 Shouldn't the join enforce that? If we join by req_cols I cant think of any way the number of rows would change...unless you made the run_log and then added a new model or deleted an old one before running model_tree(), though that seems like an edge case (I think they would still be the same number of rows though). Can you think of any other way that could happen?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barrettk I can't really think of a way it could happen, but left_join by does not enforce that.

 - silence these warnings when creating a run_log or adding columns to it via one of the bbr::add_*() functions.
 - eliminates the need to suppress these warnings within model_tree() related functions
 - this warning doesnt provide any value and should be silenced in more places, but those can be done in a separate, more dedicated PR
 - Defaults to an interactive HTML plot and optionally a static plot
 - Control tooltip settings and optionally append model_summary() output (defaults to including this)
   - Tags and other elements are extracted and nicely formatted.
 - Control how to color coat the nodes. Uses bbr's logo colors and colors by the model run by default, though one could see this being useful for quickly identifying which models are starred, have heuristics, etc.
 - supports multiple based_on flags per model
 - supports multiple origin models (models with no based_on flag)
 - supports broken links (when based_on points to a given model, but the model cannot be found)
 - Uses S3 methods, and can take in either a run log or a modeling directory
 - needed for coloring the tree nodes. Should already be installed with `ggplot2`, which has been a `bbr` suggested package for a while. If we remove `ggplot2` as a suggested package, perhaps we will want to revisit this as well.
 - add zoomable argument, allowing you to pan and zoom with the interactive plot
    - this can be valuable in the event any tooltips are cutoff. Defaults to FALSE because the zoom feature can be annoying if you're not expecting it.
 - digits argument for rounding numeric values
 - works with newly defined columns via mutate calls
 - Can work with multiple working directories (i.e. if you row binded mutliple run_logs, it can handle the multiple origins)
 - color_by bug fixes. Can also pass `NULL` as to not color by any column.
 - more checks for helper functions
 - better handling for static plots. Instead of plotting it within the function, we now return the data and implement a print method. I referenced what ggplot2 did for this handling and I think its notably more portable now.
 - fill out remaining tests
 - This was somewhat of a large refactor. Before this refactor, nested models could not be properly linked. Nested models have `based_on` values of `../2` or something similar, which broke the ability to map it to its parent model
 - To address this, we first have to determine which models are stored in a nested fashion by seeing if any of the model directories are subdirectories of another, and extracting the relative file path if so. After doing this, we have to overwrite the `based_on` attribute to just be the model id, and make the `run` column include the nested directory. This will cause the subdirectory to show in the tooltip as part of the model run, and will allow the models to be linked.
 - Given the complexity of some of the logic here, I opted to write a combined test that simulates a variety of scenarios all happening at once to ensure there are no interactions between them (broken links, recursive model directory, additional based on references, etc.).
 - This was only caught with local R CMD Check. The `R_CHECK_LENGTH_1_CONDITION` environment variable controls this, though apparently this was ditched in R 4.2, in favor of making this concern an error instead of a warning. Not sure if either of those things are relevant as to why it would pass on drone and not locally, but just documenting what I found on the subject.
 - In making the vignette I found a bug when coloring by a logical column. If all values were `TRUE`, all nodes would be colored white. Logical columns should be colored the same regardless of the unique values, so a refactor was necessary to explicitly set FALSE values to white and TRUE values to red.

 - Additional tests were added to ensure we catch this in the future.
 - also improve one of the examples
 - dont call model_summary() when possible
 - skip tests if required bbi version not met and model_summary() is required
 - model type is now displayed
 - bootstrap runs will show up in the model tree
 - `--simulation attached--` will appear if an attached simulation is found and `add_summary` is TRUE
 - now includes bootstrap and simulation examples
 - we could add a step to install this in the test suite, but we havent historically done that, and this aligns more with the current methods of skipping tests if suggested packages aren't installed
 - wasnt meant to display in vignette
 - skip_if_not_drone_or_metworx --> skip_if_not_ci_or_metworx
@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

Most of the check failures are straightforward. I can help take a look at the oldest job failure:

https://github.com/metrumresearchgroup/bbr/actions/runs/9701317535/job/26774698747

   ℹ Creating lockfile '.github/pkg.lock'
  
  ✔ Updated metadata database: 3.47 MB in 10 files.
  
  ℹ Creating lockfile '.github/pkg.lock'
  ℹ Updating metadata database
  ✔ Updating metadata database ... done
  
  ℹ Creating lockfile '.github/pkg.lock'
  ✖ Creating lockfile '.github/pkg.lock' [6.3s]
  
  Error: 
  ! error in pak subprocess
  Caused by error: 
  ! Could not solve package dependencies:
  * deps::.: Can't install dependency scales (>= 1.3.0)
  * scales: scales conflicts with scales, to be installed
  * any::sessioninfo: dependency conflict
  * any::pkgdown: dependency conflict
  * any::rcmdcheck: dependency conflict

@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

scales (>= 1.3.0)

1.3.0 isn't on MPN 2023-01-25 (it wasn't released until Nov 2023).

@barrettk is this constraint just copied from ggplot2? Or are we using scales directly in a way that requires 1.3.0 or later? If the former, I think we can drop the constraint, and hopefully that will resolve the pak failure.

@barrettk
Copy link
Collaborator Author

barrettk commented Jun 27, 2024

@kyleam It wasn't copied from ggplot2 and I believe I did that because of scales::pal_gradient_n, and there wasn't really an easy workaround. That being said I do have functions set up to skip tests and not run the functions if that version isnt there, so dropping it may still be a good solution.

See req_tree_pkgs, check_for_model_tree_pkgs, and skip_if_tree_missing_deps for details if interested

@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

I believe I did that because of scales::pal_gradient_n

I see. Then the options I can think of are

  • remove the constraint and add guards to check it at runtime for the specific spots that need a given version
  • install a newer scales from elsewhere (as we do for nmrec or mrgmisc)

I think you're going with the first option (no objections from me).

@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

Highlighting this note, since it won't fail the build:

https://github.com/metrumresearchgroup/bbr/actions/runs/9702122713/job/26777261543

* checking Rd \usage sections ... NOTE
Documented arguments not in \usage in Rd file 'format_model_type.Rd':
  ‘.mod’ ‘msg’

@barrettk
Copy link
Collaborator Author

barrettk commented Jun 27, 2024

@kyleam can you take a look at the CI before and after I added collapsibleTree to github actions? A) it seems like it was being installed on mpn-oldest before I added it, and B) clicking the green check is now displaying different/less options.

Edit: the B) I reported was because I accidentally hit 'enter' and didnt realize when I went to commit that change. A) is the only uncertainty

For reference: expected to see collabsibleTree here too
image

 - must have accidentally hit 'enter'
@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

A) it seems like it was being installed on mpn-oldest before I added it

You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL.

@barrettk
Copy link
Collaborator Author

You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL.

Would you recommend reverting this change then, or is there potentially a reason to keep it (some form of risk mitigation)?

@kyleam
Copy link
Collaborator

kyleam commented Jun 27, 2024

Would you recommend reverting this change then

yes

 - From KyleM: You don't need the custom step as long as it can be retrieved from the date-pinned RSPM URL

 - Note: we could add scales 1.3.0 at some point instead (to run more tests in CI), but we have functions available now to skip these. mpn-latest will execute them too so its not much of a risk
 - 0.1.7 is available from the date-pinned RSPM URL in (2023-01-25 cran snapshot). This version also included important bug fixes related to the new model_tree() function, so it makes sense to add before merging
@barrettk barrettk merged commit c2aa415 into main Jun 27, 2024
6 checks passed
@barrettk barrettk deleted the model-tree-diagram branch June 27, 2024 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants