Skip to content

identify_core replicates results from Shade & Stopnisek 2019#97

Merged
jibarozzo merged 28 commits intodevelopfrom
patch/06_final_debugging
Apr 25, 2026
Merged

identify_core replicates results from Shade & Stopnisek 2019#97
jibarozzo merged 28 commits intodevelopfrom
patch/06_final_debugging

Conversation

@jibarozzo
Copy link
Copy Markdown
Collaborator

@jibarozzo jibarozzo commented Apr 24, 2026

Summary of Changes since 9305758

The previous #95 implemented vegan::avgdist() in identify_core(). This worked at first glance, but upon further inspection, we encountered problems with the results from unrarefied datasets. With rarefied datasets, the effects were less. The issue was caused by how vegan::avgdist() handles the first-ranked OTU; it can't handle zeros in a matrix of 1. The solution was to work with multiple rarefactions in multi_rarefy() and to calculate the Bray-Curtis dissimilarity with vegan::vegdist(), then average. We essentially split up what vegan::avgdist() does internally. Thanks to @Gian77 for his keen eye in detecting this.

Refactoring

  • identify_core.R — Major rewrite: made rarefied_list optional, extracted .mean_bc_over_iters as a standalone helper with vegan::vegdist() as engine, integrated multi_rarefy output directly, renamed Indexrank
  • Minor changes: Tidy eval compliance enforced throughout using .data$ pronouns and quoted column names, reducing globals.R entries.
  • multi_rarefy.R.as_array = FALSE replaced with .as = "list", single-iteration handling improved to support all .as formats and seeds.
  • globals.R — Significantly trimmed; variables now covered by .data$, .env$, or @importFrom were removed.
  • brcore_theme.R (new file) — Extracted .brcore_theme() as a shared internal helper to unify plot styling across all plotting functions (borders, title sizes, viridis color palettes).

Bug Fixes

  • plot_identified_core.R — Fixed deprecated size → linewidth in panel.border.
  • identify_core.R — Fixed max() instead of last() for proportionBC normalisation; fixed BC ranking and pair alignment using unique time points; fixed rarefied_list validation to require ≥ 2 iterations.
  • fit_neutral_model.R, plot_neutral_model.R — Minor fixes aligned with theme refactor.

New Features / API Changes

  • identify_core() now works with vegan::vegdist() under the hood intead of vegan::avgdist()
  • plot_identified_core() — Now returns a named list with $df and $plot. Accepts optional dataset_name parameter for plot titles.
  • update_otu_table() — Documentation and behavior improvements.
  • find_core — Removed entirely; ground truth logic moved to debugging branch.

Documentation & Tests

  • Vignette (BRCore-vignette.Rmd) revised to reflect the necessary multi-iteration workflow.
  • README updated accordingly.
  • Test suite updated for identify_core, multi_rarefy, plot_identified_core, plot_variance_propagation, update_otu_table, and the full vignette workflow.
  • Test data regenerated (test_vignette_data.rda grew ~250 KB).

✅ End State

rcmdcheck::rcmdcheck() passes with 0 errors | 0 warnings | 0 notes.

Gian77 and others added 24 commits April 20, 2026 15:43
…vious of `identify_core()` - adding some checks to the function to gelp debugging.
… mimulus dataset which still gives me more core OTUs with last 2% increase, and the modified identify_core() to ouput some checks and delta % increase.
…ers` as a standalone helper

works with switchgrass and bcse
…ry plot has borders and same size titles. Standardized color palettes with ggplot2::scale_color_viridis_d(option = "turbo") (continuous or discrete), of ggplot2::scale_color_viridis_c(option = "plasma")

refactor: plot_identified_core()` now returns a list with a df and a plot. New  optional `dataset_name` parameter for adding dataset name to plot title.
docs: updated `identify_core` documentation and plottting functions docs
…n `multi_rarefy()`. Functionality stays the same.

- Updated `test-multi_rarefy.R`.
docs: updated test after rcmdcheck::rcmdcheck()

Based on the current code and session output:

rarefied_list	is_rarefied	Result
valid list, length ≥ 2	either	✅ proceed
valid list, length == 1	either	❌ abort
not a list / empty	either	❌ abort
NULL	TRUE	✅ wrap OTU table as single iteration
NULL	FALSE	❌ abort
…nce in `identify_core` and `add_rarefaction_metrics()`. This reduces the amount of global variables in `globals.R`. The remaining variables are shared among functions.

rcmdcheck() pass
@jibarozzo jibarozzo added this to the First CRAN release milestone Apr 24, 2026
@jibarozzo jibarozzo requested a review from Gian77 April 24, 2026 22:01
@jibarozzo jibarozzo self-assigned this Apr 24, 2026
@jibarozzo jibarozzo added bug Something isn't working enhancement New feature or request High Priority Needs attention now labels Apr 24, 2026
Copy link
Copy Markdown
Collaborator

@Gian77 Gian77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job @jibarozzo, extensive changes, I think we are ready to merge and finish up with manuscript revisions. Working on manuscript figure right now!

@jibarozzo jibarozzo marked this pull request as ready for review April 25, 2026 01:54
@jibarozzo jibarozzo merged commit 3ac6b2a into develop Apr 25, 2026
6 checks passed
jibarozzo added a commit that referenced this pull request Apr 25, 2026
New HEAD after merging `develop` into branch post PR #97
@jibarozzo jibarozzo mentioned this pull request Apr 25, 2026
jibarozzo added a commit that referenced this pull request Apr 25, 2026
* Add abundance occupancy and core distribution plot functins as in Shade and Stopnisek 2019

* fixes for two_plot

* minor fixes and adjustments while rerunning the moch analysis, do_phyloseq was not working correctly, no I fixed it.

* amending previous push: new plotting functions and minor fixes

* new pre-rarefaction functions and other fixes, a bcse phyloseq object with 50 samples as a base test dataset for the package, other small fixes and check messaging implementations

* amending previous push to fix errors

* refining some plot function and fixing minor errors

* adding a few things I forgot to add...

* Update R/plot_core_distribution.R

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update R/plot_core_distribution.R

linewidth

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* adding vignette

* finalized vignette for BRCore

* - Gains overhaul package and NAMSPACE import in functions. Now using `importFrom` more. Importing only what we use.
- Change from %>% pipe to |> base pipe. Plotting functions don't wrangle the data that much so we don't need the added benefits of %>%. This will be applied to other functions throughout.
- identify_core() gains some clarity in function description as well as header cleanup (this helps when viewing outline in VS Code or Positron).
- identify_core()'s "fo_diffs" output parameter updated to "elbow_slope_diffs" as it is more informative of the proccess applied to the data.
- identify_core() changes pipe from %>% to |>
- NAMESPACE cleans up a little bit after all this
- corresponding function tests reformatted usint Air styler

* - Resolving errors in failed checks

* Missing NAMESPACE imports

* Missed add_rarefactions_metrics updates

* Update R/plot_rarefaction_metrics.R

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update man/identify_core.Rd

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixing import from identify_core()

* Combned through all the Copilot comments.
- Added imports via `importFrom` that were missing
- Removed %>% in other functions where it was unnecessary.
- Only `do-phyloseq()` retains %>% in subsection
- Updated NAMESPACE

* Typo fixes and adding `phyloseq` functions to NAMESPACE

* Importing phyloseq to handle GH action failure.

* Reverting addition of phyloseq (complete pkg) to NAMESPACE since GH action is due to failure to connect to Bioconductor.

* Changed `@examplesIf requireNamespace("phyloseq", quietly = TRUE)` for the more explicit `@examples
#' \donttest{
#' library(phyloseq)
#' library(BRCore)` To convey the importance and relience of this package on "phyloseq".
- Kept \dontrun{} in some functions' documentation, like plot_neutral_model() that are only illustrative and depend on other functions.
- Changed name: `do_phyloseq()` is now `update_otu_table()

* vignette with new package updates. amending previous commit

* - identify_core() gains ranking start time,  progress bar and output messaging for users.

* - Gains cleanup of old Rdata objecs, tests and experimental parallel benchmarks.

* Gains STATIC vignette
- `precompile.R` builds static vignette from "vignettes/BRCore-vignette.Rmd.orig" --> "vignettes/BRCore-vignette.Rmd
- `prcompile.R` cleans directory and moves images to necessary direcotry for `devtools::build_vignette()`

* Initial plan

* Create brief README.md with badges, goal note, features, and quick start example

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Update README.Rmd to brief version for render-rmarkdown workflow

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Increment version 1.0.0 for release. Adding new authors and description of package.

* Typo fixes pointed out by Copilot in vignette. Updated vignette.

* Fixing errors in render-rmarkdown.yaml

* Targeting 'main' only

* Update workflow to render only README RMarkdown files

* Increment version number to 1.0.1
- Bringing README workflow updates, vignette fixes and release v.1.0.0 updates from release/1.0.0 to develop.
-  `identify_core()` gains `max_otus = NULL` to select the top OTUs ranked and perform BC calculation on these.
- Gains codecov.yaml
- Fixes to R-CMD-check.yaml
- Fixes and alert messages to plot_core_distribution()
- Removed Unicode charecters from identify_core() documentation to avoid rcmdcheck() warnings

* Implemented suggestions by Copilot review.
- Removed double validation of abundance_weight
- Added tests for `max_otus` parameters in identify_core()

* Improvements to DESCRIPTION and README:
- Added Gian's ORCID
- New, brief description of the package in DESCRIPTION and README

* - Clean up of test-* to remove `testthat::`
- Clean up of BRCore-vignette to rename correctly fig4
- Renamed fig4

* Minor fixes to DESCRIPTION and README.Rmd wording.

* Added comprehensive vignette workflow validation tests
- Add 4 test suites validating complete BRCore analysis pipeline
    - Test data structures, core identification, visualization, and neutral model fitting
    - Compare workflow outputs against reference data in `test_vignette_data.rda`
- Force single-threaded execution on CI for reproducibility

* Implement explicit RNG algorithm for cross-platform reproducibility in multi_rarefy()

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Add cross-platform RNG fix to identify_core() and comprehensive documentation

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Enhanced RNG reproducibility documentation with BLAS analysis and references

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Add documentation README for technical docs directory

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Add comprehensive implementation summary document

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Fix cross-platform RNG reproducibility with pre-generated iteration seeds

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Address code review feedback on comments and documentation

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* Refactor seed handling for rarefaction and add extensive debugging output

Remove cross-platform RNG configuration (RNGkind) and iteration-specific seed pre-generation in favor of simpler per-worker seeding. Add comprehensive debug logging for dataframe dimensions, sample/taxa validation, and filtering behavior to troubleshoot numeric precision issues in rowSums filtering.

* Refactor multi_rarefy: add internal .single_rarefy function use instead of vegan::rrarefy(), and improve code organization

* Improve multi_rarefy CLI output with structured reporting and better user feedback

* Add old multi_rarefy implementation and refactor with custom rarefaction

- Rename original multi_rarefy to multi_rarefy_old with new cli information out put to debug. Similar to new implementation below, but cleaner.
- Implement new multi_rarefy with custom_rrarefy function
- Fix sample_size parameter in rarefaction call
- Add extensive debug output for troubleshooting filtering issues
- Remove RNG state management from identify_core
- Delete IMPLEMENTATION_SUMMARY.md

* Fix floating-point issue in multi_rarefy() using dplyr::near() tolerance

Co-authored-by: jibarozzo <64099468+jibarozzo@users.noreply.github.com>

* - Remove deprecated multi_rarefy_old function and clean up imports.
- Regenerated test_vignette_data.rda based on changes to multi_rarefy()

* Add explanatory comment and update test data regeneration script

- Add comment explaining use of `near()` for floating-point precision in multi_rarefy
- Replace `library(BRCore)` with `devtools::load_all()` in test data regeneration script
- Regenerate test vignette data

* Add tolerance to floating-point comparisons in vignette workflow tests

* Add floating-point tolerance to rarefaction depth tests

* - Fix rarefaction check to handle floating-point precision in sample sums
- identify_core(): Use tolerance-based comparison instead of exact equality when checking if samples are rarefied, preventing false negatives due to floating-point arithmetic. Round the display value and fix corresponding test assertion.

* Refactor rarefied OTU table validation and adjust tolerance in workflow test

* Fix rarefaction seed generation to ensure reproducibility across parallel workers

Generate all iteration seeds upfront using the initial seed, then pass pre-generated seeds to parallel workers instead of computing `set_seed + i` in each worker. This should ensure deterministic results regardless of parallel execution order.

* Update test vignette data binary file

* Relax tolerance for Bray-Curtis ranked comparison in vignette workflow test

* Enable Windows CI checks and add input validation to multi_rarefy

- Enable Windows (latest) R release in GitHub Actions workflow
- Remove branch restriction from push trigger
- Add input validation for depth_level, num_iter, and threads parameters
- Move seed setup earlier in multi_rarefy function
- Simplify sample filtering logic in rarefaction output

* Refactor thread management in tests to use get_available_cores() helper

Replaced manual thread count logic with centralized get_available_cores() function across vignette workflow tests to simplify code and maintain consistent threading behavior.

* Increment version number to 1.0.2

* Bump to v1.0.2 and minor fixes to test file paths

- Replace `here()` with `test_path()` for proper `testthat` file references for flexibility when  using `devtools::check()` or `rcmdcheck::rcmdcheck()`
- Move `test_sets` directory into `tests/testthat` for correct test structure
- Update floating-point tolerance and simplify dplyr syntax in multi_rarefy tests

* Configure R-CMD-check workflow triggers and clean up unused imports brought by merge with "main"

- Add branch-specific triggers for push events (release, patch, feature)
- Add branch filters for pull requests (main, develop)
- Add manual workflow_dispatch trigger
- Remove unused vegan::rrarefy import from NAMESPACE
- Remove obsolete rarefaction metrics test file
- Remove unused elbow plot vignette image

* docs: Add community health files and update README with contribution guidelines

* docs: Add ORCID identifiers for authors and specify minimum package versions in dependencies

* docs: Add CRAN installation instructions to README

* docs: Add GitHub links to DESCRIPTION

* style: Based on `pkgcheck` suggestions
- Add air.toml formatter config, VS Code settings
- apply 2-space indentation style across all R source and test files

* fix: Fix incorrect seq_len usage in ranked samples plot

Replace `seq_len(temp_df)` with `seq_len(nrow(temp_df))` to correctly generate x-axis sequence based on number of rows rather than passing the entire dataframe to seq_len.

* refactor: remove unused imports and improve code formatting

- Remove `import(parallel)` and `import(parallelly)` from NAMESPACE
- Add specific `importFrom(parallelly, makeClusterPSOCK)`
- Replace `sapply` with `vapply` in parallel helper functions
- Reformat documentation line wrapping for better readability
- Clean up code formatting and indentation in examples
- Remove vignette installation instructions from README

* Adding output for number of zeros and sparsity calculation on the raw and rarefied (average across iteration) table result.

* refactor: Refactor sparsity calculation into helper function and improve output formatting

- Extract sparsity counting logic into `.sparsity_count()` helper function
- Add section headers to rarefaction results output (Sample Removal, Taxa Removal, Data Sparsity, Final Data Dimensions)
- Remove trailing whitespace
- Move sparsity reporting after sample/taxa removal reporting

* refactor: multi_rarefy
- Replace custom rarefaction " .single_rarefy"  with `vegan::rrarefy` and add cluster library loading

* refactor: identify_core - Refactor BC calculation into internal helper function and add progress updates
test: regenerate reference data using vegan::rrarefy()
- Extract Bray-Curtis dissimilarity calculation into `.calculate_bc()` helper
- Add progress bar updates in core identification loop
- Remove unused sample pair label generation from main function
- Fix test tolerance parameter name from `tol` to `tolerance`

* refactor: clean up commented code and reorganize BC calculation helper in identify_core()
regenrate error introduced to `tests/testthat/test_sets/test_vignette_data.rda` in previous commit to

Remove commented-out code blocks in identify_core, reorder pair_labels calculation before use in .calculate_bc helper, and fix test data file path

* feat: Add `.summarize` parameter to control rarefaction output format

Added a new `.summarize` parameter (default TRUE) to `multi_rarefy()` function to allow users to choose between summarized mean counts or raw iteration data.

Still a work in progress!

* feat: multi_rarefy

- Adds a new `.summarize` parameter (default TRUE) to control whether rarefaction results are averaged across iterations or returned as individual iteration-level data. When FALSE, sample IDs include "_iter_#" suffix.
- Updates reporting logic to handle both summarized and non-summarized outputs, including sparsity calculations and dimension reporting.
- Refactors result processing and extracts reporting into separate helper function.

* fix: Add support for unsummarized rarefaction output in update_otu_table()

Extended `update_otu_table()` to handle both summarized and unsummarized output from `multi_rarefy()`. When iteration suffixes are detected, sample_data is replicated to match all iteration rows. Refactored sample status reporting into internal helper function and improved documentation with examples for both modes.

* feat: Add identify_core_avgdist as optimized alternative to identify_core; r
refactor: replace deprecated tidyr::gather with pivot_longer in identify_core*; add tests for equivalence, performance, and multi_rarefy unsmarized output

* new fix on the R/identify_core_avgdist.R function. It runs through but output a ranked_bc with NaN even if do not seem to be in the BC_ranked witing the function... I will keep digging, but we are close.

* fix: Refactor rarefaction handling in identify_core_avgdist

- Add is_rarefied check and informative messages for data state
- Use vegdist directly for rarefied data instead of avgdist subsampling
- Maintain consistent normalization across rarefied/unrarefied workflows
- Remove unused .peek() helper function
- Fix variable naming inconsistencies in tests
- Updated tests-identify_core.R and tests-vignette_workflow.R to tests identify_core_avgdist()
- Clean up whitespace and formatting

* fix: Rename parameter from num_iter to num_iterations in avgdist call

* refactor: improve Bray-Curtis calculation with rarefaction handling

- Pass `is_rarefied` parameter to `.calculate_bc_()` helper function
- Filter samples by depth before computing pairs for unrarefied data
- Move sample pair computation inside conditional blocks for efficiency
- Remove debug `print()` statements
- Suppress warnings from `vegan::avgdist()` calls
- Update test to use unrarefied data with explicit iteration count

* fix: Improve floating-point precision handling and update test configurations

- Add floating-point tolerance in depth filtering to handle precision issues
- Improve warning message formatting with newlines for better readability
- Simplify comment for per-pair normalization calculation
- Add max_otus parameter to equivalence tests for better consistency
chore: Comment out benchmark test and add context about fair comparison methodology
- Switch vignette workflow tests back to identify_core for baseline validation
- updated `globals.R` to include "iter" from multi_rarefy

* fix: Fix incorrect phyloseq object used in core microbiome identification

Use `bcse_rare` instead of `bcse` in `identify_core()` call within vignette workflow test

* refactor: Add support for returning unsummarized rarefaction iterations as list

Refactor multi_rarefy to return a list of data frames when .summarize=FALSE, with improved handling of zero-abundance taxa removal per iteration and enhanced reporting including min/max/avg sparsity and dimension statistics across iterations.

* docs: Update multi_rarefy documentation: fix formatting and improve parameter descriptions

* docs: improve multi_rarefy test assertions for unsummarized output

Replace single row count check with comprehensive validation of list structure, iteration naming, sample counts per iteration, row name suffixes, and rarefaction depth consistency.

* I added the a new function, multi_rarefy_new(), that generate the array (or list) with the rarefaction iterations based on vegan::rrarefy, It's incredibly fast, check it out. For the future - maybe using arrays couldbe a way to imporve speed in other functions as well.."

* refactor: multi_rarefy: remove parallelization and summarization, add .as_array option and pre-filter samples by depth

- Replace `.summarize` and `threads` params with `.as_array` flag for 3D array output
- Remove parallel cluster setup (parLapply) in favor of sequential lapply/for loop
- Pre-filter samples below depth_level before rarefaction
- Update `.report_rarefaction_results` to handle both array and list-of-dataframes outputs
- Improve sparsity and dimension reporting for both output modes
- cleaned NAMESPACE and .Rd

* docs: updating multi_rarefy documentation

* New plotting function for rarefaction variance propagation viz

* new additions to the plot_variance_propagation.R . Good for today, more the next week.

* docs: New test-plot_variance_propagation.R

* refactor: phyloseq validation and add utility functions for OTU extraction

Add helper functions `.phyloseq_class_check()` and `.extract_otu_matrix()` to centralize phyloseq validation and OTU table extraction logic. Replace inline validation in `multi_rarefy()` and `plot_variance_propagation()` with these reusable utilities. Also includes code formatting improvements.

* refactor: variance propagation plot with improved Hill number calculation and utility functions

- Optimize `.hill_number()` to handle both vector and matrix input with vectorized operations
- Add `.get_iter_list()` helper to standardize rarefied object conversion
- Improve data flow by combining iteration tracking with bind operations
- Simplify metadata joins to only include required columns
- Move `.validate_group_vars()` to plot file alongside usage
- Update plot title and header text for clarity
- Remove unused helper functions and redundant code
- Add comprehensive documentation for internal functions

* refactor: use shared helper functions for phyloseq validation and OTU extraction in `identify_core_avgdist`

* refactor: move phyloseq class check before OTU extraction in plot_variance_propagation and suppress duplicate check in utility function
- use `.extract_otu_matrix()` helper in `multi_rarefy()`

* refactor: update multi_rarefy tests and clean up identify_core_avgdist comment

* docs: update examples and parameter docs for multi_rarefy

* refactor: `identify_core_avgdist` is now `identify_core`
- Rename `physeq` parameter to `physeq_obj` across all functions for consistency
- Rename `otu_rare` to `rarefied_otus` in `update_otu_table()`
- Add `iteration` parameter to `update_otu_table()` for extracting specific iterations from list/array output
- Update all examples, tests, and documentation to use new parameter names

TODO: update vignette and fix READMEs

* fix: undefined variable, improve documentation, and update tests suggested by Copilot

- Initialize `avg_taxa_removed` to NULL in `multi_rarefy()`
- Enhance `plot_variance_propagation()` documentation and y-axis labels
- Add group variable validation in `plot_variance_propagation()`
- Fix Hill number calculations for Shannon and Simpson diversity
- Remove `threads` parameter from README examples
- Update test expectations for array/list dimension consistency
- Update vignette workflow to use list output and `identify_core()` with iterations

* fix: Clean up section comments and remove commented-out code in identify_core

* fix: suppress duplicate model fitting messages and use rounded N for binomial model
- Added comprehensive messaging to inform user of species used in model fitting.

* refactor: Tests
- migrate to `.as_array = FALSE`
- add `iteration` param, and
- `num_iter` in test workflows and tests/testthat/test_sets/regenerate_test_data.R`
- Regenerated `tests/testthat/test_sets/test_vignette_data.rda`

* fix: add rarefied depth subtitle, fix method factor levels, clean up labels, update docs

* fix: Add group variable validation to `plot_variance_propagation`

* docs: vignette revamp BRCore vignette with TOC, variance propagation plot, and updated workflow
- New figure order
- Streamlined library call

* refactor: replace `sapply` with `vapply`, remove `utils` import, and fix misc issues suggested by pkgcheck::pkgcheck()

* fix: correct singleton typo in `fit_neutral_model()`
refactor:  OTU extraction in `.extract_otu_matrix()` and include in `add_rarefaction_metrics()`
docs: update tests for `add_rarefaction_metrics()` and `fit_neutral_model()`. I applied 80 chracters line limit to function description and docs.

* fix: improve CLI messages for rarefaction depth reporting in `identify_core()`. Regenerated `tests/testthat/test_sets/test_vignette_data.rda`

* docs: New test setup `tests/testthat/setup.R` file to generate `tests/testthat/test_sets/test_vignette_data.rda` platform-specific reference data before tests run.

* fix: write regenerated test data to tempdir on CI to avoid read-only filesystem errors

* fix: separate local and CI test data path logic in setup.R

* fix: source R script instead of .rda in CI setup, and use tempdir for CI output path

* fix: Fix CI condition logic for test data output path

* fix: ensure output directory exists before saving test data in CI. Buildoing rda_path with `testthat::test_path()`

* With thi commit, I fixed:
- the vignettes/BRCore-vignette.Rmd, cleaning and imporving text and adding text. I added a couple of more pragarphs. I imroved math rendering in the html, hopefully pass the tests.
- same changes in the vignettes/BRCore-vignette.R
- a couple of more referneces added to vignettes/references.bib
- minor output message changes in the R/identify_core.R

* created a new branch `patch/06_final_debugging` to debug the new behavious of `identify_core()` - adding some checks to the function to gelp debugging.

* updates before closing

* testing versions of identiy_core() and plot_identified_core() with internal checks to help debugging

* new bug fixes and improvements for main core functions

* new function with the original code developed by Njec, reimported the mimulus dataset which still gives me more core OTUs with last 2% increase, and the modified identify_core() to ouput some checks and delta % increase.

* fix: use unique time points for index, stabilize BC ranking and pair alignment

* refactor: integrate `multi_rarefy` output into `identify_core` BC accumulation

* refactor: make `rarefied_list` optional and extract `.mean_bc_over_iters` as a standalone helper

works with switchgrass and bcse

* fix: use `max()` instead of `last()` for proportionBC normalization

* refactor: extract `.brcore_theme()` helper to unify plot styling. Every plot has borders and same size titles. Standardized color palettes with ggplot2::scale_color_viridis_d(option = "turbo") (continuous or discrete),  of ggplot2::scale_color_viridis_c(option = "plasma")
refactor: plot_identified_core()` now returns a list with a df and a plot. New  optional `dataset_name` parameter for adding dataset name to plot title.
docs: updated `identify_core` documentation and plottting functions docs

* docs: mark `.brcore_theme` as internal with `@noRd`

* refactor: Replace `.as_array` logical with `.as` character argument in `multi_rarefy()`. Functionality stays the same.
- Updated `test-multi_rarefy.R`.

* docs: Update docs and messages for `.as` parameter rename and simplify taxa removal output

* fix: single iteration handling to support `.as` formats and seeds in `multit_rarefy()`

* fix: replace deprecated `size` with `linewidth` in panel.border in `.brcore_theme()`

* refactor: Rename `Index` to `rank` in identify_core and improve documentation

* refactor: Replace `.as_array = FALSE` with `.as = "list"` in docs and tests

* chore: Remove `find_core.R` ground truth function now lives in `debugging` branch

* docs: Update tests for `identify_core` and renegrated test data

* docs: Remove `find_core` function and its exports

* docs: improve formatting, captions, and styling

* docs: update vignette and README to use multi-iteration core identification workflow

* fix: update "rarefied_list" validation to require minimum 2 iterations.
docs: updated test after rcmdcheck::rcmdcheck()

Based on the current code and session output:

rarefied_list	is_rarefied	Result
valid list, length ≥ 2	either	✅ proceed
valid list, length == 1	either	❌ abort
not a list / empty	either	❌ abort
NULL	TRUE	✅ wrap OTU table as single iteration
NULL	FALSE	❌ abort

* refactor: use `.data$` pronoun and quoted names for tidy eval compliance in `identify_core` and `add_rarefaction_metrics()`. This reduces the amount of global variables in `globals.R`. The remaining variables are shared among functions.
rcmdcheck() pass

* fix: NAMESPACE import `all_of` from dplyr

* docs: reduce num_iter and add missing rarefaction step in test 4 of `tests/testthat/test-vignette-workflow.R`

* fix: update `bcse_identified_core` plot access to use list index and nested element

* test: update ggplot check to use positional indexing for identified core

* chore: bump version to 2.0.0, add pkgdown site, and update CI workflows.
New HEAD after merging `develop` into branch post PR #97

* docs: add release notes and CRAN comments for BRCore 2.0.0

* docs: scope `data()` calls to test environment and remove unused library imports. Trying to comply with failing test-coverage.yaml

---------

Co-authored-by: Gian77 <gian.benucci@gmail.com>
Co-authored-by: Gian Nico Benucci <benucci@msu.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@jibarozzo jibarozzo mentioned this pull request Apr 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request High Priority Needs attention now

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants