Performance
- BiocParallel-backed per-SangerRead construction loop. Replaces all parallel::mclapply call sites with BiocParallel::bplapply. The constructor now accepts a BPPARAM = bpparam() argument, and a .resolveBPPARAM(processorsNum, BPPARAM) helper maps the legacy integer processorsNum argument onto the right backend (SerialParam for 1 worker, MulticoreParam on Unix and SnowParam on Windows for ≥ 2 workers). Cross-platform parallelism is now first-class.
- Lazy 3-frame amino-acid translation (new lazyAA = TRUE default). When no refAminoAcidSeq is supplied, Biostrings::translate is no longer called eagerly during SangerRead construction. The slots primaryAASeqS{1,2,3} start as empty AAStrings; the new exported accessors primaryAASeqS1(), primaryAASeqS2(), primaryAASeqS3() compute on demand. Eliminates ~35 % of construction wall time on protein-coding ABIF reads.
- Rcpp port of the peak-detection inner loop. New src/peakvalues.cpp exports peakvalues_cpp and peakvalues_batch_cpp. The latter processes all peak windows for one channel in a single .Call, eliminating ~22,400 per-window R-to-C++ marshalling round-trips per SangerAlignment build. Saves ~320 ms per build on the bundled fixture (≈ 1.28× end-to-end).

  Cumulative end-to-end timings on the bundled Allolobophora_chlorotica/ACHLO fixture (8 reads, 4 contigs, mean of 5 repetitions, single thread):

  | Milestone | Wall time | vs. baseline |
  |---|---|---|
  | Pre-refactor baseline (eager AA, R-only) | 1.85 s | 1.00× |
  | Lazy AA + BiocParallel plumbing | 1.33 s | 1.39× |
  | + Rcpp peakvalues_batch_cpp | 1.07 s (best) / 1.14 s (mean) | ~1.73× / 1.62× |

  Full methodology and raw artifacts under plans/05_e2e_validation_report.md, plans/06_scaling_summary.md, and plans/07_rcpp_optimization_log.md.
- Removed redundant mclapply parallelism around nPairwiseDiffs and oneAmbiguousColumn: the per-element work was sub-millisecond and fork-setup overhead dominated. They are now serial lapply calls and ~5 % faster on small alignments.
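The processorsNum-to-backend mapping described in the first item can be sketched in base R as follows. This is an illustration only, not the package's .resolveBPPARAM(): the function name resolve_backend_name and its os_type argument are hypothetical, and it returns the name of the BiocParallel backend class instead of constructing one, so that it runs without BiocParallel installed.

```r
# Hypothetical stand-in for the internal .resolveBPPARAM() helper.
# Returns the *name* of the backend class that would be chosen for a
# scalar worker count; it does not construct a BiocParallel object.
resolve_backend_name <- function(processorsNum = 1L,
                                 os_type = .Platform$OS.type) {
  n <- as.integer(processorsNum)
  if (is.na(n) || n < 1L) {
    stop("processorsNum must be a positive integer")
  }
  if (n == 1L) {
    "SerialParam"        # one worker: run serially
  } else if (os_type == "windows") {
    "SnowParam"          # Windows cannot fork, so use socket workers
  } else {
    "MulticoreParam"     # Unix-alikes: forked workers
  }
}

print(resolve_backend_name(1L))
print(resolve_backend_name(4L, os_type = "unix"))
print(resolve_backend_name(4L, os_type = "windows"))
```

Keeping the decision in one small function means the legacy integer argument and an explicit BPPARAM override can share a single code path.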
New features
- chromatogram_plotly(obj, max_points = 8000, showtrim = FALSE, colors = "default"): interactive Plotly htmlwidget rendering of Sanger chromatograms via scattergl (WebGL). Uniform-stride downsampling caps points-per-channel; the returned widget carries a downsample_info attribute reporting the original / rendered counts.
- globalTrimApp(SA): Shiny gadget that exposes M1 / M2 trim sliders across an entire SangerAlignment. Each "Apply" click calls updateQualityParam(SA, ...), which cascades to every child read; live previews of consensus length, contig count, and per-contig stats update reactively. Returns the re-trimmed SangerAlignment on "Done".
- primaryAASeqS1(sr), primaryAASeqS2(sr), primaryAASeqS3(sr): lazy AA accessors that return the cached slot when populated, otherwise compute on demand via the existing calculateAASeq helper.
- BPPARAM argument added to SangerRead(), SangerContig(), SangerAlignment(). Defaults to NULL (derived from processorsNum); pass any BiocParallelParam to override.
- lazyAA argument (default TRUE) on the same three constructors. Pass lazyAA = FALSE for the legacy direct-slot eager-translation behaviour.
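As a rough illustration of the uniform-stride downsampling mentioned for chromatogram_plotly(), the sketch below keeps every k-th sample of one trace channel so that at most max_points remain. The helper name downsample_uniform and the fields stored in the attribute are assumptions made for this example; the package's actual implementation may differ.

```r
# Hypothetical helper: keep every k-th point so that the result has at
# most max_points samples, and record the counts as an attribute.
downsample_uniform <- function(y, max_points = 8000L) {
  n <- length(y)
  # Smallest integer stride that brings n down to <= max_points.
  stride <- max(1L, as.integer(ceiling(n / max_points)))
  idx <- if (n > 0L) seq.int(1L, n, by = stride) else integer(0)
  out <- y[idx]
  attr(out, "downsample_info") <- list(
    original = n,
    rendered = length(idx),
    stride   = stride
  )
  out
}

trace <- sin(seq(0, 20, length.out = 20000))
small <- downsample_uniform(trace, max_points = 8000L)
str(attr(small, "downsample_info"))
```

A fixed stride is cheap and keeps the x-axis spacing regular, at the cost of possibly skipping a narrow peak apex; that trade-off is one reason to report the original and rendered counts alongside the plot.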
Robustness
- File-extension regex bug fixes. The pre-Phase-4 patterns ".fa$" / ".fasta$" / ".ab1$" (in checkFASTA_File and checkReadFileName) used an unescaped ".", so files like Sanger_all_reads.Xfa and Achl_006_F.Xab1 slipped through as valid. Now \\.fa(sta)?$ and \\.ab1$. Centralised as .FASTA_EXT_REGEX and .AB1_EXT_REGEX constants.
- Validator framework refactor. All 31 check* functions in R/UtilitiesFuncInputChecker.R now route through 5 internal helpers (.errAppend, .requireType, .requireEnum, .requireRange, .requireExt). Public signatures and error-type tags (PARAMETER_RANGE_ERROR, FILE_TYPE_ERROR, etc.) are preserved exactly, so no consumer change is required. File shrunk 603 → 523 lines (-13 %).
- S4 setValidity invariants added on QualityReport, ChromatogramParam, and ObjectResults. These catch any code path (including slot<- mutations after construction) that would land an out-of-range value in the slots. Sanger* user-facing classes intentionally have no validity (preserves the "construction never throws" contract).
- Lazy-AA report compatibility. All 30 direct @primaryAASeqS{1,2,3} slot reads in the Shiny servers and RMarkdown report templates were converted to the accessor functions, so reports rendered against lazyAA = TRUE objects no longer produce empty AA tables. (The 12 <<- write sites are intentionally preserved; they are reactive caches.)
- Removed stray cat() / print() / message() debug statements from R/UtilitiesFunc.R, R/UtilitiesFuncInputChecker.R, R/ShinyServerModule.R, and R/ShinySangerContigServer.R. User-facing reporting now uses log_info consistently.
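The effect of the unescaped dot in the first item can be reproduced with base R's grepl(). The pattern strings and file names below are the ones quoted in that entry; the variable names are chosen for this example only.

```r
# Patterns quoted in the regex entry above.
OLD_FA_REGEX    <- ".fa$"           # unescaped dot: matches ANY character
FASTA_EXT_REGEX <- "\\.fa(sta)?$"   # literal dot, then "fa" or "fasta"
OLD_AB1_REGEX   <- ".ab1$"
AB1_EXT_REGEX   <- "\\.ab1$"

fasta_files <- c("Sanger_all_reads.fa",
                 "Sanger_all_reads.fasta",
                 "Sanger_all_reads.Xfa")
ab1_files   <- c("Achl_006_F.ab1", "Achl_006_F.Xab1")

# Compare old and new behaviour side by side.
fasta_check <- data.frame(
  file = fasta_files,
  old  = grepl(OLD_FA_REGEX, fasta_files),
  new  = grepl(FASTA_EXT_REGEX, fasta_files)
)
ab1_check <- data.frame(
  file = ab1_files,
  old  = grepl(OLD_AB1_REGEX, ab1_files),
  new  = grepl(AB1_EXT_REGEX, ab1_files)
)
print(fasta_check)
print(ab1_check)
```

With the old patterns the ".Xfa" and ".Xab1" names are accepted because "." stands for any single character; the escaped versions reject them while still accepting ".fa", ".fasta" and ".ab1".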
Build / compliance
- R CMD check is fully clean: 0 errors / 0 warnings / 0 notes (was 0 / 5 / 7 before Phase 9).
- 1360 testthat tests, all passing. New test files added across phases:
  - test-Validator-EdgeCases.R, test-Validator-Helpers.R, test-Regex-Boundary.R (validator + regex regression).
  - test-OrthogonalAxes.R (input-source × process-method × trim-method matrix).
  - test-SangerRead-SlotInvariants.R, test-S4Validity.R (S4 invariants).
  - test-Rcpp-peakvalues.R (R / C++ equivalence + 200-trial fuzz).
  - test-LazyAA-BiocParallel.R, test-LazyAA-Reports.R (Phase 6 / 8).
  - test-Phase8-PlotlyChromatogram.R, test-Phase8-GlobalTrim.R, test-Phase9-GlobalTrim-testServer.R (UI).
  - test-Phase10-Coverage.R (coverage maximisation).
- Coverage measured by covr::package_coverage(): 35.7 % overall, > 87 % on every non-Shiny R file (the three Shiny server files at 0 % require a real browser harness; deferred).
- DESCRIPTION modernised: Authors@R (replacing the deprecated Author: + Maintainer: pair), URL, BugReports, License: GPL-2 | file LICENSE, LinkingTo: Rcpp, R version dependency >= 4.0.0 (intentionally permissive).
- Depends: slimmed from 27 entries to the 4 packages whose types are publicly returned (Biostrings, DECIPHER, sangerseqR); the rest moved to Imports: (or Suggests: for vignette-only deps).
- ASCII-only source: replaced curly apostrophes, em-dashes, arrows, and multiplication signs across R/UtilitiesFunc.R, R/Class*.R (was an R CMD check warning).
- Re-saved data/*.RData with xz compression (largest file 1.5 MB → 698 KB).
Bug fixes
GitHub-issue cleanup across three resolution sprints (Phases 15–17):
- #100 (CSV substring contig-name match): processCSV.R now matches contig names exactly instead of via grepl(name, …), so a contig named good is no longer accidentally absorbed into a contig named good_extra.
- #92 (forward-only NULL handling): SangerContig() accepts REGEX_SuffixReverse = NULL (or NA_character_) and minReadsNum = 1 for forward-only / reverse-only datasets. Previously the constructor errored with "argument is of length zero".
- #76 (missing PCON.2 block): ABIF reads with empty PCON.2 quality data now get a synthetic Phred-30 score per base with a MISSING_QUALITY_SCORES_WARN, instead of silently constructing an unusable read.
- #89 (single-read writeFasta): writeFastaSC no longer errors on SangerContig objects built from a single read.
- #94 (low-overlap detection): new minOverlapBases (default 20) and minOverlapFraction (default 0.4) post-alignment guards in calculateContigSeq. Spurious low-overlap merges emit LOW_OVERLAP_WARN and the contig is rejected before it propagates to the alignment.
- #66 (degenerate consensus): IUPAC ambiguity-code handling in ConsensusSequence(ambiguity = TRUE) is correctly preserved; consumers that called as.character() on the consensus once again see ambiguity codes rather than a collapse to N.
- #65 (multi-contig duplicate reads): when a read is matched into more than one contig (CSV or REGEX), SangerAlignment now logs READ_ASSIGNED_MULTIPLE_CONTIGS_WARN and assigns the read to the first matching contig only.
- #42 (length-1 reads): reads of width < 2 bp are dropped at alignment time with MIN_READ_LENGTH_DEFENSIVE_DROP, allowing the contig to build from the surviving reads instead of failing the whole alignment.
- #91 (M2 trimming on degraded reads): the QualityReport validator now accepts the degenerate "no usable trim window" state (trimmedFinishPos == 0 while trimmedStartPos > 0) on extremely low-quality reads.
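To make the #100 fix concrete, here is a small base-R comparison of substring matching with grepl() against exact matching with ==. The data frame, its column names and the read file names are invented for this illustration and are not taken from processCSV.R.

```r
# Hypothetical CSV mapping: one row per read, with its contig name.
csv <- data.frame(
  reads  = c("a_F.ab1", "a_R.ab1", "b_F.ab1"),
  contig = c("good", "good", "good_extra"),
  stringsAsFactors = FALSE
)

name <- "good"

# Substring match: "good" also matches "good_extra".
substring_reads <- csv$reads[grepl(name, csv$contig)]

# Exact match: only rows whose contig name equals "good".
exact_reads <- csv$reads[csv$contig == name]

print(substring_reads)
print(exact_reads)
```

Exact comparison also sidesteps a second hazard of grepl(): contig names containing regex metacharacters such as "." or "+" would otherwise be interpreted as patterns.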
New features (Phase 17 — consensus algorithms)
- consensusMethod argument on SangerContig() and SangerAlignment() with three options:
  - "strict" (default; pre-Phase-17 behaviour): IUPAC ambiguity codes preserved at disagreeing columns.
  - "majority": most-frequent base wins per column; ties break alphabetically. No IUPAC codes in the output.
  - "quality_weighted": votes weighted by per-base Phred from the source reads; falls back to flat Phred 30 (with a warning) when scores are missing or for FASTA inputs.
- qualityAware = TRUE is a shorthand for consensusMethod = "quality_weighted".
- Per-position consensus quality scores. attr(@contigSeq, "qualityScores") is now an integer vector of length length(contigSeq) under "majority" and "quality_weighted" modes (empty integer(0) under "strict" for backwards compatibility). Closes the long-standing #87 / #48 / #33 cluster.
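The "majority" and "quality_weighted" rules can be illustrated per alignment column in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the function name consensus_column is hypothetical, the alignment is a plain character matrix with one read per row, and gap or non-ACGT characters are simply ignored.

```r
# Hypothetical per-column vote. `bases` holds one character per read;
# `phred` (optional) holds the matching per-base quality scores.
consensus_column <- function(bases, phred = NULL,
                             method = c("majority", "quality_weighted")) {
  method <- match.arg(method)
  keep <- bases %in% c("A", "C", "G", "T")   # assumption: ignore gaps etc.
  if (!any(keep)) return(NA_character_)
  if (method == "majority") {
    weights <- rep(1, sum(keep))             # one vote per read
  } else if (is.null(phred)) {
    warning("no quality scores; using flat Phred 30")
    weights <- rep(30, sum(keep))
  } else {
    weights <- phred[keep]                   # vote weight = Phred score
  }
  votes <- vapply(split(weights, bases[keep]), sum, numeric(1))
  votes <- votes[order(names(votes))]        # alphabetical order ...
  names(votes)[which.max(votes)]             # ... so ties pick the first
}

aln   <- rbind(c("A", "C", "T"),
               c("G", "C", "C"),
               c("G", "C", "-"))
phred <- rbind(c(40, 30, 30),
               c(10, 30, 30),
               c(10, 30, 30))

cols <- seq_len(ncol(aln))
majority <- paste(vapply(cols, function(j)
  consensus_column(aln[, j]), character(1)), collapse = "")
weighted <- paste(vapply(cols, function(j)
  consensus_column(aln[, j], phred[, j], "quality_weighted"),
  character(1)), collapse = "")

print(majority)
print(weighted)
```

In the first column two low-quality G calls outvote one A under "majority", while the single Phred-40 A outweighs them under "quality_weighted"; the third column is a tie in both modes and resolves alphabetically.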
Documentation (Phase 18)
- Vignette overhaul. vignettes/sangeranalyseR.Rmd rewritten end-to-end with a "How to..." recipe gallery (10 recipes covering single-contig assembly, CSV mapping, forward-only data, low-quality trimming, low-overlap detection, consensus methods, secondary peaks, Shiny launch, FASTA / HTML export), a constructor parameter reference (4 tables), a troubleshooting matrix mapping common errors to the Phases 15–17 fixes, and a sessionInfo() block. Closes #13, #49, #71, #99.
- R CMD check --run-donttest hardening. Multiple pre-existing latent example bugs were fixed:
  - inst/rmd/SangerContig_Report.Rmd now loads library(knitr) so kable() resolves during report rendering.
  - readTable.SangerRead and readTable.SangerContig examples no longer call readTable(sangerAlignmentData) (no method exists for SangerAlignment).
  - globalTrimApp, launchApp, launchAppSC, launchAppSA examples switched from \donttest{} to \dontrun{} so runGadget() / auto-printed shiny.appobj no longer hang the example runner.
- Maintainer tooling. plans/close_issues.py (Phase 16.5) now parses an Action: close|comment metadata flag from each issue's reply Markdown; comment-only entries (used for "please retest on devel" responses) skip the state=closed PATCH. Backwards-compatible with the existing Phase-16 / Phase-17 reply files.