Summary
rustar doesn't write Log.out (verbose run log) or Log.progress.out (per-chunk progress timestamps) — STAR writes both alongside Log.final.out. Consumers that parse these files for parameter dumps, per-phase progress, warnings, memory usage, or chunk-level mapping rates get nothing.
The goal is real content parity, not stubs that mimic STAR's section-header structure with placeholder content. Files that look like STAR's verbose log but carry only a {:#?} Debug dump of params and three timestamps are worse for consumers than missing files: they pass file-existence checks but fail every actual parse.
STAR reference behaviour
Log.out is the verbose run log, written incrementally by source/InOutStreams.cpp plus parameter-dump and per-phase update calls scattered across source/Parameters.cpp, source/Aligner.cpp, and source/sjdbInsertJunctions.cpp. Content:
- Full parameter dump with every default value (STAR's parameter format, one name<TAB>value line per parameter).
- Per-phase progress messages (..... loading genome, ..... started mapping, ..... finished mapping, etc.) with timestamps.
- Warnings (WARNING --X ...) and informational notes emitted during run.
- Final timing and memory usage info.
Log.progress.out is updated periodically (roughly every minute) during alignment, one line per chunk reporting reads processed and mapping speed.
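To make the contrast with a {:#?} Debug dump concrete, here is a minimal sketch of the name<TAB>value shape described above. The function name `dump_params` and the slice-of-pairs input are hypothetical, not rustar's actual parameter type, and STAR's real dump has more surrounding structure than this shows.

```rust
use std::fmt::Write as _;

/// Render parameters as one `name<TAB>value` line each, in the order given.
/// `params` is a hypothetical list of (name, value) pairs used for illustration;
/// it is not rustar's real parameter struct.
fn dump_params(params: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (name, value) in params {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "{name}\t{value}").unwrap();
    }
    out
}
```

Calling `dump_params(&[("runThreadN", "4"), ("twopassMode", "Basic")])` yields two tab-separated lines that a line-oriented parser can split on the first tab, which a pretty-printed Debug struct does not allow.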
Reproducer
```bash
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-logout && cd /tmp/rustar-mre-logout
BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
  --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
  --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
  --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
  --sjdbOverhang 100 --genomeSAindexNbases 7

COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
  --runThreadN 4 --sjdbGTFfile genes.gtf --twopassMode Basic --runRNGseed 0
  --outSAMtype BAM Unsorted)
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
  --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUS.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
  --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix STAR.

echo "=== STAR Log* files ==="; ls STAR.Log*
echo "=== rustar Log* files ==="; ls RUS./Log*
```
Observed: STAR writes STAR.Log.final.out, STAR.Log.out, STAR.Log.progress.out. rustar writes only RUS./Log.final.out.
Suggested approach
This is structural — Log.out needs progress hooks during the long-running phases (genome load, suffix-array build, per-chunk alignment) so events can be written as they happen, not at the end. Log.progress.out needs a periodic writer separate from the main alignment loop. Both need STAR-format parameter dumps and warning emission paths.
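As a rough illustration of the plumbing only, the sketch below shows one way the two pieces could look: an event log that flushes each phase message as it happens, and a progress writer running on its own thread. Every name here (`RunLog`, `event`, `spawn_progress_writer`, the `reads_processed` line) is hypothetical. The line contents are placeholders rather than STAR's format, and timestamp formatting is left to the caller.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical event log: every phase message is written and flushed at once,
/// so the file reflects progress even if the run is killed midway.
struct RunLog<W: Write> {
    sink: W,
}

impl<W: Write> RunLog<W> {
    /// `timestamp` is pre-formatted by the caller; the layout is a placeholder.
    fn event(&mut self, timestamp: &str, msg: &str) -> io::Result<()> {
        writeln!(self.sink, "{timestamp} ..... {msg}")?;
        self.sink.flush()
    }
}

/// Hypothetical periodic writer, separate from the alignment loop.
/// Alignment threads bump `reads_done`; this thread wakes every `interval`
/// (or when a stop message arrives) and appends one placeholder line.
fn spawn_progress_writer<W>(
    mut sink: W,
    reads_done: Arc<AtomicU64>,
    interval: Duration,
) -> (mpsc::Sender<()>, thread::JoinHandle<io::Result<W>>)
where
    W: Write + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || -> io::Result<W> {
        loop {
            // A timeout means "interval elapsed"; anything else means "stop".
            let stopping = !matches!(
                stop_rx.recv_timeout(interval),
                Err(RecvTimeoutError::Timeout)
            );
            writeln!(sink, "reads_processed\t{}", reads_done.load(Ordering::Relaxed))?;
            sink.flush()?;
            if stopping {
                // Hand the sink back so the caller can close or inspect it.
                return Ok(sink);
            }
        }
    });
    (stop_tx, handle)
}
```

Using `recv_timeout` instead of a plain sleep lets the writer exit promptly when alignment finishes and still emit a final line. The real work this issue asks for is filling those lines with STAR-compatible content.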
Not a one-PR drive-by; deferred until someone commits to the content fidelity. Stubs are not the goal — see the rejected approach in the conversation on PR #44.
Severity
Low. Today nf-core/rnaseq works around this with optional: true outputs. Affects provenance / QC tooling that parses STAR's verbose log.
Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). Split out from #28.