543 commits
134bb45
fix malformed nextflow schema
OlivierCoen Nov 8, 2025
6445bfd
pass all module tests
OlivierCoen Nov 8, 2025
f5d82d1
remove tag "module" in module tests
OlivierCoen Nov 8, 2025
e02ca8d
add missing module tests
OlivierCoen Nov 8, 2025
80f543f
allow tsv file as input dataset; modify id mapping script to allow that
OlivierCoen Nov 9, 2025
91b9155
allow tsv files in normalisation (deseq2 and edger)
OlivierCoen Nov 9, 2025
64bcdf8
update documentation
OlivierCoen Nov 9, 2025
433dfea
change species parameter to allow more complex species names; add spe…
OlivierCoen Nov 10, 2025
c9b9dd4
update README.md
OlivierCoen Nov 10, 2025
7ddbf89
fix .github/workflows/release-announcements.yml with latest code
OlivierCoen Nov 10, 2025
7bfb57c
update get_geo_dataset_accessions.py to improve output and allow rnas…
OlivierCoen Nov 10, 2025
efbaad3
clean code; remove superseries from geo accessions
OlivierCoen Nov 11, 2025
3b09364
allow geo datasets containing multiple datasets
OlivierCoen Nov 11, 2025
c6bbfad
change base.config times
OlivierCoen Nov 11, 2025
3821964
add troubleshooting doc
OlivierCoen Nov 11, 2025
90d5b87
pass pipeline nf-tests
OlivierCoen Nov 13, 2025
2c07114
allow and improve downloading of RNAseq data from GEO
OlivierCoen Nov 14, 2025
660214c
force integer count values after mapping to ensembl gene IDs
OlivierCoen Nov 16, 2025
404f85a
functional download and parsing of rnaseq data from GEO
OlivierCoen Nov 16, 2025
5aa1acc
set ks threshold pvalue as negative by default
OlivierCoen Nov 16, 2025
459d726
fix issue with deseq2 and edger in case of only one sample; improve l…
OlivierCoen Nov 16, 2025
2e380b1
fix merge_counts environment
OlivierCoen Nov 16, 2025
1976981
fix issue with file count file parsing when first column name is empty
OlivierCoen Nov 16, 2025
81f52a4
fix synthax issue
OlivierCoen Nov 17, 2025
11bb0ba
update doc
OlivierCoen Nov 17, 2025
39c3903
fix issue with data aggregation when gene id mapping or metadata are …
OlivierCoen Nov 18, 2025
7b59db3
removed subsampling in quantile normalisations to allow full reproduc…
OlivierCoen Nov 18, 2025
997b493
fix issue with design schema not taken into account
OlivierCoen Nov 19, 2025
d2f8b58
replace lazyframes by dataframes in clean_count_data.py
OlivierCoen Nov 19, 2025
cccac73
fix most test cases
OlivierCoen Nov 19, 2025
56ca964
check the number of datasets downloaded from Expression Atlas and pre…
OlivierCoen Nov 19, 2025
05abc9f
prevent download of geo suppl data when multiple species are present …
OlivierCoen Nov 20, 2025
2c1b389
separate the counts obtained in different supplementary file columns …
OlivierCoen Nov 20, 2025
857e85a
fix duplicated column names
OlivierCoen Nov 20, 2025
f7cc943
spearate idmapping step in 3 substeps to make it faster and more scal…
OlivierCoen Nov 20, 2025
553f3a9
change error / retry strategy for all processes
OlivierCoen Nov 20, 2025
09f49a9
add new tests for idmapping processes
OlivierCoen Nov 20, 2025
e74b4d2
fix get_candidate_genes.py when ids are Entrez gene ids
OlivierCoen Nov 20, 2025
7293377
remove data cleansing subworkflow from workflow
OlivierCoen Nov 20, 2025
e78dce6
fix issue in aggregate_results.py when gene ids are integers
OlivierCoen Nov 20, 2025
bf29940
fix issues with id mapping reformating
OlivierCoen Nov 20, 2025
2fcdb5a
remove E-GTEX-* Expression Atlas accesssions by default
OlivierCoen Nov 21, 2025
d0d97f7
fix issues with collection of gene ids
OlivierCoen Nov 21, 2025
919b8e1
fix bug in collect_gene_ids.py when ids are integer
OlivierCoen Nov 22, 2025
a94ffcd
remove deseq2 and edger from pipeline; add new steps for computation …
OlivierCoen Nov 22, 2025
1316c54
remove unecessary tag (1) from global processes
OlivierCoen Nov 22, 2025
5aa5945
script to download annotation gff3 file given a species name
OlivierCoen Nov 22, 2025
bb60780
increase to process_high all modules that handle all data
OlivierCoen Nov 22, 2025
803227e
increase dragstically number of retries for OOM related erros
OlivierCoen Nov 23, 2025
6eb7719
make script to compute max cdna length gene per gene from GGF3 file
OlivierCoen Nov 23, 2025
e8a5916
add computation of tpm in workflow and set it as default
OlivierCoen Nov 23, 2025
c1f90eb
replace quantile normalisation by linear scaling to [0,1] in stabilit…
OlivierCoen Nov 23, 2025
ea87986
make script and module to download latest annotation from ncbi
OlivierCoen Nov 24, 2025
1ae9906
big reformating of the steps for fetching accessions and download dat…
OlivierCoen Nov 24, 2025
11d36d4
replace request to NCBI taxonomy to get species taxid in download_lat…
OlivierCoen Nov 24, 2025
583af6a
improved logging of warning and failures in multiqc
OlivierCoen Nov 24, 2025
c80eba6
improve speed of rename_gene_ids.py
OlivierCoen Nov 24, 2025
84c9417
control nb of cpus in eatlas get accessions
OlivierCoen Nov 25, 2025
9a97eee
reduce and better handle base config for cpus and memory to adapt it …
OlivierCoen Nov 25, 2025
d4e9861
fix bug in eatlas get accessions
OlivierCoen Nov 25, 2025
4729cb5
fix issue with input datasets not being used anymore
OlivierCoen Nov 25, 2025
95a894e
add more checks for input accession format
OlivierCoen Nov 25, 2025
b34612a
improve scalability
OlivierCoen Nov 25, 2025
7db7908
add process to clean gene IDs before collecting them
OlivierCoen Nov 26, 2025
4fdde27
improve gene ID logging and QC
OlivierCoen Nov 26, 2025
c5982fc
add QC statistics (ratio zeros and skewness) in multiqc
OlivierCoen Nov 26, 2025
d0ab387
add error handling when no gene ID mapping is possible
OlivierCoen Nov 26, 2025
c9d996c
fix bug in download_geo_data.R when multiple suppl columns are found
OlivierCoen Nov 26, 2025
5b3b350
handle case where there is no library_strategy column in geo metadata…
OlivierCoen Nov 27, 2025
7f010f0
major refactoring of conda/micromamba environments; update some modul…
OlivierCoen Nov 27, 2025
660f52a
add script to normalise CEL files from microarray data
OlivierCoen Nov 27, 2025
a853af3
limit number of polars threads when not using containers
OlivierCoen Dec 3, 2025
d8b5827
fix tests
OlivierCoen Dec 4, 2025
8f4fe1b
replace "Channel" by "channel" to adapt to new synthax
OlivierCoen Dec 4, 2025
3b91c74
Template update for nf-core/tools version 3.5.1
OlivierCoen Dec 8, 2025
2d4581b
merge branch TEMPLATE into dev
OlivierCoen Dec 8, 2025
7007b60
pass linters
OlivierCoen Dec 8, 2025
7b352f7
update citations
OlivierCoen Dec 8, 2025
e8ee9ee
remove profile apptainer from nf-test config in order to fix CI tests
OlivierCoen Dec 9, 2025
2a7de4d
pass prettier and ruff linters
OlivierCoen Dec 9, 2025
a1af98c
add files to skip for files_unchanged during nf-core pipelines lint
OlivierCoen Dec 9, 2025
56899e7
change base config process_high modules
OlivierCoen Dec 9, 2025
82ebd98
add random sampling based on nb of samples direclty in Expression Atl…
OlivierCoen Dec 10, 2025
bca5fc7
reinstalled appropriate version of multiqc
OlivierCoen Dec 10, 2025
7363503
propadata random sampling to geo when not enough samples were collect…
OlivierCoen Dec 10, 2025
b8eda8e
set sampling quota in geo
OlivierCoen Dec 10, 2025
d5a4a8b
set fetching geo accessions as optional
OlivierCoen Dec 11, 2025
413d313
update method description
OlivierCoen Dec 11, 2025
5757d92
fix issue with gene id mapping stats in multiqc report
OlivierCoen Dec 11, 2025
34587ce
remove set statements to prepare for strict synthax
OlivierCoen Dec 11, 2025
d3bd377
rename top stable genes by most stable genes
OlivierCoen Dec 11, 2025
ce578b3
remove access to params in merge_data and multiqc subworkflows
OlivierCoen Dec 11, 2025
026a840
add default hard limits for resources in base.config
OlivierCoen Dec 12, 2025
079be55
fix some tests
OlivierCoen Dec 15, 2025
d9b58e7
improve doc
OlivierCoen Dec 15, 2025
e30e736
fix test failures
OlivierCoen Dec 16, 2025
7814723
set default random_sampling_size at 5000 to fit with homo sapiens on …
OlivierCoen Dec 16, 2025
d7c9b09
improve doc and pass linters
OlivierCoen Dec 18, 2025
63678df
add new metromap
OlivierCoen Dec 18, 2025
9677dd9
remove geo datasets from pipeline test snapshots
OlivierCoen Dec 18, 2025
09666bc
add possibility to provide custom gene length file instead of
OlivierCoen Dec 22, 2025
4d2d9a9
update doc
OlivierCoen Dec 22, 2025
909f984
improve reproducibility of nf-tests
OlivierCoen Dec 23, 2025
ddcb9e6
add pyarrow in dash_app environmen.yml
OlivierCoen Dec 30, 2025
86faace
add subworkflow to filter out samplesthat are not valid
OlivierCoen Dec 30, 2025
44d03dd
fix issue when removing samples not valid
OlivierCoen Dec 30, 2025
047fbbc
pipeline lint
OlivierCoen Jan 3, 2026
0116d68
add nb of merged gene IDs in multiqc id mapping stat graph
OlivierCoen Jan 3, 2026
c96b9b7
big refactoring to filter out rare genes
OlivierCoen Jan 3, 2026
4dd05e2
implement filtering on rare genes; add to multiqc
OlivierCoen Jan 4, 2026
f8b10f3
fix issue when cleaning ENSG ids
OlivierCoen Jan 4, 2026
fae0fdb
increase default value of min_occurrence_quantile to 0.2
OlivierCoen Jan 4, 2026
50a0c1c
replace pandas by polars in remove_samples_not_valid.py
OlivierCoen Jan 5, 2026
2106da3
add log2 normalisation when computing tpm and cpm
OlivierCoen Jan 5, 2026
6e618d7
replace pandas by polars in quantile normalisation
OlivierCoen Jan 5, 2026
98d707a
change pandas to polars in compute_dataset_statistics.py
OlivierCoen Jan 5, 2026
5122121
big refactoring in order to better scale
OlivierCoen Jan 5, 2026
50472fe
pass nf-tests
OlivierCoen Jan 7, 2026
5ed7dc2
Merge pull request #11 from OlivierCoen/dev
OlivierCoen Jan 21, 2026
b36582c
fix issue when there there is no mapping at all
OlivierCoen Jan 21, 2026
e81b8e3
fix unused line in expression atlas getdata module
OlivierCoen Jan 27, 2026
35443e7
move computation of nb null values per sample in a separate process
OlivierCoen Jan 27, 2026
12cdb22
separation extraction of gene ids from the cleaning step
OlivierCoen Jan 28, 2026
09df841
add parameter to skip gene id cleaning
OlivierCoen Jan 28, 2026
72db6bb
place dataset statistics and detection of null values in a single sub…
OlivierCoen Jan 28, 2026
9802111
implement first version of missing value imputer
OlivierCoen Jan 29, 2026
e2bb714
improve iterative imputer
OlivierCoen Jan 29, 2026
e4f8e2f
fix issue of NA not recognised when parsing count tables
OlivierCoen Jan 29, 2026
c8990b8
rename subworkflow for filtering out samples and add module to filter…
OlivierCoen Jan 30, 2026
cd75be1
refactor computation of zero values and missing values
OlivierCoen Jan 30, 2026
77ae3c4
add zero and null value qc to multiqc
OlivierCoen Jan 30, 2026
322c9ad
rename gene statistics and merge mdules and subworkflows for better c…
OlivierCoen Jan 30, 2026
d76dee5
fix issue in computation of ratio of null values
OlivierCoen Jan 30, 2026
a38e3f8
fix issues with malus on ratio of null values
OlivierCoen Jan 30, 2026
5e60547
change default scoring weights to only take into account normfinder a…
OlivierCoen Jan 30, 2026
4eb6678
divide genes in multiple sections based on expression levels and perf…
OlivierCoen Feb 1, 2026
cf56723
integrate the different expression sections in multiqc
OlivierCoen Feb 1, 2026
5206cf7
fix section order in the multiqc custom content
OlivierCoen Feb 1, 2026
65bebd4
update multiqc
OlivierCoen Feb 1, 2026
5d36903
gather multiple steps into the reporting subworkflow
OlivierCoen Feb 1, 2026
acc09db
fix issue with candidate selection
OlivierCoen Feb 1, 2026
1b65a09
fix linting
OlivierCoen Feb 1, 2026
68915f6
fix issue with resource allocation in impute missing values
OlivierCoen Feb 1, 2026
d095d9f
limit both polars and python threads manually for most modules
OlivierCoen Feb 1, 2026
9a8a807
reduce number of displayed box boxes in multiqc
OlivierCoen Feb 2, 2026
4ac48d3
fix issue with null values not computed correctly
OlivierCoen Feb 3, 2026
3dc8d6b
control resource allocation for all python scripts using polars or mu…
OlivierCoen Feb 3, 2026
10a6630
add parameter for target gene selection
OlivierCoen Feb 4, 2026
081f3dc
export section specific stat file in get_candidate_gens.py
OlivierCoen Feb 4, 2026
3ddf532
fix issues in compute_stability_scores.py
OlivierCoen Feb 4, 2026
f2119cf
use section stat files for stability scoring
OlivierCoen Feb 5, 2026
0cd9789
add found target genes in multiqc report
OlivierCoen Feb 5, 2026
4c89e9d
add additional memory to multiqc
OlivierCoen Feb 5, 2026
43a8171
fix issue with flatMap not producing a constant number of items in sp…
OlivierCoen Feb 5, 2026
03cf0ee
replace hardcoded max memory by task's memory in quantile_normalise.py
OlivierCoen Feb 5, 2026
3017fbe
set quantile normalise module as process_low instead of process_single
OlivierCoen Feb 5, 2026
038c2f8
tweak resource management parameters for cpus
OlivierCoen Feb 5, 2026
8ec014a
limit polars and multiprocessing max nb of threads in modules, and me…
OlivierCoen Feb 5, 2026
0bdd3d0
remove cpus and memory limits inside modules
OlivierCoen Feb 5, 2026
5dc698e
fix inconsistent behaviour when splitting files into sections
OlivierCoen Feb 5, 2026
e6cf1cb
fix issue when no gene metadata is available
OlivierCoen Feb 5, 2026
150ee69
fix issue with nb of cpus being parsed as string instead of integer
OlivierCoen Feb 18, 2026
7ab8a09
update polars version in merge_counts
OlivierCoen Feb 18, 2026
25d2997
update merge_counts.py to use lazyframes instead of dataframes
OlivierCoen Feb 18, 2026
07513df
update compute_gene_statistics.py to use lazyframes instead of datafr…
OlivierCoen Feb 18, 2026
7f4cbb4
add parameter gff to provide directly genome annotation
OlivierCoen Feb 18, 2026
3c435c1
reduce number of box plots in each multiqc category to 25
OlivierCoen Feb 18, 2026
31ae1ad
replace requests by httpx in download_latest_ensembl_annotation.py
OlivierCoen Feb 18, 2026
a33f6bc
replace requests by httpx in python scripts
OlivierCoen Feb 19, 2026
fe86ec0
fix nf-tests
OlivierCoen Feb 19, 2026
0e3c88f
update galaxy test script
OlivierCoen Feb 19, 2026
01e708b
update galaxy tool
OlivierCoen Feb 20, 2026
237a0a9
fix remaining failures in nf-tests
OlivierCoen Feb 21, 2026
29123e4
update pipeline description and nf-core version
OlivierCoen Feb 21, 2026
aa7fbc8
update metromap and integrate drawio file in repo
OlivierCoen Feb 21, 2026
8de6920
fix wrong environment for aggregate results
OlivierCoen Feb 21, 2026
c6ace7a
update Galaxy tool
OlivierCoen Feb 22, 2026
9767606
update documentation
OlivierCoen Feb 22, 2026
48b7e89
update metromap
OlivierCoen Feb 22, 2026
9f2818d
update usage.md
OlivierCoen Feb 22, 2026
3bfaf65
remove unecessary merging of gene id mapping and metadata from the me…
OlivierCoen Feb 23, 2026
d328187
move pipeline overview to main README and update output.md
OlivierCoen Feb 23, 2026
3427017
fix bugs and add default graphs for dash app
OlivierCoen Feb 23, 2026
15c2393
lint pipeline
OlivierCoen Feb 23, 2026
44e53d8
update test snapshots
OlivierCoen Feb 25, 2026
6f69746
fix ruff checks
OlivierCoen Feb 25, 2026
ce06a38
remove geo data tests
OlivierCoen Feb 25, 2026
c33e663
Merge pull request #13 from OlivierCoen/dev
OlivierCoen Mar 11, 2026
9eff646
bump version
OlivierCoen Mar 14, 2026
8d0f90b
include custom configs & pass lint test for release
OlivierCoen Mar 14, 2026
8b32b51
add people who assisted the dvelopment of the pipeline in the credits
OlivierCoen Mar 15, 2026
1660e1a
replace pipeline version in tests/default.nt.test.snap
OlivierCoen Mar 15, 2026
50d1ac6
Merge pull request #14 from OlivierCoen/dev
OlivierCoen Mar 17, 2026
d0311f2
replace species in test_full
OlivierCoen Mar 17, 2026
16565c2
Merge pull request #15 from OlivierCoen/dev
OlivierCoen Mar 17, 2026
137dd91
update CHANGELOG.md for official release
OlivierCoen Mar 18, 2026
367bc85
Merge pull request #16 from OlivierCoen/dev
OlivierCoen Mar 19, 2026
f044b14
delete old folders
OlivierCoen Mar 19, 2026
7a739ab
updated all environments (docker + apptainer + conda) of modules usin…
OlivierCoen Mar 19, 2026
f02b605
insist on the experimental state of fetching geo datasets in README.md
OlivierCoen Mar 20, 2026
2adc3b4
sort dataframe for output consistency
OlivierCoen Mar 20, 2026
817f09c
sort list of files given to multiqc
OlivierCoen Mar 20, 2026
7d15085
add wget to geo getdata environment
OlivierCoen Mar 20, 2026
9e67438
fix issue not caught in geo get_data
OlivierCoen Mar 20, 2026
a58b2ef
fix bug introduced in detect_rare_genes.py
OlivierCoen Mar 21, 2026
1ade9db
fix bugs in download_geo_data.R
OlivierCoen Mar 22, 2026
b552b90
prevent modules from publising failure and warning reason files
OlivierCoen Mar 22, 2026
510992d
update tests
OlivierCoen Mar 22, 2026
cbc28b4
pass linter
OlivierCoen Mar 22, 2026
e9fd713
Merge branch 'nf-core:dev' into dev
OlivierCoen Mar 22, 2026
f3c557d
better handle connection issues in expression atlas getdata
OlivierCoen Mar 22, 2026
89fdc04
update broken links in documentation
OlivierCoen Mar 22, 2026
1564c47
update snapshots
OlivierCoen Mar 23, 2026
acbc1b1
update min version of Nextflow
OlivierCoen Mar 23, 2026
68898d7
add contributors
OlivierCoen Mar 23, 2026
32d543d
update min Nextflow version to 25.10.4
OlivierCoen Mar 23, 2026
18d41b8
act possibility to run nf-test through act
OlivierCoen Mar 24, 2026
35a1081
add pipeline test command to README.md
OlivierCoen Mar 25, 2026
960571d
Update default.nf.test.snap
OlivierCoen Mar 25, 2026
5cfd8bc
add act tests
OlivierCoen Mar 29, 2026
b989a35
fix tests
OlivierCoen Mar 29, 2026
95dd20e
authorize failure and ignore error for expression get data and geo de…
OlivierCoen Mar 30, 2026
d94cb49
fix tests
OlivierCoen Apr 2, 2026
7f81d56
improve testing with act and add README.md
OlivierCoen Apr 2, 2026
68627a8
update checkCounts function
OlivierCoen Apr 2, 2026
237ec2c
fix weird insertions in snapshots
OlivierCoen Apr 2, 2026
b79fde2
add skewness.csv to .nftignore
OlivierCoen Apr 2, 2026
12d33f8
update snapshots
OlivierCoen Apr 3, 2026
f7c4a50
fix reproductibility issue
OlivierCoen Apr 4, 2026
abcda5e
fix skewness issue
OlivierCoen Apr 4, 2026
d876bbd
fix issue with act
OlivierCoen Apr 4, 2026
25e7791
remove a pipeline test
OlivierCoen Apr 4, 2026
5919420
Update nf-test.yml
OlivierCoen Apr 4, 2026
772ca82
update snapshots
OlivierCoen Apr 4, 2026
3a3c759
fix prettier error
OlivierCoen Apr 5, 2026
a946490
update multiqc
OlivierCoen Apr 5, 2026
21489b2
adapt pipeline to new multiqc input syntax
OlivierCoen Apr 5, 2026
d315b9e
remove tests involving geo get data
OlivierCoen Apr 6, 2026
0f2b6af
improve testing with act
OlivierCoen Apr 7, 2026
734cf2e
Merge pull request #18 from OlivierCoen/dev
Ositofeliz Apr 7, 2026
91ea56d
remove tests for CI to try and avoid "No space left on disk" error
OlivierCoen Apr 8, 2026
2fdaf4a
Merge pull request #19 from OlivierCoen/dev
Ositofeliz Apr 8, 2026
1cb2b53
re-introduce tests commented to avoid "no disk space left" errros
OlivierCoen Apr 8, 2026
4999638
increase max shards from 7 to 10 for nf-tests
OlivierCoen Apr 8, 2026
3fda111
Merge pull request #20 from OlivierCoen/dev
OlivierCoen Apr 8, 2026
f4af14a
fix minor issues to address comments of reviewer 1 for first release
OlivierCoen Apr 16, 2026
39c1c9f
Update get_eatlas_accessions.py
OlivierCoen Apr 16, 2026
83bf375
replace beta vulgaris by prunus persica in main test and update test …
OlivierCoen Apr 16, 2026
6e30dd4
fix test snapshot
OlivierCoen Apr 17, 2026
3c5669a
Merge pull request #22 from OlivierCoen/dev
OlivierCoen Apr 17, 2026
2 changes: 1 addition & 1 deletion .github/actions/nf-test/action.yml
@@ -68,7 +68,7 @@ runs:
--changed-since HEAD^ \
--verbose \
--tap=test.tap \
--shard ${{ inputs.shard }}/${{ inputs.total_shards }}
--shard ${{ inputs.shard }}/${{ inputs.total_shards }} --debug

# Save the absolute path of the test.tap file to the output
echo "tap_file_path=$(realpath test.tap)" >> $GITHUB_OUTPUT
2 changes: 1 addition & 1 deletion .github/workflows/nf-test.yml
@@ -50,7 +50,7 @@ jobs:
env:
NFT_VER: ${{ env.NFT_VER }}
with:
max_shards: 7
max_shards: 10

- name: debug
run: |
11 changes: 11 additions & 0 deletions .gitignore
@@ -7,3 +7,14 @@ testing/
testing*
*.pyc
null/
.nf-test*
.idea/
.vscode/
taggers/
tokenizers/
corpora/
.github/act.custom_runner.Dockerfile
.ruff_cache
galaxy/test_output/
TODO
test/
14 changes: 10 additions & 4 deletions .nf-core.yml
@@ -6,16 +6,22 @@ lint:
- conf/igenomes_ignored.config
files_unchanged:
- assets/nf-core-stableexpression_logo_light.png
- docs/images/nf-core-stableexpression_logo_light.png
- docs/images/nf-core-stableexpression_logo_dark.png
- .github/PULL_REQUEST_TEMPLATE.md
nextflow_config:
- params.input
template_strings:
- tests/test_data/genorm/compute_m_measure/input/std.0.0.parquet
- tests/test_data/genorm/compute_m_measure/input/std.1.2.parquet
- tests/test_data/genorm/compute_m_measure/input/std.1.2.parquet
schema_lint: false
nf_core_version: 3.5.1

nf_core_version: 3.5.2
repository_type: pipeline
template:
author: Olivier Coen
description: This pipeline is dedicated to finding the most stable genes across
count datasets
description: This pipeline is dedicated to identifying the most stable genes within a single or multiple expression dataset(s). This is particularly useful for identifying the most suitable RT-qPCR reference genes for a specific species.
force: false
is_nfcore: true
name: stableexpression
@@ -24,4 +30,4 @@ template:
skip_features:
- igenomes
- fastqc
version: 1.0dev
version: 1.0.0
16 changes: 15 additions & 1 deletion .pre-commit-config.yaml
@@ -1,10 +1,12 @@
repos:
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.1.0"
rev: "v4.0.0-alpha.8"
hooks:
- id: prettier
additional_dependencies:
- prettier@3.6.2
exclude: galaxy/

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
@@ -25,3 +27,15 @@ repos:
subworkflows/nf-core/.*|
.*\.snap$
)$

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.14.1
hooks:
# Run the linter.
- id: ruff
files: \.py$
args: [--fix]
exclude: bin/old/
# Run the formatter.
- id: ruff-format
files: \.py$
3 changes: 3 additions & 0 deletions .prettierignore
@@ -14,3 +14,6 @@ bin/
ro-crate-metadata.json
modules/nf-core/
subworkflows/nf-core/
galaxy/
docs/
tests/act
8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -3,9 +3,13 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0dev - [date]
## v1.0.0 - 18/03/2026

Initial release of nf-core/stableexpression, created with the [nf-core](https://nf-co.re/) template.
First complete, official release of nf-core/stableexpression.

## v1.0dev - 26/01/2025

Initial pre-release of nf-core/stableexpression, created with the [nf-core](https://nf-co.re/) template.

### `Added`

20 changes: 20 additions & 0 deletions CITATIONS.md
@@ -10,6 +10,26 @@

## Pipeline tools

- [EBI Expression Atlas](https://www.ebi.ac.uk/gxa/home)

> Papatheodorou I, Fonseca NA, Keays M, Tang YA, Barrera E, Bazant W, Burke M, Füllgrabe A, Muñoz-Pomer Fuentes A, George N, Huerta L, Koskinen S, Mohammed S, Geniza M, Preece J, Jaiswal P, Jarnuczak AF, Huber W, Stegle O, Vizcaino JA, Brazma A, Petryszak R. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 2017 Nov 20;46(Database issue):D246–D251. doi: 10.1093/nar/gkx1158. PubMed PMID: 29165655.

- [NCBI GEO](https://www.ncbi.nlm.nih.gov/geo/)

> Ron Edgar, Michael Domrachev & Alex E Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002 Jan 1;30(1):207-10. doi: 10.1093/nar/30.1.207. PubMed PMID: 11752295.

- [g:Profiler](https://biit.cs.ut.ee/gprofiler/gost)

> Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler—a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 2007 May 3;35(Web Server issue):W193–W200. doi:10.1093/nar/gkm226. PubMed PMID: 17478515.

- [Normfinder](https://rdrr.io/github/dhammarstrom/generefer/man/normfinder.html)

> Claus Lindbjerg Andersen, Jens Ledet Jensen, Torben Falck Ørntoft. Normalization of Real-Time Quantitative Reverse Transcription-PCR Data: A Model-Based Variance Estimation Approach to Identify Genes Suited for Normalization, Applied to Bladder and Colon Cancer Data Sets. Cancer Res (2004) 64 (15): 5245–5250. doi:10.1158/0008-5472.CAN-04-0496. PubMed PMID: 15289330.

- [GeNorm](https://pypi.org/project/rna-genorm/)

> Jo Vandesompele, Katleen De Preter, Filip Pattyn, Bruce Poppe, Nadine Van Roy, Anne De Paepe, Frank Speleman. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002 Jun 18;3(7):RESEARCH0034. doi: 10.1186/gb-2002-3-7-research0034 Pubmed PMID: 12184808.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
141 changes: 108 additions & 33 deletions README.md
@@ -12,6 +12,7 @@

[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)
[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.5.1-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.5.1)
[![run with apptainer](https://custom-icon-badges.demolab.com/badge/run%20with-apptainer-4545?logo=apptainer&color=teal&labelColor=000000)](https://apptainer.org/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -21,68 +22,144 @@

## Introduction

**nf-core/stableexpression** is a bioinformatics pipeline that ...
**nf-core/stableexpression** is a bioinformatics pipeline that aggregates multiple count datasets for a specific species and identifies the most stable genes. Datasets can either be downloaded from public databases (EBI, NCBI) or provided directly by the user. Both RNA-seq and microarray count datasets can be used.

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
<p align="center">
<img title="Stableexpression Workflow" src="docs/images/nf_core_stableexpression.metromap.png" width=100%>
</p>

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
Its main inputs are:

## Usage
- a species name (mandatory)
- keywords for Expression Atlas / GEO search (optional)
- a CSV input file listing your own raw / normalised count datasets (optional).

**Use cases**:

- **find the most suitable genes as RT-qPCR reference genes for a specific species (and optionally specific conditions)**
- download all Expression Atlas and / or NCBI GEO datasets for a species (and optionally keywords)

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

#### 1. Get accessions from public databases

- Get [Expression Atlas](https://www.ebi.ac.uk/gxa/home) dataset accessions corresponding to the provided species (and optionally keywords)
This step is run by default but is optional. Set `--skip_fetch_eatlas_accessions` to skip it.
- Get NCBI [GEO](https://www.ncbi.nlm.nih.gov/gds) **microarray** dataset accessions corresponding to the provided species (and optionally keywords)
This is optional and **NOT** run by default. Set `--fetch_geo_accessions` to run it.

#### 2. Download data (see [usage](./conf/usage.md#3-provide-your-own-accessions))

- Download [Expression Atlas](https://www.ebi.ac.uk/gxa/home) data if any
- Download NCBI [GEO](https://www.ncbi.nlm.nih.gov/gds) data if any

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
> At this point, datasets downloaded from public databases are merged with datasets provided by the user using the `--datasets` parameter. See [usage](./conf/usage.md#4-use-your-own-expression-datasets) for more information about local datasets.

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):
#### 3. ID Mapping (see [usage](./conf/usage.md#5-custom-gene-id-mapping--metadata))

First, prepare a samplesheet with your input data that looks as follows:
- Gene IDs are cleaned
- Gene IDs are mapped to NCBI Entrez Gene IDs (or Ensembl IDs) using [g:Profiler](https://biit.cs.ut.ee/gprofiler/gost) to standardise them across datasets (run by default; optional)
- Rare genes are filtered out

`samplesheet.csv`:
#### 4. Sample filtering

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```
Samples with an excessive ratio of zeros or missing values are removed from the analysis.
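
As an illustration of this filtering step, here is a minimal Python sketch. It is not the pipeline's actual implementation: the function names and the thresholds `max_zero_ratio` and `max_null_ratio` are hypothetical, chosen only to show the idea of dropping samples whose ratio of zeros or missing values is too high.

```python
from typing import Optional

# Hypothetical layout: sample name -> per-gene counts, None marking a missing value.
Counts = dict[str, list[Optional[float]]]


def zero_and_null_ratios(values: list[Optional[float]]) -> tuple[float, float]:
    """Return (ratio of zeros, ratio of missing values) for one sample."""
    n = len(values)
    if n == 0:
        return 0.0, 0.0
    nulls = sum(v is None for v in values)
    zeros = sum(v == 0 for v in values if v is not None)
    return zeros / n, nulls / n


def filter_samples(
    counts: Counts,
    max_zero_ratio: float = 0.9,  # assumed threshold, not a pipeline default
    max_null_ratio: float = 0.5,  # assumed threshold, not a pipeline default
) -> Counts:
    """Keep only the samples whose zero and null ratios stay within the limits."""
    kept: Counts = {}
    for sample, values in counts.items():
        zero_ratio, null_ratio = zero_and_null_ratios(values)
        if zero_ratio <= max_zero_ratio and null_ratio <= max_null_ratio:
            kept[sample] = values
    return kept
```

Both ratios are computed over all genes of a sample, so a sample consisting mostly of missing values is rejected even if it contains no zeros.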

#### 5. Normalisation of expression

- Normalise raw RNA-seq data using TPM (which requires downloading the corresponding genome annotation and computing transcript lengths) or CPM.
- Perform quantile normalisation on each dataset separately using [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.quantile_transform.html)
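
The difference between CPM and TPM can be sketched in a few lines of Python using the standard textbook formulas. This is only an illustration for a single sample, not the pipeline's code: the function names and the `lengths_bp` mapping (gene ID to transcript length in base pairs) are assumptions.

```python
def cpm(counts: dict[str, float]) -> dict[str, float]:
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million: correct for gene length first, then scale."""
    # Reads per kilobase of transcript
    rates = {gene: count / (lengths_bp[gene] / 1000) for gene, count in counts.items()}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}
```

TPM needs a length for every gene, which is why that option depends on an annotation, whereas CPM only needs the counts themselves.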

#### 6. Merge all data

All datasets are merged into one single dataframe.

#### 7. Imputation of missing values

Missing values are replaced by values imputed with one of the algorithms provided by [scikit-learn](https://scikit-learn.org/stable/modules/impute.html). The user can choose the imputation method with the `--missing_value_imputer` parameter.
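
To make the idea of imputation concrete, here is a deliberately simple Python sketch that fills each missing value with the mean of the observed values for the same gene. This is a stand-in technique for illustration only: the pipeline relies on scikit-learn imputers, and the function name and the rows-as-genes layout are assumptions.

```python
import math
from typing import Optional


def impute_gene_means(matrix: list[list[Optional[float]]]) -> list[list[Optional[float]]]:
    """Fill missing values (None or NaN) with the mean of each row (one row per gene)."""

    def is_missing(value: Optional[float]) -> bool:
        return value is None or math.isnan(value)

    imputed = []
    for row in matrix:
        observed = [v for v in row if not is_missing(v)]
        if not observed:
            # Nothing to learn from: leave a fully missing gene untouched.
            imputed.append(list(row))
            continue
        fill = sum(observed) / len(observed)
        imputed.append([fill if is_missing(v) else v for v in row])
    return imputed
```

More elaborate imputers also use the values of other genes and samples to estimate each missing entry; the mean-based version above only shows the input and output shape of the step.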

#### 8. General statistics for each gene

Basic statistics are computed for each gene, both across all platforms and for each platform separately (RNA-seq and microarray).

#### 9. Scoring

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
- The whole list of genes is divided into multiple sections, based on their expression level.
- Based on the coefficient of variation, a shortlist of candidate genes is extracted for each section.
- Run an optimised, scalable version of [Normfinder](https://www.moma.dk/software/normfinder)
- Run an optimised, scalable version of [Genorm](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2002-3-7-research0034) (run by default; optional)
- Compute stability scores for each candidate gene
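
The first two scoring steps can be sketched as follows: genes are ranked by mean expression, split into sections, and the genes with the lowest coefficient of variation in each section are shortlisted. This covers only the pre-selection, not Normfinder or Genorm. The function names, the number of sections and the shortlist size are assumptions chosen for the example, not the pipeline's actual settings.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (inf if the mean is 0)."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else float("inf")


def shortlist_by_section(expression, n_sections=2, top_n=1):
    """Return the top_n lowest-CV genes from each expression-level section."""
    # Rank genes from lowest to highest mean expression
    ranked = sorted(expression, key=lambda g: statistics.fmean(expression[g]))
    if not ranked:
        return []
    size = -(-len(ranked) // n_sections)  # ceiling division
    shortlist = []
    for start in range(0, len(ranked), size):
        section = ranked[start:start + size]
        # Within a section, the most stable genes have the lowest CV
        section.sort(key=lambda g: coefficient_of_variation(expression[g]))
        shortlist.extend(section[:top_n])
    return shortlist


# Hypothetical normalised expression values across four samples
expression = {
    "gene_a": [10, 11, 9, 10],      # low expression, stable
    "gene_b": [12, 2, 25, 5],       # low expression, variable
    "gene_c": [100, 101, 99, 100],  # high expression, stable
    "gene_d": [90, 150, 60, 120],   # high expression, variable
}
candidates = shortlist_by_section(expression, n_sections=2, top_n=1)
print(candidates)  # ['gene_a', 'gene_c']
```

Selecting candidates per section rather than globally keeps both weakly and strongly expressed genes in the shortlist.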

-->
#### 10. Reporting

Now, you can run the pipeline using:
- Result aggregation
- Make [`MultiQC`](http://multiqc.info/) report
- Prepare [Dash Plotly](https://dash.plotly.com/) app for further investigation of gene / sample counts

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
## Test pipeline

You can test the execution of the pipeline locally with:

```bash
nextflow run nf-core/stableexpression -profile test,<docker/apptainer/conda/micromamba/...>
```

## Basic usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

To search for the most stable genes in a species across all public datasets, simply run:

```bash
nextflow run nf-core/stableexpression \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
-profile <PROFILE (examples: docker / apptainer / conda / micromamba)> \
--species <SPECIES (examples: arabidopsis_thaliana / "drosophila melanogaster")> \
--outdir <OUTDIR (example: ./results)> \
-resume
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
## More advanced usage

For more specific scenarios, like:

For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/stableexpression/usage) and the [parameter documentation](https://nf-co.re/stableexpression/parameters).
- **fetching only specific conditions**
- **using your own expression dataset(s)**

please refer to the [usage documentation](https://nf-co.re/stableexpression/usage).

## Resource allocation

For setting pipeline CPU / memory usage, see [here](./docs/configuration.md).

## Profiles

See [here](https://nf-co.re/stableexpression/usage#profiles) for more information about profiles.

## Pipeline output

To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/stableexpression/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/stableexpression/output).

## Support us

If you like nf-core/stableexpression, please make sure you give it a star on GitHub!

[![stars - stableexpression](https://img.shields.io/github/stars/nf-core/stableexpression?style=social)](https://github.com/nf-core/stableexpression)

## Credits

nf-core/stableexpression was originally written by Olivier Coen.

We thank the following people for their extensive assistance in the development of this pipeline:
We thank the following people for their assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
- Rémy Costa
- Shaheen Acheche
- Janine Soares

## Contributions and Support

@@ -95,8 +172,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/stableexpression for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows:
2 changes: 0 additions & 2 deletions assets/methods_description_template.yml
@@ -3,8 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/stableexpression Methods Description"
section_href: "https://github.com/nf-core/stableexpression"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using nf-core/stableexpression v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>