
Create a global parameters list #1251

Open
christopher-hakkaart opened this issue Sep 23, 2022 · 3 comments
Labels
enhancement (New feature or request), infrastructure

Comments

@christopher-hakkaart
Member

christopher-hakkaart commented Sep 23, 2022

Create a global parameters list

Terminology can differ between pipelines and shared assets. To keep shared content consistent and familiar across pipelines, subworkflows, and modules, it would be beneficial to create a reserved ontology. For example, parameter names such as --bwa_index and --bwa should be reserved.

A reserved ontology list needs to be created. There might be an example from elsewhere that we could use as a starting point. We could also scrape all JSON schema files and build a big list (and link to it in the writing pipelines tutorial). The final product could be a list with clear descriptions that developers can use to guide naming conventions.
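
As a rough illustration of the scraping idea, here is a minimal Python sketch that tallies how many pipelines define each parameter name. It rests on assumptions that are not stated in this issue: that each pipeline's schema has been downloaded to a local folder laid out as `<root>/<pipeline>/nextflow_schema.json` (a hypothetical layout and file name), and that parameters sit either under a top-level `properties` key or under the `properties` of each group in `definitions` (or `$defs`). The function names are made up for this example.

```python
import json
from collections import Counter
from pathlib import Path


def schema_param_names(schema: dict) -> set[str]:
    """Collect the parameter names defined in one parsed schema.

    Assumption: parameters live under top-level "properties" and under
    the "properties" of each group in "definitions" (or "$defs").
    """
    names = set(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.update(group.get("properties", {}))
    return names


def count_params(schemas: dict[str, dict]) -> Counter:
    """Count how many pipelines define each parameter name."""
    counts = Counter()
    for schema in schemas.values():
        # Each pipeline contributes at most 1 per name (names are a set).
        counts.update(schema_param_names(schema))
    return counts


def shared_params(counts: Counter) -> list[tuple[str, int]]:
    """Return (name, count) for names seen in more than one pipeline."""
    return [(name, n) for name, n in counts.most_common() if n > 1]


def load_schemas(root: Path, pattern: str = "*/nextflow_schema.json") -> dict[str, dict]:
    """Load schemas from a local folder (hypothetical layout, see above)."""
    return {
        path.parent.name: json.loads(path.read_text())
        for path in sorted(root.glob(pattern))
    }
```

With the schemas on disk, `shared_params(count_params(load_schemas(Path("schemas"))))` would give a tally that could seed the list; the descriptions would still have to be written by hand.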

christopher-hakkaart added the enhancement label Sep 23, 2022
@christopher-hakkaart
Member Author

A secondary objective will be to reserve global samplesheet headers.

@awgymer

awgymer commented Mar 27, 2023

I did a really quick and dirty scrape of the schema.json from the pipelines listed as released on the website. Three of those did not appear to have a schema.json in master and were skipped: mnaseseq, imcyto, slamseq. This leaves 44 pipelines.

Here are all the params which appeared in more than one pipeline:

outdir	44
email	44
custom_config_version	44
custom_config_base	44
config_profile_description	44
config_profile_contact	44
config_profile_url	44
max_cpus	44
max_memory	44
max_time	44
help	44
email_on_fail	44
plaintext_email	44
monochrome_logs	44
tracedir	43
input	42
publish_dir_mode	42
max_multiqc_email_size	40
validate_params	40
show_hidden_params	40
config_profile_name	39
multiqc_config	38
multiqc_title	33
hook_url	29
igenomes_ignore	28
igenomes_base	27
genome	25
version	24
multiqc_logo	24
multiqc_methods_description	24
fasta	23
skip_multiqc	17
save_reference	15
aligner	14
gtf	13
enable_conda	13
hostnames	12
skip_fastqc	12
clip_r1	10
three_prime_clip_r1	10
clip_r2	9
three_prime_clip_r2	9
save_trimmed	9
gff	8
skip_trimming	8
trim_nextseq	7
star_index	7
seq_center	7
skip_qc	7
protocol	6
save_unaligned	6
save_align_intermeds	6
gene_bed	6
bwa_index	6
skip_preseq	5
single_end	5
enzyme	4
trim_fastq	4
star_ignore_sjdbgtf	4
read_length	4
skip_alignment	4
save_merged_fastq	4
blacklist	4
skip_igv	4
skip_peak_qc	4
macs_gsize	4
name	4
database	3
decoy_method	3
precursor_mass_tolerance	3
fragment_mass_tolerance	3
fixed_mods	3
variable_mods	3
min_peptide_length	3
max_peptide_length	3
num_hits	3
subset_max_train	3
klammer	3
description_correct_features	3
quantification_method	3
contrasts	3
singularity_pull_docker_container	3
bowtie2_index	3
save_trimmed_fail	3
skip_cutadapt	3
skip_markduplicates	3
skip_picard_metrics	3
seq_platform	3
keep_dups	3
deseq2_vst	3
skip_deseq2_qc	3
skip_peak_annotation	3
skip_plot_profile	3
root_folder	2
local_input_type	2
add_decoys	2
openms_peakpicking	2
peakpicking_inmemory	2
peakpicking_ms_levels	2
search_engines	2
num_enzyme_termini	2
allowed_missed_cleavages	2
precursor_mass_tolerance_unit	2
fragment_mass_tolerance_unit	2
fragment_method	2
isotope_error_range	2
instrument	2
min_precursor_charge	2
max_precursor_charge	2
max_mods	2
db_debug	2
enable_mod_localization	2
mod_localization	2
luciphor_neutral_losses	2
luciphor_decoy_mass	2
luciphor_decoy_neutral_losses	2
luciphor_debug	2
IL_equivalent	2
posterior_probabilities	2
pp_debug	2
FDR_level	2
train_FDR	2
test_FDR	2
outlier_handling	2
consensusid_algorithm	2
consensusid_considered_top_hits	2
min_consensus_support	2
protein_level_fdr_cutoff	2
protein_quant	2
mass_recalibration	2
transfer_ids	2
targeted_only	2
skip_post_msstats	2
ref_condition	2
enable_qc	2
ptxqc_report_layout	2
skip_pycoqc	2
skip_nanoplot	2
kraken2_db	2
skip_kraken2	2
skip_fastp	2
variant_caller	2
min_mapped_reads	2
mode	2
adapter_fasta	2
save_databases	2
transcript_fasta	2
salmon_index	2
tools	2
trim	2
fai	2
malt_mode	2
stranded	2
skip_quantification	2
skip_bigwig	2
peakcaller	2
annotation_tool	2
with_umi	2
umitools_dedup_stats	2
dragmap	2
skip_tools	2
split_fastq	2
no_intervals	2
snpeff_cache	2
vep_cache	2
dbsnp	2
dbsnp_tbi	2
dict	2
fasta_fai	2
known_indels	2
known_indels_tbi	2
mappability	2
snpeff_db	2
vep_genome	2
vep_species	2
vep_cache_version	2
remove_ribo_rna	2
ribo_database_manifest	2
save_non_ribo_reads	2
bam_csi_index	2
skip_qualimap	2
fasta_index	2
skip_deduplication	2
skip_decoy_generation	2
fragment_size	2
chromap_index	2
keep_multi_map	2
bwa_min_score	2
bamtools_filter_pe_config	2
bamtools_filter_se_config	2
narrow_peak	2
broad_cutoff	2
macs_fdr	2
macs_pvalue	2
min_reps_consensus	2
save_macs_pileup	2
skip_consensus_peaks	2
skip_plot_fingerprint	2
fingerprint_bins	2
krakendb	2
bowtie_index	2
ncrna	2

More than 2000 params appear in only a single pipeline. I am not sure how many of those are similarly named but not identical, and should perhaps be standardised.
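
One possible way to surface similarly named params is a plain string-similarity pass over the names, sketched below with Python's standard-library `difflib`. The 0.85 threshold and the function name are assumptions picked for illustration, not something tested against the full scrape. The output is only a list of candidates for manual review, since close names can be legitimately different (for example bowtie_index and bowtie2_index refer to different tools).

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for pairs of distinct names whose
    similarity ratio is at or above the threshold, most similar first.

    The threshold is an arbitrary starting point and needs tuning.
    """
    pairs = []
    # Compare every unordered pair once; fine for a few thousand names.
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])


if __name__ == "__main__":
    # Names taken from the tally above.
    sample = ["kraken2_db", "krakendb", "bowtie_index", "bowtie2_index", "outdir"]
    for a, b, ratio in similar_pairs(sample):
        print(f"{a}\t{b}\t{ratio}")
```

Pure string similarity will miss synonyms that are spelled differently (fasta_fai and fasta_index fall below this threshold), so it would complement a manual pass over the list, not replace it.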

@maxulysse
Member

On top of that, I'd like a global list of meta.map fields.
