preparing_configcsv
To run the Triti-Map main program, you need to modify the parameters in the configuration file.
Description of the main parameters of the configuration file
- `email`: **Important**, you need to provide your personal email address when using the EMBL-EBI API for related analysis.
- `samples`: No modification is needed. The path of the sample information file. The default sample information file is `sample.csv` in the Triti-Map running directory.
- `datatype`: **Important**, the sequencing type of the sample data. `dna`: ChIP-seq or WGS; `rna`: RNA-seq.
- `maxthreads`: The maximum number of threads that can be used. Default value: `30`.
- `ref`: **Important**, reference genome related file paths, contains 3 sub-parameters.
  - `genome`: Path to the reference genome file; use an absolute path.
  - `annotation`: Path to the reference genome gene annotation file; use an absolute path.
  - `STARdir`: STAR reference genome index directory (required for RNA-seq analysis).
- `gatk`: GATK-related analysis parameters, including one sub-parameter.
  - `min_SNP_DP`: No modification is needed. The minimum depth that a SNP must reach in each pool to be considered valid. Default is `10`.
- `snpindex`: Parameters required for BSA localization analysis, including 6 sub-parameters.
  - `pop_struc`: **Important**, the population structure of the pool samples. If the pool data come from an F2 generation, fill in `F2`; if the pool data come from a RIL population or all individuals are homozygous, fill in `RIL`.
  - `bulksize`: The number of samples in the pool, e.g., if each pool consists of 30 samples, fill in `30`.
  - `winsize`: No modification is needed. The length of the sliding window used for data correction. Default is `1000000` (1 Mb).
  - `filter_probs`: **Important**, the percentage of the original results to filter out based on Delta SNPindex and SNP count/Mb. If the value is 0.75, the average Delta SNPindex and the average number of SNPs per 1 Mb of a candidate interval must both be greater than 75% of the corresponding values of all original results.
  - `fisher_p`: No modification is needed. The SNP loci in the trait association interval are filtered by the p-value calculated for each locus with Fisher's exact test. Default is `0.0001`.
  - `min_length`: No modification is needed. The minimum length of the candidate trait association interval. For large-genome species such as wheat, the default length is `1000000` (1 Mb).
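The `filter_probs` rule can be read as a quantile cutoff applied to both metrics at once. The following is a minimal sketch of that reading, not Triti-Map's actual implementation: the source does not state how the threshold is computed, and the field names `delta_snpindex` and `snp_per_mb` as well as both function names are hypothetical.

```python
def quantile(values, prob):
    """Return the `prob` quantile of `values` using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = prob * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_intervals(intervals, filter_probs=0.75):
    """Keep intervals that exceed the cutoff on BOTH metrics.

    Assumption: each interval is a dict with the hypothetical keys
    'delta_snpindex' (average Delta SNPindex) and 'snp_per_mb'
    (average SNP count per 1 Mb).
    """
    index_cut = quantile([i["delta_snpindex"] for i in intervals], filter_probs)
    count_cut = quantile([i["snp_per_mb"] for i in intervals], filter_probs)
    return [
        i for i in intervals
        if i["delta_snpindex"] > index_cut and i["snp_per_mb"] > count_cut
    ]


# Toy data for illustration only; these are not real results.
candidates = [
    {"name": "a", "delta_snpindex": 0.1, "snp_per_mb": 10},
    {"name": "b", "delta_snpindex": 0.2, "snp_per_mb": 20},
    {"name": "c", "delta_snpindex": 0.3, "snp_per_mb": 30},
    {"name": "d", "delta_snpindex": 0.4, "snp_per_mb": 40},
    {"name": "e", "delta_snpindex": 0.9, "snp_per_mb": 90},
]
kept = filter_intervals(candidates, filter_probs=0.75)
print([i["name"] for i in kept])
```

Under this reading, raising `filter_probs` makes the filter stricter, because an interval has to beat a larger share of the original results on both metrics.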
- `bulk_specific`: **Important**, how to define a bulk-specific scaffold (sequence).
  - `identical_percentage`: BLAST percentage of identical matches. Default is `0.85`, i.e. the BLAST percentage of identical matches must be < 85%.
  - `length_percentage`: BLAST match length / query length. Default is `0.85`, i.e. BLAST match length / query length must be < 85%.
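The two `bulk_specific` thresholds can be illustrated with a small helper that evaluates each criterion for one BLAST hit. This is only a sketch under stated assumptions: the function name and arguments are hypothetical, percent identity is assumed to be on a 0-100 scale as in BLAST tabular output, and the source does not say whether Triti-Map requires one or both criteria, so the two flags are returned separately.

```python
def bulk_specific_checks(pident, match_len, query_len,
                         identical_percentage=0.85, length_percentage=0.85):
    """Evaluate both bulk_specific criteria for a single BLAST hit.

    pident     -- percent identity of the hit, 0-100 (assumed scale)
    match_len  -- alignment (match) length of the hit
    query_len  -- full length of the query scaffold
    Returns (identity_below_cutoff, coverage_below_cutoff).
    """
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    identity_below = (pident / 100.0) < identical_percentage
    coverage_below = (match_len / query_len) < length_percentage
    return identity_below, coverage_below


# A hit with 80% identity covering 900 of 1000 query bases:
# identity is below the cutoff, coverage is not.
print(bulk_specific_checks(80.0, 900, 1000))
```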
- `merge_lib`: How to handle multiple sets of different ChIP-seq data from the same pool. The default is `merge`, i.e. samples are merged first and then assembled to get better results. If you are dealing with large genomic data such as hexaploid wheat and your server memory is less than 300G, you can change it to `split`, i.e. each group of data is assembled separately and then merged for subsequent analysis.
- `memory`: The maximum memory available when assembling transcriptome sequences using SPAdes; `300` means 300G.
- `denovo_filter_method`: **Important**, the pool-specific sequence filtering method, valid when running the `only_assembly` module. `external_region` means the user defines custom filter intervals in the `region.csv` file; `external_fasta` means the user supplies their own external FASTA sequences as the filter database. Please refer to the FAQ.
- `filter_region_file`: If you use `external_region`, fill in the location of the filter interval file here. The default is `region.csv`.
- `filter_fasta_file`: If you use `external_fasta`, fill in the location of the FASTA file, e.g. `/your/path/region.fasta`.
- `blast_database`: No modification is needed. The database used for BLAST of the assembled sequences. `em_cds_pln` means the EBI ENA plant coding sequence database; `em_std_pln` means the EBI ENA plant standard sequence database. Default value: `em_cds_pln`.
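Several of the parameters above depend on each other, so a quick sanity check before launching a run can catch common mistakes. The sketch below assumes the configuration has already been loaded into a nested Python dict whose keys mirror the parameter names described on this page. The function `check_config` is hypothetical and is not part of Triti-Map, and the file paths are placeholders.

```python
import posixpath


def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []

    if not cfg.get("email"):
        problems.append("email: required when using the EMBL-EBI API")

    datatype = cfg.get("datatype")
    if datatype not in ("dna", "rna"):
        problems.append("datatype: expected 'dna' or 'rna'")

    ref = cfg.get("ref") or {}
    for key in ("genome", "annotation"):
        # The page asks for absolute paths for both reference files.
        if not posixpath.isabs(ref.get(key) or ""):
            problems.append(f"ref.{key}: use an absolute path")
    if datatype == "rna" and not ref.get("STARdir"):
        problems.append("ref.STARdir: required for RNA-seq analysis")

    pop_struc = (cfg.get("snpindex") or {}).get("pop_struc")
    if pop_struc not in ("F2", "RIL"):
        problems.append("snpindex.pop_struc: expected 'F2' or 'RIL'")

    if cfg.get("merge_lib", "merge") not in ("merge", "split"):
        problems.append("merge_lib: expected 'merge' or 'split'")

    method = cfg.get("denovo_filter_method")
    if method == "external_region" and not cfg.get("filter_region_file"):
        problems.append("filter_region_file: required for external_region")
    if method == "external_fasta" and not cfg.get("filter_fasta_file"):
        problems.append("filter_fasta_file: required for external_fasta")

    return problems


# Placeholder values for illustration only.
example = {
    "email": "user@example.org",
    "datatype": "rna",
    "ref": {"genome": "/data/ref/genome.fa", "annotation": "/data/ref/genes.gff3"},
    "snpindex": {"pop_struc": "F2"},
    "merge_lib": "merge",
}
for problem in check_config(example):
    print(problem)
```

The check only covers constraints stated on this page; it does not verify that the files exist or that other values of `denovo_filter_method` described in the FAQ are valid.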