Fonda is a framework that offers scalable, automated analysis of multiple next-generation sequencing (NGS) data types.
- Fonda Prebuilt binaries
- Required environment setup
- Build Fonda
- Fonda installation
- Available workflows in Fonda
- Before running Fonda…
- Run Fonda
- Contributors
- Publications
All binaries built by the CI process (described in CONTRIBUTING.md) are available via the Download page and the GitHub Release page.
- Unix
- Java 8
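The Java requirement above can be checked before building. The following is a minimal sketch, not part of Fonda itself: the helper name `java_major_version` is hypothetical, and it assumes `java -version` prints the version as the first quoted token on a line containing the word "version", as common JDK distributions do.

```shell
# Extract the Java major version from a version string.
# Handles the legacy scheme ("1.8.0_292" means Java 8) and the modern one ("11.0.2").
java_major_version() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # drop the legacy "1." prefix
    esac
    echo "${version%%[._-]*}"             # keep only the leading number
}

# Report the installed Java version, if any.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; take the first quoted token.
    installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    echo "Detected Java major version: $(java_major_version "$installed")"
else
    echo "java was not found on PATH" >&2
fi
```

If the reported major version is not 8, the build or the unit tests may behave differently from what this README describes.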
To launch all unit and integration tests, run the command:
./gradlew test
To run all unit and integration tests together with source code analysis (via PMD), a coding-standard check (via Checkstyle), and code coverage measurement (via JaCoCo), run the command:
./gradlew check
To build Fonda, run the command:
./gradlew clean build zip
- `clean` - deletes the Fonda `build` directory for a fresh compile
- `build` - creates the `Fonda.jar` file and the `src` folder in `build/libs`
- `zip` - packs `Fonda.jar` and the `src` folder into a zip file located in `build/distributions`
Note: before building a specific Fonda version, please check that the version in the `build.gradle` file is the correct one.
The Fonda package contains two components:
- the Fonda `.jar` file
- the `src` folder
If the `src_scripts` option in the global config is not set, make sure the `src` folder and the `.jar` file are placed in the same parent directory. This is necessary because some pipelines call external scripts from the `src` folder (its `python` and `R` subfolders).
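The layout requirement can be verified with a small shell check. This is a sketch under the assumptions stated in the text (a `src` folder with `python` and `R` subfolders next to the jar); the function name `check_fonda_layout` is hypothetical and not part of Fonda.

```shell
# Check that the src folder (with its python and R subfolders)
# sits in the same directory as the Fonda jar.
check_fonda_layout() {
    local jar_path="$1"
    local jar_dir
    jar_dir="$(dirname "$jar_path")"
    if [ ! -f "$jar_path" ]; then
        echo "missing jar: $jar_path" >&2
        return 1
    fi
    for sub in python R; do
        if [ ! -d "$jar_dir/src/$sub" ]; then
            echo "missing folder: $jar_dir/src/$sub" >&2
            return 1
        fi
    done
    echo "layout OK: $jar_dir"
}

# Usage (the path is a placeholder):
#   check_fonda_layout /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar
```

The check is only relevant when `src_scripts` is left unset; if that option points elsewhere, the scripts are presumably looked up at the configured location instead.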
Each pipeline has its own software prerequisites, so make sure they are properly installed before executing a specific Fonda pipeline. The required software and databases are listed in the `global_config` files.
Workflow | Description |
---|---|
DnaCaptureVar_Fastq | DNA Captured sequencing data for genomic variant detection using fastq data |
DnaCaptureVar_Bam | DNA Captured sequencing data for genomic variant detection using bam data |
DnaAmpliconVar_Fastq | DNA Amplicon sequencing data for genomic variant detection using fastq data |
DnaAmpliconVar_Bam | DNA Amplicon sequencing data for genomic variant detection using bam data |
DnaWgsVar_Fastq | DNA whole genome sequencing data for genomic variant detection using fastq data |
DnaWgsVar_Bam | DNA whole genome sequencing data for genomic variant detection using bam data |
RnaCaptureVar_Fastq | RNA Captured sequencing data for genomic variant detection using fastq data |
HlaTyping_Fastq | DNA sequencing data for genomic HLA type prediction using fastq data |
Bam2Fastq | Convert bam file to fastq files |
RnaExpression_Fastq | RNA sequencing data for gene expression analysis using fastq data |
RnaExpression_Bam | RNA sequencing data for gene expression analysis using bam data |
scRnaExpression_Fastq | single cell RNA sequencing data for gene expression analysis using fastq data |
scRnaExpression_CellRanger_Fastq | 10X single cell RNA/TCR/BCR sequencing data for gene expression and immune profiling analysis using fastq data |
scRnaExpression_Bam | single cell RNA sequencing data for gene expression analysis using bam data |
RnaFusion_Fastq | RNA sequencing data for gene fusion detection using fastq data |
TcrRepertoire_Fastq | DNA or RNA sequencing data for TCR or BCR repertoire detection using fastq data |
java -jar fonda-<VERSION>.jar -help
Possible options:
Option | Description |
---|---|
Required | |
-global_config <arg> | Configuration file for the particular workflow |
-study_config <arg> | Configuration file for the specific study |
Optional | |
-detail | Show the details of the Fonda framework |
-local | Default: no. Run the job on the local machine |
-test | Default: no. Test the commands without actually running the job |
-sync | Default: no. Run Fonda in synchronous mode, waiting for all tasks to complete |
-master | Default: no. Run the main master script to manage all Fonda-created scripts |
-help | Show the help message |
`-global_config` file - sets a configuration file for a particular pipeline version (such as RnaExpression_Fastq 1095.1). The config file has 4 sections:
- [all_tools] - contains paths to the tools used
- [Databases] - contains input data/paths to input datasets
- [Pipeline_Info] - contains workflow and toolset settings
- [Queue_Parameters] - contains `sge` settings
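A quick sanity check for a hand-edited global config is to confirm that all four sections are present. The sketch below assumes each section header appears literally in the file (for example `[all_tools]`); the function name `check_global_config_sections` is hypothetical, and the check does not validate the section contents.

```shell
# Verify that a global_config file mentions the four documented section headers.
check_global_config_sections() {
    local config_file="$1"
    local missing=0
    for section in all_tools Databases Pipeline_Info Queue_Parameters; do
        # -F treats the pattern as a fixed string, so the brackets are literal.
        if ! grep -Fq "[$section]" "$config_file"; then
            echo "missing section: [$section]" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the path is a placeholder):
#   check_global_config_sections /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt
```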
If a parameter needs to change, a new version of the config should be created and recorded. Different studies, however, can share an identical pipeline.
The available parameter options for the global_config files can be found here.
Examples of global_config files can be found here.
Please keep in mind that each global_config file includes only the tools and databases required for executing that specific pipeline version.
For example, `global_config_RnaExpression_Fastq_v1.1.txt` may list the databases, tools and parameters for a particular `RnaExpression_Fastq` pipeline version 1. Later on, `global_config_RnaExpression_Fastq_v1.2.txt` may be prepared for another `RnaExpression_Fastq` pipeline version 2. In the second config, the required databases, tools and parameters might be quite different from the first one.
Therefore, all potential databases, tools and parameter options for each available workflow shall be listed so that users can take full advantage of Fonda in different projects.
To control line-ending behavior, the `line_ending` option was introduced in the `[Pipeline_Info]` section. The option can be set to `LF` (Unix-style end-of-line marker) or `CRLF` (Windows-style end-of-line marker). If the option is not specified, `LF` is used as the default line separator.
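To confirm which line ending a file actually uses, a small check like the one below may help. It is a sketch that assumes the `line_ending` option governs the shell scripts Fonda generates, which this README does not state explicitly; the function name `detect_line_ending` is hypothetical.

```shell
# Print "CRLF" if any line in the file ends with a carriage return, otherwise "LF".
detect_line_ending() {
    local cr
    cr="$(printf '\r')"          # a literal carriage return character
    if grep -q "${cr}\$" "$1"; then
        echo "CRLF"
    else
        echo "LF"
    fi
}

# Usage (the script path is a placeholder):
#   detect_line_ending /path/to/generated_script.sh
```

Scripts with `CRLF` endings usually fail when executed directly on Unix, so `LF` is the safer choice unless the files are meant to be opened on Windows.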
`-study_config` file - sets a configuration file for a particular study, for cases when a specific study is selected to perform the NGS data analysis. This config file has 1 section: [Series_Info].
Required parameters for each workflow:
Parameter | Description |
---|---|
job_name | Sets the job ID |
dir_out | Sets the output directory for the analysis |
fastq_list / bam_list | Sets the path to the input manifest file |
LibraryType | Sets the sequencing library type - DNAWholeExomeSeq_Paired, DNAWholeExomeSeq_Single, DNATargetSeq_Paired, DNATargetSeq_Single, DNAAmpliconSeq_Paired, RNASeq_Paired, RNASeq_Single, etc. |
DataGenerationSource | Sets the data generation source - Internal, IGR, Broad, etc. |
Date | Sets the sequencing run date |
Project | Sets the project ID |
Run | Sets the run ID |
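The required parameters in the table above can be checked before launching a run. The sketch below assumes a `key = value` syntax inside the study config, which is an assumption and not confirmed by this README; the function name `check_study_config` is hypothetical.

```shell
# Verify that a study_config file defines every required parameter.
# ASSUMPTION: parameters are written as "key = value", one per line.
check_study_config() {
    local config_file="$1"
    local status=0
    for key in job_name dir_out LibraryType DataGenerationSource Date Project Run; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$config_file"; then
            echo "missing parameter: $key" >&2
            status=1
        fi
    done
    # Either fastq_list or bam_list must point at the input manifest.
    if ! grep -Eq "^[[:space:]]*(fastq_list|bam_list)[[:space:]]*=" "$config_file"; then
        echo "missing parameter: fastq_list or bam_list" >&2
        status=1
    fi
    return "$status"
}

# Usage (the path is a placeholder):
#   check_study_config /path_to_data/config_RnaExpression_Fastq_test.txt
```

The check only confirms that the keys are present; it does not validate values such as the `LibraryType` name or the manifest path.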
The format of the input manifest files is described here.
Examples of study_config files can be found here.
- `-help` - shows the help message
- `-detail` - shows the workflow details available in the current Fonda framework
- `-local` - runs the job on the local machine without submitting it to the cluster
- `-test` - performs a pilot run in the command line interface without actually submitting jobs to the cluster
java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt -test
In test mode, no job is submitted to the cluster for an actual run. Instead, you can check whether the contents of each generated shell script are properly organized, which is useful for debugging.
java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt
java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt -local
In local mode, the individual jobs run on the local machine without being submitted to the cluster.
The generated scripts are the same as in cluster mode; the only difference is that the jobs are not submitted to the cluster. This is useful for debugging.
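The three invocations above differ only in the trailing flag, so they can be composed by one small helper. This is a sketch only: the function `fonda_command` and the mode names `test`, `local` and `cluster` are hypothetical labels, and the helper prints the command instead of running it. It does not handle paths containing spaces.

```shell
# Print the Fonda command line for a given mode: "test", "local" or "cluster".
fonda_command() {
    local jar="$1" global_config="$2" study_config="$3" mode="$4"
    local cmd="java -jar $jar -global_config $global_config -study_config $study_config"
    case "$mode" in
        test)    echo "$cmd -test" ;;    # pilot run, nothing is submitted
        local)   echo "$cmd -local" ;;   # run jobs on the local machine
        cluster) echo "$cmd" ;;          # default: submit jobs to the cluster
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage (paths are placeholders):
#   fonda_command fonda.jar global_config.txt study_config.txt test
```

Printing the `test` variant first and inspecting the generated scripts before switching to `cluster` follows the debugging workflow described above.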
- Shu Yan 1
- Tenghui Chen 1
- Joon Sang Lee 1
- Chandra Sekhar Pedamallu 1
- Mark Magid 1
- Quan Wan 1
- Ei-Wen Yang 1
- Donald Jackson 1
- Jack Pollard 1
- Aleksandr Sidoruk 2
- Mariia Zueva 2
- Mikhail Alperovich 2
- Yulia Kamyshova 2
1 Sanofi, 270 Albany Street, Cambridge, MA, USA
2 EPAM Systems, Inc.
Links to publications that contain Fonda references