4. Advanced features

Natasha Pavlovikj edited this page Jan 27, 2021 · 4 revisions

ProkEvo is designed to run as is and generate results without any hassle. However, researchers may sometimes need to modify or extend ProkEvo for their own needs. This capacity for custom extension comes from its use of a workflow management system, Pegasus. While these modifications may look advanced and involved, they take only a few steps, and we hope the detailed instructions provided here, together with the Pegasus documentation page, make them easier to follow.

To add more advanced features to ProkEvo, note the following files and directories:

  • scripts/ - this is the directory under ProkEvo that stores the scripts for all the tools used, together with their specific options.
  • root-dax.py - this is the Python script where all tasks and dependencies for the first sub-pipeline are defined. The first sub-pipeline performs the standard data processing steps of sequence trimming, de novo assembly, and quality control.
  • sub-dax.py - this is the Python script where all tasks and dependencies for the second sub-pipeline are defined. The second sub-pipeline uses the assemblies that have passed the quality control and performs specific population-based classifications (serotype prediction specifically for Salmonella, genotype classification at different scales of resolution, analysis of core- and pan-genomic content).
  • tc.txt - this is the Pegasus Transformation Catalog, which maps the physical executables in the scripts/ directory to the names referenced in the Python scripts.
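
To illustrate the kind of mapping that tc.txt holds, the following minimal sketch builds one catalog entry as text. It assumes that tc.txt uses the multi-line text format of the Pegasus Transformation Catalog. The transformation name `ex_my_tool`, the site name `local`, and the script path are hypothetical placeholders, not values taken from ProkEvo.

```python
def tc_entry(name, site, script_path, arch="x86_64", os_name="linux"):
    """Return one Transformation Catalog entry as text.

    Assumption: the catalog uses the multi-line text format
    (tr <name> { site <site> { pfn ... } }).
    """
    lines = [
        f"tr {name} {{",
        f"    site {site} {{",
        f'        pfn "{script_path}"',   # physical location of the executable
        f'        arch "{arch}"',
        f'        os "{os_name}"',
        '        type "INSTALLED"',       # executable already exists on the site
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical new tool wrapped by a script in the scripts/ directory.
entry = tc_entry("ex_my_tool", "local", "/path/to/ProkEvo/scripts/my_tool.sh")
print(entry)
```

Before reusing the generated text, compare it with the existing entries in tc.txt and copy their site name and any extra profile lines.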

Adding new features or removing and modifying existing ones usually requires manual changes in one of the files described above before the workflow is submitted.
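
Because every customization touches at least one of these items, it can help to confirm they are all present before editing or submitting. The sketch below is one possible check, not part of ProkEvo itself. It assumes that root-dax.py, sub-dax.py, and tc.txt sit in the same directory as scripts/, and the function name `missing_items` is hypothetical.

```python
from pathlib import Path

# Items named in the list above; "scripts" is a directory, the rest are files.
REQUIRED = ["scripts", "root-dax.py", "sub-dax.py", "tc.txt"]


def missing_items(prokevo_dir):
    """Return the required items that are absent from prokevo_dir."""
    root = Path(prokevo_dir)
    missing = []
    for name in REQUIRED:
        path = root / name
        present = path.is_dir() if name == "scripts" else path.is_file()
        if not present:
            missing.append(name)
    return missing


# Check the current directory; an empty list means nothing is missing.
print(missing_items("."))
```

Run from the ProkEvo directory, an empty list indicates that all four items were found. Adjust `REQUIRED` if your checkout is organized differently.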