pipengenex

Overview

pipengenex is a powerful framework designed for creating and executing Bioinformatic pipeline workflows. It includes a Command-Line Utility (CLI) tool for effortlessly running pipeline workflows and tasks defined through JSON workflow definitions. Some notable features of pipengenex include task parallelization and flexible workflow configuration.

Getting Started

To get started with pipengenex, you can install it using platform-specific binaries available in the Releases Page.

Using Platform-Specific Binaries

Go to the Releases Page.
Download the appropriate binary for your operating system (e.g., pipengenex.exe for Windows, pipengenex-linux for Linux, or pipengenex-macos for macOS).
Place the downloaded binary in a directory included in your system's PATH environment variable.
Open your terminal or command prompt and verify the installation by running:
```
pipengenex --version
```

Running Your First Workflow

Now that you have pipengenex installed, it's time to create and execute your first Bioinformatic pipeline workflow. Follow these steps to run your workflow:

Define a Workflow: Create a workflow definition using a JSON configuration file. You can use the provided example as a starting point.
Save the Workflow: Save your workflow definition to a file, e.g., my_workflow.json.
Run the Workflow: Open your terminal or command prompt and use the pipegenex CLI tool to run your workflow by providing the path to the workflow JSON file as an argument:

Defining Workflow

A typical Bioinformatic pipeline workflow is defined through a JSON configuration file. Here's an example of a workflow definition:

{
    "name": "My First RNASeq Workflow",
    "description": "This is my first RnaSeq workflow in my custom pipeline generator...",
    "working_directory": "~/home/bio/pipeline_2",
    "variables": [
        {
            "key": "samtools_d",
            "value": "~/usr/local/bin/samtools_d"
        },
        {
            "key": "thread_1",
            "value": "50"
        },
        {
            "key": "input_sampleID",
            "value": "human_te_genome.fasta.gz"
        },
        {
            "key": "output_name_key",
            "value": "human_te_genome_rna_seq_out"
        }
    ],
    "tasks": [
        {
            "id": "1",
            "command": "${samtools_d}samtools view -F 14 -b -@ $thread_1 $list_tmp1[0] >${input_sampleID}_m_ERV.bam",
            "description": "Align Reads"
        },
        {
            "id": "2",
            "command": "fastqc ",
            "description": "Quality Control"
        },
        {
            "id": "3",
            "command": "${samtools_d}samtools merge -f ${input_sampleID}_m2.bam ${input_sampleID}_m_ERV.bam ${input_sampleID}_m1_ERV.bam",
            "description": "Merge BAMs"
        },
        {
            "id": "4",
            "command": "python3 ./Scripts/run_python_analysis_script.py ./outputs/${input_sampleID}.bam ./outputs/${output_name_key}.csv",
            "description": "Analyze RnaSeq Outputs"
        },
        {
            "id": "5",
            "command": "climailer pipelineresults@website.org ./outputs/${output_name_key}.csv",
            "description": "Send Output via Email"
        },
        {
            "id": "6",
            "command": "echo 'Welcome to Jumanji'",
            "description": "Sleep for 10 seconds"
        }
    ],
    "run_sequence": [
        "1", "2", "3", "4", "5,6" 
    ]
}

In this JSON configuration, you define your workflow's name, description, working directory, variables, tasks, and run sequence. The command field in each task specifies the command to execute, with variables indicated by ${variable_name} syntax.

Run Sequence

Something worthy of note is the fact that you can define run_sequence for all the tasks in your workflow. Not only that, you can also tell the Engine to run some particular tasks in parallel say tasks with ids 2, 3 are independent of each other, then while defining your run sequence, you define your run sequence as follows:

"run_sequence": [
        "1", "2,3", "4", "5", "6" 
    ]

** This means, start running from task 1, run tasks 2 and 3 in parallel, then run task 4, then task 5, and lastly task 6.**

Running Workflows

After defining your workflow, you can run it using the pipegenex CLI tool. Simply provide the path to your workflow JSON file as an argument:

pipegenex run my_workflow.json

pipengenex will execute the tasks in the specified sequence, utilizing any available specified parallelization instruction set in run_sequence for optimal performance.

Report and Error Handling

Upon completing the workflow, pipengenex generates a report (report.csv) if the run was successful. In case of errors during execution, an errors.txt file is generated in the working directory. These reports help you monitor the progress and diagnose issues in your pipeline.

Examples

To see more examples and advanced usage of pipengenex, please refer to the Examples directory in the project repository.

Contributing

Contributions to pipengenex are welcome! If you'd like to contribute, please check out our Contribution Guidelines.

License

pipengenex is open-source and available under the MIT License. Feel free to use, modify, and distribute it according to the terms of the license.

Citing

Olusegun F.E. PipenGeneX -A Multipurpose, Multiplatform Bioinformatics Workflow Design and Execution Framework. Preprint. http://dx.doi.org/10.13140/RG.2.2.16151.96160

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
bin		bin
src		src
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github/workflows

.github/workflows

bin

bin

src

src

LICENSE

LICENSE

README.md

README.md

Repository files navigation

pipengenex

Overview

Table of Contents

Getting Started

Using Platform-Specific Binaries

Running Your First Workflow

Defining Workflow

Run Sequence

Running Workflows

Report and Error Handling

Examples

Contributing

License

Citing

About

Releases 1

Packages

Languages

License

propenster/pipengenex

Folders and files

Latest commit

History

Repository files navigation

pipengenex

Overview

Table of Contents

Getting Started

Using Platform-Specific Binaries

Running Your First Workflow

Defining Workflow

Run Sequence

Running Workflows

Report and Error Handling

Examples

Contributing

License

Citing

About

Resources

License

Stars

Watchers

Forks

Languages