new module: xengsort/classify #5411

diegomscoelho · 2024-03-29T00:13:05Z

Is there an existing module for this?

I have searched for the existing module

Is there an open PR for this?

I have searched for existing PRs

Is there an open issue for this?

I have searched for existing issues

Are you going to work on this?

If I'm planning to work on this module, I added myself to the Assignees to facilitate tracking who is working on the module

tamuanand · 2024-05-21T20:15:48Z

Hi,

I have this process that works for me, adapted from https://gitlab.com/genomeinformatics/xengsort#how-to-classify

Note: I explicitly rename the graft/host output files to human/mouse. I also concatenate every output file that is neither graft nor host into a single "other" file per mate. Feel free to adapt this.

process XENGSORT {
    tag "$sample_id Attempt_${task.attempt}_cpus_${task.cpus}_mem_${task.memory}"
    debug true

    publishDir "${params.outdir}/1_QC/PreAlignment/2_${task.process}",
        mode: 'copy', pattern: "*_stats.txt"
    publishDir "${params.outdir}/1_QC/PreAlignment/2_${task.process}",
        mode: 'copy', saveAs: { it == '.command.sh' ? "${sample_id}.${task.process}.commands.txt" : null }

    input:
    path(xengsort_reference)
    tuple val(sample_id), path(reads)
    
    output:
    tuple val(sample_id), path('*_human*.fastq.gz')  , emit: human_reads
    tuple val(sample_id), path('*_mouse*.fastq.gz')  , emit: mouse_reads
    tuple val(sample_id), path('*_other*.fastq.gz')  , emit: other_reads
    tuple val(sample_id), path('*_human*.fastq.gz'), path('*_other*.fastq.gz')  , emit: human_and_other_reads
    tuple val(sample_id), path('*.xengsort_stats.txt') , emit: xengsort_stats
    path(".command.sh")
    path "versions.yml"                              , emit: versions
    
    script:
    if (params.single_end) {
        """
        [ ! -f  ${sample_id}_1.trim.fastq.gz ] && ln -s ${reads[0]} ${sample_id}_1.trim.fastq.gz

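        # Derive the index prefix from the staged *.hash file, as in the paired-end branch
        INDEX=`find -L ./ -name "*.hash" | sed 's/\\.hash\$//'`

        xengsort -DD classify \\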
            --index \$INDEX \\
            --fastq ${sample_id}_1.trim.fastq.gz \\
            --prefix "${sample_id}" \\
            --classification count \\
            --threads "${task.cpus}" \\
            --compression gz \\
            --progress \\
            >& "${sample_id}.${task.process}.xengsort_stats.txt"
        
        # Do the remaining steps (rename outputs, merge other reads, write versions.yml) as in the paired-end section below
        """
    } else {
        """
       
        [ ! -f  ${sample_id}_1.trim.fastq.gz ] && ln -s ${reads[0]} ${sample_id}_1.trim.fastq.gz
        [ ! -f  ${sample_id}_2.trim.fastq.gz ] && ln -s ${reads[1]} ${sample_id}_2.trim.fastq.gz
        
        now="\$(TZ=EST5EDT date +'%d-%b-%Y_%H%M') EDT"
        echo "======== Starting xengsort classify ===============" \$now
        echo -e "xengsort classify started:\t\$now" >> ${sample_id}.job_stats.txt

        INDEX=`find -L ./ -name "*.hash" | sed 's/\\.hash\$//'`

        echo "INDEX = \$INDEX" \\
            | tee -a "${sample_id}.job_stats.txt"

        xengsort -DD classify \\
            --index \$INDEX \\
            --fastq ${sample_id}_1.trim.fastq.gz --pairs ${sample_id}_2.trim.fastq.gz \\
            --prefix "${sample_id}" \\
            --classification count \\
            --threads "${task.cpus}" \\
            --compression gz \\
            --progress \\
            >& "${sample_id}.${task.process}.xengsort_stats.txt"

        now="\$(TZ=EST5EDT date +'%d-%b-%Y_%H%M') EDT"
        echo "======== Finished xengsort classify ===============" \$now
        echo -e "xengsort classify ended, merge fastq files:\t\$now" \\
            | tee -a "${sample_id}.job_stats.txt"

        mv "${sample_id}-graft.1.fq.gz" "${sample_id}_human_R1.fastq.gz"
        mv "${sample_id}-graft.2.fq.gz" "${sample_id}_human_R2.fastq.gz"

        mv "${sample_id}-host.1.fq.gz" "${sample_id}_mouse_R1.fastq.gz"
        mv "${sample_id}-host.2.fq.gz" "${sample_id}_mouse_R2.fastq.gz"

        # Escape the dots so they match literally instead of matching any character
        OTHER_1=`ls -1v | grep -E "(ambiguous|both|neither|unclassified)\\.1\\.fq\\.gz" | tr '\\n' ' '`
        echo "OTHER_1 = \$OTHER_1" \\
            | tee -a "${sample_id}.job_stats.txt"

        OTHER_2=`ls -1v | grep -E "(ambiguous|both|neither|unclassified)\\.2\\.fq\\.gz" | tr '\\n' ' '`
        echo "OTHER_2 = \$OTHER_2" \\
            | tee -a "${sample_id}.job_stats.txt"

        cat \$OTHER_1 > "${sample_id}_other_R1.fastq.gz"
        cat \$OTHER_2 > "${sample_id}_other_R2.fastq.gz"

        now="\$(TZ=EST5EDT date +'%d-%b-%Y_%H%M') EDT"
        echo "======== Merge fastq xengsort done ===============" \$now

        echo "Deleting OTHER_1 and OTHER_2 -- \$OTHER_1 \$OTHER_2" \\
            | tee -a "${sample_id}.job_stats.txt"

        rm -f \$OTHER_1 \$OTHER_2 

        echo -e "Merge fastq xengsort done:\t\$now" >> ${sample_id}.job_stats.txt
        echo -e "ALL DONE:\t\$now" >> ${sample_id}.job_stats.txt

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            xengsort: \$( xengsort --version )
            container: "${task.container}"
        END_VERSIONS
        """
    }
}
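
The merge step in the process above relies on the fact that concatenated gzip files form a valid multi-member gzip stream, so the "other" categories can be joined with plain `cat` without decompressing. Below is a minimal standalone sketch of that step in bash. The helper name `merge_other`, the `demo` prefix, and the dummy reads are hypothetical and only for illustration; the `<prefix>-<category>.<mate>.fq.gz` naming is assumed from the `mv` commands above, and the category list is taken from the `grep` pattern. It uses explicit file names instead of parsing `ls` output, which avoids accidental matches.

```shell
#!/usr/bin/env bash
set -eo pipefail

# merge_other PREFIX MATE (hypothetical helper, not part of xengsort):
# concatenate every category that is neither graft nor host for one mate
# into PREFIX_other_R<MATE>.fastq.gz, then delete the merged inputs.
merge_other() {
    local prefix="$1" mate="$2"
    local out="${prefix}_other_R${mate}.fastq.gz"
    local parts=()
    local category f
    for category in ambiguous both neither unclassified; do
        f="${prefix}-${category}.${mate}.fq.gz"
        # Only collect categories that actually produced a file
        if [ -f "$f" ]; then
            parts+=("$f")
        fi
    done
    if [ "${#parts[@]}" -eq 0 ]; then
        echo "no 'other' files found for ${prefix} mate ${mate}" >&2
        return 0
    fi
    # Concatenated gzip members decompress as one continuous stream
    cat "${parts[@]}" > "$out"
    rm -f "${parts[@]}"
}

# Demo on dummy data in a scratch directory
workdir="$(mktemp -d)"
cd "$workdir"
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > demo-ambiguous.1.fq.gz
printf '@r2\nTTTT\n+\nIIII\n' | gzip -c > demo-neither.1.fq.gz

merge_other demo 1

# Count FASTQ lines in the merged file (4 lines per record)
merged_lines="$(gzip -dc demo_other_R1.fastq.gz | wc -l | tr -d ' ')"
echo "records: $((merged_lines / 4))"
```

Inside a Nextflow `script:` block, every `$` that belongs to bash would need to be escaped as `\$`, as in the process above.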

diegomscoelho added the "new module" label Mar 29, 2024

diegomscoelho self-assigned this Mar 29, 2024

famosab linked a pull request Jul 8, 2024 that will close this issue

add module: Add xengsort/classify #5413

Draft

17 tasks
