FastX Bug with Parse_hairpin #271

Closed
andrewdchen opened this issue Aug 28, 2023 · 7 comments
Labels
duplicate This issue or pull request already exists

Comments

@andrewdchen

Description of the bug

I'm running the pipeline with several fastq files that have been QC'd and trimmed already. Running into the following error message:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:PARSE_MATURE'

Caused by:
  Process `NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:PARSE_MATURE` terminated with an error exit status (255)

Command executed:

  # Uncompress FASTA reference files if necessary
  FASTA="mature.fa"
  if [ ${FASTA: -3} == ".gz" ]; then
      gunzip -f $FASTA
      FASTA=${FASTA%%.gz}
  fi
  # Remove spaces from miRBase FASTA files
  # sed -i 's, ,_,g' $FASTA
  sed '#^[^>]#s#[^AUGCaugc]#N#g' $FASTA > ${FASTA}_parsed.fa
  # TODO perl -ane 's/[ybkmrsw]/N/ig;print;' ${FASTA}_parsed_tmp.fa > ${FASTA}_parsed.fa
  
  sed -i 's# .*##' ${FASTA}_parsed.fa
  seqkit grep -r --pattern ".*hsa-.*" ${FASTA}_parsed.fa > ${FASTA}_sps.fa
  seqkit seq --rna2dna ${FASTA}_sps.fa > ${FASTA}_igenome.fa
  
  cat <<-END_VERSIONS > versions.yml
  NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:PARSE_MATURE":
      seqkit: $(echo $(seqkit 2>&1) | sed 's/^.*Version: //; s/ .*$//')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  [ERRO] fastx: invalid FASTA/Q format

Seems like there's an error with fastx, which I assume is being called by seqkit but I'm not quite sure where to look given this. The fastq file has the following format.

>@A00821:1340:H2GW7DRX3:2:2101:4562:1031 1:N:0:AGATGTAC
GNCTGGCAGTAATGTAGAGCC
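
Note that the failing process is PARSE_MATURE, and the command shown above runs seqkit on the staged `mature.fa`, so the file seqkit rejects is probably the reference rather than the reads. A minimal sketch for checking whether a file starts like a FASTA record follows; the function name `check_fasta_start` is hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/smrnaseq): report whether the
# first non-empty line of a file starts with '>', as a FASTA header should.
check_fasta_start() {
    local file="$1"
    local first
    # Take the first line that is not blank; tolerate an empty file.
    first="$(grep -m 1 -v '^[[:space:]]*$' "$file" || true)"
    case "$first" in
        '>'*) echo "ok" ;;
        *)    echo "not-fasta" ;;
    esac
}
```

Assuming the function has been sourced, it could be run inside the work directory of the failed task as `check_fasta_start mature.fa`.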

Command used and terminal output

nextflow run nf-core/smrnaseq \
    -profile test,singularity \
    --input clean_samplesheet.csv \
    --mature https://mirbase.org/download/CURRENT/mature.fa \
    --hairpin https://mirbase.org/download/CURRENT/hairpin.fa \
    --skip_fastqc \
    --skip_multiqc \
    --genome 'GRCh37' \
    --mirtrace_species 'hsa' \
    --protocol 'illumina' \
    --outdir 'Processed'

Relevant files

No response

System information

Nextflow version: 23.04.1
Hardware: HPC
Executor: Sun Grid Engine
Container engine: Singularity
OS: AlmaOS 8 Linux
Version of nf-core/smrnaseq: v2.2.1-gf7022ab

@andrewdchen andrewdchen added the bug Something isn't working label Aug 28, 2023
@apeltzer
Member

Exit status 255 usually suggests something like "file not found". This is very likely related to the miRBase URLs no longer working, which was fixed in PR #269.

@apeltzer apeltzer added duplicate This issue or pull request already exists and removed bug Something isn't working labels Aug 30, 2023
@andrewdchen
Author

andrewdchen commented Aug 30, 2023

@apeltzer Sorry to re-open this, but the URLs I used above are the updated ones!

@apeltzer apeltzer reopened this Sep 5, 2023
@apeltzer
Member

apeltzer commented Sep 5, 2023

I have now also checked on my end (should've done this earlier). Are you by any chance using a proxy server in the background wherever you run this pipeline?

When I look at a failed run on the HPC, I see this:

<p>&gt;cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop<br>UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC<br>UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA<br>&gt;cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop<br>AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU<br>GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU<br>&gt;cel-mir-1 MI0000003 Caenorhabditis elegans miR-1 stem-loop<br>AAAGUGACCGUACCGAGCUGCAUACUUCCUUACAUGCCCAUACUAUAUCAUAAAUGGAUA<br>UGGAAUGUAAAGAAGUAUGUAGAACGGGGUGGUAGU<br>&gt;cel-mir-2 MI0000004 Caenorhabditis elegans miR-2 stem-loop<br>UAAACAGUAUACAGAAAGCCAUCAAAGCGGUGGUUGAUGUGUUGCAAAUUAUGACUUUCA<br>UAUCACAGCCAGCUUUGAUGUGCUGCCUGUUGCACUGU<br>&gt;cel-mir-34 MI0000005 Caenorhabditis elegans miR-34 stem-loop<br>CGGACAAUGCUCGAGAGGCAGUGUGGUUAGCUGGUUGCAUAUUUCCUUGACAACGGCUAC<br>CUUCACUGCCACCCCGAACAUGUCGUCCAUCUUUGAA<br>&gt;cel-mir-35 MI0000006 Caenorhabditis elegans miR-35 stem-loop<br>UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUA<br>UCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC<br>&gt;cel-mir-36 MI0000007 Caenorhabditis elegans miR-36 stem-loop<br>CACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUA<br>UCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGA<br>&gt;cel-mir-37 MI0000008 Caenorhabditis elegans miR-37 stem-loop<br>UUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUA<br>UCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCU<br>&gt;cel-mir-38 MI0000009 Caenorhabditis elegans miR

Which is obviously not a FASTA file. If I look at the file on the miRBase FTP / HTTPS site, it looks clean, without any of the HTML tags inside.
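
The dump above looks like FASTA whose line breaks were replaced by `<br>`, whose `>` header markers were escaped as `&gt;`, and which was wrapped in `<p>`. A rough sketch of reversing that follows, assuming the wrapping consists only of those three pieces. The function name `unwrap_html_fasta` is hypothetical, and this is an illustration for inspecting such a file, not the fix that went into the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo the HTML wrapping seen in the dump above.
# Assumes only <p>, <br> and &gt; were introduced by the wrapping.
unwrap_html_fasta() {
    awk '{
        gsub(/<\/?p>/, "")        # drop opening and closing paragraph tags
        gsub(/<br ?\/?>/, "\n")   # turn <br>, <br/> and <br /> into line breaks
        gsub(/&gt;/, ">")         # unescape the FASTA header marker
        print
    }' "$1" | grep -v '^$'        # discard blank lines left behind
}
```

Something like `unwrap_html_fasta hairpin.fa > hairpin_clean.fa` would then write the recovered records to a new file, which can be compared against a copy downloaded directly from miRBase.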

@apeltzer
Member

apeltzer commented Sep 5, 2023

#279

@apeltzer
Member

apeltzer commented Sep 5, 2023

The tests reference a different URL than the one the pipeline actually uses in real runs. That means we're not properly testing this special case...

@apeltzer
Member

apeltzer commented Sep 5, 2023

#279 now reproduces your error, @andrewdchen. That's a step forward; now to find out what's actually the matter :(

@apeltzer apeltzer mentioned this issue Sep 6, 2023
10 tasks
@apeltzer
Member

apeltzer commented Sep 6, 2023

Fixed in 2.2.3

@apeltzer apeltzer closed this as completed Sep 6, 2023