Minimum split_fastq value is 250 #1363

Closed
adamrtalbot opened this issue Dec 20, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@adamrtalbot
Contributor

Description of the bug

The minimum usable non-zero value for split_fastq is 250, because the smallest value FASTP accepts for --split_by_lines is 1000 lines (250 reads × 4 lines per read), but this isn't reflected in the schema. Discovered while trying to fix #1357.

This could be addressed in the schema.
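
The arithmetic behind the 250 figure can be sketched as follows. This is a minimal illustration, not pipeline code: the factor of 4 is inferred from the log in this issue (`--split_fastq 249` became `--split_by_lines 996`), the 1000-line minimum is taken from the fastp error message, and the function names are hypothetical.

```python
# Assumption: the pipeline multiplies the read count by 4 (one FASTQ record =
# header, sequence, '+', quality), as the 249 -> 996 conversion in the log suggests.
LINES_PER_FASTQ_RECORD = 4

# Taken from the fastp error message: --split_by_lines should be >= 1000.
FASTP_MIN_SPLIT_LINES = 1000


def split_by_lines(split_fastq_reads: int) -> int:
    """Convert a number of reads into the line count handed to fastp."""
    return split_fastq_reads * LINES_PER_FASTQ_RECORD


def min_reads_for_fastp() -> int:
    """Smallest read count whose line count satisfies fastp's minimum."""
    # Ceiling division, so the result never falls below the required line count.
    return -(-FASTP_MIN_SPLIT_LINES // LINES_PER_FASTQ_RECORD)


if __name__ == "__main__":
    print(split_by_lines(249))    # 996, rejected by fastp
    print(split_by_lines(250))    # 1000, accepted by fastp
    print(min_reads_for_fastp())  # 250
```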

Command used and terminal output

nextflow run . -profile test_cache,docker --outdir results --split_fastq 249 --test_data_base ~/test-datasets


ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTP (test-test_L2)'

Caused by:
  Process `NFCORE_SAREK:SAREK:FASTP (test-test_L2)` terminated with an error exit status (255)

Command executed:

  [ ! -f  test-test_L2_1.fastq.gz ] && ln -sf test_1.fastq.gz test-test_L2_1.fastq.gz
  [ ! -f  test-test_L2_2.fastq.gz ] && ln -sf test_2.fastq.gz test-test_L2_2.fastq.gz
  fastp \
      --in1 test-test_L2_1.fastq.gz \
      --in2 test-test_L2_2.fastq.gz \
      --out1 test-test_L2_1.fastp.fastq.gz \
      --out2 test-test_L2_2.fastp.fastq.gz \
      --json test-test_L2.fastp.json \
      --html test-test_L2.fastp.html \
       \
       \
       \
      --thread 2 \
      --detect_adapter_for_pe \
      --disable_adapter_trimming      --split_by_lines 996 \
      2> >(tee test-test_L2.fastp.log >&2)
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:FASTP":
      fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  ERROR: you have enabled splitting output by file lines, the file lines (--split_by_lines) should be >= 1000.


Relevant files

No response

System information

No response

adamrtalbot added the bug label on Dec 20, 2023

@maxulysse
Member

Actually, the minimum is 0, but good spot.

@adamrtalbot
Contributor Author

Source for fastp v0.23.4: https://github.com/OpenGene/fastp?tab=readme-ov-file#output-splitting

Use -S or --split_by_lines to limit the lines of each file. The last files may have smaller sizes since usually the input file cannot be perfectly divided. The actual file lines may be a little greater than the value specified by --split_by_lines since fastp reads and writes data by blocks (a block = 1000 reads).

adamrtalbot added a commit that referenced this issue Dec 20, 2023
Changes:
 - FASTP uses blocks of 250 reads when splitting a FASTQ file.
 - This update makes 250 the minimum sized block to split a FASTQ file into.
 - Updates help text accordingly

Fixes #1363
@FriederikeHanssen
Contributor

The minimum is 0 in the schema because that turns off any splitting. Is there a way to allow either 0 or >= 250 with the schema?
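
One way to express "0 or >= 250" is a JSON Schema `anyOf` that combines `const` and `minimum`. The sketch below only illustrates that rule and is not necessarily how the pipeline schema was fixed. The dict `SPLIT_FASTQ_SCHEMA` and both helper functions are hypothetical, and the tiny evaluator handles just the two keywords used here. Whether the schema tooling used by the pipeline accepts `anyOf` is an assumption to verify.

```python
# Hypothetical schema fragment for the split_fastq parameter.
SPLIT_FASTQ_SCHEMA = {
    "type": "integer",
    "anyOf": [
        {"const": 0},       # 0 disables splitting
        {"minimum": 250},   # 250 reads = 1000 lines, fastp's lower limit
    ],
}


def matches(subschema: dict, value: int) -> bool:
    """Evaluate only the 'const' and 'minimum' keywords used above."""
    if "const" in subschema and value != subschema["const"]:
        return False
    if "minimum" in subschema and value < subschema["minimum"]:
        return False
    return True


def is_valid_split_fastq(value) -> bool:
    """Return True if value is 0 or an integer >= 250."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    return any(matches(sub, value) for sub in SPLIT_FASTQ_SCHEMA["anyOf"])


if __name__ == "__main__":
    for candidate in (0, 1, 249, 250, 50_000_000):
        print(candidate, is_valid_split_fastq(candidate))
```

With this rule, 0 and anything from 250 upwards pass, while 1 through 249 are rejected before fastp is ever invoked.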

@FriederikeHanssen
Contributor

Fixed
