--subsample option not working with writing out a FASTQ file. #85

Closed
aniag opened this Issue Oct 9, 2018 · 0 comments

aniag commented Oct 9, 2018

It seems that when subsampling is enabled, the suffix .subsampled is automatically appended to the output filename. While it is generally a good idea to make it obvious that the output is not a full result, this backfires when ngless tries to format the filename (inserting pair.1, pair.2 and singles):

_formatFQOname base insert
    | "{index}" `isInfixOf` base = return $ replace "{index}" insert base
    | endswith ".fq" base = return $ removeEnd base ".fq" ++ "." ++ insert ++ ".fq"
    | endswith ".fq.gz" base = return $ removeEnd base ".fq.gz" ++ "." ++ insert ++ ".fq.gz"
    | endswith ".fq.bz2" base = return $ removeEnd base ".fq.bz2" ++ "." ++ insert ++ ".fq.bz2"
    | otherwise = throwScriptError ("Cannot handle filename " ++ base ++ " (expected extension .fq/.fq.gz/.fq.bz2).")

Because the filename now ends in .subsampled, none of the extension guards match, which results in the following error message:

Cannot handle filename sample_name.preprocessed.fq.gz.subsampled (expected extension .fq/.fq.gz/.fq.bz2).
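
To illustrate why none of the guards match, here is a minimal, self-contained sketch using only the base library. `formatOriginal` mirrors the guards quoted above but returns an `Either` instead of running in the NGLess monad; `formatTolerant`, `fqExtensions`, `subsampleSuffix`, `stripSuffix` and `replaceAll` are hypothetical names introduced for illustration. The workaround shown (peel off `.subsampled` before formatting, re-append it afterwards) is only one possible approach and is not necessarily what the actual fix does.

```haskell
import Data.List (isInfixOf, isPrefixOf, isSuffixOf)

-- Extensions accepted by the guards in the quoted snippet.
fqExtensions :: [String]
fqExtensions = [".fq", ".fq.gz", ".fq.bz2"]

-- Suffix appended when subsampling, as described in this report.
subsampleSuffix :: String
subsampleSuffix = ".subsampled"

-- Remove a suffix from a string; Nothing if the string does not end with it.
stripSuffix :: String -> String -> Maybe String
stripSuffix suf s
    | suf `isSuffixOf` s = Just (take (length s - length suf) s)
    | otherwise = Nothing

-- Replace every occurrence of a (non-empty) needle.
replaceAll :: String -> String -> String -> String
replaceAll needle new = go
  where
    go [] = []
    go s@(c:cs)
        | needle `isPrefixOf` s = new ++ go (drop (length needle) s)
        | otherwise = c : go cs

-- Same logic as the quoted guards, with errors reported via Left.
formatOriginal :: String -> String -> Either String String
formatOriginal base insert
    | "{index}" `isInfixOf` base = Right (replaceAll "{index}" insert base)
    | otherwise =
        case [(stem, ext) | ext <- fqExtensions, Just stem <- [stripSuffix ext base]] of
            ((stem, ext):_) -> Right (stem ++ "." ++ insert ++ ext)
            [] -> Left ("Cannot handle filename " ++ base
                        ++ " (expected extension .fq/.fq.gz/.fq.bz2).")

-- Hypothetical workaround: format the name without the subsample suffix,
-- then put the suffix back on the result.
formatTolerant :: String -> String -> Either String String
formatTolerant base insert =
    case stripSuffix subsampleSuffix base of
        Just inner -> (++ subsampleSuffix) <$> formatOriginal inner insert
        Nothing -> formatOriginal base insert
```

Under these assumptions, `formatOriginal "sample_name.preprocessed.fq.gz.subsampled" "pair.1"` returns a `Left` carrying the error message above, while `formatTolerant` with the same arguments returns `Right "sample_name.preprocessed.pair.1.fq.gz.subsampled"`.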

Here is a small example script that reproduces the error:

ngless "0.7"
import "mocat" version "0.0"

sample = ARGV[2]
input = load_mocat_sample(ARGV[1] + '/' + sample)

input = preprocess(input, keep_singles=True) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
write(input, ofile=sample+'.preprocessed.fq.gz')

luispedro added the bug label Oct 9, 2018

luispedro closed this in 8af2ec8 Oct 28, 2018

luispedro added a commit that referenced this issue Nov 12, 2018

BLD Release 0.10
Many small fixes rather than any large new features.

Full ChangeLog:

    * Fix to lock1's return value when used with paths (#68 - reopen)
    * Support _F/_R suffixes for forward/reverse in load_mocat_sample
    * samtools_sort() now accepts by={name} to sort by read name
    * Fixed bug where header was printed even when STDOUT was used
    * Fixed bug where writing interleaved FastQ to STDOUT did not work as
    expected
    * Indices created by bwa and minimap2 are now versioned
    * arg1 in external modules is no longer always treated as a path
    * Added expand_searchpath to external modules API (closes #56)
    * Fixed bug where detection of Fastq encoding was not performed on the second pair
    * Fix saving fastq sets with --subsample (issue #85)
    * Add __extra_megahit_args to assemble() (issue #86)
    * Better error message when user mis-specifies the ngless version string
    (issue #84)
    * Support NO_COLOR environment variable (issue #83)
    * Garbage collection for temporary files (issue #79)
    * Rename --search-dir to --search-path for consistency with other API
    * Fix corner case with select() producing incorrect CIGAR strings (#92)
    * Always check output file writability (#91)
    * Make paired() accept encoding argument