Read comment #4

Closed
esteinig opened this issue Jan 3, 2020 · 4 comments

Comments


esteinig commented Jan 3, 2020

Hey, nice tool - quite useful to extend pyfaidx to Fastq. Is there any chance you could implement reading the comment on a read header?

Currently the only header attribute accessible when iterating over pyfastx.Fastq is read.name.

Author

esteinig commented Jan 3, 2020

Also, on that note, is there a function to write the complete read back to a file? Something like:

for read in fai:
    output.write(str(read))

This will write the sequence, but not the complete read.

Author

esteinig commented Jan 3, 2020

Here is a simple Python function for now:

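from typing import Optional

def build_read_string(read, fastq: bool = False, comment: Optional[str] = None) -> str:
    """Build a FASTA or FASTQ record string from a pyfastx read."""
    if fastq:
        # Append the comment to the header only when one is given
        header = f"@{read.name} {comment}" if comment else f"@{read.name}"
        return f"{header}\n{read.seq}\n+\n{read.qual}"
    # FASTA output has no quality line; the comment is not used here
    return f">{read.name}\n{read.seq}"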

Owner

lmdu commented Jan 3, 2020

Good suggestion! In a later version, I will consider adding a ".raw" attribute to read and sequence objects to get the raw string as it appeared in the file. But I am not sure whether the read comment is important. In many fastq files, the comment line only contains a '+' character.
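
To make the idea of a raw record concrete, here is a minimal plain-Python sketch that does not use pyfastx at all. It assumes strict four-line FASTQ records and a space between the read name and the rest of the header; `RawRead` and `iter_raw_reads` are hypothetical names chosen for illustration, not part of any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class RawRead(NamedTuple):
    """Hypothetical container for one FASTQ record."""
    name: str      # identifier up to the first space
    comment: str   # rest of the header line, "" if absent
    seq: str
    qual: str
    raw: str       # the four lines exactly as they were read


def iter_raw_reads(handle: TextIO) -> Iterator[RawRead]:
    """Yield records from a strict four-line FASTQ stream."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0].strip():
            return  # end of file or trailing blank line
        header, seq, _plus, qual = (line.rstrip("\r\n") for line in lines)
        # Drop the leading '@', then split name from the header remainder
        name, _, comment = header[1:].partition(" ")
        yield RawRead(name, comment, seq, qual, "".join(lines))


# Round trip: writing read.raw reproduces the input unchanged
text = "@r1 ch=178\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n"
output = io.StringIO()
reads = list(iter_raw_reads(io.StringIO(text)))
for read in reads:
    output.write(read.raw)
```

Keeping the untouched lines in `raw` means the record can be written back without rebuilding the header from its parts.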

Author

esteinig commented Jan 3, 2020

Thanks, that's great to hear! I was imprecise when I said "comment": I meant the equivalent of the pysam comment read attribute, which holds the content after the read name. Sometimes it contains useful information, for example when generating Fastq files from nanopore basecalling:

@8dc817b4-9485-4b09-884f-c5b4fd741d75 runid=9e281aa698a86f2cde7f5c6db95cdfa8b3edd3ff read=58861 ch=178 start_time=2019-07-30T21:52:20Z

In this case it would be useful to access the string after the @name, i.e. the fields from runid to start_time.
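
As a rough sketch of what accessing those fields could look like, the header line can be split with plain Python once it is available as a string. The function name `split_header` is hypothetical, and the code assumes space-separated `key=value` tokens as in the nanopore header above.

```python
from typing import Dict, Tuple


def split_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTQ header line into the read name and its key=value fields."""
    if header.startswith("@"):
        header = header[1:]
    name, _, comment = header.partition(" ")
    fields = {}
    for token in comment.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return name, fields


header = (
    "@8dc817b4-9485-4b09-884f-c5b4fd741d75 "
    "runid=9e281aa698a86f2cde7f5c6db95cdfa8b3edd3ff "
    "read=58861 ch=178 start_time=2019-07-30T21:52:20Z"
)
name, fields = split_header(header)
print(name)
print(fields["ch"], fields["start_time"])
```

All values stay strings here; converting `read` or `ch` to integers, or `start_time` to a datetime, is left to the caller.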

lmdu closed this as completed Jan 9, 2020