
Sequence composition #77

Open
kullrich opened this issue Feb 4, 2024 · 2 comments

Comments

kullrich commented Feb 4, 2024

Hi,

I am using angsd to produce FASTA sequences; it writes them as gzipped FASTA files automatically.

If I open the FASTA file with the R package Biostrings, the sequence composition looks like this:

```r
> library(Biostrings)
> dna <- readDNAStringSet("WSBg.asm5.fa.gz")
> alphabetFrequency(dna[1])
            A        C        G        T M R W S Y K V H D B        N - + .
[1,] 53974765 37689595 37636633 53870814 0 0 0 0 0 0 0 0 0 0 11982472 0 0 0
```

However, for the same .fa.gz file, the sequence composition reported by pyfastx looks like this:

```python
>>> import pyfastx
>>> fa = pyfastx.Fasta("WSBg.asm5.fa.gz")
>>> s1 = fa['chr1']
>>> s1.composition
{'\x00': 162258284,
 'A': 8774131,
 'C': 5629514,
 'G': 5628512,
 'N': 4131093,
 'T': 8732745}
```
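Summing each output is a quick consistency check. Assuming `dna[1]` in R and `fa['chr1']` in pyfastx are the same record, both totals come to 195,154,279 characters, so pyfastx appears to see a sequence of the right length but reports about 162 million of its positions as `'\x00'` instead of A, C, G, T or N. The arithmetic, with the counts copied from the two outputs above:

```python
# Counts copied from the Biostrings and pyfastx outputs in this issue.
# Assumption: dna[1] (R) and fa['chr1'] (pyfastx) are the same record.
biostrings = {"A": 53974765, "C": 37689595, "G": 37636633,
              "T": 53870814, "N": 11982472}
pyfastx_counts = {"\x00": 162258284, "A": 8774131, "C": 5629514,
                  "G": 5628512, "N": 4131093, "T": 8732745}

total_r = sum(biostrings.values())
total_py = sum(pyfastx_counts.values())
print(total_r, total_py)  # 195154279 195154279

# Positions that pyfastx reports as anything other than a NUL byte.
real_py = total_py - pyfastx_counts["\x00"]
print(real_py)  # 32895995
```

Equal totals with very different per-base counts hint that the `'\x00'` characters take the place of real bases rather than being extra data appended to the sequence.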

Could you please explain what the '\x00' entry means?

Could it be that pyfastx cannot correctly index or read these gzipped files?
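One way to separate "the file is damaged" from "the indexed reader is wrong" is to count characters while streaming the file sequentially, with no random-access index involved. The sketch below does that with only the Python standard library; `composition_from_lines` and `fasta_gz_composition` are hypothetical helpers written for this illustration, not part of pyfastx or Biostrings, and the code assumes the file is plain gzip or BGZF (a series of concatenated gzip members, which `gzip.open` reads in sequence).

```python
import gzip
from collections import Counter


def composition_from_lines(lines):
    """Count characters per record from an iterable of FASTA text lines."""
    counts = {}
    name = None
    for line in lines:
        line = line.rstrip("\r\n")  # drop only the line terminator
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts.setdefault(name, Counter())
        elif name is not None:
            counts[name].update(line)  # Counter.update on a str counts its characters
    return counts


def fasta_gz_composition(path):
    """Stream a gzipped FASTA file sequentially and count characters per record.

    gzip.open reads concatenated gzip members one after another, so BGZF
    files work too. latin-1 maps every byte to exactly one character, so
    unexpected bytes such as NUL are counted instead of raising a decode error.
    """
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return composition_from_lines(handle)
```

For the file in this issue, one would call `fasta_gz_composition("WSBg.asm5.fa.gz")["chr1"]` and compare the result with the Biostrings counts; agreement would point at pyfastx's indexed gzip access rather than at the file itself. Streaming a whole chromosome this way is slow compared with pyfastx, but it only needs to be done once as a cross-check.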

Thank you in anticipation

Best regards

Kristian

lmdu (Owner) commented Feb 8, 2024

Thanks, I will fix this bug.

lmdu (Owner) commented Feb 28, 2024

Fixed in v2.1.0
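To check whether an environment already includes the fix, one can compare the installed pyfastx version against 2.1.0 and, as a cheap sanity test, make sure no NUL characters appear in a composition result. A minimal sketch: the helper names are hypothetical, the distribution name `pyfastx` is assumed to match the PyPI package, and the version parsing assumes plain `X.Y.Z` release strings (pre-release tags such as `rc1` are not handled).

```python
from importlib.metadata import PackageNotFoundError, version

# Release that the maintainer reports as containing the fix.
FIXED_IN = (2, 1, 0)


def parse_release(text):
    """Turn a plain 'X.Y.Z' (or 'X.Y') release string into a 3-tuple of ints."""
    parts = [int(piece) for piece in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '2.1' to (2, 1, 0)
    return tuple(parts)


def has_composition_fix():
    """Return True if the installed pyfastx distribution is at least 2.1.0."""
    try:
        installed = version("pyfastx")
    except PackageNotFoundError:
        return False  # pyfastx is not installed in this environment
    return parse_release(installed) >= FIXED_IN


def composition_is_clean(composition):
    """Return False if a composition dict contains NUL characters."""
    return "\x00" not in composition
```

With a fixed version installed, `composition_is_clean(pyfastx.Fasta("WSBg.asm5.fa.gz")['chr1'].composition)` would be expected to return True.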
