Sequence composition #77

kullrich · 2024-02-04T11:10:31Z

Hi,

I am using angsd to produce FASTA sequences; these are automatically written as gzipped FASTA files.

If I open the FASTA file with the R package Biostrings, the sequence composition looks like this:

> library(Biostrings)
> dna <- readDNAStringSet("WSBg.asm5.fa.gz")
> alphabetFrequency(dna[1])
            A        C        G        T M R W S Y K V H D B        N - + .
[1,] 53974765 37689595 37636633 53870814 0 0 0 0 0 0 0 0 0 0 11982472 0 0 0

However, for the same fasta.gz file, pyfastx reports the following sequence composition:

import pyfastx
fa = pyfastx.Fasta("WSBg.asm5.fa.gz")
s1 = fa['chr1']
s1.composition
{'\x00': 162258284,
 'A': 8774131,
 'C': 5629514,
 'G': 5628512,
 'N': 4131093,
 'T': 8732745}
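Summing the counts from both outputs shows that the two tools agree on the total length of the record (195,154,279 symbols, assuming `dna[1]` in R is the same `chr1` record) but disagree on what those symbols are. Every base count from pyfastx is lower than the Biostrings count, which is consistent with the '\x00' positions being bases that were not decoded rather than extra sequence. A small check using only the numbers copied from the two outputs above:

```python
# Counts copied verbatim from the Biostrings and pyfastx outputs above
# (zero-count IUPAC codes omitted from the Biostrings side).
biostrings_counts = {"A": 53974765, "C": 37689595, "G": 37636633,
                     "T": 53870814, "N": 11982472}
pyfastx_counts = {"\x00": 162258284, "A": 8774131, "C": 5629514,
                  "G": 5628512, "N": 4131093, "T": 8732745}

total_biostrings = sum(biostrings_counts.values())
total_pyfastx = sum(pyfastx_counts.values())

# Same total length in both tools; only the content differs.
print(total_biostrings, total_pyfastx)  # 195154279 195154279
```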

Could you please explain what the '\x00' entries mean?

Could it be that pyfastx cannot correctly index and read these gzipped files?
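One way to tell whether the problem lies in the file or in the indexed reader is to count characters with an independent, index-free reader. The sketch below is not part of pyfastx or Biostrings: `fasta_composition` is a hypothetical helper that streams the file through Python's standard `gzip` module, which also reads files made of several concatenated gzip members (such as bgzip output). It assumes the record ID is the first word of each header line, counts characters exactly as they appear (no case folding), and is slow on a whole genome.

```python
import gzip
from collections import Counter


def fasta_composition(path):
    """Return {record_id: Counter of characters} for a plain or gzipped FASTA.

    Hypothetical helper: reads line by line without any index, so the
    result does not depend on pyfastx's random-access code.
    """
    opener = gzip.open if path.endswith(".gz") else open
    compositions = {}
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # Record ID = first whitespace-delimited word after '>'.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                compositions[current] = Counter()
            elif current is not None:
                # Count characters as-is; lowercase (soft-masked) bases stay separate.
                compositions[current].update(line)
    return compositions
```

For the file in this issue, `dict(fasta_composition("WSBg.asm5.fa.gz")["chr1"])` should contain no '\x00' key if the file itself is intact.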

Thank you in advance.

Best regards

Kristian


lmdu · 2024-02-08T12:49:37Z

Thanks. I will fix this bug.

lmdu · 2024-02-28T14:31:36Z

Fixed in v2.1.0
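To confirm that an installed copy already includes the fix, the installed version can be compared against 2.1.0. A minimal sketch using only the standard library (`importlib.metadata`); `release_tuple` and `has_fix` are hypothetical helper names, and it assumes the package is installed under the distribution name `pyfastx`. Pre-release suffixes such as `rc1` are ignored, so it only compares the numeric `X.Y.Z` part.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 1, 0)  # release named in the maintainer's reply


def release_tuple(text):
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups(default="0")
    return (int(major), int(minor), int(patch))


def has_fix(text):
    """True if the given pyfastx version is at or after the fixed release."""
    return release_tuple(text) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("pyfastx")
    except PackageNotFoundError:
        print("pyfastx is not installed")
    else:
        status = "includes the fix" if has_fix(installed) else "predates the fix; upgrade"
        print(installed, status)
```

If the check reports an older release, `pip install --upgrade pyfastx` pulls the latest version.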
