The following line from the Ensembl Homo_sapiens.GRCh37.75.gtf is parsed incorrectly:
1 protein_coding exon 860260 860328 . + . gene_id "ENSG00000187634"; transcript_id "ENST00000420190"; exon_number "1"; gene_name "SAMD11"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "SAMD11-011"; transcript_source "havana"; exon_id "ENSE00001637883"; tag "cds_end_NF"; tag "mRNA_end_NF";
As you can see, the line contains two `tag` attributes. Only the second one is present in the DataFrame returned by read_gtf_as_dataframe:
  seqname          source feature   start     end  score strand frame  \
0       1  protein_coding    exon  860260  860328    NaN      +     .

           gene_id    transcript_id exon_number gene_name     gene_source  \
0  ENSG00000187634  ENST00000420190           1    SAMD11  ensembl_havana

     gene_biotype transcript_name transcript_source          exon_id  \
0  protein_coding      SAMD11-011            havana  ENSE00001637883

           tag
0  mRNA_end_NF
The cds_end_NF tag is lost. Ideally both tags should be presented in a list, but I'm not sure if that's possible with pandas.
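A minimal sketch of one plausible cause and a fix, assuming a hand-rolled attribute parser (this is not gtfparse's actual code). If attributes are stored in a plain dict keyed by name, a repeated key such as `tag` silently keeps only its last value, which matches the symptom above. Accumulating values per key into a list keeps both tags, and pandas can hold Python lists in an object-dtype column, so the list representation is possible. The helper names and the `LIST_KEYS` set are hypothetical, and the splitting logic assumes values never contain `;`.

```python
from collections import defaultdict

import pandas as pd

# Attribute column (9th GTF field) of the exon record above, shortened.
ATTR_FIELD = (
    'gene_id "ENSG00000187634"; transcript_id "ENST00000420190"; '
    'exon_number "1"; gene_name "SAMD11"; '
    'tag "cds_end_NF"; tag "mRNA_end_NF";'
)

# Assumption: keys that may legitimately repeat; these are always kept as lists.
LIST_KEYS = {"tag"}


def _split_attributes(attr_field):
    """Yield (key, value) pairs from a GTF attribute field.

    Simplified: assumes values never contain ';' and each key is separated
    from its quoted value by whitespace.
    """
    for item in attr_field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, value = item.split(" ", 1)
        yield key, value.strip().strip('"')


def parse_attributes_last_wins(attr_field):
    """Dict-based parsing: a repeated key silently overwrites earlier values."""
    return {key: value for key, value in _split_attributes(attr_field)}


def parse_attributes_multi(attr_field):
    """Collect every value per key, in file order."""
    attrs = defaultdict(list)
    for key, value in _split_attributes(attr_field):
        attrs[key].append(value)
    return dict(attrs)


def to_row(multi):
    """Keep LIST_KEYS as lists; every other key takes its single (last) value."""
    return {k: (v if k in LIST_KEYS else v[-1]) for k, v in multi.items()}


last_wins = parse_attributes_last_wins(ATTR_FIELD)
multi = parse_attributes_multi(ATTR_FIELD)
df = pd.DataFrame([to_row(multi)])

print(last_wins["tag"])   # mRNA_end_NF
print(df.loc[0, "tag"])   # ['cds_end_NF', 'mRNA_end_NF']
print(df["tag"].dtype)    # object
```

Always storing the repeatable keys as lists (even when a record has a single tag) keeps the column type consistent across rows, so downstream code never has to check whether a cell is a string or a list.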
Sorry for the very slow response; I'm not sure whether this is still relevant to you. I think PR #6 is trying to address the same issue.
That PR's solution is to concatenate the multiple values into a comma-separated string. I think collecting them in a list makes more sense.
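To compare the two representations, here is a small sketch on made-up data (not output from gtfparse or from the PR). It shows a list column and its comma-joined equivalent, how `Series.str.join` and `Series.str.split` convert between them, and two operations that are direct on lists: an exact per-tag membership test and `DataFrame.explode` to get one row per tag.

```python
import pandas as pd

# Made-up example: two records, the first with two tags, the second with one.
tags = [["cds_end_NF", "mRNA_end_NF"], ["basic"]]
as_lists = pd.DataFrame({"tag": tags})

# Comma-separated representation: one string per row.
as_strings = as_lists.assign(tag=as_lists["tag"].str.join(","))

# Converting back: split each string on the separator.
round_trip = as_strings["tag"].str.split(",")

# Exact membership test on the list column (no substring matching involved).
has_cds_end_nf = as_lists["tag"].apply(lambda t: "cds_end_NF" in t)

# Long format: one row per (record, tag) pair; the index repeats per record.
exploded = as_lists.explode("tag")

print(as_strings["tag"].tolist())  # ['cds_end_NF,mRNA_end_NF', 'basic']
print(has_cds_end_nf.tolist())     # [True, False]
print(len(exploded))               # 3
```

The string form is easy to write to CSV, but it needs a split before every per-tag query, and a comma inside a value would be indistinguishable from the separator; the list form avoids both issues.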