A small number of rows where gene name, symbol and feature type have "no value" #47

ValWood · 2022-05-18T13:51:49Z

ValWood · 2022-05-18T13:53:45Z

kimrutherford · 2022-05-18T21:42:13Z

I get the same list with this query:

<query model="genomic" view="Gene.primaryIdentifier Gene.secondaryIdentifier Gene.symbol Gene.name Gene.length Gene.organism.shortName" constraintLogic="(A and B)" sortOrder="">
   <constraint path="Gene.length" op="IS NULL" code="B"/>
   <constraint path="Gene.organism.shortName" value="S. pombe" op="=" code="A"/>
</query>

kimrutherford · 2022-05-18T22:06:44Z

I'm investigating (pombase/pombase-chado#967) why we export the transcript ID "SPAC1556.06.1" as an exact synonym for "SPAC1556.06" in the JSON file for PombeMine. But it might just be a coincidence that it's in this list.

SPAC1F12.03c and SPAC4H3.12c aren't current PomBase identifiers. Those genes were removed sometime in the past. (Details: https://www.pombase.org/status/new-and-removed-genes)
- Ensembl Genomes has SPAC4H3.12c still: https://fungi.ensembl.org/Schizosaccharomyces_pombe/Gene/Summary?g=SPAC4H3.12c
- SPAC1F12.03c isn't in Ensembl Genomes.
SPBC28F2.11 is a current PomBase gene. There are two genes with that DB identifier in PombeMine. I'm not sure why they haven't merged.
SPBC8E4.02c is a synonym of SPNCRNA.9001 in PomBase because two genes were merged in the past. In PombeMine there is a gene object for SPBC8E4.02c and one for SPNCRNA.9001.
- Ensembl Genomes has SPBC8E4.02c but not SPNCRNA.9001
SPCC548.03c.1 and SPCC548.03c.2 are transcript IDs.
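
A rough sketch of how the transcript IDs in this list could be told apart from gene IDs. The regular expression is an assumption based only on the identifiers seen in this issue (a systematic ID followed by a `.N` suffix); it is not PomBase's or InterMine's own rule, and `split_transcript_id` is a hypothetical helper name.

```python
import re

# Assumed pattern: a systematic ID such as SPAC1556.06 or SPCC548.03c,
# followed by a ".N" suffix, is treated as a transcript ID.
TRANSCRIPT_ID = re.compile(r"^(SP[A-Z0-9]+\.\d+c?)\.(\d+)$")


def split_transcript_id(identifier):
    """Return (gene_id, transcript_number), or None if there is no suffix."""
    match = TRANSCRIPT_ID.match(identifier)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


for identifier in ["SPAC1556.06.1", "SPCC548.03c.2", "SPBC28F2.11", "SPNCRNA.9001"]:
    print(identifier, split_transcript_id(identifier))
```

Identifiers that return `None` here (for example SPAC1F12.03c) still need checking against the current PomBase gene list, since the pattern says nothing about whether a gene ID is current.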

kimrutherford · 2022-05-22T05:48:50Z

> I thought we only load genes from PomBase?

They will be loaded from any source that has gene data.

I should have done this earlier. Here is the result of querying PombeMine for the gene identifier and the DataSet that the identifier came from:

identifier	DataSet
Q9H9V9	GO Annotation data set
SPAC1556.06.1	BioGRID interaction data set
SPAC1F12.03c	BioGRID interaction data set
SPAC4H3.12c	BioGRID interaction data set
SPBC28F2.11	cerevisiae-orthologs data set
SPBC8E4.02c	BioGRID interaction data set
SPCC548.03c.1	GO Annotation data set
SPCC548.03c.2	GO Annotation data set
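
To see which source needs to be contacted about which identifiers, the table above can be grouped by data set. This is only a sketch: the rows are copied from the table, and `group_by_data_set` is a hypothetical helper, not part of PombeMine or InterMine.

```python
from collections import defaultdict

# Rows copied from the identifier/DataSet table above (tab-separated).
ROWS = """\
Q9H9V9\tGO Annotation data set
SPAC1556.06.1\tBioGRID interaction data set
SPAC1F12.03c\tBioGRID interaction data set
SPAC4H3.12c\tBioGRID interaction data set
SPBC28F2.11\tcerevisiae-orthologs data set
SPBC8E4.02c\tBioGRID interaction data set
SPCC548.03c.1\tGO Annotation data set
SPCC548.03c.2\tGO Annotation data set
"""


def group_by_data_set(text):
    """Map each data set name to the identifiers that came from it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, data_set = line.split("\t", 1)
        groups[data_set].append(identifier)
    return dict(groups)


for data_set, identifiers in group_by_data_set(ROWS).items():
    print(f"{data_set}: {', '.join(identifiers)}")
```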

ValWood · 2022-05-23T09:28:32Z

https://beta.uniprot.org/uniprotkb/Q9H9V9/entry is a human entry. We need to determine how it gets into the pombe gene set.

ValWood · 2022-05-23T09:32:14Z

Contact BioGRID about:
- SPBC8E4.02c is now a synonym of SPNCRNA.9001 (there is no longer a protein-coding ORF for this ID).
- SPAC1F12.03c was removed and replaced by a nuclear mitochondrial pseudogene (NUMT) feature.
- SPAC4H3.12c is not protein-coding (upstream region of snr62). There is no corresponding gene feature, but it might be part of the snr62 transcript.
- SPAC1556.06.1 is a transcript ID for an alternative transcript of SPAC1556.06.

Also asked @kimrutherford not to load into PomBase
#51

ValWood · 2022-05-23T09:34:38Z

SPBC28F2.11 | cerevisiae-orthologs data set

I don't understand this one. The S. c orthologs are parsed from the contig files and this isn't mentioned except as a systematic ID?

See query
#50

ValWood · 2022-05-23T09:40:24Z

When I search UniProt for these isoforms I only get one entry:
Q9P3V0

Can you send the GOA GAF so that I can investigate further? (The alternative forms would be in the "gene product form ID" column, column 17.)

Added to #51

kimrutherford · 2022-05-23T10:01:13Z

> SPBC28F2.11 | cerevisiae-orthologs data set
> I don't understand this one. The S. c orthologs are parsed from the contig files and this isn't mentioned except as a systematic ID?

Yep, I think that's one for InterMine to investigate.

> Can you send the GOA GAF so that I can investigate further? (The alternative forms would be in the "gene product form ID" column, column 17.)

Here are the pombe and japonicus lines from the GOA GAF we load:
https://curation.pombase.org/kmr44/gene_association.goa_uniprot.pombe+japonicus-2022-04-01.tsv.gz

That's what PomBase uses, but PombeMine might be reading the XML file.
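
A minimal sketch of how the rows with a gene product form ID could be pulled out of that file, assuming it follows the standard GAF 2.x layout (tab-separated, `!` comment lines, column 2 = DB object ID, column 3 = symbol, column 17 = gene product form ID). The function names and the local file path are hypothetical.

```python
import gzip


def rows_with_product_form(lines):
    """Yield (object_id, symbol, form_id) for GAF rows whose column 17 is set."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 17 (index 16) is optional and often empty.
        if len(cols) >= 17 and cols[16]:
            yield cols[1], cols[2], cols[16]


def scan_gaf(path):
    """Read a gzip-compressed GAF file and return the rows that have a form ID."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(rows_with_product_form(handle))
```

After downloading the file, something like `scan_gaf("gene_association.goa_uniprot.pombe+japonicus-2022-04-01.tsv.gz")` would list the annotations that refer to a specific isoform, which may help explain where SPCC548.03c.1 and SPCC548.03c.2 come from.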

ValWood · 2022-05-23T11:43:44Z

identifier	DataSet
SPCC548.03c.1	GO Annotation data set
SPCC548.03c.2	GO Annotation data set

#51

ValWood · 2022-05-23T11:57:53Z

Sorry @danielabutano! I thought this ticket was on our tracker whilst we tracked down the sources. I can close this issue, as more informative tickets have been opened for the individual issues requiring action.

ValWood · 2022-05-23T21:33:19Z

BioGRID have mailed back. They have fixed the 4 issues at their end, so these will disappear soon.

kimrutherford mentioned this issue May 18, 2022

Search on a gene sometimes gives 2 features not one #46

Closed

ValWood closed this as completed May 23, 2022

ValWood mentioned this issue May 23, 2022

Do not load genes from BioGRrid that are not in PomBase pombase/pombase-chado#969

Closed
