IBAs from PAINT not being applied to GOA files? #1061

Closed
kltm opened this issue Apr 23, 2019 · 20 comments

@kltm
Member

kltm commented Apr 23, 2019

It seems that, as of the http://release.geneontology.org/2019-04-17/ release, PAINT files are not being applied to the GOA (goa_*) resources as they are to the others: neither header nor data. This appears to be true for all goa_ and non-goa_ files sampled so far, comparing to http://release.geneontology.org/2019-03-18/ .

As part of this exploration, I also noticed that when unpacking goa_*.gaf.gz files, the "inner" archive name was "goa_*_valid.gaf" rather than the expected "goa_*.gaf". None of the non-GOA files that had PAINT applied properly seemed to have this property. Unsure whether this is related, but it is a sign of trouble and should be fixed even if it is not.
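A minimal sketch for checking the "inner" archive name without unpacking, assuming the name is stored in the optional FNAME field of the gzip header (RFC 1952). The helper names `gzip_inner_name` and `make_gzip` are made up for illustration and are not part of the pipeline.

```python
import gzip
import io
import struct


def gzip_inner_name(data: bytes):
    """Return the original file name stored in a gzip header, or None.

    Follows the member header layout in RFC 1952: a 10-byte fixed header,
    an optional FEXTRA field, then an optional zero-terminated FNAME.
    """
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then payload
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if not flags & 0x08:  # FNAME not set: no inner name recorded
        return None
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("latin-1")


def make_gzip(inner_name: str, payload: bytes) -> bytes:
    """Build an in-memory gzip stream with a chosen inner name (demo only)."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=inner_name, mode="wb", fileobj=buf) as gz:
        gz.write(payload)
    return buf.getvalue()


if __name__ == "__main__":
    blob = make_gzip("goa_cow_valid.gaf", b"!gaf-version: 2.1\n")
    print(gzip_inner_name(blob))
```

For a real file, passing the first few hundred bytes of the .gaf.gz to `gzip_inner_name` should be enough, since the name sits at the start of the stream.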

Original reference:
geneontology/helpdesk#206

@kltm
Member Author

kltm commented Apr 23, 2019

Quickest way forward is probably feedback from @dougli1sqrd

So far, I can see no obvious file changes in the Makefile or merge code in the relevant time period, but I may be (likely?) missing something.

@kltm
Member Author

kltm commented Apr 23, 2019

If this was not intended, we'll need to make a release note and reclassify this as a showstopper.

@kltm
Member Author

kltm commented Apr 23, 2019

@pgaudet Would you know of any reason over the past couple of months that valid PAINT annotations would not be applied to GOA (goa_*) output?

@kltm
Member Author

kltm commented Apr 23, 2019

Looking at possible changes in ontobio
biolink/ontobio@a4dfba2#diff-b474cebaed03ed9a6299feedd8d0985a

If I cannot get traction on this soon, I will test against an earlier ontobio version.

@pgaudet
Contributor

pgaudet commented Apr 24, 2019

> @pgaudet Would you know of any reason over the past couple of months that valid PAINT annotations would not be applied to GOA (goa_*) output?

I don't know. I hadn't noticed.

@pgaudet
Contributor

pgaudet commented Apr 29, 2019

@kltm Any idea what's going on? Will this be resolved for the upcoming release?

Thanks, Pascale

@pgaudet
Contributor

pgaudet commented Apr 29, 2019

We have no IBA annotations for human; isn't this a showstopper? I would expect that to change our annotation corpus quite a bit (and any analysis done with the GO data).

@kltm
Member Author

kltm commented Apr 29, 2019

@pgaudet It is indeed a "showstopper", which is why it has been labeled as "showstopper". Release will be suspended until this has been fixed.
We've considered "rolling back" this release, but the mechanisms to do that may well be more disruptive than trying to get a fixed release out ASAP.

kltm added a commit to geneontology/pipeline that referenced this issue Apr 30, 2019
kltm added a commit to geneontology/pipeline that referenced this issue Apr 30, 2019
@dougli1sqrd
Contributor

The release build log shows that perhaps goa_chicken has properly merged paint_goa_chicken. Has anyone yet verified goa_chicken?

@dougli1sqrd
Contributor

It looks like goa_chicken worked for some reason, and not the other goa datasets:

edouglass@Erics-MBP-2:/tmp$ curl -L http://release.geneontology.org/2019-04-17/annotations/goa_chicken.gaf.gz | gzip -dcf | grep PANTHER | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
!PANTHER version: v.13.1.
UniProtKB	F1NXP2	GTF2B		GO:0000979	PMID:21873635	IBA	MGI:MGI:2385191|PANTHER:PTN000178620|UniProtKB:Q00403	F	Uncharacterized protein	UniProtKB:F1NXP2|PTN000907432	protein	taxon:9031	20181024	GO_Central
UniProtKB	Q9PU53	TERF2	contributes_to	GO:0003720	PMID:21873635	IBA	PANTHER:PTN002929876|UniProtKB:Q15554	F	Telomeric repeat-binding factor 2	TERF2|TRF2	protein	taxon:9031	20180816	GO_Central
UniProtKB	A0A1D5P254	A0A1D5P254		GO:0009725	PMID:21873635	IBA	PANTHER:PTN002322100|RGD:3185	P	Uncharacterized protein	UniProtKB:A0A1D5P254|PTN002714537	protein	taxon:9031	20170928	GO_Central
UniProtKB	A0A1L1RU88	A0A1L1RU88		GO:0005938	PMID:21873635	IBA	MGI:MGI:2450166|PANTHER:PTN002696576	C	Uncharacterized protein	UniProtKB:A0A1L1RU88|PTN002696614	protein	taxon:9031	20180817	GO_Central
UniProtKB	E1C102	ELOVL7		GO:0030148	PMID:21873635	IBA	MGI:MGI:1858959|PANTHER:PTN000125390|SGD:S000000630|SGD:S000004364|UniProtKB:Q9BW60	P	Elongation of very long chain fatty acids protein 7	ELOVL7	protein	taxon:9031	20170228	GO_Central
UniProtKB	P08642	HRAS		GO:0019003	PMID:21873635	IBA	PANTHER:PTN000631348|PomBase:SPAC17H9.09c|PomBase:SPBC428.16c|RGD:2981|RGD:621840|UniProtKB:P01112|UniProtKB:P10114|UniProtKB:P10301|UniProtKB:P11233|UniProtKB:P11234|UniProtKB:P61224|UniProtKB:P61225|UniProtKB:Q15382|UniProtKB:Q9Y3L5	F	GTPase HRas	HRAS|HRAS1	protein	taxon:9031	20190325	GO_Central
UniProtKB	A0A1D5PJH9	A0A1D5PJH9		GO:0005250	PMID:21873635	IBA	FB:FBgn0005564|MGI:MGI:102663|PANTHER:PTN000165409|RGD:68393|RGD:68394|UniProtKB:Q9NZV8|UniProtKB:Q9UK17	F	Uncharacterized protein	UniProtKB:A0A1D5PJH9|PTN002610100	protein	taxon:9031	20170228	GO_Central
UniProtKB	Q5ZLC9	GOLGA7		GO:0005795	PMID:21873635	IBA	PANTHER:PTN000327120|UniProtKB:Q7Z5G4	C	Golgin subfamily A member 7	GOLGA7|RCJMB04_6k22	protein	taxon:9031	20170228	GO_Central

@dougli1sqrd
Contributor

In snapshot:
curl -L http://snapshot.geneontology.org/annotations/goa_human.gaf.gz | gzip -dcf | grep PANTHER | head yields:

UniProtKB	Q00056	HOXA4		GO:0033613	PMID:21873635	IBA	FB:FBgn0000439|PANTHER:PTN002518650	F	Homeobox protein Hox-A4	HOXA4|HOX1D	protein	taxon:9606	20180216	GO_Central
UniProtKB	Q9HB19	PLEKHA2		GO:0016020	PMID:21873635	IBA	PANTHER:PTN000371608|UniProtKB:Q9HB19	C	Pleckstrin homology domain-containing family A member 2	PLEKHA2|TAPP2	protein	taxon:9606	20170228	GO_Central
UniProtKB	Q6XYQ8	SYT10		GO:0014059	PMID:21873635	IBA	MGI:MGI:101759|MGI:MGI:1859545|MGI:MGI:99667|PANTHER:PTN000001283	P	Synaptotagmin-10	SYT10	protein	taxon:9606	20180613	GO_Central
UniProtKB	Q9HAN9	NMNAT1		GO:0009435	PMID:21873635	IBA	EcoGene:EG13241|MGI:MGI:1913704|MGI:MGI:1921330|PANTHER:PTN000247701|RGD:1307331|SGD:S000003242|SGD:S000004320|UniProtKB:F4K687	P	Nicotinamide/nicotinic acid mononucleotide adenylyltransferase 1	NMNAT1|NMNAT	protein	taxon:9606	20190319	GO_Central
UniProtKB	Q9GZQ6	NPFFR1		GO:0005887	PMID:21873635	IBA	FB:FBgn0038880|PANTHER:PTN001202348	C	Neuropeptide FF receptor 1	NPFFR1|GPR147|NPFF1	protein	taxon:9606	20170228	GO_Central
UniProtKB	Q96HC4	PDLIM5		GO:0005913	PMID:21873635	IBA	FB:FBgn0265991|PANTHER:PTN001198141|UniProtKB:Q9NR12	C	PDZ and LIM domain protein 5	PDLIM5|ENH|L9	protein	taxon:9606	20190221	GO_Central
UniProtKB	P01861	IGHG4		GO:0009897	PMID:21873635	IBA	MGI:MGI:96446|MGI:MGI:96447|MGI:MGI:96448|MGI:MGI:98612|MGI:MGI:99546|PANTHER:PTN001510949	C	Immunoglobulin heavy constant gamma 4	IGHG4	protein	taxon:9606	20170228	GO_Central

So it looks like PAINT is properly being applied at this point. Reopen if there is still an issue.
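For a check that counts IBA rows rather than grepping for PANTHER, a rough Python sketch follows. It assumes GAF 2.x tab-separated rows with the evidence code in column 7 and header lines starting with "!". `count_iba` is a hypothetical helper; the first sample row is taken from the goa_chicken output above, and the second is invented for the demo.

```python
def count_iba(lines):
    """Count annotation rows whose evidence code (GAF column 7) is IBA."""
    total = 0
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment line or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 6 and cols[6] == "IBA":
            total += 1
    return total


SAMPLE = [
    "!gaf-version: 2.1\n",
    # Row copied from the goa_chicken output in this thread.
    "\t".join([
        "UniProtKB", "Q5ZLC9", "GOLGA7", "", "GO:0005795", "PMID:21873635",
        "IBA", "PANTHER:PTN000327120|UniProtKB:Q7Z5G4", "C",
        "Golgin subfamily A member 7", "GOLGA7|RCJMB04_6k22", "protein",
        "taxon:9031", "20170228", "GO_Central",
    ]) + "\n",
    # Invented non-IBA row, for illustration only.
    "\t".join([
        "UniProtKB", "EXAMPLE1", "EX1", "", "GO:0005634", "GO_REF:0000000",
        "IEA", "", "C", "Invented protein", "", "protein",
        "taxon:9031", "20190101", "EXAMPLE",
    ]) + "\n",
]

if __name__ == "__main__":
    print(count_iba(SAMPLE))
```

On a real file the same function can be fed the lines of `gzip.open(path, "rt")`; a count of zero for a goa_* file would correspond to the problem reported here.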

@suzialeksander
Contributor

Got a helpdesk email about missing cow annotations. I've narrowed it down to there not being a single IBA in goa_cow.gaf (test: UniProt O18971 definitely has IBAs in Protein2GO). @dougli1sqrd I'm assuming this is the relevant ticket?

@kltm
Member Author

kltm commented May 30, 2019

@suzialeksander @dougli1sqrd looking at our incoming products, it seems that there is no paint_goa_cow-src.gaf.gz (or any other paint_goa_cow*) as might be expected.
http://current.geneontology.org/products/annotations
This seems to be true on snapshot as well.

The two possibilities are: 1) PAINT upstream does not supply cow, or 2) we are somehow not getting them (metadata issues).

@dougli1sqrd
Contributor

@suzialeksander did they say what annotations are missing? Were they IBAs? Yeah we filter out IBAs from all non-PAINT sources. We expect IBAs to come in from a PAINT gaf. But as @kltm pointed out, we don't have a corresponding paint_goa_cow*.gaf. It's not in the metadata.
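The filter-then-append behavior described here can be sketched roughly as below. This is an illustration only, not the ontobio or pipeline implementation: `merge_paint` is a hypothetical name, and rows are assumed to be lists of GAF columns with the evidence code at index 6.

```python
def merge_paint(source_rows, paint_rows):
    """Drop IBA rows from a non-PAINT source, then append the PAINT rows.

    Each row is a list of GAF columns; index 6 is the evidence code.
    """
    kept = [row for row in source_rows if row[6] != "IBA"]
    return kept + list(paint_rows)


def iba_count(rows):
    """Count rows whose evidence code is IBA."""
    return sum(1 for row in rows if row[6] == "IBA")
```

With an empty `paint_rows` (the situation when no matching paint_goa_* file exists), the merged output carries no IBAs at all, which matches what was seen for goa_cow.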

@kltm
Member Author

kltm commented May 30, 2019

@dustine32 Is there actually a cow paint file that we're missing upstream?

@suzialeksander
Contributor

@dougli1sqrd Not a full list of errors, but the user provided GO IDs that are missing for PPARG. I can't get others easily as BioMart is down. I have a feeling these are all IBAs but only checked two, then looked at the IBA-less GAF:

I want to download bovine gene association files that include gene ID, gene symbol, GO ID, GO term, and biological domain. As a result of the revision of the Gene Ontology website, I don't know how to download these. I remember that in previous versions these relevant files could be downloaded directly.

But now, I can only find part of files from this link (http://current.geneontology.org/products/pages/downloads.html), and this is not quite consistent with the GO terms of the Ensembl annotation extracted from BioMart. For example, Ensembl has 87 non-redundant GO terms annotated to the PPARG gene, but Gene Ontology provide only 66 non-redundant GO terms with this gene (See the details below).

Ensembl    Gene Ontology
GO:0000122  
GO:0000976  
GO:0000977  
GO:0000981  
GO:0001103  
GO:0001227  
GO:0003677 GO:0003677
GO:0003682 GO:0003682
GO:0003690  
GO:0003700 GO:0003700
GO:0003707 GO:0003707
GO:0004879 GO:0004879
GO:0005504  
GO:0005634 GO:0005634
GO:0005737 GO:0005737
GO:0006355 GO:0006355
GO:0006357 GO:0006357
GO:0006631  
GO:0006919 GO:0006919
GO:0007165  
GO:0007275  
GO:0008022 GO:0008022
GO:0008134  
GO:0008144  
GO:0008217 GO:0008217
GO:0008270 GO:0008270
GO:0008289  
GO:0009755  
GO:0010742 GO:0010742
GO:0010745 GO:0010745
GO:0010871 GO:0010871
GO:0010887  
GO:0010891 GO:0010891
GO:0016525 GO:0016525
GO:0019216  
GO:0019899 GO:0019899
GO:0030154  
GO:0030224 GO:0030224
GO:0030374 GO:0030374
GO:0030855 GO:0030855
GO:0032526 GO:0032526
GO:0032869 GO:0032869
GO:0032991  
GO:0033613 GO:0033613
GO:0033993  
GO:0035357 GO:0035357
GO:0038023  
GO:0042277 GO:0042277
GO:0042593 GO:0042593
GO:0042752 GO:0042752
GO:0042802 GO:0042802
GO:0042953 GO:0042953
GO:0043231  
GO:0043388 GO:0043388
GO:0043401 GO:0043401
GO:0043537 GO:0043537
GO:0043565  
GO:0043621 GO:0043621
GO:0044212 GO:0044212
GO:0045165 GO:0045165
GO:0045598 GO:0045598
GO:0045600 GO:0045600
GO:0045713 GO:0045713
GO:0045892 GO:0045892
GO:0045893 GO:0045893
GO:0045944 GO:0045944
GO:0046872  
GO:0046965 GO:0046965
GO:0046982 GO:0046982
GO:0048469 GO:0048469
GO:0048511 GO:0048511
GO:0048662  
GO:0050692 GO:0050692
GO:0050693 GO:0050693
GO:0050872 GO:0050872
GO:0051091 GO:0051091
GO:0051393 GO:0051393
GO:0060336 GO:0060336
GO:0060850 GO:0060850
GO:0060965 GO:0060965
GO:0061614 GO:0061614
GO:0070888 GO:0070888
GO:0071404 GO:0071404
GO:0090575  
GO:1904706 GO:1904706
GO:1905461 GO:1905461
GO:1905563 GO:1905563
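
The comparison in the table can be reproduced as a set difference. A small sketch, using four of the IDs listed above; `missing_terms` is a hypothetical helper name:

```python
def missing_terms(reference, observed):
    """Return the GO IDs present in `reference` but absent from `observed`."""
    return sorted(set(reference) - set(observed))


# Four IDs from the Ensembl column; two of them also appear in the
# Gene Ontology column of the table.
ensembl_terms = ["GO:0000122", "GO:0003677", "GO:0005504", "GO:0005634"]
gaf_terms = ["GO:0003677", "GO:0005634"]

if __name__ == "__main__":
    for term in missing_terms(ensembl_terms, gaf_terms):
        print(term)
```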

@dustine32
Contributor

@kltm @dougli1sqrd Yeah the cow IBAs are in paint_other.gaf. Checking ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_other.gaf.gz:

$ cut -f13 gene_association.paint_other.gaf | grep 9913 | sort | uniq -c
60878 taxon:9913

Assuming 9913 is the taxon ID for cow.

@mugitty
Contributor

mugitty commented May 30, 2019

@dustine32 9913 is what PANTHER uses for cow AKA Bos taurus (short letter code BOVIN)

@kltm
Member Author

kltm commented May 30, 2019

Okay, great--thank you for the input. It seems that things are working as intended:

  • All IBAs are filtered out of non-PAINT sources, as PAINT is the only upstream resource for these within the current GO pipeline
  • We get a hold of PAINT upstream; there is no separate PAINT file for goa_cow (no paint_goa_cow*) to be appended, so no IBAs are appended to goa_cow
  • There are IBAs for cow, counted as above (e.g. zcat paint_other.gaf.gz | cut -f13 | grep 9913 | sort | uniq -c => 60806 taxon:9913), but they are in http://current.geneontology.org/products/annotations/paint_other.gaf.gz
sjcarbon@moiraine:/tmp$:( zcat paint_other.gaf.gz | cut -f 7,13 | grep 'taxon\:9913' | sort | uniq -c
  60806 IBA	taxon:9913

So, in summary, this is all as it should be. In the case of cow, there are two files that one needs to look at. In the future, when we move from resource-centric to species-centric, this friction would disappear.
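
To get the full set of cow annotations today, the two files have to be combined by hand. A minimal sketch, assuming GAF 2.x rows with the taxon in column 13 (where the first entry of a "taxon:A|taxon:B" value is the annotated organism); `rows_for_taxon` and `cow_annotations` are hypothetical helpers, not pipeline code:

```python
from itertools import chain


def rows_for_taxon(lines, taxon_id):
    """Yield GAF rows (as column lists) annotated to the given taxon ID."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        # Column 13 may be "taxon:A|taxon:B"; match on the first entry only.
        if len(cols) >= 13 and cols[12].split("|")[0] == wanted:
            yield cols


def cow_annotations(goa_cow_lines, paint_other_lines):
    """Combine goa_cow rows with the cow (taxon 9913) rows of paint_other."""
    return list(chain(
        rows_for_taxon(goa_cow_lines, 9913),
        rows_for_taxon(paint_other_lines, 9913),
    ))
```

Both arguments can be any iterable of text lines, for example `gzip.open("goa_cow.gaf.gz", "rt")` and `gzip.open("paint_other.gaf.gz", "rt")`.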

@suzialeksander
Contributor

Great, thanks for looking into this.
