
Add the NEO into the main pipeline #35

Open
kltm opened this issue Apr 17, 2018 · 63 comments

Comments
@kltm
Member

kltm commented Apr 17, 2018

The general idea is to eliminate as much mechanism as possible around the deployment and maintenance of multiple pipelines and servers. To this end, I've proposed that NEO (the neo.owl owltools ontology load, sorry @cmungall) get folded into the main Solr load and index. This would simply be:

  • adding the metadata for a new document category (e.g. neontology_class)
  • updating the schema accordingly
  • adding an additional owltools call (one that does not add to the general docs); a rough sketch follows below
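For concreteness, a minimal sketch of what that extra load step might look like, patterned on how the existing GOlr ontology load is invoked; the memory value, Solr URL, and the hypothetical neo-config.yaml metadata file are assumptions, not the actual pipeline code:

```bash
# Hedged sketch only: load neo.owl into the existing GOlr index as its own
# document category, without also writing "general" documents.
# The config file name and Solr URL are placeholders.
export OWLTOOLS_MEMORY=32G
owltools http://purl.obolibrary.org/obo/go/noctua/neo.owl \
  --solr-url "http://localhost:8983/solr/golr" \
  --solr-config ./metadata/neo-config.yaml \
  --solr-log /tmp/golr-neo-load.log \
  --solr-load-ontology
```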

A separate issue, not dealt with here, is adding the creation of neo.owl itself to the pipeline. As we are just pulling it from a URL, that can be handled separately.

Another, weaker formulation would be to drop the NEO index separately, but within the new pipeline framework and runs.

@kltm kltm added this to the wishlist milestone Apr 17, 2018
@kltm
Member Author

kltm commented Jun 30, 2018

Well, having started to explore this a little, it will not pan out as a "merged" index--we would clobber the "general" docs, which are used (for example) by the ubernoodle for NEO.
Instead, at least for now, we'll look at making another index in the pipeline and switching over deployment like the other indices we have now.

@cmungall
Member

Switch to Solr 6 and use a separate core?

@kltm
Member Author

kltm commented Jul 2, 2018

The idea is to simplify our current setup, reducing the number of deployed servers and/or the number of distinct pipelines. As the move to Solr 6.x (higher now) is orthogonal, splitting this out separately would mean at least a temporary bump up in both.

@kltm
Member Author

kltm commented Jan 23, 2019

From an earlier experiment, the overlay is problematic. We'll work towards the weaker form to make progress on things like #73 and geneontology/neo#38 (comment)

@kltm kltm removed this from the wishlist milestone Jan 23, 2019
@kltm kltm changed the title from "Explore adding the Solr NEO load into the main pipeline" to "Add the NEO Solr load into the main pipeline" Jan 23, 2019
@kltm
Member Author

kltm commented Jan 23, 2019

Until we have a fix for the NEO job automation, it will be a manual step.

@kltm
Member Author

kltm commented Jan 23, 2019

From @hdrabkin:

I had created 6 new PRO ids and they became available in our MGI GO EI on Friday. That means they are in the mgi.gpi (I verified), which I expected would then make them available in Noctua today, but they are not there.
PR:000050039
PR:000050038
PR:000050037
PR:000050036
PR:000050035
PR:000050034

@kltm
Member Author

kltm commented Jan 23, 2019

Also see geneontology/neo#38 (comment)

@hdrabkin

So does this mean these ids will be available soon?

@kltm
Member Author

kltm commented Jan 24, 2019

A manual load is finishing now and a spot check seems positive -- try them now?

@cmungall I think there may be something up with owltools and the NEO load. It seems to slow down towards the end of the ontology document loading (not for the general docs), eventually giving out. I'll try to get a more nuanced view at some point, but it may be best to look at this as a use case for a new python loader, after the go-cams.

@kltm
Member Author

kltm commented Jan 24, 2019

Actually, I'm not sure we use anything but the "general" doc in the index...
That would greatly speed up and simplify things.

@hdrabkin

Hi @cmungall and @kltm
Just checked this morning and the PRO ids are all available now. Thanks.

@kltm
Member Author

kltm commented Jan 24, 2019

@cmungall We'll need to discuss 1) how we want to migrate the neo build to a new pipeline (whether main or not) and 2) what actual deployment looks like for the ontology

kltm added a commit that referenced this issue Jan 24, 2019
@kltm
Member Author

kltm commented Jan 25, 2019

This will need to be tested a bit more, but it looks like the additional resources and updates on our new pipeline can make short work of the NEO products build:
http://skyhook.berkeleybop.org/issue-35-neo-test/products/solr/
This can be used to juggle updates in and out more safely in the interim.

@kltm
Member Author

kltm commented Jan 25, 2019

From @cmungall: the PURLs are from the given S3 bucket, not Jenkins, so we can just clobber them.
He has also agreed with the plan of a second pipeline to support NEO as a separate product from the main pipeline, with the chance to revisit later.

@kltm
Member Author

kltm commented Jan 25, 2019

Need more mem for Java:

/obo/BFO_0000040> "BFO:0000040"^^xsd:string) AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/CHEBI_23367> "CHEBI:23367"^^xsd:string) }
18:02:38 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
18:02:38 	at com.carrotsearch.hppcrt.sets.ObjectHashSet$EntryIterator.<init>(ObjectHashSet.java:734)
18:02:38 	at com.carrotsearch.hppcrt.sets.ObjectHashSet$1.create(ObjectHashSet.java:784)
18:02:38 	at com.carrotsearch.hppcrt.sets.ObjectHashSet$1.create(ObjectHashSet.java:779)
18:02:38 	at com.carrotsearch.hppcrt.ObjectPool.<init>(ObjectPool.java:74)
18:02:38 	at com.carrotsearch.hppcrt.IteratorPool.<init>(IteratorPool.java:51)
18:02:38 	at com.carrotsearch.hppcrt.sets.ObjectHashSet.<init>(ObjectHashSet.java:778)
18:02:38 	at com.carrotsearch.hppcrt.sets.ObjectHashSet.<init>(ObjectHashSet.java:157)
18:02:38 	at uk.ac.manchester.cs.owl.owlapi.HPPCSet.<init>(MapPointer.java:444)
18:02:38 	at uk.ac.manchester.cs.owl.owlapi.MapPointer.putInternal(MapPointer.java:324)
18:02:38 	at uk.ac.manchester.cs.owl.owlapi.MapPointer.init(MapPointer.java:151)
18:02:38 	at uk.ac.manchester.cs.owl.owlapi.MapPointer.getValues(MapPointer.java:190)
18:02:38 	at uk.ac.manchester.cs.owl.owlapi.OWLImmutableOntologyImpl.getAxioms(OWLImmutableOntologyImpl.java:1325)
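As a hedged sketch of one way to give the load more heap (the 128G figure and the jar name are guesses, not measured or verified values):

```bash
# Illustrative only: raise the JVM heap for the NEO load. The owltools wrapper
# script honors OWLTOOLS_MEMORY; equivalently, -Xmx can be passed to java
# directly. The value here is a placeholder, not a measured requirement.
export OWLTOOLS_MEMORY=128G
# or, if calling the jar directly (jar path is an assumption):
# java -Xmx128G -jar owltools-runner-all.jar ...
```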

@cmungall
Member

cmungall commented Jan 25, 2019 via email

kltm added a commit that referenced this issue Jan 25, 2019
@hdrabkin

Hi Seth
We have another new ID in our GPI that needs to get into Noctua
PR:A0A1W6AWH1

@kltm
Member Author

kltm commented May 22, 2019

@hdrabkin I believe that this is a different issue. Yours should be cleared up on the completion of
https://github.com/geneontology/noctua/issues/612

@kltm
Member Author

kltm commented Oct 21, 2019

As previously discussed with @cmungall, we would spin this branch out into a new top-level pipeline. After starting work on that, I do not believe it's viable compared to formalizing it as a new branch in the current pipeline: it would either be a very fiddly piece of code that had to play carefully so as not to accidentally clobber skyhook locations, or it would require a small rewrite of how skyhook works. While neither of these is insurmountable, given the small and likely temporary nature of this pipeline, I think formalizing the current branch into something slightly more permanent is the fastest and safest way forward.

@kltm
Member Author

kltm commented Oct 22, 2019

Discussed with @goodb how to make this a workable transition:

  • start producing neo as a file in /ontology/extensions/
    likely just by running the same steps as now, but after the ontology build
  • aim the neo purl at the new file location (snapshot?)
  • turn off the old job
  • incorporate the neo build directly into the go-ontology main build, so that go-lego.owl and robot can use the current (as in during-the-run) version of neo rather than the day-old snapshot; I may need to tap @goodb or @balhoff for this

Once this is complete, we can either build the GOlr index for go-lego in the main pipeline or do it elsewhere. Deployment would still be once a week or so, so it may be fine to keep the degenerate pipeline-neo branch separate.

@kltm
Member Author

kltm commented Apr 17, 2020

From the call today, talking with @goodb and @balhoff, here are the next steps in the issue-35-neo-test branch:

For Noctua GOlr:

  • go-lego.owl and neo.owl

For Minerva:

  • go-lego-reacto.owl ("main" ontology)
    question to @goodb: we don't have anything like that in there at the moment; what is the best command to produce this, or should it just be added into go-ontology master?
  • blazegraph-go-lego-neo-reacto.owl (ontojournal)

Clarification:
We no longer need the journal product blazegraph-go-lego-reacto.owl

  • yes, we no longer need it
  • no, we still want that around

@goodb

goodb commented Apr 17, 2020

  • go-lego-reacto.owl ("main" ontology) question to @goodb: we don't have anything like that in there at the moment, what is the best command to produce this, or should this just be added into go-ontology master?

Probably the best thing is to add this to the ontology makefile by making a go-lego-reacto-edit.ofn file, adding reacto.owl as an import, and adding the target to the makefile just like the go-lego one. Note that the code to make reacto lives in a different location from the ontology makefile, so synchronization may be an issue.
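As a rough illustration of that suggestion (the file name and IRIs are assumptions based on this discussion, not the actual go-ontology files), the edit file could be little more than an importer ontology:

```bash
# Hypothetical sketch of a go-lego-reacto-edit.ofn: an otherwise empty ontology
# whose only job is to import go-lego and reacto, so that a makefile target
# "just like the go-lego one" can merge the import closure into one file.
cat > go-lego-reacto-edit.ofn <<'EOF'
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Ontology(<http://purl.obolibrary.org/obo/go/extensions/go-lego-reacto.owl>
Import(<http://purl.obolibrary.org/obo/go/extensions/go-lego.owl>)
Import(<http://purl.obolibrary.org/obo/go/extensions/reacto.owl>)
)
EOF
```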

@kltm
Member Author

kltm commented Apr 17, 2020

@goodb Hm. I don't think there is necessarily any issue with that, as reacto.owl is made during a normal pipeline run anyway and this experimental pipeline will eventually be folded into that. I suppose there is a bit of a trick with the references here, but hopefully that could be accomplished with a catalog or a materialized ontology. That said, it would actually be a convenience to have reacto.owl in the GO Makefile as well, would it not?

@goodb

goodb commented Apr 17, 2020

It would indeed be more convenient to have reacto built in the main makefile along with the others. We might actually promote that to a policy - that all ontology products are produced there. Downstream things like journals and indexes could happen elsewhere in the pipeline.

@kltm
Member Author

kltm commented Apr 17, 2020

The building of reacto.owl is just a few lines, and it feels like it would be an easy win that is easy to back out of if necessary. It seems to need a single remote file and a single binary available for the build, possibly supplied by optional environment variables. As that binary is a release, it might be nice just to add it as a lib in the go-ontology repo to cut the external dependency and make things a little more self-contained.
Since we need this (or a workalike) to continue making progress on getting NEO out and updating minerva, is there any reason not to just go ahead and do this? Otherwise, maybe I can just get the ontology merge command so I can make go-lego-reacto.owl at least on this branch and still get the updates out.

@goodb

goodb commented Apr 17, 2020

@kltm I don't see any reason not to go forward as you suggest. At some point I'd like to figure out why the source code build wasn't working in the pipeline environment and get it posted to maven. For now, I think the binary release approach we have now ought to work.

For merge if needed, pretty straightforward robot command. http://robot.obolibrary.org/merge
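Something along these lines, where the input and output paths are assumptions rather than the actual repo layout:

```bash
# Hedged example of the robot merge mentioned above: combine the full import
# closure of go-lego with reacto into a single go-lego-reacto.owl file.
robot merge \
  --input go-lego.owl \
  --input reacto.owl \
  --output go-lego-reacto.owl
```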

I can work on it this weekend if you want.

@kltm
Member Author

kltm commented Apr 17, 2020

@goodb Okay, it would be great if you could go ahead with this. If at all possible, please be mindful of the relative positions of the files in the directory hierarchy:

/ontology/extensions/reacto.owl
/ontology/neo.owl

In the meantime, let me know if you'd like me to do the relatively straightforward merge to produce go-lego-reacto.owl to unblock you on testing minerva and co.

goodb added a commit to geneontology/go-ontology that referenced this issue Apr 18, 2020
kltm added a commit that referenced this issue Apr 26, 2020
kltm added a commit that referenced this issue Apr 26, 2020
kltm added a commit that referenced this issue Apr 28, 2020
kltm added a commit that referenced this issue Apr 28, 2020
…docker fail) and split derivatives so we can restart faster; work on #35
kltm added a commit that referenced this issue Apr 29, 2020
kltm added a commit that referenced this issue Apr 29, 2020
kltm added a commit that referenced this issue Apr 29, 2020
@kltm
Member Author

kltm commented May 5, 2020

@goodb My current understanding is that, while this pipeline is still separate, it is now creating all of the products that we want.

Besides merging this back into the main pipeline (which may have to wait until we get some speed improvements), we still have some work on checking NEO here: https://github.com/geneontology/pipeline/blob/issue-35-neo-sanity-test/Jenkinsfile#L352. @dougli1sqrd @goodb Is this still in progress?

@dougli1sqrd
Contributor

Yes, I think that's still ongoing? I need to revisit to see the exact state; it's been a little while since I last looked at it.

@goodb

goodb commented May 5, 2020

@kltm my understanding matches yours with regard to pipeline build.

Regarding testing the products, I had written a couple of simple SPARQL queries that could be run on the generated, merged ontologies. I had handed this off to @dougli1sqrd to pipelinify. It looks like he is running something on the merged go-lego.owl file using robot. That ought to work, but if it's slow, the tests could be moved downstream to make use of the blazegraph journals that are now being generated. Test queries could be run with blazegraph-runner and ought to be fast.
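For illustration, a downstream check might look roughly like the following; the blazegraph-runner options are abbreviated from memory, and the journal name and query path are assumptions based on the products listed earlier, not the actual pipeline step:

```bash
# Hypothetical sketch: run a sanity-check SPARQL query against the generated
# journal (which already has go-lego, neo, and reacto merged) rather than
# loading the OWL into robot. File names here are guesses.
blazegraph-runner select \
  --journal=blazegraph-go-lego-neo-reacto.jnl \
  --outformat=tsv \
  sparql/neo/terms-exist.rq neo-check.tsv
```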

BTW @dougli1sqrd, running something with neo in its name ("sparql/neo/profile.txt") against go-lego.owl probably no longer makes sense. neo.owl is not currently included in go-lego.owl; it needs to be treated separately. Downstream there is a blazegraph journal that does merge them, if you need them together.

@kltm
Member Author

kltm commented Jan 27, 2022

From the software call today: we don't want to forget to make reacto creation and exposure better as part of this item.
