OGER hackathon #34

Aequivinius · 2018-04-13T14:30:54Z

Dear organisers

We're preparing to submit OGER, a dictionary-based entity recogniser, as a web service for OpenMinTeD. We're currently fixing a few remaining issues related to how we parse the XMI that we receive from OpenMinTeD. At present, the request payload appears to include some non-XML preface, which we have to cut off before we can parse the document to be annotated. Do you have a sample of how OMTD constructs the request payload?

As for the hackathon, would it be possible to find a time on Tuesday afternoon? Most people from our group can make it then. Apart from that, Thursday or Friday would suit us, too.

Thanks for your help & kind regards,

Nico

greenwoodma · 2018-04-13T14:39:26Z

Is the non-XML preface in the XMI file a Unicode BOM (Byte Order Mark)? In theory the files should be UTF-8, which I don't believe requires a BOM, but I know we've had a problem in GATE before (outside of OpenMinTeD) where XML files from odd sources had a BOM prefix.

If it helps, the code we use in GATE to ensure we always discard the BOM can be found at https://github.com/GateNLP/gate-core/blob/master/src/main/java/gate/util/BomStrippingInputStreamReader.java
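A Python sketch of the same idea, using only the standard library. This is a minimal illustration, not the GATE code linked above: `strip_xml_preface` and `parse_xmi` are hypothetical helper names, and the sketch assumes the payload is UTF-8 encoded XMI. Cutting everything before the first `<` is a blunt heuristic that only works if the preface itself never contains `<`.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_xml_preface(payload: bytes) -> bytes:
    """Remove a leading UTF-8 BOM and any non-XML bytes before the markup.

    Hypothetical helper; assumes the payload is UTF-8 encoded XMI.
    """
    # The UTF-8 BOM is the byte sequence EF BB BF.
    if payload.startswith(codecs.BOM_UTF8):
        payload = payload[len(codecs.BOM_UTF8):]
    # Heuristic: treat everything before the first '<' as preface.
    start = payload.find(b"<")
    if start == -1:
        raise ValueError("payload contains no XML markup")
    return payload[start:]


def parse_xmi(payload: bytes) -> ET.Element:
    """Parse the cleaned payload into an ElementTree element."""
    return ET.fromstring(strip_xml_preface(payload))
```

If the preface turns out to be something structured (for example an envelope added by the platform), it would be safer to parse that envelope properly than to rely on this heuristic.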

greenwoodma · 2018-04-16T12:23:16Z

To make sure we are as well prepared as possible to help during the hackathon sessions, could you please add/attach the following to this issue:

The landing page URL of any component/workflow you have registered
The OMTD-SHARE XML file for each component/workflow
One or two sample documents that you expect to produce sensible output for your component/workflow

Aequivinius · 2018-04-16T15:51:43Z

Dear @greenwoodma:

This is the URL of OGER on OMTD: https://test.openminted.eu/landingPage/application/d71caa63-9444-4bee-8161-52e0462c7eb0
Attached is the OMTD-SHARE XML:
oger.xml.zip
We've tested the service using the OpenMinTeD subset of OpenAIRE publications on term "Thalamus" (https://test.openminted.eu/landingPage/corpus/ac016a8f-ebb8-4b92-b808-3b11491c4199)

galanisd · 2018-04-16T17:00:09Z

For some reason, the code that generates the Galaxy XML wrappers didn't work as expected: the typesystem you provided was not copied. I do not know why... @nguyennth and I have registered Manchester's web service many times without problems.

So I deleted your record and re-registered it.
Here is the new landing page: https://test.openminted.eu/landingPage/application/OGERWS
The wrapper was generated correctly.

I then used the registered app to process the Thalamus corpus.

Finished .... :-) :-) :-)

Output is here
https://test.openminted.eu/landingPage/corpus/7691bf1a-283d-43bc-9653-26f482476264
and here
6ef31b96-675d-4078-88fa-ddecd7ad1a77.zip

Please check it. I do not see any NER annotations.
What should we expect?
It probably has to do with the typesystem you provided:
mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0

Maybe we need some help from the University of Manchester, which developed the web service spec for OMTD (@nguyennth), or from @reckart, who knows everything about DKPro.

The web service client needs the typesystem to serialize the results. If it is missing, the corresponding annotations will not be included in the output.
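One quick way to check whether an output document actually contains the expected annotations is to list the `NamedEntity` elements in the XMI. A minimal sketch, assuming the element and attribute names that appear later in this thread (`NamedEntity`, `begin`, `end`, `identifier`); it matches on the local element name, so it works whatever namespace prefix (`neType`, `type2`, ...) the serializer chose. `named_entities` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """ElementTree encodes namespaced tags as '{uri}local'; keep only 'local'."""
    return tag.rsplit("}", 1)[-1]


def named_entities(xmi: bytes) -> list:
    """Return sorted (begin, end, identifier) tuples for every NamedEntity element."""
    root = ET.fromstring(xmi)
    spans = []
    for elem in root.iter():
        if local_name(elem.tag) == "NamedEntity":
            spans.append(
                (int(elem.get("begin")), int(elem.get("end")), elem.get("identifier", ""))
            )
    return sorted(spans)
```

An empty list for a document that the service did annotate would be consistent with the annotations being dropped during serialization, as described above.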

Aequivinius · 2018-04-16T17:23:20Z

Yeah, this is the issue we're currently investigating, and which we were hoping to discuss during the Hackathon.

OGER sends NER annotations, but OMTD seems to ignore them when it re-parses our results. I'm actually a bit at a loss as to what sort of typesystem we should provide, and how. We have this file ready on our server (typesystem.xml.zip), which I would have expected to provide the necessary information. However, OMTD never sends a request for this file.

If you have any more information on precisely what sort of typesystem file we need to add, and where, that would be greatly appreciated.

galanisd · 2018-04-16T17:31:20Z

Please see this one as an example.
https://mvnrepository.com/artifact/uk.ac.nactem.uima/NeuroscienceTypeSystem/0.2
You can download the jar to see its structure and contents.
@nguyennth can provide some more info I think.
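Inspecting such a jar can be scripted, since a jar is a zip archive. A minimal sketch, assuming (as is common for UIMA type-system artifacts, though not confirmed for this particular one) that the descriptors ship as `.xml` files outside `META-INF/`. uimaFIT-based artifacts such as DKPro Core also list descriptor locations in `META-INF/org.apache.uima.fit/types.txt`; the helper names here are hypothetical.

```python
import io
import zipfile

# uimaFIT-based artifacts list their type-system descriptor locations here.
UIMAFIT_TYPES_MANIFEST = "META-INF/org.apache.uima.fit/types.txt"


def typesystem_descriptors(jar_bytes: bytes) -> list:
    """Return paths of candidate XML descriptors in a jar, skipping META-INF (pom.xml etc.)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.endswith(".xml") and not name.startswith("META-INF/")
        )


def uimafit_manifest(jar_bytes: bytes) -> list:
    """Return the non-empty lines of the uimaFIT types manifest, or [] if it is absent."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        if UIMAFIT_TYPES_MANIFEST not in jar.namelist():
            return []
        text = jar.read(UIMAFIT_TYPES_MANIFEST).decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```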

gkirtzou · 2018-04-17T09:23:19Z

@Aequivinius There is a minor semantic error in your metadata. Your component takes a whole corpus of documents as input, not a single document, and generates annotations for the corpus, i.e. an annotated corpus. Correct? If that's the case, please change the processingResourceType from document to corpus in both inputContentResourceInfo and outputResourceInfo in the final version of your metadata.
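If the descriptor is edited as XML rather than through the registration form, this change can be scripted. A minimal sketch with Python's ElementTree: it matches elements by local name so it does not depend on the OMTD-SHARE namespace URI, and `set_processing_resource_type` is a hypothetical helper. Note that ElementTree picks its own namespace prefixes (e.g. `ns0:`) when serializing the tree back.

```python
import xml.etree.ElementTree as ET

SECTIONS = {"inputContentResourceInfo", "outputResourceInfo"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def set_processing_resource_type(root: ET.Element, value: str = "corpus") -> int:
    """Set every processingResourceType inside the input/output sections; return the count."""
    changed = 0
    for section in root.iter():
        if local_name(section.tag) not in SECTIONS:
            continue
        for elem in section.iter():
            if local_name(elem.tag) == "processingResourceType":
                elem.text = value
                changed += 1
    return changed
```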

Aequivinius · 2018-04-17T13:22:25Z

@gkirtzou Done

@galanisd | @nguyennth I have a few questions:

What is the proper way to register the typesystem in the share-omtd.xml? Currently, we're doing this:

<ns0:typesystem>
  <ns0:resourceNames>
    <ns0:resourceName lang="en">DKPro Core</ns0:resourceName>
  </ns0:resourceNames>
  <ns0:resourceIdentifiers>
    <ns0:resourceIdentifier resourceIdentifierSchemeName="maven">mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0</ns0:resourceIdentifier>
  </ns0:resourceIdentifiers>
</ns0:typesystem>

Is there good written documentation of the format of these typesystem files? If not, what specifically do we have to add so that tags such as the following are included in our annotations:

<neType:NamedEntity begin="262056" xmi:id="9759" end="262078" sofa="2" identifier="D000070642"/>

When does OMTD send requests for the typesystem file that we have on our server?

galanisd · 2018-04-17T14:45:10Z

The NeuroScience maven artifact was registered as follows:
<ns0:resourceIdentifiers>
  <ns0:resourceIdentifier resourceIdentifierSchemeName="maven">mvn:uk.ac.nactem.uima:NeuroscienceTypeSystem:0.2</ns0:resourceIdentifier>
</ns0:resourceIdentifiers>

It seems identical to yours. The web service executor that I created downloads this artifact and adds it to its classpath. For its contents and structure, you should ask @nguyennth.

galanisd · 2018-04-17T15:07:52Z

Does anyone know why this
https://test.openminted.eu/landingPage/application/OGERWS
has disappeared?

Was it deleted by someone?
Is there a new landing page?

Aequivinius · 2018-04-17T15:10:07Z

I noticed it, too. I'm currently using this one (https://test.openminted.eu/landingPage/application/b8fb9bbd-603c-4b53-b86d-15c6c753302d). It is set to private so that I can easily experiment with different typesystems, but I can make it public if you need me to.


galanisd · 2018-04-17T15:11:16Z

I am sure that I didn't delete it.
@antleb, any ideas?

Aequivinius · 2018-04-17T15:35:54Z

I've now tried these Maven coordinates in the omtd-share.xml, which seem correct:

mvn:de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.1

If I'm not mistaken, this should point to this repository, which includes the necessary info.

However, our namedEntity annotations are still missing from OMTD.
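The coordinate first tried above (`mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0`) has only two fields after `mvn:`, whereas the Neuroscience coordinate and the corrected DKPro one have three (`groupId:artifactId:version`). A minimal sketch that validates this form and builds the jar URL using the standard Maven repository layout; it assumes OMTD expects exactly the three-field form used in this thread, and whether the platform accepts other variants (packaging, classifier) is not covered. `parse_mvn` and `jar_url` are hypothetical helpers.

```python
from typing import NamedTuple

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_mvn(coordinate: str) -> MavenCoordinate:
    """Parse 'mvn:groupId:artifactId:version' and reject anything else."""
    prefix = "mvn:"
    if not coordinate.startswith(prefix):
        raise ValueError(f"missing {prefix!r} prefix: {coordinate!r}")
    parts = coordinate[len(prefix):].split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected mvn:groupId:artifactId:version, got {coordinate!r}")
    return MavenCoordinate(*parts)


def jar_url(coord: MavenCoordinate, repository: str = MAVEN_CENTRAL) -> str:
    """Build the jar URL: groupId dots become path separators, artifactId is kept as-is."""
    group_path = coord.group_id.replace(".", "/")
    return (
        f"{repository}/{group_path}/{coord.artifact_id}/{coord.version}/"
        f"{coord.artifact_id}-{coord.version}.jar"
    )
```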

galanisd · 2018-04-17T15:58:55Z

Are you expecting things like this?
<type2:NamedEntity xmi:id="22347" sofa="56913" begin="219" end="225" identifier="A4FV52"/>
<type2:NamedEntity xmi:id="22353" sofa="56913" begin="230" end="236" identifier="A6QLI1"/>
<type2:NamedEntity xmi:id="22359" sofa="56913" begin="263" end="272" identifier="CHEBI:14321"/>
<type2:NamedEntity xmi:id="22365" sofa="56913" begin="273" end="279" identifier="GO:0098657"/>
<type2:NamedEntity xmi:id="22371" sofa="56913" begin="488" end="497" identifier="CHEBI:14321"/>

galanisd · 2018-04-17T16:11:22Z

For some reason your app was not registered correctly; i.e. the wrapper for Galaxy was never saved.
I do not know why.

@courado @antleb ?

I re-registered your app.
https://test.openminted.eu/landingPage/application/b8fb9bbd-603c-4b53-b86d-15c6c753302d

and processed the thalamus corpus.

Output here:
https://test.openminted.eu/landingPage/corpus/ba172d04-96dc-4007-b9ae-020460691e19
and here:
12d3dce1-996b-4c2a-8324-74a951f2f7c4.zip

I hope that is not an illusion...

galanisd · 2018-04-17T16:11:59Z

@Aequivinius welcome to OpenMinTeD.

nguyennth · 2018-04-17T16:12:01Z

Hi,

Sorry for my late reply. As far as I understand, you're using an existing type system that has already been uploaded to Maven Central, i.e., the DKPro NER type system. This means that you don't need to create a new type system; you only need to include the type system as a dependency in the pom of the web service project. As @galanisd showed above, I believe it works now.

In the case that you need to create a new type system, please let me know, we can discuss details later.

Aequivinius · 2018-04-17T21:11:25Z

@galanisd Fascinating, this is precisely what we were after. I wonder if re-registering did the trick. Anyway, this is what we wanted, so it seems all is well! Thanks for your help!

Should we now proceed to register the service on services.openminted.eu?

galanisd · 2018-04-17T21:17:34Z

> Should we now proceed to register the service on services.openminted.eu?

Not yet. services.openminted.eu has not been updated for quite some time.
You will be notified.

Thanks!

Dimitris

gkirtzou · 2018-04-18T07:45:26Z

@Aequivinius I was taking a final look at your metadata (as registered here) and noticed that you had declared Named Entity (i.e. http://w3id.org/meta-share/omtd-share/NamedEntity) as the annotation type of your input. Semantically, that means your input needs to be annotated at that level before your application can be used. Is that the case? If not, and your input is just a raw corpus, I would suggest removing the annotation type from the inputContentResourceInfo section.

I would also like to ask, for statistical reasons: did you perform the registration via the registration form or via XML?
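The input annotation-type check described in this comment can be automated as a small lint on the descriptor. A minimal sketch, assuming the annotation type is stored as the text of an `annotationType` element somewhere under `inputContentResourceInfo` (the exact OMTD-SHARE nesting is not shown in this thread); `input_annotation_types` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def input_annotation_types(root: ET.Element) -> list:
    """Collect annotationType values declared under inputContentResourceInfo."""
    found = []
    for section in root.iter():
        if local_name(section.tag) != "inputContentResourceInfo":
            continue
        for elem in section.iter():
            if local_name(elem.tag) == "annotationType" and elem.text and elem.text.strip():
                found.append(elem.text.strip())
    return found
```

For a component meant to take a raw corpus, a non-empty result signals the mismatch described above; annotation types declared under `outputResourceInfo` are deliberately ignored.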

Aequivinius · 2018-04-18T08:11:35Z

This is a mistake, I'll remove it from the XML and upload it correctly next time (the registration form doesn't let me delete the value for this specific field once set). I mostly used the web registration form, only occasionally tinkering with the XML.


gkirtzou · 2018-04-18T09:20:45Z

@Aequivinius I didn't know that the registration form didn't allow you to delete specific fields once set. I will report this bug to the responsible technical person. Thanks for sharing!

gkirtzou · 2018-04-18T09:35:57Z

Also, once you have made the last changes to the OMTD-SHARE descriptor, could you please upload it here as well so I can do a final check, in case I missed anything? :)

Aequivinius · 2018-04-18T15:50:24Z

@gkirtzou Here you go! 18-4-removed_input.xml.zip

gkirtzou · 2018-04-19T10:06:58Z

The metadata seem fine. I would suggest only two things:

If you want to register your application using the XML registration form, please remove the metadataHeaderInfo section, as the platform fills it in automatically.
As your application performs OntoGene entity recognition, you could also add the value
http://w3id.org/meta-share/omtd-share/BiologicalEnity to the annotationType field in the outputResourceInfo section. But this is minor and only a recommendation.

Otherwise, the metadata are correct and your application has been tested. All that remains is the final registration on the platform, once @greenwoodma informs you.
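The first suggestion above (dropping metadataHeaderInfo before an XML registration) can be done with ElementTree as well. A minimal sketch, matching by local name so the OMTD-SHARE namespace URI does not matter; `drop_metadata_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def drop_metadata_header(root: ET.Element) -> int:
    """Remove every metadataHeaderInfo element; return how many were removed."""
    removed = 0
    # Snapshot the tree first: removing children while iterating over them is unsafe.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "metadataHeaderInfo":
                parent.remove(child)
                removed += 1
    return removed
```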

Aequivinius · 2018-04-20T10:33:29Z

@gkirtzou Thank you for your help! Find attached the most recent version of our share descriptor.
20-4.xml.zip

gkirtzou · 2018-04-23T07:59:19Z

@Aequivinius Perfect! I have no further comments/recommendations.

pennyl67 · 2018-04-26T18:37:55Z

@Aequivinius You can now proceed to the final uploading of your application at services.openminted.eu. If you encounter any problems, please let us know.
Thanks!

pennyl67 · 2018-04-26T19:59:16Z

@Aequivinius My mistake, please refrain from uploading at services.openminted.eu until further notice.

pennyl67 · 2018-05-09T17:42:04Z

@Aequivinius I have taken the liberty of uploading your application to services.openminted.eu and tested it. It seems to work OK. The application is available at: https://services.openminted.eu/landingPage/application/71345d18-297f-4ac5-b4de-38ef3cacbe75 You can also test it yourself.
If everything is OK, let me know so that we can close the issue.

Aequivinius · 2018-05-15T10:53:14Z

Perfect, thanks!

pennyl67 · 2018-05-15T15:06:08Z

@Aequivinius I have a question: in your proposal and in the description of the application, you mention the Bio Term Hub (BTH), and I'm trying to understand the relation between the two. When you say that OGER is built on top of BTH, do you mean that you use the terminologies from the reference databases? Is this aggregation of terminologies already in the Docker image you have provided? Or should we expect another component/application?

Aequivinius · 2018-05-15T15:39:31Z

@pennyl67 No, there will be no further components or applications. BTH is an aggregator of terminologies: it produces a unified terminology, which can then be used by OGER. However, the two components can also be used independently. The term list provided by BTH could be used for other purposes, and OGER can be provided with a term list obtained from other sources.

We submitted OGER as a web service application to OMTD. This web service uses BTH in the background to obtain up-to-date terminologies.

Furthermore, we also wanted to make BTH available to the public, so we created a Docker image that allows researchers to run it locally. Alternatively, they can use our own web service at https://pub.cl.uzh.ch/projects/ontogene/biotermhub/. However, BTH uses a web interface in which the desired resources are selected manually. Because of that, it was not suited to being integrated into the OMTD platform, which is why we provide a separate link where the research community can download a Dockerized version of BTH (https://github.com/OntoGene/BioTermHub_dockerized).

Kind regards,
- Nico Colic
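The division of labour described here (BTH produces a term list; a dictionary-based recogniser matches it against text) can be illustrated with a toy matcher. This is a minimal sketch, not OGER's actual implementation: the `(term, identifier)` pair format, the `\w+` tokenisation and the greedy longest-match strategy are all assumptions chosen for illustration, and the helper names are hypothetical. Its output mirrors the begin/end/identifier triples seen in the NamedEntity annotations earlier in the thread.

```python
import re

TOKEN = re.compile(r"\w+")


def normalise(term: str) -> str:
    """Lower-case a term and collapse it to space-separated word tokens."""
    return " ".join(TOKEN.findall(term.lower()))


def build_index(term_list) -> dict:
    """Map normalised term -> identifier from (term, identifier) pairs."""
    return {normalise(term): ident for term, ident in term_list}


def annotate(text: str, index: dict, max_tokens: int = 5) -> list:
    """Greedy longest-match dictionary lookup; returns (begin, end, identifier) spans."""
    tokens = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest window first so multi-word terms win over their parts.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            key = " ".join(text[s:e].lower() for s, e in window)
            if key in index:
                spans.append((window[0][0], window[-1][1], index[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return spans
```

Swapping the term list for one obtained from another source requires no change to `annotate`, which reflects the independence of the two components described above.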


pennyl67 · 2018-05-15T15:53:57Z

Thanks for the explanations. It's clear now!

Given that your application is already uploaded and public in the platform, if you agree, I will close this issue.

galanisd assigned galanisd and unassigned galanisd Apr 15, 2018

gkirtzou self-assigned this Apr 17, 2018

galanisd closed this as completed Apr 17, 2018

greenwoodma reopened this Apr 17, 2018

greenwoodma added the "Component" (Participant is providing component(s)) label Apr 18, 2018

greenwoodma added the "Web Service" (Component/Application provided as web service) and "Application" (Participant is providing application(s)) labels and removed the "Component" (Participant is providing component(s)) label Apr 18, 2018

pennyl67 closed this as completed May 16, 2018
