OGER hackathon #34

Closed
Aequivinius opened this issue Apr 13, 2018 · 34 comments
Labels: Application (Participant is providing application(s)), Web Service (Component/Application provided as web service)


@Aequivinius

Dear organisers

We're preparing our submission of OGER, a dictionary-based entity recogniser, as a web service for OpenMinTeD. We're currently fixing a few remaining issues related to how we parse the XMI that we receive from OpenMinTeD. As it stands, the request payload seems to include some non-XML preface, which we need to cut in order to parse the document to be annotated. Would you have a sample of how OMTD constructs the request payload?

As for the hackathon, would it be possible to find a time on Tuesday afternoon? Most people from our group can make it then. Apart from that, Thursday or Friday would suit us, too.

Thanks for your help & kind regards,

Nico
@greenwoodma
Member

Is the non-XML preface in the XMI file a Unicode BOM (Byte Order Mark)? In theory the files should be UTF-8, which I don't believe requires a BOM, but I know we've had a problem in GATE before (outside of OpenMinTeD) where XML files from odd sources had a BOM prefix.

If it helps, the code we use in GATE to ensure we always discard the BOM can be found at https://github.com/GateNLP/gate-core/blob/master/src/main/java/gate/util/BomStrippingInputStreamReader.java
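The GATE class linked above is Java. For a Python service such as OGER, a minimal sketch of the same idea needs only the standard library, assuming the stray preface really is a UTF-8 BOM (the thread never confirms this). `strip_bom` and `decode_xmi` are hypothetical helper names.

```python
import codecs


def strip_bom(payload: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (bytes EF BB BF), if present."""
    if payload.startswith(codecs.BOM_UTF8):
        return payload[len(codecs.BOM_UTF8):]
    return payload


def decode_xmi(payload: bytes) -> str:
    """Decode a request body to text; the "utf-8-sig" codec skips a leading BOM."""
    return payload.decode("utf-8-sig")


# Hypothetical request body with a BOM in front of the XMI root element
body = b"\xef\xbb\xbf<xmi:XMI/>"
print(strip_bom(body))   # b'<xmi:XMI/>'
print(decode_xmi(body))  # <xmi:XMI/>
```

If the preface turns out to be something other than a BOM, neither helper will remove it, so it is worth dumping the first few bytes of a real request before relying on this.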

@galanisd galanisd assigned galanisd and unassigned galanisd Apr 15, 2018
@greenwoodma
Member

To make sure we are as well prepared as possible to help during the hackathon sessions, could you please add/attach to this issue:

  1. The landing page URL of any component/workflow you have registered
  2. The OMTD-SHARE XML file for each component/workflow
  3. One or two sample documents that you expect to produce sensible output for your component/workflow

@Aequivinius
Author

Dear @greenwoodma:

  1. This is the URL of OGER on OMTD: https://test.openminted.eu/landingPage/application/d71caa63-9444-4bee-8161-52e0462c7eb0
  2. Attached is the OMTD-SHARE XML:
    oger.xml.zip
  3. We've tested the service using the OpenMinTeD subset of OpenAIRE publications on the term "Thalamus" (https://test.openminted.eu/landingPage/corpus/ac016a8f-ebb8-4b92-b808-3b11491c4199)

@galanisd
Member

For some reason the code that generates the Galaxy XML wrappers didn't work as expected. The typesystem you provided was not copied, and I do not know why. @nguyennth and I have registered Manchester's web service many times without problems.

So, I deleted your record and re-registered it.
Here is the new landing page: https://test.openminted.eu/landingPage/application/OGERWS
The wrapper was generated correctly.

I then used the registered app to process the Thalamus corpus.

Finished .... :-) :-) :-)

[Screenshot from 2018-04-16 19:41:55]

Output is here
https://test.openminted.eu/landingPage/corpus/7691bf1a-283d-43bc-9653-26f482476264
and here
6ef31b96-675d-4078-88fa-ddecd7ad1a77.zip

Please check it. I do not see any NER annotations.
What should we expect?
It probably has to do with the typesystem you provided:
mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0

Maybe we need some help from @nguyennth at the University of Manchester, which developed the web service spec for OMTD, or from @reckart, who knows everything about DKPro.

The web service client requires the typesystem to serialize the results. If it is not there, the respective annotations will not be included in the output.
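To make that consequence concrete, here is a toy model in Python; it is not UIMA's actual deserialiser, only an illustration of the behaviour described above: annotations survive only if their XML namespace belongs to a type system the client has loaded. The function name and the `urn:` namespaces are made up for illustration.

```python
import xml.etree.ElementTree as ET


def surviving_annotations(xmi: str, known_namespaces: set[str]) -> list[str]:
    """Toy model: keep only top-level elements whose namespace URI is 'known'."""
    root = ET.fromstring(xmi)
    kept = []
    for elem in root:  # in XMI, feature structures are direct children of <xmi:XMI>
        # ElementTree stores namespaced tags as "{uri}localname"
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
            if uri in known_namespaces:
                kept.append(local)
    return kept


doc = ('<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="urn:cas" '
       'xmlns:ner="urn:ner"><cas:Sofa/><ner:NamedEntity begin="0" end="8"/></xmi:XMI>')
print(surviving_annotations(doc, {"urn:cas"}))             # ['Sofa']
print(surviving_annotations(doc, {"urn:cas", "urn:ner"}))  # ['Sofa', 'NamedEntity']
```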

@Aequivinius
Author

Yeah, this is the issue we're currently investigating and were hoping to discuss during the hackathon.

OGER sends NER annotations, but OMTD doesn't seem to pick them up when it re-parses our results. I'm actually a bit at a loss as to what sort of typesystem we should provide, and how. We have this file ready on our server (typesystem.xml.zip), which I would have expected to provide the necessary information. However, OMTD never sends a request for this file.

If you have any more information on precisely what sort of typesystem file we need to add, and where, that would be greatly appreciated.

@galanisd
Member

Please see this one as an example:
https://mvnrepository.com/artifact/uk.ac.nactem.uima/NeuroscienceTypeSystem/0.2
You can download the jar to see its structure and contents.
@nguyennth can provide some more info, I think.
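A jar is an ordinary zip archive, so its structure can also be inspected from Python once downloaded. A minimal sketch: list the XML descriptors and, if present, uimaFIT's type-system index file `META-INF/org.apache.uima.fit/types.txt`, which uimaFIT scans for type auto-detection (whether this particular jar ships one is an assumption to check). The function name is hypothetical.

```python
import zipfile

# Index file that uimaFIT scans to auto-detect type systems on the classpath
UIMAFIT_TYPES = "META-INF/org.apache.uima.fit/types.txt"


def inspect_typesystem_jar(jar) -> dict:
    """Summarise a type-system jar given as a path or a binary file object."""
    with zipfile.ZipFile(jar) as archive:
        names = archive.namelist()
        index = (archive.read(UIMAFIT_TYPES).decode("utf-8").split()
                 if UIMAFIT_TYPES in names else [])
    return {
        "xml_descriptors": sorted(n for n in names if n.endswith(".xml")),
        "uimafit_index": index,
    }


# Usage with a locally downloaded jar (file name is illustrative):
# print(inspect_typesystem_jar("NeuroscienceTypeSystem-0.2.jar"))
```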

@gkirtzou

@Aequivinius There is a minor semantic error in your metadata. Your component takes as input a whole corpus of documents, not a single document, and generates annotations for the corpus, i.e. an annotated corpus. Correct? If so, please change the processingResourceType from document to corpus in both inputContentResourceInfo and outputResourceInfo in the final version of your metadata.
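If the descriptor is edited by script rather than through the registration form, a sketch along these lines could apply that change. It assumes only what this comment names (the `inputContentResourceInfo` and `outputResourceInfo` sections containing a `processingResourceType` element); the exact OMTD-SHARE nesting and namespace URI are not assumed, which is why it matches on local names. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SECTIONS = {"inputContentResourceInfo", "outputResourceInfo"}


def local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name"
    return tag.rsplit("}", 1)[-1]


def set_processing_resource_type(root: ET.Element, value: str = "corpus") -> int:
    """Set processingResourceType inside the input/output sections; return count."""
    changed = 0
    for section in root.iter():
        if local_name(section.tag) not in SECTIONS:
            continue
        for elem in section.iter():
            if local_name(elem.tag) == "processingResourceType":
                elem.text = value
                changed += 1
    return changed


# Usage (file name hypothetical). ElementTree renames namespace prefixes
# (ns0:, ns1:, ...) on output unless they are registered first.
# tree = ET.parse("omtd-share.xml")
# set_processing_resource_type(tree.getroot())
# tree.write("omtd-share.xml", encoding="utf-8", xml_declaration=True)
```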

@Aequivinius
Author

Aequivinius commented Apr 17, 2018

@gkirtzou Done

@galanisd | @nguyennth I have a few questions:

  • What is the proper way to register the typesystem in the share-omtd.xml? Currently, we're doing this:

<ns0:typesystem>
  <ns0:resourceNames>
    <ns0:resourceName lang="en">DKPro Core</ns0:resourceName>
  </ns0:resourceNames>
  <ns0:resourceIdentifiers>
    <ns0:resourceIdentifier resourceIdentifierSchemeName="maven">mvn:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.0</ns0:resourceIdentifier>
  </ns0:resourceIdentifiers>
</ns0:typesystem>

  • Is there good written documentation of the format of those typesystem files? If not, what specifically do we have to add so that tags such as the following are included in our annotations:

<neType:NamedEntity begin="262056" xmi:id="9759" end="262078" sofa="2" identifier="D000070642"/>

  • When does OMTD send requests for the typesystem file that we have on our server?

@gkirtzou gkirtzou self-assigned this Apr 17, 2018
@galanisd
Member

galanisd commented Apr 17, 2018

The NeuroScience maven artifact was registered as follows:
<ns0:resourceIdentifiers>
  <ns0:resourceIdentifier resourceIdentifierSchemeName="maven">mvn:uk.ac.nactem.uima:NeuroscienceTypeSystem:0.2</ns0:resourceIdentifier>
</ns0:resourceIdentifiers>

It seems identical to yours. The web service executor that I created downloads this artifact and adds it to its classpath. For its contents and structure you should ask @nguyennth.

@galanisd
Member

Does anyone know why this
https://test.openminted.eu/landingPage/application/OGERWS
has disappeared?

Was it deleted by someone?
Is there a new landing page?

@Aequivinius
Author

Aequivinius commented Apr 17, 2018 via email

@galanisd
Member

I am sure that I didn't delete it.
@antleb Any ideas?

@Aequivinius
Author

I've now tried these Maven coordinates in the omtd-share.xml, which seem correct:

mvn:de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.api.ner-asl:1.9.1

If I'm not mistaken, this should point to this repository, which includes the necessary information.

However, our namedEntity annotations are still missing from OMTD.
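The two attempts differ in the shape of the identifier: the first has only two colon-separated fields after `mvn:`, while the NeuroScience identifier and the corrected DKPro one have three (`groupId:artifactId:version`). The thread does not say whether the OMTD executor rejects the two-field form. The sketch below simply checks the shape and, assuming the standard Maven repository layout on Maven Central, builds the URL of the jar an executor would fetch. Both function names are hypothetical.

```python
def parse_mvn(identifier: str) -> tuple[str, str, str]:
    """Split 'mvn:groupId:artifactId:version' into its three fields."""
    prefix = "mvn:"
    if not identifier.startswith(prefix):
        raise ValueError(f"not a mvn: identifier: {identifier!r}")
    fields = identifier[len(prefix):].split(":")
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected groupId:artifactId:version, got {fields}")
    group_id, artifact_id, version = fields
    return group_id, artifact_id, version


def central_jar_url(identifier: str) -> str:
    """Maven Central URL of the jar, using the standard repository layout."""
    group_id, artifact_id, version = parse_mvn(identifier)
    return ("https://repo1.maven.org/maven2/"
            f"{group_id.replace('.', '/')}/{artifact_id}/{version}/"
            f"{artifact_id}-{version}.jar")


print(central_jar_url("mvn:uk.ac.nactem.uima:NeuroscienceTypeSystem:0.2"))
# https://repo1.maven.org/maven2/uk/ac/nactem/uima/NeuroscienceTypeSystem/0.2/NeuroscienceTypeSystem-0.2.jar
```

With the first identifier from this thread, `parse_mvn` raises `ValueError`, because `de.tudarmstadt.ukp.dkpro.core.api.ner-asl` is taken as the only field before the version.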

@galanisd
Member

Are you expecting things like this?
<type2:NamedEntity xmi:id="22347" sofa="56913" begin="219" end="225" identifier="A4FV52"/>
<type2:NamedEntity xmi:id="22353" sofa="56913" begin="230" end="236" identifier="A6QLI1"/>
<type2:NamedEntity xmi:id="22359" sofa="56913" begin="263" end="272" identifier="CHEBI:14321"/>
<type2:NamedEntity xmi:id="22365" sofa="56913" begin="273" end="279" identifier="GO:0098657"/>
<type2:NamedEntity xmi:id="22371" sofa="56913" begin="488" end="497" identifier="CHEBI:14321"/>
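One way to check an output corpus for these annotations without a UIMA runtime is to scan the XMI with Python's standard library. This sketch matches on the local name `NamedEntity` only, because the namespace prefix is arbitrary (this comment shows `type2:`, the earlier example shows `neType:`), and it assumes every such element carries `begin` and `end`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def named_entities(xmi: str) -> list[dict]:
    """Collect begin/end/identifier of every NamedEntity element in an XMI string."""
    found = []
    for elem in ET.fromstring(xmi).iter():
        # Namespaced tags look like "{uri}NamedEntity"; compare the local part only
        if elem.tag.rsplit("}", 1)[-1] == "NamedEntity":
            found.append({
                "begin": int(elem.get("begin")),
                "end": int(elem.get("end")),
                "identifier": elem.get("identifier"),
            })
    return found


sample = ('<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:type2="urn:example-ner">'
          '<type2:NamedEntity xmi:id="22359" sofa="56913" begin="263" end="272" '
          'identifier="CHEBI:14321"/></xmi:XMI>')
print(named_entities(sample))
# [{'begin': 263, 'end': 272, 'identifier': 'CHEBI:14321'}]
```

An empty list for every document in the output corpus would reproduce the "no NER annotations" symptom reported earlier in this thread.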

@galanisd
Member

  • For some reason your app was not registered correctly, i.e. the Galaxy wrapper was never saved.
    I do not know why.

@courado @antleb ?

I re-registered your app.
https://test.openminted.eu/landingPage/application/b8fb9bbd-603c-4b53-b86d-15c6c753302d

and processed the thalamus corpus.
[Screenshot from 2018-04-17 19:05:31]

Output here:
https://test.openminted.eu/landingPage/corpus/ba172d04-96dc-4007-b9ae-020460691e19
and here:
12d3dce1-996b-4c2a-8324-74a951f2f7c4.zip

I hope that is not an illusion...
[Screenshot from 2018-04-17 19:08:36]

@galanisd
Member

@Aequivinius welcome to OpenMinTeD.

@nguyennth
Collaborator

Hi,

Sorry for my late reply. As far as I understand, you're using an existing type system that has already been uploaded to Maven Central, i.e. the DKPro NER type system. This means that you don't need to create a new type system; you only need to include it as a dependency in the pom of the web service project. As @galanisd showed above, I believe it works now.

If you do need to create a new type system, please let me know and we can discuss the details later.

@greenwoodma greenwoodma reopened this Apr 17, 2018
@Aequivinius
Author

@galanisd Fascinating, this is precisely what we were after. I wonder whether the re-registration did the trick? Anyway, this is what we wanted, so it seems all is well! Thanks for your help!

Should we now proceed to register the service on services.openminted.eu?

@galanisd
Member

Should we now proceed to register the service on services.openminted.eu?

Not yet. services.openminted.eu has not been updated for quite some time.
You will be notified.

Thanks!

Dimitris

@gkirtzou

gkirtzou commented Apr 18, 2018

@Aequivinius I was taking a final look at your metadata (the version registered here) and noticed that you had declared in your input that the annotation type is Named Entity (i.e. http://w3id.org/meta-share/omtd-share/NamedEntity). Semantically, that means your input needs to be annotated at that level before your application is used. Is that the case? If not, and your input is just a raw corpus, then I would suggest removing the annotation type from the inputContentResourceInfo section.

Also, I would like to ask, for statistical reasons, whether you performed the registration via the registration form or via XML.
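For a descriptor maintained as XML, removing the input annotation type could look like the sketch below. `remove_elements` is a hypothetical helper that matches local names so it does not depend on the OMTD-SHARE namespace URI; it does not remove any wrapper element the schema may place around `annotationType`, which may need a second call. With `within=None` it removes an element wherever it occurs, e.g. a whole `metadataHeaderInfo` section.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name"
    return tag.rsplit("}", 1)[-1]


def remove_elements(root: ET.Element, name: str, within: Optional[str] = None) -> int:
    """Remove every element called `name`, optionally only inside `within` sections."""
    if within is None:
        scopes = [root]
    else:
        scopes = [e for e in root.iter() if local_name(e.tag) == within]
    # Collect (parent, child) pairs first: removing while iterating skips elements
    targets = []
    for scope in scopes:
        for parent in scope.iter():
            for child in parent:
                if local_name(child.tag) == name:
                    targets.append((parent, child))
    for parent, child in targets:
        parent.remove(child)
    return len(targets)


# Usage (file name hypothetical):
# tree = ET.parse("omtd-share.xml")
# remove_elements(tree.getroot(), "annotationType", within="inputContentResourceInfo")
```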

@Aequivinius
Author

Aequivinius commented Apr 18, 2018 via email

@greenwoodma greenwoodma added the Component Participant is providing component(s) label Apr 18, 2018
@gkirtzou

@Aequivinius I didn't know that the registration form didn't allow you to delete specific fields once set. I will report this bug to the responsible technical person. Thanks for sharing!

@gkirtzou

Also, when you make the last changes to the OMTD-SHARE descriptor, could you please upload it here as well for a final check? In case I missed anything :)

@greenwoodma greenwoodma added Web Service Component/Application provided as web service Application Participant is providing application(s) and removed Component Participant is providing component(s) labels Apr 18, 2018
@Aequivinius
Author

@gkirtzou Here you go! 18-4-removed_input.xml.zip

@gkirtzou

The metadata seem fine. I would only suggest two things:

  1. If you want to register your application using the XML registration form, please remove the metadataHeaderInfo section, as it will be auto-completed by the platform.
  2. As your application performs OntoGene Entity Recognition, you could also add the following value to the annotationType field of the outputResourceInfo section:
    http://w3id.org/meta-share/omtd-share/BiologicalEnity. But this is minor and only a recommendation.

Otherwise, the metadata are correct and your application has also been tested. All that remains is the final registration to the platform, once @greenwoodma informs you.

@Aequivinius
Author

@gkirtzou Thank you for your help! Find attached the most recent version of our share descriptor.
20-4.xml.zip

@gkirtzou

@Aequivinius Perfect! I have no further comments/recommendations.

@pennyl67
Collaborator

@Aequivinius You can now proceed to the final uploading of your application at services.openminted.eu. If you encounter any problems, please let us know.
Thanks!

@pennyl67
Collaborator

@Aequivinius My mistake, please refrain from uploading at services.openminted.eu until further notice.

@pennyl67
Collaborator

pennyl67 commented May 9, 2018

@Aequivinius I have taken the liberty to upload your application at services.openminted.eu and tested it. It seems to work ok. The application is available at: https://services.openminted.eu/landingPage/application/71345d18-297f-4ac5-b4de-38ef3cacbe75 You can also test it yourself.
If everything is OK, let me know so that we can close the issue.

@Aequivinius
Author

Perfect, thanks!

@pennyl67
Collaborator

@Aequivinius I have a question: in your proposal and the description of the application, you mention the Bio Term Hub, and I'm trying to understand the relation between the two. When you say that OGER is built on top of the BTH, do you mean that you use the terminologies from the reference databases? And is this aggregation of terminologies already in the Docker image you have provided? Or should we expect another component/application?

@Aequivinius
Author

Aequivinius commented May 15, 2018 via email

@pennyl67
Collaborator

Thanks for the explanations. It's clear now!

Given that your application is already uploaded and public in the platform, if you agree, I will close this issue.

6 participants