UPFMT Hackathon #33
Could you possibly attach your existing OMTD-SHARE XML descriptor to this issue, along with a description of the parameters you are trying to include, so we can have a look at this before the online session? Thanks. |
Hi, I tried again to register our docker component (xml attached). We have only basic parameters:
For example, this works on my local PC:
docker run -v E:_d\in:/input -v E:_d\out:/output upfmt:latest --input=/input --output=/output --param:language=en
The docker image is here: https://hub.docker.com/r/dumitrescustefan/upfmt/ |
The component is here: https://test.openminted.eu/landingPage/component/5f796253-c00d-432a-9c3a-d1b4d586ed50 (I already tried registering before, so now there is a UPFMT and a UPFMT2, same component, different XML shares to see if I did something wrong). Could you point me to:
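As an illustration of the invocation quoted above, here is a minimal Python sketch that assembles the same kind of `docker run` argument list. The helper name `build_docker_command` and the host paths `/data/in` and `/data/out` are hypothetical; only the container-side `/input` and `/output` mounts and the `--input`, `--output` and `--param:language` options come from the command in this thread.

```python
def build_docker_command(image, host_input, host_output, params=None):
    """Assemble the argv list for a `docker run` call shaped like the one in the thread.

    host_input and host_output are host folders (hypothetical here) that get
    mounted at /input and /output inside the container.
    """
    argv = [
        "docker", "run",
        "-v", f"{host_input}:/input",
        "-v", f"{host_output}:/output",
        image,
        "--input=/input",
        "--output=/output",
    ]
    # Extra component parameters are passed as --param:<name>=<value>.
    for name, value in sorted((params or {}).items()):
        argv.append(f"--param:{name}={value}")
    return argv


if __name__ == "__main__":
    command = build_docker_command(
        "upfmt:latest", "/data/in", "/data/out", {"language": "en"}
    )
    print(" ".join(command))
```

Building the command as a list rather than a single string makes it straightforward to hand to `subprocess.run` without shell quoting issues.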
Thank you! |
Hi Stefan |
Here it is. I changed the extension to .txt; otherwise the attachment form says that it can't handle this type of document (?!?). |
@dumitrescustefan
Technical issues (if any) will be discussed in the hackathon session. |
Definitely. The metadata now is only targeted to get things working; for the final version we will fill everything in fully, including parameter comments, citation, etc. Thanks! |
Some remarks concerning your metadata:
<ns0:parameterInfo>
<ns0:parameterName>input</ns0:parameterName>
<ns0:parameterLabel>input folder containing xmi and/or txt files</ns0:parameterLabel>
<ns0:parameterDescription>input folder containing xmi and/or txt files</ns0:parameterDescription>
<ns0:parameterType>string</ns0:parameterType>
<ns0:optional>false</ns0:optional>
<ns0:multiValue>false</ns0:multiValue>
<ns0:defaultValue>/input</ns0:defaultValue>
</ns0:parameterInfo>
<ns0:parameterInfo>
<ns0:parameterName>output</ns0:parameterName>
<ns0:parameterLabel>output folder path where xmi and conllu files will be written</ns0:parameterLabel>
<ns0:parameterDescription>output folder path where xmi and conllu files will be written</ns0:parameterDescription>
<ns0:parameterType>string</ns0:parameterType>
<ns0:optional>false</ns0:optional>
<ns0:multiValue>false</ns0:multiValue>
<ns0:defaultValue>/output</ns0:defaultValue>
</ns0:parameterInfo>
Thanks |
Hi, I made the changes you suggested above and re-registered as UPFMT3: https://test.openminted.eu/landingPage/component/5f796253-c00d-432a-9c3a-01b4d586ed50 Could you tell me how to test it? Do I need to create an application? Thank you, |
@greenwoodma @galanisd I have tried UPFMT3 in a workflow (omtdImport -> pdfReader -> UPFMT3). I ran the workflow with a corpus (pdf) but it ends with the error "System error getting execution status (Server responded: undefined)". Could we please have the logs to find out what is wrong? The workflow is private: https://test.openminted.eu/landingPage/application/0ca1e01c-b5c7-4cc6-a625-1f0f9ad117b6 |
It is possible that the pdfReader was not configured appropriately. |
Also, I had a look into our workflow engine. It seems that the UPFMT3 wrapper, which is generated from your OMTD-SHARE record, uses "upfmt:latest" as the command for calling your component. Is this a valid command? |
Hi, I re-registered the component as UPFMT4 (this time it is private so we can edit it), and put in the command just "upfmt". In our local tests it works both with and without the :latest tag. UPFMT4 now has just:
Just to be sure I specified it: our component looks for all .xmi and/or .txt files in the input and dumps processed .xmi files in the output folder (as well as other files, for example in .conllu format, to easily check the output). Thank you very much! |
Please send me the landing page... |
@galanisd what do you mean by "the pdfReader was not configured appropriately"? Are there any specific things to consider when using the UIMA PdfReader in a workflow? |
@mandiayba I got caught out by this earlier. It seems that by default the PdfReader doesn't find any documents and so produces no output. This is because it's driven by a patterns param which defaults to blank. The easiest option is to set it to |
Exactly!
Default values in the Galaxy XML wrappers come from default values in the OMTD-SHARE record. PdfReader is actually something like a built-in component in our platform; so, yes, we can probably manually edit the wrapper and set **/*.pdf as the default value for the patterns parameter. The other solution is to have some help and instructions for building workflows where this is mentioned. |
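To illustrate why a blank patterns value yields no documents while `**/*.pdf` picks up every PDF, here is a small Python sketch. It is only an analogy built on Python's `pathlib` globbing; the actual PdfReader does its own pattern matching, and the helper names `find_documents` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path


def find_documents(root, pattern):
    """Return files under root matching pattern; a blank pattern selects nothing."""
    if not pattern.strip():
        # Mirrors the behaviour described in the thread: no pattern, no documents.
        return []
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


def demo():
    """Build a throwaway corpus folder and compare a blank pattern with **/*.pdf."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "nested").mkdir()
        (root / "a.pdf").write_bytes(b"")
        (root / "nested" / "b.pdf").write_bytes(b"")
        (root / "notes.txt").write_text("not a pdf", encoding="utf-8")
        return find_documents(root, ""), find_documents(root, "**/*.pdf")


if __name__ == "__main__":
    blank, recursive = demo()
    print("blank pattern:", blank)
    print("**/*.pdf:", recursive)
```

In this sketch the recursive pattern matches PDFs both at the top level and in subfolders, which is the reason it makes a convenient default.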
That made me laugh so much! |
The best thing would be to have the So editing the OMTD-SHARE descriptor before completing the registration process seems a sensible solution for the time being. |
Considering that component UPFMT4 takes xmi files as input, could we find another way to run it on the registry? Could we use xmi files from @dumitrescustefan and define an executable workflow, for example omtdImporter -> UPFMT4? |
Yes you can. |
@dumitrescustefan could you please attach a sample of input files? I will try with them. |
The component also looks for .txt files (the .xmi input step just extracts raw text from the xmi and creates a temporary txt, so it is the same as having txts directly). So if you already have a PDF->txt converter or something similar, it might be easier to test. Also, here is a sample .xmi file. |
@galanisd I have tried UPFMT4 in a workflow (omtdImporter -> UPFMT4) with the corpus sent by @dumitrescustefan in the previous comment, but it does not run. I got the error "There was a problem running the application. Try again in a while. (corpus with ID '23f1d29d-919e-4847-b61d-61aea8967094' is empty)". Could you please check what is wrong with the corpus? |
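The input handling described above can be sketched roughly as follows. This is not the component's actual code: it assumes the XMI files are UIMA CAS serialisations in which the document text sits in a `sofaString` attribute, and the function names `xmi_to_text`, `collect_texts` and `demo` are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Tiny hand-written XMI sample, used only for the demo below.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="Hello world."/>'
    "</xmi:XMI>"
)


def xmi_to_text(xmi_path):
    """Return the text of the first element carrying a sofaString attribute."""
    for element in ET.parse(str(xmi_path)).iter():
        if "sofaString" in element.attrib:
            return element.attrib["sofaString"]
    return ""


def collect_texts(input_dir):
    """Map each .txt or .xmi file name in input_dir to its raw text."""
    texts = {}
    for path in sorted(Path(input_dir).iterdir()):
        if path.suffix == ".txt":
            texts[path.name] = path.read_text(encoding="utf-8")
        elif path.suffix == ".xmi":
            # Assumption: XMI input is reduced to plain text before processing.
            texts[path.name] = xmi_to_text(path)
    return texts


def demo():
    """Run collect_texts on a throwaway folder with one .txt and one .xmi file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "doc.txt").write_text("Plain text.", encoding="utf-8")
        (root / "doc.xmi").write_text(SAMPLE_XMI, encoding="utf-8")
        return collect_texts(root)


if __name__ == "__main__":
    print(demo())
```

Treating both file types as sources of raw text is what makes a PDF-to-txt converter and an XMI export interchangeable for testing purposes.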
@mandiayba did you just upload the zip file when creating the corpus? If so then that's the problem. The input documents need to be in a subfolder called |
Also, did you register your component multiple times in the registry today? I ask because I see multiple Galaxy wrapper records for your component with today's date. The Galaxy wrapper records are generated by the OMTD platform when you register a component, so that the Galaxy workflow engine is able to call your component. |
@gkirtzou Here is the zip with the latest XML: I also made the component public so you could test it. Also, yes, I pressed the button a few times. I did this because nothing happened for ~ 15 seconds the first time I clicked, so I tried again. A couple of times :) Then I saw a bunch of entries in the components list and I cleaned everything by deleting all duplicates. I had no visual feedback that anything was happening after pressing the button, and I became trigger-happy. |
Thanks for the metadata, I will check them. Aaaah, I see.. Yes sometimes the response is a little bit slow. |
@dumitrescustefan I am happy to announce that we have successfully run your component on the OMTD platform!!! In the attachments you will find the initial corpus with 2 PDFs and the generated output. Could you verify that it is meaningful? |
@gkirtzou Yes, that's the output we should have 👍 I have left the temporary .conllu files as a debug aid in case something fails, like the out-of-RAM issue before, but for the final publication I will remove them. Thanks a lot for the help! |
That's great news!! That means that we were able to successfully test your component!!! So we are done! The only thing that is left is to upload your component to the services, but we will let you know when to do that. |
No, we will leave the final .conllu and .xmi files untouched (so users get both txt and xml-type outputs). What I wanted to say was that I will remove the intermediary conllu file that precedes the parsing process: the file is always named temporary.conllu and exists only in the docker - I copy it out in the /output folder just to see that everything is ok up to that step. Finally, I am unsure whether to ask in this thread or open a new issue: for the adapt courses should we use the test.openminted platform or wait for the non-test version? And a second question, for you, would be: for the testing process did you create an application? Or how did you perform the testing, as in the tutorial we should show how to run the component on a corpus. Thanks! |
Ahh, I see. Sorry, I misunderstood what you sent previously.
You will register your components to a non-test version of the platform. As soon as we are ready to proceed, we will let you know.
I created a private app via the workflow editor, that contains the following components in that order :
|
Dear @dumitrescustefan, you can now proceed to upload your component at https://services.openminted.eu/home Some final suggestions for the metadata record, not obligatory but recommended, are
Please, when you upload your component, create the appropriate workflow so that someone could run your component using the workflow editor. For more info see https://openminted.github.io/releases/workflow-editor/ |
@gkirtzou Thanks! I added the languages and updated the label for the language parameter in a new (private) component. Please tell me under what section the resourceCreator is so I can add it as well. As soon as I validate the component on the test server I'll upload the xml to the services. |
@dumitrescustefan when you edit the metadata of a registered component, there is the option "Add Resource Creation". You would find it under the Identification section. |
@gkirtzou I edited the component, and it's public on : |
@dumitrescustefan Thanks for uploading the component. Could you please create a public application as well, so that non-expert users can use it? Note that when you create an application, you will be asked to fill in a metadata record. Some tips for filling it in, so that it is discoverable by users and so that users can cite you and your resource:
If you encounter any problems, please let us know. |
@gkirtzou I am first trying to create an app on the test. server. I edited the metadata with all the above pointers, landing page is: I tried to run it a couple of times, but it says "running" for some time (though for 2 pdfs it should finish in ~1 minute). Is that normal behaviour? Also, I tried editing the workflow, and the save button seems not to work (any changes I make are discarded). I think I need to check the output icon of the last component to make the dataset not hidden (which is the default), but I can't seem to save the changes. |
I cannot see it since it is private. Could you please send me here the xml with the metadata from your app, so I could check them?
I checked the workflow engine and I found three successful runs from your workflow. Did you get the final output in the UI? Each experiment took ~10 minutes.
You mean that you made changes in the workflow editor and that the changes were not saved when you reopened the app with the workflow editor?
No you don't need to do this. In fact I think that it should not be available as an option. Right @greenwoodma ? |
Here is the xml zipped: Regarding the workflow, I have the omtdImporter linked to the PdfReader then to the UPFMT component (the last version, the updated one), exactly as you specified. However, on the UI, I get three "Running" tasks. Lastly, regarding the workflow editor, when pressing Save it does not save the components' x y positions on the flow (I know it is just cosmetic but it's a hint that re-saving does not work), and also the editor does not allow me to view or edit any component, like changing the pattern for the pdf reader, etc. I tried with both Firefox and Chrome (latest versions) in case it was a browser problem, but they both have the same behavior. Anyway, I brought this up as I believed that the "output" check is what kept the app from completing. |
@dumitrescustefan yes, the inability to change parameter values on a workflow that you have previously created is a known bug which we are looking into (it's a bug in Galaxy which they are investigating). Currently the only option is to remove the component you want to edit and then re-add it. Sorry about that, I'm aware just how annoying that specific bug is. The hidden status of the dataset should have no impact on the workflow, as all that does is hide the output in the galaxy UI which you are not using to access the results. The OpenMinTeD platform retrieves the results via the Galaxy API for you and this is not affected by the hidden status. |
@dumitrescustefan I checked the application metadata and I have the following comments/suggestion
One question: is your application in service or in test? I thought you were playing in test first, but in the metadata you are using a link to the UPFMT component registered in service. |
Okay, thank you. I would like to try to add the app in the services. server, but for some reason I can't find the UPFMT component in the workflow editor. |
Ok no problem. Could you send me the metadata, just to be sure?
Yes, I will check and see what's going on and let you know. |
Here is the zip with the app's metadata: |
I forgot to pretty-print it, here it is again: |
A few comments, minor corrections:
I will let you know when I figure out what's going on with your component. |
Hi, for the personIdentifier I can choose between: ORCID, ISNI, ResearcherID, ScopusID and other. There's no URL; I would have chosen that. Are any of these better than "other"? |
@dumitrescustefan You are right, there is no URL in the person identifier scheme name. I got confused with the generic one that we have in the metadata schema, I am sorry for that. If you have an ORCID, that would be nice to add. If you want to use the URL of your LinkedIn page, then the "other" value is more appropriate. |
@dumitrescustefan We figured out what went wrong and we are trying to fix it. I will let you know as soon as we are good to go. Sorry about the trouble. |
No problems, I'm standing by. Thank you very much! |
@dumitrescustefan We finally resolved the problem we had with the component's registration. I took the liberty and created an application to make your component available to the non-tdm users of OMTD platform. You can find the application here : https://services.openminted.eu/landingPage/application/6aea5b89-e857-4c47-b111-81c441e7a741 The app runs correctly. Since everything works perfectly and nothing else remains open, I am closing the issue. Cheers! |
Hi, we need help in testing our docker component.
So far we successfully registered it on test.openminted.eu, but we are unable to test it.
The component takes as input a folder where it first searches for xmi files and extracts raw text from them (in .txt format), and also searches for .txt files. All the .txt files get processed (segmented, tokenized, lemmatized, tagged and parsed), and in the output folder we create files in .conllu and .xmi formats.
We need help:
Thank you,
Stefan (Ineosoft)