Error upon submitting correction annotations #179
Comments
The log above is from the web service; the foliadocserver log is attached... |
I keep getting this error, related to corrections, after test-annotating for a while on different files. Please find the last hour's activity log attached. Could it also be related to insufficient memory? The service runs on Debian 10 and is currently pretty thin: |
It looks like a corruption occurred in your first file (an invalid reference somewhere). That is definitely a bug in the system, because that should of course not happen. I thought I already had an auto-correction mechanism in place for that, but I may be wrong, since it clearly fails to load now. Could you send the two FoLiA documents so I can pinpoint exactly what went wrong?
Nah, that sounds like enough memory. |
Thanks a lot, attached are the files. test_FA-MBK-4-3_035245008_0030_abpproc_pars_ucto.folia.xml.txt |
There was a bug in this mechanism, so that's probably what caused part of the problem (fixed and released already in foliapy v2.5.7). I can't really pinpoint the problem on the 0030 document yet, will investigate further. |
Thanks, so how should I update FLAT so that I can load these annotated documents again? I run By this I got folia-2.5.5 |
There was a doc that I test-annotated (pls see attached), which I cannot import correctly, although foliavalidator says it is fine. FLAT says:
Uploaded file is no valid FoLiA Document: FoLiA exception in handling of @ line 93 (in parent @ parent line 92) : [InvalidReference] FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70
Traceback (most recent call last):
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7574, in getitem
    return self.index[key]
KeyError: 'FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 6322, in parsexml
    return doc[id]
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7583, in getitem
    raise KeyError("No such key: " + key)
FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt |
On my PC, both folialint and foliavalidator reject this file:
|
Looking closer, the problem is in this fragment: <entities>
<entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.1" class="ff:italic" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:08">
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
</entity>
<entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.2" class="lem:Auth" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:58">
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
<feat subset="normaliz_intern" class="Adam;"/>
</entity>
</entities> the I have no clue why or how. Seems very strange and is surely dead wrong. |
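For readers who want to check a document for this kind of problem themselves, here is a minimal sketch that lists wref elements whose id does not match any xml:id in the same file. It uses only the Python standard library rather than foliapy, so it is not how foliavalidator or folialint work internally; the function name and the tiny sample document are made up for illustration, and the sample is a heavily simplified stand-in for a real FoLiA file.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_dangling_wrefs(xml_text):
    """Return the ids of all wref elements that point to no existing xml:id."""
    root = ET.fromstring(xml_text)
    # Collect every xml:id declared anywhere in the document.
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    dangling = []
    for el in root.iter():
        # Strip the namespace part "{...}" to compare the local element name.
        if el.tag.rsplit("}", 1)[-1] == "wref" and el.get("id") not in declared:
            dangling.append(el.get("id"))
    return dangling


# Hypothetical, simplified sample: the first wref points to a token that is gone.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.3"><t>Wodeham</t></w>
    <entities>
      <entity xml:id="doc.entity.1" class="ff:italic">
        <wref id="doc.w.deleted" t="-;"/>
        <wref id="doc.s.1.w.3" t="Wodeham"/>
      </entity>
    </entities>
  </s>
</FoLiA>"""

print(find_dangling_wrefs(SAMPLE))  # prints ['doc.w.deleted']
```

In the fragment quoted above, the wref ending in ucto.w.bc123fb8afdf4d1000c315a0ddacba70 is the one FLAT reports as an InvalidReference, so it is the kind of id a check like this would be expected to flag.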
Thanks a lot for troubleshooting! |
Direct edits are the source of a lot of evil. As you discovered. |
I was using FLAT. |
It might be this: in the entity annotation, the string "-;" is a single unit.
|
I suspect this can come from the order of steps I made:
As said, I was experimenting with an annotation approach and tagset and FLAT. |
Yes, the core of the error was an invalid reference that was indeed created as you describe. I thought I had a simple workaround in place in the latest version that would allow loading your document by simply dropping the invalid reference completely. You should be on foliapy v2.5.7 for this to work correctly; you might still have a slightly older version? |
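The idea of "dropping the invalid reference completely" can be sketched outside of foliapy as follows. This is only an illustration of the approach using the standard library, not the actual foliapy implementation; the function name and sample document are hypothetical, and a real repair would also have to decide what to do with an annotation that ends up with no references at all.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def drop_dangling_wrefs(root):
    """Remove wref children whose id matches no xml:id; return the dropped ids."""
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    dropped = []
    # Materialize both iterations first, because we modify the tree while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            is_wref = child.tag.rsplit("}", 1)[-1] == "wref"
            if is_wref and child.get("id") not in declared:
                parent.remove(child)
                dropped.append(child.get("id"))
    return dropped


# Hypothetical, simplified sample with one reference to a token that no longer exists.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <w xml:id="doc.w.3"><t>Wodeham</t></w>
  <entity xml:id="doc.entity.1" class="lem:Auth">
    <wref id="doc.w.deleted" t="-;"/>
    <wref id="doc.w.3" t="Wodeham"/>
  </entity>
</FoLiA>"""

root = ET.fromstring(SAMPLE)
dropped = drop_dangling_wrefs(root)
remaining = [el.get("id") for el in root.iter() if el.tag.endswith("}wref")]
print("dropped:", dropped)
print("remaining:", remaining)
```

Dropping silently is convenient for getting a document to load again, but it loses the information about which token the annotation originally covered, which is why the dropped ids are returned here so they can at least be logged.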
Thanks, I see! So when I update FLAT, I don't actually see a foliapy version being referred to, only folia (please see above): By this I get folia-2.5.5 |
Just a remark: shouldn't the correction have an "original" node containing the "-;", thus keeping the reference alive? |
@pirolen: do
If it's done in correction mode, yes, but not in direct edit mode. |
Yes, that's foliapy, you need 2.5.7 of that one |
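Since the pip package is called folia while the project is called foliapy, a quick way to check which version is actually installed in the active environment is sketched below. It assumes plain numeric version strings such as 2.5.7 and Python 3.8 or later for importlib.metadata (the environment in the traceback above is Python 3.7, where the importlib_metadata backport would be needed instead); the helper names are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "2.5.7"  # the release mentioned in this thread as containing the fix


def parse_version(text):
    """Turn '2.5.7' into (2, 5, 7); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def is_new_enough(installed, minimum=MINIMUM):
    """Compare versions numerically, so that 2.10.0 counts as newer than 2.5.7."""
    return parse_version(installed) >= parse_version(minimum)


def installed_folia_version():
    """Return the installed version of the 'folia' package, or None if absent."""
    try:
        return version("folia")
    except PackageNotFoundError:
        return None


found = installed_folia_version()
if found is None:
    print("folia is not installed in this environment")
elif is_new_enough(found):
    print("folia", found, "is recent enough")
else:
    print("folia", found, "is older than", MINIMUM)
```

Run it with the same Python interpreter that FLAT and foliadocserve use (for instance the one inside the virtual environment), otherwise it reports on a different installation.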
$ pip show folia
I completely stopped and restarted the webserver and foliadocserver, but still could not load this particular document. What I did then: I deleted the (several) annotations that FLAT complained about, and then the document loaded. |
Ok, didn't know that (not a FLAT user). But this implies that 'direct edit mode' is highly dangerous. |
Btw, you might have seen that I used ucto set to French and German, on these Latin texts... not sure what would be a better option. |
The best option would be to create a separate tokconfig-lat file, so you could use -Llat. Any input would be welcome. |
Also the existing Italian files would be useful. I am happy to assist, e.g. compiling an abbreviations list. |
Well, ANY list of Latin abbreviations would be welcome, I suppose. As long as it doesn't contain entries that wouldn't be an abbreviation in a "normal" Latin context. |
I will let you know when I have a usable abbreviation list -- the data provider said that the abbreviations used in the texts at hand are almost random (as the printed book required). |
Hi, I encountered similar issues (again on Latin), when using an ucto-ed file, in a containerized FLAT. Similar to my previous experiences (with a non-container FLAT back then):
Could you please tell where the logfile is located in the container? I am going to attach it, together with before/after files and screenshots. Many thanks! Disclaimer: I cannot exclude the possibility of unintentionally having used the GUI in a way that is not valid... |
I am trying to trace back how everything was set up. (Apologies for the overhead!) I have had this server since the end of last year. I guess I need to set all this straight for the new container!
|
PS. The data has always been in the container. I used to access files from there. I could access them with e.g. |
And what's inside that
That could have also just been the mountpoint with the actual files outside the container. E.g. ownership is convoluted, and the flat dir belongs to root :-(
Actually yes, it looks okay. If the data is there then it doesn't look so bad. I'll explain:
That's probably because the container is not run rootless (our containers do support that btw) and so root inside the container maps to root outside the container. It always takes a bit of effort to get these things nicely in sync (and with rootless containers it's even more of an effort, but best for security). Yes, the user under which foliadocserve runs inside the containers happens to map to 'systemd-timesync' outside of the container on your system (same UID). If you start your container with the same volume mounts as the old one, doesn't your data show up? That's how it was designed. |
I will test. I also suspect that things got mixed up when I was attempting to configure the container. E.g. I not only pulled the image, but also cloned the repo etc. I also see a Dockerfile in the directory.... (Please advise: is it best to create an empty dir, cd to it, and start the container in there? Or is this nonsense?) I think I also used foliautils by cloning... have to check. And: was one supposed to create the Can you perhaps also explain why there is no need to build the container, i.e. one can start it directly after pulling the image? |
You don't have to be in any particular location to start the container (assuming the volume mounts all use absolute paths) and you indeed don't need to clone the git repo at all to work with the containers.
The container image was built already on my end, and uploaded to Docker Hub. You pull in the container image and can run it directly without needing anything else. It contains all the binaries (rather than the sources which you don't need in that context). When you built containers before for some of our tools, that was needed because you were working with development versions of things that weren't released yet (so I hadn't built and published a container for it). Only in that situation did you need the git source repo.
I personally use podman instead of docker, as it's better suited for rootless containers (and is otherwise almost entirely compatible with docker and its containers). It may be that Docker is also better at it nowadays. It's probably overkill to switch at this point if the current docker solution works for you. Perhaps talk to your system administrator at some point about it. |
For the record: Proceeding now further with starting the container. |
Ah okay, that's not what I expected then, so the files must have indeed lived inside the container somehow. |
Does this look as expected, permission/ownershipwise? pirol@badwqsv-dev:~$ ls -la /home/pirol/flat/data/flat.docroot/ |
Yes, I think I use UID 100 and GID 100 in the container. That probably matches what you're seeing. |
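To see which numeric UID and GID own a directory on the host, and which names (if any) those numbers map to there, something like the following sketch can help. It relies on the standard pwd and grp modules, so it only works on Unix-like systems; the function name is made up for this example. A numeric ID from inside the container may map to an unrelated name on the host, or to no name at all, which is why the lookups fall back to None.

```python
import grp
import os
import pwd


def describe_owner(path):
    """Return (uid, user name or None, gid, group name or None) for a path."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = None  # the UID has no entry in the host's user database
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None  # the GID has no entry in the host's group database
    return st.st_uid, user, st.st_gid, group


# Uses the current directory so the example runs anywhere; substitute the
# mounted document root to inspect the files written by the container.
uid, user, gid, group = describe_owner(".")
print("uid", uid, "->", user)
print("gid", gid, "->", group)
```

If the UID shown here is the one the container uses, a name like systemd-timesync on the host is just a coincidence of numbering and nothing to worry about by itself.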
By the way, you can configure the UID and GID used inside the container by setting environment variables |
Uh... I am not sure about the implications. For now I just want to make sure that, as you write in the installation manual, I (belonging to the docker group) have sufficient write permissions. So again, I manually create the |
Got an internal server error upon docker run :-( Sorry! The logs: |
Can you pass --env FLAT_DEBUG=1 to the container? That should provide some more output on what causes the HTTP 500.
You can create |
Thanks, passing FLAT_DEBUG looks informative. I paste the logs further below, and also what was printed to stderr. NB. I haven't yet set up the reverse proxy. Stderr:
The logs:
|
Please ignore the above, I will make a clean relaunch. |
I guess it looks similar now:
The logs:
|
In the FLAT Manual it says:
I haven't yet created any yaml config, nor have I edited settings.py so far. Maybe this contributes to the problems. I thought there would be a fallback to some default settings.py in case no yaml config is provided. Line 94 in 3b14b81
|
But maybe I am mixing up things. In any case, the flat_settings.py file is in place inside the container. |
Got FLAT running, now it's only the configuration invalid ;-) |
I am using the default full.yml file now for settings, but get an 2023/02/23 19:29:49 [notice] 17#17: using the "epoll" event method |
Apologies for the trouble. Furthermore, I am not sure if this is informative, but despite being admin, and having also explicitly granted myself all rights via the Django administration GUI, I see this in the logs: [pid: 49|app: 0|req: 6/8] 129.187.243.56 () {46 vars in 985 bytes} [Thu Feb 23 19:59:44 2023] GET /index/pirolen => generated 8285 bytes in 15 msecs (HTTP/1.1 200) 4 headers in 263 bytes (1 switches on core 1) |
Sorry for the delay in catching up!
Try something like
Yes, definitely, when you don't provide a yaml config, it will just use the built-in 'full' config, which may be fine for your purposes now, as we already ascertained there was little difference between your config and the default. So indeed you don't have to provide a settings.py or yaml config as long as you set some of the configuration environment variables. Perhaps it's easiest for now to try without a custom yaml config and without a custom settings.py; you can always add that later if you want to fine-tune the configuration. Just leave out the For reference, this is how I run it in my tests:
|
OK, so also Thanks a lot, it is working now, if I start it and add the domain name via --env FLAT_DOMAIN. So shall I try to specify my own configs now nonetheless? So if I want to provide my own yaml, can I just stop the container, add the config yaml, and restart the container? Or should I stop FLAT and start a new container, specifying the settings via --env? |
Do these parameters make sense for starting the container: |
PS. the documents are now showing up at the mounted location as expected! |
Yes
Just stop and start the container yes (technically it's a new container each time), each will have a unique ID, but based off the same container image (the Working incrementally is a good idea, then you can detect where things go wrong and roll back easily. If you manage to reproduce the original issue of tokens getting deleted (no rush), that'd be great.
That would autorestart the container if it fails. At this stage I don't think you need that. If it really fails it's more likely to fail continuously anyway. You can use |
And also to add I did both and now got the 'Invalid configuration' error message, upon wanting to upload a document. The yml file I put under settings is the same as provided by you, https://raw.githubusercontent.com/proycon/flat/master/flat.d/full.yml, although I renamed it. |
Yes, indeed
Ah, that's the issue then, pass |
Works like a charm now, thanks!! |
I got errors on two files upon submitting correction annotations, and those files will no longer open; nginx signals a gateway timeout.
I am attaching the docserver logs here too.
foliaserverlog.txt