Problem separating definitions from examples in WNDB #777

Closed
ekaf opened this issue Nov 14, 2021 · 1 comment
Labels
release format This issue refers to the WNDB or RDF export, so no changes will be made to this repository

Comments

@ekaf
Contributor

ekaf commented Nov 14, 2021

Release format
WNDB

Describe the bug
In WNDB 2021, the examples are not quoted, so it is no longer possible to tell them apart from the semicolon-separated parts of the definition.

To Reproduce

For example, consider data.adj in OEWN 2021:
00004137 00 s 01 moribund 0 001 & 00003913 a 0000 | being on the point of death; breathing your last; a moribund patient
"breathing your last" looks like an example, although it is part of the definition.
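A minimal Python sketch of the ambiguity, assuming only that the gloss is everything after the first ` | ` on a data line (`gloss_of` is a hypothetical helper, not part of any WordNet library). Splitting the OEWN 2021 gloss on semicolons yields three segments, and nothing in the text marks which of them are examples:

```python
def gloss_of(data_line: str) -> str:
    """Return the gloss field of a WNDB data.* line (text after the first ' | ')."""
    _, sep, gloss = data_line.partition(" | ")
    if not sep:
        raise ValueError("no gloss separator ' | ' found")
    # Data lines may carry trailing spaces and a newline; drop them.
    return gloss.strip()


line = ("00004137 00 s 01 moribund 0 001 & 00003913 a 0000 | "
        "being on the point of death; breathing your last; a moribund patient")
gloss = gloss_of(line)

# Naive split on semicolons: definition parts and examples look identical.
segments = [s.strip() for s in gloss.split(";")]
print(segments)
```

Without quotes, any rule such as "the last segment is an example" is a guess that fails on glosses with several examples or none.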

Expected behavior

Until EWN2020, the examples could be extracted because they were surrounded by double quotes. For example, with PWN 3.1:
00004170 00 s 01 moribund 0 001 & 00003938 a 0000 | being on the point of death; breathing your last; "a moribund patient"
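With quoted examples the split becomes mechanical. A minimal sketch, assuming examples are double-quoted, follow the definition, and that the definition itself contains no `"` character (a gloss that quotes a word inside its definition would break this); `split_gloss` is a hypothetical name:

```python
import re
from typing import List, Tuple


def split_gloss(gloss: str) -> Tuple[str, List[str]]:
    """Split a PWN-style gloss into (definition, quoted examples)."""
    examples = re.findall(r'"([^"]*)"', gloss)
    first_quote = gloss.find('"')
    definition = gloss if first_quote == -1 else gloss[:first_quote]
    # Drop the "; " that separated the definition from the first example.
    definition = definition.strip().rstrip(";").strip()
    return definition, examples


definition, examples = split_gloss(
    'being on the point of death; breathing your last; "a moribund patient"')
print(definition)
print(examples)
```

Here the semicolon inside the definition is kept, because only the quotes decide where the examples start.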

Additional context

In PWN 3.1, 2820 glosses had both a semicolon inside their definition and further semicolons separating the examples. With WNDB 2021, no parser can reliably handle these cases, because nothing distinguishes the two kinds of semicolon.
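One way to find the glosses affected is to flag those whose definition (the text before the first `"`) contains a semicolon and which have at least one quoted example. This is a sketch under that assumption, not necessarily how the 2820 figure was obtained; the function names are hypothetical, and apart from the moribund gloss the sample strings are invented for illustration:

```python
from typing import Iterable


def ambiguous_without_quotes(gloss: str) -> bool:
    """True if removing the quotes would make this gloss ambiguous."""
    first_quote = gloss.find('"')
    if first_quote == -1:
        return False  # no examples, so nothing to confuse with the definition
    definition = gloss[:first_quote].strip().rstrip(";")
    return ";" in definition


def count_ambiguous(glosses: Iterable[str]) -> int:
    return sum(1 for g in glosses if ambiguous_without_quotes(g))


sample = [
    'being on the point of death; breathing your last; "a moribund patient"',
    'a plain definition; "an example"',   # invented sample
    'a definition with no example',       # invented sample
]
print(count_ambiguous(sample))
```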

@ekaf ekaf added the release format This issue refers to the WNDB or RDF export, so no changes will be made to this repository label Nov 14, 2021
@jmccrae
Member

jmccrae commented Nov 16, 2021

This is really a problem with the WNDB format, as there is no distinction between the examples and the gloss, and the examples are not consistent in format. I have updated the export files to add the " marks back in, but depending on this particular formatting is risky. This also restores source information that was lost before.

@jmccrae jmccrae closed this as completed Nov 16, 2021