Support homographs (was: Re-consider need to abort due to duplicated preferred terms) #1065

Closed
ronaldtse opened this issue Nov 29, 2023 · 8 comments
@ronaldtse
Contributor

ISO 5598 is a trilingual document, English, French and German.

In the English text all preferred terms happen to be unique (due to ISO DIR 2 requirements), but the French and German texts contain multiple duplicated preferred terms.

Personally I think there are reasons for these duplications, for example, look at this:

English:

[[term_3.2.571]]
==== pressure gauge

device that measures and indicates gauge pressure (3.2.346)

[[term_3.2.577]]
==== pressure-measuring instrument

device that measures and indicates the level, variations and differences of pressure (3.2.560)

German:

[[term_3.2.571]]
==== Druckmessgerät

[%metadata]
grammar::
gender::: neuter

Gerät zum Messen und Anzeigen von Überdruck (3.2.346)

[[term_3.2.577]]
==== Druckmessgerät

[%metadata]
grammar::
gender::: neuter

Gerät zum Messen und Anzeigen der Druckhöhe, Druckveränderung und Druckdifferenz

If in German the words for "pressure gauge" and "pressure measuring instrument" are the same, then we are not in a position to say that is wrong.

One might object that ISO DIR 2 says:

"Only one definition per terminological entry is allowed. If a term is used to define more than one concept, a separate terminological entry shall be created for each concept and the domain shall be included in angle brackets before the definition."

But if the two terms have the same domain (or no domain) in English, are we to separate them into new domains in French or German?

It is a known fact that different languages express concepts differently. Moreover, in a dictionary one word can have multiple definitions.

This same issue also happens in ISO/IEC 2382:

Perhaps this is a common vocabulary dataset issue?

Also FYI @ReesePlews

@ReesePlews

@ronaldtse i believe ISO 10241-1 clause 6.2.6, "Homographs and antonyms", talks about this issue. it suggests that when this type of homograph cannot be avoided, homographs shall be defined in separate terminological entries. a cross-reference in a note to entry between these entries can be useful. however, that adds complexity in deciding whether to keep the xref in the note or not. not sure if this is what you were looking for. i am not familiar with such duplicated multilingual terms. in my understanding this does not happen in asian languages, as the pictograms would be different, but i don't have much experience in this area.

@ronaldtse
Contributor Author

Thank you @ReesePlews for pointing that out! I think the answer is definitive now: 6.2.6 provides an example where the French preferred term is duplicated (as homographs representing two different concepts).

There is a special treatment for the "Note to entry", where the language that contains a homograph will need a particular "Note to entry" for cross-referencing, and the language where the homograph does not exist will need a "Note to entry" as a placeholder.

Definition of a "homograph":
Screenshot 2023-12-06 at 6 38 00 AM

Recognition that "homographs" exist:
Screenshot 2023-12-06 at 6 37 46 AM

Avoid homographs:
Screenshot 2023-12-06 at 6 38 51 AM

Homographs in different languages are accepted (even in the same domain):
Screenshot 2023-12-06 at 6 39 21 AM

@opoudjis
Contributor

opoudjis commented Dec 5, 2023

OK, the abort becomes a warning.
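As a rough illustration of the check being downgraded here, duplicated preferred terms can be found by grouping entries per language, then reported as a warning instead of stopping the run. The Python below is only a sketch: `find_homographs` and the `(entry id, language, term)` tuple layout are hypothetical and not taken from metanorma-standoc; the sample data is the ISO 5598 example quoted earlier in this thread.

```python
import warnings
from collections import defaultdict


def find_homographs(entries):
    """Return {(language, term): [entry ids]} for terms used by more than one entry.

    `entries` is an iterable of (entry_id, language, preferred_term) tuples;
    this layout is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for entry_id, lang, term in entries:
        groups[(lang, term)].append(entry_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


entries = [
    ("term_3.2.571", "en", "pressure gauge"),
    ("term_3.2.577", "en", "pressure-measuring instrument"),
    ("term_3.2.571", "de", "Druckmessgerät"),
    ("term_3.2.577", "de", "Druckmessgerät"),
]

homographs = find_homographs(entries)
for (lang, term), ids in homographs.items():
    # Warn and carry on: a homograph is legitimate, so it must not abort the run.
    warnings.warn(f"[{lang}] preferred term '{term}' is shared by {', '.join(ids)}")
```

Grouping per language matters: the same two entries are unique in English and homographs in German, so a check keyed on the term alone would miss the distinction.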

We do need a notion of severity in warnings, as some warnings are clearly more important than others, and I don't want a warning about homographs to be missed. I'm already suppressing display of grammar errors, but I am inclined to introduce 3 severity levels, abort on 1, display 2, and not display 3.

@opoudjis
Contributor

opoudjis commented Dec 6, 2023

Thank you @ReesePlews and @ronaldtse, very helpful to see the chapter and verse on this.

@opoudjis
Contributor

opoudjis commented Dec 7, 2023

Want to get rid of the separate @fatalerror variable, and abort on all severity 1 log errors. (They are almost all in standoc.)
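A minimal sketch of that idea, in Python rather than the Ruby of the actual codebase: instead of a separate fatal-error flag, the abort decision is derived from the log itself by asking whether any entry has severity 1. The `Log` class, the constant names, and the sample messages other than "Two identical term designations" are hypothetical; the three levels (abort on 1, display 2, do not display 3) follow the proposal earlier in this thread.

```python
from dataclasses import dataclass, field

# Hypothetical names for the three proposed severity levels.
ABORT, DISPLAY, HIDDEN = 1, 2, 3


@dataclass
class Log:
    entries: list = field(default_factory=list)

    def add(self, severity, category, message):
        self.entries.append((severity, category, message))

    def displayed(self):
        # Severity 3 entries are recorded but not shown.
        return [e for e in self.entries if e[0] <= DISPLAY]

    def should_abort(self):
        # No separate flag: abort if any severity 1 entry was logged.
        return any(severity == ABORT for severity, _, _ in self.entries)


log = Log()
log.add(DISPLAY, "Terms", "Two identical term designations")
log.add(HIDDEN, "Style", "hypothetical low-priority message")
log.add(ABORT, "Anchors", "hypothetical fatal problem")

for severity, category, message in log.displayed():
    print(f"({severity}) {category}: {message}")
```

Deriving the abort from logged severities keeps a single source of truth, so an error cannot be fatal without also appearing in the log.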

@opoudjis
Contributor

opoudjis commented Dec 7, 2023

Severe errors in standoc:

  • Asciibib references missing anchor, title, or document identifier
  • Non-Fatal error in Bibliography Spans notation
  • External terms source given but not referenced
  • Unresolved footnote block reference
  • Unsupported Asciidoctor element
  • Latex math equation not parsed
  • Malformed Asciidoctor in bibliographic reference
  • Request error in Relaton fetch
  • Unresolved concept reference
  • Unresolved crossreference target
  • Term designation mismatch with IEV
  • Two identical term designations

@opoudjis
Contributor

opoudjis commented Dec 7, 2023

Abort errors are shown in the HTML log with a pink background. Severe errors are in boldface.

opoudjis added a commit to metanorma/metanorma-standoc that referenced this issue Dec 7, 2023
opoudjis added a commit to metanorma/mn-requirements that referenced this issue Dec 8, 2023
opoudjis added a commit to metanorma/metanorma-standoc that referenced this issue Dec 8, 2023
opoudjis added a commit to metanorma/metanorma-standoc that referenced this issue Dec 8, 2023
@opoudjis
Contributor

opoudjis commented Dec 8, 2023

Holding off on escalating style warnings in child flavours, except for IETF, which is more finicky because its output needs to be XML validated:

  • No matching review for cref
  • Image other than SVG
  • IETF: unrecognised working group

@opoudjis opoudjis closed this as completed Dec 8, 2023
opoudjis added a commit to metanorma/metanorma-ietf that referenced this issue Dec 8, 2023
@ronaldtse ronaldtse changed the title Re-consider need to abort due to duplicated preferred terms Support homographs (was: Re-consider need to abort due to duplicated preferred terms) Dec 8, 2023
opoudjis added a commit to metanorma/metanorma-standoc that referenced this issue Dec 10, 2023