usefulness of "for" properties in the controlled vocabulary module #55
Comments
Yes, in principle the for... properties could be combined into a single multi-value property, so that e.g. instead of As for restrictions on how they may be combined, there are (mostly!) no restrictions and all combinations are valid. There are some exceptions and yes I agree we should state them explicitly. For example, under |
Suggested Implementation:
|
Accepting a suggestion as above would also effectively fix #60 |
OK, so here is how it could work. A new model-level
|
I agree with the proposal other than that I prefer a space-separated string for serialization |
Here are my arguments for not being in favour of spaces-and-colons-separated strings.
Argument 1: Consistency with the rest of DMLex
A spaces-and-colons-separated string here would be inconsistent with the approach taken everywhere else in the DMLex serializations. For example, we never do stuff like this:
and instead we always do stuff like this:
Argument 2: It’s not the JSON/XML way
JSON and XML parsers cannot “see” the structure inherent in these strings. To process a string like Yes, writing your parsing routine for these things can be a trivial one-liner if you’re processing e.g. dictionary entries one by one. But it can become a nuisance if you want to do some kind of bulk processing, like “give me all tag types that are ‘for’ translations but do not have a language specified”. Doing this in e.g. an XSL stylesheet is straightforward if the XML object model can “see” the individual ’for’ values (= my way) but not if it cannot (= John’s way). |
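To make the bulk-processing point concrete, here is a minimal Python sketch of the query mentioned above. It assumes a hypothetical notation in which each space-separated token is either `part` or `part:language`; the tag names and `for` values are invented for illustration and are not taken from the DMLex specification.

```python
# Hypothetical tag inventory: tag name -> "for" string.
# Assumed notation: space-separated tokens, each "part" or "part:language".
tag_types = {
    "n": "headwords translations:en",
    "v": "headwords translations",
    "pl": "translations:de translations:fr",
}


def parse_for(value):
    """Split a 'for' string into (part, language-or-None) pairs."""
    pairs = []
    for token in value.split():
        part, _, lang = token.partition(":")
        pairs.append((part, lang or None))
    return pairs


def translations_without_language(tags):
    """Return the tags that apply to translations with no language given."""
    return sorted(
        name
        for name, value in tags.items()
        if ("translations", None) in parse_for(value)
    )


print(translations_without_language(tag_types))
```

Every consumer of the data needs its own equivalent of `parse_for` before it can ask the question at all, whereas with an explicit serialization the JSON or XML parser has already done that step.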
We would have to introduce I am also a little concerned about the inconsistency in the JSON serialization with both strings and objects in the same array; this often creates issues with parsing, as you have to check the type of each item first. The single string proposal is easily processed with XSLT, e.g. <xsl:if test="contains(for, 'translations')"/> |
1.
No, we wouldn’t. We would need to introduce a new We have done something similar once or twice already, such as the
2.
True, it is a little frowned upon in the JavaScript/JSON universe to have arrays with mixtures of different types inside them. Perhaps something like this would be better:
Bonus: makes it very similar to my proposed XML serialization. Drawback: a bit wordy (but not more than the XML serialization).
3.
That’s true, but not bullet-proof. The XPath function
Additionally, there might be performance bottlenecks during bulk processing. The XPath processor would typically have to do the substring matching on the spot during each run instead of being able to rely on an already parsed object model. So I’m not convinced; I still prefer the fully explicit serializations. |
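The "not bullet-proof" concern about substring matching can be shown with a small Python sketch. The `for` value below is invented for illustration (it is not a real DMLex value); the point is only that a plain substring test can match inside a longer token, while a token-level test does not.

```python
def matches_substring(for_value: str, wanted: str) -> bool:
    """Substring test, analogous in spirit to XPath contains()."""
    return wanted in for_value


def matches_token(for_value: str, wanted: str) -> bool:
    """Token test: split on whitespace first, then compare whole tokens."""
    return wanted in for_value.split()


# Hypothetical value: "translations" occurs only inside a longer token.
for_value = "headwords example-translations"

print(matches_substring(for_value, "translations"))  # True (false positive)
print(matches_token(for_value, "translations"))      # False
```

A token-level test avoids the false positive, but it is exactly the extra parsing step that the explicit serializations would make unnecessary.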
20th December 2023 |
Having thought about it a bit more, I’m afraid the scheme we have agreed on is not expressive enough. It is unable to express that, for example, something is only allowed on translations in languages X, Y and Z but not others. To express such things, it seems to me that, after all, we have no choice but to go with something much like John’s original proposal where the
This would mean that the part-of-speech tag What we’d lose: the ability to model the constraints explicitly in the formalisms that we have serializations for (JSON, XML...). Instead, we are embedding a private notation as a string. What we’d gain: full expressivity; the ability to express any and all combinations of constraints, even crazy ones (if we allow implementors to extend the notation with their own atomic terms e.g. Another option is to leave the contents of the
What would it be for other serializations? I hear XPath 3.1 can query JSON too, so maybe that. Or some other JSON query language; there are a couple of them around. Also, in my experience, most dictionary schemas in existence today don’t bother modelling these constraints at all (and so they allow e.g. adding “plurals” to verbs, or “past tense” forms to nouns). So whatever DMLex comes up with will either be a big step forward or completely ignored by implementors anyway. |
Decision taken on 5th Jan 2024: |
Implemented. We now have a single optional string-valued This issue will close automatically when pull request #77 is merged. |
The tags have multiple "for" properties, e.g., forHeadwords. Do we have restrictions on how these may be combined, e.g., can an inflectedFormTag apply to headwords or translations or languages? Would it not make sense to combine these into a single property with values, e.g., instead of forHeadwords=true have for=headwords?
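As a rough illustration of the question, the following Python sketch folds boolean `for...` flags into a single multi-value `for` list. Only `forHeadwords` is named in this issue; the other property names and the dictionary layout of a tag are assumptions made for the example.

```python
def flags_to_for(tag: dict) -> list:
    """Collect every true 'forXyz' flag into a list of 'xyz' values."""
    values = []
    for key, enabled in tag.items():
        # Only keys of the form "for" + something, set to True, count.
        if key.startswith("for") and len(key) > 3 and enabled is True:
            name = key[3:]
            values.append(name[0].lower() + name[1:])
    return values


# Hypothetical tag definition using separate boolean flags.
tag = {
    "tag": "n",
    "forHeadwords": True,
    "forTranslations": True,   # assumed property name
    "forEtymology": False,     # assumed property name
}

print(flags_to_for(tag))  # ['headwords', 'translations']
```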