Not shortest relations path for Sk language #3819
This works as designed: the actual path is the same in both cases (a path of length 6, as indicated by the dotted lines in your image, i.e. S1-C1-P1-GP1-GP2-P2-C2). The algorithm actually works like this: for each language, it determines which of the terms describing this path has the fewest characters. It does not (as you apparently expected) break the path down into a minimal number of sub-paths (determined according to some criteria) and then translate those. In the case of English, both approaches lead to the same result: "partner’s second cousin" is the term with the fewest characters, and also the combined term for the two components 'partner' and 'second cousin'. In the case of Slovak, the second approach would give you a term with more characters (something like 'partner od druhostupňová sesternica'?). For more complex relationships, this approach becomes infeasible. That said, improving the language-specific relationship names has been an open issue for a long time now, see e.g. here: #2331.
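The selection rule described above can be sketched in a few lines, assuming the candidate terms for one path have already been generated (the recursive generation of candidates is not reproduced here, and `choose_shortest` is a hypothetical name). Applied to the candidate terms quoted in this issue, it shows why English and Slovak end up describing the same path differently:

```python
def choose_shortest(candidates: list[str]) -> str:
    """Return the candidate term with the fewest characters (the first one wins on ties)."""
    return min(candidates, key=len)


# Candidate terms for the same 6-step path, as quoted in this issue.
english = ["partner’s second cousin", "father-in-law’s aunt’s grandson"]
slovak = ["partner od druhostupňová sesternica", "vnuk od teta od svokor"]

print(choose_shortest(english))  # partner’s second cousin
print(choose_shortest(slovak))   # vnuk od teta od svokor
```

The criterion only looks at the translated string, so the same path can be decomposed differently in each language.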
I misused the term "relations path", but it seems that you understand my approach; and you are right that translation happens via … I am not sure if I understand the last paragraph properly. Do you mean that in English "partner’s second cousin" is chosen because it is a shorter string than "father-in-law's aunt's grandson", and in Slovak it is the other way round? If so, then IMO it is a bad algorithm/condition and must lead to unpredictable results in different languages, as the length of the term will differ by the terms themselves, not by the relation. Nobody would describe this relation by the blue path, simply because it is horribly complicated (really, I only understood it after I drew the image). IMO what needs to be counted here is the "path" (blue vs. red steps) used to construct the string, not the length of the string itself. In other words, the term "druhostupňová sesternica" must have the same weight (preference) as "vnuk" or "teta", because each of them describes one person. But "druhostupňová sesternica" (a one-step relation) must have a higher weight than "vnuk od teta" (a two-step relation), because one step is shorter than two steps, regardless of the string length... (eh, this is terrible for me to describe, especially in English). Anyway, the construction "partner od druhostupňová sesternica" is grammatically bad, but I do not want to discuss this; I understand the limitations...
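The step-counting criterion proposed here can be sketched as follows, assuming each candidate description is available as the list of named parts it is built from (`choose_fewest_steps` and this data layout are hypothetical). Each named part counts as one step, however long its translation is:

```python
def choose_fewest_steps(candidates: dict[str, list[str]]) -> str:
    """Pick the candidate built from the fewest named parts, ignoring string length.

    Each value lists the single-person relationship names the description is
    built from; one list entry corresponds to one step of the description.
    """
    return min(candidates, key=lambda label: len(candidates[label]))


# The two descriptions from the original report, as lists of named parts.
candidates = {
    "red": ["partner", "druhostupňová sesternica"],  # 2 parts
    "blue": ["svokor", "teta", "vnuk"],              # 3 parts
}
print(choose_fewest_steps(candidates))  # red
```

Ties (equal numbers of parts) would still need a secondary rule; this sketch simply keeps the first candidate in insertion order.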
I agree with you - it looks odd and nobody in the real world would use such a grammatical construction. This translation was inspired by the Czech translation. Some languages use only a simple arrow. There is no really good solution for the translation. But I agree with Slavko that the length of a string alone is not the best solution. Ladislav
Yes, that's how the algorithm currently works. I guess it was chosen because it works (somewhat) for all languages. In earlier webtrees versions it wasn't easily possible to use different algorithms for different languages. It would be possible now (in webtrees 2.x) to choose the actual algorithm per language, or even offer a choice of different algorithms. Once we have defined a proper interface for this functionality, it will be possible to develop language-specific (custom) modules for it. (I'm not sure though how best to handle the language-specific relationship terms - I don't think this can be done via weblate, so these terms would probably be part of the language-specific module, and only be editable there.)
On the fly I have played a little bit with the Slovak translations:
It seems a better "translation" for "%1$s’s %2$s" would be something like "%2$s GENITIVE(%1$s)" for the first level of recursion and "GENITIVE(%2$s) GENITIVE(%1$s)" for deeper calls. I have no idea how this recursion works, so it may not be easy to decide which is the very first (or last) part of the string, the only one that should be in the nominative; the other parts of the string should be in the genitive. I know this is not only a matter of translation, but it is perhaps easier than a completely new module. And it could also be helpful for other Slavic languages. Ladislav
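A minimal sketch of this genitive chain, assuming the description arrives as a list of named parts in English possessive order ("A’s B’s C"). `genitive_chain` is a hypothetical name, and `GENITIVE(...)` is kept as a placeholder exactly as in the comment, since real declension needs a per-word lookup:

```python
from typing import Callable


def genitive_placeholder(word: str) -> str:
    """Stand-in for real declension; a Slovak version would look up the genitive form."""
    return f"GENITIVE({word})"


def genitive_chain(parts: list[str],
                   genitive: Callable[[str], str] = genitive_placeholder) -> str:
    """Render [A, B, C] (English "A’s B’s C") as "C GENITIVE(B) GENITIVE(A)".

    Only the last part (the person actually being described) stays in the
    nominative; every earlier part is put in the genitive, in reverse order.
    """
    head, *rest = reversed(parts)
    return " ".join([head] + [genitive(part) for part in rest])


print(genitive_chain(["svokor", "teta", "vnuk"]))
# vnuk GENITIVE(teta) GENITIVE(svokor)
```

For two parts this reduces to "%2$s GENITIVE(%1$s)"; for longer chains each earlier part is simply added in the genitive. Passing a real declension function instead of the placeholder would produce the actual Slovak text; that lookup table is not part of this sketch.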
Using the genitive case for specific languages has also been suggested before. See Greg's reply here: #947 (comment). This was about five years ago - unfortunately there has been no progress at all on these issues since then. For a while, it was at least on the 2.1 milestone list, but apparently it has been postponed indefinitely. The interface I mentioned earlier may not even be that complex. Instead of using getRelationshipNameFromPath from functions.php, we need an additional method in ModuleLanguageInterface (or LocaleInterface?). Its default implementation via ModuleLanguageTrait could use the existing implementation. @fisharebest: Any thoughts on this?
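The shape of such an interface can be sketched as follows. This is a Python analogue for illustration only: `LanguageModule`, `relationship_name` and `shared_relationship_name` are hypothetical names, not the actual `ModuleLanguageInterface`/`ModuleLanguageTrait` API. The base class supplies a default that reuses the shared code, and a language module may override it:

```python
from abc import ABC, abstractmethod


def shared_relationship_name(names: list[str]) -> str:
    """Hypothetical stand-in for the existing, language-independent code path."""
    return "’s ".join(names)


class LanguageModule(ABC):
    """Hypothetical analogue of a language module with a relationship-name hook."""

    @abstractmethod
    def language_tag(self) -> str:
        """Return the language code, e.g. "en"."""

    def relationship_name(self, names: list[str]) -> str:
        # Default implementation (the role a trait would play): reuse the shared code.
        return shared_relationship_name(names)


class EnglishModule(LanguageModule):
    def language_tag(self) -> str:
        return "en"


class SlovakModule(LanguageModule):
    def language_tag(self) -> str:
        return "sk"

    def relationship_name(self, names: list[str]) -> str:
        # Override: nominative head noun first, earlier parts as genitive placeholders.
        head, *rest = reversed(names)
        return " ".join([head] + [f"GENITIVE({name})" for name in rest])


for module in (EnglishModule(), SlovakModule()):
    print(module.language_tag(), module.relationship_name(["partner", "second cousin"]))
```

Languages that are happy with the current behaviour need no code at all; only languages that need a different algorithm override the method.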
I am curious which languages those "all languages" are. It doesn't work for at least Slovak and Czech (and I am afraid about other Slavic languages too), so your statement is not right. I looked at the statistics page and I see that most installations are in the USA and DE; do you mean English and German by "all languages"?
Weblate is not the issue here; I am not a translator (though I collaborate with ours). The issue is the bad algorithm: as I stated before, the string length is not a measure, at least not a universal one.
Again, it works as designed. We don't disagree that there could be better solutions! There are different aspects to be addressed; the algorithm itself is just one of them. I would be interested in tackling this, but it won't be possible without Greg's input.
I think the idea of using the length of the strings is not a very good one. For example, a very simple change - counting not the length of the strings but the number of blanks - is in my opinion better. At least for Slovak it will give better results. So I changed line 2400 (near the very end) of the file Functions.php (in the folder /app/Functions) from Ladislav
I am afraid that counting spaces is not universal either. What about "bratranec z 3. kolena", which has more spaces than "vnuk od tety"...
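The two measures can be compared directly on terms quoted in this issue. `char_count` and `blank_count` are hypothetical helpers, not the change actually made in Functions.php:

```python
def char_count(term: str) -> int:
    """Current criterion: number of characters."""
    return len(term)


def blank_count(term: str) -> int:
    """Proposed criterion: number of blanks (roughly, number of words minus one)."""
    return term.count(" ")


# The two candidates from the original report.
red, blue = "partner od druhostupňová sesternica", "vnuk od teta od svokor"
print(char_count(red), char_count(blue))    # 35 22 -> characters prefer blue
print(blank_count(red), blank_count(blue))  # 3 4 -> blanks prefer red

# Counterexample from this comment: a one-step term with more blanks.
print(blank_count("bratranec z 3. kolena"), blank_count("vnuk od tety"))  # 3 2
```

Counting blanks picks the red path for the original report, but, as the counterexample shows, it still penalises a one-step term that happens to contain more words.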
Regarding the algorithm, I think the original suggestion is more suitable than counting spaces. Anyway, I'm working on all of this over here as long as there is no progress on it in webtrees itself. I hope to have a first version (for German) available as a showcase shortly.
It depends on how the string "%1$s’s %2$s" is translated: in English (and German) it adds only one blank, while in other languages (Slovak, Czech, ...) it adds two blanks. For such languages it will give better results, and the longer the relationship, the better. Of course, counting the sub-paths is far better.
I am not saying it is universal, but I am sure that for some languages it gives better results, and for the rest it gives no worse ones. And AFAIK a "3rd cousin" is "directly translated", so this will not happen. Secondly, when a translator knows that the length of the string, or the count of blanks, matters to the algorithm, it can be changed: "bratranec z 3. kolena" can also be "treťostupňový bratranec" or even "3-stupňový bratranec".
This is very good news. When you need a tester for other languages ...
IMO, because translation happens in
@slavkoja: I have already implemented an improved algorithm, and an option to use more grammatical genitive-based constructions. What's left to do is to create the language-specific files, e.g. in a format similar to the file for German.
@ric2016 - I have tried a similar approach to yours before. Here are some issues that you might consider... The relationship and relationship name can depend on marriage. For example, the relationship between myself and my girlfriend's brother depends on whether we are unmarried, married or divorced:
unmarried => partner's brother
To solve this, I used closures instead of strings for the names. The relationship name can depend on pedigree, e.g. father, foster-father, adopted-great-grandfather, etc. Also, the relationship name can depend on the sex of the first individual. In some languages (Polish?) the name for a man's uncle is different from the name for a woman's uncle. I am very interested to see what progress you make. @slavkoja - the example above ("partner's brother") shows why it would be difficult to count recursion depths. This relationship name already contains two parts. If we can remove all the "exotic" names which exist for only one language, then I think we can use a non-recursive solution: simply match as much of the first part of the relationship as possible. For very long paths, the recursive algorithm to find the "best" name is slower than the code to find the relationship.
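The closure idea can be sketched like this. Only the unmarried wording is quoted in the comment; "brother-in-law" for the married case is ordinary English usage, and the divorced case is omitted rather than guessed. `PartnerStatus`, `partners_brother` and the `NAMES` table are hypothetical:

```python
from enum import Enum
from typing import Callable


class PartnerStatus(Enum):
    UNMARRIED = "unmarried"
    MARRIED = "married"


def partners_brother(status: PartnerStatus) -> str:
    """Name for the partner's brother; the wording depends on the marriage status."""
    if status is PartnerStatus.MARRIED:
        return "brother-in-law"
    return "partner's brother"


# Each path maps to a function of the context rather than to a fixed string.
NAMES: dict[tuple[str, ...], Callable[[PartnerStatus], str]] = {
    ("partner", "brother"): partners_brother,
}

name = NAMES[("partner", "brother")]
print(name(PartnerStatus.UNMARRIED))  # partner's brother
print(name(PartnerStatus.MARRIED))    # brother-in-law
```

The same pattern extends to other context, such as pedigree or the sex of the first individual, by adding parameters to the function.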
I have been working on this intermittently. But I do not have a solution that is good enough to commit to the main repository. |
My goal is to split this joiner from the rest of the code. The joiner can then be implemented per language, as required. By the way, in German it's more grammatical to use a reversed construction, i.e. "NOMINATIVE ... GENITIVE GENITIVE" ("husband of the cousin of the partner" rather than "partner's cousin's husband"), but that is just a detail (the drawback is that the respective diagram then has to be read from right to left, which may be confusing as well).
If we define the criteria differently, the recursive algorithm performs better as well. In the first modification, the algorithm first attempts to split only at specific places (before/after a spouse), because the resulting sub-paths are often available directly (without further recursion). The result preserves common-ancestor-based relationships, so that we actually get "partner’s second cousin" rather than e.g. "father-in-law's aunt's grandson" in all languages, regardless of string length (see the original issue).
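The "split at a spouse first" rule can be sketched as follows, assuming paths are tuples of step names (`spouse`, `parent`, `sibling`, `child`) and a small hypothetical table of directly named sub-paths; `split_at_spouse` is a hypothetical name, not the module's actual code. The 6-step path from the original report also has the parent-in-law/aunt/grandchild decomposition in the table, but the spouse split is tried first:

```python
from typing import Optional

Path = tuple[str, ...]

# Hypothetical table of sub-paths that have a direct (single) name.
NAMES: dict[Path, str] = {
    ("spouse",): "partner",
    ("parent", "parent", "sibling", "child", "child"): "second cousin",
    ("spouse", "parent"): "parent-in-law",
    ("parent", "sibling"): "aunt/uncle",
    ("child", "child"): "grandchild",
}


def split_at_spouse(path: Path, names: dict[Path, str] = NAMES) -> Optional[list[str]]:
    """Try to split the path just before or just after a spouse step.

    Returns the two names if both halves have a direct name, else None
    (a caller would then fall back to a general recursive split).
    """
    for i, step in enumerate(path):
        if step != "spouse":
            continue
        for cut in (i, i + 1):  # before / after the spouse step
            left, right = path[:cut], path[cut:]
            if left in names and right in names:  # empty halves are never in the table
                return [names[left], names[right]]
    return None


path = ("spouse", "parent", "parent", "sibling", "child", "child")
print("’s ".join(split_at_spouse(path)))  # partner’s second cousin
```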
I have this case covered - I'm not sure about the other cases though (pedigree/marriage): this additional information is currently not used either, except for simple relationships (path length of one). So the input to the function (currently implemented in Functions::getRelationshipNameFromPath) would have to be extended.
@slavkoja From the example given in the table above, it seemed that it would be possible to chain, as only the second part of the overall term has to be adjusted in each step? Or is that a special case?
I don't know exactly, but I think this should work in Slavic languages, perhaps in all European languages. On the other hand, Slovak also has other patterns for complex relationships - with possessive adjectives, which sound better, but the pattern is far more complex. See the column Slovak (OK) #3819 (comment); this form is, for example, gender-sensitive. Also the question (from @slavkoja here) "Do we really need a description for very complex relationships?" should be discussed. Wouldn't it be better to simply say "This relationship is too complex - see the picture"? Or, when there is no picture, a text string "INDI1 simple_relationship1-2 INDI2 simple_relationship2-3 INDI3 ..... simple_relationship(n-1)-n INDIn"?
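The fallback text proposed at the end of this comment could be produced like this; `describe_as_chain` is a hypothetical helper that follows the suggested template literally, with one simple relationship between each pair of neighbouring individuals:

```python
def describe_as_chain(individuals: list[str], relations: list[str]) -> str:
    """Build "INDI1 rel(1-2) INDI2 rel(2-3) ... INDIn" for paths too complex to name.

    relations[i] is the simple relationship between individuals[i] and
    individuals[i + 1], so there must be exactly one fewer relation.
    """
    if len(relations) != len(individuals) - 1:
        raise ValueError("need exactly one relation between each pair of individuals")
    words = [individuals[0]]
    for relation, person in zip(relations, individuals[1:]):
        words += [relation, person]
    return " ".join(words)


print(describe_as_chain(["INDI1", "INDI2", "INDI3"], ["partner", "father"]))
# INDI1 partner INDI2 father INDI3
```

Because every step uses only a simple relationship name, this form needs no recursion and no language-specific joiner.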
I would say the genitive chain is acceptable; for "long" chains it is even good. For "short" chains the possessive adjectives sound better, but some of them are defined directly, not through the recursive routine. But I hope @slavkoja would also agree that the "genitive chain" is a better solution than what we have today.
IMO, the whole discussion is heading in a bad direction. We cannot solve the genitive problem until we have a decision about how the underlying shortest relation path is identified. The currently implemented algorithm is bad, and this one case with the Sk and Cs languages only shows that the localized string length cannot be used as a measure. The genitive discussion has to be filed as a separate issue, depending on the result of this one. From my point of view, the depth of the "translation's" recursion could be measured directly on the English string, to get a better decision. But I contacted my mathematician friends (graph theory); I hope I get an answer soon, though I am not sure whether they will be able to help with this directly in PHP, as none of us is a PHP person, but I hope one of them can help move this forward.
These are separate issues - both can be handled independently (I know this issue was originally only about one of them). Again, I have already implemented an improved solution for the path algorithm, which seems to work well. There is no need to over-complicate this aspect.
I'm writing in German because the topic is very complex. I think the approach of generating the kinship terms per language is not so good. Perhaps we should first determine which kinship system is to be used. We use the Eskimo system (cognatic, bilateral descent), but there are others (Hawaiian system, Iroquois system, Crow/Omaha system, Sudanese system). Within these systems there are further variants for each language used, but grouping into kinship systems could perhaps help to structure the diversity somewhat. I find the following helpful:
The first version of the new approach is now available in Vesta release 2.0.15.1.0 (module "Extended Relationships" in particular).
@fisharebest Thank you for your input - I have kept the common cases, but a closure/callback is now used internally, and can also be used directly to implement additional strategies of arbitrary complexity. As an example of this, I have implemented modified step-father etc. relationships (in the English language version) which take into account the date of marriage vs. the child's birth date, as suggested on the forum a while ago. @hartenthaler Additional kinship systems are an interesting but rather academic topic (at least, as you point out, for the large majority of users). In any case we need a solution for the existing issues, which definitely require a language-specific approach. That's what this is about - everything beyond that is a different issue that should be handled separately, if at all.
@slavkoja - I have now found my original solution to this problem, and further developed it. Relationships are defined using code like this:
We can distinguish between married and unmarried partners. Variable relationships are defined using some callback functions. e.g.
It matches relationships using a "longest substring" algorithm. Are you able to help me by providing definitions for Slovak?
@fisharebest Have you looked at the code in the 'Extended Relationship' module? Its general concepts seem rather similar. Do we really have to start duplicating development on this now, after years of no progress? The current Vesta release has a finished solution for English, German and Slovak.
As Ric wrote, the Slovak definitions are in LanguageSlovakExt.php. Some translations can be discussed (for some relationships we have two equivalent names), and perhaps I forgot to add some relationships. Step/adopted relationships are (intentionally) kept to a minimum - this can be discussed. But I am very happy that with this approach we no longer use the "n-th cousin y-removed" relationship. Of course, if you need further assistance with the Slovak translation, I can also help. Ladislav
@fisharebest I can, of course, though I am not sure what you want from me. I am still worried about deciding by string length. Is it a problem to count the recursion depth? Going back to the initial image, the red "path" has 2 recursions (relations), but the blue path has 3. The shortest one is the one with the smaller recursion depth...
There is a working solution available in Vesta based on criteria like the one you suggest. I'd be interested in your feedback!
I have looked at the code now...
I am currently matching the longest substring with a named-translation. (This part of the algorithm can be changed easily.) So, in your example, we have spouse-parent-sibling-child (4 steps). We can match 3 steps with the definition for "cousin" (parent-sibling-child).
@slavkoja - I have taken most of the relationship definitions from @ric2016's code. But I cannot work out the numbering system for cousins. A cousin is: If you have
The main question for me is whether we should continue to work on this in parallel. It seems like unnecessary effort - there are so many other open issues. Why do you (re-)start working on this now? Again, I have a finished solution. You can download the Vesta modules and test it with the relationship chart. I had hoped we could do it like this: we test additional languages in the custom module, and then we eventually move its code to webtrees. Edit: Maybe we should discuss this further elsewhere?
OK, then we can discuss this algorithm later.
First, I need to state that I am not interested in genealogy at all; I only install & manage webtrees on my server for my mother ;-) So my info may be wrong, and it would be great if @ro-la could confirm or correct it. AFAIK, our cousin level requires that X = Y, so cousins have a common grandparent, second cousins a common great-grandparent, etc. In our language the cousin's name depends on sex, so we have "bratranec" for male and "sesternica" for female (and there must be something for unknown too).
Thus, when IMO, you can inspect different languages in
For all languages you can use in
A quick look at the result shows that all languages simply reuse the
shows that not all languages use 6 (the number) in the translation, but I am not able to understand most of them, so some may use it as a word (six)... Another complication is that the right translation is e.g. "z 2...", "z 3...", but "zo 4..." or "zo 6...". AFAIK this is impossible to achieve with … Feel free to ask if other info is needed. As an experienced free/open software (mostly gettext) translator, I am strictly against hardcoding translated strings in code, as it is unmaintainable and most translators will not update it. One can use a dummy PHP file (even with manually generated "dynamic" values) for them, so that they go into the POT file and are translatable the usual way.
In Slovak, if X ≠ Y we don't use a description like in English. One would mostly say something like "3rd cousin of my grandfather".
In Ric's approach we used a special translation with "zo" for 4, 6, 7, 14, 16 and 17; the rest use "z". It's hard to imagine that somebody would need a 40th cousin.
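The z/zo rule described here can be sketched as follows. `slovak_cousin` is a hypothetical helper; the set of "zo" numbers is exactly the list given above (presumably because the spoken ordinals štvrtého, šiesteho, siedmeho, štrnásteho, šestnásteho and sedemnásteho begin with š/s), and numbers outside that set get "z":

```python
# Numbers that take "zo" instead of "z", exactly as listed above. Numbers outside
# this set (including ones like 40 that might also need "zo") get "z".
ZO_NUMBERS = {4, 6, 7, 14, 16, 17}

# Only male and female forms: the word for an individual of unknown sex
# is not given in this thread.
COUSIN = {"M": "bratranec", "F": "sesternica"}


def slovak_cousin(sex: str, level: int) -> str:
    """Build e.g. "bratranec z 3. kolena" or "sesternica zo 4. kolena".

    `level` is the number printed in the term; how it maps to the English
    cousin degree is not settled in this sketch.
    """
    if sex not in COUSIN:
        raise ValueError(f"unsupported sex: {sex!r}")
    preposition = "zo" if level in ZO_NUMBERS else "z"
    return f"{COUSIN[sex]} {preposition} {level}. kolena"


print(slovak_cousin("M", 3))  # bratranec z 3. kolena
print(slovak_cousin("F", 4))  # sesternica zo 4. kolena
```

A plain "%s" placeholder in a gettext string cannot express this choice of preposition, which is one reason the thread moves these definitions into code.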
Generally I agree. But we have, for example, the Census Assistant, where the header of the table is not translated and is used in the original language. I think the relationship names are very language-specific - different languages (language groups) use quite different systems of family relationships. But even within a language group, some relationships have a specific name in one language and not in another. So for a translator it is not good to have all possible relationship names in one file with the other translations. So these "relationship messages" should not be added to "messages.php" in the folder "sk", but to a separate file, perhaps even in a separate directory. In this case I would also accept "hard-coded" text. Because the translation is not only a simple translation, the translator must identify which messages are needed, which should be used and which not. Ladislav
I tried to explain the reason for this here. As Ladislav also pointed out, you cannot solve this issue with only a translator's usual tools (you noticed yourself that some things like "z/zo %s..." are not solvable via gettext). So, do you really think it is easier to create a file like this, which has to be language-specific, using strings that are then translated in a separate file? In case of updates, usually both files would have to be adjusted anyway.
What is the name for one level? e.g. parent - sibling - child?
Do other levels have "names" that are preferred to numbers?
The definitions in LanguageSlovakExt.php are based on Ladislav's input, so you should be able to reuse them. But note that slavkoja's definitions do not match these (they seem to be off by one level):
shouldn't this be "bratranec z 3. kolena"? Regarding your earlier question:
When the same object $ref is used in a definition, it collects the count in the first matcher (in this case ->parent($ref)), and the count is then used in subsequent matchers (in this case ->child($ref)). The second argument in 'Times::min(1, 1)' is an offset, so a count of 2 would result in a value of 3 to be used for '%s'.
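The shared-counter mechanism described above can be sketched as follows. `Ref`, `repeat`, `literal` and `cousin_number` are hypothetical names that mirror the idea of `$ref` and the offset in `Times::min(1, 1)`, not the module's actual classes; the parent/sibling/child step names are also an assumption:

```python
from typing import Callable, Optional

Matcher = Callable[[list[str], int], Optional[int]]


class Ref:
    """Shared counter: the first matcher using it records a count, later ones reuse it."""

    def __init__(self) -> None:
        self.count: Optional[int] = None


def repeat(step: str, ref: Ref, min_times: int = 1) -> Matcher:
    """Match a run of `step`; its length is recorded in, or checked against, `ref`."""
    def matcher(path: list[str], pos: int) -> Optional[int]:
        n = 0
        while pos + n < len(path) and path[pos + n] == step:
            n += 1
        if ref.count is None:   # first use: collect the count
            if n < min_times:
                return None
            ref.count = n
            return pos + n
        if n < ref.count:       # later use: consume exactly the recorded count
            return None
        return pos + ref.count
    return matcher


def literal(step: str) -> Matcher:
    """Match exactly one `step`."""
    def matcher(path: list[str], pos: int) -> Optional[int]:
        return pos + 1 if pos < len(path) and path[pos] == step else None
    return matcher


def cousin_number(path: list[str], offset: int = 1) -> Optional[int]:
    """Return the value that would be substituted for %s, or None if there is no match."""
    ref = Ref()  # a fresh Ref per match, shared by both repeat() matchers
    matchers = [repeat("parent", ref), literal("sibling"), repeat("child", ref)]
    pos: Optional[int] = 0
    for matcher_fn in matchers:
        pos = matcher_fn(path, pos)
        if pos is None:
            return None
    return ref.count + offset if pos == len(path) else None


print(cousin_number(["parent", "parent", "sibling", "child", "child"]))  # 3
```

Paths with unequal numbers of "parent" and "child" steps do not match this definition at all, so they would be left to other definitions.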
bratranec (male)
I think this is an error - this is "bratranec z 3. kolena"
Not as far as I know. Ladislav
The Slovak language definitions need to be checked at: https://github.com/fisharebest/webtrees/blob/main/app/Module/LanguageSlovakian.php Tests need to be written at:
Note that the Slovak relationships from LanguageSlovakExt.php are designed to be used with a particular path-splitting algorithm, for which see that file starting from line 21. Using them with the standard shortest-string-based algorithm won't produce the intended results in all cases.
I have tried to make some changes to LanguageSlovakian.php - this is the first time I have done this using GitHub. Where and how can I test the translations? Ladislav
Yes - this worked OK!
We no longer have a "shortest-string" algorithm. Instead, we match the longest sub-path that has a single name. For the example at the top of this issue, we have spouse-parent-sibling-child We could split this in several ways:
Option 3 has the longest subpath (3 steps), and so gets extracted first.
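The longest-sub-path rule can be sketched as follows, assuming a hypothetical table of named sub-paths. `decompose` is not the webtrees implementation, and breaking ties between equally long sub-paths by taking the leftmost one is an assumption of this sketch:

```python
Path = tuple[str, ...]

# Hypothetical table of sub-paths that have a single name.
NAMES: dict[Path, str] = {
    ("spouse",): "partner",
    ("spouse", "parent"): "parent-in-law",
    ("parent", "sibling"): "aunt/uncle",
    ("sibling", "child"): "nephew/niece",
    ("parent", "sibling", "child"): "cousin",
}


def decompose(path: Path, names: dict[Path, str] = NAMES) -> list[str]:
    """Extract the longest sub-path that has a single name, then recurse on both sides."""
    if not path:
        return []
    for length in range(len(path), 0, -1):           # longest sub-paths first
        for start in range(len(path) - length + 1):   # leftmost first on ties
            sub = path[start:start + length]
            if sub in names:
                return (decompose(path[:start], names)
                        + [names[sub]]
                        + decompose(path[start + length:], names))
    raise ValueError(f"no named sub-path in {path!r}")


print(decompose(("spouse", "parent", "sibling", "child")))  # ['partner', 'cousin']
```

With the same rule, a path of four "parent" steps followed by "sibling" and "child" would come out as great-great-grandparent + nephew/niece whenever the four-step ancestor run is in the table, which is the case raised in the next comment.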
OK, but 'longest subpath' isn't the desired criterion in all cases either (in Slovak): 'first cousin 3 times ascending' would split into 'great-great-grandfather' + 'niece', where 'great-grandfather' + 'cousin' would be preferred. Also, common-ancestor-based sub-paths are preferred, which you don't always get with 'longest subpath' either.
I have tried this: The first relationship is not the best description:
I have added a new pull request for LanguageSlovakExt.php - the definitions for nephew and niece were swapped.
@fisharebest As long as the problem with the n-th grandparent is not solved, this issue should be reopened - or should I raise a new bug report?
Hi ro-la - I have a solution for this. I hope to submit it soon.
In this case, the relation path is the shortest one in English, but it differs from the path in Slovak (and Czech too):
In English it shows "partner’s second cousin" (red path), but in Slovak it shows "vnuk od teta od svokor" (BTW, a horrible form), which can be roughly translated as "father-in-law's aunt's grandson" (blue path). It took me some hours to check the translation files, but then I found that it comes from the code, where I got lost in the shortest-path selection...