Gender normalization (localization) #20
Comments
Furthermore, some languages may have different sets of genders for singular and plural. One example is Polish, in which most linguists distinguish 5 genders: 3 for singular (masculine, feminine and neuter) and 2 for plural (virile and non-virile). It must be noted, though, that some linguists prefer different classifications; for example, Polish entries in IEV use an old-school approach with masculine and feminine genders in plural (102-03-13). My conclusion is that it will be difficult to develop a discrete set of genders which will work for every language and for every project. Perhaps we should allow arbitrary genders, but I'm not sure if Glossarist Desktop supports that. Perhaps we should be even more flexible and describe terms with an array of arbitrary grammar classifiers rather than have separate fields for gender, plurality, etc.
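To make the last idea concrete, here is a minimal TypeScript sketch of what an array of arbitrary grammar classifiers could look like. The `GrammarInfo` interface, the `hasClassifiers` helper and the tag names are all hypothetical; none of them come from the Glossarist model.

```typescript
// Hypothetical shape, not taken from the Glossarist model.
interface GrammarInfo {
  // Free-form classifier tags, e.g. "masculine", "virile", "animate", "plural".
  classifiers: string[];
}

// Returns true when the entry carries every requested classifier.
function hasClassifiers(info: GrammarInfo, ...wanted: string[]): boolean {
  return wanted.every((tag) => info.classifiers.indexOf(tag) !== -1);
}

// A Polish plural noun can be tagged "virile" without extending a fixed gender enum.
const polishPluralNoun: GrammarInfo = { classifiers: ["noun", "plural", "virile"] };

// A Czech noun can carry animacy next to its gender.
const czechNoun: GrammarInfo = {
  classifiers: ["noun", "singular", "masculine", "animate"],
};
```

The trade-off is that nothing stops a typo such as "virlie" from being stored, so a project going this way would probably still want a per-language list of allowed tags.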
For what it’s worth, here is how grammatical properties of nouns are typed in the Glossarist model:
Not sure if this helps or what you are trying to achieve; I just saw this issue in my notifications.
What does "common" gender stand for? Is it kinda "not applicable" or "unspecified"? Or maybe it's kinda "masculine or feminine, but not neuter"?
I'm trying to achieve something more elastic as there are languages which have more than three genders. For example in context of IEV, Dutch has |
I recommend using fully qualified gender names instead of one-letter abbreviations to reduce ambiguity. For the linguistic background of neuter/common, see e.g. https://en.wikipedia.org/wiki/Grammatical_gender
Thanks! It explains everything.
I'm okay with either option. Still, I'm not sure if a set of just four genders will be future-proof. For example, some languages distinguish animate and inanimate nouns, and most vocabularies display that next to the gender because it's useful for users. Moreover, some languages (e.g. Polish) distinguish different genders in singular (masculine, feminine, neuter) and in plural (virile, non-virile). These two extra plural genders can be internally represented as masculine and feminine, and that's probably technically correct, but at some point I guess we'll have to do some mapping in the interface in both Geolexica and Glossarist Desktop so that more appropriate wording is used. That said, what you proposed should be enough in the context of IEV and I'm okay with that.
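A rough sketch of the kind of interface-side mapping I mean, in TypeScript. The four gender names, the `pol` language code, the `labelOverrides` table and the `genderLabel` function are assumptions made for illustration, not existing Geolexica or Glossarist code.

```typescript
// Assumed set of four stored genders; the real model may differ.
type Gender = "masculine" | "feminine" | "neuter" | "common";
type GrammaticalNumber = "singular" | "plural";

// Hypothetical display overrides keyed by "<language>/<number>/<gender>".
const labelOverrides: { [key: string]: string } = {
  "pol/plural/masculine": "virile",
  "pol/plural/feminine": "non-virile",
};

// Picks the label to show; falls back to the stored gender name.
function genderLabel(lang: string, gender: Gender, num: GrammaticalNumber): string {
  const key = lang + "/" + num + "/" + gender;
  const override = Object.prototype.hasOwnProperty.call(labelOverrides, key)
    ? labelOverrides[key]
    : undefined;
  return override !== undefined ? override : gender;
}

console.log(genderLabel("pol", "masculine", "plural")); // virile
console.log(genderLabel("pol", "masculine", "singular")); // masculine
```

The stored data stays within the small fixed set, and only the presentation layer needs to know about language-specific wording.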
The animate/inanimate property could be added if needed, but as you say, for the glossaries we deal with it may not be relevant. Generally, in linguistics there are different competing ways of classifying verbal expressions. Control bodies can disagree with each other which one they use. Also, they always evolve. I think user-configurable versioned schemas (like what we are trying to do with the generic registry schema) are the way to go. Some vocabularies may need more finely detailed grammatical properties, but for others those properties may not matter.
Indeed, this is my primary concern too. But after your clarifications, what we adopted seems enough for now, at least I haven't found any outstanding case yet. Closing? |
No objections from my side…
… On 10 Jan 2021, at 3:38 PM, Sebastian Skałacki ***@***.***> wrote:
Generally, in linguistics there are different competing ways of classifying verbal expressions. Control bodies can disagree with each other which one they use. Also, they always evolve.
Indeed, this is my primary concern too. But after your clarifications, what we adopted seems enough for now, at least I haven't found any outstanding case yet. Closing?
Agree. It is difficult to get different control bodies to agree on an identical set of grammatical genders, so leaving it customizable is easiest for now.
BTW, what's "generic registry schema"? I'm certainly not on the same page here. |
It’s the data schema used by a registry editor GUI currently in development. It doesn’t clash with the concept model described here; they are different things.
@ronaldtse I got a couple of questions. See IEV 102-04-22 on old Electropedia.
ж јд, which probably means "feminine singular". We surely need to display it this way, but the question is how to represent it in data: as ж јд, or normalized as f?
m/f, which means "masculine with optional feminine" (it's different from masculine, feminine, or neuter). We surely need to display it this way, but the question is how to represent it in data: as m/f, or maybe there is another common notation for mixed masculine-feminine genders like this one?
Also note that there may be more genders or the like. For example, in some Slavic languages (Czech, Slovene) nouns are further divided into animate and inanimate ones. I am not sure how important that is, but Wiktionary denotes it next to the gender (see this).
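One possible answer to the "display versus data" question, sketched in TypeScript: store normalized values and derive the displayed notation from an abbreviation table. Everything here is an assumption for illustration, including the `NounGrammar` shape, the `optionalGender` field used to model "m/f", and the split of ж јд into a gender abbreviation and a number abbreviation.

```typescript
type GenderName = "masculine" | "feminine" | "neuter";
type NumberName = "singular" | "plural";

// Hypothetical normalized record; not the actual Glossarist type.
interface NounGrammar {
  gender: GenderName;
  optionalGender?: GenderName; // models "masculine with optional feminine"
  number?: NumberName;
}

type Abbreviations = { [name: string]: string };

// Looks up an abbreviation, falling back to the full name.
function abbreviate(name: string, abbr: Abbreviations): string {
  const value = Object.prototype.hasOwnProperty.call(abbr, name) ? abbr[name] : undefined;
  return value !== undefined ? value : name;
}

// Builds the displayed notation, e.g. "m/f" or "ж јд", from normalized data.
function renderGrammar(g: NounGrammar, abbr: Abbreviations): string {
  let genderPart = abbreviate(g.gender, abbr);
  if (g.optionalGender !== undefined) {
    genderPart += "/" + abbreviate(g.optionalGender, abbr);
  }
  const parts = [genderPart];
  if (g.number !== undefined) {
    parts.push(abbreviate(g.number, abbr));
  }
  return parts.join(" ");
}

// Assumed tables: the Cyrillic one is guessed from the "ж јд" label above.
const cyrillicAbbrev: Abbreviations = { feminine: "ж", singular: "јд" };
const latinAbbrev: Abbreviations = { masculine: "m", feminine: "f", neuter: "n" };

console.log(renderGrammar({ gender: "feminine", number: "singular" }, cyrillicAbbrev));
console.log(renderGrammar({ gender: "masculine", optionalGender: "feminine" }, latinAbbrev));
```

With this split the data stays comparable across languages, while each language section can still show the notation readers expect.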