Gender normalization (localization) #20

skalee · 2021-01-05T08:36:06Z

@ronaldtse I have a couple of questions. See IEV 102-04-22 on old Electropedia.

The Serbian entry (апсциса, <дуж криве> ж јд) has the gender marker ж јд, which probably means "feminine singular". We surely need to display it this way, but how should we represent it in data: as ж јд, or normalized as f?
The Dutch entry (abscis, m/f) has the gender marker m/f, which means "masculine with optional feminine" (distinct from masculine, feminine, or neuter). We surely need to display it this way, but how should we represent it in data: as m/f, or is there another common notation for mixed masculine-feminine genders like this one?

Also note that there may be more genders or similar categories. For example, in some Slavic languages (Czech, Slovene) nouns are further divided into animate and inanimate. I am not sure how important that is, but Wiktionary shows it next to the gender (see this).
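
To make the two options concrete, here is a minimal TypeScript sketch of one way to combine them: keep the marker exactly as printed for display, and store a normalized value next to it. All type and function names are hypothetical and come from neither Glossarist nor iev-data, and the reading of ж јд as feminine singular is the guess made above, not a verified fact.

```typescript
// Hypothetical types; none of these names come from Glossarist or IEV data.
type Gender = "masculine" | "feminine" | "neuter" | "masculine/feminine";
type GrammaticalNumber = "singular" | "plural";

interface GrammarInfo {
  raw: string; // marker exactly as printed in the source, kept for display
  gender?: Gender;
  number?: GrammaticalNumber;
}

// Per-language lookup tables (illustrative only). The Serbian entry encodes
// the assumption that "ж јд" means feminine singular.
const MARKERS: Record<string, Record<string, Omit<GrammarInfo, "raw">>> = {
  sr: { "ж јд": { gender: "feminine", number: "singular" } },
  nl: {
    "m": { gender: "masculine" },
    "f": { gender: "feminine" },
    "n": { gender: "neuter" },
    "m/f": { gender: "masculine/feminine" },
  },
};

function normalizeMarker(lang: string, raw: string): GrammarInfo {
  const known = MARKERS[lang]?.[raw.trim()];
  // Unknown markers pass through with only `raw` set, so nothing is lost.
  return { raw, ...known };
}
```

With this shape the interface can always print `raw`, while searches and filters use the normalized fields when they are present.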

skalee · 2021-01-06T06:59:59Z

Furthermore, some languages may have different sets of genders for singular and plural. One example is Polish, in which most linguists distinguish 5 genders: 3 for singular (masculine, feminine and neuter) and 2 for plural (virile and non-virile). It must be noted, though, that some linguists prefer different classifications; for example, Polish entries in IEV use an old-school approach with masculine and feminine genders in plural (102-03-13).

My conclusion is that it will be difficult to develop a discrete set of genders that works for every language and every project. Perhaps we should allow arbitrary genders, but I'm not sure if Glossarist Desktop supports that. Perhaps we should be even more flexible and describe terms with an array of arbitrary grammar classifiers rather than separate fields for gender, plurality, etc.
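
As a rough illustration of the "array of arbitrary grammar classifiers" idea, a TypeScript sketch could look like the following. The interface and function names are hypothetical; this is not how Glossarist Desktop models terms.

```typescript
// Hypothetical shape: an open-ended list of classifiers instead of fixed fields.
interface GrammarClassifier {
  category: string; // e.g. "gender", "number", "animacy"
  value: string; // e.g. "feminine", "non-virile", "inanimate"
}

interface Designation {
  term: string;
  language: string;
  grammar: GrammarClassifier[];
}

// Returns the value of the first classifier in the given category, if any.
function classifier(d: Designation, category: string): string | undefined {
  for (const c of d.grammar) {
    if (c.category === category) return c.value;
  }
  return undefined;
}

const example: Designation = {
  term: "abscis",
  language: "nl",
  grammar: [{ category: "gender", value: "masculine/feminine" }],
};
```

The trade-off is that nothing stops two projects from spelling the same category differently, so some shared vocabulary of category names would still be needed.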

strogonoff · 2021-01-09T00:26:57Z

For what it’s worth, here is how grammatical properties of nouns are typed in the Glossarist model:

https://github.com/glossarist/glossarist-desktop/blob/4105c7a2b2b1f5085c748af3ce0fdb27fd7e3149/src/models/concepts.ts#L188

Common and neuter genders are supported.
Grammatical number (plural/singular) and gender are separate.

Not sure if this helps and what you are trying to achieve, just saw this issue in my notifications.

skalee · 2021-01-09T10:26:19Z

What does "common" gender stand for? Is it kinda "not applicable" or "unspecified"? Or maybe it's kinda "masculine or feminine, but not neuter"?

> Not sure what you are trying to achieve.

I'm trying to achieve something more flexible, as there are languages which have more than three genders. For example, in the context of IEV, Dutch has m, f, n, and m/f.

strogonoff · 2021-01-09T11:09:38Z

I recommend using fully qualified gender names instead of one-letter abbreviations to reduce ambiguity.

For linguistic background of neuter/common see e.g. https://en.wikipedia.org/wiki/Grammatical_gender

skalee · 2021-01-09T12:05:53Z

> For linguistic background of neuter/common see e.g. https://en.wikipedia.org/wiki/Grammatical_gender

Thanks! It explains everything.

> I recommend using fully qualified gender names instead of one-letter abbreviations to reduce ambiguity.

I'm okay with either option.

Still, I'm not sure if a set of just four genders will be future-proof. For example, some languages distinguish animate and inanimate nouns, and most vocabularies display that next to the gender because it's useful for users. Moreover, some languages (e.g. Polish) distinguish different genders in singular (masculine, feminine, neuter) and in plural (virile, non-virile). These two extra plural genders can be internally represented as masculine and feminine, and that's probably technically correct, but at some point I guess we'll have to do some mapping in the interface in both Geolexica and Glossarist Desktop so that more appropriate wording is used.

That said, what you proposed should be enough in the context of IEV, and I'm okay with that.
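
The interface mapping mentioned above could be sketched in TypeScript roughly as follows. The four stored genders are inferred from this thread, and the override table and function name are hypothetical; neither Geolexica nor Glossarist Desktop is known to work this way.

```typescript
// Stored values: the four genders discussed in this thread (an assumption
// about the model, not copied from its source).
type StoredGender = "masculine" | "feminine" | "common" | "neuter";
type GrammaticalNumber = "singular" | "plural";

// Hypothetical per-language display overrides, keyed by "<number>:<gender>".
const DISPLAY_OVERRIDES: Record<string, Record<string, string>> = {
  pl: {
    "plural:masculine": "virile",
    "plural:feminine": "non-virile",
  },
};

// Falls back to the stored gender name when no override is defined.
function displayGender(
  lang: string,
  gender: StoredGender,
  grammaticalNumber: GrammaticalNumber,
): string {
  return DISPLAY_OVERRIDES[lang]?.[`${grammaticalNumber}:${gender}`] ?? gender;
}
```

The stored data stays within the fixed set, and only the label shown to users changes per language.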

strogonoff · 2021-01-10T04:41:03Z

>> For linguistic background of neuter/common see e.g. https://en.wikipedia.org/wiki/Grammatical_gender
>
> Thanks! It explains everything.
>
>> I recommend using fully qualified gender names instead of one-letter abbreviations to reduce ambiguity.
>
> I'm okay with either option.
>
> Still, I'm not sure if a set of just four genders will be future-proof. For example, some languages distinguish animate and inanimate nouns, and most vocabularies display that next to the gender because it's useful for users. Moreover, some languages (e.g. Polish) distinguish different genders in singular (masculine, feminine, neuter) and in plural (virile, non-virile). These two extra plural genders can be internally represented as masculine and feminine, and that's probably technically correct, but at some point I guess we'll have to do some mapping in the interface in both Geolexica and Glossarist Desktop so that more appropriate wording is used.
>
> That said, what you proposed should be enough in the context of IEV, and I'm okay with that.

An animate/inanimate property could be added if needed, but like you say, for the glossaries we deal with it may not be relevant.

Generally, in linguistics there are different competing ways of classifying verbal expressions. Control bodies can disagree with each other about which one to use. The classifications also keep evolving.

I think user-configurable versioned schemas (like what we are trying to do with the generic registry schema) are the way to go. Some vocabularies may need more finely detailed grammatical properties, but for others those properties may not matter.
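
To give a feel for what a user-configurable versioned schema might mean here, a minimal TypeScript sketch follows. It is purely illustrative: the `GrammarSchema` shape, the `ievSchemaV1` value and the `validate` function are hypothetical and unrelated to the actual generic registry schema.

```typescript
// Hypothetical: each vocabulary declares which grammatical properties it uses
// and which values are allowed, under an explicit version.
interface GrammarSchema {
  version: string;
  properties: Record<string, readonly string[]>; // property name -> allowed values
}

const ievSchemaV1: GrammarSchema = {
  version: "1",
  properties: {
    gender: ["masculine", "feminine", "neuter", "common"],
    number: ["singular", "plural"],
  },
};

// Returns a list of problems; an empty list means the entry fits the schema.
function validate(schema: GrammarSchema, grammar: Record<string, string>): string[] {
  const errors: string[] = [];
  for (const prop of Object.keys(grammar)) {
    const allowed = schema.properties[prop];
    if (allowed === undefined) {
      errors.push(`unknown property: ${prop}`);
    } else if (allowed.indexOf(grammar[prop]) === -1) {
      errors.push(`invalid ${prop}: ${grammar[prop]}`);
    }
  }
  return errors;
}
```

A vocabulary that needs animacy or virile/non-virile would publish a new schema version with the extra property or values, without forcing them on everyone else.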

skalee · 2021-01-10T06:38:39Z

> Generally, in linguistics there are different competing ways of classifying verbal expressions. Control bodies can disagree with each other about which one to use. The classifications also keep evolving.

Indeed, this is my primary concern too. But after your clarifications, what we adopted seems enough for now, at least I haven't found any outstanding case yet. Closing?

strogonoff · 2021-01-10T06:49:50Z

No objections from my side…


ronaldtse · 2021-01-11T04:04:07Z

> I think user-configurable versioned schemas (like what we are trying to do with the generic registry schema) are the way to go. Some vocabularies may need more finely detailed grammatical properties, but for others those properties may not matter.

Agree. It is difficult to get different control bodies to agree on an identical set of grammatical genders, so leaving it customizable is easiest for now.

skalee · 2021-01-11T09:51:32Z

BTW, what's the "generic registry schema"? I'm certainly not on the same page here.

strogonoff · 2021-01-11T09:56:20Z

It’s the data schema used by a registry editor GUI currently in development. It doesn’t clash with the concept model described here; they are different things.

skalee added the "question" label (further information is requested) on Jan 5, 2021

skalee assigned ronaldtse on Jan 5, 2021

skalee mentioned this issue on Jan 6, 2021: "Term attribute gender and plurality when not specified", glossarist/iev-data#81 (open)

skalee closed this as completed on Jan 10, 2021

skalee mentioned this issue on Jan 11, 2021: "IEV improvements: add "masculine/feminine" grammatical gender", #12 (open)
