Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix idempotency in to_concepticon #14

Merged
merged 2 commits into from
Feb 14, 2024

Conversation

johenglisch
Copy link
Contributor

I think I found the culprit.

@johenglisch
Copy link
Contributor Author

Btw, I've been wondering about this sort key function:

pysem/src/pysem/glosses.py

Lines 343 to 347 in 903b6f1

results = sorted(
{row: True for row in results},
key=lambda x: (x[-1], 1 if g.pos == x[-2] else 0, x[-3]),
reverse=True,
)[:max_matches]

It seems it compares the part of speech of a match to a variable called g. But from what I can tell g is just a temp variable used in the for loops right above. Is that intentional?

@LinguList
Copy link
Contributor

How bad, yes!

@LinguList
Copy link
Contributor

Can you provide a quick fix?

@LinguList
Copy link
Contributor

Sorry, I should of course do so.

In [1]: from pysem.glosses import MAPPINGS
In [2]: MAPPINGS["de"]["arm"]
Out[2]: [['1674', 'POOR', 7, 'adjective', 1]]

@LinguList
Copy link
Contributor

The rows, with the mappings, have the POS information in row[3]. So the correct order should be:

                key=lambda x: (x[-1], 1 if pos == x[-2] else 0, x[-3]),

pos is the part of speech that has been passed directly or not. If it is present and matches, it should be preferred, therefore we give 1.

@johenglisch
Copy link
Contributor Author

Okay, thanks, I implemented the fix.

['1674', 'POOR', 7, 'adjective', 1]

Sry for going on another tangent but what do the elements in this list actually mean? From context I can reconstruct:

0: Concepticon ID
1: Concepticon Gloss
2: ?
3: Part of Speech
4: ?

(I suspect I knew the answer to this at some point and then forgot.)

@LinguList
Copy link
Contributor

0: Concepticon ID
1: Concepticon Gloss
2: HOW OFTEN THIS WAS MAPPED TO THE GLOSS
3: Part of Speech
4: IF IT IS THE PRIMARY MAPPING (we use 1/0 in concepticon/mappings)

@LinguList
Copy link
Contributor

The argument is: the more often you map something like "you (sg)" to Concepticon THOU, the more comfortable you are to map it another time.

@LinguList LinguList merged commit 2af9d59 into lingpy:master Feb 14, 2024
@johenglisch johenglisch deleted the to-concepticon-idempotency branch February 14, 2024 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants