Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difficult to override lexeme ID #262

Open
SimonGreenhill opened this issue Sep 14, 2022 · 4 comments
Open

Difficult to override lexeme ID #262

SimonGreenhill opened this issue Sep 14, 2022 · 4 comments

Comments

@SimonGreenhill
Copy link
Contributor

SimonGreenhill commented Sep 14, 2022

I'm creating a dataset at the moment from a relational database and I'd like the lexeme IDs to be the primary keys from the database. I assumed that if I specified ID=x in the add_forms* call, then the ID would be kept, however, the code always calls self.lexeme_id:

def lexeme_id(self, kw):
self._count[(kw['Language_ID'], kw['Parameter_ID'])] += 1
return '{0}-{1}-{2}'.format(
kw['Language_ID'],
kw['Parameter_ID'],
self._count[(kw['Language_ID'], kw['Parameter_ID'])])

Can we make lexeme_id be ignored if ID is given?

@LinguList
Copy link
Contributor

LinguList commented Sep 14, 2022 via email

@LinguList
Copy link
Contributor

LinguList commented Sep 14, 2022 via email

@SimonGreenhill
Copy link
Contributor Author

Yes -- I'm using Local_ID but this feels like a workaround. If someone is explicitly specifying an ID then we should respect that.

At the very least silently replacing given ID's with a generated one is surprising and unexpected which could lead to glitches (I only spotted this when I was analysing the CLDF data sometime down the track and could not figure out why the list of lexemes I needed was wrong). It's worse because it's only the lexemes (add_form*) that do this and languages, parameters, etc are quite happy to have my manually curated IDs.

@xrotwang
Copy link
Contributor

I think I agree that the call to lexeme_id should only happen in add_forms_from_value, because it's the potential splitting of values into several forms that made this functionality necessary in the first place. (Maybe add_form_* should have something like kw.setdefault('ID', self.lexeme_id(...)) for backwards compat.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants