Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makecldf fails #3

Closed
SimonGreenhill opened this issue Oct 6, 2019 · 9 comments
Closed

makecldf fails #3

SimonGreenhill opened this issue Oct 6, 2019 · 9 comments
Assignees

Comments

@SimonGreenhill
Copy link
Contributor

... on this entry: word 57 = "?" here.

...which means that the following is passed to add_form:

{'Language_ID': '661', 'Parameter_ID': '57', 'Value': '?', 'Source': ['15258'], 'Cognacy': None, 'Comment': "er-jai 'be married (of woman)' p.78", 'Loan': False, 'Local_ID': '173141', 'Form': None}

... and then we fail with

Traceback (most recent call last):
  File "/Users/simon/projects/lexibank2018/env/bin/lexibank", line 11, in <module>
    load_entry_point('pylexibank', 'console_scripts', 'lexibank')()
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/__main__.py", line 139, in main
    sys.exit(parser.main())
  File "/Users/simon/projects/lexibank2018/env/lib/python3.7/site-packages/clldutils-2.8.0-py3.7.egg/clldutils/clilib.py", line 110, in main
    catch_all=catch_all, parsed_args=args)
  File "/Users/simon/projects/lexibank2018/env/lib/python3.7/site-packages/clldutils-2.8.0-py3.7.egg/clldutils/clilib.py", line 82, in main
    self.commands[args.command](args)
  File "/Users/simon/projects/lexibank2018/env/lib/python3.7/site-packages/clldutils-2.8.0-py3.7.egg/clldutils/clilib.py", line 35, in __call__
    return self.func(args)
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/commands/misc.py", line 149, in makecldf
    with_dataset(args, Dataset._install)
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/commands/util.py", line 28, in with_dataset
    func(get_dataset(args, dataset.id), **vars(args))
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/dataset.py", line 437, in _install
    if self.cmd_install(**kw) == NOOP:
  File "/Users/simon/projects/lexibank2018/abvd/lexibank_abvd.py", line 39, in cmd_install
    source=[b for b in bibs if b.id in refs.get(wl.id, [])]
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/providers/abvd.py", line 231, in to_cldf
    Local_ID=entry.id,
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/cldf.py", line 222, in add_lexemes
    lexemes = self.add_forms_from_value(split_value=split_value, **kw)
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/cldf.py", line 208, in add_forms_from_value
    kw_ = self.add_form(with_morphemes=with_morphemes, **kw_)
  File "/Users/simon/projects/lexibank2018/pylexibank/src/pylexibank/cldf.py", line 157, in add_form
    raise ValueError('language, concept, value, and form '
ValueError: language, concept, value, and form must be supplied

What's the best way to fix this? Should add_form catch this? or should this be caught before getting to add_form?

@LinguList
Copy link
Contributor

This is the new behavior!

@LinguList
Copy link
Contributor

It is there to catch explicitly these errors.

@LinguList
Copy link
Contributor

Go to line 222 in pylexibank/providers/abvd.py and change add_lexemes to add_forms_from_value.

First fix, as add_lexemes is deprecated.

@LinguList
Copy link
Contributor

Ah, and if you think that the ? is "normal" behavior, you can fix this also from within there, and add a line that checks if this form actually is a question makr or not. I.e., you check entry.name and see if it is a valid entry.

@LinguList
Copy link
Contributor

Ah, I get it now, sorry for not reading properly before. There is something wrong, I guess, as the entry should not pass the clean_form command, which is called after splitting it, and yields an empty form, which is then NOT passed to add_form. And clean_form reacts per default specifically on ? as a character here.

@LinguList
Copy link
Contributor

LinguList commented Oct 6, 2019

So the fix we need is in pylexibank code, dataset.py:

    def split_forms(self, item, value):
        if value in self.lexemes:  # pragma: no cover
            self.log.debug('overriding via lexemes.csv: %r -> %r' % (value, self.lexemes[value]))
        value = self.lexemes.get(value, value)
        return [self.clean_form(item, form)
                for form in split_text_with_context(value, separators='/,;')]

needs to be modified to:

    def split_forms(self, item, value):
        if value in self.lexemes:  # pragma: no cover
            self.log.debug('overriding via lexemes.csv: %r -> %r' % (value, self.lexemes[value]))
        value = self.lexemes.get(value, value)
        forms = [self.clean_form(item, form)
                for form in split_text_with_context(value, separators='/,;')]
        return [f for f in form if f]

Or similar. As this will only return forms that are not None, and this is crashing the code by now.

@LinguList
Copy link
Contributor

I'd say, this is a good example, why it was good to modify the behavior of the add_lexemes to being more transparent.

@LinguList
Copy link
Contributor

Just linked this as a bug in pylexibank.

@SimonGreenhill
Copy link
Contributor Author

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants