Segmenter error #127

Open
milesscherrer opened this issue May 9, 2017 · 2 comments

@milesscherrer

We are building a Swedish IEPY pipeline without a syntactic parser (none is available for Swedish). When running the active learning core, we get the following error in the hydrate function of models.py:

File "/home/ubuntu/.local/lib/python3.5/site-packages/iepy/data/models.py", line 388, in
self.syntactic_sentences = [doc.syntactic_sentences[s] for s in self.sentences]
IndexError: list index out of range

We traced the error back to our non-existent parsing output, which, as we understand it, is used by the segmenter. As we do not have a syntactic parser, is there a way to bypass segment-based labelling and do only document-based labelling? Or is there a way to bypass the syntactic parsing in the segmenter? Since syntactic parsing was only added in version 0.9.3, how did the segmenter work before that?
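
For anyone following along, the failure mode in the traceback can be reproduced without IEPY. This is a minimal sketch in plain Python, assuming the document simply has no stored parse trees; the names `doc_syntactic_sentences`, `segment_sentences` and `hydrate_syntactic_sentences` are hypothetical stand-ins, not IEPY objects.

```python
# Hypothetical stand-ins, not IEPY objects: a document for which no
# syntactic parser ever ran, and a segment that refers to two sentences.
doc_syntactic_sentences = []   # nothing was stored by a parsing step
segment_sentences = [0, 1]     # sentence indexes the segment refers to


def hydrate_syntactic_sentences(parsed, sentence_indexes):
    """Same shape as the list comprehension shown in the traceback."""
    return [parsed[s] for s in sentence_indexes]


try:
    hydrate_syntactic_sentences(doc_syntactic_sentences, segment_sentences)
    error_message = None
except IndexError as exc:
    # Indexing into the empty list fails on the very first lookup.
    error_message = str(exc)

print(error_message)  # list index out of range
```

If this is what happens in the real pipeline, any fix has to make sure there is one stored parse entry for every sentence index a segment can refer to.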

@jmansilla
Contributor

It seems that the easiest hack on your side is to add a dummy SyntacticParsing step to your preprocess pipeline. It should return a sequence of parse-tree strings (in a format that nltk.tree.Tree.fromstring can parse).

So in pseudocode your dummy syntactic parser should be doing:
syntactic_parsing = ["()" for sent in sentences]
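
The pseudocode above could be fleshed out roughly as follows. This is only a sketch: `dummy_syntactic_parsing` and the sample `sentences` are hypothetical, and how the step is registered in the preprocess pipeline and how its result is stored on the document depend on IEPY's own API, which is not shown in this thread.

```python
def dummy_syntactic_parsing(sentences):
    """Return one empty parse-tree string per sentence.

    Hypothetical helper: it does no real parsing, it only guarantees that
    every sentence has a placeholder entry to look up later.
    """
    return ["()" for _ in sentences]


# Illustrative input: two already tokenized Swedish sentences.
sentences = [
    ["Hej", "världen", "."],
    ["Detta", "är", "ett", "test", "."],
]

parsed = dummy_syntactic_parsing(sentences)
print(parsed)  # ['()', '()']
```

The important property is that the output has exactly as many entries as there are sentences, so that a per-sentence lookup like the one in the traceback has something to find.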

@milesscherrer
Author

milesscherrer commented May 9, 2017

Thanks, I'll try that out. I also looked into version 0.9.2 (before syntactic parsing was added) and saw that it does not have line 388 of models.py:

self.syntactic_sentences = [doc.syntactic_sentences[s] for s in self.sentences]

We tried removing the line, and the pipeline ran. However, the entity offsets were then wrong for most sentences (entities pointing to the previous token); only a couple of sentences at the beginning had correct offsets.

We are not sure whether the entity offset error is caused by our general customisation of IEPY or by the removal of line 388, but we are trying to track it down.
