Conversation

@svlandeg
Contributor

@svlandeg svlandeg commented Jul 4, 2019

Attempt to fix spaCy issue explosion/spaCy#3880. The error in that issue is raised in ops.pyx by the line assert len(X) == 0 (https://github.com/explosion/thinc/blob/master/thinc/neural/ops.pyx#L138), and it happens when the tagger model is run in a call to nlp.pipe() where the last element is an empty doc.

I think this happens because the flatten function adds one additional bit of padding at the end (https://github.com/explosion/thinc/blob/master/thinc/neural/ops.pyx#L119). unflatten removes padding only when length != 0 (https://github.com/explosion/thinc/blob/master/thinc/neural/ops.pyx#L136), and for the trailing padding that check reads the variable length after its for loop has finished, so it holds the last value in lengths. When the last doc is empty, that value is 0, the trailing padding is not stripped away, and the assertion fails.
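
To make the suspected failure mode concrete, here is a simplified sketch using plain Python lists. It is not the actual thinc code: the two functions only imitate the padding logic described above, and the strip_trailing_always flag is a hypothetical switch added purely for illustration.

```python
def flatten(seqs, pad=1):
    """Concatenate the non-empty sequences, with `pad` zeros before each
    one and `pad` extra zeros after the last."""
    flat = []
    for seq in seqs:
        if len(seq) == 0:
            continue  # empty sequences add neither data nor padding
        flat.extend([0] * pad)
        flat.extend(seq)
    flat.extend([0] * pad)  # the one additional bit of trailing padding
    return flat


def unflatten(flat, lengths, pad=1, strip_trailing_always=True):
    """Split `flat` back into sequences of the given lengths.

    `strip_trailing_always` is a hypothetical flag for this sketch only:
    False mimics the behaviour described above, True strips the trailing
    padding unconditionally (one way the mismatch could be resolved).
    """
    out = []
    length = 0
    for length in lengths:
        if pad >= 1 and length != 0:
            flat = flat[pad:]  # drop the padding in front of this sequence
        out.append(flat[:length])
        flat = flat[length:]
    # After the loop, `length` still holds the last entry of `lengths`.
    if pad >= 1 and (strip_trailing_always or length != 0):
        flat = flat[pad:]  # drop the trailing padding
    assert len(flat) == 0
    return out


docs = [[1, 2], [3], []]  # the empty "doc" comes last
lengths = [len(d) for d in docs]
assert unflatten(flatten(docs), lengths) == docs
```

With strip_trailing_always=False the same call trips the final assertion, because one padding element is left over. An empty sequence in the middle of the batch round-trips fine in both modes, which would match the error only showing up when the empty doc is last.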

I don't have a proper way of testing this yet, so I'm definitely not sure about this.

@honnibal
Member

honnibal commented Jul 5, 2019

This makes a lot of sense! I've hit this error as well recently with empty docs. It makes perfect sense that the error arises when the doc is last, thanks. It should be easy to add a test for this in the unit tests.

@honnibal honnibal merged commit 32d213b into explosion:master Jul 10, 2019
@svlandeg svlandeg deleted the bugfix/unflatten branch July 10, 2019 11:16
@svlandeg svlandeg restored the bugfix/unflatten branch October 1, 2019 11:57
@svlandeg svlandeg deleted the bugfix/unflatten branch October 1, 2019 12:10
2 participants