Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Unshift" action can result in early parse termination and disconnected parses #7056

Closed
honnibal opened this issue Feb 14, 2021 · 1 comment · Fixed by #7057
Closed

"Unshift" action can result in early parse termination and disconnected parses #7056

honnibal opened this issue Feb 14, 2021 · 1 comment · Fixed by #7057
Labels
bug Bugs and behaviour differing from documentation feat / parser Feature: Dependency Parser

Comments

@honnibal
Copy link
Member

honnibal commented Feb 14, 2021

This bug results in the behaviour described by #7035.

For v3 I refactored the parser transition system and some of the StateClass internals. In doing this, I introduced a bug that could result in states incorrectly being marked as finished. Parses in this state would be disconnected, which would show up as sentence boundaries.

The details of this are tricky and technical, but for the record:

The parser's transition system is based on the arc-eager transitions, but it allows "non-monotonic" actions that can correct for previous mistakes. One such action is if the model predicts Reduce when that stack-item does not have a head, the stack-item is placed back in the buffer.

In order to improve the efficiency of the beam parsing, I changed the StateC data structure so that these rebuffered items are accumulated in a vector, but otherwise the position in the sentence is marked by an integer. This lets us copy states more efficiently than the previous structure that maintained a queue of all words in the buffer.

The bug that I introduced was to not check the state._rebuffered queue when checking whether we were at the end of a sentence. Therefore, if we reached the end of the sentence and there were words on the rebuffered queue, the state would say "Okay I have no words in the buffer and no words on the stack, we're finished!". But we weren't finished --- we hadn't processed the rebuffered words.

Getting to the end of the sentence and rebuffering words is part of the intended design of the transition system, especially for short sentences. Consider this example of the bug, using the transitions predicted by the en_core_web_sm model. We'll write the state like this, Stack | Buffer, and then the transition on the next line, followed by the resulting state.

| Severe pain , after trauma 
    Shift 
Severe | pain , after trauma
    L-amod (pain, Severe) 
| pain , after trauma
    Shift
pain | , after trauma
    Shift       (This is an error, R-pobj is correct here)
pain , | after trauma
    Shift       (This is also an error: we should Reduce (unshift) the "," so we can attach 'after' to 'pain')
pain , after | trauma
    R-pobj (after, trauma)
pain , after trauma |
    Reduce
pain , after |

At this state we have pain, , and after on the stack, and none of them have heads (as they were all moved onto the stack with Shift, rather than Right-Arc). We're in this state due to previous mistakes, but we want to teach the parser to recover from it. The non-monotonic "unshift" action lets us do this. The parser model correctly predicts the next action:

pain , after |
    Reduce (unshift)
pain , | after

This is where the bug occurrs. At this state, the parse state would be regarded as complete, and we'd have three sentences headed by the words that weren't attached yet. After the bug fix, the parser continues correctly:

pain , after |
    Reduce (unshift)
pain , | after
    Reduce (unshift)
pain | , after
    R-punct (pain, )
pain , | after
    Reduce
pain | after
    R-prep
pain after |
    Reduce
pain |
    Reduce
|
@honnibal honnibal added bug Bugs and behaviour differing from documentation feat / parser Feature: Dependency Parser labels Feb 14, 2021
honnibal added a commit that referenced this issue Feb 14, 2021
* Add test for #7035

* Update test for issue 7056

* Fix test

* Fix transitions method used in testing

* Fix state eol detection when rebuffer

* Clean up redundant fix
@github-actions
Copy link
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Bugs and behaviour differing from documentation feat / parser Feature: Dependency Parser
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant