The most frequent incorrectly assigned dependencies #1649

redstar12 · 2017-11-27T11:11:16Z

How can I find out which incorrectly assigned dependencies are the most frequent?

honnibal · 2017-11-28T15:28:15Z

1. Parse lots of text.
2. Make a frequency count of (head word, dependency relation, child word) triples.
3. Annotate the top 1000 or so, marking each according to whether it seems likely to be incorrect.
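Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, not code from spaCy itself. It assumes spaCy-style tokens that expose `.text`, `.dep_` and `.head`, and it skips tokens labelled `ROOT`, because a root token is its own head. The function name `count_dependency_triples` and the `lowercase` option are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def count_dependency_triples(docs: Iterable[Any], lowercase: bool = False) -> Counter:
    """Count (head word, dependency relation, child word) triples.

    Assumes spaCy-style tokens: each token has .text, .dep_ and .head
    (the head token). Tokens labelled "ROOT" are skipped, because a root
    token is its own head and carries no head -> child relation.
    """
    counts: Counter = Counter()
    for doc in docs:
        for token in doc:
            if token.dep_ == "ROOT":
                continue
            head, child = token.head.text, token.text
            if lowercase:
                # Merge case variants such as "I" and "i" into one entry.
                head, child = head.lower(), child.lower()
            counts[(head, token.dep_, child)] += 1
    return counts
```

With a loaded pipeline that includes a parser, something like `counts = count_dependency_triples(nlp.pipe(texts))` followed by `counts.most_common(1000)` would give the candidates for step 3, where `texts` stands for your own corpus. Keying on word forms keeps the entries easy to judge by eye; keying on lemmas instead would merge more variants into each entry.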

This is a good idea for improving the parser on your data. You might want to sign up for the beta of our annotation tool Prodigy: https://prodi.gy. Before Prodigy, what I did was keep a text file and set up simple macros that marked an entry correct or incorrect and moved to the next line, so each decision took a single keypress. This will work fine for your task.
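The text-file workflow can also be approximated with a small script. This is a hypothetical sketch, not the macro setup described above: it uses Python's `input()`, so each judgement needs a key plus Enter rather than a true single keypress. The `ask` parameter exists only so the prompt can be swapped out, for example in tests.

```python
from typing import Callable, Iterable, List, Tuple


def annotate(entries: Iterable[str],
             ask: Callable[[str], str] = input) -> List[Tuple[str, bool]]:
    """Show entries one at a time and record a correct/incorrect judgement.

    Type "y" for correct, "n" for incorrect, "q" to stop early; any other
    input re-prompts for the same entry.
    """
    judgements: List[Tuple[str, bool]] = []
    for entry in entries:
        key = ""
        while key not in ("y", "n", "q"):
            key = ask(f"{entry}  [y]es / [n]o / [q]uit: ").strip().lower()
        if key == "q":
            break
        judgements.append((entry, key == "y"))
    return judgements
```

Given a file with one triple per line (the name `triples.txt` is only an example), you could open it in a `with` block and call `annotate(line.rstrip("\n") for line in f)`, then write the returned pairs back out however suits you.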

redstar12 · 2017-11-28T18:54:16Z

Thank you very much for your reply! Prodigy is great! But there's no way to annotate syntactic dependencies at the moment, right? When will this option be available?

ines · 2017-11-28T22:28:09Z

@redstar12 Thanks! At the moment, no, but we're definitely planning a displaCy-style annotation interface as well. For this use case, you're only going to be looking at one dependency at a time and collecting simple, binary feedback. So you could just add a string version to your data, and create annotation tasks that look like this:

{"text": "like → nsubj → I", "data": {"head": "like", "dep": "nsubj", "child": "I", "freq": 40302}}
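A rough way to produce tasks in that shape from your frequency counts, assuming you feed Prodigy a JSON Lines file (one task per line). The helper names `make_task` and `write_tasks` are hypothetical; only the task layout comes from the example above. Sorting by frequency puts the most common triples first.

```python
import json
from typing import Iterable, Tuple


def make_task(head: str, dep: str, child: str, freq: int) -> dict:
    """Build one binary annotation task in the format shown above."""
    return {
        "text": f"{head} → {dep} → {child}",
        "data": {"head": head, "dep": dep, "child": child, "freq": freq},
    }


def write_tasks(counts: Iterable[Tuple[Tuple[str, str, str], int]], path: str) -> None:
    """Write tasks as JSON Lines, one task per line, most frequent first."""
    ordered = sorted(counts, key=lambda item: item[1], reverse=True)
    with open(path, "w", encoding="utf8") as f:
        for (head, dep, child), freq in ordered:
            # ensure_ascii=False keeps the arrows readable in the file.
            f.write(json.dumps(make_task(head, dep, child, freq), ensure_ascii=False) + "\n")
```

If your counts live in a `collections.Counter`, passing `counts.most_common(1000)` writes just the top entries.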

You can then use the mark recipe to collect annotations in order. Prodigy will show you the "text" in the web app and add an "answer" key to each annotation task in the dataset.
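To get back to the original question, the answered tasks can be filtered for the ones marked incorrect. This sketch assumes the dataset has been exported as JSON Lines (for example with Prodigy's `db-out` command), that "answer" holds "accept", "reject" or "ignore", and that rejecting a task means the dependency was judged incorrect. The function name is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def most_frequent_rejected(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return the rejected ("incorrect") triples, most frequent first.

    Assumes each line is one exported task in the format above, with an
    "answer" key of "accept", "reject" or "ignore".
    """
    rejected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        task = json.loads(line)
        if task.get("answer") == "reject":
            rejected.append((task["text"], task["data"]["freq"]))
    return sorted(rejected, key=lambda item: item[1], reverse=True)
```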

Btw, if you do want to generate displaCy visualizations for your extracted dependencies, for example so you can inspect them better, you can use the built-in displacy module with manual=True and pass in a dictionary of words and arcs – or in your case, only the one arc you care about. See here for more info.

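from spacy import displacy

# Manual mode: the dependency renderer reads both "text" and "tag" for each word
deps = {'words': [{'text': 'I', 'tag': 'PRP'}, {'text': 'like', 'tag': 'VBP'},
                  {'text': 'green', 'tag': 'JJ'}, {'text': 'apples', 'tag': 'NNS'}],
        # One arc between tokens 0 and 1; "dir": "left" puts the arrowhead on "I"
        'arcs': [{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'}]}
svg = displacy.render(deps, style='dep', manual=True)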
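Working out "start", "end" and "dir" by hand gets fiddly for many triples. The hypothetical helper below builds the one-arc dictionary from a head index and a child index. It follows the convention spaCy uses when it converts a parse for displaCy: "start" is the smaller token index, and "dir" is "left" when the child precedes its head. Tags default to empty strings, which is an assumption made only so each word carries a "tag" key.

```python
from typing import List, Optional


def single_arc(words: List[str], head: int, child: int, label: str,
               tags: Optional[List[str]] = None) -> dict:
    """Build a displaCy manual-mode dict containing one dependency arc."""
    tags = tags or [""] * len(words)
    return {
        "words": [{"text": w, "tag": t} for w, t in zip(words, tags)],
        "arcs": [{
            "start": min(head, child),  # smaller index first
            "end": max(head, child),
            "label": label,
            # The arrow points at the child, so "left" when it comes first.
            "dir": "left" if child < head else "right",
        }],
    }
```

For example, `displacy.render(single_arc(["I", "like", "green", "apples"], head=1, child=0, label="nsubj"), style='dep', manual=True)` should draw the same arc as the example above, without the part-of-speech tags.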

lock · 2018-05-08T05:55:37Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the usage General spaCy usage label Nov 28, 2017

ines closed this as completed Dec 6, 2017

lock bot locked as resolved and limited conversation to collaborators May 8, 2018
