The most frequent incorrectly assigned dependencies #1649

Closed
redstar12 opened this issue Nov 27, 2017 · 4 comments
Labels
usage General spaCy usage


@redstar12

How can I find out which incorrectly assigned dependencies are the most frequent?

@honnibal
Member

  1. Parse lots of text

  2. Make a frequency count of (head word, dependency relation, child word) triples

  3. Annotate the top 1000 or so, marking them according to whether they seem likely to be incorrect.

This is a good idea for improving the parser on your data. You might want to sign up for the beta of our annotation tool Prodigy: https://prodi.gy. What I did before Prodigy was to keep a text file and simple macros that would mark an entry correct or incorrect and move to the next line, so that each decision took a single keypress. This will work fine for your task.
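
Steps 1 and 2 above could be sketched roughly as follows. This is a minimal illustration, not an official recipe: the helper names `count_triples` and `top_triples` are made up for this example, and the model name `en_core_web_sm` is an assumption (use whichever model you have installed). The counting function only relies on the `.text`, `.dep_`, `.i` and `.head` attributes that spaCy tokens provide.

```python
from collections import Counter


def count_triples(docs):
    """Count (head word, dependency relation, child word) triples.

    `docs` is any iterable of parsed documents whose tokens expose
    `.text`, `.dep_`, `.i` and `.head`, as spaCy tokens do.
    """
    counts = Counter()
    for doc in docs:
        for token in doc:
            # In spaCy the root token is its own head, so it has no incoming arc.
            if token.head.i == token.i:
                continue
            counts[(token.head.text, token.dep_, token.text)] += 1
    return counts


def top_triples(texts, n=1000, model="en_core_web_sm"):
    """Parse `texts` and return the n most frequent triples with their counts.

    The default model name is an assumption for this sketch.
    """
    import spacy  # imported lazily so count_triples can be used without spaCy

    nlp = spacy.load(model)
    return count_triples(nlp.pipe(texts)).most_common(n)
```

Calling `top_triples(my_texts)` would give a list of `((head, dep, child), freq)` pairs, most frequent first, which is the list to annotate in step 3.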

@honnibal honnibal added the usage General spaCy usage label Nov 28, 2017
@redstar12
Author

Thank you very much for your reply! Prodigy is great! But there is no possibility to annotate syntactic dependencies at the moment, right? When will this option be available?

@ines
Member

ines commented Nov 28, 2017

@redstar12 Thanks! At the moment, no, but we're definitely planning a displaCy-style annotation interface as well. For this use case, you're only going to be looking at one dependency at a time and collecting simple, binary feedback. So you could just add a string version to your data, and create annotation tasks that look like this:

{"text": "like → nsubj → I", "data": {"head": "like", "dep": "nsubj", "child": "I", "freq": 40302}}
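
One way to generate such tasks from a list of `((head, dep, child), freq)` pairs is sketched below. The function names and the choice of writing one JSON object per line are assumptions made for illustration; only the `"text"` and `"data"` layout comes from the example above.

```python
import json


def make_tasks(triple_counts):
    """Turn ((head, dep, child), freq) pairs into annotation task dicts."""
    tasks = []
    for (head, dep, child), freq in triple_counts:
        tasks.append({
            # Human-readable string that the annotator will see.
            "text": f"{head} → {dep} → {child}",
            # Structured copy of the same information for later analysis.
            "data": {"head": head, "dep": dep, "child": child, "freq": freq},
        })
    return tasks


def write_jsonl(tasks, path):
    """Write one JSON object per line to `path` (the file name is up to you)."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")


tasks = make_tasks([(("like", "nsubj", "I"), 40302)])
```

Keeping the pairs sorted by frequency before calling `make_tasks` means the most common triples are annotated first.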

You can then use the mark recipe to collect annotations in order. Prodigy will show you the "text" in the web app, and add an "answer" key to your annotation task in the dataset.
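
Once the annotations are collected, the original question (which incorrect dependencies are most frequent) comes down to a tally. The sketch below assumes each annotation is a task dict shaped like the example above with an added `"answer"` key, and that `"reject"` was used to mean "looks incorrect". The function name is hypothetical.

```python
from collections import Counter


def most_frequent_errors(annotations, n=10):
    """Rank rejected triples by how often they occurred in the parsed text."""
    errors = Counter()
    for eg in annotations:
        # Only triples the annotator marked as wrong are counted.
        if eg.get("answer") == "reject":
            d = eg["data"]
            errors[(d["head"], d["dep"], d["child"])] += d["freq"]
    return errors.most_common(n)
```

Weighting by `freq` rather than counting each rejected task once keeps the ranking tied to how often the parser actually produced the arc.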

Btw, if you do want to generate displaCy visualizations for your extracted dependencies, for example so you can inspect them better, you can use the built-in displacy module with manual=True and pass in a dictionary of words and arcs – or in your case, only the one arc you care about. See here for more info.

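from spacy import displacy

# manual=True skips parsing; each word needs 'text' and 'tag', and arcs index into the word list
deps = {'words': [{'text': 'I', 'tag': 'PRON'}, {'text': 'like', 'tag': 'VERB'}, {'text': 'green', 'tag': 'ADJ'}, {'text': 'apples', 'tag': 'NOUN'}],
        'arcs': [{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'}]}
svg = displacy.render(deps, style='dep', manual=True)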
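
If you want to build that dictionary for many extracted arcs, a small helper along these lines may be convenient. It is only a sketch: `single_arc` is a hypothetical name, the empty `'tag'` values assume the renderer expects a tag key on every word, and the direction rule (`'left'` when the child precedes the head) is inferred from the example above.

```python
def single_arc(words, head_i, child_i, label):
    """Build displaCy manual-mode input showing one arc between two words.

    `words` is a list of strings; `head_i` and `child_i` are positions in it.
    """
    if head_i == child_i:
        raise ValueError("head and child must be different words")
    return {
        # Empty tags: only the words and the single arc are of interest here.
        "words": [{"text": w, "tag": ""} for w in words],
        "arcs": [{
            # start is always the leftmost of the two positions.
            "start": min(head_i, child_i),
            "end": max(head_i, child_i),
            "label": label,
            # The arrow points at the child: 'left' if the child comes first.
            "dir": "left" if child_i < head_i else "right",
        }],
    }


deps = single_arc(["I", "like", "green", "apples"], head_i=1, child_i=0, label="nsubj")
```

The result can be passed to `displacy.render(deps, style='dep', manual=True)` in the same way as the hand-written dictionary.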

@ines ines closed this as completed Dec 6, 2017
@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018