Transition-based dependency parsing algorithms #694

Closed
stevenbird opened this issue Jul 8, 2014 · 16 comments

@stevenbird
Member

Implement Nivre's arc-eager and arc-standard algorithms.
http://aclweb.org/anthology/J08-4003.pdf
(Note that there have been some minor improvements made to the algorithms since these publications; please contact the author for details.)
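For readers new to these transition systems, the following is a minimal, unlabeled sketch of the four arc-eager transitions over a (stack, buffer, arcs) configuration. It is illustrative only (preconditions are omitted) and is not a proposal for NLTK's eventual API:

```python
# Minimal sketch of the arc-eager transition system (Nivre 2008).
# A configuration is (stack, buffer, arcs); words are integer indices
# and arcs are (head, dependent) pairs. Preconditions are omitted.

def shift(stack, buffer, arcs):
    # Move the front of the buffer onto the stack.
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs):
    # The buffer front becomes the head of the stack top, which is
    # then popped (it can take no further dependents).
    arcs.append((buffer[0], stack.pop()))

def right_arc(stack, buffer, arcs):
    # The stack top becomes the head of the buffer front, which is
    # then shifted onto the stack (it may still take dependents).
    arcs.append((stack[-1], buffer[0]))
    stack.append(buffer.pop(0))

def reduce_(stack, buffer, arcs):  # underscore avoids the builtin
    # Pop the stack top once it has already received its head.
    stack.pop()
```

Arc-standard differs mainly in that both arc transitions operate on the top two stack items (popping the dependent), and there is no REDUCE.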

@charnugagoo

I strongly recommend implementing this algorithm, or a shift-reduce parser, since these linear-time parsing algorithms can be very fast compared to bottom-up parsers. In some situations, say parsing a large data set, this would be very useful. And in the academic literature, these linear parsers can be almost as good (in accuracy) as bottom-up parsers.

I really want to work on this, but I don't have much after-work time. I wish I could help, not as a coder, but as a reviewer or at least in discussion.

Thanks!

@stnatic

stnatic commented Nov 25, 2014

I would be interested in contributing. I think that the biggest question is how to implement the classifier. We could either do the entire training process in Python or start by importing a pre-trained model, e.g. from MaltParser.

@charnugagoo we could start a thread on the nltk-dev group and hear some feedback from the authors of NLTK

@stevenbird
Member Author

Our philosophy in NLTK is to provide both, i.e. pure Python implementation(s) that have educational value, and interfaces to optimised third-party implementations. If you can pick a pre-trained model that you would like included with the NLTK-data collection, I'm happy to take care of adding that.

@charnugagoo

@stnatic Personally, I recommend the perceptron, which worked well when I was writing my own shift-reduce parser. An averaged perceptron is very easy to implement. I found some papers discussing perceptron + arc-eager, so I think it would work in this case.
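For a sense of how little code that takes, here is a rough sketch of a multi-class averaged perceptron over string-keyed features; the class and method names are invented, and the averaging uses the standard running-sum trick:

```python
from collections import defaultdict

class AveragedPerceptron:
    """Sketch of a multi-class averaged perceptron; not an NLTK API."""

    def __init__(self, classes):
        self.classes = classes
        self.weights = defaultdict(lambda: defaultdict(float))  # feature -> class -> weight
        self.totals = defaultdict(lambda: defaultdict(float))   # running sums for averaging
        self.t = 0  # number of updates seen so far

    def predict(self, features):
        scores = dict.fromkeys(self.classes, 0.0)
        for f in features:
            for c, w in self.weights[f].items():
                scores[c] += w
        return max(scores, key=scores.get)

    def update(self, truth, guess, features):
        self.t += 1
        if truth == guess:
            return
        for f in features:
            for c, delta in ((truth, 1.0), (guess, -1.0)):
                self.weights[f][c] += delta
                self.totals[f][c] += self.t * delta

    def average(self):
        # Standard trick: averaged weight is w minus (sum of t * delta) / T.
        for f, cw in self.weights.items():
            for c in cw:
                cw[c] -= self.totals[f][c] / max(self.t, 1)
```

Each `update` call would correspond to one transition decision during training.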

I agree. Could you send out an email to nltk-dev, please?

@stevenbird stevenbird added this to the late-December milestone Dec 9, 2014
@stnatic

stnatic commented Dec 12, 2014

I had almost no time during the last 2 weeks so I apologize for not starting a mailing group thread.

I see that @longdt219 is now assigned to this issue. It seems to be a rather big task, though, assuming that the classifier would be written in pure Python. Is there still any room for contribution? If so, please let me know.

@longdt219
Contributor

Hi @stnatic, I'm a bit confused here: why should we re-implement the classifier, given that NLTK already provides many classifiers, e.g. SVM, maxent, etc.?

@stnatic

stnatic commented Dec 16, 2014

Hi @longdt219, I didn't mean implementing SVM from the ground up; rather, I meant that converting CoNLL treebank data into training data for the learning black box might take a solid amount of programming effort.

I think that the standard feature model used for arc-eager is the one described in http://www.maltparser.org/userguide.html#featurespec; I've read some papers that also assume that model.

I believe that converting treebank data into a set of learning examples requires solving the following problem: given a feature vector as above, find the operation (e.g. LEFT-ARC with label OBJ) that leads to a correct parse.
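As a concrete illustration of that problem, here is a simplified, unlabeled static oracle for arc-eager: given the gold head of every word, it names the transition to take from the current configuration (all names are invented for illustration):

```python
def static_oracle(stack, buffer, gold_heads, built_arcs):
    """Choose the arc-eager transition consistent with the gold tree.

    gold_heads[i] is the gold head of word i; built_arcs is the set of
    (head, dependent) pairs constructed so far. Unlabeled sketch only;
    a labelled oracle would also return the arc label.
    """
    if stack and buffer:
        s, b = stack[-1], buffer[0]
        if gold_heads[s] == b:
            return 'LEFT-ARC'
        if gold_heads[b] == s:
            return 'RIGHT-ARC'
        # REDUCE once the stack top has its head and no remaining
        # buffer word still needs it as a head.
        if (gold_heads[s], s) in built_arcs and all(gold_heads[w] != s for w in buffer):
            return 'REDUCE'
    return 'SHIFT'
```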

If there is any way in which I can assist you on this, then let me know. I am greatly interested in contributing to this particular issue.

@longdt219
Contributor

Hi @stevenbird
In the original implementation, they use libsvm. Could we include libsvm in NLTK, or should we substitute the maxent classifier that is already implemented?

@stevenbird
Member Author

It is fine to have a libsvm dependency. The only question is whether we need to redistribute the svm.py wrapper, or else use pip or some other package management tool to install it (cf. https://github.com/Salinger/libsvm-python).

@jnothman
Contributor

Another alternative is to rely on scikit-learn's libsvm wrappers.


@stevenbird
Member Author

Thanks @jnothman – that would be better, since we already have a dependency on scikit-learn.
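For the record, a hypothetical sketch of that route: per-configuration feature dicts vectorised with DictVectorizer and fed to sklearn.svm.SVC. The feature names and toy data here are invented:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# Invented transition-classification instances: one feature dict per
# parser configuration, one transition label each.
X_dicts = [
    {'stk0_word': 'saw', 'stk0_pos': 'VBD', 'buf0_word': 'dog', 'buf0_pos': 'NN'},
    {'stk0_word': 'dog', 'stk0_pos': 'NN', 'buf0_word': '.', 'buf0_pos': '.'},
]
y = ['RIGHT-ARC', 'REDUCE']

vec = DictVectorizer(sparse=True)
X = vec.fit_transform(X_dicts)       # sparse one-hot feature matrix

clf = SVC(kernel='poly', degree=2)   # MaltParser-style polynomial kernel
clf.fit(X, y)
print(clf.predict(vec.transform(X_dicts[:1])))
```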

@longdt219
Contributor

Hi @stevenbird and all,
Here are some results from our arc-eager and arc-standard implementations compared with MaltParser on part of the English PTB, with:

  • training data: 181 sentences (165 projective, 16 non-projective)
  • testing data: 214 sentences

Our implementation:
A. Arc-standard (UAS, LAS)

  • Training data: (0.961, 0.956)
  • Test data: (0.737, 0.708)

B. Arc-eager (UAS, LAS)

  • Training data: (0.954, 0.948)
  • Test data: (0.726, 0.696)

MaltParser:
A. Arc-standard (UAS, LAS)

  • Test data: (0.756, 0.725)

B. Arc-eager (UAS, LAS)

  • Test data: (0.759, 0.726)

The discrepancies between our implementation and MaltParser come down to:

  • Different feature sets: for simplicity we only use basic features.
  • Learning algorithm: our implementation is slower, mainly because the scikit-learn libsvm wrapper hasn't implemented SVC.decision_function for sparse matrices. I had to use SVC.predict_proba, which relies on cross-validation (see the sketch below).
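To make that second point concrete, here is a small sketch of the two prediction paths; probability=True makes libsvm fit Platt scaling with internal cross-validation at training time, which is the slow part. The toy data is invented:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

# Toy sparse data standing in for vectorised parser configurations.
rng = np.random.RandomState(0)
X = csr_matrix(rng.rand(40, 10))
y = rng.randint(0, 3, size=40)  # three pretend transitions

# Slow path: probability=True triggers Platt scaling, fitted via
# libsvm's internal cross-validation during training.
slow = SVC(kernel='poly', degree=2, probability=True).fit(X, y)
probs = slow.predict_proba(X)

# Raw margins skip the calibration entirely; at the time of this
# thread SVC.decision_function rejected sparse input, so one
# workaround was to densify (newer versions accept sparse X).
Xd = X.toarray()
fast = SVC(kernel='poly', degree=2).fit(Xd, y)
margins = fast.decision_function(Xd)
```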

Some implementation points I would like to discuss:
(1) Currently the code only accepts the CoNLL shared-task data format for both training and testing. What else should I implement?
(2) Should I add more complicated features for better performance?
(3) Should we focus on improving the training speed by:

  • using a different wrapper for libsvm, or
  • separating the training data into bins based on some features and training a separate classifier per bin? (See the sketch below.)
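The binning idea in (3) is essentially MaltParser's input-split strategy. A hypothetical sketch, splitting on a single feature (the name 'buf0_pos' is invented) and training one classifier per bin:

```python
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def train_binned(instances, split_feature='buf0_pos'):
    """Train one small SVC per value of the splitting feature.

    `instances` is a list of (feature_dict, transition_label) pairs.
    Sketch only; a real implementation would also need a fallback
    model for unseen or single-class bins.
    """
    bins = defaultdict(list)
    for feats, label in instances:
        bins[feats.get(split_feature)].append((feats, label))

    models = {}
    for key, subset in bins.items():
        labels = [lbl for _, lbl in subset]
        if len(set(labels)) < 2:
            continue  # SVC needs at least two classes per bin
        vec = DictVectorizer(sparse=True)
        X = vec.fit_transform([f for f, _ in subset])
        models[key] = (vec, SVC(kernel='poly', degree=2).fit(X, labels))
    return models
```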

@syllog1sm

Hi,

You may be interested in my write-up on shift-reduce dependency parsing here: https://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/ . Note that the post describes the arc-hybrid transition system, just because it was slightly easier to describe.

The implementation, https://gist.github.com/syllog1sm/10343947 , is BSD licensed, so you can adapt it for use in NLTK directly if you like.

I think there are a few things worth emphasising, as far as performance goes (both efficiency and accuracy).

First, it's really much better to use Averaged Perceptron, or some other method which can be trained in an error-driven way. You don't want to do batch learning. Batch learning makes it difficult to train from negative examples effectively, and this makes a very big difference to accuracy.

Second, when you train the parser, you should really use the Goldberg and Nivre (2012) "dynamic oracle" strategy. This means that you follow the moves the parser predicts during training, rather than forcing the parser to follow only the gold-standard derivations. Again, this makes your training data match the run-time conditions much more closely, which makes it much more accurate.
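Schematically, error-driven training with a dynamic oracle looks like the loop below; every name here is a placeholder rather than a real API:

```python
def train_on_sentence(parser, words, gold_heads):
    """One error-exploration pass over a sentence (schematic sketch).

    The parser follows its own predictions, and the dynamic oracle of
    Goldberg and Nivre (2012) says which transitions are still optimal
    from the current, possibly already-incorrect, configuration.
    """
    config = parser.initial_config(words)
    while not config.is_terminal():
        feats = parser.extract_features(config)
        valid = parser.valid_transitions(config)
        guess = parser.model.best(feats, valid)            # parser's own choice
        zero_cost = parser.dynamic_oracle(config, gold_heads)
        if guess not in zero_cost:
            best = parser.model.best(feats, zero_cost)
            parser.model.update(truth=best, guess=guess, features=feats)
        config = parser.apply(guess, config)               # follow the prediction
```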

Third, I've not taken any trouble to optimise the perceptron in that implementation, since I wanted to keep the code concise for the tutorial. If you map all your strings to integers and use numpy arrays, you should be able to make the parser much faster.
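On that third point, a small sketch of the interning idea: feature strings become row indices into a numpy weight matrix, so scoring is array indexing rather than nested dict lookups (the sizes and names are invented):

```python
import numpy as np

class FeatureInterner:
    """Map feature strings to stable integer ids (illustrative only)."""

    def __init__(self):
        self.ids = {}

    def __call__(self, feature):
        return self.ids.setdefault(feature, len(self.ids))

intern = FeatureInterner()
N_FEATURES, N_CLASSES = 1000000, 4            # invented sizes
weights = np.zeros((N_FEATURES, N_CLASSES))   # dense weight matrix

def score(feature_strings):
    # Summing a handful of numpy rows is far cheaper than walking
    # per-feature Python dicts on every parser transition.
    rows = [intern(f) for f in feature_strings]
    return weights[rows].sum(axis=0)
```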

@longdt219
Contributor

Thank you @syllog1sm. Those are very detailed comments.

@stevenbird stevenbird modified the milestones: 3.0.2, late-December Jan 16, 2015
stevenbird added a commit that referenced this issue Jan 18, 2015
@stevenbird
Member Author

Thanks @syllog1sm – our goal here has just been to implement the published algorithms, for educational purposes. Further extensions are welcome, but beyond the scope of what @longdt219 is able to do. Feel free to submit a pull request any time.

@stevenbird
Member Author

I've updated our roadmap for dependency parser support, including Honnibal's shift-reduce parser, cf.
https://github.com/nltk/nltk/wiki/Dependency-Parsing
