Stabilized MaltParser API #944

alvations · 2015-04-09T02:03:58Z

From #943,

The old MaltParser wrapper relied on several os.environ variables to find the binary, and then called the jar file using the Java classpath from the environment.

The new API only requires the directory where the user installed MaltParser. It finds the jar files with os.walk and calls MaltParser with the full classpath and org.maltparser.Malt instead of -jar.
Also, generate_malt_command makes it easier to update the API to keep up with MaltParser.
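A minimal sketch of the approach described above, for readers who want to see the idea in isolation. The function names `find_jars` and `build_malt_command` and the file names in the usage note are hypothetical and are not the code in this PR; the sketch only assumes that MaltParser is started through the `org.maltparser.Malt` main class with its `-c`, `-i`, `-o` and `-m` options.

```python
import os


def find_jars(parser_dirname):
    """Collect every .jar file below parser_dirname using os.walk."""
    jars = []
    for root, _dirs, files in os.walk(parser_dirname):
        for name in files:
            if name.endswith('.jar'):
                jars.append(os.path.join(root, name))
    # Sort so the classpath is the same on every run.
    return sorted(jars)


def build_malt_command(jars, model, infile, outfile, mode='parse'):
    """Build the java command line as a list (hypothetical helper)."""
    # os.pathsep is ':' on POSIX and ';' on Windows, which matches
    # the separator Java expects in a classpath.
    cmd = ['java', '-cp', os.pathsep.join(jars), 'org.maltparser.Malt']
    cmd += ['-c', model, '-i', infile]
    if mode == 'parse':
        # An output file is only passed when parsing.
        cmd += ['-o', outfile]
    cmd += ['-m', mode]
    return cmd
```

Calling `build_malt_command(find_jars(indir), 'engmalt.linear-1.7.mco', 'in.conll', 'out.conll')` would give a list that can be handed to `subprocess`. Building the classpath from the installation directory means no environment variable has to be set beforehand.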

I've tested it with MaltParser 1.7.2 and MaltParser 1.8.

This is in response to multiple questions:
- http://stackoverflow.com/questions/14009330/how-to-use-malt-parser-in-python-nltk
- http://stackoverflow.com/questions/21815891/dependency-parser-using-nltk-and-maltparser
- http://stackoverflow.com/questions/20091698/malt-parser-throwing-class-not-found-exception
- http://stackoverflow.com/questions/29513187/maltparser-not-working-in-python-nltk

Using -cp is more flexible than calling the jar file and then using `os.environ` to set up the dependencies.

TODO: train model from scratch

alvations · 2015-04-09T02:15:41Z

However, there are still problems with DependencyGraph and how it reads the MaltParser output files.

Pre-trained models from http://www.maltparser.org/mco/mco.html output lowercase dependency labels, e.g. nsubj, null, dobj, poss:

1    I    _    PRP    PRP    _    2    nsubj    _    _
2    shot    _    VBD    VBD    _    0    null    _    _
3    an    _    DT    DT    _    4    det    _    _
4    elephant    _    NN    NN    _    2    dobj    _    _
5    in    _    IN    IN    _    2    prep    _    _
6    my    _    PRP$    PRP$    _    7    poss    _    _
7    pajamas    _    NN    NN    _    5    pobj    _    _

But DependencyGraph expects labels such as ROOT, SUBJ, SPEC, OBJ, e.g.:

1    John    _    NNP   _    _    2    SUBJ    _    _
2    sees    _    VB    _    _    0    ROOT    _    _
3    a       _    DT    _    _    4    SPEC    _    _
4    dog     _    NN    _    _    2    OBJ     _    _
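To make the difference between the two outputs concrete, here is a small sketch that reads CoNLL-style rows and reports the label carried by each token whose head is 0. The helper name `root_labels` is hypothetical, and the sketch assumes whitespace-separated columns with the head in column 7 and the relation in column 8, as in the two samples above.

```python
def root_labels(conll_text):
    """Return (word, relation) for every token attached to head 0."""
    found = []
    for line in conll_text.strip().splitlines():
        cols = line.split()
        if len(cols) < 8:
            # Skip blank or truncated rows.
            continue
        word, head, rel = cols[1], cols[6], cols[7]
        if head == '0':
            found.append((word, rel))
    return found


# Rows copied from the two samples above.
pretrained = """
1 I _ PRP PRP _ 2 nsubj _ _
2 shot _ VBD VBD _ 0 null _ _
"""
expected_style = """
1 John _ NNP _ _ 2 SUBJ _ _
2 sees _ VB _ _ 0 ROOT _ _
"""
```

On these samples, `root_labels(pretrained)` gives `[('shot', 'null')]` while `root_labels(expected_style)` gives `[('sees', 'ROOT')]`, so the top-level token carries a different relation label in each case.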

The demo works fine when we parse with a model trained from within NLTK. So locating MaltParser (previously the awkward find_binary) and having NLTK call it to retrieve the output is now seamless.

But there's still a problem when reading the parses from a pre-trained model in NLTK:

import nltk  # needed for nltk.pos_tag below
from nltk.parse import malt
from nltk import word_tokenize, sent_tokenize

indir = '/home/alvas/maltparser-1.7.2/dist/maltparser-1.7.2/'
modelfilepath = '/home/alvas/engmalt.linear-1.7.mco'
maltParser = malt.MaltParser(path_to_maltparser=indir, model=modelfilepath)

sentences = [word_tokenize(sent) for sent in sent_tokenize('I shot an elephant in my pajamas. This is a foobar sentence')]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in sentences]

maltParser.tagged_parse_sents(tagged_sentences)

[out]:

Traceback (most recent call last):
  File "/home/alvas/git/nltk/test_malt.py", line 15, in <module>
    print maltParser.tagged_parse_sents(tagged_sentences)
  File "/home/alvas/git/nltk/newermalt.py", line 132, in tagged_parse_sents
    DependencyGraph.load(output_file.name))
  File "/home/alvas/git/nltk/nltk/parse/dependencygraph.py", line 156, in load
    for tree_str in infile.read().split('\n\n')
  File "/home/alvas/git/nltk/nltk/parse/dependencygraph.py", line 72, in __init__
    cell_separator=cell_separator,
  File "/home/alvas/git/nltk/nltk/parse/dependencygraph.py", line 260, in _parse
    "The graph does'n contain a node "
nltk.parse.dependencygraph.DependencyGraphError: The graph does'n contain a node that depends on the root element.

MaltParser does create an output file, though, as we can see by adding print output_file.read() before https://github.com/alvations/nltk/blob/develop/nltk/parse/malt.py#L158

output_file.read() prints:

1   I   _   PRP PRP _   2   nsubj   _   _
2   shot    _   VBD VBD _   0   null    _   _
3   an  _   DT  DT  _   4   det _   _
4   elephant    _   NN  NN  _   2   dobj    _   _
5   in  _   IN  IN  _   2   prep    _   _
6   my  _   PRP$    PRP$    _   7   poss    _   _
7   pajamas _   NN  NN  _   5   pobj    _   _
8   .   _   .   .   _   2   punct   _   _

1   This    _   DT  DT  _   5   nsubj   _   _
2   is  _   VBZ VBZ _   5   cop _   _
3   a   _   DT  DT  _   5   det _   _
4   foobar  _   NN  NN  _   5   nn  _   _
5   sentence    _   NN  NN  _   0   null    _

alvations · 2015-04-09T02:25:28Z

@dhgarrette , @kmike, @heatherleaf , @stevenbird .

Any idea why the output of the pre-trained model is unreadable by DependencyGraph.load()?
Or are there some hidden options in MaltParser that can make it readable by DependencyGraph?

I'll leave this as it is now and let someone else deal with the dependency parses. I'll go back to the translate, model and align packages =)

Santosh-Gupta · 2015-04-09T19:54:32Z

Thanks Alvations!!

I was wondering if you could give an example of how to use it in python.

alvations · 2015-04-09T21:08:39Z

@Santosh-Gupta, the demo() shows how you can train a parser and then use it. But loading a pre-trained model is still messy because of how DependencyGraph objects read the output.

stevenbird · 2015-04-18T23:32:04Z

@alvations, that error message was introduced in e0f0630#diff-31ba76604fcce0dbd82cdfd1dba4233d.

@dimazest it looks like this change gets in the way of loading pre-trained models. Are you able to investigate please?

stevenbird · 2015-04-27T06:56:41Z

Just pinging you again @dimazest

dimazest · 2015-04-27T13:50:26Z

Sorry, I somehow missed the first mention. I'll have a look at this right now...

…ion. This should resolve issues faced at nltk#944. However, there is code that depends on a fake root node, for example the tree visualisation code reads this and FStructure.to_depgraph() sets it.

stevenbird · 2015-05-19T01:03:11Z

@dimazest thanks for the PR. @alvations, are you able to load pre-trained models now?

alvations · 2015-05-26T13:38:15Z

Sorry for the late reply. @dimazest thanks for the fix!! @stevenbird, the malt API now works with pre-trained models.

I'm not sure why it only works with malt.MaltParser.parse_one(sentence):

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)

But when I tried malt.MaltParser.parse_sents(sentences) on multiple sentences, each item it returned was not a DependencyGraph but a listiterator:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
<listiterator object at 0x7f0a2e4d3d90> 
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
  u'ctag': u'TOP',
  u'deps': [2],
  u'feats': None,
  u'lemma': None,
  u'rel': u'TOP',
  u'tag': u'TOP',
  u'word': None},
 {u'address': 1,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'I'},
 {u'address': 2,
  u'ctag': u'NN',
  u'deps': [1, 11],
  u'feats': u'_',
  u'head': 0,
  u'lemma': u'_',
  u'rel': u'null',
  u'tag': u'NN',
  u'word': u'shot'},
 {u'address': 3,
  u'ctag': u'AT',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'AT',
  u'word': u'an'},
 {u'address': 4,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'elephant'},
 {u'address': 5,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'in'},
 {u'address': 6,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'my'},
 {u'address': 7,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'pajamas'},
 {u'address': 8,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'Time'},
 {u'address': 9,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'flies'},
 {u'address': 10,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'like'},
 {u'address': 11,
  u'ctag': u'NN',
  u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'dep',
  u'tag': u'NN',
  u'word': u'banana'}]

alvations · 2015-06-03T10:24:28Z

With help from http://goo.gl/TpW1iY, I managed to get a tree from parse_sents() by calling print(next(next(mp.parse_sents([sent,sent2]))).tree()). However, parse_sents() looks broken: it combines the two sentences into one instead of parsing them separately.

    # Initialize a MaltParser object with a pre-trained model.
    mp = MaltParser(path_to_maltparser=path_to_maltparser, model=path_to_model) 
    sent = 'I shot an elephant in my pajamas'.split()
    sent2 = 'Time flies like banana'.split()
    # Parse a single sentence.
    print(mp.parse_one(sent).tree())
    print(next(next(mp.parse_sents([sent,sent2]))).tree())

[out]:

(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))
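The double `next()` call is needed because the result is an iterator whose items are themselves iterators. As a sketch of that shape only, the hypothetical helper below takes the first item of each inner iterator. It uses plain strings as stand-ins for graph objects, and it assumes one inner iterator per input sentence, which is the intended behaviour and not what the output above shows.

```python
def first_parse_per_sentence(nested_parses):
    """Yield the first item of each inner iterator, skipping empty ones."""
    for parses in nested_parses:
        for graph in parses:
            yield graph
            # Only the first parse of each sentence is wanted.
            break


# Stand-in for a parse_sents()-style result: an iterator of iterators.
fake_result = iter([iter(['graph-1a', 'graph-1b']), iter(['graph-2'])])
```

With the real API, each yielded object would be something `.tree()` can be called on, so the loop would replace the nested `next(next(...))` call.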

alvations · 2015-06-04T16:28:35Z

@dimazest @stevenbird: Fixed at last. Now we can easily parse any sentence with the MaltParser API, and I'll be able to use this for tree2string models in nltk.translate.

stevenbird · 2015-06-09T00:31:48Z

Thanks @alvations and @dimazest.

If either of you has time, it would be nice to include a doctest with a little demonstration in the docstring for the MaltParser class, cf: https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L120

dimazest · 2015-07-24T06:59:26Z

nltk/parse/malt.py

+        (shot I (elephant an) (in (pajamas my)) .)
+	"""
+	def __init__(self, parser_dirname, model_filename=None, tagger=None, 
+				 additional_java_args=[]):


Please make additional_java_args=None and add this

if additional_java_args is None: additional_java_args = []

as having mutable default parameters might lead to obscure bugs.
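A short illustration of the kind of bug this avoids. The function names and the `-Xmx1024m` flag are only placeholders for the demonstration and are not taken from malt.py.

```python
def java_args_shared(additional_java_args=[]):
    # The default list is created once, when the function is defined,
    # so every call without an argument mutates the same object.
    additional_java_args.append('-Xmx1024m')
    return additional_java_args


def java_args_fresh(additional_java_args=None):
    # A None default plus an explicit check gives each call its own list.
    if additional_java_args is None:
        additional_java_args = []
    additional_java_args.append('-Xmx1024m')
    return additional_java_args
```

Calling `java_args_shared()` twice returns the same list object, which by then holds two entries, whereas each call to `java_args_fresh()` returns a new one-element list.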

alvations · 2015-07-24T19:31:00Z

@dimazest , @stevenbird It's all patched up.

stevenbird · 2015-08-14T14:08:45Z

Thanks @dimazest for the code review, and @alvations for all this work. It's looking good to me, so I'm going to merge.

alvations added 7 commits April 8, 2015 18:33

Added a way to find the MaltParser dependencies

27657d2

Changed the jar file to Java-nic --classpath

f6f4c9c

By using the -cp, it's more dynamic than calling the jar file and then using `os.environ` to setup the dependencies.

Update malt.py

1f9de89

Fixed parsed from pre-trained model

b3990ea

TODO: train model from scratch

PEP8 80 margin width

2651283

Working version of malt.py

0bd226c

alvations mentioned this pull request Apr 9, 2015

Stabalize MaltParser once and for all #943

Closed

Made tagged sentence to conll into a function

ed2fbf7

stevenbird self-assigned this Apr 18, 2015

dimazest mentioned this pull request Apr 28, 2015

Warn a user when there is no ROOT node, instead of throwing an exception... #966

Merged

alvations added 2 commits May 26, 2015 10:28

resolved merge conflict for malt.py

6fe11ed

added fixed the conflicts and malt parser api now working with parse_…

c96216c

…one() and pre-trained models

remove unused imports

0bc00c4

alvations added 2 commits June 4, 2015 18:25

fixed the wrong indent at line 56

4e530e3

removed the non-cannonical test case, now the demo works =)

8454819

Merge pull request #25 from nltk/develop

cf96304

Syncing with bleeding edge develop branch

dimazest reviewed Jul 24, 2015

alvations added 10 commits July 24, 2015 20:14

Made changes to suggestions from @dimazest's code review

136f682

removed the weird whitespace

d71b00e

added newline to between top level function

54141f5

added newline to between top level function

c000a5b

ensure \n\n between top level functions

5540623

changed tagged_sents in parsed_sents to a generator instead of list

f4c16c2

Cut down the demo, put demo in docstrings

026040b

Use context manager in parse_tagged_sents() instead of try...finally

1cdc8fa

Remove un-needed demo code outside of docstring

8b6b4bb

move taggedsent_to_conll() to nltk.parse.util

90a0471

stevenbird added a commit that referenced this pull request Aug 14, 2015

Merge pull request #944 from alvations/patch-1

73fc655

Stabilized MaltParser API

stevenbird merged commit 73fc655 into nltk:develop Aug 14, 2015

alvations deleted the patch-1 branch August 25, 2015 10:36

alvations mentioned this pull request Nov 30, 2015

nltk.parse.MaltParser instantiation fails if malt .jar not found on hardcoded path list #311

Closed
