Escape parentheses in NLTK parse tree #2352

BramVanroy · 2019-08-01T16:01:53Z

In NLTK we can convert a bracketed tree string into an actual Tree object. However, when a token itself contains parentheses, the result is not what you would expect, because NLTK treats those parentheses as the start of a new node.

As an example, take the sentence

They like(d) it a lot

This could be parsed as

(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))

But if you parse this string into a tree with NLTK and print it, it is clear that the (d) is parsed as a new node, which is no surprise.

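from nltk import Tree

# The token "like(d)" contains literal parentheses
s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'

# Parse the bracketed string and print the resulting tree
tree = Tree.fromstring(s)
print(tree)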

The result is

(S
  (NP (PRP They))
  (VP like (d ) (NP (PRP it)) (NP (DT a) (NN lot)))
  (. .))

So (d ) becomes a node inside the VP rather than part of the token like(d). Is there a way to escape parentheses in the tree parser?

[cross post from SO]


alvations · 2019-08-20T05:46:56Z

Hmmm, in this case, is there a reason why the opening and closing brackets are not converted to -LRB- and -RRB- before parsing the tree?

More specifically, where did (S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .)) come from? Once we know its source, we can identify the regex or preprocessing step that is missing the -LRB- and -RRB- conversion.
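If the tree string cannot be fixed at its source, a rough sketch of the -LRB-/-RRB- conversion as a preprocessing step might look like the following. The helper names `escape_token_parens` and `unescape_token` are hypothetical, not part of NLTK. The regex rests on an assumption: a parenthesis pair belongs to a token only when it opens directly after a non-space, non-bracket character and contains no whitespace or further brackets. Tokens that begin with a parenthesis, or compact trees with unusual spacing, would need a different rule.

```python
import re

# Assumption: a "(...)" pair is part of a token only if it directly follows a
# non-space, non-bracket character and contains no whitespace or brackets.
_TOKEN_PARENS = re.compile(r"(?<=[^\s()])\(([^\s()]*)\)")


def escape_token_parens(bracketed):
    """Replace token-internal parentheses with -LRB- / -RRB- (hypothetical helper)."""
    return _TOKEN_PARENS.sub(r"-LRB-\1-RRB-", bracketed)


def unescape_token(token):
    """Restore literal parentheses in a single leaf (hypothetical helper)."""
    return token.replace("-LRB-", "(").replace("-RRB-", ")")


s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'
escaped = escape_token_parens(s)
print(escaped)
# (S (NP (PRP They)) (VP like-LRB-d-RRB- (NP (PRP it)) (NP (DT a) (NN lot))) (. .))

# Optional: parse the escaped string if NLTK is installed.
try:
    from nltk import Tree
except ImportError:
    Tree = None

if Tree is not None:
    tree = Tree.fromstring(escaped)
    # Map the escaped leaves back to their original surface forms.
    print([unescape_token(leaf) for leaf in tree.leaves()])
```

With the escaping applied first, the token is no longer split at its parentheses and reaches the parser as a single leaf; the original spelling can be restored leaf by leaf afterwards.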

alvations added the parsing label Aug 20, 2019
