Escape parentheses in NLTK parse tree #2352

Open
BramVanroy opened this issue Aug 1, 2019 · 1 comment

@BramVanroy

In NLTK, we can convert a bracketed tree string into an actual Tree object. However, when a token itself contains parentheses, the result is not what you would expect, because NLTK treats those parentheses as the boundaries of a new node.

As an example, take the sentence

They like(d) it a lot

This could be parsed as

(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))

But if you parse this string into a tree with NLTK and print it, it is clear that the (d) is parsed as a new node, which is no surprise.

from nltk import Tree

s = '(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))'

tree = Tree.fromstring(s)
print(tree)

The result is

(S
  (NP (PRP They))
  (VP like (d ) (NP (PRP it)) (NP (DT a) (NN lot)))
  (. .))

So (d ) is a node inside the VP rather than part of the token like. Is there a way in the tree parser to escape parentheses?

[cross post from SO]

@alvations
Contributor

Hmmm, in this case, is there a reason why the opening and closing brackets are not converted to -LRB- and -RRB- before parsing the tree?

More specifically, where did (S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .)) come from? Knowing its source, we can find the regex or preprocessing step that is missing the -LRB- and -RRB- conversion.
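For illustration, here is a minimal sketch of what that conversion could look like if it is applied to each token before the bracketed string is built. The helper names (escape_brackets, unescape_brackets, bracketed, parse_escaped) are hypothetical and not part of NLTK; the only NLTK call used is Tree.fromstring with its read_leaf argument, which is applied to every leaf as it is read. This assumes you control the step that produces the bracketed string; it cannot repair a string in which token parentheses and structural parentheses are already mixed.

```python
LRB, RRB = "-LRB-", "-RRB-"  # Penn Treebank-style placeholders for ( and )


def escape_brackets(token: str) -> str:
    """Replace literal parentheses inside a token with -LRB-/-RRB-."""
    return token.replace("(", LRB).replace(")", RRB)


def unescape_brackets(token: str) -> str:
    """Turn -LRB-/-RRB- placeholders in a leaf back into parentheses."""
    return token.replace(LRB, "(").replace(RRB, ")")


def bracketed(label: str, *children: str) -> str:
    """Hypothetical helper: wrap a label and its formatted children in ( )."""
    return "(" + " ".join((label, *children)) + ")"


def parse_escaped(tree_string: str):
    """Parse an escaped bracketed string and restore parentheses in the leaves."""
    from nltk import Tree  # local import: the helpers above need only the stdlib
    return Tree.fromstring(tree_string, read_leaf=unescape_brackets)


# Escape the problematic token *before* it is embedded in the tree string.
verb = escape_brackets("like(d)")
s = bracketed(
    "S",
    bracketed("NP", bracketed("PRP", "They")),
    bracketed(
        "VP",
        verb,
        bracketed("NP", bracketed("PRP", "it")),
        bracketed("NP", bracketed("DT", "a"), bracketed("NN", "lot")),
    ),
    bracketed(".", "."),
)
print(s)
```

With NLTK installed, parse_escaped(s) should then keep like(d) as a single leaf of the VP instead of splitting off a (d ) node. Note that unescape_brackets would also rewrite any token that was literally -LRB- or -RRB- in the source text, so this is only safe when the placeholders come from your own escaping step.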
