In NLTK we can convert a bracketed (parenthesized) tree string into an actual Tree object. However, when a token itself contains parentheses, the result is not what you would expect, because NLTK treats those parentheses as the boundaries of a new node.
As an example, take the sentence:
They like(d) it a lot
This could be parsed as:
(S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .))
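To make the problem concrete, here is a stdlib-only sketch of a bracket tokenizer. The regex is a paraphrase of the default token pattern that NLTK's Tree.fromstring builds for brackets="()": an open bracket with an optional label, a close bracket, or a leaf made of non-space, non-bracket characters. Treat it as an approximation rather than NLTK's actual code; the helper name tokenize is hypothetical.

```python
import re

# Approximation (assumption) of the default tokenizer pattern used by
# NLTK's Tree.fromstring with brackets="()":
#   "(" plus an optional node label | ")" | a leaf without spaces or brackets
TOKEN_RE = re.compile(r"\(\s*([^\s()]+)?|\)|([^\s()]+)")


def tokenize(tree_string):
    """Return the tokens a bracket-based tree reader would see (hypothetical helper)."""
    return [m.group(0) for m in TOKEN_RE.finditer(tree_string)]


tokens = tokenize("(VP like(d) (NP (PRP it)))")
print(tokens)
# ['(VP', 'like', '(d', ')', '(NP', '(PRP', 'it', ')', ')', ')']
```

Because a leaf cannot contain "(", the scan stops at "like", then reads "(d" as the opening of a node labelled d and the following ")" as its close. That is where the empty (d ) node comes from.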
If you parse this with NLTK into a tree and print it, it is clear that (d) is parsed as a new node, which is no surprise. So (d ) ends up as an empty node inside the VP rather than as part of the token like(d). Is there a way in the tree parser to escape parentheses? [cross post from SO]
Hmm, in this case, is there a reason why the opening and closing brackets are not converted to -LRB- and -RRB- before the tree is parsed?
More specifically, where did (S (NP (PRP They)) (VP like(d) (NP (PRP it)) (NP (DT a) (NN lot))) (. .)) come from? Knowing where it comes from, we can find the regex or preprocessing step that is missing the -LRB- and -RRB- conversion.
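Following the -LRB-/-RRB- suggestion, here is a minimal sketch of that preprocessing step, assuming you control the code that assembles the bracketed string from tokens. The helper names escape_token and unescape_token are hypothetical, not NLTK functions. The escaping has to happen per token before the string is assembled: once like(d) sits inside the bracketed string, nothing in the syntax distinguishes its parentheses from structural ones.

```python
import re

# Penn Treebank-style placeholders for round brackets inside tokens.
ESCAPES = {"(": "-LRB-", ")": "-RRB-"}
UNESCAPES = {v: k for k, v in ESCAPES.items()}


def escape_token(token):
    """Replace literal parentheses in a token with -LRB- / -RRB- (hypothetical helper)."""
    return "".join(ESCAPES.get(ch, ch) for ch in token)


def unescape_token(token):
    """Turn -LRB- / -RRB- back into literal parentheses (hypothetical helper)."""
    return re.sub(r"-LRB-|-RRB-", lambda m: UNESCAPES[m.group(0)], token)


# Escape the token *before* it is placed into the bracketed tree string.
tree_string = (
    "(S (NP (PRP They)) (VP %s (NP (PRP it)) (NP (DT a) (NN lot))) (. .))"
    % escape_token("like(d)")
)
print(tree_string)
# (S (NP (PRP They)) (VP like-LRB-d-RRB- (NP (PRP it)) (NP (DT a) (NN lot))) (. .))
print(unescape_token("like-LRB-d-RRB-"))
# like(d)
```

With NLTK installed, the escaped string should then parse as intended with nltk.Tree.fromstring(tree_string). Passing read_leaf=unescape_token would restore the parentheses in the leaves, although printing that tree yields bare parentheses again, so the printed form cannot safely be re-read. One caveat of this scheme: a token that already contains the literal text -LRB- or -RRB- cannot be told apart from an escaped parenthesis.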