chomsky_normal_form() for grammars #1884
Comments
The grammar transformation to CNF is rather complex and hasn't been implemented yet. It would be good if someone attempted it, but it might not be trivial. If anyone is interested in contributing, a good algorithm to start out is
I was looking around for exactly this and stumbled here. Also, if there's no CYK parser, as mentioned in the other issue, I'd be happy to send a PR for that as well.
@virresh @alvations Sorry for letting #1722 get stale. Life has gotten in the way. @virresh If you'd like to take a whack at this, be my guest. I wrote a conversion to CNF years ago: https://github.com/aetilley/pcfg/blob/master/src/pcfg.py#L524 I remember that it reorders the steps of the usual algorithm, but I also remember convincing myself that the two orderings were equivalent. Proceed with caution. I'm going to close the other issue.
There is a problem with CNFs for CFGs: they return duplicated productions. You can test this with:
import nltk
grammar = nltk.data.load("grammars/large_grammars/atis.cfg")
grammar = grammar.chomsky_normal_form()
# Total number of productions after conversion
print(len(grammar.productions()))
# Number of distinct productions; a smaller value means duplicates exist
print(len(set(grammar.productions())))
grammar has
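Until the duplicates are fixed at the source, they can be filtered out after conversion. The following is only a sketch: it uses plain `(lhs, rhs)` tuples as hypothetical stand-ins for productions so that it runs without NLTK. Since the check above already puts productions into a `set`, the same helper should work on `grammar.productions()` directly, and the result could presumably be passed to `nltk.CFG(grammar.start(), unique)` to rebuild the grammar.

```python
def dedupe_preserving_order(items):
    """Drop repeated items, keeping the first occurrence of each in order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# Hypothetical stand-in data: each production is an (lhs, rhs) tuple.
productions = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("S", ("NP", "VP")),  # duplicate of the first entry
]

unique = dedupe_preserving_order(productions)
print(len(productions), len(unique))
```

Unlike `list(set(...))`, this keeps the productions in their original order, which makes the grammar easier to diff against the unconverted one.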
I have just stumbled upon this thread. I have written my own version of the CFG.chomsky_normal_form() method because the existing one is incomplete (it cannot deal with empty productions, etc.). Mine works, and it also simplifies the grammar where possible. I'd be happy to contribute it. I haven't contributed to NLTK before. How does this work?
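For context on the empty-production case mentioned above: the usual first step is to find which nonterminals can derive the empty string. The sketch below is not the contributor's implementation and not NLTK code. It assumes a hypothetical plain-Python representation in which each rule is an `(lhs, rhs)` tuple and an empty `rhs` stands for an empty production.

```python
def nullable_nonterminals(rules):
    """Return the set of left-hand sides that can derive the empty string.

    rules: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols
    and an empty tuple represents an empty production.
    """
    rules = list(rules)
    nullable = set()
    changed = True
    # Fixed-point iteration: repeat until a full pass adds nothing new.
    while changed:
        changed = False
        for lhs, rhs in rules:
            # all() over an empty rhs is True, so empty productions seed the set.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


# Hypothetical toy grammar for illustration.
rules = [
    ("S", ("A", "B")),
    ("A", ()),          # empty production
    ("A", ("a",)),
    ("B", ("A", "A")),
    ("C", ("c",)),
]

result = nullable_nonterminals(rules)
print(sorted(result))
```

A full conversion would then add, for every rule, the variants with nullable symbols omitted, before dropping the empty productions themselves.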
@stefkauf Information on how to contribute can be found in CONTRIBUTING.md.
nltk.tree.Tree has a chomsky_normal_form() function, but grammars don't. Since CNF is a form of the grammar, grammars should have one as well.
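To illustrate what a grammar-level counterpart would involve, here is a rough sketch of one step of the textbook conversion: binarization, which splits right-hand sides longer than two symbols. It is not NLTK's code. The `(lhs, rhs)` tuple representation and the `_BIN` naming scheme for fresh nonterminals are assumptions made for this example, and the sketch assumes that no existing symbol already uses those names.

```python
def binarize(rules):
    """Rewrite rules so that no right-hand side has more than two symbols.

    rules: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Long right-hand sides become a right-branching chain of binary rules.
    """
    out = []
    counter = 0
    for lhs, rhs in rules:
        current = lhs
        rest = list(rhs)
        while len(rest) > 2:
            counter += 1
            fresh = f"{lhs}_BIN{counter}"  # hypothetical fresh-symbol name
            out.append((current, (rest[0], fresh)))
            current = fresh
            rest = rest[1:]
        # Rules with at most two symbols are kept unchanged.
        out.append((current, tuple(rest)))
    return out


binary_rules = binarize([
    ("S", ("NP", "V", "NP", "PP")),
    ("NP", ("Det", "N")),
])
for lhs, rhs in binary_rules:
    print(lhs, "->", " ".join(rhs))
```

A complete CNF conversion also has to remove empty and unit productions and separate terminals from nonterminals in mixed rules; binarization is only the most mechanical part.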