Devise data structure (AST) for the parsed results of user-supplied commands #37
One thing that will need to be decided is which sentence type to parse. User commands can be interpreted either as imperative statements ("pick up the book") or as implicit first-person narrative ("I pick up the book"), and OpenNLP parses the two forms differently. Compare "I pick up the book.":

```clojure
{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "PRT"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["up" "RP"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}
```

with "Pick up the book.":

```clojure
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}
```

Then contrast these with the same two forms when the particle follows the object, "I pick the book up." and "Pick the book up.":

```clojure
{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}
```

```clojure
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["Pick" "VBG"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}
```

Tagged part-of-speech references:
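One way to avoid committing to a single sentence type, offered here as an assumption rather than anything the project currently does, is to accept both forms and drop a leading first-person subject before building the AST. The helper name `drop-subject` is hypothetical; it works on a `:tagged` sequence like the ones above.

```clojure
(ns example.drop-subject
  (:require [clojure.string :as str]))

;; Hypothetical helper: if the tagged sentence starts with the pronoun "I"
;; (tagged PRP), remove that token so "I pick up the book." and
;; "Pick up the book." both begin with the verb.
(defn drop-subject
  [tagged]
  (let [[word tag] (first tagged)]
    (if (and (= "PRP" tag)
             (= "i" (str/lower-case word)))
      (rest tagged)
      tagged)))
```

It would be called as `(drop-subject (:tagged (nlp/parse "I pick up the book.")))`. Note that this only removes the subject token; the verb keeps whatever tag the first-person parse gave it (`VBP` here), so downstream code should match any `VB*` tag.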
---

We are going to want to capitalize, though. Compare the parse of the lowercase "pick up the book.":

```clojure
{:chunked ({:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "ADVP"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["pick" "VB"] ["up" "RP"] ["the" "DT"] ["book" "NN"] ["." "."])}
```

with the capitalized version:

```clojure
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] ["up" "IN"] ["the" "DT"] ["book" "NN"] ["." "."])}
```
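If input is to be capitalized consistently before parsing, a small normalization step could run on the raw user command first. This is a sketch under assumptions: `normalize-command` is a hypothetical name, and appending a trailing period only mirrors the example inputs in this thread, all of which end in ".".

```clojure
(ns example.normalize
  (:require [clojure.string :as str]))

;; Hypothetical pre-parse step: trim surrounding whitespace, capitalize the
;; first character (clojure.string/capitalize also lower-cases the rest),
;; and make sure the command ends with a period like the REPL examples.
(defn normalize-command
  [input]
  (let [s (str/capitalize (str/trim input))]
    (if (str/ends-with? s ".")
      s
      (str s "."))))
```

Because `clojure.string/capitalize` lower-cases everything after the first character, this would also lower-case proper nouns in a command; a variant that upper-cases only the first character would avoid that.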
---

These might serve as a good starting place for thinking about how to convert a parsed sentence into an AST:

```clojure
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book from the table.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "table"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["table" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book off of the floor.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["off"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "floor"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["off" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["floor" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book from the shelf.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "shelf"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["shelf" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book out of the open chest.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])}
```
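In the last two examples the chunker splits the multi-word prepositions "off of" and "out of" into separate `PP` chunks. As one possible first step toward an AST (an assumption, not the project's code), adjacent `PP` chunks could be merged into one. `merge-pp-chunks` is a hypothetical name that operates on the `:chunked` sequence.

```clojure
(ns example.chunks)

;; Hypothetical helper: walk the :chunked sequence and fold each PP chunk
;; into an immediately preceding PP chunk by concatenating their :phrase
;; vectors. All other chunks pass through unchanged. Returns a vector.
(defn merge-pp-chunks
  [chunks]
  (reduce (fn [acc {:keys [tag phrase] :as chunk}]
            (let [prev (peek acc)]
              (if (and (= "PP" tag) (= "PP" (:tag prev)))
                (conj (pop acc) (update prev :phrase into phrase))
                (conj acc chunk))))
          []
          chunks))
```

Applied to the "off of the floor" parse, this would leave a single `PP` chunk with the phrase `["off" "of"]` between the two noun phrases, which maps more naturally onto one relation in an AST.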
---

I think a good place to start might be to keep it simple: strip out everything but nouns and verbs.
---

Compare the simplified versions:

```clojure
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book from the table." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["table" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book off of the floor." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["floor" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book from the shelf." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book out of the open chest." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["chest" "NN"])
```
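The code behind the `:simple?` option isn't shown in this thread. Judging only from the output above, it behaves as if it keeps tokens whose Penn Treebank tag starts with `NN` (nouns) or `VB` (verbs). A sketch of that assumed filter, under the hypothetical name `simplify-tagged`:

```clojure
(ns example.simplify
  (:require [clojure.string :as str]))

;; Assumed reconstruction of the :simple? behaviour: keep only tokens whose
;; tag begins with "NN" (NN, NNS, NNP, NNPS) or "VB" (VB, VBD, VBG, VBN,
;; VBP, VBZ), dropping determiners, prepositions, particles, adjectives,
;; pronouns and punctuation.
(defn simplify-tagged
  [tagged]
  (filter (fn [[_ tag]]
            (or (str/starts-with? tag "NN")
                (str/starts-with? tag "VB")))
          tagged))
```

Matching on tag prefixes rather than exact tags means the mis-tagged `["Pick" "VBG"]` still survives as a verb, which is what the simplified output above shows.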
---

Parse output now looks like this:

```clojure
{:ast {:action "take" :object "book" :relations ("chest")}
 :chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["take" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])
 :tokens ["I" "take" "the" "book" "out" "of" "the" "open" "chest" "."]}
```
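The `:ast` map above can be reproduced from the tagged tokens with a simple rule: the first verb is the action (lower-cased), the first noun is the object, and any remaining nouns are relations. This is a guess at the logic based on a single example, and `tagged->ast` is a hypothetical name.

```clojure
(ns example.ast
  (:require [clojure.string :as str]))

(defn- verb? [[_ tag]] (str/starts-with? tag "VB"))
(defn- noun? [[_ tag]] (str/starts-with? tag "NN"))

;; Hypothetical AST builder: the first verb becomes :action, the first noun
;; becomes :object, and every later noun is collected under :relations.
;; Words are lower-cased so "Take" and "take" yield the same action.
(defn tagged->ast
  [tagged]
  (let [[verb _] (first (filter verb? tagged))
        nouns    (map (comp str/lower-case first) (filter noun? tagged))]
    {:action    (some-> verb str/lower-case)
     :object    (first nouns)
     :relations (rest nouns)}))
```

Because relations are just the trailing nouns, this sketch discards the prepositions ("out of", "from", "off of") that distinguish where the object is taken from; keeping them would need the chunk information rather than the simplified tags.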
---

This is an interesting problem to solve, and I expect the first iteration will be quite naïve. I suspect: