
Devise data structure (AST) for the parsed results of user-supplied commands #37

Closed
oubiwann opened this issue Oct 18, 2018 · 6 comments

@oubiwann
Member

oubiwann commented Oct 18, 2018

This is an interesting problem to solve, and one where the first iteration will likely be quite naïve. My initial guesses:

  1. The verb will be the root node of the tree
  2. Objects in the world will support protocols (à la Java interfaces) and will thus be able to say whether they support the action indicated by the verb
  3. Deeper structure will only be revealed through experimentation
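To make guesses 1 and 2 concrete, here is a minimal Python sketch, assuming Python's `typing.Protocol` as a stand-in for the protocols mentioned above. `Pickable`, `Book`, `Table`, `CommandNode`, `VERB_PROTOCOLS`, and `supports` are all hypothetical names for illustration, not part of the project:

```python
from dataclasses import dataclass, field
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Pickable(Protocol):
    """Hypothetical protocol for objects that can be picked up."""

    def pick_up(self) -> str: ...


@dataclass
class Book:
    name: str = "book"

    def pick_up(self) -> str:
        return f"You pick up the {self.name}."


@dataclass
class Table:
    name: str = "table"  # implements no action protocols


@dataclass
class CommandNode:
    """Root of a command tree: the verb, with the objects it acts on as children."""

    verb: str
    children: List[object] = field(default_factory=list)


# Hypothetical registry mapping each verb to the protocol an object must implement.
VERB_PROTOCOLS = {"pick up": Pickable}


def supports(obj: object, verb: str) -> bool:
    """True if obj implements the protocol that `verb` requires."""
    protocol = VERB_PROTOCOLS.get(verb)
    return protocol is not None and isinstance(obj, protocol)


tree = CommandNode(verb="pick up", children=[Book(), Table()])
```

Here `[supports(c, tree.verb) for c in tree.children]` gives `[True, False]`: the book can answer "pick up" and the table cannot, without the verb node needing to know anything about either class.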
@oubiwann
Member Author

One thing that will need to be decided is which sentence type to parse. User commands can be interpreted either as imperative statements ("pick up the book") or as implicit first-person narrative ("I pick up the book"). OpenNLP parses the two forms differently.

Compare:

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "PRT"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["up" "RP"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}

with:

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] 
          ["up" "IN"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["." "."])}

Then contrast these with the forms in which "up" follows the object ("I pick the book up" and "Pick the book up"):

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}

and:

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["Pick" "VBG"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["up" "RB"] 
          ["." "."])}
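One way to avoid committing to a single sentence type is to accept both and normalize the first-person form before building anything. A minimal Python sketch, assuming the tagged output is available as a list of (word, tag) pairs mirroring the :tagged vectors above; `to_imperative` is a hypothetical helper:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag) pairs, like :tagged


def to_imperative(tagged: Tagged) -> Tagged:
    """Drop a leading first-person subject ("I"/PRP), if present, so that
    "I pick the book up" and "Pick the book up" share a verb-first shape."""
    if tagged and tagged[0][0].lower() == "i" and tagged[0][1] == "PRP":
        return tagged[1:]
    return tagged


first_person = [("I", "PRP"), ("pick", "VBP"), ("the", "DT"),
                ("book", "NN"), ("up", "RB"), (".", ".")]
imperative = [("Pick", "VBG"), ("the", "DT"), ("book", "NN"),
              ("up", "RB"), (".", ".")]

# After normalization the (lowercased) words line up; the verb tags
# still differ (VBP vs. VBG), so later steps should match on the "VB" prefix.
normalized_words = [w.lower() for w, _ in to_imperative(first_person)]
```

Note that this only aligns the words. As the outputs above show, the tags for "pick" and "up" still vary between the two forms, so any downstream matching has to be tolerant of that.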

Tagged part-of-speech references:

@oubiwann
Member Author

oubiwann commented Oct 18, 2018

We are going to want to capitalize, though. Compare the parse of the all-lowercase command with that of the capitalized one:

{:chunked ({:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "ADVP"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["pick" "VB"] ["up" "RP"] ["the" "DT"] ["book" "NN"] ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] ["up" "IN"] ["the" "DT"] ["book" "NN"] ["." "."])}
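If commands are capitalized consistently before they reach the parser, OpenNLP at least always sees the same form of a given command. A minimal Python sketch of such a pre-parse step, assuming it runs on the raw user string; `normalize_command` is hypothetical and not part of `nlp/parse`:

```python
def normalize_command(text: str) -> str:
    """Hypothetical pre-parse step: collapse whitespace, capitalize the
    first letter, and make sure the command ends with terminal punctuation."""
    text = " ".join(text.split())
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text
```

For example, `normalize_command("pick up the book")` returns `"Pick up the book."`. Whether the capitalized form is the better input is still open: in the outputs above, it is the lowercase command that gets the VB/RP tags.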

@oubiwann
Member Author

These might serve as a good starting place for thinking about converting a parsed sentence into an AST:

[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book from the table.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "table"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["table" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book off of the floor.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["off"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "floor"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["off" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["floor" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book from the shelf.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "shelf"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["shelf" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book out of the open chest.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])}
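As a starting point, the chunked output above could be folded into a verb-rooted tree with a few heuristics. This is a Python sketch under stated assumptions, not the project's method: it assumes imperative input, chunks given as dicts mirroring the `{:phrase … :tag …}` maps, and the hypothetical function name `chunks_to_tree`:

```python
from typing import Dict, List, Optional


def chunks_to_tree(chunks: List[Dict]) -> Dict:
    """Fold chunker output (imperative form assumed) into a verb-rooted tree.

    Heuristics (assumptions, not project behavior):
    - VP chunks and any other non-NP/non-PP chunk (e.g. "up" tagged
      SBAR, ADVP or PRT) are folded into the verb as particles;
    - the first NP not preceded by a PP is the direct object;
    - consecutive PP chunks ("out" + "of") merge into one preposition
      that governs the next NP.
    """
    verb: List[str] = []
    obj: Optional[List[str]] = None
    preps: List[Dict] = []
    pending: List[str] = []  # preposition words waiting for their NP
    for chunk in chunks:
        words = [w.lower() for w in chunk["phrase"]]
        tag = chunk["tag"]
        if tag == "PP":
            pending.extend(words)
        elif tag == "NP":
            if pending:
                preps.append({"prep": " ".join(pending), "np": words})
                pending = []
            elif obj is None:
                obj = words
        else:
            verb.extend(words)
    return {"verb": " ".join(verb), "object": obj, "preps": preps}


take = [{"phrase": ["Take"], "tag": "VP"},
        {"phrase": ["the", "book"], "tag": "NP"},
        {"phrase": ["out"], "tag": "PP"},
        {"phrase": ["of"], "tag": "PP"},
        {"phrase": ["the", "open", "chest"], "tag": "NP"}]
tree = chunks_to_tree(take)
```

For "Take the book out of the open chest." this yields verb `"take"`, object `["the", "book"]`, and one prepositional relation `"out of"` governing `["the", "open", "chest"]`. Treating the stray SBAR/ADVP "up" as a particle is what lets "Pick up the book" and "Pick the book up" land on the same `"pick up"` verb.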

@oubiwann
Member Author

I think a good place to start might be just keeping it simple: strip out everything but nouns and verbs:

  • this assumes there's only one verb
  • the first noun indicates the object of the action
  • any additional nouns help set that noun in context (i.e., they help uniquely identify the noun in question)
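The "nouns and verbs only" filter amounts to keeping tokens whose Penn Treebank tag starts with VB or NN. A minimal Python sketch, assuming (word, tag) pairs as input; `simplify` is a hypothetical name, and this is a guess at the behavior rather than the actual implementation behind any option of `nlp/parse`:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def simplify(tagged: Tagged) -> Tagged:
    """Keep verbs (VB, VBG, VBP, ...) and nouns (NN, NNS, ...); drop
    determiners, prepositions, adjectives, pronouns, and punctuation."""
    return [(w, t) for w, t in tagged if t.startswith(("VB", "NN"))]


tagged = [("Pick", "VBG"), ("up", "IN"), ("the", "DT"), ("book", "NN"),
          ("from", "IN"), ("the", "DT"), ("table", "NN"), (".", ".")]
```

Applied to the "Pick up the book from the table." tags, `simplify(tagged)` keeps only `("Pick", "VBG")`, `("book", "NN")`, and `("table", "NN")`.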

@oubiwann
Member Author

Compare the simplified versions:

[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book from the table." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["table" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book off of the floor." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["floor" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book from the shelf." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book out of the open chest." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["chest" "NN"])

@oubiwann
Member Author

Parse output now includes an :ast entry. For "I take the book out of the open chest." it looks like this:

{:ast {:action "take" :object "book" :relations ("chest")}
 :chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["take" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])
 :tokens ["I" "take" "the" "book" "out" "of" "the" "open" "chest" "."]}
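The :ast entry follows directly from the rules listed earlier (one verb, first noun is the object, remaining nouns give context). A Python sketch of that derivation, assuming (word, tag) pairs as input; `build_ast` and the string keys are hypothetical stand-ins for the Clojure keywords:

```python
from typing import Dict, List, Tuple


def build_ast(tagged: List[Tuple[str, str]]) -> Dict[str, object]:
    """Derive {action, object, relations} from (word, tag) pairs:
    first verb -> action, first noun -> object, remaining nouns -> relations."""
    verbs = [w.lower() for w, t in tagged if t.startswith("VB")]
    nouns = [w.lower() for w, t in tagged if t.startswith("NN")]
    return {
        "action": verbs[0] if verbs else None,  # assumes a single verb
        "object": nouns[0] if nouns else None,
        "relations": nouns[1:],
    }


tagged = [("I", "PRP"), ("take", "VBP"), ("the", "DT"), ("book", "NN"),
          ("out", "IN"), ("of", "IN"), ("the", "DT"), ("open", "JJ"),
          ("chest", "NN"), (".", ".")]
ast = build_ast(tagged)
```

On the tags above this gives action `"take"`, object `"book"`, and relations `["chest"]`, matching the :ast shown. The pronoun "I" drops out because PRP is neither a verb nor a noun tag.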
