Eliminate grammar parser/blockly interface overlap #1
Comments
Thanks @HaiqiXu for the explanations in our meeting :) I have a better understanding now (I made some edits to the procedure sketch in my original post).

It turns out that our assumption (Simon's, Enkhbold's, mine) that the grammar parser and the Blockly interface do more or less the same thing (that is, implement the grammar) was not correct. This explains Haiqi's hesitation and means that unifying the grammar parser and the Blockly interface into a single module isn't as straightforward as we might have at first supposed. That's because the recognition of the core concepts influences the way that grammatical components can combine. Using the concept dictionary, some words are converted into the concepts they represent. It is only then that the phrase is passed to the parser, which identifies the functional roles.

This doesn't mean that we should keep the architecture as it is right now, because problems remain:
I see two ways in which we can address these problems:
I'm not yet sure about either of these approaches, so I'm going to play with Blockly and the grammar and see what makes sense.
Thinking about it more, I think the best approach is to create a declarative representation of the grammar that carries enough information to:
This way, the grammar can look much like Haiqi's original ANTLR implementation, with the associated ease of editing, but we can still get rid of the parsing step. Also, this would avoid spaghetti-coding the two steps in an overzealous effort to unify them (despite the current overlap, there really are conceptually separate procedures at the cores of the two systems).

We will also need a tool to check whether all questions can still be represented with Blockly, because it would be a chore to test otherwise. This will be discussed in another issue when we get to it.
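To make the idea of a declarative grammar more concrete, here is a minimal sketch, assuming the grammar is stored as plain data and that Blockly block definitions are derived from it. The rule names, type names and templates (`proximity`, `aggregate`, `Relation`, and so on) are invented for illustration and are not taken from the actual grammar. Only the output keys (`type`, `message0`, `args0`, `output`) follow Blockly's JSON block definition format.

```python
# Hypothetical declarative grammar. Rule names, types and templates are
# invented for illustration; they are not taken from the actual grammar.
GRAMMAR = {
    "proximity": {
        "produces": "Relation",
        "template": "nearest {target} from {origin}",
        "slots": {"target": "Object", "origin": "Object"},
    },
    "aggregate": {
        "produces": "Measure",
        "template": "average {quantity}",
        "slots": {"quantity": "Field"},
    },
}


def to_blockly_block(name, rule):
    """Derive a Blockly JSON block definition from one grammar rule."""
    message = rule["template"]
    args = []
    for index, (slot, slot_type) in enumerate(rule["slots"].items(), start=1):
        # Blockly refers to the n-th entry of args0 as %n inside message0.
        message = message.replace("{" + slot + "}", f"%{index}")
        args.append({"type": "input_value", "name": slot, "check": slot_type})
    return {
        "type": name,
        "message0": message,
        "args0": args,
        "output": rule["produces"],
    }


# One block definition per grammar rule, in the order the rules are listed.
BLOCKS = [to_blockly_block(name, rule) for name, rule in GRAMMAR.items()]
```

Because the rules are data rather than parser code, the same table could in principle also feed the checking tool mentioned above, though that is not sketched here.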
In preparation for the merge with the Blockly interface, I've moved Haiqi's code to a subdirectory. See issues #1 and #2: the Blockly interface should strictly *construct* natural language questions, and the identification of functional roles should be wholly separate from this issue of natural language processing.
I have created a […] While I expect the […] However, changes to the Blockly interface should probably not be made to https://github.com/quangis/quangis-web or https://github.com/HaiqiXu/haiqixu.github.io, but here, instead. (And, in time, removed from […])
For reference: I can verify that the recognition of core concepts indeed influences the parsing process, as mentioned in this comment. Consider:
Following up on the previous comment, we need a separate module for recognizing concepts. This module would have only one responsibility: converting a string into a concept.^[1] This is a difficult task on its own, so it's imperative that it is understandable and replaceable (!) in isolation, not muddled by other concerns like parsing a whole sentence. It can be wrapped up in a service, depending on where it is needed. Now, suppose that we accept, for example, a network in a block where the grammar would require a field. Is that a problem?
In any case, the concept recognizer should be isolated.
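As a rough illustration of what an isolated concept recognizer could look like, here is a minimal sketch, assuming a plain dictionary lookup. The `ConceptRecognizer` protocol, the `DictionaryConceptRecognizer` class and the dictionary entries are all hypothetical. The real concept dictionary and its matching rules are not shown in this thread.

```python
from typing import Optional, Protocol


class ConceptRecognizer(Protocol):
    """Anything that can turn a string into a concept name (or None)."""

    def recognize(self, text: str) -> Optional[str]:
        ...


# Hypothetical concept dictionary; the entries are illustrative only.
CONCEPT_DICTIONARY = {
    "temperature": "field",
    "elevation": "field",
    "road": "network",
    "building": "object",
}


class DictionaryConceptRecognizer:
    """Recognizes concepts by exact, case-insensitive dictionary lookup."""

    def __init__(self, dictionary):
        self._dictionary = {key.lower(): value for key, value in dictionary.items()}

    def recognize(self, text: str) -> Optional[str]:
        # Returns None when the string is not in the dictionary.
        return self._dictionary.get(text.strip().lower())


recognizer: ConceptRecognizer = DictionaryConceptRecognizer(CONCEPT_DICTIONARY)
```

Callers would depend only on `recognize`, so a smarter recognizer (or a remote service) could later replace the dictionary lookup without touching the rest of the pipeline.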
The pressing issues with this part of the pipeline are with robustness, scalability and testing. For the final product, we need a lot of simplifications. To organize and document the development, I will be tracking that in the issue tracker here.
Currently, if I understand correctly, the procedure can be roughly sketched as follows. I will edit as I go along; please comment if I am mistaken.

1. `nltk` is used to detect and clean adjectives like 'nearest', so that the important nouns can be isolated and recognized in subsequent steps.
2. […] `allennlp`.
3. […] `spaCy`.
4. […] `cct` types via manually constructed rules based on the concepts/extents/transformations that were found in previous steps.

The issue is that this is rather fragile; it depends (among other things) on:
We have chosen `blockly` to constrain the natural language at the user end, in such a way that the questions that may be presented to the parser are questions that the parser can handle. However, this only formats the question to reduce the problems of an otherwise unchanged natural language processing pipeline. As discussed in the meeting and elsewhere:

- […] `blockly` instead of freeform text, we will no longer need named entity recognition or question cleaning. This would strip out the `nltk`, `spaCy`, and `allennlp` packages, tremendously simplifying the process.
- […] `blockly`-constructed query can output something different than what's written on the blocks, we might even forgo the natural language parser completely, in favour of JSON output at the blockly level (or another format that is easily parsed). This would eliminate even the ANTLR parser, further reducing complexity. The downside is that we would no longer be able to parse freeform text (though that would be impacted by the removal of named entity recognition anyway). We could describe this with JSON Schema to really pin it down.

This would make this repository not so much a `geo-question-parser` as a `geo-question-formulator`. This is good, because the code right now is very complex and very fitted to the specific questions in the original corpus, which isn't acceptable in a situation where users can pose their own questions.

Note: If we simplify to this extent, it might be nice to use `rdflib.js` to output a transformation graph directly, but that is for later.

The process would thus become:

1. […] `blockly`, construct JSON that represents a question.
2. […] `cct` types via rules.

I'm not sure to what extent we can still simplify step 2. Depending on how much code would be left, it would be nice to port/rewrite in JavaScript, alongside `blockly`, so that we can visualize most things client-side and with minimal moving parts.
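To give a feel for the JSON-based process, here is a minimal sketch, assuming a nested structure in which every node carries a role and either a piece of text or a list of sub-parts. The key names (`role`, `text`, `parts`), the role names and the example question are all hypothetical; no such format has been agreed on. The structural check is hand-written to keep the sketch dependency-free, where a JSON Schema could do the same job.

```python
import json

# Hypothetical output of the Blockly workspace for one question.
QUESTION_JSON = """
{
  "role": "question",
  "parts": [
    {"role": "measure", "text": "What is the average temperature"},
    {"role": "extent", "text": "in Utrecht"}
  ]
}
"""


def validate(node):
    """Check the assumed shape: a string role plus exactly one of text/parts."""
    if not isinstance(node, dict) or not isinstance(node.get("role"), str):
        raise ValueError("every node needs a string 'role'")
    has_text = "text" in node
    has_parts = "parts" in node
    if has_text == has_parts:
        raise ValueError("a node needs exactly one of 'text' or 'parts'")
    if has_text and not isinstance(node["text"], str):
        raise ValueError("'text' must be a string")
    if has_parts:
        if not isinstance(node["parts"], list):
            raise ValueError("'parts' must be a list")
        for part in node["parts"]:
            validate(part)


def leaves(node):
    """Yield (role, text) pairs for all leaf nodes, left to right."""
    if "text" in node:
        yield node["role"], node["text"]
    else:
        for part in node["parts"]:
            yield from leaves(part)


def render(node):
    """Reassemble the natural language question from the leaf texts."""
    return " ".join(text for _, text in leaves(node))


question = json.loads(QUESTION_JSON)
validate(question)
```

With a structure like this, the functional roles come straight from the blocks via `leaves(question)` and no parsing step is needed, while `render(question)` still produces the sentence shown to the user.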