A crowdsourcing approach to natural language programming

I no longer plan to develop this project, but I am working on something similar at http://contextscript.com

Language

Use Case:

  1. The user types a command that makes sense to them into a search box. For example:

     generate an html form with name age and gender fields
     draw a teapot
     sum(1,2,3)

  2. The command is parsed using an ambiguous grammar, which generates many possible interpretations. An interpretation is a parse tree with a widget attached to its root (other nodes may also have widgets). A widget is a small webpage that performs a particular function relevant to the interpretation it is attached to.

  3. The user sees a list of widgets corresponding to various interpretations of their command. For example, if the command was "draw a teapot", they might see widgets containing various teapot drawings and perhaps widgets with instructions on how to draw a teapot. The user can up-vote good interpretations and down-vote irrelevant ones to improve future rankings. In addition to votes, an interpretation's probability of occurrence also affects its ranking.

  4. If none of the resulting interpretations are to the user's liking, they can add new language nodes to the grammar that do what they want. This is easily done using a query like "add language node".
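
The use case says that votes and probability of occurrence both affect an interpretation's ranking, but not how they combine. A minimal sketch, assuming hypothetical probability, upVotes and downVotes fields on each interpretation and an arbitrary log-scale weighting (VOTE_WEIGHT); none of these names come from the project:

```javascript
// Hypothetical ranking sketch: the fields probability, upVotes and downVotes
// and the VOTE_WEIGHT constant are assumptions, not the project's schema.
const VOTE_WEIGHT = 1.0; // how strongly votes can override parse probability

function score(interp) {
  // Add-one smoothing: an interpretation with no votes gets a vote term of 0.
  const voteTerm = Math.log((interp.upVotes + 1) / (interp.downVotes + 1));
  // probability is assumed to be in (0, 1].
  return Math.log(interp.probability) + VOTE_WEIGHT * voteTerm;
}

function rankInterpretations(interps) {
  // Sort a copy, highest score first, leaving the input array untouched.
  return [...interps].sort((a, b) => score(b) - score(a));
}

const ranked = rankInterpretations([
  { name: 'teapot drawing', probability: 0.5, upVotes: 0, downVotes: 3 },
  { name: 'how-to-draw guide', probability: 0.2, upVotes: 4, downVotes: 0 },
]);
// ranked[0].name === 'how-to-draw guide': its votes outweigh its lower probability
```

Working in log space keeps the two signals comparable: a vote ratio of 5:1 exactly cancels a probability of 0.2, so with no votes the ranking falls back to probability alone.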

Live Examples:

You can see the languageNodes used in all these examples by clicking more->view source.

Language Terminology for computer scientists:

LangNode => Production rule + meta data

Category => Non-terminal (LHS of production rule)

Components => RHS of production rule

Interpretation ~> Multi-parse tree*

*A multi-parse tree differs from a standard parse tree in that each of its nodes can have multiple alternative interpretations beneath it.
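
To make the mapping concrete, a langNode can be printed as a BNF-style production rule: its category becomes the LHS and its components the RHS. A sketch assuming the planned component format in this README (a string is a terminal, an object with category is a non-terminal, an object with regex is a regular expression); toProductionRule is a hypothetical helper, not part of the project:

```javascript
// Hypothetical helper: render a langNode as a BNF-style production rule.
function componentToString(c) {
  if (typeof c === 'string') return JSON.stringify(c); // terminal, quoted
  if (c.regex !== undefined) return '/' + c.regex + '/'; // regular expression
  if (Array.isArray(c.category)) return '<inline>'; // langNodes defined inline
  return '<' + c.category + '>'; // non-terminal named by category
}

function toProductionRule(langNode) {
  // Category is the LHS; the components, in order, form the RHS.
  const rhs = langNode.components.map(componentToString).join(' ');
  return '<' + langNode.category + '> ::= ' + rhs;
}

const rule = toProductionRule({
  category: 'main',
  components: ['draw', { category: 'thing' }],
});
// rule === '<main> ::= "draw" <thing>'
```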

Planned* formats (as JSON schemas):

*Note that the current formats differ in a few ways.

langNode:

    {
        type: "object",
        properties: {
            category: { type: "string" },
            components: {
                type: "array",
                items: {
                    anyOf: [
                        {
                            description: "terminal",
                            type: "string"
                        },
                        {
                            description: "non-terminal",
                            type: "object",
                            properties: {
                                category: {
                                    type: ["string", "array"],
                                    description: "This can be a category name (which corresponds to an array of language nodes in the database) or an array of language nodes defined inline."
                                }
                            }
                        },
                        {
                            description: "regular expression",
                            type: "object",
                            properties: {
                                regex: { type: "string" }
                            }
                        }
                    ]
                }
            },
            repository: {
                type: "object",
                properties: {
                    type: { 
                        description: "Currently only gist is supported.",
                        type: "string" 
                    },
                    gistId: {
                        description: "Only needed for gist type repositories.",
                        type: "string"
                    },
                    lastSync: { type: "string"}
                }
            },
            content: {
                description: "The complete content of the json object used to construct this langNode. You can put any metadata you want here.",
                type: "object"
            }
        }
    }
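
The three component kinds are enough to build an ambiguous grammar. As an illustration only (this is not the project's own parser), here is a naive backtracking sketch that returns every complete parse of a query against an array of langNodes in the planned format. The grammar, the helper names and the whitespace handling are all assumptions; it also loops forever on left-recursive rules, which a chart parser such as Earley avoids.

```javascript
// Naive backtracking parser over langNodes (hypothetical sketch).
// Returns every complete parse, so an ambiguous grammar yields several.

function skipSpaces(input, pos) {
  while (pos < input.length && input[pos] === ' ') pos++;
  return pos;
}

// Match one component at pos; returns every possible {end, tree}.
function matchComponent(grammar, comp, input, pos) {
  pos = skipSpaces(input, pos);
  if (typeof comp === 'string') { // terminal
    return input.startsWith(comp, pos)
      ? [{ end: pos + comp.length, tree: comp }]
      : [];
  }
  if (comp.regex !== undefined) { // regular expression anchored at pos
    const re = new RegExp(comp.regex, 'y');
    re.lastIndex = pos;
    const m = re.exec(input);
    return m ? [{ end: pos + m[0].length, tree: m[0] }] : [];
  }
  // Non-terminal: a category name or an inline array of langNodes.
  const nodes = Array.isArray(comp.category)
    ? comp.category
    : grammar.filter((n) => n.category === comp.category);
  return parseNodes(grammar, nodes, input, pos);
}

// Try every candidate langNode, threading partial matches left to right.
function parseNodes(grammar, nodes, input, pos) {
  const results = [];
  for (const node of nodes) {
    let partial = [{ end: pos, children: [] }];
    for (const comp of node.components) {
      const next = [];
      for (const p of partial) {
        for (const r of matchComponent(grammar, comp, input, p.end)) {
          next.push({ end: r.end, children: [...p.children, r.tree] });
        }
      }
      partial = next;
    }
    for (const p of partial) {
      results.push({ end: p.end, tree: { category: node.category, children: p.children } });
    }
  }
  return results;
}

// Keep only the parses that consume the whole query.
function parseAll(grammar, category, input) {
  const nodes = grammar.filter((n) => n.category === category);
  return parseNodes(grammar, nodes, input, 0)
    .filter((r) => skipSpaces(input, r.end) === input.length)
    .map((r) => r.tree);
}

// Hypothetical grammar in which "draw a teapot" is ambiguous.
const grammar = [
  { category: 'main', components: ['draw', { category: 'thing' }] },
  { category: 'main', components: [{ regex: '.+' }] }, // catch-all
  { category: 'thing', components: ['a teapot'] },
  { category: 'thing', components: ['a', { regex: '\\w+' }] },
];
const parses = parseAll(grammar, 'main', 'draw a teapot'); // 3 parses
```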

Interpretation (add "&json=true" to the query string of a search URL to see what real interpretation data looks like):

    {
        type: "object",
        properties: {
            query: {
                description: "The command/query the user searched for.",
                type: "string"
            },
            category: {
                description: "The category that the search was done under. (e.g. main/science/code)",
                type: "string"
            },
            interpretations: {
                description: "The interpretations of the query",
                type: "array",
                items: {
                    type: "object",
                    description: "A langNode extended with an interpretations array like this one."
                }
            }
        }
    }
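
Because the root object and every langNode in it can carry an interpretations array, the JSON returned with "&json=true" can be walked recursively. A minimal sketch that prints an indented outline of categories; the sample object is made up for illustration and only follows the planned schema:

```javascript
// Depth-first walk over an Interpretation object: the root and every
// langNode may carry an `interpretations` array (sample data is made up).
function outline(node, depth = 0, lines = []) {
  for (const child of node.interpretations || []) {
    lines.push('  '.repeat(depth) + child.category);
    outline(child, depth + 1, lines);
  }
  return lines;
}

const result = {
  query: 'draw a teapot',
  category: 'main',
  interpretations: [
    { category: 'drawing', interpretations: [{ category: 'teapot' }] },
    { category: 'howTo' },
  ],
};
const lines = outline(result);
// lines: ['drawing', '  teapot', 'howTo']
```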

Server API:

Upsert language node

TODO

Get parse tree:

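    // This example uses jQuery and jQuery-URL-Parser ($.url()).
    // Read the interpretation id and server URL from the current page's URL.
    var interpId = $.url().param('interpId');
    var serverUrl = $.url().param('serverUrl');
    $.getJSON(serverUrl + '/interpretations/' + encodeURIComponent(interpId), function(data) {
        // data.root is the root of the multi-parse tree.
        var multiParseTree = data.root;
    });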
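
For widgets that do not load jQuery, the same request can be made with the built-in URLSearchParams and fetch. This is a hedged sketch, not part of the project: interpretationUrl and getMultiParseTree are hypothetical names, and the endpoint and data.root field are taken from the jQuery example.

```javascript
// Hypothetical dependency-free variant of the jQuery example.
function interpretationUrl(search) {
  const params = new URLSearchParams(search); // a leading '?' is ignored
  const serverUrl = params.get('serverUrl');
  const interpId = params.get('interpId');
  if (serverUrl === null || interpId === null) {
    throw new Error('serverUrl and interpId query parameters are required');
  }
  return serverUrl + '/interpretations/' + encodeURIComponent(interpId);
}

async function getMultiParseTree(search) {
  // Global fetch is available in browsers and in Node 18+.
  const response = await fetch(interpretationUrl(search));
  if (!response.ok) throw new Error('Request failed with HTTP ' + response.status);
  const data = await response.json();
  return data.root; // root of the multi-parse tree
}

// In a widget page:
// getMultiParseTree(window.location.search).then((tree) => { /* render */ });
```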

Roadmap:

Add a way to view usage data. For example, make it possible to see the most frequent queries that could not be parsed, which would show what kinds of widgets are needed. Ideally this could be done in a widget, but the server would need to expose an API for the queries database.
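
Once such an API exists, finding the most frequent unparsed queries is a simple count. A sketch assuming a hypothetical query log of { query, parsed } records, where parsed means the query produced at least one interpretation; the record shape and the case-folding are assumptions:

```javascript
// Hypothetical sketch: count unparsed queries and return the most frequent.
function mostFrequentUnparsed(queryLog, limit) {
  const counts = new Map();
  for (const { query, parsed } of queryLog) {
    if (parsed) continue;
    const key = query.trim().toLowerCase(); // merge trivial variations
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Highest count first, then keep the top `limit` entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

const top = mostFrequentUnparsed([
  { query: 'plot sin(x)', parsed: false },
  { query: 'draw a teapot', parsed: true },
  { query: 'Plot sin(x)', parsed: false },
  { query: 'translate hello', parsed: false },
], 2);
// top: [['plot sin(x)', 2], ['translate hello', 1]]
```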

There is a need for some kind of API that makes it easier to work with multi-parse trees in widgets. One idea I like is a callback function that gets called multiple times, once for each interpretation.
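
One way the callback idea could work is to expand every alternative in the multi-parse tree into a fully resolved parse tree and invoke the callback once per result. The node shape below (interpretations, name, children) is a hypothetical simplification, not the project's format:

```javascript
// Hypothetical shape: a node lists alternative readings in `interpretations`;
// each alternative has a `name` and `children` (strings or further nodes).

// Expand every alternative into fully resolved parse trees.
function readings(node) {
  if (typeof node === 'string') return [node];
  const out = [];
  for (const alt of node.interpretations) {
    // Cartesian product over the readings of each child.
    let combos = [[]];
    for (const child of alt.children) {
      const childReadings = readings(child);
      combos = combos.flatMap((combo) => childReadings.map((r) => [...combo, r]));
    }
    for (const children of combos) out.push({ name: alt.name, children });
  }
  return out;
}

// Call `callback` once per complete interpretation of the tree.
function forEachReading(root, callback) {
  readings(root).forEach((tree, index) => callback(tree, index));
}

const tree = {
  interpretations: [
    {
      name: 'draw',
      children: ['draw', {
        interpretations: [
          { name: 'teapot', children: ['a teapot'] },
          { name: 'article + noun', children: ['a', 'teapot'] },
        ],
      }],
    },
    { name: 'web search', children: ['draw a teapot'] },
  ],
};
const names = [];
forEachReading(tree, (t) => names.push(t.name));
// names: ['draw', 'draw', 'web search']
```

Eager expansion is simple but can grow exponentially with the number of ambiguous nodes; a real API would probably enumerate lazily or let the widget stop early.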

Voting/ranking is not implemented yet. It will also require GitHub or some other authentication service to prevent ballot-box stuffing. However, there first need to be enough widgets for ranking them to be necessary.
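
Authentication prevents ballot-box stuffing only if votes are stored per user, so that voting again replaces a user's earlier vote instead of adding to it. A minimal sketch, assuming hypothetical user ids supplied by an authentication provider and a hypothetical VoteBook class:

```javascript
// Hypothetical vote store keyed by authenticated user id: re-voting
// replaces the user's previous vote rather than adding another one.
class VoteBook {
  constructor() {
    this.votes = new Map(); // interpId -> Map(userId -> +1 | -1)
  }

  cast(userId, interpId, value) {
    if (value !== 1 && value !== -1) throw new RangeError('vote must be +1 or -1');
    if (!this.votes.has(interpId)) this.votes.set(interpId, new Map());
    this.votes.get(interpId).set(userId, value);
  }

  tally(interpId) {
    let up = 0;
    let down = 0;
    for (const v of (this.votes.get(interpId) || new Map()).values()) {
      if (v === 1) up++;
      else down++;
    }
    return { up, down };
  }
}

const book = new VoteBook();
book.cast('github:alice', 'interp-1', 1);
book.cast('github:alice', 'interp-1', 1); // repeat vote is not double-counted
book.cast('github:bob', 'interp-1', -1);
// book.tally('interp-1') → { up: 1, down: 1 }
```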

In the distant future:

Create an interface for widgets to request access to resources from the parent site. For example, if the user links their GitHub account to the main site, widgets could request permission to modify one of the user's repositories through it.

Opt-in personalized rankings.

Parsing of non-textual input (e.g. voice, video)