nsip/otf-classifier

otf-classifier

Note
This is experimental and proof-of-concept code
Note
The project has moved to Go modules. If module support is not enabled by default in your environment, remember to export GO111MODULE=on before running go get and go build.

This code is based on https://github.com/nsip/curriculum-align. It sets up a document classifier (https://en.wikipedia.org/wiki/Tf–idf) that classifies arbitrary text as aligning to the learning progressions the code is provisioned with, and exposes the alignments as a web service.

The code is set up to align to indicators; however, training corpora are built against both indicator codes and development level codes, and the latter can be used instead. To switch, set the variable granularity in classifier.go to "Devlevel" instead of "Indicator".

Binary distributions of the code are available in the build/ directory.

The web service is made available as a library (Align()); the cmd directory contains a sample shell for it, which is used in the binary distribution. In the sample shell, the web service runs on port 1576. The test script test.sh issues representative REST queries against the web service.

The web service takes the following arguments:

GET http://localhost:1576/align?area=W&text=....

where area is the learning area (Numeracy or Literacy), and text is the text to be aligned. Both the text and the area parameters are obligatory.

For example:

http://localhost:1576/align?area=Numeracy&text=information
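When the text to be aligned contains spaces or punctuation, it has to be URL-encoded before it is placed in the query string. The following is a minimal sketch of one way to build the request URL in Go; the helper name alignURL is hypothetical and not part of this repository, and the base address assumes the sample shell running locally on port 1576.

```go
package main

import (
	"fmt"
	"net/url"
)

// alignURL builds a GET URL for the /align endpoint.
// It is an illustrative helper, not part of otf-classifier.
// base is assumed to be the address of a running sample shell.
func alignURL(base, area, text string) string {
	q := url.Values{}
	q.Set("area", area) // learning area: Numeracy or Literacy
	q.Set("text", text) // text to be aligned; escaped by Encode below
	return base + "/align?" + q.Encode()
}

func main() {
	// Spaces in the text are encoded as "+" in the query string.
	fmt.Println(alignURL("http://localhost:1576", "Numeracy", "collects information"))
}
```

The resulting string can be passed directly to http.Get from the standard library.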

For larger payloads, or for automated environments calling the classifier, an equivalent POST method is also available, which accepts a JSON payload:

curl http://localhost:1576/align -H 'Content-Type: application/json' -d'{"area":"literacy","text":"confident sentences"}'
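The same POST request can be issued from Go. The sketch below assumes only what the curl example shows: a JSON object with area and text keys sent to /align with a JSON content type. The names alignRequest, encodeRequest and postAlign are hypothetical, and the response body is returned undecoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// alignRequest mirrors the JSON payload in the curl example.
// The type itself is illustrative, not taken from the repository.
type alignRequest struct {
	Area string `json:"area"`
	Text string `json:"text"`
}

// encodeRequest marshals the payload to JSON.
func encodeRequest(area, text string) ([]byte, error) {
	return json.Marshal(alignRequest{Area: area, Text: text})
}

// postAlign sends the payload to endpoint and returns the raw response body.
func postAlign(endpoint, area, text string) ([]byte, error) {
	body, err := encodeRequest(area, text)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("align: unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Assumes the sample shell is running locally on port 1576.
	out, err := postAlign("http://localhost:1576/align", "literacy", "confident sentences")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(string(out))
}
```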

The response is a JSON list of structs, one for each curriculum standard that the service is configured for, with the following fields:

  • Item: the identifier of the curriculum item (indicator) whose alignment is reported

  • DevLevel: the identifier of the development level corresponding to the curriculum item (indicator)

  • Path: the path down to the indicator or development level

  • Text: the text of the curriculum item whose alignment is reported

  • Score: the score of the alignment. This is the score generated by github.com/jbrukh/bayesian: it is a negative number, and the higher the number (i.e. the closer to zero), the better the alignment of the text to the curriculum standard.

  • Matches: the top five words that were the basis for the alignment of the curriculum item to the text

    • Word: the matching word

    • Score: the logarithmic score

For example:

[
  {
    "Item": "uri/version/00b902b5-7065-430f-b6de-f9b92aac85ff",
    "Text": "identifies symmetry in the environment",
    "DevLevel": "UGP3",
    "Path": [
      {
        "Key": "General Capability",
        "Val": "Numeracy"
      },
      {
        "Key": "Element",
        "Val": "Measurement and geometry"
      },
      {
        "Key": "Sub-element",
        "Val": "Understanding geometric properties"
      },
      {
        "Key": "Progression Level",
        "Val": "UGP3"
      },
      {
        "Key": "Heading",
        "Val": "Transformations"
      },
      {
        "Key": "Indicator",
        "Val": "identifies and creates patterns involving one- and two-step transformations of shapes (e.g. uses pattern blocks to create a pattern and describes how the pattern was created)"
      }
    ],
    "Score": -58.50905189596907,
    "Matches": [
      {
        "Word": "collects",
        "Score": -25.328436022934504
      },
      {
        "Word": "information",
        "Score": -25.328436022934504
      }
    ]
  }
]
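A client consuming this response might decode it and pick the best alignment. The sketch below is an assumption-laden illustration: the struct names (KeyVal, Match, Alignment) and the function best are hypothetical and only mirror the field names visible in the example response; they are not the library's own types. Because scores are negative and closer to zero is better, the best alignment is the one with the largest Score.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// KeyVal is one step of the Path down to an indicator.
type KeyVal struct {
	Key string
	Val string
}

// Match is one word that contributed to an alignment.
type Match struct {
	Word  string
	Score float64
}

// Alignment mirrors one element of the JSON list in the example response.
type Alignment struct {
	Item     string
	Text     string
	DevLevel string
	Path     []KeyVal
	Score    float64
	Matches  []Match
}

// best decodes a response and returns the alignment whose Score is
// highest, i.e. closest to zero.
func best(raw []byte) (Alignment, error) {
	var all []Alignment
	if err := json.Unmarshal(raw, &all); err != nil {
		return Alignment{}, err
	}
	if len(all) == 0 {
		return Alignment{}, errors.New("no alignments in response")
	}
	top := all[0]
	for _, a := range all[1:] {
		if a.Score > top.Score {
			top = a
		}
	}
	return top, nil
}

func main() {
	// Made-up, abbreviated response used only to exercise best().
	raw := []byte(`[{"Item":"a","DevLevel":"UGP2","Score":-61.2},
	                {"Item":"b","DevLevel":"UGP3","Score":-58.5}]`)
	a, err := best(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(a.Item, a.DevLevel, a.Score)
}
```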

The documents passed to the document classifier are also indexed, and can be queried through the web service index:

GET http://localhost:1576/index?search=word

The Path lookup of an indicator or progression level code or URI can be queried through the web service lookup:

GET http://localhost:1576/lookup?search=UGP3
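Both the index and lookup endpoints take a single search parameter. Since lookup also accepts a URI, which contains slashes, the value should be escaped before it goes into the query string. A minimal sketch, assuming a locally running sample shell; the helper searchURL is hypothetical:

```go
package main

import (
	"fmt"
	"net/url"
)

// searchURL builds a GET URL for the "index" or "lookup" endpoint.
// It is an illustrative helper, not part of otf-classifier.
func searchURL(base, endpoint, term string) string {
	q := url.Values{}
	q.Set("search", term) // escaped by Encode, including any "/" in a URI
	return base + "/" + endpoint + "?" + q.Encode()
}

func main() {
	base := "http://localhost:1576" // assumed address of the sample shell
	fmt.Println(searchURL(base, "lookup", "UGP3"))
	fmt.Println(searchURL(base, "index", "symmetry"))
}
```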

To embed the service in another labstack/echo web server, replicate the main() code from cmd/main.go:

align.Init()
e := echo.New()
e.GET("/align", align.Align)
// Path lookup for an indicator or progression level code or URI.
e.GET("/lookup", func(c echo.Context) error {
	query := c.QueryParam("search")
	ret, err := align.Lookup(query)
	if err != nil {
		return err
	}
	return c.String(http.StatusOK, string(ret))
})
// Search over the indexed documents.
e.GET("/index", func(c echo.Context) error {
	query := c.QueryParam("search")
	ret, err := align.Search(query)
	if err != nil {
		return err
	}
	return c.String(http.StatusOK, string(ret))
})

The web service is configured to read any JSON files in the curricula folder of the executable; the file included in the distribution is a mockup of the proposed machine encoding of the National Learning Progressions.

1576

The word "curriculum" began as a Latin word which means "a race" or "the course of a race" (which in turn derives from the verb currere meaning "to run/to proceed"). The first known use in an educational context is in the Professio Regia, a work by University of Paris professor Petrus Ramus published posthumously in 1576.