OKBQA Platform

Jin-Dong Kim edited this page Aug 8, 2018 · 12 revisions

Goal

  • To evaluate the performance of OKBQA workflows and report the results.

ToDo

  • To make other modules ready for use.
  • To evaluate workflows using NLQ50.
  • To find a reasonable evaluation data set.

Members

  • Jin-Dong Kim (Coordinator, Templator)
  • Sang-Min An (Evaluator)
  • Jiseong Kim (Controller)
  • Andre Freitas (Disambiguator)

1st day progress

More OKBQA modules

  • [TGM] LODQA Templator - done
  • [DM] Stargraph disambiguator

Evaluation

  • Evaluator web service URL
http://ws.okbqa.org:31999/evaluation
  • test-configuration.json
{
  "language": "en",
  "data_url": "http://ws.okbqa.org/down/dbpedia_train_answers.json",
  "config": {
    "address": {
      "TGM": [
        "http://ws.okbqa.org:1515/templategeneration/rocknrole"
      ],
      "DM": [
        "http://ws.okbqa.org:2357/agdistis/run"
      ],
      "QGM": [
        "http://ws.okbqa.org:38401/queries"
      ],
      "AGM": [
        "http://ws.okbqa.org:7745/agm"
      ],
      "KB": [
        [
          "http://kbox.kaist.ac.kr:5889/sparql",
          "http://en.dbpedia2014.kaist.ac.kr"
        ]
      ]
    },
    "sequence": [
      "TGM",
      "DM",
      "QGM",
      "AGM"
    ],
    "timelimit": 100,
    "n": 5
  }
}
  • curl command
curl -i -H "Content-Type:application/json" -d @test-configuration.json http://ws.okbqa.org:31999/evaluation
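
The same request can also be prepared from Python. The following is a minimal sketch using only the standard library. The helper names check_config and build_request are hypothetical, and the check that every module listed in "sequence" has an entry under "address" is an assumption about what a well-formed configuration looks like, not a documented rule of the Evaluator.

```python
import json
import urllib.request

EVALUATOR_URL = "http://ws.okbqa.org:31999/evaluation"


def check_config(cfg):
    """Return the modules in "sequence" that have no address (hypothetical check)."""
    addresses = cfg["config"]["address"]
    return [m for m in cfg["config"]["sequence"] if not addresses.get(m)]


def build_request(cfg, url=EVALUATOR_URL):
    """Build, but do not send, the POST request that the curl command issues."""
    body = json.dumps(cfg).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(cfg):
    """Send the configuration to the Evaluator and return the raw response text."""
    with urllib.request.urlopen(build_request(cfg)) as resp:
        return resp.read().decode("utf-8")
```

To use it, load test-configuration.json with json.load, confirm that check_config returns an empty list, and pass the result to send.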

2nd day progress

Evaluation setup

  • The OKBQA default English workflow
  • Data: QALD3 for DBpedia 3.8
    • 100 questions in the training data set
    • 7 questions were dropped because their answers are empty
    • converted to JSON (93 questions)
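
The filtering step can be sketched as follows. The record layout (a list of objects with "question" and "answers" keys) is hypothetical; the actual structure of the converted QALD3 JSON is not shown on this page.

```python
def drop_unanswered(records):
    """Keep only the records whose answer list is present and non-empty."""
    return [r for r in records if r.get("answers")]


# Hypothetical sample records, for illustration only.
sample = [
    {"question": "Q1", "answers": ["A"]},
    {"question": "Q2", "answers": []},   # empty answer list: dropped
    {"question": "Q3"},                  # no answers at all: dropped
]
kept = drop_unanswered(sample)
print(len(sample), "questions in,", len(kept), "kept")
```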

Initial Evaluation Results

  • correct_answer_rate : 0.06451612903225806
  • partial_answer_rate : 0.20430107526881722
  • false_negative_answer_rate : 0.5268817204301075
  • false_positive_answer_rate : 0.5698924731182796
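
Assuming each rate is a number of questions divided by the 93 evaluated questions (an assumption; this page does not define the four measures), the reported figures map back to whole-number counts, which are easier to read:

```python
TOTAL = 93  # questions left after dropping the 7 with empty answers

rates = {
    "correct_answer_rate": 0.06451612903225806,
    "partial_answer_rate": 0.20430107526881722,
    "false_negative_answer_rate": 0.5268817204301075,
    "false_positive_answer_rate": 0.5698924731182796,
}

# Recover the underlying counts under the count/TOTAL assumption.
counts = {name: round(rate * TOTAL) for name, rate in rates.items()}

# Sanity check: each recovered count reproduces the reported rate.
assert all(abs(counts[n] / TOTAL - r) < 1e-12 for n, r in rates.items())

for name, count in counts.items():
    print(f"{name}: {count}/{TOTAL}")
```

Under this reading, 6 of the 93 questions were answered fully correctly and 19 partially. The four counts sum to more than 93, so the categories evidently overlap.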

False Positive Example

  • Question: Give me all movies with Tom Cruise
  • SPARQL
SELECT ?v3 WHERE {
   ?v3 a <http://dbpedia.org/ontology/Film> .
   ?v3 ?v10 <http://dbpedia.org/resource/Tom_Cruise>
}
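
In this query the predicate is the unconstrained variable ?v10, so a film connected to Tom_Cruise by any property matches, not only a film he appears in. This is one plausible reason for the false positives; the page itself does not state the cause. The toy example below illustrates the effect on a hand-made set of triples. The resources ex:Film_A and ex:Film_B are invented for illustration and are not DBpedia data.

```python
FILM = "http://dbpedia.org/ontology/Film"
STARRING = "http://dbpedia.org/ontology/starring"
PRODUCER = "http://dbpedia.org/ontology/producer"
TOM = "http://dbpedia.org/resource/Tom_Cruise"

# Invented triples: one film he stars in, one he only produces.
triples = {
    ("ex:Film_A", "rdf:type", FILM),
    ("ex:Film_A", STARRING, TOM),
    ("ex:Film_B", "rdf:type", FILM),
    ("ex:Film_B", PRODUCER, TOM),
}


def films_linked_to(obj, predicate=None):
    """Films linked to obj; predicate=None mimics the unconstrained ?v10."""
    films = {s for s, p, o in triples if p == "rdf:type" and o == FILM}
    return sorted({
        s for s, p, o in triples
        if s in films and o == obj and (predicate is None or p == predicate)
    })


print(films_linked_to(TOM))            # any predicate, as in the query above
print(films_linked_to(TOM, STARRING))  # predicate fixed to "starring"
```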

Issues

  • The components are not yet engineered for the data set.

To Do

  • Error analysis and report to the community.
  • Evaluation of multiple workflows remains to be done.

Example of Improvement

  • One problem in the QGM was identified and fixed. Results after the fix:
    • correct_answer_rate : 0.06451612903225806
    • partial_answer_rate : 0.21505376344086022
    • false_negative_answer_rate : 0.4838709677419355
    • false_positive_answer_rate : 0.5376344086021505
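
Read as counts over the 93 evaluated questions (an assumption, since this page does not define the measures), the effect of the fix can be worked out by comparing the initial results with the ones listed here:

```python
TOTAL = 93  # evaluated questions

before = {
    "correct": 0.06451612903225806,
    "partial": 0.20430107526881722,
    "false_negative": 0.5268817204301075,
    "false_positive": 0.5698924731182796,
}
after = {
    "correct": 0.06451612903225806,
    "partial": 0.21505376344086022,
    "false_negative": 0.4838709677419355,
    "false_positive": 0.5376344086021505,
}


def to_count(rate):
    """Convert a rate to a question count, assuming rate = count / TOTAL."""
    return round(rate * TOTAL)


# Change in the number of questions per category after the QGM fix.
deltas = {k: to_count(after[k]) - to_count(before[k]) for k in before}
print(deltas)
```

Under this assumption, the fix yields one more partially answered question, four fewer false negatives, and three fewer false positives, while the number of fully correct answers is unchanged.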

Controller update: I/O format checking and error reporting

  • The Controller now checks whether the I/O formats of user-provided modules are consistent with the standard OKBQA I/O specification.
  • When the Controller detects an inconsistency in the I/O formats, it immediately stops the pipeline and reports the inconsistency to the user.
  • This check is performed only for modules named "TGM", "DM", "QGM", or "AGM". Modules with any other name may use freely designed I/O formats in your own pipeline specification.
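
The behavior described above can be sketched roughly as follows. This is not the Controller's actual code: the names IOFormatError, check_output, and run_pipeline are hypothetical, the required keys in TOY_SPEC are placeholders rather than the real OKBQA I/O specification, and for brevity the sketch validates only module outputs.

```python
STANDARD_MODULES = {"TGM", "DM", "QGM", "AGM"}

# Placeholder required keys per module; NOT the real OKBQA specification.
TOY_SPEC = {name: {"result"} for name in STANDARD_MODULES}


class IOFormatError(Exception):
    """Raised when a standard module's output does not match the spec."""


def check_output(name, output, spec=TOY_SPEC):
    """Validate output only for standard module names; skip custom ones."""
    if name not in STANDARD_MODULES:
        return  # custom module: any format is accepted
    if not isinstance(output, dict):
        raise IOFormatError(f"{name}: expected an object, got {type(output).__name__}")
    missing = spec[name] - set(output)
    if missing:
        raise IOFormatError(f"{name}: missing keys {sorted(missing)}")


def run_pipeline(modules, data, spec=TOY_SPEC):
    """Run (name, function) pairs in order; stop at the first format error."""
    for name, func in modules:
        data = func(data)
        check_output(name, data, spec)  # raises and thereby stops the pipeline
    return data
```

A module registered under a custom name such as "MyModule" passes through unchecked, whereas a module named "TGM" whose output lacks a required key stops the run with an IOFormatError that names the module.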