🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
rodrigopivi Merge pull request #60 from mminchev/file-names
Adding support for custom dataset file names
Latest commit 5f73f8f Feb 10, 2019

Chatito


Try the online IDE!

Donate


Designing and maintaining Chatito takes time and effort. If it was useful for you, please consider making a donation and share the abundance! :)

Overview

Chatito helps you generate datasets for training and validating chatbot models using a minimalistic DSL.

If you are building chatbots using commercial models, open source frameworks or writing your own natural language processing model, you need training examples. Chatito is here to help you.

This project contains the:

Chatito language

For the full language specification and documentation, please refer to the DSL spec document.

Adapters

The language is independent of the generated output format. Because each model can receive different parameters and settings, three data format adapters are provided. This section describes the adapters, their specific behaviors and use cases:

Default format

Use the default format if you plan to train a custom model or if you are writing a custom adapter. This is the most flexible format because you can annotate Slots and Intents with custom entity arguments, and they will all be present in the generated output. For example, you could also include dialog/response generation logic with the DSL. E.g.:

%[some intent]('context': 'some annotation')
    @[some slot] ~[please?]

@[some slot]('required': 'true', 'type': 'some type')
    ~[some alias here]

Custom entity arguments like 'context', 'required' and 'type' will be available in the output, so you can handle these custom arguments as you want.
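
As a rough illustration of handling such arguments downstream, the sketch below reads the 'required' and 'type' arguments of a slot. The SlotDefinition shape and the isRequired helper are hypothetical, chosen only for this example. They are not Chatito's actual output schema, so inspect a generated file before relying on any field name.

```typescript
// Hypothetical shape for an annotated slot; NOT Chatito's real output schema.
interface SlotDefinition {
    name: string;
    args: Record<string, string>;
}

// Arguments are written as quoted strings in the DSL, so 'true' is compared as text.
function isRequired(slot: SlotDefinition): boolean {
    return slot.args["required"] === "true";
}

const someSlot: SlotDefinition = {
    name: "some slot",
    args: { required: "true", type: "some type" },
};

console.log(isRequired(someSlot)); // true
console.log(someSlot.args["type"]); // some type
```

Keeping the arguments as a plain string map means any new annotation added in the DSL can be read without changing the consumer's types.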

Rasa NLU

Rasa NLU is a great open source framework for training NLU models. One particular behavior of the Rasa adapter is that when a slot definition sentence contains only one alias, the generated Rasa dataset will map the alias as a synonym. E.g.:

%[some intent]('training': '1')
    @[some slot]

@[some slot]
    ~[some slot synonyms]

~[some slot synonyms]
    synonym 1
    synonym 2

In this example, the generated rasa dataset will contain the entity_synonyms of synonym 1 and synonym 2 mapping to some slot synonyms.
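
To make the mapping concrete, here is a minimal sketch of the entity_synonyms entry such a dataset could contain. The value/synonyms layout follows the Rasa NLU JSON training data format. The buildEntitySynonym helper is hypothetical and is not part of Chatito; it only mirrors the behavior described above.

```typescript
// One entry of the `entity_synonyms` list in Rasa NLU's JSON training data.
interface EntitySynonym {
    value: string;
    synonyms: string[];
}

// Hypothetical helper (not Chatito's code): maps every sentence of an alias
// to the alias name, which becomes the canonical synonym value.
function buildEntitySynonym(aliasName: string, aliasSentences: string[]): EntitySynonym {
    return { value: aliasName, synonyms: aliasSentences.slice() };
}

const synonymEntry = buildEntitySynonym("some slot synonyms", ["synonym 1", "synonym 2"]);
console.log(JSON.stringify(synonymEntry));
// {"value":"some slot synonyms","synonyms":["synonym 1","synonym 2"]}
```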

Snips NLU

Snips NLU is another great open source framework for NLU. One particular behavior of the Snips adapter is that you can define entity types for the slots. E.g.:

%[date search]('training':'1')
    for @[date]

@[date]('entity': 'snips/datetime')
    ~[today]
    ~[tomorrow]

In the previous example, all @[date] values will be tagged with the snips/datetime entity tag.
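
As a sketch of what that tagging looks like, the snippet below builds one utterance as a list of chunks with text, entity and slot_name fields, which is how the Snips NLU JSON dataset format represents slots. The tagSlot helper is hypothetical and not part of Chatito, and the exact output of the Snips adapter may contain more fields than shown here.

```typescript
// A chunk of a Snips NLU utterance: plain text, or text tagged with a slot.
interface SnipsChunk {
    text: string;
    entity?: string;
    slot_name?: string;
}

// Hypothetical helper (not Chatito's code): tags a slot value with its entity.
function tagSlot(value: string, slotName: string, entity: string): SnipsChunk {
    return { text: value, entity: entity, slot_name: slotName };
}

// "for today", where "today" comes from the @[date] slot.
const dateUtterance: SnipsChunk[] = [
    { text: "for " },
    tagSlot("today", "date", "snips/datetime"),
];
console.log(JSON.stringify(dateUtterance));
// [{"text":"for "},{"text":"today","entity":"snips/datetime","slot_name":"date"}]
```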

NPM package

Chatito supports Node.js v8.11.2 LTS or higher.

Install it globally:

npm i chatito -g

Or locally:

npm i chatito --save

Then create a definition file (e.g.: trainClimateBot.chatito) with your code.

Run the npm generator:

npx chatito trainClimateBot.chatito

The generated dataset should be available next to your definition file.

Here are the full npm generator options:

npx chatito <pathToFileOrDirectory> --format=<format> --formatOptions=<formatOptions> --outputPath=<outputPath> --trainingFileName=<trainingFileName> --testingFileName=<testingFileName>
  • <pathToFileOrDirectory> Path to a .chatito file or a directory that contains chatito files. If it is a directory, Chatito searches it recursively for all *.chatito files and uses them to generate the dataset. e.g.: lightsChange.chatito or ./chatitoFilesFolder
  • <format> Optional. default, rasa or snips
  • <formatOptions> Optional. Path to a .json file that each adapter optionally can use
  • <outputPath> Optional. The directory where to save the generated datasets. Uses the current directory as default.
  • <trainingFileName> Optional. The name of the generated training dataset file. Do not forget to add a .json extension at the end. Uses <format>_dataset_training.json as default file name.
  • <testingFileName> Optional. The name of the generated testing dataset file. Do not forget to add a .json extension at the end. Uses <format>_dataset_testing.json as default file name.
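
The sketch below shows how the options above combine into a single command line. The flag names are taken from the list above, but the buildChatitoCommand helper and the ChatitoCliOptions type are hypothetical and not part of the package. The helper does no shell quoting, so it assumes paths without spaces.

```typescript
// Option names mirror the CLI flags listed above; the builder itself is hypothetical.
interface ChatitoCliOptions {
    format?: "default" | "rasa" | "snips";
    formatOptions?: string;
    outputPath?: string;
    trainingFileName?: string;
    testingFileName?: string;
}

// Builds the command string, emitting only the flags that were provided.
function buildChatitoCommand(pathToFileOrDirectory: string, options: ChatitoCliOptions = {}): string {
    const parts: string[] = ["npx", "chatito", pathToFileOrDirectory];
    const flags: Array<[string, string | undefined]> = [
        ["format", options.format],
        ["formatOptions", options.formatOptions],
        ["outputPath", options.outputPath],
        ["trainingFileName", options.trainingFileName],
        ["testingFileName", options.testingFileName],
    ];
    for (const [name, value] of flags) {
        if (value !== undefined) {
            parts.push("--" + name + "=" + value);
        }
    }
    return parts.join(" ");
}

console.log(buildChatitoCommand("./chatitoFilesFolder", { format: "rasa", outputPath: "./datasets" }));
// npx chatito ./chatitoFilesFolder --format=rasa --outputPath=./datasets
```

With these arguments and no file name flags, the defaults described above apply, so the files would be named rasa_dataset_training.json and rasa_dataset_testing.json inside ./datasets.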

Author and maintainer

Rodrigo Pimentel
