
Deprecation Notice

This project has been re-imagined in https://github.com/bot-ai/bot-lang

Normalize, clean and fix text

npm install node-normalizer

This simple app processes input text and tries to make it consumable by a bot.

The order in which the processing happens is important.

    1. spelling corrections for common spelling errors
    2. idiom conversions
    3. junk word removal from the sentence
    4. special sentence effects (question, exclamation, revert question)
    5. abbreviation expansion and canonicalization

Notes on the substitution file format:

  • The format is a left phrase, with words separated by _, that yields a right phrase, with words separated by +.
  • + is used because the resulting phrase should not be recognized by the idiom processor, which would otherwise delete the phrase.
  • A missing right phrase means the left phrase is deleted.
  • <xxx means xxx at the start of a sentence.
  • xxx> means xxx at the end of a sentence.
  • If the right side is a %value, that bit is set on the sentence (%EXCLAMATIONMARK, %QUESTIONMARK).
  • If the right side is a ~word, it is an interjection.
  • For abbreviations, do not use _ before the period (.).
  • An apostrophe on the left side must follow tokenizing conventions.
  • An apostrophe on the right side means the word is not spell-checked; the apostrophe disappears from the output.
  • To keep the result from being tokenized, put it in quotes.
  • Only proper names should have capital letters.
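
The rule format described above can be illustrated with a small parser. This is only a sketch: `parseRule` is a hypothetical helper and not part of this module's API, the sample rules are made up rather than taken from the data files, and it assumes one rule per line with the left and right sides separated by whitespace. Quoted results and apostrophe handling are deliberately left out.

```javascript
// Hypothetical helper for illustration; not part of node-normalizer.
// Assumption: one rule per line, left and right sides separated by whitespace.
function parseRule(line) {
  const [left, right = ''] = line.trim().split(/\s+/, 2);

  const atStart = left.startsWith('<'); // <xxx: xxx at the start of a sentence
  const atEnd = left.endsWith('>');     // xxx>: xxx at the end of a sentence

  // Strip the anchors, then turn the _ separators into spaces.
  const phrase = left
    .slice(atStart ? 1 : 0, atEnd ? -1 : undefined)
    .split('_')
    .join(' ');

  const rule = { phrase, atStart, atEnd, action: 'delete', value: null };

  if (right === '') {
    return rule; // missing right phrase: delete the left phrase
  }
  if (right.startsWith('%')) {
    rule.action = 'flag'; // set a bit on the sentence, e.g. %QUESTIONMARK
    rule.value = right.slice(1);
  } else if (right.startsWith('~')) {
    rule.action = 'interjection';
    rule.value = right;
  } else {
    rule.action = 'replace'; // right phrase words are separated by +
    rule.value = right.split('+').join(' ');
  }
  return rule;
}

console.log(parseRule('i_dunno i+do+not+know'));
console.log(parseRule('<hey_there'));
console.log(parseRule('wow> %EXCLAMATIONMARK'));
```

Keeping the sentence anchors as separate booleans means a matcher built on top of this can decide how to detect sentence boundaries without re-parsing the rule.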