Deprecation Notice

This project has been re-imagined in https://github.com/bot-ai/bot-lang

Normalize, clean and fix text

npm install node-normalizer

This simple app processes input and tries to make it consumable by a bot.

The order in which the processing happens is important:

1. spelling corrections for common spelling errors
2. idiom conversions
3. junk word removal from the sentence
4. special sentence effects (question, exclamation, revert question)
5. abbreviation expansion and canonicalization

Substitution files follow these conventions:

  • The format is: left phrase separated by _ yields right phrase separated by +
  • We use + because we don't want the resulting phrase to be recognized by the idiom processor, which would cause the processor to delete the phrase
  • A missing right phrase means delete the left phrase
  • If the right side is %value, set that bit on the sentence (%EXCLAMATIONMARK, %QUESTIONMARK)
  • If the right side is a ~word, it's an interjection
  • <xxx means sentence start, then xxx
  • xxx> means xxx, then sentence end
  • For abbreviations, do not use _ before the .
  • For an apostrophized left side, follow the tokenizing conventions
  • For an apostrophized right side, the word is not spell checked and the apostrophe disappears
  • Only proper names should have capital letters
  • To keep the result from being tokenized, put it in quotes
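
To make the rule format above more concrete, here is a minimal sketch of how a single substitution line might be parsed and applied. It is not the library's actual implementation. The names `parseRule` and `applyRules`, the sample rules, and the whitespace separator between the left and right sides are all assumptions made for illustration. The sketch also assumes that a `%value` rule removes the matched phrase, and it does not handle quoted results or apostrophes.

```javascript
// Hypothetical helpers for illustration only; not node-normalizer's API.

// Escape regex metacharacters so a literal word can be embedded in a pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Parse one rule line, e.g. "i_dunno i+do+not+know".
// Assumption: left and right sides are separated by whitespace.
function parseRule(line) {
  const [left, right = ''] = line.trim().split(/\s+/);
  const atStart = left.startsWith('<'); // <xxx: sentence start, then xxx
  const atEnd = left.endsWith('>');     // xxx>: xxx, then sentence end
  const words = left
    .slice(atStart ? 1 : 0, atEnd ? -1 : undefined)
    .split('_')                         // left phrase words are joined by _
    .map(escapeRegExp);

  const pattern = new RegExp(
    (atStart ? '^' : '(?<=^|\\s)') +
      words.join('\\s+') +
      (atEnd ? '$' : '(?=\\s|$)'),
    'gi'
  );

  if (right === '') return { pattern, kind: 'delete', value: '' };
  if (right.startsWith('%')) return { pattern, kind: 'flag', value: right.slice(1) };
  if (right.startsWith('~')) return { pattern, kind: 'interjection', value: right };
  // Right phrase words are joined by +; turn them back into spaces.
  return { pattern, kind: 'replace', value: right.split('+').join(' ') };
}

// Apply rules in the given order and collect any sentence flags that were set.
function applyRules(sentence, rules) {
  const flags = new Set();
  let text = sentence;
  for (const rule of rules) {
    text = text.replace(rule.pattern, () => {
      if (rule.kind === 'flag') {
        flags.add(rule.value);
        return ''; // assumption: the matched phrase is dropped once the flag is set
      }
      return rule.value;
    });
  }
  return { text: text.replace(/\s+/g, ' ').trim(), flags: [...flags] };
}

// Usage with made-up rules:
const rules = ['i_dunno i+do+not+know', '<well', 'yeah ~yes', 'right> %QUESTIONMARK']
  .map(parseRule);

console.log(applyRules('well i dunno', rules));
// { text: 'i do not know', flags: [] }
console.log(applyRules('yeah right', rules));
// { text: '~yes', flags: [ 'QUESTIONMARK' ] }
```

Because rules are applied in list order, a later rule sees the output of an earlier one, which is one reason the processing order matters.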