Tokens is a simple NLP utility (written in Go) for tokenizing strings using common split regular expressions for whitespace, words, emoticons, URLs, and more.
View the docs.
```
$ go get github.com/nyxtom/tokens
```
```go
package main

import (
	"fmt"

	"github.com/nyxtom/tokens"
)

func main() {
	fmt.Println(tokens.SplitNatural("hello world, this is @nyxtom!"))
}
```
- RepeatedPunctRegexp (repeated punctuation)
- NumericRegexp (expression to test if a given string is only numeric)
- CashTagRegexp ($GOOG, $ATT, and other cashtags used on Twitter and elsewhere)
- HashTagRegexp (#hashtags)
- MentionRegexp (@mentions)
- HTTPWWWRegexp (determines whether a URL is prefixed with http(s):// and/or www)
- URLRegexp (regular expression for finding URLs, based on a variant of daringfireball.net/2010/07/improved_regex_for_matching_urls)
- EmailRegexp
- EmoticonsRegexp
- EmoticonWordPunctuationRegexp
- WordPunctuationRegexp
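Assuming the patterns above are exported as `*regexp.Regexp` values, they can be applied directly with the standard `regexp` methods. The sketch below uses simplified stand-in patterns for hashtags and mentions (the library's actual expressions may be stricter):

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns in the spirit of HashTagRegexp and MentionRegexp;
// these are illustrative stand-ins, not the library's actual expressions.
var (
	hashTag = regexp.MustCompile(`#\w+`)
	mention = regexp.MustCompile(`@\w+`)
)

func main() {
	text := "thanks @nyxtom for the #golang #nlp utility"
	fmt.Println(hashTag.FindAllString(text, -1)) // [#golang #nlp]
	fmt.Println(mention.FindAllString(text, -1)) // [@nyxtom]
}
```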
Word punctuation covers many patterns, including partial URLs, file paths, money, numerics, decimals, hyphenated words, abbreviations, alphanumeric words (e.g. 3D), phone numbers, repeated punctuation, and other non-whitespace.
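To illustrate the idea, here is a much-simplified word-punctuation pattern covering only a few of those cases (money, decimals, hyphenated words, and punctuation runs); the library's actual WordPunctuationRegexp is far more extensive:

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified sketch: match money, numbers/decimals, hyphenated words,
// or runs of punctuation, in that order of preference.
var wordPunct = regexp.MustCompile(`\$\d+(?:\.\d+)?|\d+(?:\.\d+)?|\w+(?:-\w+)*|[^\w\s]+`)

func main() {
	fmt.Println(wordPunct.FindAllString("save $19.99 on x-ray kits!!", -1))
	// [save $19.99 on x-ray kits !!]
}
```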
MIT