GitHub - sinjax/trendminer-python: python trendminer code

To run on a file: cat input | python twokenize.py | python langid.py | python stemming.py > output

You will get the same tweets with some extra fields in the JSON:

tokens - list of tokens
tok_lang - string with proper words separated by whitespace
lang_det - the detected language of the tweet
stemming - list of stems
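The way each stage of the pipeline passes tweets along can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes the input is one JSON object per line with a `text` field, and the `add_tokens` and `run_stage` helpers are hypothetical. The whitespace split is only a stand-in for the real tokeniser.

```python
import json


def add_tokens(tweet):
    """Return a copy of the tweet with a `tokens` field added."""
    enriched = dict(tweet)
    # Assumption: a plain whitespace split stands in for the real tokeniser.
    enriched["tokens"] = tweet.get("text", "").split()
    return enriched


def run_stage(lines, out):
    """Read one JSON tweet per line, enrich it, write one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        out.write(json.dumps(add_tokens(json.loads(line))) + "\n")
```

A stage written this way could be dropped into a shell pipeline by calling `run_stage(sys.stdin, sys.stdout)`; every field already present in the tweet is passed through unchanged, so later stages can add their own fields in the same manner.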

Runs at a rate of about 1 million tweets/hour, although actual throughput is likely to be higher.

Folders and files (137 commits):

.svn
bivariate
clustering
langid
lodie_extract
polls
stemmer
tokeniser
.gitignore
README
README.md
