Text Processing for Small or Big Data Files in R
Latest commit fb53367 Jan 21, 2017 @mlampros committed on GitHub Update tokenization.h




The textTinyR package consists of text pre-processing functions for small or big data files. More details on the functionality of textTinyR can be found in the blog post and in the package vignette. The R package can be installed on the following operating systems: Linux, Mac and Windows. However, there are some limitations:

  • there is no support for Chinese, Japanese, Korean, Thai or other languages with ambiguous word boundaries.
  • there are no support functions for a UTF locale on Windows, meaning that only English character strings or files can be input and pre-processed.

System Requirements (for Unix OS's)


Debian and Ubuntu/apt-get

sudo apt-get install libboost-all-dev

sudo apt-get update

sudo apt-get install libboost-locale-dev


Fedora/yum

yum install boost-devel
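As a rough sketch of how the Linux requirements above could be selected automatically, the script below detects whether apt-get or yum is available and prints the matching commands from this section without running them. The function names boost_install_cmd and detect_pkg_manager are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Boost install commands listed in this README
# for a given package manager. Nothing is installed; the commands are only echoed.
boost_install_cmd() {
  case "$1" in
    apt-get)
      echo "sudo apt-get install libboost-all-dev && sudo apt-get update && sudo apt-get install libboost-locale-dev"
      ;;
    yum)
      echo "yum install boost-devel"
      ;;
    *)
      # Unknown package manager: signal failure to the caller.
      return 1
      ;;
  esac
}

# Hypothetical helper: print the first supported package manager found on PATH.
detect_pkg_manager() {
  for pm in apt-get yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

if pm=$(detect_pkg_manager); then
  echo "Run: $(boost_install_cmd "$pm")"
else
  echo "Neither apt-get nor yum was found on this system."
fi
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine; the user can then copy the suggested line into a terminal.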

Macintosh OSX/brew

On Macintosh OSX the boost library is installed using the Homebrew package manager.

If the boost library is already installed (using brew install boost), then it must first be removed using the following command:

brew uninstall boost

Then the formula for the boost library should be modified using a text editor (TextEdit, TextMate, etc.). The formula is saved in:

/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb


The user should open the boost.rb formula and replace the following code chunk, which begins at approximately line 71:

# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",

if build.with? "single"
  args << "threading=multi,single"
else
  args << "threading=multi"
end

with the following code chunk:

# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",

#if build.with? "single"
#  args << "threading=multi,single"
#else
#  args << "threading=multi"
#end

Then the user should save the changes, close the file and run

brew update

to apply the changes.

Then the user should open a new terminal (console) and type the following command, which builds and installs the boost library from source using the modified formula (note: there are two dashes before build-from-source):

brew install /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb --build-from-source

That's it.
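To confirm that the brew step worked, one option is to ask Homebrew which version of boost it has installed. The following is a minimal sketch, assuming the formula is registered under the name boost; check_brew_boost is a hypothetical helper name, and the check relies only on the output of brew list --versions.

```shell
#!/bin/sh
# Hypothetical helper: report whether Homebrew knows about an installed boost.
# Prints one of: brew-missing, installed, not-installed.
check_brew_boost() {
  if ! command -v brew >/dev/null 2>&1; then
    echo "brew-missing"
    return 0
  fi
  # brew list --versions prints "boost <version>" when the formula is installed;
  # an empty result is treated as "not installed".
  if [ -n "$(brew list --versions boost 2>/dev/null)" ]; then
    echo "installed"
  else
    echo "not-installed"
  fi
}

echo "boost status: $(check_brew_boost)"
```

If the status is not-installed, repeating the brew install command above in a fresh terminal is a reasonable first step.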

Installation of the textTinyR package (CRAN, GitHub)

To install the package from CRAN use

install.packages('textTinyR', clean = TRUE)

and to download the latest version from GitHub use the install_github function of the devtools package:

devtools::install_github(repo = 'mlampros/textTinyR', clean = TRUE)
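After either installation route, it can be useful to check from the command line that R can actually find the package. The sketch below assumes Rscript is on the PATH; check_r_package is a hypothetical helper name, and it uses only base R's requireNamespace and quit.

```shell
#!/bin/sh
# Hypothetical helper: report whether an R package can be found by R.
# Prints one of: Rscript-missing, installed, not-installed.
check_r_package() {
  pkg="$1"
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript-missing"
    return 0
  fi
  # requireNamespace() returns FALSE when the package is absent;
  # in that case R exits with status 1.
  if Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" >/dev/null 2>&1; then
    echo "installed"
  else
    echo "not-installed"
  fi
}

echo "textTinyR status: $(check_r_package textTinyR)"
```

A not-installed result after a seemingly successful install often points to a missing system requirement, so re-checking the boost setup described earlier is a sensible next step.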

Use the following link to report bugs/issues,