Bindings to Google's Compact Language Detector 3 (experimental)
C++ Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
R
man
src
tools
.Rbuildignore
.gitignore
.travis.yml
DESCRIPTION
NAMESPACE
NEWS
README.md
appveyor.yml
cld3.Rproj
cleanup
configure

README.md

cld3

R Wrapper for Google's Compact Language Detector 3

Project Status: Active – The project has reached a stable, usable state and is being actively developed. Build Status AppVeyor Build Status Coverage Status CRAN_Status_Badge CRAN RStudio mirror downloads Github Stars

Google's Compact Language Detector 3 is a neural network model for language identification and the successor of CLD2 (available from) CRAN. This version is still experimental and uses a novell algorithm with different properties and outcomes. For more information see: https://github.com/google/cld3#readme

The function detect_language() is vectorised and guesses the the language of each string in text or returns NA if the language could not reliably be determined.

> library(cld3)
> example(cld3)

cld3> # Vectorized best guess
cld3> detect_language(c("To be or not to be?", "Ce n'est pas grave.", "猿も木から落ちる"))
[1] "en" "fr" "ja"

The function detect_language_multi() is not vectorised and detects all languages inside the entire character vector as a whole.

cld3> # Multiple languages in one text
cld3> detect_language_mixed("This piece of text is in English. Този текст е на Български.", size = 3)
  language probability reliable proportion
1       bg   0.9173891     TRUE  0.5853658
2       en   0.9999790     TRUE  0.4146341
3      und   0.0000000    FALSE  0.0000000

Installation

Binary packages for OS-X or Windows can be installed directly from CRAN:

install.packages("cld3")

Installation from source on Linux or OSX requires Google's Protocol Buffers library. On Debian or Ubuntu install libprotobuf-dev and protobuf-compiler:

sudo apt-get install -y libprotobuf-dev protobuf-compiler

On Fedora we need protobuf-devel:

sudo yum install protobuf-devel

On CentOS / RHEL we install protobuf-devel via EPEL:

sudo yum install epel-release
sudo yum install protobuf-devel

On OS-X use protobuf from Homebrew:

brew install protobuf