spaCy-CLD: Bringing simple language detection to spaCy

This package is a spaCy 2.0 extension that adds language detection to spaCy's text processing pipeline. Inspired by a discussion here.

Installation

pip install spacy_cld

Usage

Adding the spaCy-CLD component to the processing pipeline is relatively simple:

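import spacy
from spacy_cld import LanguageDetector

# Load an English model and append the detector to the pipeline (spaCy 2.x API)
nlp = spacy.load('en')
language_detector = LanguageDetector()
nlp.add_pipe(language_detector)
doc = nlp('This is some English text.')

# Results are exposed as custom extension attributes under doc._
doc._.languages  # ['en']
doc._.language_scores['en']  # 0.96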

spaCy-CLD operates on spaCy Doc and Span objects. When the component runs, each Doc or Span is given two custom attributes, accessed through the `._` namespace: languages (a list of up to 3 language codes) and language_scores (a dictionary mapping language codes to confidence scores between 0 and 1).
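Downstream code often wants a single best guess instead of a list. The following is a minimal sketch of one way to do that, assuming only the attribute shapes described above (a list of codes and a dict of scores between 0 and 1). The helper name `top_language` and the `min_score` threshold are hypothetical and not part of spaCy-CLD.

```python
def top_language(languages, language_scores, min_score=0.5):
    """Return the highest-scoring language code, or None.

    Hypothetical helper: `languages` is a list of language codes and
    `language_scores` maps codes to confidence scores in [0, 1].
    """
    # Pair each detected code with its score; a missing score counts as 0.0
    scored = [(language_scores.get(code, 0.0), code) for code in languages]
    if not scored:
        return None
    best_score, best_code = max(scored)
    # Reject low-confidence guesses
    return best_code if best_score >= min_score else None
```

With the example above, this could be called as `top_language(doc._.languages, doc._.language_scores)`.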

Under the hood

spaCy-CLD is a small extension that wraps the PYCLD2 Python library, which in turn wraps the Compact Language Detector 2 (CLD2) C++ library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document and reports a confidence score with each language.
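To make the idea of character n-grams with a Naive Bayes classifier concrete, here is a toy sketch of the general technique. It is not CLD2's implementation: the class name `ToyNgramNB`, the trigram size, the add-one smoothing, the uniform prior, and the two training sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class ToyNgramNB:
    """Toy Naive Bayes language classifier over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # language -> Counter of n-gram frequencies
        self.totals = {}   # language -> total number of n-grams seen
        self.vocab = set() # all n-grams seen in any language

    def fit(self, samples):
        """Train on an iterable of (language_code, text) pairs."""
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def predict(self, text):
        """Return the language with the highest log-likelihood (uniform prior)."""
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab)
        scores = {}
        for lang, counts in self.counts.items():
            denom = self.totals[lang] + vocab_size
            # Add-one smoothing so unseen n-grams do not zero out the score
            scores[lang] = sum(math.log((counts[g] + 1) / denom) for g in grams)
        return max(scores, key=scores.get)


model = ToyNgramNB().fit([
    ("en", "the quick brown fox jumps over the lazy dog and the cat sleeps in the house"),
    ("de", "der hund und die katze schlafen im haus und der fuchs springt"),
])
```

Calling `model.predict("the cat is in the house")` scores the text under each language's n-gram distribution and returns the best match. A production detector such as CLD2 relies on far larger precomputed tables than this two-sentence training set.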

For additional details, see the linked project pages for PYCLD2 and CLD2.
