fixed issue #18
saffsd committed Feb 4, 2014
1 parent daffdff commit 1e1ed50
Showing 2 changed files with 18 additions and 2 deletions.
11 changes: 11 additions & 0 deletions README.rst
@@ -156,6 +156,17 @@ When using ``langid.py`` as a library, the set_languages method can be used to c
>>> langid.classify("I do not speak english")
('en', 0.99176190378750373)

Batch Mode
----------

``langid.py`` supports batch mode processing, which can be invoked with the ``-b`` flag.
In this mode, ``langid.py`` takes the paths of the files to classify as command-line arguments.
If no arguments are supplied, ``langid.py`` reads the paths from ``stdin``, one per line.
This is useful for combining ``langid.py`` with UNIX utilities such as ``find``.

In batch mode, ``langid.py`` uses ``multiprocessing`` to run multiple instances of
the classifier, using all available CPUs to classify documents in parallel.

.. Probability Normalization
Probability Normalization
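The batch mode described in the README hunk above classifies files in parallel with ``multiprocessing``. The Python 3 sketch below shows that general pattern under stated assumptions; it is not langid.py's actual batch code. The names `classify_file` and `classify_batch` are hypothetical, and `classify_file` is a placeholder that labels text as ASCII or not so the example runs without langid installed; a real worker would call the classifier instead.

```python
import multiprocessing as mp
import sys


def classify_file(path):
    """Hypothetical worker; a real one would classify the text with langid."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    # Toy label so the sketch runs without langid installed; not a language ID.
    return path, ("ascii" if text.isascii() else "non-ascii")


def classify_batch(paths, pool):
    """Map classify_file over paths using any pool that offers Pool.imap()."""
    # imap hands paths to the workers as it iterates and yields results in input order.
    return list(pool.imap(classify_file, paths))


if __name__ == "__main__":
    # mp.Pool() starts one worker per available CPU when processes is not given.
    with mp.Pool() as pool:
        for path, label in classify_batch(sys.argv[1:], pool):
            print(path, label)
```

Passing the pool in, rather than creating it inside `classify_batch`, leaves the worker count under the caller's control; `multiprocessing.dummy.Pool` offers the same `imap` interface backed by threads, which is convenient for quick tests.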
9 changes: 7 additions & 2 deletions langid/langid.py
@@ -543,8 +543,13 @@ def _process(text):
 import multiprocessing as mp

 def generate_paths():
-  for line in sys.stdin:
-    path = line.strip()
+  if len(args) > 0:
+    paths = args
+  else:
+    from itertools import imap
+    paths = map(str.strip,sys.stdin)
+
+  for path in paths:
     if path:
       if os.path.isfile(path):
         yield path
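For readers on Python 3, the path-selection logic in this hunk can be sketched as a standalone function. Assumptions: `args` and `stream` are passed in explicitly here, whereas in the committed code `args` comes from the enclosing scope and input is read from `sys.stdin`. In Python 2 the built-in `map` returns a list, while `itertools.imap` (imported in the hunk) returns a lazy iterator; in Python 3 `itertools.imap` no longer exists and the built-in `map` is itself lazy, so no extra import is needed.

```python
import os
import sys


def generate_paths(args, stream=sys.stdin):
    """Yield paths to existing files, taken from args or, if args is empty, from stream."""
    if args:
        paths = args
    else:
        # Python 3's map is lazy, so the stream is consumed one line at a time.
        paths = map(str.strip, stream)

    for path in paths:
        # Skip blank lines and paths that do not name a regular file.
        if path and os.path.isfile(path):
            yield path
```

Because the stream is stripped lazily, paths piped in from ``find`` can be yielded without first reading all of stdin into a list.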