This version includes bugfixes:
* Fixing float prefixes for large numbers
* Making numerizer independent of the spacy model used


A Python module to convert natural language numerics into ints and floats. This is a port of the Ruby gem numerizer.


The numerizer library can be installed from PyPI as follows:

$ pip install numerizer

or from source as follows:

$ git clone
$ cd numerizer
$ pip install -e .


>>> from numerizer import numerize
>>> numerize('forty two')
'42'
>>> numerize('forty-two')
'42'
>>> numerize('four hundred and sixty two')
'462'
>>> numerize('one fifty')
'150'
>>> numerize('twelve hundred')
'1200'
>>> numerize('twenty one thousand four hundred and seventy three')
'21473'
>>> numerize('one million two hundred and fifty thousand and seven')
'1250007'
>>> numerize('one billion and one')
'1000000001'
>>> numerize('nine and three quarters')
'9.75'
>>> numerize('platform nine and three quarters')
'platform 9.75'
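
As the last example shows, numerize returns a string, not an int or float. If you need numeric values, a small post-processing step can pull them out of the result. The following is a minimal sketch using only the standard library; `extract_numbers` is a hypothetical helper and not part of numerizer, and it assumes the numerized text contains only unsigned integers, decimals, and simple fractions such as `1/4`.

```python
import re
from fractions import Fraction

# Matches a fraction ("1/4"), a decimal ("9.75"), or an integer ("42"), in that order.
_NUMBER = re.compile(r"\d+/\d+|\d+\.\d+|\d+")


def extract_numbers(numerized):
    """Return the numeric values found in an already-numerized string.

    Hypothetical helper, not part of numerizer. Signs, exponents, and
    thousands separators are not handled.
    """
    values = []
    for match in _NUMBER.finditer(numerized):
        token = match.group()
        if "/" in token:
            values.append(Fraction(token))  # e.g. "1/4" -> Fraction(1, 4)
        elif "." in token:
            values.append(float(token))     # e.g. "9.75" -> 9.75
        else:
            values.append(int(token))       # e.g. "42" -> 42
    return values


print(extract_numbers("platform 9.75"))  # [9.75]
print(extract_numbers("42"))             # [42]
```

In practice you would pass the return value of `numerize(...)` to the helper instead of a literal string.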

Using the spaCy extension

Since version 0.2, numerizer is available as a spaCy extension.

Any named entities of a quantitative nature within a spaCy document can be numerized as follows:

>>> import numerizer  # assumed to register the spaCy extensions on import
>>> from spacy import load
>>> nlp = load('en_core_web_sm')  # or load any other spaCy model
>>> doc = nlp('The projected revenue for the next quarter is over two million dollars.')
>>> doc._.numerize()
{the next quarter: 'the next 1/4', over two million dollars: 'over 2000000 dollars'}

To numerize only certain entity types, pass the labels argument to the extension function:

>>> doc._.numerize(labels=['MONEY'])  # only numerize entities of type 'MONEY'
{over two million dollars: 'over 2000000 dollars'}
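
The dictionary above maps each entity to its numerized text, which leaves the job of rebuilding the full sentence to the caller. The sketch below shows one way to do that with plain character offsets. `apply_numerized` is a hypothetical helper, not part of numerizer; with spaCy, the offsets would presumably come from each key's `start_char` and `end_char` attributes, assuming the keys are spaCy `Span` objects. Here they are computed with `str.find` so the example runs without spaCy.

```python
def apply_numerized(text, replacements):
    """Substitute numerized strings into the original text.

    replacements is an iterable of (start_char, end_char, new_text) tuples
    with non-overlapping ranges. Hypothetical helper, not part of numerizer.
    """
    result = text
    # Work from right to left so earlier offsets stay valid after each edit.
    for start, end, new_text in sorted(replacements, reverse=True):
        result = result[:start] + new_text + result[end:]
    return result


text = "The projected revenue for the next quarter is over two million dollars."

# Stand-in for the doc._.numerize() result, keyed by entity text instead of Span.
numerized = {
    "the next quarter": "the next 1/4",
    "over two million dollars": "over 2000000 dollars",
}

replacements = []
for entity, value in numerized.items():
    start = text.find(entity)
    replacements.append((start, start + len(entity), value))

print(apply_numerized(text, replacements))
# The projected revenue for the next 1/4 is over 2000000 dollars.
```

Applying the replacements from the end of the string backwards avoids recomputing offsets when a numerized value is shorter or longer than the text it replaces.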

The extension is available for tokens and spans as well.

>>> two_million = doc[-4:-2]  # span corresponding to "two million"
>>> two_million._.numerize()
'2000000'
>>> quarter = doc[6]  # token corresponding to "quarter"
>>> quarter._.numerized
'1/4'


For R users, a wrapper library has been developed by @amrrs. Try it out here.