Essence is a Natural Language Processing (NLP) and Text Summarization library for Elixir. The work is currently in very early stages.


  • Tokenization (Basic, done)
  • Sentence Detection and Chunking (Basic, done)
  • Vocabulary (Basic, done)
  • Documents (Draft, done)
  • Concordance (done)
  • Readability (ARI done, SMOG done, FC todo, GF done, DC done, CL done)
  • Reading Time estimates (how long would it take somebody to read the given text, useful for blog posts / articles)
  • Speaking Time estimates (how long would it take somebody to present the given content, useful for speeches, presentations)
  • Text Corpora
  • Bi-Grams
  • Tri-Grams
  • n-Grams
  • Stopwords for English
  • Common Names in English (male, female, ambiguous)
  • Dictionary words in English
  • Dale-Chall's dictionary of easy English words
  • Frequency Measures: TF, TF/IDF, ...
  • Time-Series Documents
  • Dispersion
  • Similarity Measures
  • Part of Speech Tagging
  • Sentiment Analysis
  • Classification
  • Summarization
  • Document Hierarchies
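The bi-gram, tri-gram and n-gram items in the list above all reduce to the same sliding-window operation over a token list. The following is a minimal sketch in plain Elixir, not Essence's own implementation; the module name `NGramSketch` and the function `ngrams/2` are hypothetical and chosen only for illustration.

```elixir
defmodule NGramSketch do
  # Hypothetical helper, not part of Essence's API.
  # Returns every contiguous run of `n` tokens, in document order.
  # Incomplete trailing windows are dropped via :discard.
  def ngrams(tokens, n) when is_list(tokens) and is_integer(n) and n > 0 do
    Enum.chunk_every(tokens, n, 1, :discard)
  end
end

tokens = ["In", "the", "beginning", "God", "created"]

# Bi-grams: windows of two tokens, advancing one token at a time.
IO.inspect(NGramSketch.ngrams(tokens, 2))

# Tri-grams: the same call with n = 3.
IO.inspect(NGramSketch.ngrams(tokens, 3))
```

With `n` larger than the number of tokens the sketch returns an empty list, since no complete window exists.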


If available in Hex, the package can be installed as:

  1. Add essence to your list of dependencies in mix.exs:
def deps do
  [{:essence, "~> 0.2.0"}]
end


In the following examples we will use test/genesis.txt, which is a copy of the Book of Genesis from the King James Bible.

We provide a convenience function, Essence.genesis/1, for reading the plain text of the Book of Genesis into Essence.

Let's first create a document from the text:

iex> document = Essence.Document.from_text Essence.genesis

The text contains 44,741 tokens, 1,533 paragraphs and 1,663 sentences, as the following counts show:

iex> document |> Essence.Document.enumerate_tokens |> Enum.count
iex> document |> Essence.Document.paragraphs |> Enum.count
iex> document |> Essence.Document.sentences |> Enum.count

What might the first sentence of genesis be?

iex> Essence.Document.sentence document, 0

Now let's compute the frequency distribution for tokens in the book of genesis:

iex> fd = Essence.Vocabulary.freq_dist document
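To make the idea concrete, a frequency distribution is simply a map from each token to the number of times it occurs. The sketch below shows this in plain Elixir on a short token list; `FreqDistSketch` is a hypothetical module for illustration and is not how Essence necessarily implements `freq_dist`.

```elixir
defmodule FreqDistSketch do
  # Hypothetical helper, not part of Essence's API.
  # Builds a map of token => occurrence count.
  def freq_dist(tokens) do
    Enum.reduce(tokens, %{}, fn token, acc ->
      # Start at 1 for a new token, otherwise increment the existing count.
      Map.update(acc, token, 1, &(&1 + 1))
    end)
  end
end

tokens = ~w(And God said Let there be light and there was light)
fd = FreqDistSketch.freq_dist(tokens)

IO.inspect(Map.get(fd, "light"))  # 2
IO.inspect(Map.get(fd, "And"))    # 1, counting is case-sensitive in this sketch
```

Counting here is case-sensitive, so "And" and "and" are separate keys.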

What is the vocabulary of this text?

iex> vocabulary = Essence.Vocabulary.vocabulary document

or alternatively we can use the frequency distribution for the equivalent expression:

iex> vocabulary = Map.keys fd

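What might the most frequent tokens be? Note that Enum.slice(1, 10) starts at index 1, so the result lists the ten tokens ranked 2 through 11 and omits the single most frequent one.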

iex> vocabulary |> Enum.sort_by( fn(x) -> Map.get(fd, x) end, &>=/2 ) |> Enum.slice(1, 10)
["and", "the", "of", ".", "And", ":", "his", "he", "to", ";"]
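As a small variation, the ranking can also be done directly on the frequency map, keeping each count next to its token. The following sketch assumes only that the frequency distribution is a plain map of token to count; `TopTokensSketch` and `top/2` are hypothetical names, not Essence functions.

```elixir
defmodule TopTokensSketch do
  # Hypothetical helper, not part of Essence's API.
  # Returns the n most frequent {token, count} pairs, highest count first.
  # Ties are broken alphabetically by token so the result is deterministic.
  def top(fd, n) do
    fd
    |> Enum.sort_by(fn {token, count} -> {-count, token} end)
    |> Enum.take(n)
  end
end

fd = %{"and" => 3, "the" => 5, "of" => 2, "God" => 1}

IO.inspect(TopTokensSketch.top(fd, 2))
# [{"the", 5}, {"and", 3}]
```

`Enum.take/2` starts at the first element, so the single most frequent token is always included.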

Next, we can compute the lexical richness of the text:

iex> Essence.Vocabulary.lexical_richness document
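One common way to quantify lexical richness is the type-token ratio: the number of distinct tokens divided by the total number of tokens. The sketch below computes that ratio in plain Elixir. It is an assumption that this matches what `lexical_richness` returns; some texts use the reciprocal (total tokens per distinct token) instead, and `RichnessSketch` is a hypothetical module used only for illustration.

```elixir
defmodule RichnessSketch do
  # Hypothetical helper, not part of Essence's API.
  # Type-token ratio: distinct tokens / total tokens.
  # An empty token list is defined as 0.0 here to avoid dividing by zero.
  def type_token_ratio([]), do: 0.0

  def type_token_ratio(tokens) do
    distinct = tokens |> Enum.uniq() |> length()
    distinct / length(tokens)
  end
end

# 5 distinct tokens out of 6 in total ("the" occurs twice).
tokens = ~w(the cat sat on the mat)
IO.inspect(RichnessSketch.type_token_ratio(tokens))
```

A value close to 1.0 means few repeated tokens, while long texts such as Genesis typically score much lower because function words repeat often.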

Let's get a concordance view on 'Adam':

iex> Essence.Document.concordance(document, "Adam")

nd brought them unto Adam to see what he would
hem : and whatsoever Adam called every living c
e name thereof . And Adam gave names to all cat
 the field ; but for Adam there was not found a
p sleep to fall upon Adam , and he slept : and
r unto the man . And Adam said , This is now bo
ool of the day : and Adam and his wife hid them
LORD God called unto Adam , and said unto him ,
over thee . And unto Adam he said , Because tho
lt thou return . And Adam called his wife's nam
of all living . Unto Adam also and to his wife
e tree of life . And Adam knew Eve his wife ; a
 and sevenfold . And Adam knew his wife again ;
f the generations of Adam . In the day that God
nd called their name Adam , in the day when the
y were created . And Adam lived an hundred and
th : And the days of Adam after he had begotten
nd all the days that Adam lived were nine hundr
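A concordance lists every occurrence of a word together with its surrounding context. The output above is cut to a fixed character width; the sketch below uses a simpler token-based window instead, so its lines will not match that layout exactly. `ConcordanceSketch` and its `window` parameter are hypothetical and not part of Essence.

```elixir
defmodule ConcordanceSketch do
  # Hypothetical helper, not part of Essence's API.
  # For every occurrence of `word`, returns a string made of up to
  # `window` tokens before it, the word itself, and up to `window` tokens after.
  def concordance(tokens, word, window \\ 3) do
    tokens
    |> Enum.with_index()
    |> Enum.filter(fn {token, _index} -> token == word end)
    |> Enum.map(fn {_token, index} ->
      # Clamp the left edge at the start of the token list.
      start = max(index - window, 0)

      tokens
      |> Enum.slice(start, index - start + window + 1)
      |> Enum.join(" ")
    end)
  end
end

tokens = ~w(And Adam gave names to all cattle and Adam said)

IO.inspect(ConcordanceSketch.concordance(tokens, "Adam", 2))
# ["And Adam gave names", "cattle and Adam said"]
```

Slicing a list for every match is linear in the list length, which is fine for a sketch but would be worth optimizing for a text the size of Genesis.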

