In [1]:
!pip install textstat



# English

In [32]:
import textstat

In [33]:
textstat.set_lang("en")

In [34]:
text = """
At a time when more soldiers are committing suicide than are dying in battle, it is well to remember that, no matter how thoroughly indoctrinated the belief in the superiority of an abstraction, there remains within each of us a powerful life-force that can never be fully repressed. What Gandhi called Satyagraha – a “Truth-force” or “Soul-force” – remains deep within us as, perhaps, the greatest power at work upon each of us. The state – and the civilization it is helping to bring down – will continue to fight this life-force in every conceivable manner, not simply in the war system, but in efforts to regulate even the most miniscule details of life’s expressions.
When the minds and the spirits of men and women combine to address, with intelligence, what we have done to ourselves – and are doing to our children and grandchildren – we may be able to walk away from our roles as servo-mechanisms to state and corporate power interests, and to discover how to live according to that life-force within each of us. To those unable or unwilling to confront the wickedness implicit in their robotic existences, there will be nothing but unfocused anger and giggling to accompany their trip into the awaiting black-hole..
"""

In [35]:
# Count Syllables
textstat.syllable_count(text)

317

In [36]:
# Lexicon count
textstat.lexicon_count(text, removepunct=True)

209

In [37]:
# Sentence count
textstat.sentence_count(text)

5

In [38]:
# Flesch Reading Ease formula
textstat.flesch_reading_ease(text)

37.51
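The score can be reproduced by hand from the three counts above. This is a minimal sketch, assuming the standard Flesch constants and assuming that textstat rounds the two averages to one decimal before applying them (the run-all output further down reports avg_syllables_per_word as 1.5, which is consistent with that). The function name is illustrative, not part of textstat.

```python
def flesch_reading_ease_from_averages(avg_sentence_length, avg_syllables_per_word):
    # Standard Flesch Reading Ease formula
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

# Counts reported by the cells above
words, sentences, syllables = 209, 5, 317

# Assumption: averages are rounded to one decimal, as the reported values suggest
asl = round(words / sentences, 1)   # average sentence length
asw = round(syllables / words, 1)   # average syllables per word

score = round(flesch_reading_ease_from_averages(asl, asw), 2)
unrounded = round(flesch_reading_ease_from_averages(words / sentences, syllables / words), 2)

print(score)      # matches the 37.51 reported above
print(unrounded)  # about 1.4 points lower when the averages are not rounded
```

Lower values mean harder text; scores in the 30s are conventionally read as college-level material.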

In [39]:
# Flesch-Kincaid Grade Level
textstat.flesch_kincaid_grade(text)

18.4
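The grade level uses the same two averages as the reading-ease score, with different weights. This is a sketch under the same assumptions as before: the standard Flesch-Kincaid constants, averages rounded to one decimal, and an illustrative function name.

```python
def flesch_kincaid_grade_from_averages(avg_sentence_length, avg_syllables_per_word):
    # Standard Flesch-Kincaid Grade Level formula
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59

# Counts reported earlier in the notebook
words, sentences, syllables = 209, 5, 317

asl = round(words / sentences, 1)   # 41.8 words per sentence
asw = round(syllables / words, 1)   # 1.5 syllables per word

grade = round(flesch_kincaid_grade_from_averages(asl, asw), 1)
print(grade)  # matches the 18.4 reported above
```

The sentence-length term contributes about 16.3 of the 18.4, so the very long sentences drive most of the grade.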

In [40]:
# Fog Scale (Gunning FOG Formula)
textstat.gunning_fog(text)

20.55

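A Fog score estimates the years of formal education needed to follow the text on a first reading. A score of 20.55 is well beyond college level; a high school junior would correspond to a score of about 11.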
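The Fog score can be bounded from the sentence length alone. This is a minimal sketch, assuming the standard Gunning Fog formula, 0.4 × (average sentence length + percentage of complex words). The function name is illustrative, and the implied percentage is back-solved from the reported score rather than taken from the notebook.

```python
def gunning_fog_from_parts(avg_sentence_length, pct_complex_words):
    # Standard Gunning Fog formula
    return 0.4 * (avg_sentence_length + pct_complex_words)

# 209 words in 5 sentences, as counted above
avg_sentence_length = 209 / 5

# Score the text would get with no complex words at all
floor_score = gunning_fog_from_parts(avg_sentence_length, 0.0)

# Percentage of complex words implied by the reported 20.55 (back-solved)
implied_pct_complex = 20.55 / 0.4 - avg_sentence_length

print(round(floor_score, 2))          # 16.72
print(round(implied_pct_complex, 1))  # 9.6
```

Even with no complex words, sentences this long would already put the text at a Fog score of about 16.7.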

In [41]:
# SMOG Index (similar in spirit to Gunning Fog)
textstat.smog_index(text)

15.9

In [42]:
# Automated Readability Index
textstat.automated_readability_index(text)

22.5
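This value can be checked against the averages that the run-all output further down reports for the same text (avg_character_per_word 4.88, avg_sentence_length 41.8). A sketch assuming the standard Automated Readability Index constants; the function name is illustrative.

```python
def ari_from_averages(avg_chars_per_word, avg_sentence_length):
    # Standard Automated Readability Index formula
    return 4.71 * avg_chars_per_word + 0.5 * avg_sentence_length - 21.43

# Averages reported by textstat for this text
avg_chars_per_word = 4.88
avg_sentence_length = 41.8

ari = round(ari_from_averages(avg_chars_per_word, avg_sentence_length), 1)
print(ari)  # matches the 22.5 reported above
```

Unlike the Flesch formulas, ARI uses characters per word instead of syllables, so it does not depend on a syllable counter.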

In [12]:
# Coleman-Liau Index
textstat.coleman_liau_index(text)

8.94

In [13]:
# Linsear Write Formula
textstat.linsear_write_formula(text)

10.7

In [14]:
# Dale-Chall Readability Score
textstat.dale_chall_readability_score(text)

7.54

Meaning that an average 9th or 10th-grade student can read it.

In [43]:
# Readability Consensus
textstat.text_standard(text, float_output=False)

'20th and 21st grade'

Meaning that, taking the formulas together, the text is estimated to require about 20 to 21 years of formal education, well beyond 12th grade.
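Grade numbers above 12 are easier to read when mapped to an education band. The helper below is a hypothetical convenience, not part of textstat, and the cut-offs are an assumption based on the usual U.S. grade ranges. The scores are the ones reported in the cells above.

```python
def education_band(grade):
    """Map a U.S. grade-level score to a rough education band (assumed cut-offs)."""
    if grade <= 5:
        return "elementary school"
    if grade <= 8:
        return "middle school"
    if grade <= 12:
        return "high school"
    if grade <= 16:
        return "college"
    return "graduate level or beyond"

# Grade-level scores reported earlier in the notebook
scores = {
    "flesch_kincaid_grade": 18.4,
    "gunning_fog": 20.55,
    "smog_index": 15.9,
    "automated_readability_index": 22.5,
}

bands = {name: education_band(value) for name, value in scores.items()}
for name, band in bands.items():
    print(f"{name}: {band}")
```

Three of the four formulas place the text beyond college level, which agrees with the consensus value.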

In [17]:
# Time to read the text in seconds
textstat.reading_time(text)

6.08

In [45]:
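# Run all at once
import inspect

# Collect method names from textstat; the slice selects the same members as range(1, 28)
members = inspect.getmembers(textstat, predicate=inspect.ismethod)
funcs = [name for name, _ in members[1:28]]

In [46]:
textstat.set_lang("en")  # set the language once, before the loop
for name in funcs:
    method = getattr(textstat, name)  # look the method up by name instead of using eval
    print(name)
    print(method(text))
    print(" ")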

avg_character_per_word
4.88
 
avg_letter_per_word
4.77
 
avg_sentence_length
41.8
 
avg_sentence_per_word
0.02
 
avg_syllables_per_word
1.5
 
char_count
1020
 
coleman_liau_index
11.27
 
dale_chall_readability_score
9.11
 
dale_chall_readability_score_v2
9.11
 
difficult_words
45
 
difficult_words_list
['unable', 'repressed', 'confront', 'superiority', 'indoctrinated', 'awaiting', 'system', 'abstraction', 'doing', 'robotic', 'soldiers', 'miniscule', 'giggling', 'corporate', 'belief', 'greatest', 'efforts', 'suicide', 'continue', 'within', 'remains', 'regulate', 'simply', 'thoroughly', 'expressions', 'committing', 'interests', 'gandhi', 'spirits', 'intelligence', 'helping', 'existences', 'satyagraha', 'according', 'implicit', 'wickedness', 'details', 'civilization', 'manner', 'accompany', 'servo', 'unfocused', 'combine', 'mechanisms', 'conceivable']
 
flesch_kincaid_grade
18.4
 
flesch_reading_ease
37.51
 
gunning_fog
20.55
 
letter_count
996
 
lexicon_count
209
 
linsear_write_formula
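A loop like the one above stops at the first method that raises or needs extra arguments. The sketch below shows one way to collect every result and record failures instead. `run_metrics` and `FakeStats` are hypothetical names; `FakeStats` stands in for textstat only so the example runs without the library installed.

```python
def run_metrics(obj, names, text):
    """Call each named method of obj on text, recording errors instead of stopping."""
    results = {}
    for name in names:
        method = getattr(obj, name, None)
        if not callable(method):
            results[name] = "not available"
            continue
        try:
            results[name] = method(text)
        except Exception as exc:  # keep going; store the failure for later inspection
            results[name] = f"error: {exc}"
    return results


class FakeStats:
    """Hypothetical stand-in for textstat, used only to make the sketch runnable."""

    def char_count(self, text):
        return len(text.replace(" ", ""))

    def lexicon_count(self, text):
        return len(text.split())

    def needs_extra_arg(self, text, other):
        return other


report = run_metrics(
    FakeStats(),
    ["char_count", "lexicon_count", "needs_extra_arg", "missing"],
    "a bb ccc",
)
for name, value in report.items():
    print(name, value)
```

With the real library, the same helper could be called as `run_metrics(textstat, funcs, text)`, where `funcs` is a list of method names.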


# Spanish - Español

In [20]:
text = """
La ciencia de datos es el foco principal de la mayoría de las ciencias y estudios en este momento, 
necesita muchas cosas como inteligencia artificial, programación, estadísticas, 
comprensión del negocio, habilidades de presentación efectivas y mucho más. 
Por eso no es fácil de entender o estudiar. Pero podemos hacerlo, lo estamos haciendo.
La ciencia de datos se ha convertido en el marco de resolución de 
problemas estándar para la academia y la industria y va a ser así 
por un tiempo. Pero debemos recordar de dónde venimos, 
quiénes somos y hacia dónde vamos.
"""

In [21]:
textstat.set_lang("es")

## Note: The only readability function implemented for Spanish is the Fernández Huerta readability formula, which is a variant of the Flesch Reading Ease formula.

In [22]:
textstat.flesch_reading_ease(text)

61.75

In [23]:
# Time to read the text in seconds
textstat.reading_time(text)

6.92

In [24]:
# Difficult-word detection is unreliable in Spanish: common words such as "pero", "para" and "este" are flagged
textstat.difficult_words_list(text)

['venimos',
 'resolución',
 'muchas',
 'estadísticas',
 'hacerlo',
 'dónde',
 'mucho',
 'pero',
 'estudios',
 'presentación',
 'ciencia',
 'datos',
 'comprensión',
 'mayoría',
 'negocio',
 'como',
 'vamos',
 'quiénes',
 'momento',
 'inteligencia',
 'programación',
 'industria',
 'habilidades',
 'convertido',
 'ciencias',
 'efectivas',
 'estamos',
 'marco',
 'estándar',
 'recordar',
 'cosas',
 'estudiar',
 'principal',
 'artificial',
 'fácil',
 'necesita',
 'hacia',
 'entender',
 'debemos',
 'academia',
 'tiempo',
 'para',
 'somos',
 'problemas',
 'haciendo',
 'foco',
 'podemos',
 'este']

# Check spelling

In [26]:
!pip install autocorrect


In [47]:
# The text below contains three deliberate misspellings:
#   presentation -> presentatio
#   focus        -> focsu
#   framework    -> framwork
text = """
Data science is the main focsu of most sciences and studies right now, 
it needs a lot of things like AI, programming, statistics, 
business understanding, effective presentatio skills and much more. 
That's why it's not easy to understand or study. But we can do it, we are doing it.
Data science has become the standard solving problem framwork for academia and 
the industry and it's going to be like that for a while. But we need to remember 
where we are coming from, who we are and where we are going.
"""

In [48]:
from autocorrect import Speller

check = Speller(lang='en')

check(text)

"\nData science is the main focus of most sciences and studies right now, \nit needs a lot of things like Ai, programming, statistics, \nbusiness understanding, effective presentation skills and much more. \nThat's why it's not easy to understand or study. But we can do it, we are doing it.\nData science has become the standard solving problem framework for academia and \nthe industry and it's going to be like that for a while. But we need to remember \nwhere we are coming from, who we are and where we are going.\n"
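To see exactly which words the speller touched, the original and corrected strings can be compared token by token. This is a minimal sketch using only the standard library. `changed_words` is a hypothetical helper, and it assumes the correction keeps the same number of whitespace-separated tokens. The two strings are shortened excerpts of the input and output above.

```python
def changed_words(original, corrected):
    """Pair up whitespace-separated tokens and keep the pairs that differ."""
    before, after = original.split(), corrected.split()
    if len(before) != len(after):
        raise ValueError("token counts differ; a word-by-word comparison is not possible")
    return [(old, new) for old, new in zip(before, after) if old != new]


# Shortened excerpts of the text before and after correction
original = (
    "the main focsu of most sciences, things like AI, "
    "effective presentatio skills, problem framwork"
)
corrected = (
    "the main focus of most sciences, things like Ai, "
    "effective presentation skills, problem framework"
)

changes = changed_words(original, corrected)
for old, new in changes:
    print(f"{old} -> {new}")
```

The comparison also surfaces a side effect visible in the output above: the acronym "AI" was rewritten as "Ai", so corrections are worth reviewing before they are accepted.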