<a href="https://colab.research.google.com/github/nicolashernandez/READI-LREC22/blob/main/readi_reproduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook comes from the [READI-LREC22 git repository](https://github.com/nicolashernandez/READI-LREC22/).  
It shows how to reproduce the contents of the READI paper, available [here](https://cental.uclouvain.be/readi2022/accepted.html), then gives a few examples of how to use the library.  
To speed up the deep learning steps significantly, please enable the GPU in this notebook's settings:  
Edit -> Notebook Settings -> Hardware Accelerator: GPU
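
As a quick sanity check after changing the hardware accelerator, the sketch below looks for the `nvidia-smi` tool, which is normally present on GPU runtimes. This is only a heuristic and not part of the library: `gpu_visible` is a hypothetical helper name, and a missing `nvidia-smi` does not strictly prove that no GPU is attached.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if the `nvidia-smi` tool exists and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not on PATH: most likely a CPU-only runtime.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```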
 


# Setup : Import dependencies then library

## Setup: the runtime must be restarted after downloading the spaCy model

In [1]:
%%capture
!python -m spacy download fr_core_news_sm
#This cell only needs to be run once.

Restart the runtime once (Ctrl+M . or Runtime > Restart Runtime), then execute the following cells.

In [1]:
import spacy

In [2]:
spacy.load("fr_core_news_sm")

<spacy.lang.fr.French at 0x7ff21f8d0ed0>
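
If `spacy.load` fails, it can help to check whether the model package is visible to the import system at all. The following is a minimal sketch using only the standard library; `model_is_importable` is a hypothetical helper, and it only tells you that the package can be found, not that the runtime has been restarted.

```python
import importlib.util


def model_is_importable(package_name: str = "fr_core_news_sm") -> bool:
    """Return True if the given top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if model_is_importable():
    print("fr_core_news_sm is installed; spacy.load should be able to find it.")
else:
    print("Model not found: run the download cell, then restart the runtime.")
```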

## Setup: Importing the library and assorted data

In [3]:
%%capture
# 1. Download project and set current directory
!git clone https://github.com/nicolashernandez/READI-LREC22/
%cd READI-LREC22/

In [4]:
%%capture
# 2. Install module, should take around a minute to install every dependency
%cd readability
!pip install .
%cd ..

In [5]:
# 3. Add project directory to the path
import sys,os
sys.path.append(os.getcwd())
sys.path.append(os.path.join(os.getcwd(),"readability"))
sys.path.append(os.path.join(os.getcwd(),"readability","readability"))

In [10]:
import readability

# Recreating experiments

Six files are located in the git repository that was cloned, in the READI-LREC22/data folder.  
They contain the cleaned and formatted content of the corpora used in our project, and will be used for the demonstrations.

In [11]:
import pickle
with open(os.path.join(os.getcwd(),"data","tokens_split.pkl"), "rb") as file:
    corpus_ljl = pickle.load(file)
with open(os.path.join(os.getcwd(),"data","bibebook.com.pkl"), "rb") as file:
    corpus_bb = pickle.load(file)
with open(os.path.join(os.getcwd(),"data","JeLisLibre_md.pkl"), "rb") as file:
    corpus_jll = pickle.load(file)


If you wish to view the content, simply treat each corpus as a dictionary of texts grouped by class; the class names can be listed with dict.keys().  
Each text is a list of sentences, and each sentence is a list of tokens.  
For instance, corpus_ljl['level1'][0][0] gives you the first sentence of the first text of the "level1" class in the ljl corpus.  
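
To make the nesting concrete without loading the pickles, here is a small sketch on a made-up corpus with the same shape. The sentences in `toy_corpus` are invented for illustration and do not come from the real data.

```python
# Toy corpus with the same nesting as the pickled ones:
# class -> texts -> sentences -> tokens.
toy_corpus = {
    "level1": [
        [["Le", "chat", "dort", "."], ["Il", "rêve", "."]],
    ],
    "level2": [
        [["La", "pluie", "tombe", "depuis", "ce", "matin", "."]],
        [["Nous", "restons", "à", "la", "maison", "."]],
    ],
}

print(list(toy_corpus.keys()))          # class names
first_text = toy_corpus["level1"][0]    # a text: list of sentences
first_sentence = first_text[0]          # a sentence: list of tokens
print(first_sentence)
print(" ".join(first_sentence))         # tokens joined back into a string

# Count texts, sentences and tokens for each class.
counts = {}
for level, texts in toy_corpus.items():
    n_sentences = sum(len(text) for text in texts)
    n_tokens = sum(len(sentence) for text in texts for sentence in text)
    counts[level] = (len(texts), n_sentences, n_tokens)
    print(level, counts[level])
```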


In [12]:
corpus_ljl['level1'][0][0]

["Aujourd'hui",
 ',',
 'toute',
 'la',
 'famille',
 'est',
 'allée',
 'à',
 'la',
 'fête',
 'foraine',
 '.']

In [13]:
# Remove empty texts; the [:] copy makes it safe to remove items while looping
for level in corpus_bb.keys():
  for text in corpus_bb[level][:]:
    if len(text)==0:
      corpus_bb[level].remove(text)

for level in corpus_jll.keys():
  for text in corpus_jll[level][:]:
    if len(text)==0:
      corpus_jll[level].remove(text)

## Reproducing the contents of table 2

In [14]:
import pandas as pd

In [15]:
corp_info_ljl = readability.Readability(corpus_ljl).corpus_info()
corp_info_ljl.rename({"Nombre de fichiers" : "Nombre de fichiers artificiel"}, axis = 'index', inplace = True)
original_documents = [240,314,134,58,746]
extract_ljl = pd.DataFrame([corp_info_ljl.loc["Nombre de fichiers artificiel"],corp_info_ljl.loc['Nombre de phrases total'],corp_info_ljl.loc['Nombre de tokens']])
extract_ljl.loc["Nombre de fichiers original"] = original_documents
extract_ljl.columns.name = "Corpus ljl"

corp_info_bb = readability.Readability(corpus_bb).corpus_info()
corp_info_bb.rename({'Nombre de fichiers' : 'Nombre de fichiers artificiel'}, axis = 'index', inplace = True)
original_documents = [52,91,65,208]
extract_bb = pd.DataFrame([corp_info_bb.loc["Nombre de fichiers artificiel"],corp_info_bb.loc['Nombre de phrases total'],corp_info_bb.loc['Nombre de tokens']])
extract_bb.loc["Nombre de fichiers original"] = original_documents
extract_bb.columns.name = "Corpus bb"

corp_info_jll = readability.Readability(corpus_jll).corpus_info()
corp_info_jll.rename({'Nombre de fichiers' : 'Nombre de fichiers artificiel'}, axis = 'index', inplace = True)
original_documents = [13,12,10,9,44]
extract_jll = pd.DataFrame([corp_info_jll.loc["Nombre de fichiers artificiel"],corp_info_jll.loc['Nombre de phrases total'],corp_info_jll.loc['Nombre de tokens']])
extract_jll.loc["Nombre de fichiers original"] = original_documents
extract_jll.columns.name = "Corpus jll"

Acquiring Natural Language Processor...
DEBUG: Spacy model location (already installed) :  /usr/local/lib/python3.7/dist-packages/fr_core_news_sm/fr_core_news_sm-3.3.0
Acquiring Natural Language Processor...
DEBUG: Spacy model location (already installed) :  /usr/local/lib/python3.7/dist-packages/fr_core_news_sm/fr_core_news_sm-3.3.0
Acquiring Natural Language Processor...
DEBUG: Spacy model location (already installed) :  /usr/local/lib/python3.7/dist-packages/fr_core_news_sm/fr_core_news_sm-3.3.0


In [16]:
extract_ljl

Corpus ljl,level1,level2,level3,level4,total
Nombre de fichiers artificiel,240.0,628.0,670.0,522.0,2060.0
Nombre de phrases total,4880.0,13049.0,10354.0,7743.0,36026.0
Nombre de tokens,38976.0,128019.0,124901.0,101165.0,393061.0
Nombre de fichiers original,240.0,314.0,134.0,58.0,746.0


In [17]:
extract_bb

Corpus bb,intermédiaire,avancée,aisée,total
Nombre de fichiers artificiel,1729.0,1253.0,986.0,3968.0
Nombre de phrases total,22088.0,15762.0,12274.0,50124.0
Nombre de tokens,315369.0,232604.0,173939.0,721912.0
Nombre de fichiers original,52.0,91.0,65.0,208.0


In [18]:
extract_jll

Corpus jll,cycle4_3e,cycle4_4e,cycle4_5e,cycle3_6e,total
Nombre de fichiers artificiel,986.0,989.0,1187.0,1283.0,4445.0
Nombre de phrases total,14689.0,13553.0,13818.0,13463.0,55523.0
Nombre de tokens,188091.0,195375.0,211099.0,256573.0,851138.0
Nombre de fichiers original,13.0,12.0,10.0,9.0,44.0


## Reproducing the contents of table 3

### Traditional scores

In [19]:
%%capture
scores_ljl = readability.Readability(corpus_ljl).compile().scores()
scores_bb = readability.Readability(corpus_bb).compile().scores()
scores_jll = readability.Readability(corpus_jll).compile().scores()

In [20]:
scores_ljl

Mean values,level1,level2,level3,level4,Pearson Score
The Gunning fog index GFI,45.132518,67.697721,91.866336,105.669951,0.475915
The Automated readability index ARI,14.238996,19.932585,25.719148,27.7577,0.472037
The Flesch reading ease FRE,90.625507,84.87584,82.799719,81.03216,-0.402143
The Flesch-Kincaid grade level FKGL,4.5768,6.681592,8.323094,9.019,0.451786
The Simple Measure of Gobbledygook SMOG,16.060678,18.765666,21.195241,22.162764,0.481589
Reading Ease Level,92.465376,82.239781,75.110005,71.711107,-0.408414


In [21]:
scores_bb

Mean values,intermédiaire,avancée,aisée,Pearson Score
The Gunning fog index GFI,128.933829,122.607686,122.606339,-0.037663
The Automated readability index ARI,36.714696,36.263151,35.554898,-0.024102
The Flesch reading ease FRE,92.762222,94.296337,92.604799,0.02549
The Flesch-Kincaid grade level FKGL,9.675927,9.383585,9.426934,-0.025253
The Simple Measure of Gobbledygook SMOG,24.061616,23.95396,23.846759,-0.014903
Reading Ease Level,73.569151,75.304801,74.499195,0.025572


In [22]:
scores_jll

Mean values,cycle4_3e,cycle4_4e,cycle4_5e,cycle3_6e,Pearson Score
The Gunning fog index GFI,104.065761,102.421179,132.388006,119.822886,0.111777
The Automated readability index ARI,34.358482,36.11901,40.75461,46.380268,0.195434
The Flesch reading ease FRE,114.201297,117.745389,101.240928,123.757483,-0.122791
The Flesch-Kincaid grade level FKGL,6.245672,6.271303,9.533058,7.957749,0.179573
The Simple Measure of Gobbledygook SMOG,21.115594,21.694732,24.385067,23.741976,0.194284
Reading Ease Level,94.329612,95.735498,77.177792,91.457773,-0.131285


### Pseudo-perplexity (takes around an hour to calculate)

In [23]:
perplexity_calculator = readability.readability.perplexity.pppl_calculator
perplexity_calculator.load_model()

0

This will take around an hour and a half to calculate everything, even with the GPU enabled.

In [24]:
perplex_ljl = perplexity_calculator.PPPL_score(corpus_ljl)
perplex_bb = perplexity_calculator.PPPL_score(corpus_bb)
perplex_jll = perplexity_calculator.PPPL_score(corpus_jll)

Now calculating pseudo-perplexity for class : level1
Now calculating pseudo-perplexity for class : level2
Now calculating pseudo-perplexity for class : level3
Now calculating pseudo-perplexity for class : level4
Now calculating pseudo-perplexity for class : intermédiaire
Now calculating pseudo-perplexity for class : avancée
Now calculating pseudo-perplexity for class : aisée
Now calculating pseudo-perplexity for class : cycle4_3e
Now calculating pseudo-perplexity for class : cycle4_4e
Now calculating pseudo-perplexity for class : cycle4_5e
Now calculating pseudo-perplexity for class : cycle3_6e


In [25]:
#TODO : Put this function in the library.
from scipy.stats import pearsonr
pearson = []
ppl_list = []
labels = []
for level in corpus_ljl.keys():
  for val in perplex_ljl[level]:
    ppl_list.append(val)
    labels.append(list(corpus_ljl.keys()).index(level))

maxppl = max(ppl_list)
ppl_list = [val/maxppl for val in ppl_list]

pearson.append(pearsonr(ppl_list,labels)[0])

moy_ppl= list()
for level in corpus_ljl.keys():
  moy=0
  for score in perplex_ljl[level]:
    moy+= score/len(perplex_ljl[level])
  moy_ppl.append(moy)

scores_ljl.loc["Pseudo perplexity"] = moy_ppl + pearson

In [26]:
pearson = []
ppl_list = []
labels = []
for level in corpus_bb.keys():
  for val in perplex_bb[level]:
    ppl_list.append(val)
    labels.append(list(corpus_bb.keys()).index(level))

maxppl = max(ppl_list)
ppl_list = [val/maxppl for val in ppl_list]

pearson.append(pearsonr(ppl_list,labels)[0])

moy_ppl= list()
for level in corpus_bb.keys():
  moy=0
  for score in perplex_bb[level]:
    moy+= score/len(perplex_bb[level])
  moy_ppl.append(moy)

scores_bb.loc["Pseudo perplexity"] = moy_ppl + pearson

In [27]:
pearson = []
ppl_list = []
labels = []
for level in corpus_jll.keys():
  for val in perplex_jll[level]:
    ppl_list.append(val)
    labels.append(list(corpus_jll.keys()).index(level))

maxppl = max(ppl_list)
ppl_list = [val/maxppl for val in ppl_list]

pearson.append(pearsonr(ppl_list,labels)[0])

moy_ppl= list()
for level in corpus_jll.keys():
  moy=0
  for score in perplex_jll[level]:
    moy+= score/len(perplex_jll[level])
  moy_ppl.append(moy)

scores_jll.loc["Pseudo perplexity"] = moy_ppl + pearson
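
The three cells above repeat the same steps, so they could be factored into one helper, as the TODO suggests. The sketch below is one possible shape for it, not the library's implementation: `pseudo_perplexity_row` and `pearson` are hypothetical names, the coefficient is computed by hand to keep the example dependency-free (`scipy.stats.pearsonr` should give the same value), and it assumes the perplexity dictionary lists its classes in the same order as the corpus. The division by the maximum is skipped because Pearson correlation is unchanged by rescaling.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pseudo_perplexity_row(perplexities):
    """Return [mean per class..., Pearson score] for a dict[class] -> list of scores."""
    values, labels = [], []
    for index, level in enumerate(perplexities):
        values.extend(perplexities[level])
        # The class position in the dictionary is used as its numeric label.
        labels.extend([index] * len(perplexities[level]))
    means = [sum(scores) / len(scores) for scores in perplexities.values()]
    return means + [pearson(values, labels)]


# Toy values, invented for illustration.
toy = {"level1": [10.0, 20.0], "level2": [30.0, 40.0]}
row = pseudo_perplexity_row(toy)
print(row)
```

Under these assumptions, each of the three cells would reduce to a single line such as `scores_ljl.loc["Pseudo perplexity"] = pseudo_perplexity_row(perplex_ljl)`.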

In [28]:
scores_ljl

Mean values,level1,level2,level3,level4,Pearson Score
The Gunning fog index GFI,45.132518,67.697721,91.866336,105.669951,0.475915
The Automated readability index ARI,14.238996,19.932585,25.719148,27.7577,0.472037
The Flesch reading ease FRE,90.625507,84.87584,82.799719,81.03216,-0.402143
The Flesch-Kincaid grade level FKGL,4.5768,6.681592,8.323094,9.019,0.451786
The Simple Measure of Gobbledygook SMOG,16.060678,18.765666,21.195241,22.162764,0.481589
Reading Ease Level,92.465376,82.239781,75.110005,71.711107,-0.408414
Pseudo perplexity,53.319739,55.769791,62.392847,62.07661,0.040924


In [29]:
scores_bb

Mean values,intermédiaire,avancée,aisée,Pearson Score
The Gunning fog index GFI,128.933829,122.607686,122.606339,-0.037663
The Automated readability index ARI,36.714696,36.263151,35.554898,-0.024102
The Flesch reading ease FRE,92.762222,94.296337,92.604799,0.02549
The Flesch-Kincaid grade level FKGL,9.675927,9.383585,9.426934,-0.025253
The Simple Measure of Gobbledygook SMOG,24.061616,23.95396,23.846759,-0.014903
Reading Ease Level,73.569151,75.304801,74.499195,0.025572
Pseudo perplexity,414.00173,161.6246,152.336135,-0.129933


In [30]:
scores_jll

Mean values,cycle4_3e,cycle4_4e,cycle4_5e,cycle3_6e,Pearson Score
The Gunning fog index GFI,104.065761,102.421179,132.388006,119.822886,0.111777
The Automated readability index ARI,34.358482,36.11901,40.75461,46.380268,0.195434
The Flesch reading ease FRE,114.201297,117.745389,101.240928,123.757483,-0.122791
The Flesch-Kincaid grade level FKGL,6.245672,6.271303,9.533058,7.957749,0.179573
The Simple Measure of Gobbledygook SMOG,21.115594,21.694732,24.385067,23.741976,0.194284
Reading Ease Level,94.329612,95.735498,77.177792,91.457773,-0.131285
Pseudo perplexity,169.453304,172.710205,114.057694,177.678495,-0.024565


## Reproducing the contents of table 4 for MLP and SVM

This should take around 50 minutes to compute on Colab.

In [31]:
from readability.methods import methods

In [32]:
methods.demo_doMethods(corpus_ljl,plot=False)

Matrix dimensions: (2060, 11661)
Vocabulary size: 11661
MLP RESULTS
cross-validation result for 5 runs = 0.479126213592233
              precision    recall  f1-score   support

      level1       0.45      0.46      0.45       240
      level2       0.47      0.63      0.54       628
      level3       0.47      0.47      0.47       670
      level4       0.54      0.32      0.40       522

    accuracy                           0.48      2060
   macro avg       0.48      0.47      0.47      2060
weighted avg       0.49      0.48      0.47      2060

SVM RESULTS
cross-validation result for 5 runs = 0.4757281553398058
              precision    recall  f1-score   support

      level1       0.46      0.42      0.44       240
      level2       0.46      0.60      0.52       628
      level3       0.47      0.49      0.48       670
      level4       0.53      0.33      0.41       522

    accuracy                           0.48      2060
   macro avg       0.48      0.46      0.46     

In [33]:
methods.demo_doMethods(corpus_bb,plot=False)

Matrix dimensions: (3968, 19590)
Vocabulary size: 19590
MLP RESULTS
cross-validation result for 5 runs = 0.4977301387137453
                precision    recall  f1-score   support

intermédiaire       0.52      0.60      0.56      1729
      avancée       0.51      0.48      0.50      1253
        aisée       0.43      0.33      0.37       986

      accuracy                           0.50      3968
     macro avg       0.48      0.47      0.48      3968
  weighted avg       0.49      0.50      0.49      3968

SVM RESULTS
cross-validation result for 5 runs = 0.5176462180095991
                precision    recall  f1-score   support

intermédiaire       0.52      0.66      0.59      1729
      avancée       0.55      0.46      0.50      1253
        aisée       0.46      0.33      0.39       986

      accuracy                           0.52      3968
     macro avg       0.51      0.49      0.49      3968
  weighted avg       0.51      0.52      0.51      3968



In [34]:
methods.demo_doMethods(corpus_jll,plot=False)

Matrix dimensions: (4445, 19128)
Vocabulary size: 19128
MLP RESULTS
cross-validation result for 5 runs = 0.604949381327334
              precision    recall  f1-score   support

   cycle4_3e       0.56      0.58      0.57       986
   cycle4_4e       0.41      0.38      0.39       989
   cycle4_5e       0.80      0.61      0.69      1187
   cycle3_6e       0.64      0.79      0.71      1283

    accuracy                           0.60      4445
   macro avg       0.60      0.59      0.59      4445
weighted avg       0.61      0.60      0.60      4445

SVM RESULTS
cross-validation result for 5 runs = 0.5739032620922384
              precision    recall  f1-score   support

   cycle4_3e       0.52      0.48      0.50       986
   cycle4_4e       0.42      0.31      0.36       989
   cycle4_5e       0.75      0.61      0.67      1187
   cycle3_6e       0.57      0.81      0.67      1283

    accuracy                           0.57      4445
   macro avg       0.56      0.55      0.55     

## How to reproduce the results in table 4 for fastText and CamemBERT

In [35]:
from readability.models import models, fasttext, bert

The following demonstration uses the CSV files available in the data/ folder, encoded in one-hot vector format.  
It relies on the ktrain library (a wrapper around Keras) to help configure and train deep learning models.  
Please enable the GPU to make training much faster:  
Edit -> Notebook Settings -> Hardware Accelerator: GPU
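
For readers unfamiliar with one-hot encoding, the sketch below builds a tiny CSV in that style: one text column plus one 0/1 column per class. The column names (`text`, `level1`, ...) and the sample sentences are assumptions made for illustration; the actual layout of the files in data/ may differ.

```python
import csv
import io


def one_hot_rows(samples, classes):
    """Yield dict rows with a `text` column and one 0/1 column per class."""
    for text, label in samples:
        row = {"text": text}
        for name in classes:
            row[name] = 1 if name == label else 0
        yield row


classes = ["level1", "level2", "level3", "level4"]
samples = [
    ("Le chat dort.", "level1"),
    ("La pluie tombe depuis ce matin.", "level3"),
]

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text"] + classes)
writer.writeheader()
writer.writerows(one_hot_rows(samples, classes))
csv_text = buffer.getvalue()
print(csv_text)
```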

### fastText

In [37]:
fasttext.demo_doFastText("ljl") # Can pass "ljl", "bibebook.com", "JeLisLibre", or "all" as a parameter
# Takes around 15 minutes without GPU for the ljl corpus (default parameter) on free Colab
# Takes around 3 minutes with GPU enabled.

FileNotFoundError: ignored

### CamemBERT

This takes multiple hours without a GPU; remember to enable it beforehand:  
Edit -> Notebook Settings -> Hardware Accelerator: GPU

In [None]:
bert.demo_doBert() #Can pass "ljl", "bibebook.com", "JeLisLibre", or "all" as a parameter
#Takes around 15 minutes for the ljl corpus on GPU (default parameter)

# Examples of use

## Importing data for the examples

In [None]:
import pickle
with open(os.path.join(os.getcwd(),"data","tokens_split.pkl"), "rb") as file:
    corpus = pickle.load(file)

In [None]:
#This can also be done by doing a wget :
#!wget -nc https://github.com/nicolashernandez/READI-LREC22/blob/main/data/tokens_split.pkl?raw=true -P data
#with open(os.path.join(os.getcwd(),"data","tokens_split.pkl?raw=true"), "rb") as file:
#    corpus = pickle.load(file)

## Example one : Using the library for a text

Texts can be strings, but it is preferable to prepare them beforehand as tokenized sentences, i.e. list(list()).  
If using spaCy, with a loaded pipeline such as nlp = spacy.load("fr_core_news_sm"), something like this can be used:  
new_text = [[token.text for token in sent] for sent in nlp(text).sents]  
To remove punctuation marks, this can be done instead:  
new_text = [[token.text for token in sent if not token.is_punct] for sent in nlp(text).sents]

A readability instance is created by calling readability.Readability(text).  
The following arguments are optional: lang, nlp_name, perplexity_processor.  
By default, the instance uses the French language, a spacy_sm NLP processor, and gpt2 for computing perplexity.

In [None]:
import pandas as pd
import spacy
#Types of available formats for a text:
r = readability.Readability(corpus['level1'][0]) # A text in the list(list()) format used internally
#r = readability.Readability(' '.join(corpus['level1'][0][0])) # A string, it will be converted into a list(list()), of size 1, with 12 tokens, including punctuation

Acquiring Natural Language Processor...
DEBUG: Spacy model location (already installed) :  /usr/local/lib/python3.7/dist-packages/fr_core_news_sm/fr_core_news_sm-2.2.5


Common scores can be accessed by using the corresponding function.

In [None]:
gfi = r.gfi()
gfi #is 61.52380952380953

61.52380952380953

More conveniently, a dictionary of these scores can be obtained by using .scores()

In [None]:
r.scores()

{'ari': 21.503161490683233,
 'fkgl': 8.382298136645964,
 'fre': 54.47311594202901,
 'gfi': 61.52380952380953,
 'rel': 73.00333333333334,
 'smog': 13.023866798666859}

To speed up the calculations needed by these functions, the .compile() function can be used.  
It calculates most of the statistics needed for a text and stores them in the .statistics attribute of the Readability object.  
These can be viewed by calling .stats(), or by accessing the .statistics attribute directly.  
For example: .statistics.totalWords

In [None]:
r.compile()
r.stats()
r.statistics.totalWords

totalWords = 230
totalLongWords = 30
totalSentences = 21
totalCharacters = 837
totalSyllables = 384
nbPolysyllables = 63


230

## Example two : Using the library for a corpus

Currently, a corpus will be recognized by the library only if provided with the following structure :  
type(corpus) = dict[class][text][sentence][token]  
For instance, corpA['class1'][0][0][0] should return the first token of the first sentence of the first text of class 'class1', for the corpus 'corpA'.
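
Before handing your own data to the library, it can be worth checking that it has this nesting. The function below is a hedged sketch of such a check; `is_corpus` is a hypothetical helper and does not reproduce how the library itself detects a corpus.

```python
def is_corpus(obj) -> bool:
    """Check for the dict[class][text][sentence][token] nesting with string tokens."""
    if not isinstance(obj, dict) or not obj:
        return False
    for texts in obj.values():
        if not isinstance(texts, list):
            return False
        for text in texts:
            if not isinstance(text, list):
                return False
            for sentence in text:
                if not isinstance(sentence, list):
                    return False
                if not all(isinstance(token, str) for token in sentence):
                    return False
    return True


# Made-up corpus with one class, one text and two sentences.
corpA = {"class1": [[["Bonjour", "!"], ["Ça", "va", "?"]]]}
print(is_corpus(corpA))                # nested as expected
print(corpA["class1"][0][0][0])        # first token of the first sentence of the first text
print(is_corpus([["Bonjour", "!"]]))   # a single text is not a corpus
```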

In [None]:
r = readability.Readability(corpus)

Acquiring Natural Language Processor...
DEBUG: Spacy model location (already installed) :  /usr/local/lib/python3.7/dist-packages/fr_core_news_sm/fr_core_news_sm-2.2.5


A useful function summarizing the contents of the corpus is available, called .corpus_info()

In [None]:
r.corpus_info()

Unnamed: 0,level1,level2,level3,level4,total
Nombre de fichiers,240.0,628.0,670.0,522.0,2060.0
Nombre de phrases total,4880.0,13049.0,10354.0,7743.0,36026.0
Nombre de phrases moyen,20.0,21.0,15.0,15.0,17.0
Longueur moyenne de phrase,8.0,10.0,12.0,13.0,11.0
Nombre de tokens,38976.0,128019.0,124901.0,101165.0,393061.0
Nombre de token moyen,162.0,204.0,186.0,194.0,191.0
Taille du vocabulaire,4836.0,10903.0,11953.0,11410.0,23100.0
Taille moyenne du vocabulaire,99.0,130.0,127.0,149.0,2257.0


When using a corpus, the Readability object's methods can return different types of results, but the behavior is similar:  
Instead of returning a value, or a list, the methods may return them in a dict[class][text_index] format.  
Additionally, .compile() will create the .corpus_statistics attribute instead of .statistics.  
.stats() will print the statistics of the first text in each class, in addition to showing the mean values.
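
The dict[class][text_index] format is easy to flatten when a single list of results is more convenient. The sketch below assumes each class maps to a list of scores; the values in `toy_scores` are invented stand-ins for what a score method returns, and `flatten_scores` is a hypothetical helper.

```python
def flatten_scores(scores_by_class):
    """Turn dict[class][text_index] -> score into (class, text_index, score) tuples."""
    return [
        (level, index, score)
        for level, scores in scores_by_class.items()
        for index, score in enumerate(scores)
    ]


# Toy values standing in for per-text scores on a corpus.
toy_scores = {"level1": [61.5, 40.2], "level2": [136.9]}
records = flatten_scores(toy_scores)

# Find the text with the highest score across all classes.
highest = max(records, key=lambda record: record[2])
print(records)
print(highest)
```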

In [None]:
r.compile()
r.stats()

Class level1
totalWords = 230
totalLongWords = 30
totalSentences = 21
totalCharacters = 837
totalSyllables = 384
nbPolysyllables = 63
Class level2
totalWords = 138
totalLongWords = 26
totalSentences = 8
totalCharacters = 555
totalSyllables = 240
nbPolysyllables = 43
Class level3
totalWords = 104
totalLongWords = 16
totalSentences = 11
totalCharacters = 405
totalSyllables = 184
nbPolysyllables = 21
Class level4
totalWords = 567
totalLongWords = 112
totalSentences = 35
totalCharacters = 2307
totalSyllables = 972
nbPolysyllables = 151


In [None]:
gfi_corp = r.gfi()
gfi_corp['level1'][0] #Is also 61.52380952380953

class level1 text 0 score 61.52380952380953
class level2 text 0 score 136.9
class level3 text 0 score 61.963636363636375
class level4 text 0 score 134.48


61.52380952380953

.scores() behaves differently: instead of giving the scores for each text, it returns a dataframe showing the mean values (and prints out the standard deviations).
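
To illustrate what such a summary contains, the sketch below computes a mean and a standard deviation per class from per-text scores using the standard library. It is not the library's code: `summarize` is a hypothetical helper, the numbers are made up, and the use of the population standard deviation (`pstdev`) is an assumption, since the library may use the sample version instead.

```python
from statistics import mean, pstdev


def summarize(scores_by_class):
    """Return dict[class] -> (mean, population standard deviation)."""
    return {
        level: (mean(scores), pstdev(scores))
        for level, scores in scores_by_class.items()
    }


# Toy per-text scores, invented for illustration.
toy_scores = {"level1": [40.0, 50.0, 60.0], "level2": [70.0, 90.0]}
summary = summarize(toy_scores)
for level, (avg, std) in summary.items():
    print(f"{level}: mean={avg:.2f} std={std:.2f}")
```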

In [None]:
r.scores()

Standard Deviation values                   level1     level2     level3  \
The Gunning fog index GFI                22.638448  28.598931  38.724814   
The Automated readability index ARI       6.265479   6.977522   8.614690   
The Flesch reading ease FRE              26.013539  24.790444  29.308209   
The Flesch-Kincaid grade level FKGL       3.386159   2.447647   2.501191   
The Simple Measure of Gobbledygook SMOG   1.728092   1.647957   1.839104   
Reading Ease Level                       19.108738  12.993784  12.457079   

Standard Deviation values                   level4  
The Gunning fog index GFI                45.192761  
The Automated readability index ARI       9.156945  
The Flesch reading ease FRE              32.117630  
The Flesch-Kincaid grade level FKGL       2.833996  
The Simple Measure of Gobbledygook SMOG   1.978900  
Reading Ease Level                       14.318104  


Mean values,level1,level2,level3,level4,Pearson Score
The Gunning fog index GFI,45.132518,67.697721,91.866336,105.669951,0.475915
The Automated readability index ARI,14.238996,19.932585,25.719148,27.7577,0.472037
The Flesch reading ease FRE,90.625507,84.87584,82.799719,81.03216,-0.402143
The Flesch-Kincaid grade level FKGL,4.5768,6.681592,8.323094,9.019,0.451786
The Simple Measure of Gobbledygook SMOG,10.110286,11.521299,12.760749,13.210278,0.471106
Reading Ease Level,92.465376,82.239781,75.110005,71.711107,-0.408414


In addition, machine learning and deep learning applications can be used with the corpus' data to help develop NLP solutions

In [None]:
#r.importmodel(camembert)
#r.configmodel(params)
#r.train(mode=autofit)