# Readability Assessment with a Recurrent Neural Network

Readability assessment is a well-known problem in natural language processing. Giving readers text that suits their level of comprehension (neither too easy nor too hard) can maximize their understanding and enjoyment. In this notebook we assess the readability of a given text, regardless of its topic, using a recurrent neural network.
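
To make the approach concrete, here is a minimal sketch of how an Elman-style recurrent network can turn a sequence of word vectors into a readability prediction. The hidden state is updated once per token as h_t = tanh(W_x x_t + W_h h_{t-1} + b_h), and the final state is mapped through a softmax onto the three OneStopEnglish reading levels. This is not the notebook's model: the toy dimensions, the random untrained weights, and the helper names (`matvec`, `softmax`, `rnn_classify`, `rand_matrix`) are hypothetical, and a real model would learn its weights and feed in pretrained word embeddings.

```python
import math
import random

# The three OneStopEnglish reading levels, used as output classes.
LEVELS = ["Elementary", "Intermediate", "Advanced"]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_classify(seq, W_x, W_h, b_h, W_o, b_o):
    """Run an Elman RNN over a sequence of input vectors and return
    class probabilities computed from the final hidden state."""
    h = [0.0] * len(b_h)  # h_0 = 0
    for x in seq:
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
        pre = [a + b + c for a, b, c in zip(matvec(W_x, x), matvec(W_h, h), b_h)]
        h = [math.tanh(p) for p in pre]
    # y = softmax(W_o h_T + b_o)
    logits = [a + b for a, b in zip(matvec(W_o, h), b_o)]
    return softmax(logits)


def rand_matrix(rows, cols):
    """Hypothetical random initialisation; a trained model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
emb_dim, hidden_dim = 4, 5  # toy sizes, chosen arbitrarily
W_x = rand_matrix(hidden_dim, emb_dim)
W_h = rand_matrix(hidden_dim, hidden_dim)
b_h = [0.0] * hidden_dim
W_o = rand_matrix(len(LEVELS), hidden_dim)
b_o = [0.0] * len(LEVELS)

# A toy "text" of 6 tokens, each standing in for a word embedding.
tokens = rand_matrix(6, emb_dim)
probs = rnn_classify(tokens, W_x, W_h, b_h, W_o, b_o)
print(LEVELS[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Because the prediction depends only on the final hidden state, the network can in principle pick up on sentence length and word choice across the whole text rather than on a fixed window, which is the usual motivation for recurrent models in this task.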

## Corpus
> OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification  
> Sowmya Vajjala and Ivana Lučić  
> 2018  
> Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304. Association for Computational Linguistics.  
> [url](http://aclweb.org/anthology/W18-0535). [bib file](https://aclanthology.coli.uni-saarland.de/papers/W18-0535/w18-0535.bib)

Please cite the above paper if you use this corpus in your research.

[![DOI](https://zenodo.org/badge/128919409.svg)](https://zenodo.org/badge/latestdoi/128919409)

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

Now let's dive into our corpus.

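```python
from __future__ import print_function  # no-op on Python 3; kept for Python 2 compatibility
import sys

# Make the author's local `research` package importable (adjust this path to your own checkout)
sys.path.append("/home/ms10596/Documents/match")

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import display, HTML
from tabulate import tabulate

from research.utils.one_stop_english import OneStopEnglish
from research.utils.loading import load_glove_embeddings  # not used in the cells shown here

corpus = OneStopEnglish()  # load the OneStopEnglish articles
```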
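
The cell imports `load_glove_embeddings` from the author's `research` package, whose signature and behaviour are not shown here. As a hedged illustration of what such a loader typically does, the sketch below parses the plain-text GloVe format, in which each line holds a word followed by its vector components separated by spaces. `parse_glove_lines` is a hypothetical name, not the author's function.

```python
def parse_glove_lines(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict
    mapping each word to its list of floats."""
    embeddings = {}
    dim = None
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)  # the first vector fixes the dimension
        elif len(values) != dim:
            raise ValueError(f"inconsistent vector dimension for {word!r}")
        embeddings[word] = values
    return embeddings


# Two hypothetical lines in GloVe format (real files have 50-300 components).
sample = [
    "the 0.1 0.2 0.3\n",
    "cat -0.5 0.0 1.5\n",
]
emb = parse_glove_lines(sample)
print(emb["cat"])  # [-0.5, 0.0, 1.5]
```

With a downloaded file such as `glove.6B.50d.txt`, the open file object can be passed directly, e.g. `parse_glove_lines(open(path, encoding="utf-8"))`, since iterating a text file yields its lines. This simple split assumes no token contains a space, which holds for the smaller GloVe releases but not for every one.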

Reading level|Avg. words per article|Std. dev. (words)|Number of articles
---|---|---|---
Elementary|533.17|103.79|189
Intermediate|676.59|117.15|189
Advanced|820.49|162.52|189
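
The table reports, for each reading level, the mean and standard deviation of article length in words. The sketch below shows that computation on hypothetical counts. It assumes whitespace tokenization for counting words and the sample standard deviation (`statistics.stdev`), because the table does not say which tokenizer or which standard-deviation variant was used; `word_count` and `level_stats` are hypothetical helper names.

```python
import statistics


def word_count(text):
    """Count words by splitting on whitespace (an assumed tokenizer)."""
    return len(text.split())


def level_stats(word_counts):
    """Return (mean, sample standard deviation, number of articles)."""
    return statistics.mean(word_counts), statistics.stdev(word_counts), len(word_counts)


# Hypothetical per-article word counts, NOT taken from the corpus.
toy_counts = {
    "Elementary": [500, 540, 560],
    "Intermediate": [650, 680, 700],
    "Advanced": [780, 820, 860],
}

print("Reading level|Avg. words per article|Std. dev. (words)|Number of articles")
for level, counts in toy_counts.items():
    mean, std, n = level_stats(counts)
    print(f"{level}|{mean:.2f}|{std:.2f}|{n}")
```

If the articles are plain strings, the real counts for one level could be gathered with something like `[word_count(a) for a in corpus.articles[:189]]`; numbers computed this way may differ slightly from the table if its authors tokenized differently.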



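```python
# The offsets below assume one flat list of articles: indices 0-188 Elementary,
# 189-377 Intermediate and 378-566 Advanced, with index i pointing to the same story at each level.
PER_LEVEL = 189  # articles per reading level

@interact
def show_articles(i=(0, PER_LEVEL - 1, 1), words=(0, 1000, 1)):
    # Note: if each article is a plain string, `[:words]` keeps that many
    # characters rather than words.
    data = [
        ["Advanced", corpus.articles[i + 2 * PER_LEVEL][:words]],
        ["Intermediate", corpus.articles[i + PER_LEVEL][:words]],
        ["Elementary", corpus.articles[i][:words]],
    ]
    headers = ["Reading Level", "Example"]
    display(HTML(tabulate(data, tablefmt="html", headers=headers)
                 + "<style>th,td {font-size: 10px}</style>"))
```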
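
The offsets `i`, `i+189` and `i+189+189` in `show_articles` imply that the corpus object stores all 567 articles in a single list ordered Elementary, Intermediate, Advanced, with 189 articles per level. That ordering is inferred from the code, not from documentation of `OneStopEnglish`. The sketch below makes the index arithmetic explicit; `flat_index` and `group_by_level` are hypothetical helpers.

```python
LEVELS = ("Elementary", "Intermediate", "Advanced")
ARTICLES_PER_LEVEL = 189


def flat_index(level, i, per_level=ARTICLES_PER_LEVEL):
    """Map (reading level, article number) to a position in the flat list."""
    if not 0 <= i < per_level:
        raise IndexError(f"article number {i} is out of range")
    return LEVELS.index(level) * per_level + i


def group_by_level(articles, per_level=ARTICLES_PER_LEVEL):
    """Split the flat article list into one contiguous slice per level."""
    if len(articles) != per_level * len(LEVELS):
        raise ValueError(f"expected {per_level * len(LEVELS)} articles, got {len(articles)}")
    return {level: articles[k * per_level:(k + 1) * per_level]
            for k, level in enumerate(LEVELS)}


print(flat_index("Advanced", 5))  # 383, the same as i + 189 + 189 with i = 5
```

Under the same assumption, `group_by_level(corpus.articles)` would yield one list per level, which is a convenient shape for building labelled training data; the length check fails loudly if the corpus is not laid out as expected.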


## ~~Feature Extraction~~

```python
# Hand-crafted readability features considered for feature extraction
FEATURES = [
    "Adjectives (ADJ) like warm, fat",
    "Adverbs (ADV) like almost, too, very",
    "Articles (ART) like a, an, the",
    "Conjunctions (CONJ) like for, and, nor",
    "Interjections (INTERJ) like wow, oops, ouch",
    "Nouns (NOUN) like boy, girl, doctor, town",
    "Numerals (NUM) like 1, 155, 89",
    "Past participles (PASTPART) like taken, eaten",
    "Prepositions (PREP) like at, for, in, off",
    "Pronouns (PRON) like he, she, you, I",
    "Punctuation (PUNCT) like ?, :, ;, .",
    "Special symbols (SYMBOL) like @, %",
    "Nominal phrases (NP) like The dog on the sofa",
    "Adjectival phrases (AP) like The movie was terrible",
    "Prepositional phrases (PP) like We stayed by the river",
    "Adverbial phrases (ADVP) like The carpenter hit the nail with a hammer.",
    "Temporal auxiliary verb phrases (VTEMP) like I was wondering about.",
    "Aspectual auxiliary verb phrases (VASP) like I have seen the light",
    "Modal auxiliary verb phrases (VMOD) like I should study",
    "Copulative verb phrases (VCOP) like John is happy",
    "Past participle verb phrases (VPASTPART) like I have bought",
    "Gerundive verb phrases (VGER) like Running is a good exercise",
    "Infinitive verb phrases (VINF) like I want to study.",
    "Finite verb phrases (VF) like She plays guitar",
    "Subordinate clause phrases (SC and REL) like Until Mr. Sanchez has his first cup",
    "Verb phrases (VF and VCOP) like Ali is going to school",
    "Number of sentences",
    "Number of words",
    "Number of different words",
    "Number of different verb forms",
    "Number of auxiliary verbs",
    "Number of main verbs",
    "Average number of verb phrases per sentence",
    "Average length of sentences",
    "Average number of syllables per word",
    "Average size of verbal chains",
    "Average size of coordination relation chains",
    "Frequency of words with 1-4 syllables",
    "Frequency of words with more than 4 syllables",
    "Total number of dependencies",
    "Total number of tree nodes",
    "Number of pronouns per noun phrase (NP)",
    "Number of NPs with a definite or demonstrative determiner",
    "Number of NPs with an indefinite determiner",
    "Number of subordinate clauses (SC/REL chunks)",
    "Number of coordination relations",
    "Number of omitted subjects",
    "Flesch Reading Ease BR readability measure",
]

@interact
def show_features(i=(0, len(FEATURES), 1)):
    # Show the first i features as a one-column HTML table
    rows = [[feature] for feature in FEATURES[:i]]
    display(HTML(tabulate(rows, tablefmt="html") + "<style>th,td {font-size: 10px}</style>"))
```
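
The list above only names the features. For a few of the surface features (sentence and word counts, different words, average sentence length, syllables per word, syllable-length frequencies) plus a Flesch score, a minimal sketch follows. It rests on stated assumptions: sentences are split on `.`, `!` and `?`; words are runs of letters and apostrophes; syllables are estimated with a crude vowel-group heuristic; and the score uses the original English Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), not the BR variant named in the list. `count_syllables` and `surface_features` are hypothetical names.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels (a, e, i, o, u, y)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def surface_features(text):
    """Compute a handful of surface readability features for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(syllables) / len(words)  # average syllables per word
    return {
        "Number of sentences": len(sentences),
        "Number of words": len(words),
        "Number of different words": len({w.lower() for w in words}),
        "Average length of sentences": asl,
        "Average number of syllables per word": asw,
        "Frequency of words with 1-4 syllables": sum(1 <= s <= 4 for s in syllables),
        "Frequency of words with more than 4 syllables": sum(s > 4 for s in syllables),
        # Original English Flesch Reading Ease (higher means easier to read)
        "Flesch Reading Ease": 206.835 - 1.015 * asl - 84.6 * asw,
    }


for name, value in surface_features("The cat sat. The dog ran!").items():
    print(f"{name}: {value}")
```

The syntactic features in the list (phrase types, dependency counts, tree nodes) would additionally need a tagger and parser, so they are not sketched here.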
    
