# Readability Assessment with a Recurrent Neural Network

Readability assessment is a well-known problem in natural language processing. Giving readers text that matches their level of comprehension (neither too easy nor too hard) can maximize their understanding and enjoyment. In this notebook we try to assess the readability of a given text, regardless of its subject, using a recurrent neural network.

## Corpus
> OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification  
> Sowmya Vajjala and Ivana Lučić  
> 2018  
> Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304. Association for Computational Linguistics.  
> [url](http://aclweb.org/anthology/W18-0535). [bib file](https://aclanthology.coli.uni-saarland.de/papers/W18-0535/w18-0535.bib)

Please cite the above paper if you use this corpus in your research.

[![DOI](https://zenodo.org/badge/128919409.svg)](https://zenodo.org/badge/latestdoi/128919409)

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

Now let's dive into our corpus.

In [1]:
from __future__ import print_function  # keeps print() usable under Python 2 as well
import sys
# Make the local `research` package importable (machine-specific path; adjust to your checkout).
sys.path.append("/home/ms10596/Documents/match")
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from research.utils.one_stop_english import OneStopEnglish
from tabulate import tabulate
from IPython.display import display, HTML
from research.utils.loading import load_glove_embeddings
# Load the OneStopEnglish corpus.
corpus = OneStopEnglish()

Reading level|Avg. Num. Words|Std. Dev|Number of Articles
---|---|---|---
Elementary|533.17|103.79|189
Intermediate|676.59|117.15|189
Advanced|820.49|162.52|189

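The per-level figures in the table can be recomputed with a short sketch like the one below. It rests on several assumptions that are not confirmed by the notebook: that the articles form a flat list of strings stored level by level (elementary, then intermediate, then advanced), that a "word" is a whitespace-separated token, and that the standard deviation is the population one. The function name `level_word_stats` is hypothetical, and a different tokenizer or a sample standard deviation would give slightly different numbers.

```python
import statistics


def level_word_stats(articles, per_level=189,
                     levels=("Elementary", "Intermediate", "Advanced")):
    """Return {level: (mean word count, std. dev., number of articles)}.

    Assumes `articles` is a flat list of strings grouped level by level,
    with `per_level` articles in each group (an assumption, see above).
    """
    stats = {}
    for k, level in enumerate(levels):
        chunk = articles[k * per_level:(k + 1) * per_level]
        # Whitespace tokenization: a simplifying assumption.
        counts = [len(text.split()) for text in chunk]
        stats[level] = (statistics.mean(counts),
                        statistics.pstdev(counts),
                        len(chunk))
    return stats


# Toy usage with two short "articles" per level.
toy = ["a b", "a b c d", "x", "x y z", "one two three four five", "one"]
for level, (mean, std, n) in level_word_stats(toy, per_level=2).items():
    print(level, mean, std, n)
```

With the real corpus, the call would presumably be `level_word_stats(corpus.articles)`, provided `corpus.articles` matches the layout assumed here.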


In [4]:
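@interact
def show_articles(i=(0, 188, 1), words=(0, 1000, 1)):
    # Articles are indexed level by level, 189 per level:
    # elementary at [0, 189), intermediate at [189, 378), advanced at [378, 567).
    n = 189
    data = [
        ["Advanced", corpus.articles[i + 2 * n][:words]],
        ["Intermediate", corpus.articles[i + n][:words]],
        ["Elementary", corpus.articles[i][:words]],
    ]
    headers = ['Reading Level', 'Example']
    # Show the three versions of article i side by side as an HTML table with a small font.
    display(HTML(tabulate(data, tablefmt='html', headers=headers) + "<style>th,td {font-size: 10px}</style>"))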


In [3]:
x = load_glove_embeddings()
print(x['the'])

['0.418' '0.24968' '-0.41242' '0.1217' '0.34527' '-0.044457' '-0.49688'
 '-0.17862' '-0.00066023' '-0.6566' '0.27843' '-0.14767' '-0.55677'
 '0.14658' '-0.0095095' '0.011658' '0.10204' '-0.12792' '-0.8443'
 '-0.12181' '-0.016801' '-0.33279' '-0.1552' '-0.23131' '-0.19181'
 '-1.8823' '-0.76746' '0.099051' '-0.42125' '-0.19526' '4.0071' '-0.18594'
 '-0.52287' '-0.31681' '0.00059213' '0.0074449' '0.17778' '-0.15897'
 '0.012041' '-0.054223' '-0.29871' '-0.15749' '-0.34758' '-0.045637'
 '-0.44251' '0.18785' '0.0027849' '-0.18411' '-0.11514' '-0.78581']
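
The quotes around each value in the printed vector suggest that the components are held as strings rather than numbers. If so, they would need to be converted to floats before being fed to a network. Below is a minimal sketch of a loader that performs this conversion, assuming the usual GloVe text layout (one token per line, followed by its space-separated components). `parse_glove` is a hypothetical helper written for illustration; it is not the notebook's `load_glove_embeddings`, whose implementation is not shown here.

```python
import io


def parse_glove(lines):
    """Parse GloVe-style text lines into {token: list of floats}.

    Each line is assumed to look like: "token v1 v2 ... vN".
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Convert every component from text to a Python float.
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# Toy usage on an in-memory file with three-dimensional vectors.
sample = io.StringIO("the 0.418 0.24968 -0.41242\ncat 0.1 0.2 0.3\n")
vectors = parse_glove(sample)
print(vectors["the"])
```

The same function accepts an open file handle in place of the `io.StringIO` object. For training, each list would typically be wrapped in a NumPy array of a floating-point dtype.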
