**NOTE: Before running this notebook, be sure to place your copy of the play (`RomeoAndJuliet.txt`) in the same folder as the notebook.**

# 12.4 Readability Assessment with Textatistic
* Text **readability** is affected by 
    * vocabulary used
    * sentence structure
    * sentence length
    * topic 
    * and more. 
* [**Grammarly**](https://www.grammarly.com) uses readability tools like the one shown here to tune writing
* Textatistic uses several popular readability formulas:
    * **Flesch Reading Ease**
    * **Flesch-Kincaid**
    * **Gunning Fog**
    * **Simple Measure of Gobbledygook (SMOG)** 
    * **Dale-Chall**
    

### Install Textatistic
```shell
pip install textatistic
```

### Calculating Statistics and Readability Scores

In [1]:
from pathlib import Path

In [2]:
text = Path('RomeoAndJuliet.txt').read_text()
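
Because the notebook depends on the play being in the notebook's folder, a small guard can report a missing file more clearly than a raw traceback. The following is only a sketch: `load_play` is a hypothetical helper that is not part of the original notebook, and UTF-8 is an assumed encoding for your copy of the file.

```python
from pathlib import Path

def load_play(filename='RomeoAndJuliet.txt', encoding='utf-8'):
    """Hypothetical helper: return the file's text or fail with a clear message."""
    path = Path(filename)
    if not path.is_file():
        raise FileNotFoundError(
            f'{filename} not found; place it in {Path.cwd()} before running')
    # The encoding is an assumption; adjust it if your copy differs.
    return path.read_text(encoding=encoding)
```

With this helper, `text = load_play()` could stand in for the `read_text` call above.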

In [3]:
from textatistic import Textatistic

In [4]:
readability = Textatistic(text)

* `Textatistic` method **`dict`** returns a dictionary containing various statistics and the readability scores: 

In [5]:
%precision 3

'%.3f'

In [6]:
readability.dict()

{'char_count': 115141,
 'word_count': 26120,
 'sent_count': 3218,
 'sybl_count': 30678,
 'notdalechall_count': 5823,
 'polysyblword_count': 693,
 'flesch_score': 99.234,
 'fleschkincaid_score': 1.435,
 'gunningfog_score': 4.308,
 'smog_score': 5.780,
 'dalechall_score': 7.559}

### Calculating Statistics and Readability Scores (cont.)
* Each value in the dictionary is also accessible via a `Textatistic` property with the same name as its key in the preceding output (for example, `readability.word_count`). The statistics produced include:
* `char_count`—number of characters 
* `word_count`—number of words 
* `sent_count`—number of sentences 
* `sybl_count`—number of syllables 
* `notdalechall_count`—# of words not on the Dale-Chall list (words understood by 80% of 5th graders) 
    * Higher is less readable
* `polysyblword_count`—# of words with 3+ syllables
* `flesch_score`—Flesch Reading Ease score
    * Scores of 90 or higher are considered readable by 5th graders
    * Scores below 30 are considered to require a college degree
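
To see how the counts feed into `flesch_score`, the sketch below recomputes it from the values in the `dict` output. It assumes the commonly published Flesch Reading Ease formula; the constants are not taken from Textatistic's source code, and `describe_flesch` is a hypothetical helper that covers only the two thresholds listed above.

```python
# Counts copied from the readability.dict() output shown earlier.
word_count = 26120
sent_count = 3218
sybl_count = 30678

def flesch_reading_ease(words, sentences, syllables):
    """Commonly published Flesch Reading Ease formula (an assumption here)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def describe_flesch(score):
    """Hypothetical helper covering only the two thresholds listed above."""
    if score >= 90:
        return 'readable by 5th graders'
    if score < 30:
        return 'requires a college degree'
    return 'between the two thresholds listed above'

score = flesch_reading_ease(word_count, sent_count, sybl_count)
print(f'{score:.3f}: {describe_flesch(score)}')
```

The result should land very close to the `flesch_score` reported above. Shorter sentences and fewer syllables per word both push the score higher.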

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Calculating Statistics and Readability Scores (cont.)
* `fleschkincaid_score`—Flesch-Kincaid score corresponds to a **specific grade level**
* `gunningfog_score`—Gunning Fog index value corresponds to a **specific grade level**
* `smog_score`—[Simple Measure of Gobbledygook (SMOG)](https://en.wikipedia.org/wiki/SMOG)
    * Corresponds to **years of education** required to understand text 
    * Considered particularly effective for **healthcare materials**
* `dalechall_score`—Dale-Chall score
    * Maps to **grade levels** from 4 and below to college graduate (grade 16) and above
    * Considered **most reliable** for a **broad range of text types** 
    * [Dale-Chall on Wikipedia](https://en.wikipedia.org/wiki/Readability#The_Dale%E2%80%93Chall_formula)
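
The four grade-level scores can be recomputed from the same counts. This sketch assumes the commonly published versions of each formula; the constants are not taken from Textatistic's source code, and `grade_level_scores` is a hypothetical helper. The results should come out very close to the scores shown earlier, which suggests (but does not confirm) that Textatistic uses the same constants.

```python
from math import sqrt

# Counts copied from the readability.dict() output shown earlier.
counts = {'word_count': 26120, 'sent_count': 3218, 'sybl_count': 30678,
          'notdalechall_count': 5823, 'polysyblword_count': 693}

def grade_level_scores(c):
    """Hypothetical helper applying the commonly published formulas."""
    words_per_sent = c['word_count'] / c['sent_count']
    sybl_per_word = c['sybl_count'] / c['word_count']
    pct_polysybl = 100 * c['polysyblword_count'] / c['word_count']
    pct_difficult = 100 * c['notdalechall_count'] / c['word_count']

    # Dale-Chall adds 3.6365 when more than 5% of the words are off the list.
    dalechall = 0.1579 * pct_difficult + 0.0496 * words_per_sent
    if pct_difficult > 5:
        dalechall += 3.6365

    return {
        'fleschkincaid_score':
            0.39 * words_per_sent + 11.8 * sybl_per_word - 15.59,
        'gunningfog_score': 0.4 * (words_per_sent + pct_polysybl),
        'smog_score':
            1.043 * sqrt(30 * c['polysyblword_count'] / c['sent_count'])
            + 3.1291,
        'dalechall_score': dalechall,
    }

scores = grade_level_scores(counts)
for name, value in scores.items():
    print(f'{name}: {value:.3f}')
```

Flesch-Kincaid and Gunning Fog both reward short sentences, which helps explain why a play made up largely of brief lines of dialogue scores at such low grade levels despite its archaic vocabulary.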

<hr style="height:2px; border:none; color:#000; background-color:#000;">

### Calculating Statistics and Readability Scores (cont.)
* [More details on each of the readability scores produced here and several others](https://en.wikipedia.org/wiki/Readability)
* [The Textatistic documentation also shows the readability formulas used](http://www.erinhengel.com/software/textatistic/)