# Sentence Quality Scorer

This is the third notebook in the series. It computes three metrics for each clause:
* The grade level of the clause
* The reading ease of the clause
* The quality of the clause, i.e. how much it contributes to the sentence.

### Introduction to Grade Level and Reading Ease Scoring
Grade level and reading ease are computed using the Flesch-Kincaid readability tests described here: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
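
As a rough illustration of the two formulas on that page, the sketch below computes both scores from raw counts. The function names and the hand-counted example are assumptions for illustration only; the notebook itself relies on textacy's `TextStats`, which does its own word, sentence, and syllable counting.

```python
def flesch_reading_ease(n_words, n_sents, n_syllables):
    """Flesch reading ease: higher means easier to read."""
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syllables / n_words)


def flesch_kincaid_grade_level(n_words, n_sents, n_syllables):
    """Flesch-Kincaid grade level: higher means harder to read."""
    return 0.39 * (n_words / n_sents) + 11.8 * (n_syllables / n_words) - 15.59


# Hand-counted (assumed) counts for the clause "I don t clean up the messes":
# 7 words, 1 sentence, 8 syllables.
print(round(flesch_reading_ease(7, 1, 8), 2))         # 103.04
print(round(flesch_kincaid_grade_level(7, 1, 8), 2))  # 0.63
```

Both formulas depend only on average sentence length and average syllables per word, which is why very short clauses can yield reading-ease values above 100 and negative grade levels.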

In [1]:
#!pip install textacy
#!python3 -m spacy download en # tjm removed this may not run correct python
import textacy
import pandas as pd
from textacy.text_stats import TextStats
print("Loaded")

Loaded


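The input for this notebook is read from "abstraction_scored.csv". The list-valued columns are stored as strings in the CSV, so they are parsed back into Python lists after loading.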

In [2]:
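import ast

df = pd.read_csv("./abstraction_scored.csv")
# List-valued columns are serialized as strings in the CSV; parse them back into lists.
# ast.literal_eval only accepts Python literals, so it is safer than eval here.
df.clauses_text_final = df.clauses_text_final.apply(ast.literal_eval)
df.voice = df.voice.apply(ast.literal_eval)
df.abstraction_score = df.abstraction_score.apply(ast.literal_eval)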
df.sample(frac = 1).head(10)

Unnamed: 0,UID,survey_id,prompt_number,prompt_id,prompt,response,clauses_text_final,voice,score,PassAct,idx,abstraction_score,abstraction_score_normalized
174,01765.035.,1765,35,35,My conscience bothers me if,I don't clean up the messes I make as an imper...,"[I don t clean up the messes, I make as an imp...","[P_bevb_x, A_def, A_pron_x]",5.5,a,174,"[0.25, 0.25, 0.14]","[1.0, 1.0, 0.56]"
138,01953.016.,1953,16,16,I feel sorry,for those who are suffering because they don't...,"[for those, who are suffering, because they do...","[Undefined, P_bevb_x, P_bevb_x, A_pron_x]",4.5,a,138,"[0.14, 0.14, 0.25, 0.25]","[0.56, 0.56, 1.0, 1.0]"
200,01889.009.,1889,9,9,Education,"is invaluable, which is grounded in the wisdom...","[is invaluable, which is grounded in the wisdo...","[P_bevb_x, P_bevb_x, P_bevb_x, P_bevb_x]",6.5,a,200,"[0.14, 0.25, 0.25, 0.14]","[0.56, 1.0, 1.0, 0.56]"
93,02359.003.,2359,3,3,Change is,Reflecting and improving your practice.,"[Reflecting and, improving your practice]","[A_def, A_pron_x]",3.5,a,93,"[0.25, 0.25]","[1.0, 1.0]"
102,01754.009.,1754,9,9,Education,is a lifelong joy and is necessary for continu...,"[is a lifelong joy and, is necessary for conti...","[P_bevb_x, P_bevb_x]",4.0,p,102,"[0.14, 0.25]","[0.56, 1.0]"
122,03379.036.,3379,36,48,Sometimes I wish that,making the world better would be easier and ev...,"[making the world better, would be easier and ...","[A_def, P_bevb_x, A_pron_x]",4.5,a,122,"[0.14, 0.14, 0.22]","[0.56, 0.56, 0.88]"
84,02784.025.,2784,25,25,My main problem is,Staying In A Good mindset Long Term.,[Staying In A Good mindset],[A_def],3.5,a,84,[0.22],[0.88]
130,03450.027.,3450,27,45,People who step out of line,are doing so for a reason. I want to understan...,"[are doing so for a reason, I want to understa...","[P_bevb_x, A_def, A_def, P_bevb_x, A_def]",4.5,a,130,"[0.14, 0.22, 0.14, 0.25, 0.25]","[0.56, 0.88, 0.56, 1.0, 1.0]"
145,01904.032.,1904,32,32,If I can't get what I want,"i'm frustrated, and sometimes feel sorry for m...","[and, i m frustrated, sometimes feel sorry for...","[Undefined, Undefined, A_def, A_def, A_def, P_...",5.0,p,145,"[0.14, 0.12, 0.14, 0.14, 0.14, 0.14, 0.14, 0.2...","[0.56, 0.48, 0.56, 0.56, 0.56, 0.56, 0.56, 1.0..."
208,01889.029.,1889,29,29,If my mother,and father find their full divine embrace with...,[and father find their full divine embrace wit...,"[A_pron_x, P_bevb_x, A_pron_x]",6.5,a,208,"[0.25, 0.14, 0.14]","[1.0, 0.56, 0.56]"


The raw reading-ease and grade-level values are computed for each clause. Both are available via textacy's TextStats.

In [4]:
def score_readability(text):
    doc = textacy.Doc(text, lang = u'en_core_web_lg') # tjm changed lang = "en"
    ts = TextStats(doc)
    return ts.readability_stats

df['readability_attributes_score'] = df.clauses_text_final.apply(lambda arr: [score_readability(x) for x in arr])
df['grading_level'] = df.readability_attributes_score.apply(lambda dct_arr: [round(dct['flesch_kincaid_grade_level'], 2) for dct in dct_arr])
df['reading_ease'] = df.readability_attributes_score.apply(lambda dct_arr: [round(dct['flesch_reading_ease'], 2) for dct in dct_arr])
_ = """[{'flesch_kincaid_grade_level': 0.6257142857142846, 'flesch_reading_ease': 103.04428571428573, 'smog_index': 3.1291, 'gunning_fog_index': 2.8000000000000003, 'coleman_liau_index': 2.6518669999999993, 'automated_readability_index': 0.23714285714285666, 'lix': 7.0, 'gulpease_index': 93.28571428571428, 'wiener_sachtextformel': -2.5074571428571426}]"""
del df['readability_attributes_score']
df.sample(frac = 1).head(10)

Unnamed: 0,UID,survey_id,prompt_number,prompt_id,prompt,response,clauses_text_final,voice,score,PassAct,idx,abstraction_score,abstraction_score_normalized,grading_level,reading_ease
188,03151.017.,3151,17,17,When they avoided me,I was reminded of my school years. Loneliness ...,"[I was reminded of my school years, moved thro...","[A_pron_x, A_def, A_def, A_def, P_bevb_x]",6.0,p,188,"[0.25, 0.14, 0.25, 0.25, 0.25]","[1.0, 0.56, 1.0, 1.0, 1.0]","[2.31, 0.52, -2.23, 9.57, 6.91]","[90.96, 102.05, 118.18, 33.58, 57.07]"
172,02619.023.,2619,23,23,I am,"I am having trouble knowing what I am, this or...","[I am having trouble, what I am this or, that ...","[P_bevb_x, P_bevb_x, P_bevb_x, A_def]",5.5,a,172,"[0.22, 0.22, 0.22, 0.14]","[0.88, 0.88, 0.88, 0.56]","[3.67, -1.84, 3.93, 0.72]","[75.88, 117.16, 95.42, 97.03]"
76,03343.029.,3343,29,29,If my mother,was different I would also be different,"[was different, I would also be different]","[P_bevb_x, P_bevb_x]",3.0,p,76,"[0.14, 0.22]","[0.56, 0.88]","[8.79, 5.24]","[35.61, 66.4]"
82,02041.028.,2041,28,46,A partner has the right to,"To give their suggestions, ideas, to come up w...","[To give their suggestions ideas, to come up w...","[A_pron_x, A_pron_x]",3.5,a,82,"[0.14, 0.25]","[0.56, 1.0]","[2.88, 2.28]","[83.32, 92.97]"
97,02150.021.,2150,21,21,I just can't stand people who,constantly interrupt others to bring the conve...,"[all the time these days, constantly interrupt...","[Undefined, A_def, P_bevb_x]",3.5,a,97,"[0.14, 0.25, 0.14]","[0.56, 1.0, 0.56]","[-1.84, 9.55, 3.67]","[117.16, 44.41, 75.88]"
155,03410.009.,3410,9,9,Education,"comes in all forms. Formal, street smarts, rel...","[comes in all forms and more, Formal street sm...","[A_def, A_def]",5.0,p,155,"[0.25, 0.25]","[1.0, 1.0]","[-1.45, 10.74]","[116.15, 30.53]"
166,02619.021.,2619,21,21,I just can't stand people who,I just can't stand people who stand people on ...,"[I just can t stand people, who stand people o...","[P_bevb_x, A_def, P_bevb_x, P_bevb_x, A_def, P...",5.5,a,166,"[0.25, 0.22, 0.25, 0.22, 0.22, 0.25, 0.22]","[1.0, 0.88, 1.0, 0.88, 0.88, 1.0, 0.88]","[0.52, 0.52, 0.8, 0.72, -1.45, 3.76, -3.01]","[102.05, 102.05, 103.54, 97.03, 116.15, 82.39,..."
174,01765.035.,1765,35,35,My conscience bothers me if,I don't clean up the messes I make as an imper...,"[I don t clean up the messes, I make as an imp...","[P_bevb_x, A_def, A_pron_x]",5.5,a,174,"[0.25, 0.25, 0.14]","[1.0, 1.0, 0.56]","[0.63, 4.45, 6.01]","[103.04, 73.85, 69.78]"
219,01889.009.,1889,9,9,Education,"is invaluable, which is grounded in the wisdom...","[is invaluable, which is grounded in the wisdo...","[P_bevb_x, P_bevb_x, P_bevb_x, P_bevb_x]",6.5,a,219,"[0.14, 0.25, 0.25, 0.14]","[0.56, 1.0, 1.0, 0.56]","[8.79, 2.34, 2.48, 5.68]","[35.61, 94.3, 87.95, 66.79]"
185,03151.008.,3151,8,8,What gets me into trouble is,Falling from eternity and spaciousness into co...,[Falling from eternity and spaciousness into c...,"[A_def, A_def, P_bevb_x, A_def]",6.0,p,185,"[0.25, 0.14, 0.14, 0.14]","[1.0, 0.56, 0.56, 0.56]","[11.13, 8.79, 5.24, 12.32]","[29.52, 35.61, 66.4, 17.45]"


The raw metrics are min-max normalized to a 0-1 scale over all clauses in the dataset. Reading ease is treated as inversely related to the quality of the sentence, so the reading-ease values are negated before normalization: the clause with the highest reading ease maps to 0 and the clause with the lowest maps to 1.
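
A minimal sketch of this scaling on toy values, independent of the notebook's own `normalize` function. The helper name `min_max` and the sample numbers are assumptions for illustration, and the guard for a zero range is an addition that the notebook's code does not have.

```python
def min_max(values, reverse=False):
    """Scale values to [0, 1]; with reverse=True the largest input maps to 0."""
    # Negating first flips the ordering, so the largest raw value becomes the minimum.
    signed = [-v for v in values] if reverse else list(values)
    lo, hi = min(signed), max(signed)
    if hi == lo:
        # All values are identical: avoid dividing by zero.
        return [0.0 for _ in signed]
    return [round((v - lo) / (hi - lo), 2) for v in signed]


toy_reading_ease = [100.0, 50.0, 0.0]           # toy values, not from the dataset
print(min_max(toy_reading_ease, reverse=True))  # [0.0, 0.5, 1.0]
print(min_max([2.0, 4.0, 6.0]))                 # [0.0, 0.5, 1.0]
```

Because min-max scaling uses the global extremes, a single extreme clause can compress all other values into a narrow band near 0 or 1.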

In [5]:
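def normalize(row, x_max, x_min, reverse_arr = False):
    # Min-max scale each value in `row` to the 0-1 range, rounded to 2 decimals.
    # With reverse_arr = True each value is negated first, so x_max and x_min
    # must be the max and min of the negated values.
    sign = -1 if reverse_arr else 1
    return [round((sign*x - x_min)/(x_max - x_min), 2) for x in row]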

reading_ease = df['reading_ease'].tolist()
reading_ease = [j for i in reading_ease for j in i]
reading_ease = [-1*x for x in reading_ease]
x_max, x_min = max(reading_ease), min(reading_ease)
df['reading_ease_normalized'] = df['reading_ease'].apply(lambda arr : normalize(arr, x_max, x_min, reverse_arr = True))

grading_levels = df['grading_level'].tolist()
grading_levels = [j for i in grading_levels for j in i]
x_max, x_min = max(grading_levels), min(grading_levels)
df['grading_level_normalized'] = df['grading_level'].apply(lambda arr : normalize(arr, x_max, x_min, reverse_arr = False))
df.sample(frac = 1).head(10)

Unnamed: 0,UID,survey_id,prompt_number,prompt_id,prompt,response,clauses_text_final,voice,score,PassAct,idx,abstraction_score,abstraction_score_normalized,grading_level,reading_ease,reading_ease_normalized,grading_level_normalized
132,02686.011.,2686,11,39,What I like to do best is,find myself deeply engaged in acts of expression.,"[find, myself deeply engaged in acts of expres...","[A_def, A_def]",4.5,a,132,"[0.14, 0.25]","[0.56, 1.0]","[-3.4, 5.68]","[121.22, 66.79]","[0.0, 0.02]","[0.0, 0.02]"
169,01805.021.,1805,21,21,I just can't stand people who,I perceive this is a creation of a human mind ...,"[I perceive this, is a creation of a human min...","[A_def, P_bevb_x, A_def, A_pron_x, A_def, A_de...",5.5,a,169,"[0.25, 0.25, 0.25, 0.25, 0.25, 0.14, 0.14, 0.14]","[1.0, 1.0, 1.0, 1.0, 1.0, 0.56, 0.56, 0.56]","[1.31, 2.31, 12.52, 12.52, 0.52, -2.62, 1.31, ...","[90.99, 90.96, 12.43, 12.43, 102.05, 119.19, 9...","[0.01, 0.01, 0.04, 0.04, 0.01, 0.0, 0.01, 0.01]","[0.01, 0.02, 0.04, 0.04, 0.01, 0.0, 0.01, 0.01]"
142,03434.014.,3434,14,14,The past,is a mesmerizing illusion.,[is a mesmerizing illusion],[P_bevb_x],5.0,p,142,[0.14],[0.56],[12.52],[12.43],[0.04],[0.04]
108,02883.015.,2883,15,41,Privacy,"is respecting the benefits of having quiet, re...","[is respecting the benefits of, having quiet r...","[P_bevb_x, P_bevb_x]",4.0,p,108,"[0.25, 0.25]","[1.0, 1.0]","[7.6, 13.09]","[49.48, 19.03]","[0.03, 0.04]","[0.03, 0.05]"
61,02333.027.,2333,27,45,People who step out of line,should be dealt with behind closed doors.,[should be dealt with behind closed doors],[P_bevb_x],3.0,p,61,[0.25],[1.0],[0.63],[103.04],[0.01],[0.01]
25,02505.023.,2505,23,23,I am,sorry for stepping on your toe,"[sorry for, stepping on your toe]","[Undefined, A_pron_x]",2.0,p,25,"[0.12, 0.14]","[0.48, 0.56]","[2.89, 0.72]","[77.91, 97.03]","[0.02, 0.01]","[0.02, 0.01]"
1,02544.024.,2544,24,24,If I had more money,I'll buy a mansion,[I ll buy a mansion],[A_def],1.5,a,1,[0.22],[0.88],[0.52],[100.24],[0.01],[0.01]
187,01889.021.,1889,21,21,I just can't stand people who,..searching for how to respond...so sad for ab...,[children anyone so distressed by violence war...,"[Undefined, A_def, A_def, P_bevb_x, P_bevb_x]",6.0,p,187,"[0.25, 0.25, 0.14, 0.25, 0.14]","[1.0, 1.0, 0.56, 1.0, 0.56]","[7.59, 2.89, 3.65, 9.18, -2.62]","[56.7, 77.91, 84.9, 34.59, 119.19]","[0.02, 0.02, 0.01, 0.03, 0.0]","[0.03, 0.02, 0.02, 0.03, 0.0]"
101,02431.005.,2431,5,85,Loving other people,is what makes life real and meaningful. Commun...,"[is, what makes, life real and meaningful Comm...","[P_bevb_x, A_def, P_bevb_x, P_bevb_x]",4.0,p,101,"[0, 0.14, 0.25, 0.25]","[0.0, 0.56, 1.0, 1.0]","[-3.4, -3.01, 8.18, 12.52]","[121.22, 120.21, 50.67, 12.43]","[0.0, 0.0, 0.03, 0.04]","[0.0, 0.0, 0.03, 0.04]"
173,01804.007.,1804,7,38,My co-workers and I,move in a way that allows humanity-the univers...,"[in a way, that allows, humanity the universe ...","[Undefined, A_def, P_bevb_x]",5.5,a,173,"[0.14, 0.14, 0.25]","[0.56, 0.56, 1.0]","[-2.62, 2.89, 8.37]","[119.19, 77.91, 52.87]","[0.0, 0.02, 0.03]","[0.0, 0.02, 0.03]"


In [6]:
# Sanity check: both normalized metrics should span exactly 0.0 to 1.0
reading_ease = df['reading_ease_normalized'].tolist()
reading_ease = [j for i in reading_ease for j in i]
grade = df['grading_level_normalized'].tolist()
grade = [j for i in grade for j in i]
print(max(reading_ease), min(reading_ease), max(grade), min(grade))
df[['UID', 'survey_id', 'prompt_number', 'prompt_id', "prompt", "response", "clauses_text_final", "voice", "idx", "abstraction_score_normalized", "reading_ease_normalized", "grading_level_normalized"]].to_csv("readability_scored.csv", index = False)

1.0 0.0 1.0 0.0


### Introduction to Computing the clause's overall quality
This part estimates how much each clause contributes to the overall intent of the sentence. To do this, we extract key terms from the original sentence (the prompt plus the response) using an unsupervised keyword extraction technique, SGRank (described here: http://www.aclweb.org/anthology/S15-1013). Key terms are n-grams because an n-gram usually carries more meaning than an individual token. Each clause is credited with the SGRank score of every key term it contains, and the quality metric for the clause is the mean of those scores: sum(SGRank scores of matching terms) / number of matching terms. A clause that contains no key term gets a score of 0.

This output is stored in "keyterm_scored.csv", which will be used to evaluate the final scores and voices.
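
The per-clause average can be sketched on its own, without textacy. The key terms and scores below are hypothetical stand-ins for SGRank output, and `clause_importance` is an illustrative name rather than a function from this notebook.

```python
def clause_importance(clauses, keyterm_scores):
    """Mean score of the key terms found in each clause (0.0 if none match)."""
    scores = []
    for clause in clauses:
        # Plain substring match: a term counts if it appears anywhere in the clause.
        matched = [score for term, score in keyterm_scores if term in clause]
        scores.append(round(sum(matched) / len(matched), 2) if matched else 0.0)
    return scores


# Hypothetical (term, score) pairs standing in for SGRank output.
keyterm_scores = [("lifelong joy", 0.4), ("joy", 0.2), ("growth", 0.3)]
clauses = ["is a lifelong joy and", "is necessary for continued growth", "indeed"]
print(clause_importance(clauses, keyterm_scores))  # [0.3, 0.3, 0.0]
```

Note that substring matching is case-sensitive and also matches inside longer words (for example, "joy" would match "enjoyed"), so short key terms may be credited to clauses that do not really contain them.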

In [8]:
from textacy.keyterms import sgrank
df['nlp_doc'] = df.apply(lambda row : textacy.Doc(row['prompt'] + " " + row['response'], lang = u'en_core_web_lg'), axis = 1) # . tjm changed lang = 'en'
df['sgrank'] = df['nlp_doc'].apply(lambda doc : sgrank(doc, n_keyterms = len(doc)))

def get_normalized_importance(row):
    # `row` is one DataFrame row: its clauses plus the (term, score) pairs from SGRank.
    clauses = row["clauses_text_final"]
    rank_tuples = dict(row['sgrank'])
    op = []
    for clause in clauses:
        # Scores of every key term that occurs in this clause (substring match)
        matched = [score for term, score in rank_tuples.items() if term in clause]
        # Mean score of the matched terms; 0.0 when no key term occurs in the clause
        op.append(round(sum(matched) / len(matched), 2) if matched else 0.0)
    return op
    
df["sgrank_normalized"] = df.apply(get_normalized_importance, axis = 1)
df[['UID', 'survey_id', 'prompt_number', 'prompt_id', "prompt", "response", "clauses_text_final", "voice", 'score','PassAct', "idx", "abstraction_score_normalized", "reading_ease_normalized", "grading_level_normalized", "sgrank_normalized"]].to_csv("keyterm_scored.csv", index = False)
#tjm added above: 'score','PassAct'

print('done') # tjm

done
