In [1]:
import numpy as np 
import pandas as pd 
from textblob import TextBlob
from sklearn.metrics import mean_squared_error

# Appendix F - Text Evaluation

-----------

### Performance Evaluation
After generating summaries with several different methods, the next crucial step, as in any machine learning task, is performance evaluation, i.e. producing a numerical score that indicates how well the model performs its given task. Several methods exist for evaluating text output, but there is no single 'standard' performance test.

A meaningful score is essential for any machine learning system, for reasons including:

* Comparing different models.
* Measuring the increase or decrease in performance when a change is made to the system.

--------------

### BLEU
One of the most commonly used tests is *BLEU* (Bilingual Evaluation Understudy), which was designed for machine translation and is widely used to evaluate sequence-to-sequence tasks. However, it performs poorly as an evaluator when applied to tasks it was never intended for. Among BLEU's drawbacks are:

* It does not consider meaning.
* It does not consider sentence structure.
* It does not map well to human judgements.
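
The first two drawbacks can be illustrated with a simplified sketch of clipped n-gram precision, the core ingredient of BLEU. This is not the full metric (real BLEU combines several n-gram orders and applies a brevity penalty), and the helper names and example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is counted at most as often as it appears in
    the reference ("clipping"), as in BLEU's modified precision.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical sentences, not taken from the project data.
reference = "the court dismissed the appeal"
paraphrase = "the judges rejected the challenge"  # similar meaning, different words
scrambled = "appeal the dismissed court the"      # same words, no structure

for label, candidate in [("paraphrase", paraphrase), ("scrambled", scrambled)]:
    p1 = clipped_precision(candidate, reference, 1)
    p2 = clipped_precision(candidate, reference, 2)
    print(f"{label}: unigram precision = {p1:.2f}, bigram precision = {p2:.2f}")
```

The paraphrase is penalised for using different words despite its similar meaning, while the scrambled sentence achieves a perfect unigram precision; only the higher-order n-grams capture any word order, and then only locally.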

### Sentiment Analysis
Given BLEU's drawbacks, we opted for sentiment analysis as a performance metric. Though not a widely used evaluation metric, it seems intuitive that a generated summary should be a *true* representation of the original text in terms of its essence, i.e. polarity and subjectivity. In other words, the gist of the original text should be preserved in the generated summary. Moreover, since this project involves several different models, a single metric that applies to all of them enables us to compare them.

Several libraries can perform sentiment analysis; for our task we opted for TextBlob's sentiment feature, which gives a score in the range [-1, 1] for polarity and [0, 1] for subjectivity. After generating scores for both the original text and the generated summary, we compute the *MSE* to measure how far the summary scores deviate from the original-text scores.
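
For reference, the MSE used here can be written as follows, where $s_i$ is the score (polarity or subjectivity) of original text $i$, $\hat{s}_i$ is the score of its summary, and $n$ is the number of text-summary pairs. The symbols are introduced here for illustration only.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(s_i - \hat{s}_i\right)^2$$

Because polarity spans [-1, 1], a single squared difference can be as large as 4, whereas for subjectivity (range [0, 1]) it can be at most 1, so the two MSE values are not on the same scale.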



--------------------

## Supervised Learning

In [2]:
## Read in Data 
sentiment_df = pd.read_parquet("../cases_final_nc_cleaned_v1.parquet.gzip", engine="fastparquet")

sentiment_df.drop(columns=['text', 'summary'], inplace = True)
sentiment_df.head()

id,cleaned_text,cleaned_summary
1268383,majority of the court therefore being of the ...,murder in an indictment for murder the length ...
11272108,walker the grievance alleged by the plaintiff...,1 carriers of goods placing of cars understand...
11273534,hoke on the hearing it appeared that bailey a...,1 deeds and conveyances descriptions—reference...
11272694,hoke the facts pertinent to the inquiry and s...,1 wills devises—contingent limitations sales—r...
11272573,walker after stating tbe facts as above tbe p...,1 constitutional law amendments roads and high...


In [3]:
sentiment_df = sentiment_df.reset_index(drop=True)
sentiment_df.head()

,cleaned_text,cleaned_summary
0,majority of the court therefore being of the ...,murder in an indictment for murder the length ...
1,walker the grievance alleged by the plaintiff...,1 carriers of goods placing of cars understand...
2,hoke on the hearing it appeared that bailey a...,1 deeds and conveyances descriptions—reference...
3,hoke the facts pertinent to the inquiry and s...,1 wills devises—contingent limitations sales—r...
4,walker after stating tbe facts as above tbe p...,1 constitutional law amendments roads and high...


### 1. Polarity & Subjectivity

In [4]:
%%time
## Add polarity (measures the positivity or negativity of text on -1.0-to-1.0 scale)
sentiment_df['polarity_text'] = sentiment_df['cleaned_text'].apply(lambda x: TextBlob(x).sentiment.polarity)
sentiment_df['polarity_summary'] = sentiment_df['cleaned_summary'].apply(lambda x: TextBlob(x).sentiment.polarity)

## Add Subjectivity (measures how objective or subjective a text is on a 0.0-to-1.0 scale (greater = more subjective))
sentiment_df['subjectivity_text'] = sentiment_df['cleaned_text'].apply(lambda x: TextBlob(x).sentiment.subjectivity)
sentiment_df['subjectivity_summary'] = sentiment_df['cleaned_summary'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

sentiment_df.head()

CPU times: user 23min 24s, sys: 132 ms, total: 23min 24s
Wall time: 23min 24s


,cleaned_text,cleaned_summary,polarity_text,polarity_summary,subjectivity_text,subjectivity_summary
0,majority of the court therefore being of the ...,murder in an indictment for murder the length ...,0.127273,0.0,0.270909,0.0
1,walker the grievance alleged by the plaintiff...,1 carriers of goods placing of cars understand...,0.100858,0.063244,0.455662,0.478571
2,hoke on the hearing it appeared that bailey a...,1 deeds and conveyances descriptions—reference...,0.067789,0.077041,0.353571,0.372109
3,hoke the facts pertinent to the inquiry and s...,1 wills devises—contingent limitations sales—r...,0.142308,0.075833,0.395147,0.318333
4,walker after stating tbe facts as above tbe p...,1 constitutional law amendments roads and high...,0.079642,0.184774,0.383253,0.399436
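
The cell above constructs a `TextBlob` object twice for every text, once for polarity and once for subjectivity. A minimal sketch of a single-pass alternative follows. `textblob_sentiment` and `score_texts` are hypothetical helper names, not part of the original notebook, and any speed-up is an expectation rather than a measured result.

```python
from typing import Callable, Iterable, List, Tuple


def textblob_sentiment(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) from TextBlob, analysing the text once."""
    # Imported lazily so score_texts can be used with another scorer
    # even where TextBlob is not installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def score_texts(
    texts: Iterable[str],
    scorer: Callable[[str], Tuple[float, float]] = textblob_sentiment,
) -> Tuple[List[float], List[float]]:
    """Score every text once and return parallel polarity/subjectivity lists."""
    polarities: List[float] = []
    subjectivities: List[float] = []
    for text in texts:
        polarity, subjectivity = scorer(text)
        polarities.append(polarity)
        subjectivities.append(subjectivity)
    return polarities, subjectivities


# Possible usage with the DataFrame from this notebook (not executed here):
# sentiment_df['polarity_text'], sentiment_df['subjectivity_text'] = \
#     score_texts(sentiment_df['cleaned_text'])
# sentiment_df['polarity_summary'], sentiment_df['subjectivity_summary'] = \
#     score_texts(sentiment_df['cleaned_summary'])
```

Passing the scorer as a parameter also makes it straightforward to swap in a different sentiment library later without changing the rest of the pipeline.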


### 2. MSE

In [5]:
## Calculate MSE
true_polarity = sentiment_df['polarity_text'].values
summary_polarity = sentiment_df['polarity_summary'].values

true_subjectivity = sentiment_df['subjectivity_text'].values
summary_subjectivity = sentiment_df['subjectivity_summary'].values

print("Polarity MSE: ", round(mean_squared_error(true_polarity, summary_polarity), 6))
print("\nSubjectivity MSE: ", round(mean_squared_error(true_subjectivity, summary_subjectivity), 6))

Polarity MSE:  0.015543

Subjectivity MSE:  0.023467
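
To make these numbers easier to interpret, one option is to take the square root (RMSE), which is in the same units as the scores, and relate it to the width of each score's range. The sketch below uses only the two MSE values printed above; `rmse_summary` is a hypothetical helper, and the range widths follow from the polarity and subjectivity scales described earlier.

```python
import math

# MSE values as reported in the output above.
reported_mse = {"polarity": 0.015543, "subjectivity": 0.023467}

# Width of each score's range: polarity in [-1, 1], subjectivity in [0, 1].
score_range = {"polarity": 2.0, "subjectivity": 1.0}


def rmse_summary(mse_value, width):
    """Return (RMSE, RMSE as a fraction of the score range)."""
    rmse = math.sqrt(mse_value)
    return rmse, rmse / width


for name, mse_value in reported_mse.items():
    rmse, fraction = rmse_summary(mse_value, score_range[name])
    print(f"{name}: RMSE = {rmse:.3f} ({fraction:.1%} of the score range)")
```

Read this way, a small MSE can still correspond to a noticeable typical deviation, particularly for subjectivity, whose range is only half as wide as that of polarity.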


## Unsupervised Learning

In [2]:
## Read in Data
unsupervisied_df = pd.read_parquet("../cases_summary_unsupervised.parquet.gzip", engine="fastparquet")
unsupervisied_df = unsupervisied_df[['opinion_text', 'opinion_text_summary']]

unsupervisied_df = unsupervisied_df.sample(frac=0.01, replace=True, random_state=1)
unsupervisied_df.head()

id,opinion_text,opinion_text_summary
11640036,\nOPINION OF THE COIÍRT. This is an action of ...,This is an action of debt brought by the appel...
11638634,\nOPINION OP THE COURT. This is an appeal from...,\nOPINION OP THE COURT. This is an appeal from...
11641817,"\nCROSS, Judge.\nThe record in this case shows...",The writ of error is prosecuted to reverse the...
243503,"\nW. H.“Dub” Arnold, Chief Justice.\nThis is a...","In the trial court, appellant filed a motion t..."
243557,"\nRay Thornton, Justice.\nAppellant brings thi...",Hammon took possession of the loaded gun that ...


### 1. Polarity & Subjectivity

In [None]:
%%time
## Add polarity (measures the positivity or negativity of text on -1.0-to-1.0 scale)
unsupervisied_df['polarity_text'] = unsupervisied_df['opinion_text'].apply(lambda x: TextBlob(x).sentiment.polarity)
unsupervisied_df['polarity_summary'] = unsupervisied_df['opinion_text_summary'].apply(lambda x: TextBlob(x).sentiment.polarity)

## Add Subjectivity (measures how objective or subjective a text is on a 0.0-to-1.0 scale (greater = more subjective))
unsupervisied_df['subjectivity_text'] = unsupervisied_df['opinion_text'].apply(lambda x: TextBlob(x).sentiment.subjectivity)
unsupervisied_df['subjectivity_summary'] = unsupervisied_df['opinion_text_summary'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

unsupervisied_df.head()

### 2. MSE

In [None]:
## Calculate MSE
true_polarity = unsupervisied_df['polarity_text'].values
summary_polarity = unsupervisied_df['polarity_summary'].values

true_subjectivity = unsupervisied_df['subjectivity_text'].values
summary_subjectivity = unsupervisied_df['subjectivity_summary'].values

print("Polarity MSE: ", round(mean_squared_error(true_polarity, summary_polarity), 6))
print("\nSubjectivity MSE: ", round(mean_squared_error(true_subjectivity, summary_subjectivity), 6))

--------------
## Conclusion:
Although the MSE scores are very encouraging, we should nonetheless be cautious about their significance and repeat the same test using different sentiment libraries.
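
One way to probe that significance, offered here as a suggestion rather than something done in this project, is to compare the MSE of the true text-summary pairs against a baseline in which summaries are randomly reassigned to texts. If the two values are close, the metric is not distinguishing matching summaries from unrelated ones. The function names and the toy scores below are hypothetical.

```python
import random


def mse(a, b):
    """Mean squared error between two equally long score sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def shuffled_baseline_mse(text_scores, summary_scores, n_shuffles=1000, seed=0):
    """Average MSE when summaries are randomly paired with texts."""
    rng = random.Random(seed)
    shuffled = list(summary_scores)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the text-summary correspondence
        total += mse(text_scores, shuffled)
    return total / n_shuffles


# Toy polarity scores for illustration only, not project results.
text_polarity = [-0.5, 0.0, 0.5, 0.2]
summary_polarity = [-0.5, 0.0, 0.5, 0.2]

matched = mse(text_polarity, summary_polarity)
baseline = shuffled_baseline_mse(text_polarity, summary_polarity)
print(f"matched MSE = {matched:.4f}, shuffled baseline MSE = {baseline:.4f}")
```

With the notebook's data, the same comparison could be run on the `polarity_text`/`polarity_summary` and `subjectivity_text`/`subjectivity_summary` columns.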