# GA Capstone
## Causal Model Evaluation

The goal here is to evaluate the causal (text-generation) model using the classification model. To do so, I generate text from each prompt, keep only the first sentence of each generated output, and run that sentence through the classification model. Because the causal model is supposed to generate Shakespearean text, the more outputs classified as Shakespearean, the better. The score is the percentage of outputs classified as Shakespearean out of all outputs.
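
As a minimal sketch of that score, assuming the classifier yields one 0/1 label per sentence (1 meaning Shakespearean), the calculation could look like the following. The function name `shakespearean_score` is hypothetical and not part of the project's utilities.

```python
def shakespearean_score(labels):
    """Return the percentage of labels equal to 1 (assumed: 1 = Shakespearean)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return 100.0 * sum(labels) / len(labels)


# Three of four sentences labeled Shakespearean
print(shakespearean_score([1, 1, 0, 1]))  # 75.0
```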

### Imports and Preliminaries

In [20]:
# tokenizer
from transformers import AutoTokenizer

# models
from transformers import TFAutoModelForCausalLM, TFAutoModelForSequenceClassification

# custom utilities
from utilities.utilities import load_config, get_model_path, load_model, load_tokenizer
from utilities.utilities import load_text_from_config
from utilities.utilities import generate_from
from utilities.utilities import classify_from
from utilities.utilities import extract_sentences

# pandas for csv read and extract
import pandas as pd

# other
import random
import os

In [2]:
# model config file
CONFIG_FILE = 'config.json'
cfgvars = load_config(CONFIG_FILE)

### Load Models and Model Support

In [3]:
# get model locations, load models, and load tokenizers
causal_model_path = get_model_path(CONFIG_FILE, 'causal')
class_model_path = get_model_path(CONFIG_FILE, 'class')

causal_model = load_model(causal_model_path, 'causal')
class_model = load_model(class_model_path, 'class')

causal_tokenizer = load_tokenizer(cfgvars['CAUSAL_MODEL'])
class_tokenizer = load_tokenizer(cfgvars['CLASS_MODEL'])


### Load and Prep Test Data

In [27]:
# get some test data to fuel generator
test_data = dict()

wines_test_data_path = os.path.join(cfgvars['DATA_DIR'], 'winemag-data-130k-v2.csv')
wines_test_data = pd.read_csv(wines_test_data_path)
wines_test_data = ' '.join(list(wines_test_data['description']))
wines_test_data = extract_sentences(wines_test_data)
test_data['wines'] = wines_test_data

s, o = load_text_from_config(cfgvars)
test_data['shakespeare'] = extract_sentences(s)
test_data['other'] = extract_sentences(o)

for v in test_data.values():
    print(v[:2])

['Aromas include tropical fruit, broom, brimstone and dried herb.', "The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity."]
['From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His tender heir might bear his memory:', 'But thou, contracted to thine own bright eyes, Feed’st thy light’s flame with self-substantial fuel, Making a famine where abundance lies, Thyself thy foe, to thy sweet self too cruel:']
['Lift up your hearts in Gumber, laugh the Weald And you my mother the Valley of Arun sing.', 'Here am I homeward from my wandering Here am I homeward and my heart is healed.']


In [36]:
# FORMAT TEST DATA

# fragment ratio for prompt selection
FRAG_RAT = 0.4

# sample size
SAMPLES = 10

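# function to extract a leading fragment from a sentence
def get_frag(text, rat=0.2):
    # rat falsy -> whole sentence; rat < 1 -> fraction of words; rat >= 1 -> word count
    words = text.split()
    nwords = len(words)
    if not rat:
        nout = nwords
    elif rat < 1:
        nout = int(nwords * rat) or 1  # always keep at least one word
    else:
        nout = min(int(rat), nwords)  # cast so a float count still works as a slice bound

    return ' '.join(words[:nout])

# draw random sentences and reduce each one to a prompt fragment
def get_samples(data, samples=10):
    sampled = random.sample(data, samples)
    return [get_frag(sentence, FRAG_RAT) for sentence in sampled]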

# create samples
samples = {k: get_samples(v, SAMPLES) for k, v in test_data.items()}

for v in samples.values():
    print(v[:2])

['There is solid acidity at the tip of the sip,', 'Shows flavors of grilled pineapples,']
['O', 'Go, my dread lord,']
['So jolly, that it can move, this soule is, The body so free', 'Peter Bells, one, two']
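
The fragment logic can be checked in isolation. The sketch below restates `get_frag` in a self-contained form (with an `int()` cast on the word-count branch, which is my assumption about the intended behavior) and runs it on a made-up sentence that is not taken from the test data.

```python
def get_frag(text, rat=0.2):
    """Return the leading words of text: a fraction if rat < 1, a count if rat >= 1."""
    words = text.split()
    if not rat:
        nout = len(words)
    elif rat < 1:
        nout = int(len(words) * rat) or 1  # never return an empty prompt
    else:
        nout = min(int(rat), len(words))
    return ' '.join(words[:nout])


sentence = 'Shall I compare thee to a summer day'  # illustrative 8-word sentence
print(get_frag(sentence, 0.4))  # 8 * 0.4 = 3.2, truncated to 3 words
print(get_frag(sentence, 3))    # explicit word count
print(get_frag('O', 0.4))       # very short input still yields one word
```

Because the fractional branch truncates and then falls back to one word, very short sentences produce one-word prompts, which is consistent with a prompt as short as `'O'` appearing in the samples above.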


### Generate Text from Test Data

In [38]:
# GENERATE TEXT

# maximum lines of output printed per category
MAX_OUTPUT_LINES = 10

# generate and store
generated = {k: list() for k in samples.keys()}
for k, v in samples.items():
    print(f'\n*--- {k} ---*')
    for count, line in enumerate(v, start=1):
        show = count <= MAX_OUTPUT_LINES
        if show: print(f'Input: {line}')
        gs = extract_sentences(generate_from(line, causal_model, causal_tokenizer))[0]
        if show: print(f'> Output: {gs}')
        generated[k].append(gs)

There is solid acidity at the tip of the sip,
There is solid acidity at the tip of the sip, A little overspotted with excess.
Shows flavors of grilled pineapples,
Shows flavors of grilled pineapples, With that bitter eye-tear they taste?
It is powered by its tight
It is powered by its tightness but most of its simplicity.
Bright cranberry and cherry fruit
Bright cranberry and cherry fruit, sweet white rose, Do not stain thee with this tree.
From Lynch-Bages in Pauillac, this
From Lynch-Bages in Pauillac, this lady’s man is here.
This is a round,
This is a round, very old, and full of dankness.
This wine is ripe and full,
This wine is ripe and full, for his father’s death I am sure it was my youth.
It's fun to try this wine
It's fun to try this wine out.
Citrus, apple and a hint
Citrus, apple and a hint of pride in your mouth!
The finish continues to
The finish continues to meet.
O
O, you hadst thou been my lord and brother a man?
Go, my dread lord,
Go, my dread lord, take thy husband p

{'wines': ['There is solid acidity at the tip of the sip, A little overspotted with excess.',
  'Shows flavors of grilled pineapples, With that bitter eye-tear they taste?',
  'It is powered by its tightness but most of its simplicity.',
  'Bright cranberry and cherry fruit, sweet white rose, Do not stain thee with this tree.',
  'From Lynch-Bages in Pauillac, this lady’s man is here.',
  'This is a round, very old, and full of dankness.',
  'This wine is ripe and full, for his father’s death I am sure it was my youth.',
  "It's fun to try this wine out.",
  'Citrus, apple and a hint of pride in your mouth!',
  'The finish continues to meet.'],
 'shakespeare': ['O, you hadst thou been my lord and brother a man?',
  'Go, my dread lord, take thy husband presently And tell him I am near here.',
  'No, truly, not for the world.',
  'He shall make thee know all in thine own good.',
  'Nay, this is not the hour But I shall find my master with mine.',
  'Once, if he do require mercy or break 

### Classify and Score Generated Text

In [43]:
# CLASSIFY
for k, v in generated.items():
    results = classify_from(v, class_model, class_tokenizer)
    shakespearean_ratio = sum(results.c) / len(results.c)
    score_mean = sum(results.s) / len(results.s)
    
    print(f'\n*--- {k} ---*')
    print(f'Shakespearean: {shakespearean_ratio * 100}%')
    print(f'Mean score: {score_mean}')


*--- wines ---*
Shakespearean: 70.0%
Mean score: 0.5816272295080125

*--- shakespeare ---*
Shakespearean: 90.0%
Mean score: 0.8752521067857743

*--- other ---*
Shakespearean: 60.0%
Mean score: 0.5666804890148341


### Conclusion

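Perhaps unsurprisingly, Shakespearean input generates the most Shakespearean output: 90% of the outputs from Shakespeare prompts were classified as Shakespearean, with a mean score of about 0.88, compared with 70% (mean score about 0.58) for the wine-review prompts and 60% (mean score about 0.57) for the "other" prompts. In every prompt category, more than half of the generated sentences were classified as Shakespearean. With only 10 samples per category, these percentages are rough estimates.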

Future tests could try different-sized sentence fragments, to see whether the causal model generates more Shakespearean text with more or fewer words in the input prompt.
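
One way such a test might be structured is sketched below. All names here (`first_words`, `sweep_fragment_ratios`, and the `generate` and `classify` callables) are hypothetical. In this notebook, `generate` would presumably wrap `generate_from` plus first-sentence extraction, and `classify` would wrap `classify_from` and return its 0/1 labels, but the stand-ins used in the example are toys so that the sketch runs without any model.

```python
import random


def first_words(sentence, ratio):
    """Keep the leading fraction of words, with a minimum of one word."""
    words = sentence.split()
    return ' '.join(words[:max(1, int(len(words) * ratio))])


def sweep_fragment_ratios(sentences, ratios, generate, classify, n_samples=10, seed=0):
    """Return {ratio: percent of outputs labeled 1} using the same sampled sentences for every ratio."""
    rng = random.Random(seed)
    sampled = rng.sample(sentences, min(n_samples, len(sentences)))
    results = {}
    for ratio in ratios:
        prompts = [first_words(s, ratio) for s in sampled]
        labels = classify([generate(p) for p in prompts])
        results[ratio] = 100.0 * sum(labels) / len(labels)
    return results


# Toy stand-ins (assumptions, not the real models)
def toy_generate(prompt):
    return prompt + ' thee'


def toy_classify(texts):
    return [1 if 'thee' in t else 0 for t in texts]


demo = ['Go, my dread lord, and take thy leave', 'This wine is ripe and full of fruit']
print(sweep_fragment_ratios(demo, [0.25, 0.5, 0.75], toy_generate, toy_classify, n_samples=2))
```

Sampling the sentences once and reusing them for every ratio keeps the comparison between ratios from being confounded by a different random draw each time.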