# Exp013: Create prompts for grammar-controlled text generation tasks

In [1]:
import random
import os
import pandas as pd
import sys
sys.path.append('../source')
import models
import data
import api
import importlib
importlib.reload(data)

<module 'data' from '/cluster/home/dglandorf/grammarctg/experiments/../source/data.py'>

In [16]:
# helper functions
levels = ["A1", "A2", "B1", "B2", "C1", "C2"] 
level_idx = {level: i for i, level in enumerate(levels)}

def skills(rules, statement=True):
    # One label per rule; the can-do statement is appended only when statement=True
    if statement: return "- " + rules['SubCategory'] + " - " + rules['guideword'] + ": " + rules['Can-do statement']
    return rules['SubCategory'] + " " + rules['guideword']
    
def get_prompt(rules, snippet, negative_rules=None, dialog=True, all=False, level=None, statement=True):
    sep = os.linesep if statement else "; "
    if level:
        constraints = f"grammar skills on CEFR level {level}."
    else: 
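        # Compare against None: the truth value of a DataFrame is ambiguous and raises a ValueError
        exclude = f"""Do NOT apply the following grammar skills:
{sep.join(skills(negative_rules, statement=statement))}
""" if negative_rules is not None else ""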
        constraints = f"""{"all" if all else "at least one"} of these grammar skills:
{sep.join(skills(rules, statement=statement))}
{exclude}"""
    return f"""Continue the {"dialog with one turn" if dialog else "text with two sentences"}, proving knowledge of {constraints}
Snippet:
{snippet}"""
    
def classify_sents(classifiers, rules, sentences, verbose=True):
    results = []
    for idx, rule in rules.iterrows():
        scores = models.probe_model(classifiers[rule['#']], sentences)
        hits = [sentences[i] for i, score in enumerate(scores[0]) if score > 0.5]
        # Print the matched sentences for this rule
        if verbose and hits: print(f'\n{rule["Level"]}-{rule["Can-do statement"]} ({rule["#"]})\n{os.linesep.join(["- "+h for h in hits])}')
        results.append(len(hits)/len(sentences))
    return results

def sample_adjacent(lst, n=3):
    if len(lst) < n: raise ValueError(f"List must contain at least {n} items")
    index = random.randint(0, len(lst) - n)
    return lst[index:index+n]
    
def sample_dialog_snippet(dialogs):
    dialog = random.sample(dialogs, 1)[0]
    utterances = sample_adjacent(dialog)
    return os.linesep.join([("A" if (i%2==0) else "B") + ": " + utt for i, utt in enumerate(utterances + [""])])

def prompt_score(rules, snippet, classifiers, dialog, all, level=None, statement=True, verbose=False):
    prompt = get_prompt(rules, snippet, dialog=dialog, all=all, level=level, statement=statement)
    if verbose: print(prompt)
    response = api.get_openai_chat_completion([{ "role": "user", "content": prompt}])[0] #, model="gpt-4-0125-preview"
    if verbose: print("RESPONSE from GPT3.5")
    if verbose: print(response)
    return classify_sents(classifiers, rules, data.sent_tokenize(response), verbose)
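
To make the snippet format concrete, the following is a minimal, self-contained sketch of the two sampling helpers applied to a toy dialog. The name `format_snippet` and the toy utterances are invented for illustration and are not part of `data` or DialogSum; the sketch joins lines with `"\n"` instead of `os.linesep` for simplicity.

```python
import random

def sample_adjacent(items, n=3):
    """Return n consecutive items starting at a random position."""
    if len(items) < n:
        raise ValueError(f"List must contain at least {n} items")
    start = random.randint(0, len(items) - n)
    return items[start:start + n]

def format_snippet(utterances):
    """Label turns A, B, A, ... and leave the final turn open for the model."""
    turns = utterances + [""]
    return "\n".join(("A" if i % 2 == 0 else "B") + ": " + utt for i, utt in enumerate(turns))

# Toy dialog, invented for illustration only.
toy_dialog = ["Hi, is this seat taken?", "No, go ahead.", "Thanks. Long day?", "Very.", "Same here."]

random.seed(0)
window = sample_adjacent(toy_dialog)
snippet = format_snippet(window)
print(snippet)
```

Speaker labels are assigned relative to the sampled window, so the first sampled utterance is always labeled `A` regardless of who spoke it in the full dialog, and the open turn with the default `n=3` is always `B`.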


Load the English Grammar Profile (EGP) rules, story beginnings from the CEFR texts, the DialogSum dialogs, and the trained rule classifiers

In [3]:
egp = data.get_egp()
cefr = data.CEFRTexts()
stories = list(cefr.get_beginnings(100))
ds = data.DialogSum()
dialogs = ds.get_dialogues()
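# Each checkpoint file is named "<rule number>.pth"
classifier_dir = "../models/corpus_training"
classifier_ids = [int(name.replace(".pth", "")) for name in os.listdir(classifier_dir)]
classifiers = {nr: models.load_classifier(nr, 'corpus_training') for nr in classifier_ids}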

## Task 1: Single constraints

In [92]:
subcategories = ["would", "negation", "superlatives"]
level = "A1"
n_per_subcat=1
rules = egp[(egp['SubCategory'].isin(subcategories)) & (egp['Level']==level) & egp['#'].isin(classifiers.keys())].groupby("SubCategory").sample(n_per_subcat)

For Text

In [93]:
snippet = random.sample(stories, 1)[0]
prompt_score(rules, snippet, classifiers, dialog=False, all=True, verbose=True)

Continue the text with two sentences, proving knowledge of all of these grammar skills:
- negation - FORM: MODAL VERB 'CAN': Can use negative form 'can't'.
- would - USE: WISHES AND PREFERENCES WITH 'LIKE': Can use 'would like' to talk about wishes and preferences.

Snippet:

The first chart illustrates the percentage of the population who owned a smartphone from 2011 to 2016, and the second breaks the percentages down by age for 2011 and 2016.
RESPONSE from GPT3.5
However, it is important to note that the data for 2016 is not entirely accurate, as some respondents refused to answer the survey. Therefore, we can't fully rely on the statistics presented in the second chart.

A1-Can use negative form 'can't'. (1177)
- Therefore, we can't fully rely on the statistics presented in the second chart.


[0.5, 0.0]
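
The returned list holds one value per rule, in the row order of `rules`: the share of response sentences whose classifier score exceeds 0.5. Here one of the two sentences matched the `can't` rule and none matched `would like`. A minimal sketch of that computation, where `hit_rate` is a hypothetical helper and the per-sentence scores are made up rather than real classifier output:

```python
def hit_rate(scores, threshold=0.5):
    """Fraction of sentences whose classifier score exceeds the threshold."""
    hits = [s for s in scores if s > threshold]
    return len(hits) / len(scores)

# Hypothetical per-sentence scores for a two-sentence response (not real classifier output).
scores_cant = [0.03, 0.97]        # only the second sentence uses "can't"
scores_would_like = [0.02, 0.04]  # neither sentence uses "would like"

result = [hit_rate(scores_cant), hit_rate(scores_would_like)]
print(result)  # [0.5, 0.0]
```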

For Dialog

In [94]:
snippet = sample_dialog_snippet(dialogs)
prompt_score(rules, snippet, classifiers, dialog=True, all=True, verbose=True)

Continue the dialog with one turn, proving knowledge of all of these grammar skills:
- negation - FORM: MODAL VERB 'CAN': Can use negative form 'can't'.
- would - USE: WISHES AND PREFERENCES WITH 'LIKE': Can use 'would like' to talk about wishes and preferences.

Snippet:
A: Can I take the subway to get there?
B: Yes, but that will probably take about half an hour. You should just take a taix.
A: Won't that be expensive?
B: 
RESPONSE from GPT3.5
Not necessarily, you can always share a ride with a friend or take a cheaper taxi service. Would you like me to look up some options for you?

A1-Can use 'would like' to talk about wishes and preferences. (618)
- Would you like me to look up some options for you?


[0.0, 0.5]
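
Both prompts asked for all listed skills, yet each response exercised only one of the two. One possible way to turn a score list into a pass/fail check is sketched below; `constraint_satisfied` is a hypothetical helper, and treating any score above zero as "skill shown" is an assumption, not something the notebook defines.

```python
def constraint_satisfied(scores, require_all):
    """True if the response shows every skill (require_all) or at least one skill."""
    shown = [score > 0 for score in scores]
    return all(shown) if require_all else any(shown)

text_scores = [0.5, 0.0]    # scores from the text continuation
dialog_scores = [0.0, 0.5]  # scores from the dialog continuation

for name, scores in [("text", text_scores), ("dialog", dialog_scores)]:
    print(name,
          constraint_satisfied(scores, require_all=True),
          constraint_satisfied(scores, require_all=False))
```

Under this reading, both continuations fail the "all" constraint but would pass an "at least one" constraint.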

## Task 2: Combine subcategories

Choose skills by pairing each subcategory with a level

In [26]:
subcategories = ["would", "negation", "superlatives"]
# Use a separate name so the global CEFR `levels` list from the helper cell is not overwritten
subcat_levels = ["B1", "B2", "B1"]
rules = pd.concat([egp[(egp['SubCategory'] == subcat) & (egp['Level']==level) & (egp['#'].isin(classifiers.keys()))] for subcat, level in zip(subcategories, subcat_levels)])

Assemble the prompt for a random story beginning, send it to GPT-3.5, and score the response with the classifiers

In [17]:
snippet = random.sample(stories, 1)[0]
prompt_score(rules, snippet, classifiers, dialog=False, all=False, verbose=True, statement=False)

Continue the text with two sentences, proving knowledge of at least one of these grammar skills:
would FORM/USE: AFTER 'IF' CLAUSES; would FORM: PAST AFFIRMATIVE; would FORM: PAST NEGATIVE; would FORM: QUESTIONS; would FORM: WITH ADVERBS; would USE: FUTURE IN THE PAST; would USE: IMAGINED SITUATIONS IN THE PAST; would USE: INDIRECTNESS; would USE: POLITE REQUESTS; would USE: REPORTED SPEECH; would USE: WILLINGNESS IN THE PAST; negation FORM/USE: 'NOT', EMPHASIS; negation FORM/USE: 'NEVER', INVERTED FRONT POSITION, FOCUS; negation FORM/USE: 'NEITHER ... NOR'

Snippet:
Nobody knows which came first: the economic crisis tearing Greece apart or shisha, the drug now known as the “cocaine of the poor”.
RESPONSE from GPT3.5
If the economic crisis had not brought such devastation to Greece, perhaps the popularity of shisha would not have reached such heights. Would you happen to know where one can find reliable statistics on the correlation between the two phenomena?

B1-Can use 'would' in the

[0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]

Repeat with a random dialog snippet

In [27]:
snippet = sample_dialog_snippet(dialogs)
prompt_score(rules, snippet, classifiers, dialog=True, all=False, verbose=True, statement=False)

Continue the text with two sentences, proving knowledge of at least one of these grammar skills:
would FORM/USE: AFTER 'IF' CLAUSES; would FORM: PAST AFFIRMATIVE; would FORM: PAST NEGATIVE; would FORM: QUESTIONS; would FORM: WITH ADVERBS; would USE: FUTURE IN THE PAST; would USE: IMAGINED SITUATIONS IN THE PAST; would USE: INDIRECTNESS; would USE: POLITE REQUESTS; would USE: REPORTED SPEECH; would USE: WILLINGNESS IN THE PAST; negation FORM/USE: 'NOT', EMPHASIS; negation FORM/USE: 'NEVER', INVERTED FRONT POSITION, FOCUS; negation FORM/USE: 'NEITHER ... NOR'

Snippet:
A: Sir, I am very glad to tell that we have successfully registered the trademark for our new product. It is the time to think of some effective promoting strategies. We are beginning to get more attention from overseas.
B: Well done, Fred. Do you know something useful for our promotion for our I-series?
A: OK, Let me see. I suppose we must strengthen our promotion, because our brand is still new to some consumers. Maybe w

[0.0,
 0.0,
 0.0,
 0.3333333333333333,
 0.0,
 0.0,
 0.0,
 0.0,
 0.3333333333333333,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]
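
A bare list of 14 numbers is hard to read. Since the helper functions build the prompt and the scores by iterating over the rows of `rules` in the same order, the scores can be paired with the skill labels from the prompt. The sketch below does this in plain Python with the labels and scores copied from the dialog run above; the one-to-one ordering is an assumption that holds only as long as both are produced from the same `rules` frame.

```python
# Skill labels copied from the prompt, in prompt order.
labels = [
    "would FORM/USE: AFTER 'IF' CLAUSES",
    "would FORM: PAST AFFIRMATIVE",
    "would FORM: PAST NEGATIVE",
    "would FORM: QUESTIONS",
    "would FORM: WITH ADVERBS",
    "would USE: FUTURE IN THE PAST",
    "would USE: IMAGINED SITUATIONS IN THE PAST",
    "would USE: INDIRECTNESS",
    "would USE: POLITE REQUESTS",
    "would USE: REPORTED SPEECH",
    "would USE: WILLINGNESS IN THE PAST",
    "negation FORM/USE: 'NOT', EMPHASIS",
    "negation FORM/USE: 'NEVER', INVERTED FRONT POSITION, FOCUS",
    "negation FORM/USE: 'NEITHER ... NOR'",
]

# Scores copied from the dialog run.
scores = [0.0, 0.0, 0.0, 0.3333333333333333, 0.0, 0.0, 0.0,
          0.0, 0.3333333333333333, 0.0, 0.0, 0.0, 0.0, 0.0]

# Keep only the skills that at least one response sentence matched.
matched = {label: score for label, score in zip(labels, scores) if score > 0}
for label, score in matched.items():
    print(f"{score:.2f}  {label}")
```

For this run, only the question form and the polite request use of `would` were detected, each in one of three response sentences.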

## Task 3: All skills from one difficulty level

In [24]:
level = "A2"
rules = egp[(egp['Level']==level) & (egp['#'].isin(classifiers.keys()))]

In [89]:
snippet = random.sample(stories, 1)[0]
prompt_score(rules, snippet, classifiers, dialog=False, all=False, verbose=True, level=level)

Continue the text with two sentences, proving knowledge of grammar skills on CEFR level A2.
Snippet:
Martha is at the grocery store, getting ready for a house party.
RESPONSE from GPT3.5
She buys chips, soda, and other snacks for her guests. Then, she heads to the checkout to pay for her items.


[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

In [25]:
snippet = sample_dialog_snippet(dialogs)
prompt_score(rules, snippet, classifiers, dialog=True, all=False, verbose=True, level=level)

Continue the dialog with one turn, proving knowledge of grammar skills on CEFR level A2.
Snippet:
A: Sure, company rules are not very strict at this point.
B: Thank you for telling me that.
A: But remind you, there are some forbidden activities.
B: 
RESPONSE from GPT3.5
I understand. I will make sure to follow the rules carefully.


[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]