# Exp013: Create prompts for generating grammar-controlled text

In [2]:
import random
import os
import sys
sys.path.append('../source')
import models
import data
import api
import importlib
#importlib.reload(data)


<module 'data' from '/cluster/home/dglandorf/grammarctg/experiments/../source/data.py'>

In [7]:
# helper functions
levels = ["A1", "A2", "B1", "B2", "C1", "C2"] 
level_idx = {level: i for i, level in enumerate(levels)}

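def skills(rules, levels):
    # Format each rule at the requested levels as "- <subcategory> - <guideword>: <can-do statement>".
    # The subcategory is read from the table itself instead of the global `subcategory` variable.
    level_rules = rules[rules['Level'].isin(levels)]
    return "- " + level_rules['SubCategory'] + " - " + level_rules['guideword'] + ": " + level_rules['Can-do statement']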
    
def get_prompt(rules, idx, snippet, negative=False):
    exclude = f"""Do NOT apply the following grammar skills:
{os.linesep.join(skills(rules, levels[idx+1:]))}""" if negative else ""
    return f"""Continue the snippet proving knowledge of at least one of these grammar skills:
{os.linesep.join(skills(rules, [levels[idx]]))}
{exclude}
Snippet:
{snippet}"""

def get_sentences(prompt, response):
    # The text after the last colon of the prompt is the unfinished sentence that the response continues.
    init_sent = prompt[prompt.rfind(":")+1:].strip()
    preamble = init_sent + " " if init_sent and not init_sent.endswith(".") else ""
    sents = data.sent_tokenize(response)  # tokenize once and reuse the result
    return [preamble + sents[0].strip()] + sents[1:]
    
def classify_sents(classifiers, rules, sentences, hide_below=0):
    # Assumes `classifiers` was built in the row order of `rules`, so zip pairs each rule with its own classifier.
    for level, skill, (nr, classifier) in zip(rules['Level'], rules['Can-do statement'], classifiers.items()):
        if level_idx[level] < hide_below: continue  # skip rules below the target level
        scores = models.probe_model(classifier, sentences)
        hits = [sentences[i] for i, score in enumerate(scores[0]) if score > 0.5]
        if len(hits): print(f'\n{level}-{skill} ({nr})\n{os.linesep.join(["- "+h for h in hits])}')

def sample_adjacent(lst, n=3):
    if len(lst) < n: raise ValueError(f"List must contain at least {n} items")
    index = random.randint(0, len(lst) - n)
    return lst[index:index+n]
    
def get_dialog_snippet(utterances):
    return os.linesep.join([("A" if (i%2==0) else "B") + ": " + utt for i, utt in enumerate(utterances + [""])])

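def prompt_score(snippet, level):
    # `level` is an index into `levels`; `rules` and `classifiers` are read from the notebook's globals.
    prompt = get_prompt(rules, level, snippet)
    #print(f"{snippet} (total words in prompt: {len(prompt.split(' '))})")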
    response = api.get_openai_chat_completion([{ "role": "user", "content": prompt}])[0]
    print(response)
    sentences = get_sentences(prompt, response)
    classify_sents(classifiers, rules, sentences, level)
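
The prompt assembly in `skills` and `get_prompt` can be tried without the EGP data or pandas. The following is a minimal sketch, assuming a list of dicts in place of the rules DataFrame. The names `TOY_RULES`, `format_skills` and `build_prompt` are hypothetical, and the B2 row is a made-up placeholder rather than an EGP entry.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Toy stand-in for the EGP table (assumption: list of dicts instead of a DataFrame).
TOY_RULES = [
    {"Level": "B1", "SubCategory": "would", "guideword": "FORM: QUESTIONS",
     "Can-do statement": "Can use question forms."},
    {"Level": "B1", "SubCategory": "would", "guideword": "USE: FUTURE IN THE PAST",
     "Can-do statement": "Can use 'would' to talk about the future in the past."},
    # Hypothetical placeholder row, not taken from the EGP.
    {"Level": "B2", "SubCategory": "would", "guideword": "PLACEHOLDER",
     "Can-do statement": "Placeholder skill above the target level."},
]

def format_skills(rules, wanted_levels):
    """Return one bullet line per rule whose level is in wanted_levels."""
    return [
        f"- {r['SubCategory']} - {r['guideword']}: {r['Can-do statement']}"
        for r in rules
        if r["Level"] in wanted_levels
    ]

def build_prompt(rules, idx, snippet, negative=False):
    """Assemble the prompt; the exclusion block is added only when negative=True."""
    parts = ["Continue the snippet proving knowledge of at least one of these grammar skills:"]
    parts += format_skills(rules, [LEVELS[idx]])
    if negative:
        # All levels above the target level are listed as skills to avoid.
        parts.append("Do NOT apply the following grammar skills:")
        parts += format_skills(rules, LEVELS[idx + 1:])
    parts += ["Snippet:", snippet]
    return "\n".join(parts)

prompt = build_prompt(TOY_RULES, 2, "A: Nice to meet you.\nB: ", negative=True)
print(prompt)
```

Unlike `get_prompt`, this sketch leaves no empty line in the prompt when `negative=False`, because the exclusion block is appended conditionally rather than interpolated as an empty string.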
    

In [3]:
egp = data.get_egp()

In [6]:
egp['SubCategory'].nunique()

86

Load EGP rules, topical prompts and dialogs

In [8]:
egp = data.get_egp()
cefr = data.CEFRTexts()
stories = list(cefr.get_beginnings(100))
ds = data.DialogSum()

Choose subcategory and load necessary classifiers

In [9]:
subcategory = "would"
rules = egp[(egp['SubCategory']==subcategory)]
classifiers = {nr: models.load_classifier(nr, 'corpus_training') for nr in rules['#']}

Assemble a prompt for a random story beginning, send it to GPT-3.5, and score the response with the classifiers

In [270]:
snippet = random.choice(stories)
print(snippet)
for level, idx in level_idx.items():    
    print(f"Target: {level}")
    prompt_score(snippet, idx)
    print("X" * 75)

The small space is set up to look like a classroom. Its corrugated iron walls are hung with educational
Target: A1
 posters and there are tiny desks and chairs scattered around. On one wall, a chart displays the alphabet in colorful letters. The teacher, Miss Jenkins, stands at the front of the room with a warm smile, welcoming the children. "Would you like to come sit on the carpet for story time?" she asks, inviting them to gather around. The children nod eagerly, expressing their wishes and preferences to hear the tale she has prepared for them.

A1-Can use 'would like' to talk about wishes and preferences. (618)
- "Would you like to come sit on the carpet for story time?"

A2-Can use the question form 'would you like'. (621)
- "Would you like to come sit on the carpet for story time?"
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Target: A2
 posters, and a row of tiny desks fill the center. The teacher, a young woman with a warm smile, stands at the fr
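
The hit selection in `classify_sents` keeps every sentence whose classifier score exceeds 0.5. As a rough sketch of that thresholding step alone, the trained classifier is replaced here by a hypothetical regex-based scorer (`toy_would_like_score`), which is an assumption for illustration and not how `models.probe_model` works.

```python
import re

def toy_would_like_score(sentence):
    """Hypothetical scorer: 1.0 if 'would' is later followed by 'like', else 0.0."""
    pattern = r"\bwould\b.*\blike\b"
    return 1.0 if re.search(pattern, sentence, re.IGNORECASE) else 0.0

def select_hits(sentences, score_fn, threshold=0.5):
    """Keep the sentences whose score is strictly above the threshold."""
    return [s for s in sentences if score_fn(s) > threshold]

sentences = [
    "The teacher stands at the front of the room.",
    '"Would you like to come sit on the carpet for story time?"',
]
hits = select_hits(sentences, toy_would_like_score)
for hit in hits:
    print("- " + hit)
```

A real classifier returns graded probabilities, so the 0.5 cutoff trades precision against recall. The binary toy scorer only shows where that cutoff is applied.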

Repeat the same procedure with dialog snippets

In [10]:
dialog = random.sample(ds.get_dialogues(), 1)[0]
utterances = sample_adjacent(dialog)
snippet = get_dialog_snippet(utterances)
for level, idx in level_idx.items():
    print(f"Target: {level}")
    prompt_score(snippet, idx)
    print("X"*75)

Target: A1
Nice to meet you too! Would you like to go grab some coffee and chat for a bit?

A1-Can use 'would like' to talk about wishes and preferences. (618)
- Would you like to go grab some coffee and chat for a bit?

A2-Can use the question form 'would you like'. (621)
- Would you like to go grab some coffee and chat for a bit?
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Target: A2
Nice to meet you too! Would you like to go get some food together?
A: That would be great! I would love to try some Canadian cuisine.
B: How about poutine? It would be a classic Canadian dish to start with.
A: That sounds delicious! I would definitely like to try it.

A2-Can use the affirmative form. (619)
- A: That would be great!
- I would love to try some Canadian cuisine.
- It would be a classic Canadian dish to start with.
- I would definitely like to try it.

A2-Can use the question form 'would you like'. (621)
- Would you like to go get some food together?

A2-Can u
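
The dialog snippets used above come from a random window of adjacent utterances, followed by an open turn for the model to complete. The sketch below reproduces that construction on hypothetical utterances that are not taken from DialogSum. It joins lines with `"\n"` instead of `os.linesep`, which is an assumption made so that the result is the same on every platform.

```python
import random

def sample_adjacent(items, n=3):
    """Return n consecutive items starting at a random position."""
    if len(items) < n:
        raise ValueError(f"List must contain at least {n} items")
    start = random.randint(0, len(items) - n)
    return items[start:start + n]

def dialog_snippet(utterances):
    """Label turns alternately as A and B and end with an empty, open turn."""
    return "\n".join(
        ("A" if i % 2 == 0 else "B") + ": " + utt
        for i, utt in enumerate(utterances + [""])
    )

# Hypothetical utterances for illustration only.
dialog = [
    "Hi, I'm Anna.",
    "Hello, I'm Ben.",
    "Where are you from?",
    "I'm from Canada.",
    "Nice to meet you.",
]
window = sample_adjacent(dialog)
snippet = dialog_snippet(window)
print(snippet)
```

The speaker labels depend on the position inside the sampled window, not on who spoke in the full dialog, so a window that starts on the second speaker's turn still begins with `A:`.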

In [21]:
prompt = get_prompt(rules, 2, snippet)

In [22]:
print(prompt)

Continue the snippet proving knowledge of at least one of these grammar skills:
- would - FORM/USE: AFTER 'IF' CLAUSES: Can use 'would' in the main clause of a conditional sentence to talk about an imagined situation, often in the context of advice or opinion-giving.
- would - FORM: PAST AFFIRMATIVE: Can use 'would have' + '-ed'.
- would - FORM: PAST NEGATIVE: Can use 'would not have' + '-ed' or 'wouldn’t have' + '-ed'
- would - FORM: QUESTIONS: Can use question forms.
- would - FORM: WITH ADVERBS: Can use an limited range of adverbs with 'would', including 'really', 'probably', 'certainly', 'definitely'.► adverbs
- would - USE: FUTURE IN THE PAST: Can use 'would' to talk about the future in the past.
- would - USE: IMAGINED SITUATIONS IN THE PAST: Can use 'would' to talk about imagined situations in the past. ► conditionals
- would - USE: INDIRECTNESS: Can use 'would' with verbs such as 'advise', 'imagine', 'recommend', 'say' to be less direct.
- would - USE: POLITE REQUESTS: Can use 