# Loading the `Moral-Stories` dataset
***
The dataset and code can be found <a href="https://github.com/demelin/moral_stories">here</a>.\
The authors provide 12k unique norms and, for some reason, an additional 700k variations of the same norms, which differ only in having NaN fields every now and then. As far as I can tell, these carry no additional information, but maybe I am overlooking something here?
* Might they be intended for different tasks? But then they only provide a single label, which is always 1 for any row with NaN fields...

# Sample task: Action classification
***
For starters, let's reproduce a task from the paper:
* Given an action, predict whether it is moral or immoral.
* For simplicity, we do not use the splits introduced in the paper, but a random split instead.
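Framed as data, the task turns every story into two labelled examples, one per action. The following is a minimal sketch under the assumption that moral actions are labelled 1 and immoral actions 0; `to_action_examples` is a hypothetical helper for illustration and not the implementation of `make_action_classification_dataframe`. The example story is made up.

```python
from typing import Dict, List


def to_action_examples(stories: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Turn each story into two (action, label) examples.

    Hypothetical helper; assumes label 1 = moral, 0 = immoral.
    """
    examples = []
    for story in stories:
        examples.append({"action": story["moral_action"], "label": 1})
        examples.append({"action": story["immoral_action"], "label": 0})
    return examples


# made-up toy story, only for illustration
stories = [
    {"moral_action": "Alex returns the wallet he found.",
     "immoral_action": "Alex keeps the wallet he found."},
]
for example in to_action_examples(stories):
    print(example["label"], example["action"])
```

A classifier then only ever sees the `action` text, so it has to judge the action without the norm or situation as context.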

We start by loading the data as a `pandas.DataFrame`:

In [1]:
from ailignment.datasets.moral_stories import get_moral_stories, make_action_classification_dataframe
from ailignment.datasets import get_accuracy_metric, join_sentences, tokenize_and_split
from transformers import TrainingArguments
import pandas as pd
import datasets
import transformers
from ailignment import sequence_classification

transformers.logging.set_verbosity_warning()

dataframe = get_moral_stories()

In [2]:
from collections import Counter

import spacy
from spacy import displacy
from tqdm import tqdm
import pandas as pd

nlp = spacy.load("en_core_web_sm")
# run everything through spacy for part of speech tags
dataframe["norm_parsed"] = dataframe["norm"].apply(nlp)

In [4]:
def get_pos_until_verb(doc):
    '''
    Returns a tuple of the part-of-speech tags preceding the first verb.
    Returns ("EMPTY",) if no VERB is present.
    '''
    tags = [token.pos_ for token in doc]
    if "VERB" not in tags:
        return ("EMPTY",)
    return tuple(tags[:tags.index("VERB")])

In [5]:
dataframe["norm_pos"] = dataframe["norm_parsed"].apply(lambda x: [d.pos_ for d in x])

In [6]:
dataframe["pos_verb"] = dataframe["norm_parsed"].apply(get_pos_until_verb)
pos_verb = dataframe.groupby("pos_verb")

In [61]:
s = pos_verb.count().sort_values("ID",ascending=False)
#pos_verb.get_group(s.index[2])

In [62]:
e = s["ID"].cumsum()/s["ID"].sum()


In [174]:
pos_verb.get_group(s.index[0])

Unnamed: 0,ID,norm,situation,intention,moral_action,moral_consequence,immoral_action,immoral_consequence,norm_parsed,norm_pos,pos_verb
7222,3TVRFO09GLDUXBWS12SEEPJ2264LXJ,If your job is preparing food you should wear ...,Sally works in the kitchen at a diner and is c...,Sally wants to create a meal.,Sally cuts the chicken wearing gloves and afte...,Sally is able to make a healthy meal that the ...,Sally holds the chicken with her bare hand and...,Sally cross contaminates the vegetables with c...,"(If, your, job, is, preparing, food, you, shou...","[SCONJ, PRON, NOUN, AUX, VERB, NOUN, PRON, AUX...","(SCONJ, PRON, NOUN, AUX)"


# Untying the *is* from the *ought*
***
We do not aim to come up with new norms. Rather, we would like to set aside the usually ambiguous normative judgements. Hence, we aim to separate the moral value assigned to a norm from the situation it applies to. Checking for norm violation then reduces to other well-known problems such as question answering or textual entailment.

**TODO:** Surely there are established names for the two categories, i.e. *situations that are judge-worthy* and their corresponding moral values.

We look for the most common ways that norms are phrased:
* "You should" or "You shouldn't"
* "It's" or "It is" followed by good, bad, etc.

Then, we want to translate them into a question ("Did you ...?") and a judgement ("Well, that's ..."):
* E.g. "You should back up so you can let people park." becomes
    * "Did you back up so you can let people park?"
    * "Yes" -> "Good"
    * "No" -> "Well, you should have!"

The norms only give value for either violation or adherence, but rarely both. We mirror the sentiment of the judgement by simple negation:
* "It is bad to hurt people"
    * "Did you hurt people?"
    * "Yes" -> "bad"
    * "No" -> "not bad"

Note that we refrain from translating "not bad" to "good", since norms often aren't morally symmetric, i.e. not every norm is equally normative for violation and adherence. E.g., committing a crime is punished much more explicitly than not committing one is rewarded.
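The translation scheme can be sketched in a few lines. This is a minimal sketch, assuming the value and the de-valued norm (the action) have already been split off; the function names are hypothetical. Values that already start with "not" are mirrored by dropping that "not" rather than doubling it, which is an assumption of this sketch; no value is ever mapped to an antonym.

```python
from typing import Dict


def negate(value: str) -> str:
    """Mirror a judgement by simple negation: "bad" -> "not bad".

    Assumption: a leading "not " is dropped instead of doubled ("not okay" -> "okay").
    """
    return value[len("not "):] if value.startswith("not ") else f"not {value}"


def should_norm_to_qa(action: str) -> Dict[str, str]:
    """'You should <action>' -> question plus judgement for each answer."""
    action = action.strip().rstrip(".")
    return {"question": f"Did you {action}?",
            "yes": "Good",
            "no": "Well, you should have!"}


def value_norm_to_qa(value: str, action: str) -> Dict[str, str]:
    """'It is <value> to <action>' -> question plus mirrored judgement."""
    action = action.strip().rstrip(".")
    return {"question": f"Did you {action}?",
            "yes": value,
            "no": negate(value)}  # deliberately not mapped to an antonym


print(should_norm_to_qa("back up so you can let people park."))
print(value_norm_to_qa("bad", "hurt people"))
```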

In [60]:
norms = dataframe["norm"].apply(str.lower)

f_should = lambda x: x.startswith("you should ") and not x.startswith("you should not")
shoulds = dataframe[norms.apply(f_should)]
print("\"You should\":", len(shoulds))

f_shouldnt = lambda x: x.startswith("you shouldn") or x.startswith("you should not")
shouldnts = dataframe[norms.apply(f_shouldnt)]
print("\"You shouldnt\":", len(shouldnts))

f_its = lambda x: x.startswith("it's") or x.startswith("it is")
its = dataframe[norms.apply(f_its)]
print("\"It's\"", len(its))

total = len(its)+len(shoulds)+len(shouldnts)
print("Covered", total ,"of",len(dataframe), f"({100*total/len(dataframe):.2f}%) norms")

"You should": 741
"You shouldnt": 2302
"It's" 8087
Covered 11130 of 11999 (92.76%) norms


### Untying `you should` sentences
***
First, find the most common grammatical structures:

In [91]:
common_pos = shoulds["norm_pos"].apply(lambda x: tuple(x[:3]))
common_pos = shoulds.groupby(common_pos)
most_common = common_pos["ID"].count().sort_values(ascending=False)
most_common

norm_pos
(PRON, AUX, VERB)    564
(PRON, AUX, ADV)     173
(PRON, AUX, AUX)       3
(PRON, AUX, PRON)      1
Name: ID, dtype: int64

Only the first two patterns are frequent enough to be of relevance:
* "You should VERB"
    * e.g. "You should help the ones in need"
* "You should ADV"
    * e.g. "You should always ...", "You should never ..."

While the first is straightforward to de-value, the second additionally requires reasoning about quantification ("always") and negation ("never").

For now, we simply strip the initial "You should" from the former and "You should ADV" from the latter. The stripped prefix becomes the value judgement, and the remainder is the de-valued norm.
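The same split can be expressed on plain (text, POS tag) pairs, which makes the two patterns easy to test without spaCy. This is a hedged sketch: the input format, `split_should`, `fold_adverb` and `ADVERB_POLARITY` are hypothetical. Unlike spaCy's `Span.text`, joining tokens with single spaces does not preserve the original whitespace. The optional folding of "never" into "You should not" is one simple way to handle the negation mentioned above, and it intentionally drops the quantification.

```python
from typing import List, Optional, Tuple

# hypothetical input format: (token text, coarse POS tag), e.g. from spaCy's token.pos_
Token = Tuple[str, str]

# hypothetical folding of adverb values into plain values where the polarity
# is obvious; this deliberately drops the quantification ("always", "never")
ADVERB_POLARITY = {"always": "You should", "never": "You should not"}


def split_should(tokens: List[Token]) -> Optional[Tuple[str, str]]:
    """Split a "You should ..." norm into (value, de-valued norm).

    (PRON, AUX, VERB) -> ("You should", tokens from index 2 on)
    (PRON, AUX, ADV)  -> ("You should <ADV>", tokens from index 3 on)
    any other pattern -> None
    """
    tags = tuple(tag for _, tag in tokens[:3])
    words = [text for text, _ in tokens]
    if tags == ("PRON", "AUX", "VERB"):
        return "You should", " ".join(words[2:])
    if tags == ("PRON", "AUX", "ADV"):
        return "You should " + words[2], " ".join(words[3:])
    return None


def fold_adverb(value: str) -> str:
    """Map e.g. "You should never" to "You should not"; other values pass through."""
    adverb = value.split()[-1].lower()
    return ADVERB_POLARITY.get(adverb, value)


norm = [("You", "PRON"), ("should", "AUX"), ("never", "ADV"),
        ("spit", "VERB"), ("at", "ADP"), ("people", "NOUN")]
value, action = split_should(norm)
print(value, "|", action, "|", fold_adverb(value))
```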

In [114]:
# do first group of (PRON, AUX, VERB)
g0 = common_pos.get_group(("PRON","AUX","VERB")).copy()
g0["norm_devalued"] = g0["norm_parsed"].apply(lambda x: x[2:].text)
g0["value"] = "You should"

# second group of (PRON, AUX, ADV)
g1 = common_pos.get_group(("PRON","AUX","ADV")).copy()
g1["norm_devalued"] = g1["norm_parsed"].apply(lambda x: x[3:].text)
g1["value"] = g1["norm_parsed"].apply(lambda x: "You should " + x[2].text)

shoulds_devalued = pd.concat([g0,g1])

### Untying `you should not` sentences
***
First, find the most common grammatical structures:

In [116]:
common_pos = shouldnts["norm_pos"].apply(lambda x: tuple(x[:4]))
common_pos = shouldnts.groupby(common_pos)
most_common = common_pos["ID"].count().sort_values(ascending=False)
most_common

norm_pos
(PRON, AUX, PART, VERB)     2236
(PRON, AUX, PART, ADV)        35
(PRON, AUX, PART, AUX)        19
(PRON, AUX, PART, ADJ)         3
(PRON, AUX, PART, ADP)         3
(PRON, AUX, PART, NOUN)        3
(PRON, AUX, PART, PART)        1
(PRON, AUX, PART, PUNCT)       1
(PRON, AUX, VERB, NOUN)        1
Name: ID, dtype: int64

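Only one pattern is frequent enough to be of relevance:
* "You should not VERB"
    * e.g. "You should not spit in someone's face"

Similarly to the "You should" cases, we strip "You should not" and keep it as the value judgement. This also covers "You shouldn't", which spaCy tokenizes into "should" and "n't".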

In [119]:
shouldnts_devalued = common_pos.get_group(("PRON","AUX","PART", "VERB")).copy()
shouldnts_devalued["norm_devalued"] = shouldnts_devalued["norm_parsed"].apply(lambda x: x[3:].text)
shouldnts_devalued["value"] = "You should not"

### Untying `it is` sentences
***
First, find the most common grammatical structures up until the first verb:

In [133]:
#common_pos = its["norm_pos"].apply(lambda x: tuple(x[:5]))
common_pos = its.groupby("pos_verb")
most_common = common_pos["ID"].count().sort_values(ascending=False)
most_common[:20]

pos_verb
(PRON, AUX, ADJ, PART)                    6518
(PRON, AUX, PART, ADJ, PART)               373
(PRON, AUX, ADJ, PART, PART)               344
(PRON, AUX)                                236
(PRON, AUX, ADJ, PART, ADV)                105
(EMPTY,)                                   101
(PRON, AUX, ADJ, PART, AUX)                 73
(PRON, AUX, ADJ, ADP, NOUN, PART)           49
(PRON, AUX, ADV, ADJ, PART)                 35
(PRON, AUX, PART, ADJ, PART, PART)          20
(PRON, AUX, ADJ, ADV, NOUN)                 16
(PRON, AUX, ADJ, ADP)                       14
(PRON, AUX, ADJ, ADP, DET, NOUN, PART)      13
(PRON, AUX, NOUN, PART)                     10
(PRON, AUX, ADJ, NOUN, PART)                 9
(PRON, AUX, PART)                            9
(PRON, AUX, PART, ADJ, PART, AUX)            8
(PRON, AUX, ADJ)                             6
(PRON, AUX, ADJ, ADP, PRON, PART)            5
(PRON, AUX, ADJ, CCONJ, ADJ, PART)           5
Name: ID, dtype: int64

The situation here is a little more complex:
1. "It is ADJ to", e.g. "It's wrong to become addicted to gambling."
    * Value: ADJ; strip everything up to the first verb.
2. "It is not ADJ to", e.g. "It's not okay to invade someone else's privacy."
    * Like 1., with "not ADJ" as the value.
3. (PRON, AUX, ADJ, PART, PART): two subgroups, both currently ignored:
    1. "It is ADJ not to"
    2. "It is ADJ to not"
4. (PRON, AUX): currently ignored as well.

As in the "You should" cases, the stripped prefix yields the value judgement: we keep the (possibly negated) adjective as the value and everything from the first verb on as the de-valued norm.
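The rule for the two handled "It is" patterns can be written generically in terms of the first verb's position. A minimal sketch on hypothetical (text, POS tag) pairs; `split_it_is` is illustrative only. The value is everything between the auxiliary and the "to" right before the first verb, which reproduces the slices `x[2]` and `x[2:4]` used in the cells below for the two accepted prefixes; every other prefix is rejected.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # hypothetical format: (token text, coarse POS tag)

HANDLED_PREFIXES = {
    ("PRON", "AUX", "ADJ", "PART"),          # It is ADJ to ...
    ("PRON", "AUX", "PART", "ADJ", "PART"),  # It is not ADJ to ...
}


def split_it_is(tokens: List[Token]) -> Optional[Tuple[str, str]]:
    """Split 'It is [not] ADJ to VERB ...' into (value, de-valued norm)."""
    tags = [tag for _, tag in tokens]
    if "VERB" not in tags:
        return None
    first_verb = tags.index("VERB")
    if tuple(tags[:first_verb]) not in HANDLED_PREFIXES:
        return None
    words = [text for text, _ in tokens]
    # words between "is" (index 1) and the "to" just before the verb form the value
    value = " ".join(words[2:first_verb - 1])
    return value, " ".join(words[first_verb:])


norm = [("It", "PRON"), ("'s", "AUX"), ("not", "PART"), ("okay", "ADJ"),
        ("to", "PART"), ("invade", "VERB"), ("privacy", "NOUN")]
print(split_it_is(norm))
```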

In [142]:
groups = []
# (PRON, AUX, ADJ, PART)
g = common_pos.get_group(("PRON","AUX","ADJ", "PART")).copy()
g["norm_devalued"] = g["norm_parsed"].apply(lambda x: x[4:].text)
g["value"] = g["norm_parsed"].apply(lambda x: x[2].text)
groups.append(g)

In [152]:
# (PRON, AUX, PART, ADJ, PART)
g = common_pos.get_group(("PRON","AUX","PART","ADJ", "PART")).copy()
g["norm_devalued"] = g["norm_parsed"].apply(lambda x: x[5:].text)
g["value"] = g["norm_parsed"].apply(lambda x: x[2:4].text)
groups.append(g)

In [165]:
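# (PRON, AUX, ADJ, PART, PART)
g = common_pos.get_group(("PRON","AUX","ADJ","PART", "PART")).copy()
# NOTE: these slices do not fit yet: x[2:4] yields "ADJ not" or "ADJ to",
# and for "It is ADJ to not VERB", x[5:] drops the negation
g["norm_devalued"] = g["norm_parsed"].apply(lambda x: x[5:].text)
g["value"] = g["norm_parsed"].apply(lambda x: x[2:4].text)
# currently ignored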

In [162]:
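# (PRON, AUX)
g = common_pos.get_group(("PRON","AUX")).copy()
# NOTE: slices copied from the cases above; here the first verb is at
# index 2, so they do not match this pattern yet
g["norm_devalued"] = g["norm_parsed"].apply(lambda x: x[5:].text)
g["value"] = g["norm_parsed"].apply(lambda x: x[2:4].text)
# currently ignored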

In [166]:
# finally, stitch everything together
its_devalued = pd.concat(groups)

# Imperatives and values
***
By splitting the value judgements off the norms, we end up with imperatives that describe the action itself, together with the value one should expect to receive for it.

In [171]:
dataframe_devalued = pd.concat([shoulds_devalued, shouldnts_devalued, its_devalued])
total_split = len(dataframe_devalued)
print("After devaluation, we cover", total_split ,"of",len(dataframe), f"({100*total_split/len(dataframe):.2f}%) norms")

After devaluation, we cover 9864 of 11999 (82.21%) norms


In [178]:
dataframe_devalued[["moral_action","norm_devalued","value"]].sample(10)

Unnamed: 0,moral_action,norm_devalued,value
5285,Douglas picks up the new game and purchased it...,steal things from a business.,wrong
8933,Anders talks to his primary care doctor about ...,take Xanax if it wasn't prescribed to you.,You should not
10629,John found a restroom in a shop where he was b...,expose yourself to strangers.,You should not
6605,Martin decides to tackle some really difficult...,drink yourself to excess.,not okay
4047,Matt takes a seat at table with some friendly ...,eat with other people.,good
6624,Tom deescalates the situation by apologizing a...,shoot arrows at people.,bad
2395,Jonny tells his friends to start the movie wit...,expect people to wait for you.,You should not
6815,Jim decides to leave the kids alone and let th...,try and ruin someone else good time.,wrong
1969,John avoids saying anything and cautiously dri...,yell at an elder.,rude
8633,John tells his son that these things happen an...,get mad at your family.,You should not


In [None]:
results = []
test_split = 0.2
batch_size = 12
model = "distilbert-base-uncased"
#model = "albert-base-v2"

# build the dataset once and split it with a fixed seed, so that every
# hyperparameter configuration is evaluated on the same test set
action_dataframe = make_action_classification_dataframe(dataframe)
input_columns = ["action"]
action_dataframe["task_input"] = join_sentences(action_dataframe, input_columns, " ")
dataset = datasets.Dataset.from_pandas(action_dataframe)
dataset = dataset.train_test_split(test_size=test_split, seed=42)

def data_all(tokenizer):
    return tokenize_and_split(dataset, tokenizer, "task_input")

def data_small(tokenizer):
    # 1000-example subsets for quick prototyping
    train, test = data_all(tokenizer)
    train = train.shuffle(seed=42).select(range(1000))
    test = test.shuffle(seed=42).select(range(1000))
    return train, test

# grid search over weight decay and learning rate
for wd in [0, 0.01, 0.05, 0.1]:
    for lr in [1e-5, 5e-5]:
        training_args = TrainingArguments(
            output_dir="results/",
            num_train_epochs=3,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            warmup_steps=500,
            weight_decay=wd,
            learning_rate=lr,
            logging_dir='./logs',
            logging_steps=500,
            save_steps=1000000,
            save_total_limit=0,
            evaluation_strategy="epoch",
        )

        r = sequence_classification(data_all, model, training_args, get_accuracy_metric())
        acc = [x["eval_accuracy"] for x in r if "eval_accuracy" in x]
        results.append(r)
        print(wd, lr, acc)

In [None]:
[[x["eval_accuracy"] for x in r if "eval_accuracy" in x] for r in results]

# WIP: Get score output from LM
***
Question: Is there a better way to sample from generated LM outputs?
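One option for scoring a whole generated continuation, rather than only the first step's distribution, is to sum the log-probabilities of the tokens that were actually chosen at each step. A minimal, framework-free sketch; the function names are hypothetical. With 🤗 Transformers, `step_logits` would presumably be `[s[0].tolist() for s in outputs.scores]` and `chosen_ids` `outputs.sequences[0, inputs.shape[1]:].tolist()`, since for GPT-2 the returned sequences include the prompt. Note that `outputs.scores` holds the logits after any logits processors have been applied. Recent `transformers` versions also offer a `compute_transition_scores` method on generation-capable models for per-token scores; check whether your installed version has it.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_prob(step_logits: Sequence[Sequence[float]],
                      chosen_ids: Sequence[int]) -> float:
    """Sum of the log-probabilities of the token chosen at each generation step."""
    return sum(log_softmax(logits)[token_id]
               for logits, token_id in zip(step_logits, chosen_ids))


def mean_log_prob(step_logits: Sequence[Sequence[float]],
                  chosen_ids: Sequence[int]) -> float:
    """Per-token average, so continuations of different length are comparable."""
    return sequence_log_prob(step_logits, chosen_ids) / len(chosen_ids)


# toy example: a 3-token vocabulary and two generation steps
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]
chosen_ids = [0, 2]
print(sequence_log_prob(step_logits, chosen_ids), mean_log_prob(step_logits, chosen_ids))
```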

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Today the weather is really nice and I am planning on "
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# greedy decoding: top_p and top_k only take effect with do_sample=True
outputs = model.generate(inputs, max_length=250, do_sample=False, top_p=0.95, top_k=60,
                         pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
                         return_dict_in_generate=True, output_attentions=False,
                         output_hidden_states=True, output_scores=True)
# continuation only, i.e. without the prompt tokens
#generated = prompt + tokenizer.decode(outputs.sequences[0, inputs.shape[1]:])

# outputs.scores holds one (batch_size, vocab_size) tensor of logits per generated step;
# the softmax over the vocabulary gives the distribution of the first generated token
p = torch.softmax(outputs.scores[0], dim=-1)

print(p.max())

# WIP: Data augmentation with NER
***
Idea: use named entity recognition (NER) to find persons etc. and replace them with other names.
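The cells below replace a single name per story. A slightly more general sketch, assuming the entities are given as plain `(start, end, text)` character offsets (e.g. built from spaCy's `ent.start_char`, `ent.end_char` and `ent.text`), maps each distinct name to its own replacement, so that several people in one story stay distinguishable. `replace_spans` and the mapping are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, entity text)


def replace_spans(text: str, spans: List[Span], mapping: Dict[str, str]) -> str:
    """Replace every span whose text appears in `mapping` by its replacement.

    Spans are processed by start offset and must not overlap, which holds
    for entities taken from a single spaCy `doc.ents`.
    """
    pieces, offset = [], 0
    for start, end, ent_text in sorted(spans):
        if ent_text not in mapping:
            continue  # leave entities without a replacement untouched
        pieces.append(text[offset:start])
        pieces.append(mapping[ent_text])
        offset = end
    pieces.append(text[offset:])
    return "".join(pieces)


text = "Sally works in a diner.\nSally wants to cook."
spans = [(0, 5, "Sally"), (24, 29, "Sally")]
print(replace_spans(text, spans, {"Sally": "Niklas"}))
```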

In [None]:
from ailignment.datasets.moral_stories import get_moral_stories, make_action_classification_dataframe
from ailignment import join_sentences, tokenize_and_split, get_accuracy_metric
dataframe = get_moral_stories()
columns = dataframe.columns[1:]
print("Running NER on columns", columns.to_list())

In [None]:
texts = join_sentences(dataframe, columns, "\n")

In [None]:
import spacy
from spacy import displacy
from tqdm import tqdm
import pandas as pd

nlp = spacy.load("en_core_web_sm")

ner_pipe = nlp.pipe(tqdm(texts), disable=['tagger', 'parser', 'attribute_ruler', 'lemmatizer'])
docs = [x for x in ner_pipe]

displacy.render(docs[0], style="ent")

In [None]:
from collections import Counter

def get_frequent_entity(doc, entity="PERSON", n=1):
    '''
    Finds the `n` most frequent entities (by text) with label `entity`
    in the NER doc and returns, for each of them, the list of all its
    occurrences as spans. If n == 1, the list of spans of the single
    most frequent entity is returned directly ([] if none was found).
    '''
    occurrences = [(x.text, x.label_) for x in doc.ents if x.label_ == entity]
    c = Counter(occurrences)
    ents = []
    for item, count in c.most_common(n):
        ents.append([x for x in doc.ents if (x.text, x.label_) == item])

    if n == 1 and len(ents) != 0:
        ents = ents[0]
    return ents

In [None]:
persons = [get_frequent_entity(x, "PERSON", n=1) for x in docs]
# we are interested in the simplest case, where the most frequent person
# was found exactly 6 times, presumably once in each story field
# other than the norm
matches = [x for x in persons if len(x) == 6]
print(f"Found {len(matches)} matches")

In [None]:
m = matches[0]
displacy.render(m[0].doc, "ent")

In [None]:
def replace_entity(ents, s):
    '''
    Replaces all occurrences of entities in `ents` with `s`.
    `ents` is a list of entities as returned by `doc.ents`
    from an NER pipeline; they need to be from the same doc
    and sorted by position (as `doc.ents` already is)!
    '''
    offset = 0
    text = ents[0].doc.text
    new_text = ""
    for ent in ents:
        start = ent.start_char
        end = ent.end_char
        left = text[offset:start]
        new_text += left + s
        offset = end
    new_text += text[offset:]
    return new_text

In [None]:
n_docs = [replace_entity(m, "Niklas").split("\n") for m in matches]

In [None]:
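# baseline without replacement: overwrites `n_docs` from the cell above,
# so run only one of the two cells
n_docs = [m[0].doc.text.split("\n") for m in matches]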

In [None]:
dataframe_replaced = pd.DataFrame(n_docs)
dataframe_replaced.columns = columns
dataframe_replaced.head()

In [None]:
import datasets
test_split = 0.2
batch_size = 8

action_dataframe = make_action_classification_dataframe(dataframe_replaced)

input_columns = ["action"]
action_dataframe["task_input"] = join_sentences(action_dataframe, input_columns, " ")
dataset = datasets.Dataset.from_pandas(action_dataframe)
dataset = dataset.train_test_split(test_size=test_split)


In [None]:

from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments,
)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

train_data, test_data = tokenize_and_split(dataset, tokenizer, "task_input")

# for prototyping, optional
small_train_data = train_data.shuffle(seed=42).select(range(1000))
small_test_data = test_data.shuffle(seed=42).select(range(1000))

training_args = TrainingArguments(
    output_dir="results/",
    num_train_epochs=5,                      # total number of training epochs
    per_device_train_batch_size=batch_size,  # batch size per device during training
    per_device_eval_batch_size=batch_size,   # batch size for evaluation
    warmup_steps=500,                        # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                       # strength of weight decay
    logging_dir='./logs',                    # directory for storing logs
    logging_steps=50,                        # how often to log
    save_steps=1000,
    save_total_limit=0,
    evaluation_strategy="epoch",             # when to run evaluation
)

In [None]:
trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=small_train_data,   # training dataset
    eval_dataset=small_test_data,     # evaluation dataset
    compute_metrics=get_accuracy_metric(),   # accuracy metric, created as in the sweep above
)
trainer.train()

In [None]:
from gender_guesser.detector import Detector

In [None]:
d = Detector()

In [None]:
d.get_gender("Jamie")

In [3]:
from ailignment.datasets.moral_stories import _lemmatize, get_moral_stories

In [15]:
import spacy
from string import punctuation
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import pandas as pd



In [5]:
nlp = spacy.load('en_core_web_sm', disable=['parser', 'textcat'])
STOP_WORDS = stopwords.words('english')
stories = get_moral_stories()
columns = ["moral_action", "immoral_action"]


In [6]:
series = stories[columns[0]]

In [7]:
def lemmatize_series(series, nlp, STOP_WORDS=None):
    '''
    Given a series of strings, returns a DataFrame(["lemmas", "tokens", "maps"])
    of the lemmatized strings according to `_lemmatize` function.
    '''
    # strip punctuation (the third argument lists the characters to delete)
    translation_table = str.maketrans('', '', punctuation)
    series = series.map(lambda x: x.translate(translation_table))
    series = series.map(lambda x: _lemmatize(x, nlp, STOP_WORDS))
    data = pd.DataFrame(series.to_list(), columns=["lemmas", "tokens", "maps"])
    return data

In [8]:
y = series.map(lambda x: _lemmatize(x, nlp, STOP_WORDS))

In [24]:
data = pd.DataFrame(y.to_list(), columns=["lemmas", "tokens", "maps"])


In [25]:
data

Unnamed: 0,lemmas,tokens,maps
0,"[kevin, decide, <STPWRD>, place, <STPWRD>, wag...","[Kevin, decides, to, place, a, wager, once, ev...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
1,"[curtis, explain, <STPWRD>, <STPWRD>, woman, <...","[Curtis, explains, to, the, woman, that, he's,...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
2,"[anna, decide, <STPWRD>, attend, <STPWRD>, wed...","[Anna, decides, to, attend, the, wedding, even...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
3,"[greg, tell, james, <STPWRD>, <STPWRD>, <STPWR...","[Greg, tells, James, that, he, can, play, tomo...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
4,"[ben, call, <STPWRD>, tow, truck, <STPWRD>, ge...","[Ben, calls, a, tow, truck, and, gets, the, ca...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
...,...,...,...
11994,"[payton, ask, <STPWRD>, friend, <STPWRD>, <STP...","[Payton, asks, her, friend, where, she, bought...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
11995,"[helen, <STPWRD>, <STPWRD>, conversation, <STP...","[Helen, has, a, conversation, with, her, husba...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
11996,"[cole, move, away, <STPWRD>, <STPWRD>, vehicle...","[Cole, moves, away, from, the, vehicle, that, ...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."
11997,"[harry, ask, steve, <STPWRD>, <STPWRD>, <STPWR...","[Harry, asks, Steve, what, he, should, be, doi...","{0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5..."

