# Prompts
***

We consider the following types of prompts:
* Prompt Original: `"It is [MASK] [to do something]."`
    * Closest to our MS adaptive training prompt
* Prompt No-cite: `It is [MASK] [to do something].`
    * Do the quotation marks influence performance?
* Prompt Reverse: `"[doing something] is [MASK]."`
    * Does word order influence performance?
* Prompt Reverse-Q: `[doing something]? That is [MASK].`
    * Does a question-and-answer framing influence performance?
* Prompt Comma: `"It is [MASK], [doing something]."`
    * How does a simple comma influence performance?
* Prompt Hypothetical First Person: `If I were [to do something] that would be [MASK].`
    * First person pondering
* Prompt Hypothetical Second Person: `If you were [to do something] that would be [MASK].`
    * Second person pondering
* Prompt Hypothetical Third Person: `If they were [to do something] that would be [MASK].`
    * Third person pondering
* Prompt Instructive: `Please make the correct moral decision: [doing something] is [MASK].`
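
The templates above can be written down as format strings. The following is a minimal sketch, not the project's code: the names `TEMPLATES` and `fill_prompt` and the dictionary keys are hypothetical, and it assumes each action is already available in both its infinitive form ("to do something") and its gerund form ("doing something"). Capitalization of the inserted phrase is left untouched.

```python
# Hypothetical helper: TEMPLATES and fill_prompt are illustrative names,
# not part of the notebook's own code.
TEMPLATES = {
    "original": '"It is [MASK] {infinitive}."',
    "no_cite": "It is [MASK] {infinitive}.",
    "reverse": '"{gerund} is [MASK]."',
    "reverse_q": "{gerund}? That is [MASK].",
    "comma": '"It is [MASK], {gerund}."',
    "hypothetical_first": "If I were {infinitive} that would be [MASK].",
    "hypothetical_second": "If you were {infinitive} that would be [MASK].",
    "hypothetical_third": "If they were {infinitive} that would be [MASK].",
    "instructive": "Please make the correct moral decision: {gerund} is [MASK].",
}


def fill_prompt(name: str, infinitive: str, gerund: str) -> str:
    """Insert the action phrase into the named template.

    Both forms are passed in; str.format ignores the one the template
    does not use. "[MASK]" is plain text to str.format and stays as-is.
    """
    return TEMPLATES[name].format(infinitive=infinitive, gerund=gerund)


example = fill_prompt(
    "reverse_q",
    infinitive="to return things that are lost",
    gerund="returning things that are lost",
)
print(example)
```

Keeping every variant in one mapping makes it easy to check that each template contains exactly one `[MASK]` slot before any model is run.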

## Run the LLaMA model

In [None]:
!torchrun --nproc_per_node 1 llama_prompt_gen.py --ckpt_dir "../llama/data/7B" --temperature 1.0 --tokenizer_path "../llama/data/tokenizer.model"

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
Loaded in 8.60 seconds
After joining, we retain 11986 norms from Moral Stories (12000)
  6%|██▍                                       | 21/365 [01:24<23:06,  4.03s/it]

## Compute intersection of prompts

In [1]:
import pandas as pd
import os
import json
# After generation, keep only the rows that are valid in every prompt file.
# A few generations are faulty (they lack the [MASK] token) and are dropped here.
prompt_dir = "data/prompts_llama/"
prompt_files = [x for x in os.listdir(prompt_dir) if x.startswith("prompt") and x.endswith(".jsonl")]

dfs = []
for pf in prompt_files:
    d = pd.read_json(prompt_dir + pf, orient="records", lines=True)
    d = d[d["prompt"].apply(lambda x: "[MASK]" in x)]
    dfs.append(d)

In [2]:
index = dfs[0].index
for d in dfs:
    index = index.intersection(d.index)

trimmed_dfs = [x.loc[index] for x in dfs]
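
The filter-then-intersect step can be checked on toy data. This is a small sketch with made-up frames `a` and `b`; like the cells above, it assumes that row `i` refers to the same norm in every prompt file, so a shared index label means a shared norm.

```python
import pandas as pd

# Two toy prompt variants; row i describes the same action in both frames.
a = pd.DataFrame({"prompt": [
    "It is [MASK] to lie.",
    "It is bad to steal.",        # faulty: no [MASK]
    "It is [MASK] to help.",
]})
b = pd.DataFrame({"prompt": [
    "lying is [MASK].",
    "stealing is [MASK].",
    "helping is.",                # faulty: no [MASK]
]})

# Keep only rows whose prompt contains the literal "[MASK]" token.
# regex=False is needed because "[" and "]" are regex metacharacters.
frames = [df[df["prompt"].str.contains("[MASK]", regex=False)] for df in (a, b)]

# Intersect the surviving index labels across all frames.
shared = frames[0].index
for df in frames[1:]:
    shared = shared.intersection(df.index)

# Every frame is cut down to the same rows, in the same order.
aligned = [df.loc[shared] for df in frames]
print(list(shared))
```

Only row 0 survives here, because row 1 is faulty in `a` and row 2 is faulty in `b`.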

In [3]:
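# Write the trimmed frames under the same file names; create the directory if needed.
os.makedirs("data/prompts", exist_ok=True)
for pf, d in zip(prompt_files, trimmed_dfs):
    with open(os.path.join("data/prompts", pf), "w") as f:
        f.write(d.to_json(orient="records", lines=True))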
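
One consequence of writing with `orient="records", lines=True` is that the index labels are not stored. The sketch below uses a toy frame and an in-memory buffer in place of a file to illustrate this: after reloading, rows are numbered from 0 again, so the trimmed files line up by line position rather than by the original labels.

```python
import io

import pandas as pd

# Toy frame whose index has a gap, as it would after dropping faulty rows.
df = pd.DataFrame({"prompt": ["a is [MASK].", "b is [MASK]."]}, index=[0, 2])

# Serialize to JSON Lines: one record per line, index labels are not written.
text = df.to_json(orient="records", lines=True)

# Read it back; io.StringIO stands in for a file on disk.
restored = pd.read_json(io.StringIO(text), orient="records", lines=True)

print(list(df.index), list(restored.index))
```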

## Gather few shot samples
***

These samples serve as few-shot examples for the LLaMA prompt generation.

In [4]:
from social_chem import load_ms_soc_joined

mssc = load_ms_soc_joined()

After joining, we retain 11986 norms from Moral Stories (12000)


In [5]:
actions = mssc["rot-action"].drop_duplicates()
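
The cells that build `few_shot_actions` are not shown in this export. As an assumption only, one simple way to draw such a list is a seeded random sample from the de-duplicated actions. In the sketch below, the toy `pd.Series` stands in for `mssc["rot-action"]`, and the sample size and seed are arbitrary choices, not the values used for the output further down.

```python
import pandas as pd

# Stand-in for mssc["rot-action"]; the real column comes from load_ms_soc_joined().
raw_actions = pd.Series([
    "treating family members with respect",
    "making other people sick.",
    "returning things that are lost",
    "making other people sick.",          # duplicate, removed below
    "doing things together with your kids",
])

actions = raw_actions.drop_duplicates()

# Assumption: a fixed seed keeps the few-shot examples reproducible across runs.
few_shot_actions = actions.sample(n=3, random_state=0).tolist()

for action in few_shot_actions:
    print(action)
```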

In [24]:
for i in few_shot_actions:
    print(i)

not caring when your parent is crying.
treating family members with respect
making other people sick.
not tipping in certain situations.
returning things that are lost
doing things together with your kids
intentionally disrupting someone when they're working.
discussing sensitive topics in an English class.
Reporting child abuse to the authorities.
distracting others from learning in school.
