## CMSC Interactive Fiction Storytelling - DnD Project 
### LLM TO DETECT PLAYER INTENT BASED ON PLAYER NARRATIVES AND GAME STATE
#### Team Members: Arya Honraopatil, Saksham Kumar Sharma, and Patty Delafuente

Our first step is to pull in the FIREBALL dataset (https://huggingface.co/datasets/lara-martin/FIREBALL), which we use for model training. It contains real gameplay data from people playing a text-based DnD game on Discord through the AVRAE DnD Discord bot (https://avrae.io/).

The dataset is about 7 GB.

The cells below define the functions required to preprocess the dataset. We first install and import the libraries, then define filter_rows, split_dataset, get_dndname, replacing_players_with_names, and preprocess_function.

The next cell installs (if needed) and imports the libraries used for preprocessing.

In [8]:
#!pip install datasets
#!pip install jsonlines
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from datasets import load_dataset


The filter_rows function drops rows whose 'before_utterances' column is empty. Each entry in that column is a list of utterances, so df[column].str.len() returns the number of utterances in each row, and only rows with at least one utterance are kept. We filter on this column because 'before_utterances' holds the most recent conversation leading up to the player's command, which is the main signal the model learns from.

In [9]:
def filter_rows(df, column):
    print(f"Rows before filtering: {len(df)}")
    # Filter rows where the length of the string in the specified column is greater than 0
    filtered_df = df[df[column].str.len() > 0].reset_index(drop=True)
    print(f"Rows after filtering: {len(filtered_df)}")
    return filtered_df
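
Because 'before_utterances' holds lists rather than single strings, it may help to see what str.len() actually counts. The toy frame below uses invented values (not FIREBALL rows) and applies the same filter as filter_rows, as a minimal sketch:

```python
import pandas as pd

# Toy stand-in for FIREBALL rows; the values are invented for illustration
df = pd.DataFrame({
    "before_utterances": [["I attack the goblin"], [], ["Cast bless", "on everyone"]],
    "current_actor": ["Aratak", "Snips", "Edna"],
})

# On a column of lists, .str.len() returns the number of elements per row
lengths = df["before_utterances"].str.len()
kept = df[lengths > 0].reset_index(drop=True)

print(lengths.tolist())  # [1, 0, 2]
print(len(kept))         # 2
```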

The split_dataset function splits the data into training, test, and validation sets (60/20/20 by default) using scikit-learn's train_test_split: it first holds out 40% of the rows and then divides that portion evenly between the test and validation sets. Keeping the three sets disjoint means the test and validation scores reflect how the model performs on data it never saw during training.

In [10]:
def split_dataset(df, train_size=0.6, test_size=0.2, val_size=0.2):
    # Compare with a tolerance: float sums such as 0.7 + 0.2 + 0.1 are not exactly 1
    assert abs(train_size + test_size + val_size - 1) < 1e-9, "Train, test, and validation sizes must sum to 1."
    train_df, temp_df = train_test_split(df, test_size=(1 - train_size), random_state=42,shuffle=True)
    test_df, val_df = train_test_split(temp_df, test_size=val_size/(test_size + val_size), random_state=42,shuffle=True)
    print(f"Number of rows in the Train set: {len(train_df)}")
    print(f"Number of rows in the Test set: {len(test_df)}")
    print(f"Number of rows in the Validation set: {len(val_df)}")    
    return train_df, test_df, val_df
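
The row counts printed further down follow from how train_test_split rounds a fractional test_size. Assuming scikit-learn's rule (held-out count = ceil(test_size × n), the rest goes to the first returned split), the two-stage split can be checked by hand. expected_split_sizes is a hypothetical helper, not part of the notebook:

```python
import math

def expected_split_sizes(n_rows, train_size=0.6, test_size=0.2, val_size=0.2):
    """Predict (train, test, val) sizes for the two-stage split in split_dataset.

    Assumes scikit-learn rounds the held-out fraction up with math.ceil and
    gives the remaining rows to the first returned split.
    """
    # Stage 1: hold out (1 - train_size) of the rows as temp_df
    n_temp = math.ceil((1 - train_size) * n_rows)
    n_train = n_rows - n_temp
    # Stage 2: split temp_df; val_df is the second (held-out) return value
    n_val = math.ceil(val_size / (test_size + val_size) * n_temp)
    n_test = n_temp - n_val
    return n_train, n_test, n_val

print(expected_split_sizes(120397))  # (72238, 24079, 24080)
```

For the 120,397 filtered rows this reproduces the 72,238 / 24,079 / 24,080 counts shown in the output of the preprocessing run.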

The get_dndname function maps a generic speaker identifier such as "Player 0" or "Player 1" to a character name. It walks through combat_state and builds the identifier "Player N" for the entry at index N; when that identifier matches name, it returns the entry's 'name'. If nothing matches, it returns None. This is needed because 'utterance_history' labels speakers as "Player 0", "Player 1", and so on. It is a helper for the replacing_players_with_names function.

In [11]:
def get_dndname(name, combat_state):
    # "Player N" refers to the N-th entry of combat_state
    for index, item in enumerate(combat_state):
        if name == 'Player ' + str(index):
            return item['name']
    return None  # no entry matches this generic identifier
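
get_dndname rescans combat_state for every utterance. An alternative sketch (not the notebook's code; build_player_map is a hypothetical helper) builds the "Player N" to name mapping once per row, under the same assumption that "Player N" refers to the N-th entry of 'combat_state_before':

```python
def build_player_map(combat_state):
    # "Player 0" -> first combatant's name, "Player 1" -> second, ...
    return {f"Player {i}": actor["name"] for i, actor in enumerate(combat_state)}

# Toy combat state with invented names
combat_state = [{"name": "Aratak"}, {"name": "Snips"}]
player_map = build_player_map(combat_state)

print(player_map.get("Player 1"))  # Snips
print(player_map.get("Player 5"))  # None, like get_dndname for an unknown id
```

Using dict.get keeps the same "None when missing" behavior, so it could replace the per-utterance scan without changing the output.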

The replacing_players_with_names function rewrites the 'utterance_history' column. For each row it splits every utterance at the first ':' into a speaker and the text, looks up the speaker's character name with get_dndname, and rebuilds the utterance as "name: text". Speakers with no matching "Player N" entry come out as "None", as seen in the head() output below.

In [12]:
def replacing_players_with_names(dataset):
    final_column = []
    for combat_state, utter_hx in zip(dataset['combat_state_before'], dataset['utterance_history']):
        result = []
        for utterance in utter_hx:
            # Split "Player N: text" at the first ':'; utterances without ':' get empty text
            if ':' in utterance:
                name, text = utterance.split(':', 1)
            else:
                name, text = utterance, ''
            combat_name = get_dndname(name.strip(), combat_state)
            result.append(str(combat_name) + ": " + text.strip())
        final_column.append(result)
    dataset['utterance_history'] = final_column
    return dataset
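
replacing_players_with_names only rewrites the speaker prefix, so mentions inside the text (for example "Player 2 is *in* artie piloting him" in the 'utterance_history' output further down) keep their generic ids. If those mentions should also be resolved, one possible extension is a regular-expression substitution. This is a hedged sketch, not part of the pipeline; replace_inline_mentions is a hypothetical name and the combat state is invented:

```python
import re

# Matches "Player 0", "Player 12", ... as whole words
PLAYER_RE = re.compile(r"\bPlayer (\d+)\b")

def replace_inline_mentions(text, combat_state):
    # Replace each "Player N" with the N-th combatant's name when it exists;
    # leave the mention unchanged if N is out of range
    def repl(match):
        idx = int(match.group(1))
        if idx < len(combat_state):
            return combat_state[idx]["name"]
        return match.group(0)
    return PLAYER_RE.sub(repl, text)

combat_state = [{"name": "Aratak"}, {"name": "Snips"}, {"name": "Edna"}]
print(replace_inline_mentions("*Player 1 and Player 2 you see the wolf*", combat_state))
# *Snips and Edna you see the wolf*
```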



The preprocess_function ties everything together: it downloads FIREBALL, drops rows with an empty 'before_utterances', replaces generic player ids in 'utterance_history', splits the data 60/20/20, keeps the five columns used downstream, and returns the training, test, and validation data together with their labels ('commands_norm'). Calling this one function produces all six outputs.

In [13]:
def preprocess_function():
    dataset = load_dataset("lara-martin/FIREBALL")
    # Access the 'train' split and convert it to a pandas DataFrame
    train_df = dataset['train'].to_pandas()
    # Drop rows where 'before_utterances' is empty
    filtered_df = filter_rows(train_df, 'before_utterances')
    # Replace generic "Player N" speakers with character names
    filtered_df = replacing_players_with_names(filtered_df)
    train_df, test_df, val_df = split_dataset(filtered_df)

    keep_cols = ['before_utterances', 'combat_state_before', 'current_actor',
                 'utterance_history', 'commands_norm']
    splits = []
    for df in (train_df, test_df, val_df):
        df = df[keep_cols].reset_index(drop=True)
        label = df[['commands_norm']]
        # 'commands_norm' is kept in the feature frame as well: the fine-tuning
        # converters below read it from the same exported JSONL records
        splits.extend([df, label])
    # Order: train_df, train_label, test_df, test_label, val_df, val_label
    return tuple(splits)
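
A pandas detail matters for the label columns: DataFrame.drop returns a new DataFrame rather than modifying the caller, so a column is only removed if the result is reassigned (or inplace=True is passed). In this notebook 'commands_norm' stays in the feature frames, as train_data.head() below shows, and the JSONL converters further down depend on that. A toy check with invented values:

```python
import pandas as pd

df = pd.DataFrame({"current_actor": ["SG3", "Cerberus"],
                   "commands_norm": ["!i a fist", "!a glaive"]})

df.drop(["commands_norm"], axis=1)             # result discarded: df is unchanged
print(list(df.columns))                        # ['current_actor', 'commands_norm']

features = df.drop(columns=["commands_norm"])  # reassign to actually remove it
print(list(features.columns))                  # ['current_actor']
```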

Running preprocess_function prints how many rows survive filtering and how many go into each split. Of the 153,829 rows, 33,432 are removed because their 'before_utterances' column is empty; the remaining 120,397 rows are split into 72,238 training, 24,079 test, and 24,080 validation rows.

The cell below makes that single call and returns the training data and labels, the test data and labels, and the validation data and labels.


In [14]:
train_data, train_label, test_data, test_label, validation_data, validation_label = preprocess_function()


Rows before filtering: 153829
Rows after filtering: 120397
Number of rows in the Train set: 72238
Number of rows in the Test set: 24079
Number of rows in the Validation set: 24080


In [15]:
train_data.head()

,before_utterances,combat_state_before,current_actor,utterance_history,commands_norm
0,"[""We can't let you have that.""]","[{'name': 'Aratak', 'hp': '<244/244 HP; Health...","{'name': 'Kain Heisenberg', 'hp': '<418/418 HP...","[Snips: With a loud sigh, it takes out a knife...",['!cast bless -t aratak -t snips -t agent -t k...
1,[can he pick it up as a free action?)],"[{'name': 'Cerberus', 'hp': '<74/84 HP; Injure...","{'name': 'SG3', 'hp': '<143/152 HP; Injured>',...","[lair: did u giants might as ur ready action, ...",['!i a fist -rr 2 -t SGq1']
2,[I'll allow the damage to be transferred though.],"[{'name': 'Kyle Ravendust', 'hp': '<36/50 HP; ...","{'name': 'Ole Goldenforge', 'hp': '<45/45 HP; ...","[Ota: I'll pilot him if he's not back., Ota: *...",['!i a spirit -t FT1']
3,[using BA to have defender attack],"[{'name': 'Kahlee', 'hp': '<72/72 HP; Healthy>...","{'name': 'Ph's Battle Smith ', 'hp': '', 'clas...",[Phillip: Oh! That's awesome! She's gonna have...,['!a rend -t kahlee']
4,[*Thinks for advice and oeeks at the box all t...,"[{'name': 'The Jester', 'hp': '<43/43 HP; Heal...","{'name': 'Kallahan Adrastia', 'hp': '<164/164 ...",[None: _Ah shit....Hes killing the weakest one...,['!a ace -t jester -d 2d8[fire] -d 10 -b -5 -f...


In [None]:
# Target training format: each record pairs an input prompt (instructions, current player,
# game state, utterances) with the AVRAE command(s) as output. Illustrative example only.
example = {"input": """You are a game agent in a DnD game. You will be provided with the current player, game state and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. current player: {'actions': None, 'attacks': 'Crossbow, heavy (Thrushrach), Shortsword (Thrushrach), Shortsword2 (Thrushrach), Unarmed Strike (Thrushrach), Bite (Thrushrach), Bite (Edna)', 'class': None, 'controller_id': '0', 'description': None, 'effects': '', 'hp': '', 'name': "Th's Ranger ", 'race': 'Group', 'spells': ''}, game state: [{'actions': 'Hungry Jaws, Drake Companion, Drake Companion: Command, Crossbow Expert Attack, Drake Companion: Summon, Primeval Awareness', 'attacks': 'Crossbow, heavy, Shortsword, Shortsword2, Unarmed Strike, Bite', 'class': 'Ranger 4', 'controller_id': '211023311098160087', 'description': None, 'effects': '', 'hp': '<32/32 HP; Healthy>', 'name': 'Thrushrach', 'race': 'Lizardfolk', 'spells': "Animal Friendship, Hunter's Mark, Thaumaturgy, Speak with Animals"}, {'actions': None, 'attacks': 'Bite', 'class': None, 'controller_id': '211023311098160087', 'description': None, 'effects': 'Bite', 'hp': '<25/25 HP; Healthy>', 'name': 'Edna', 'race': 'Drake Companion P2', 'spells': ''}, {'actions': 'Disarming Shot, Adept Marksman, Action Surge, Extra Attack, Violent Shot, Indomitable, Gunner, Deadeye Shot, Second Wind, Lucky', 'attacks': 'map, Pistol, Automatic, Pistol, Automatic2, Revolver, Shortsword, Unarmed Strike', 'class': 'Fighter 9', 'controller_id': '303160112743260203', 'description': None, 'effects': 'map', 'hp': '<85/85 HP; Healthy>', 'name': 'Cayde the 6th', 'race': 'Warforged', 'spells': ''}, {'actions': "Relentless Endurance, Martial Adept, Maneuvers: Evasive Footwork, Runic Shield, Action Surge, Cloud Rune, Extra Attack, Frost Rune, Indomitable, Storm Rune, Maneuvers: Parry, Giant's Might, Second Wind", 'attacks': 'Longbow, Warhammer +1, 2-Handed Warhammer +1, Unarmed Strike', 'class': 'Fighter 9', 'controller_id': '331427928050958624', 'description': '', 'effects': '', 'hp': '<94/94 HP; Healthy>', 'name': 'Nora Storm-child', 'race': 'Half-Orc', 'spells': ''}, {'actions': 'Infuse Item, Shapechange, Infiltrator Armor: Lightning Launcher (DEX), The Right Tool for the Job, Lightning Launcher, Arcane Armor - Create Armor, Magical Tinkering', 'attacks': 'Crossbow, light, Dagger, Rapier, Unarmed Strike, Infiltrator Armor: Lightning Launcher (DEX)', 'class': 'Artificer 4', 'controller_id': '211124537472256440', 'description': None, 'effects': '', 'hp': '<35/35 HP; Healthy>', 'name': 'Echo', 'race': 'Changeling', 'spells': "Tasha's Caustic Brew, Thunderwave, Identify, Catapult, Grease, Purify Food and Drink, Magic Missile, Booming Blade, Detect Magic, Cure Wounds, Fire Bolt"}, {'actions': 'Action Surge, Second Wind, Cloud Rune, Fire Rune', 'attacks': 'Longbow, Dragon Tooth, Handaxe, Handaxe2', 'class': 'Fighter 3', 'controller_id': '136503232070997789', 'description': '', 'effects': '', 'hp': '<37/37 HP; Healthy>', 'name': 'Skaði', 'race': 'Valkyrie', 'spells': ''}, {'actions': 'Rage, Divine Fury, Blood Curse of the Fallen Puppet (Amplified), Blood Curse of the Fallen Puppet', 'attacks': 'Divine Fury, Crimson Blade, Alternate Side', 'class': 'Barbarian  5/Blood Hunter 1', 'controller_id': '336156382909357813', 'description': '', 'effects': '', 'hp': '<83/83 HP; Healthy>', 'name': 'Ashildr', 'race': 'Half Elf', 'spells': ''}, {'actions': None, 'attacks': 'Spectral Longbow, Horrifying Visage, Wail', 'class': None, 'controller_id': '207027692360152249', 'description': None, 'effects': 'Wail Used', 'hp': '<58/58 HP; Healthy>', 'name': 'LB1', 'race': 'Lonelywood Banshee', 'spells': ''}], utterances: ["(at lest you didn't aim at my drake xD", "You didn't tell the party members ahead of time)", 'And you need to make death saving throws', 'Can i attack now >_> )', 'You can use action to check the body though']""",
           "output": ['!a crossbow -t lb1']}

In [12]:
from datasets import Dataset
train_dataset = Dataset.from_pandas(train_data)

In [10]:
print(train_dataset[5])

{'before_utterances': ["*Kelrick and Oli you see the second Dire Wolf make its way out of the woods to the north it looks like it's about to jump the palisade*"], 'combat_state_before': [{'actions': 'Ki Points, Step of the Wind (Disengage), Fey Step, Unarmed Strike, Flurry of Blows, Step of the Wind (Dash), Patient Defense', 'attacks': 'Dart, Shortsword, Unarmed Strike, Unarmed Strike2', 'class': 'Monk 2', 'controller_id': '336032536808857706', 'description': None, 'effects': '', 'hp': '<15/15 HP; Healthy>', 'name': 'Ereykos Nailo', 'race': 'Eladrin (HB)', 'spells': ''}, {'actions': 'Tentacle of the Deeps: Move, Armor of Shadows, Create Pact Weapon, Tentacle of the Deeps: Attack, Tentacle of the Deeps: Summon, Mask of Many Faces, Chameleon Carapace (Change Color)', 'attacks': 'Tentacle of the Deep, Dagger, Dagger2, Greatclub, Quarterstaff, 2-Handed Quarterstaff, Rapier, Unarmed Strike, Tentacle of the Deeps: Attack', 'class': 'Warlock 4', 'controller_id': '224874611243730608', 'descrip

In [14]:
print(train_dataset['current_actor'][5]['name'])
print()
print(train_dataset['before_utterances'][5])
print()
print(train_dataset['commands_norm'][5])
print()
print(train_dataset['utterance_history'][5])

Karliah

["*Kelrick and Oli you see the second Dire Wolf make its way out of the woods to the north it looks like it's about to jump the palisade*"]

['!a rapier -title "Phantasmal Weapon" -t dw2']

['WO8: I know Artie was by the gate was Player 2 by the fire?', 'WO6: Player 2 is *in* artie piloting him lol', 'WO8: The two gates are 60ft apart so if you dashed yes', 'WO6: Hm. Ok. Will Sanctuary a Villager that looks likely to be attacked next and move normal movement.', "WO8: *Player 1 and Player 2 you see the second Dire Wolf make its way out of the woods to the north it looks like it's about to jump the palisade*"]


In [29]:
indx = 111
print(train_dataset['current_actor'][indx])
print()
print(train_dataset['before_utterances'][indx])
print()
print(train_dataset['combat_state_before'][indx])
print()
print(train_dataset['commands_norm'][indx])
print()
print(train_dataset['utterance_history'][indx])

{'actions': None, 'attacks': 'Crossbow, heavy (Thrushrach), Shortsword (Thrushrach), Shortsword2 (Thrushrach), Unarmed Strike (Thrushrach), Bite (Thrushrach), Bite (Edna)', 'class': None, 'controller_id': '0', 'description': None, 'effects': '', 'hp': '', 'name': "Th's Ranger ", 'race': 'Group', 'spells': ''}

["(at lest you didn't aim at my drake xD", "You didn't tell the party members ahead of time)", 'And you need to make death saving throws', 'Can i attack now >_> )', 'You can use action to check the body though']

[{'actions': 'Hungry Jaws, Drake Companion, Drake Companion: Command, Crossbow Expert Attack, Drake Companion: Summon, Primeval Awareness', 'attacks': 'Crossbow, heavy, Shortsword, Shortsword2, Unarmed Strike, Bite', 'class': 'Ranger 4', 'controller_id': '211023311098160087', 'description': None, 'effects': '', 'hp': '<32/32 HP; Healthy>', 'name': 'Thrushrach', 'race': 'Lizardfolk', 'spells': "Animal Friendship, Hunter's Mark, Thaumaturgy, Speak with Animals"}, {'actions

In [32]:
#train_dataset['commands_norm']

In [28]:
# Functions to parse through FIREBALL and format it for fine-tuning
import json


def write_jsonl(fname, json_objs):
    with open(fname, 'wt') as f:
        for o in json_objs:
            f.write(json.dumps(o)+"\n")

def load_jsonl(file_path):
    data = []
    with open(file_path, 'r') as file:
        for line in file:
            data.append(json.loads(line))
    return data

def strip_quotes(text):
    # Remove double and single quotes so the text can be embedded in a prompt
    return text.replace('"', '').replace("'", "")

def curr_state(jstr):
    # Flatten combat_state_before into "Player: name (class) (hp) (attacks) (spells)" entries
    combined_actors = []
    for item in jstr:
        cname = strip_quotes("Player: " + item["name"])
        cclass = '' if item["class"] is None else strip_quotes(item["class"])
        chp = item["hp"]
        cattack = strip_quotes(item["attacks"])
        cspells = strip_quotes(item["spells"])
        combined_actors.append(f"{cname} ({cclass}) ({chp}) ({cattack}) ({cspells})")
    return ' '.join(combined_actors)

#function to convert json file to input/output format for llama instruct
# (base variant: prompt + current player + game state + utterances)
def convert_for_llamainstruct_finetuning(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    prompt = '''You are a game agent in a DnD game. You will be provided with the current player, game state and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS]'''
    for item in fireball_data:
        # str(None) is 'None', so check the raw value before converting
        if item['current_actor'] is None:
            continue
        # Strip quotes so the dict text can be embedded in the prompt
        cactordata = str(item['current_actor']).replace('"', '').replace("'", "")
        cactor = " Current Player: " + cactordata + "[SEP]"
        cstate = " Game State: " + curr_state(item['combat_state_before']) + "[SEP]"
        utteritem = str(item['before_utterances']).replace('"', '').replace("'", '').replace("\\", '')
        utter = " Utterances : " + utteritem + "[SEP]"
        # Target: the normalized command list, e.g. "[!a crossbow -t lb1][SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{prompt}{cactor}{cstate}{utter} \nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs

#function to convert json file to input/output format for llama instruct
# (M2 variant: prompt + current player + utterances)
def convert_for_llamainstruct_finetuningM2(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    prompt = '''You are a game agent in a DnD game. You will be provided with the current player and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS]'''
    for item in fireball_data:
        # str(None) is 'None', so check the raw value before converting
        if item['current_actor'] is None:
            continue
        cactordata = str(item['current_actor']).replace('"', '').replace("'", "")
        cactor = " Current Player: " + cactordata + "[SEP]"
        utteritem = str(item['before_utterances']).replace('"', '').replace("'", '').replace("\\", '')
        utter = " Utterances : " + utteritem + "[SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{prompt}{cactor}{utter} \nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs

#function to convert json file to input/output format for llama instruct
# (M3 variant: prompt + current player + game state; the prompt text mentions
# utterances, but this variant does not include them)
def convert_for_llamainstruct_finetuningM3(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    prompt = '''You are a game agent in a DnD game. You will be provided with the current player and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS]'''
    for item in fireball_data:
        # str(None) is 'None', so check the raw value before converting
        if item['current_actor'] is None:
            continue
        cactordata = str(item['current_actor']).replace('"', '').replace("'", "")
        cactor = " Current Player: " + cactordata + "[SEP]"
        cstate = " Game State: " + curr_state(item['combat_state_before']) + "[SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{prompt}{cactor}{cstate}\nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs


#function to convert json file to message prompt format for GPT models
def convert_for_gpt_finetuning(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    system_message = "You are a game agent in a DnD game. You will be provided with the current player, game state and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state."
    json_objs = []
    for item in fireball_data:
        cactor = " Current Player: " + str(item['current_actor'])
        cstate = " Game State: " + curr_state(item['combat_state_before'])
        utter = " Utterances : " + ' '.join(item['before_utterances'])
        inputstr = f"{cactor}{cstate}{utter}"
        # commands_norm is a list of commands; message content must be a string
        item_output = "\n".join(item['commands_norm'])
        prompt = [{"role": "system", "content": system_message},
                  {"role": "user", "content": inputstr},
                  {"role": "assistant", "content": item_output}]
        json_objs.append({"messages": prompt})

    write_jsonl(output_path, json_objs)
    return json_objs

#function takes in pandas df with input and output datasets and converts to multichoice format for RoBERTa model training
def convert_to_multichoice(df):
    df = pd.DataFrame(df)
    rows = []
    for _, row in df.iterrows():
        truth = row['output']
        # Randomly select three distractors from outputs that differ from the ground truth
        distractors = df.loc[df['output'] != truth, 'output'].sample(n=3, replace=False).tolist()
        # Add the ground truth and shuffle so its position is random
        outputs = distractors + [truth]
        np.random.shuffle(outputs)
        rows.append({
            'sent2': row['input'],
            'ending0': outputs[0],
            'ending1': outputs[1],
            'ending2': outputs[2],
            'ending3': outputs[3],
            'label': outputs.index(truth),
        })
    # Build the DataFrame once instead of appending row by row
    return pd.DataFrame(rows, columns=['sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label'])
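
The key invariant of the multiple-choice format is that 'label' indexes the ground-truth ending after shuffling, and that no distractor duplicates it. A pure-Python sketch with a seeded random.Random makes this easy to test; make_choice_row, the seed, and the command pool are hypothetical:

```python
import random

def make_choice_row(prompt, truth, pool, rng, n_distractors=3):
    # Pick distractors that differ from the ground-truth command
    candidates = [o for o in pool if o != truth]
    endings = rng.sample(candidates, n_distractors) + [truth]
    rng.shuffle(endings)
    row = {"sent2": prompt, "label": endings.index(truth)}
    row.update({f"ending{i}": e for i, e in enumerate(endings)})
    return row

pool = ["!a crossbow -t lb1", "!i a fist -rr 2 -t SGq1", "!a rapier -t dw2",
        "!a rend -t kahlee", "!i a spirit -t FT1"]
rng = random.Random(0)
row = make_choice_row("Utterances : Can i attack now", "!a crossbow -t lb1", pool, rng)
print(row[f"ending{row['label']}"])  # !a crossbow -t lb1
```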


In [35]:
#function to convert json file to input/output format for llama instruct
# (M1 variant: current player + utterances; no instruction prompt is prepended)
def convert_for_llamainstruct_finetuningM1(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    for item in fireball_data:
        # str(None) is 'None', so check the raw value before converting
        if item['current_actor'] is None:
            continue
        cactordata = str(item['current_actor']).replace('"', '').replace("'", "")
        cactor = " Current Player: " + cactordata + "[SEP]"
        utteritem = str(item['before_utterances']).replace('"', '').replace("'", '').replace("\\", '')
        utter = " Utterances : " + utteritem + "[SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{cactor}{utter} \nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs

#function to convert json file to input/output format for llama instruct
# (M4 variant: current player only; no instruction prompt is prepended)
def convert_for_llamainstruct_finetuningM4(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    for item in fireball_data:
        # str(None) is 'None', so check the raw value before converting
        if item['current_actor'] is None:
            continue
        cactordata = str(item['current_actor']).replace('"', '').replace("'", "")
        cactor = " Current Player: " + cactordata + "[SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{cactor} \nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs

#function to convert json file to input/output format for llama instruct
# (M5 variant: utterances only; no instruction prompt is prepended)
def convert_for_llamainstruct_finetuningM5(json_filename, output_path):
    fireball_data = load_jsonl(json_filename)
    json_objs = []
    for item in fireball_data:
        # Skip records with no current actor, as the other variants do
        if item['current_actor'] is None:
            continue
        utteritem = str(item['before_utterances']).replace('"', '').replace("'", '').replace("\\", '')
        utter = " Utterances : " + utteritem + "[SEP]"
        cmdstr = (str(item['commands_norm']) + "[SEP]").replace("'", "")

        data = {"input": f"{utter} \nAvrae Command: ",
                "output": cmdstr}
        # Keep only records whose string form is at most 4000 characters
        if len(str(data)) < 4001:
            json_objs.append(data)
    write_jsonl(output_path, json_objs)
    return json_objs
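
The convert_for_llamainstruct_finetuning* functions differ only in which pieces they concatenate into 'input': the base version uses prompt + current player + game state + utterances, M1 player + utterances, M2 prompt + player + utterances, M3 prompt + player + game state, M4 player only, and M5 utterances only. A hypothetical table-driven sketch (VARIANTS and build_input are not in the notebook) keeps those choices in one place; it uses a single suffix, whereas M3 in the notebook omits the space before the newline:

```python
# Which components each converter concatenates into the "input" field
VARIANTS = {
    "base": ("prompt", "player", "state", "utter"),
    "M1": ("player", "utter"),
    "M2": ("prompt", "player", "utter"),
    "M3": ("prompt", "player", "state"),
    "M4": ("player",),
    "M5": ("utter",),
}

def build_input(variant, parts):
    # parts maps component name -> already formatted text (e.g. " Current Player: ...[SEP]")
    body = "".join(parts[name] for name in VARIANTS[variant])
    return body + " \nAvrae Command: "

parts = {"prompt": "P[CLS]", "player": " Current Player: X[SEP]",
         "state": " Game State: S[SEP]", "utter": " Utterances : U[SEP]"}
print(build_input("M2", parts))
```

With this layout, adding an ablation means adding one dictionary entry rather than copying a whole converter.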

In [12]:
from datasets import load_dataset                   


In [13]:
# Save the current training split to a JSONL file
import os
os.makedirs("data", exist_ok=True)  # make sure the output directory exists
train_dataset.to_json("data/fireball_train.jsonl")


451618751

In [14]:
# Now load the file you just saved and pass it to the formatting function
filename = "data/fireball_train.jsonl"
train_json_objs = convert_for_llamainstruct_finetuning(filename,"data/fireball_train_finetuning.jsonl")
type(train_json_objs)

list
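
The converted prompts contain a literal newline (" \nAvrae Command: "), yet each record must stay on one line of the JSONL file written by write_jsonl. A small sketch with an invented record confirms that json.dumps escapes the newline and that json.loads restores it:

```python
import json

record = {"input": "... Utterances : Can i attack now[SEP] \nAvrae Command: ",
          "output": "[!a crossbow -t lb1][SEP]"}

line = json.dumps(record)
# json.dumps writes the newline as the two characters backslash + n
print("\n" in line)                # False
print(json.loads(line) == record)  # True
```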

In [15]:
print(len(train_json_objs))

55110


In [16]:
# Check that the prompts were added and the records look ready for training
strt = str(train_json_objs[0])
print(len(strt))
print(strt)
#print(train_json_objs[0])

2964
{'input': 'You are a game agent in a DnD game. You will be provided with the current player, game state and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS] Current Player: {actions: None, attacks: Fist, Regeneration, class: None, controller_id: 241419117855225963, description: None, effects: Aid, hp: <143/152 HP; Injured>, name: SG3, race: Shield Guardian, spells: }[SEP] Game State: Player: Cerberus (Fighter 7) (<74/84 HP; Injured> (+11 temp)) (Glaive, Halberd, Handaxe, Handaxe2, Moonrazor, Shortsword, Unarmed Strike, Polearm Master - Opportunity Attack) (Burning Hands, Flame Blade, Produce Flame) Player: lair () () () () Player: RS1 () (<22/22 HP; Healthy>) (Bite) () Player: RS2 () (<22/22 HP; Healthy>) (Bite) () Player: RS3 () (<22/22 HP; Healthy>) (Bite) () Player: RS4 () (<22/

In [17]:
val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val.jsonl")
valfilename = "data/fireball_val.jsonl"
val_json_objs = convert_for_llamainstruct_finetuning(valfilename,"data/fireball_val_finetuning.jsonl")
print(val_json_objs[0])


In [18]:
strp = str(val_json_objs[0])
print(strp)
print(len(strp))
#print(len(val_json_objs[0]))

3236
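The conversion function keeps only records whose `str(...)` form is under 4001 characters, and the cells above print the length of single records (2964 and 3236). The sketch below summarizes the length distribution of a whole list using the same `len(str(record))` measure. The helper name is hypothetical. The measure counts characters, not model tokens. Because the converted lists are already filtered, the summary is mostly useful for comparing how close each prompt variant gets to the cutoff.

```python
from statistics import mean, median


def length_summary(records, limit=4001):
    """Summarize len(str(record)), the same measure the conversion filter uses."""
    lengths = [len(str(r)) for r in records]
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "mean": round(mean(lengths), 1),
        "max": max(lengths),
        # fraction of records that pass the `< limit` check used in the conversion step
        "under_limit": sum(n < limit for n in lengths) / len(lengths),
    }
```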


In [19]:
# Now repeat for the test data
test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test.jsonl")
testfilename = "data/fireball_test.jsonl"
test_json_objs = convert_for_llamainstruct_finetuning(testfilename,"data/fireball_test_finetuning.jsonl")
print(test_json_objs[0])

{'input': 'You are a game agent in a DnD game. You will be provided with the current player, game state and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS] Current Player: {actions: None, attacks: Beak, Talons, class: None, controller_id: 981381483504357172, description: None, effects: buff, hp: <165/248 HP; Injured>, name: air monster, race: Roc, spells: }[SEP] Game State: Player: Minion () (<100/100 HP; Healthy> (+4 temp)) (Club) () Player: Fiona (Cleric 20) (<258/258 HP; Healthy>) (Akivean, 2-Handed Akivean, Crossbow, light, Unarmed Strike) (Beacon of Hope, Telekinesis, Dispel Magic, Revivify, Mending, Dispel Evil and Good, Warding Bond, Dimension Door, Guardian of Faith, Conjure Elemental, Mass Heal, Magic Missile, Sanctuary, True Seeing, Word of Radiance, Summon Celestial, Sleep, 

### Convert to MultiChoice for RoBERTa model training


In [95]:
fb_df = pd.DataFrame(train_json_objs)
fb_df.head()

|   | input | output |
|---|-------|--------|
| 0 | You are a game agent in a DnD game. You will b... | `[!i a fist -rr 2 -t SGq1][SEP]` |
| 1 | You are a game agent in a DnD game. You will b... | `[!i a spirit -t FT1][SEP]` |
| 2 | You are a game agent in a DnD game. You will b... | `[!a rend -t kahlee][SEP]` |
| 3 | You are a game agent in a DnD game. You will b... | `[!a ace -t jester -d 2d8[fire] -d 10 -b -5 -f ...` |
| 4 | You are a game agent in a DnD game. You will b... | `[!a rapier -title "Phantasmal Weapon" -t dw2][...` |
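Each target in the `output` column wraps an Avrae command in brackets followed by `[SEP]`. Labels for a classifier, such as the RoBERTa multiple-choice setup this heading refers to, need that command unwrapped. The sketch below assumes one command per record. It unwraps the command and takes the tokens before the first `-` flag as a coarse intent label, for example `!i cast hex`. Both helpers are hypothetical, and lowercasing the label is a design choice so that `Fist` and `fist` count as the same intent.

```python
def unwrap_command(output):
    """'[!i a fist -rr 2 -t SGq1][SEP]' -> '!i a fist -rr 2 -t SGq1' (assumes one command)."""
    text = output.strip()
    if text.endswith("[SEP]"):
        text = text[: -len("[SEP]")]
    # remove only the outermost brackets so inner ones such as 2d8[fire] survive
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return text.strip()


def intent_label(command):
    """Tokens before the first flag, e.g. '!i cast hex -t Shiro' -> '!i cast hex'."""
    head = []
    for token in command.split():
        if token.startswith("-"):
            break
        head.append(token)
    return " ".join(head).lower()
```

Applied to the DataFrame above, `fb_df["output"].map(unwrap_command).map(intent_label)` would produce a label column that can be counted with `value_counts()`.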


In [29]:
# M1: before utterances + current player
from datasets import Dataset
train_dataset = Dataset.from_pandas(train_data)
train_dataset.to_json("data/fireball_train_m1.jsonl")
filename = "data/fireball_train_m1.jsonl"
train_json_objs = convert_for_llamainstruct_finetuningM1(filename,"data/fireball_train_finetuning_m1.jsonl")

val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val_m1.jsonl")
valfilename = "data/fireball_val_m1.jsonl"
val_json_objs = convert_for_llamainstruct_finetuningM1(valfilename,"data/fireball_val_finetuning_m1.jsonl")
print(val_json_objs[0])

test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test_m1.jsonl")
testfilename = "data/fireball_test_m1.jsonl"
test_json_objs = convert_for_llamainstruct_finetuningM1(testfilename,"data/fireball_test_finetuning_m1.jsonl")
print(test_json_objs[0])

{'input': ' Current Player: {actions: Channel Divinity: Conquering Presence, Divine Smite, Extra Attack, Hexblades Curse (Heal), Harness Divine Power, Hex Warrior, Divine Sense, Lay on Hands, Lay on Hands (Cleanse), Hexblades Curse, Channel Divinity, Aura of Protection, Headwinds: Gust of Wind, Channel Divinity: Guided Strike, attacks: Bokuto, 2-Handed Bokuto, Dagger, Dagger2, Javelin, Moon-Touched Sword, Longsword, 2-Handed Moon-Touched Sword, Longsword, Unarmed Strike, class: Paladin 6/Warlock 1, controller_id: 425255192845774907, description: None, effects: , hp: <66/66 HP; Healthy>, name: Vexx Tempest, race: Mark of Storm Half-Elf, spells: Spiritual Weapon, Magic Weapon, Sacred Flame, Armor of Agathys, Bless, Guidance, Eldritch Blast, Divine Favor, Searing Smite, Shield, Feather Fall, Gust, Booming Blade, Lesser Restoration, Gust of Wind, Hex, Command, Shield of Faith, Hold Person}[SEP] Utterances : [Shit did you do that?, .!i oc ba1 stealth , It looks like a normal, new book. The 

{'input': ' Current Player: {actions: None, attacks: Beak, Talons, class: None, controller_id: 981381483504357172, description: None, effects: buff, hp: <165/248 HP; Injured>, name: air monster, race: Roc, spells: }[SEP] Utterances : [Keeps shredding minion and will move 60ft up to be at 160total][SEP] \nAvrae Command: ', 'output': '[!i a bite -t min][SEP]'}
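The conversion functions `convert_for_llamainstruct_finetuningM1` to `M5` are defined earlier in the notebook. They appear to differ mainly in which context blocks they include. The printed samples suggest a common layout:

- each block ends in `[SEP]`, and blocks are separated by a space;
- the variants with the instruction prompt insert `[CLS]` after it;
- every input ends with `\nAvrae Command: `.

The sketch below is a single parameterized builder that reproduces that layout, judged only from the samples. The function name and arguments are hypothetical. The order of the game state and utterance blocks, when both are present, is an assumption, because the printed game-state samples are truncated.

```python
def build_input(current_player=None, game_state=None, utterances=None, instruction=None):
    """Assemble an input string in the layout seen in the printed M1-M5 samples.

    current_player and game_state are pre-rendered strings, and utterances is a
    list of str. Passing None leaves a block out, which is how the variants differ.
    """
    blocks = []
    if current_player is not None:
        blocks.append(f"Current Player: {current_player}[SEP]")
    if game_state is not None:
        blocks.append(f"Game State: {game_state}[SEP]")
    if utterances is not None:
        blocks.append(f"Utterances : [{', '.join(utterances)}][SEP]")
    # the instruction variants put [CLS] between the instruction and the blocks
    prefix = f"{instruction} [CLS]" if instruction else ""
    body = "".join(" " + block for block in blocks)
    return prefix + body + " \nAvrae Command: "
```

For instance, `build_input(utterances=["Keeps shredding minion and will move 60ft up to be at 160total"])` reproduces the M5 test input printed at the end of this section.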


In [31]:
# M2 conversion: before utterances, combat state before, current actor state
from datasets import Dataset
train_dataset = Dataset.from_pandas(train_data)
train_dataset.to_json("data/fireball_train_m2.jsonl")
filename = "data/fireball_train_m2.jsonl"
train_json_objs = convert_for_llamainstruct_finetuningM2(filename,"data/fireball_train_finetuning_m2.jsonl")

val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val_m2.jsonl")
valfilename = "data/fireball_val_m2.jsonl"
val_json_objs = convert_for_llamainstruct_finetuningM2(valfilename,"data/fireball_val_finetuning_m2.jsonl")
print(val_json_objs[0])

test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test_m2.jsonl")
testfilename = "data/fireball_test_m2.jsonl"
test_json_objs = convert_for_llamainstruct_finetuningM2(testfilename,"data/fireball_test_finetuning_m2.jsonl")
print(test_json_objs[0])

{'input': 'You are a game agent in a DnD game. You will be provided with the current player and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS] Current Player: {actions: Channel Divinity: Conquering Presence, Divine Smite, Extra Attack, Hexblades Curse (Heal), Harness Divine Power, Hex Warrior, Divine Sense, Lay on Hands, Lay on Hands (Cleanse), Hexblades Curse, Channel Divinity, Aura of Protection, Headwinds: Gust of Wind, Channel Divinity: Guided Strike, attacks: Bokuto, 2-Handed Bokuto, Dagger, Dagger2, Javelin, Moon-Touched Sword, Longsword, 2-Handed Moon-Touched Sword, Longsword, Unarmed Strike, class: Paladin 6/Warlock 1, controller_id: 425255192845774907, description: None, effects: , hp: <66/66 HP; Healthy>, name: Vexx Tempest, race: Mark of Storm Half-Elf, spells: Spiritual We

{'input': 'You are a game agent in a DnD game. You will be provided with the current player and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS] Current Player: {actions: None, attacks: Beak, Talons, class: None, controller_id: 981381483504357172, description: None, effects: buff, hp: <165/248 HP; Injured>, name: air monster, race: Roc, spells: }[SEP] Utterances : [Keeps shredding minion and will move 60ft up to be at 160total][SEP] \nAvrae Command: ', 'output': '[!i a bite -t min][SEP]'}


In [36]:
# M3 conversion: combat state before, current actor state
from datasets import load_dataset, Dataset

train_dataset = Dataset.from_pandas(train_data)
train_dataset.to_json("data/fireball_train_m3.jsonl")
filename = "data/fireball_train_m3.jsonl"
train_json_objs = convert_for_llamainstruct_finetuningM3(filename,"data/fireball_train_finetuning_m3.jsonl")
print(train_json_objs[0])  # print one record; the full list exceeds Jupyter's output rate limit
                                                         
val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val_m3.jsonl")
valfilename = "data/fireball_val_m3.jsonl"
val_json_objs = convert_for_llamainstruct_finetuningM3(valfilename,"data/fireball_val_finetuning_m3.jsonl")
print(val_json_objs[0])

test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test_m3.jsonl")
testfilename = "data/fireball_test_m3.jsonl"
test_json_objs = convert_for_llamainstruct_finetuningM3(testfilename,"data/fireball_test_finetuning_m3.jsonl")
print(test_json_objs[0])

{'input': 'You are a game agent in a DnD game. You will be provided with the current player and the in game player utterance. Your task is to predict what action, attack or spell the play intends. Then provide the AVRAE command that will initiate the players intention in a discord DnD game. Return the avrae command. Here is the game state. [CLS] Current Player: {actions: None, attacks: Beak, Talons, class: None, controller_id: 981381483504357172, description: None, effects: buff, hp: <165/248 HP; Injured>, name: air monster, race: Roc, spells: }[SEP] Game State: Player: Minion () (<100/100 HP; Healthy> (+4 temp)) (Club) () Player: Fiona (Cleric 20) (<258/258 HP; Healthy>) (Akivean, 2-Handed Akivean, Crossbow, light, Unarmed Strike) (Beacon of Hope, Telekinesis, Dispel Magic, Revivify, Mending, Dispel Evil and Good, Warding Bond, Dimension Door, Guardian of Faith, Conjure Elemental, Mass Heal, Magic Missile, Sanctuary, True Seeing, Word of Radiance, Summon Celestial, Sleep, Bestow Curse

In [33]:
#m4 current_player
train_dataset = Dataset.from_pandas(train_data)
train_dataset.to_json("data/fireball_train_m4.jsonl")
filename = "data/fireball_train_m4.jsonl"
train_json_objs = convert_for_llamainstruct_finetuningM4(filename,"data/fireball_train_finetuning_m4.jsonl")
print(train_json_objs[0])  # print one record; the full list exceeds Jupyter's output rate limit
                                                         
val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val_m4.jsonl")
valfilename = "data/fireball_val_m4.jsonl"
val_json_objs = convert_for_llamainstruct_finetuningM4(valfilename,"data/fireball_val_finetuning_m4.jsonl")
print(val_json_objs[0])

test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test_m4.jsonl")
testfilename = "data/fireball_test_m4.jsonl"
test_json_objs = convert_for_llamainstruct_finetuningM4(testfilename,"data/fireball_test_finetuning_m4.jsonl")
print(test_json_objs[0])

{'input': ' Current Player: {actions: Channel Divinity: Conquering Presence, Divine Smite, Extra Attack, Hexblades Curse (Heal), Harness Divine Power, Hex Warrior, Divine Sense, Lay on Hands, Lay on Hands (Cleanse), Hexblades Curse, Channel Divinity, Aura of Protection, Headwinds: Gust of Wind, Channel Divinity: Guided Strike, attacks: Bokuto, 2-Handed Bokuto, Dagger, Dagger2, Javelin, Moon-Touched Sword, Longsword, 2-Handed Moon-Touched Sword, Longsword, Unarmed Strike, class: Paladin 6/Warlock 1, controller_id: 425255192845774907, description: None, effects: , hp: <66/66 HP; Healthy>, name: Vexx Tempest, race: Mark of Storm Half-Elf, spells: Spiritual Weapon, Magic Weapon, Sacred Flame, Armor of Agathys, Bless, Guidance, Eldritch Blast, Divine Favor, Searing Smite, Shield, Feather Fall, Gust, Booming Blade, Lesser Restoration, Gust of Wind, Hex, Command, Shield of Faith, Hold Person}[SEP] \nAvrae Command: ', 'output': '[!i cast hex -t Shiro][SEP]'}


{'input': ' Current Player: {actions: None, attacks: Beak, Talons, class: None, controller_id: 981381483504357172, description: None, effects: buff, hp: <165/248 HP; Injured>, name: air monster, race: Roc, spells: }[SEP] \nAvrae Command: ', 'output': '[!i a bite -t min][SEP]'}


In [37]:
#m5 before utterances
train_dataset = Dataset.from_pandas(train_data)
train_dataset.to_json("data/fireball_train_m5.jsonl")
filename = "data/fireball_train_m5.jsonl"
train_json_objs = convert_for_llamainstruct_finetuningM5(filename,"data/fireball_train_finetuning_m5.jsonl")
print(train_json_objs[0])  # print one record; the full list exceeds Jupyter's output rate limit
                                                         
val_dataset = Dataset.from_pandas(validation_data)
val_dataset.to_json("data/fireball_val_m5.jsonl")
valfilename = "data/fireball_val_m5.jsonl"
val_json_objs = convert_for_llamainstruct_finetuningM5(valfilename,"data/fireball_val_finetuning_m5.jsonl")
print(val_json_objs[0])

test_dataset = Dataset.from_pandas(test_data)
test_dataset.to_json("data/fireball_test_m5.jsonl")
testfilename = "data/fireball_test_m5.jsonl"
test_json_objs = convert_for_llamainstruct_finetuningM5(testfilename,"data/fireball_test_finetuning_m5.jsonl")
print(test_json_objs[0])

{'input': ' Utterances : [Shit did you do that?, .!i oc ba1 stealth , It looks like a normal, new book. The pages are written in normal ink.][SEP] \nAvrae Command: ', 'output': '[!i cast hex -t Shiro][SEP]'}


{'input': ' Utterances : [Keeps shredding minion and will move 60ft up to be at 160total][SEP] \nAvrae Command: ', 'output': '[!i a bite -t min][SEP]'}
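The five M-variant cells repeat the same save-then-convert steps for each split, and small differences have crept in. For example, the original M2 cell reused `train_dataset` from the previous cell instead of rebuilding it. The sketch below drives all variants from one loop with the same file-naming pattern. Two parts are injected rather than assumed:

- `save_split` stands in for the `Dataset.from_pandas(...).to_json(path)` step.
- `converters` maps variant names to the notebook's `convert_for_llamainstruct_finetuningM*` functions.

`export_variants` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable, Dict, List


def export_variants(
    split_names: List[str],
    converters: Dict[str, Callable[[str, str], list]],
    save_split: Callable[[str, str], object],
    out_dir: str = "data",
) -> Dict[str, Dict[str, list]]:
    """Run every converter on every split using the notebook's file-naming pattern.

    save_split(split, path) writes the raw split to `path` as JSON Lines, and
    converters[variant](raw_path, finetune_path) returns the converted records.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results: Dict[str, Dict[str, list]] = {}
    for variant, convert in converters.items():
        results[variant] = {}
        for split in split_names:
            raw_path = f"{out_dir}/fireball_{split}_{variant}.jsonl"
            tuned_path = f"{out_dir}/fireball_{split}_finetuning_{variant}.jsonl"
            save_split(split, raw_path)  # rebuild the raw file for every variant
            results[variant][split] = convert(raw_path, tuned_path)
    return results


# Example wiring inside the notebook (assumes the M1-M5 converters and the
# train_data / validation_data / test_data DataFrames from earlier cells):
# from datasets import Dataset
# frames = {"train": train_data, "val": validation_data, "test": test_data}
# converters = {f"m{i}": globals()[f"convert_for_llamainstruct_finetuningM{i}"] for i in range(1, 6)}
# results = export_variants(
#     list(frames), converters,
#     lambda split, path: Dataset.from_pandas(frames[split]).to_json(path),
# )
```

Keeping the split and variant lists in one place makes it harder for one variant's cell to silently differ from the others.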
