# Prepare the fireball dataset for training

In [68]:
from datasets import load_dataset, Dataset
from tqdm import tqdm
import json

The dataset was extracted from the official `.tar.gz` archive, and the preprocessed version of the data is used here.
See the official [FIREBALL](https://github.com/zhudotexe/FIREBALL) GitHub repository.

It is now stored on my Hugging Face account.

In [2]:
dataset = load_dataset("JeremyArancio/fireball", split="train")
dataset


Dataset({
    features: ['before_utterances', 'commands_norm', 'automation_results', 'after_utterances', 'utterance_history'],
    num_rows: 153829
})

In [3]:
it = iter(dataset)

In [27]:
next(it)

{'before_utterances': [],
 'commands_norm': ['!a handaxe -t or3 adv'],
 'automation_results': ['Orance attacks with a Handaxe!\nOrance attacked OR3 and hit.\nOR3 took 8 damage.'],
 'after_utterances': ['The orc snarls at the puny hit from Orance "I thought better of you...you\'re a disgrace to all Orcs!"'],
 'utterance_history': ['Player 3 of Twilight [6]: Also enraged from watching his leader fall, he looks towards Guldar. He raises his axe...',
  'Player 3 of Twilight [6]: The orcs miss as the tiny dwarf is too quick on his feat. They reply " The only thing i\'ll learn from you is what your insides taste like!" *he snarls*',
  'Player 3 of Twilight [6]: Dropping the drums ealier, he swings with ferocity towards Cali',
  'Player 3 of Twilight [6]: The orc looks at Cali as he glances off her arm. "you\'re sure to pay for this!"',
  'Player 3 of Twilight [6]: (you make movement to them then ya?']}

Action steps:
1. `before_utterances` comes before the command (it is also stored in `utterance_history`).
2. The command and its outcome (`commands_norm` / `automation_results`).
3. The narrated result of the action, found either on the next line (the next row) **or** in `after_utterances`. In the latter case, `after_utterances` is not stored in the history.


```txt
'before_utterances': ['The orc assesses the battle field, and sees all of his comrades have been slain. He looks at Cali with blood all over her mouth and takes a swing at her']

'commands_norm': ['!i aoo OR2 greataxe -t cali'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Cali Burn and hit.\nCali Burn took 8 damage.']

Next row
---------------------------------------------------
'before_utterances': [],
'commands_norm': ['!i a greataxe -t Cali'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Cali Burn and hit.\nCali Burn took 7 damage.']
'after_utterances': ['"I\'ll take you with me devil!" the orc screams as he hits Cali twice with his Greataxe'],

Next row
---------------------------------------------------
'before_utterances': ['Seeing his ineveitable doom as the party closes in on him, the Orc lets out a roar and beats his chest "For GLORY!"',
  '"Lets get it over with...slay this beast" *he says*',
  '"Put it right between his eyes!"',
  'She then position herself going for the right spot then she throws that javelin trying to aim for between the eyes']

'commands_norm': ['!a javelin -t or2 adv'],
'automation_results': ['Orance attacks with a Javelin!\nOrance attacked OR2 and hit.\nOR2 took 10 damage.']

'after_utterances': ['The javelin flies through the air, not hitting him in the face, but sticking him in the shoulder. He breaks it off as he pushes through the pain']

Next row
---------------------------------------------------
'before_utterances': ['Actually…you know what? Screw it I will hit him with my own Greataxe! ‘She suddenly move fast as soon she begin a FRENZY of two attacks!’']


'commands_norm': ['!a Frenzy'],
'automation_results': ['Orance uses Frenzy Rage!\nOrance gained Rage.']

'after_utterances': [],

Next row
---------------------------------------------------
'before_utterances': ['"Damn you! " he screams as he swings his axe at Orance'],

'commands_norm': ['!i aoo OR2 greataxe -t Orance'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Orance and hit.\nOrance took 7 damage.']

'after_utterances': ['The Orc connects solidly, but Orance seemed unphased'],

```
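The three action steps above can be collapsed into one (before, action, after) triplet per row. The following is a minimal sketch, not the method used later in this notebook: the helper name `to_triplet` and the choice to join list items with newlines are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, List[str]]


def to_triplet(row: Row) -> Tuple[str, str, str]:
    """Collapse one row into (before, action, after) strings."""
    before = "\n".join(row["before_utterances"])
    # The action step pairs each normalised command with its automation result.
    action = "\n".join(row["commands_norm"] + row["automation_results"])
    after = "\n".join(row["after_utterances"])
    return before, action, after


# Last row of the example above, copied as a plain dict.
row = {
    "before_utterances": ['"Damn you! " he screams as he swings his axe at Orance'],
    "commands_norm": ["!i aoo OR2 greataxe -t Orance"],
    "automation_results": [
        "OR2 attacks with a Greataxe!\nOR2 attacked Orance and hit.\nOrance took 7 damage."
    ],
    "after_utterances": ["The Orc connects solidly, but Orance seemed unphased"],
}

before, action, after = to_triplet(row)
print(before)
print(action)
print(after)
```

An empty `before_utterances` or `after_utterances` list yields an empty string, which makes incomplete triplets easy to spot.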

## Preparation process

Sometimes `before_utterances` or `after_utterances` is empty.
Because the story continues across lines of the JSONL file (one event per line), we will build a dataset in which every event has the required triplet: the utterance before, the command, and the utterance after.
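Before filling anything in, it can help to measure how often each side is actually missing. A minimal sketch, assuming rows are plain dicts with the same keys as the dataset; the helper name `count_missing` and the toy rows are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, List[str]]


def count_missing(rows: Iterable[Row]) -> Dict[str, int]:
    """Count rows whose before/after utterance list is empty."""
    counts = {"before_utterances": 0, "after_utterances": 0}
    for row in rows:
        for key in counts:
            if not row[key]:
                counts[key] += 1
    return counts


# Toy rows standing in for the real data (hypothetical content).
toy_rows = [
    {"before_utterances": ["a"], "after_utterances": []},
    {"before_utterances": [], "after_utterances": ["b"]},
    {"before_utterances": ["c"], "after_utterances": ["d"]},
]

missing = count_missing(toy_rows)
print(missing)
```

Iterating over a `Dataset` yields one dict per row, so the loaded `dataset` could be passed in place of `toy_rows`.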

In [111]:
def fill_up_before_after_utterances(dataset: Dataset) -> Dataset:
    new_dataset = dataset
    last = len(dataset) - 1
    for i in tqdm(range(len(dataset))):
        # Empty before_utterances: reuse the previous row's after_utterances
        if i > 0 and not dataset[i]["before_utterances"]:
            new_dataset[i]["before_utterances"] = dataset[i - 1]["after_utterances"]
        # Empty after_utterances: use the next non-empty before_utterances
        if i < last and not dataset[i]["after_utterances"]:
            j = 1
            # Stop at the last row to avoid indexing past the end
            while i + j < last and not dataset[i + j]["before_utterances"]:
                j += 1
            new_dataset[i]["after_utterances"] = dataset[i + j]["before_utterances"]
        if i > 1:
            assert new_dataset[i]["before_utterances"] and new_dataset[i]["after_utterances"], f"""Utterances missing at index {i}\n{new_dataset[i]} """

    return new_dataset

In [112]:
%%time
new_dataset = fill_up_before_after_utterances(dataset)

  0%|          | 6/153829 [00:00<15:01, 170.55it/s]


AssertionError: Utterances missing at index 6
{'before_utterances': ['The second orc takes his swing at Guldar....'], 'commands_norm': ['!i a greataxe -t guldar'], 'automation_results': ['OR4 attacks with a Greataxe!\nOR4 attacked Guldar Battleglug but missed.\n'], 'after_utterances': [], 'utterance_history': ['Player 3 of Twilight [6]: The second Orc swings almost simultaneously as the first.', 'Player 3 of Twilight [6]: The orcs to the east make their way towards the party, charging down the hill!', 'Player 3 of Twilight [6]: The orcs are within melee range of Cali, and Guldar. The orc lifts its axe with murderous intent....', 'Player 3 of Twilight [6]: The Orc connects, hitting Cali squarely', 'Player 3 of Twilight [6]: The second orc takes his swing at Guldar....']} 
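
The assertion fires at index 6, the first row checked that has an empty `after_utterances`. The most likely cause is that indexing a `Dataset` with an integer returns a freshly built `dict`, so `new_dataset[i][...] = ...` modifies a temporary object and nothing is written back. A minimal sketch of the same filling logic on a plain list of row dicts follows; the name `fill_utterances` and the toy rows are hypothetical.

```python
import copy
from typing import Dict, List

Row = Dict[str, List[str]]


def fill_utterances(rows: List[Row]) -> List[Row]:
    """Return a copy of rows with empty utterance lists filled from neighbours."""
    filled = copy.deepcopy(rows)  # leave the input untouched
    n = len(rows)
    for i in range(n):
        # Empty "before": reuse the narration that followed the previous command.
        if i > 0 and not rows[i]["before_utterances"]:
            filled[i]["before_utterances"] = list(rows[i - 1]["after_utterances"])
        # Empty "after": take the first non-empty "before" of a later row.
        if not rows[i]["after_utterances"]:
            for j in range(i + 1, n):
                if rows[j]["before_utterances"]:
                    filled[i]["after_utterances"] = list(rows[j]["before_utterances"])
                    break
    return filled


# Toy rows (hypothetical content); the middle row is missing both sides.
rows = [
    {"before_utterances": ["a"], "after_utterances": ["b"]},
    {"before_utterances": [], "after_utterances": []},
    {"before_utterances": ["c"], "after_utterances": ["d"]},
]

filled = fill_utterances(rows)
print(filled[1]["before_utterances"])  # ['b']
print(filled[1]["after_utterances"])   # ['c']
```

The result can be converted back with `Dataset.from_list(filled)`. Rows at the boundaries, or rows whose neighbours are empty too, can still end up incomplete and would need to be filtered out or handled separately.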

In [103]:
next(it)

{'before_utterances': ['Salients fingertips start to glow as he watches Cali almost rip the chiefs throat out.'],
 'commands_norm': ['!cast magic missile -l 2 -t or5 -t or5 -t or5 -t or1'],
 'automation_results': ['Salient of Twilight casts Magic Missile!\nOR5 took 5 damage.\nOR5 took 5 damage.\nOR5 took 5 damage.\nOR1 took 5 damage.'],
 'after_utterances': ["Salient's darts shoot from his hand, 3 hit the chief in a row, impaling his skull and spilling his brains onto the grass...The 4th dart hits The orc in front of him squarly in the chest and leaves a singe mark on his armor."],
 'utterance_history': ['Player 3 of Twilight [6]: OR1 and 2 scream and panic due to dreadful thoughts of not dying in combat....but instead dying to a freak axe sharpening accident',
  'Player 3 of Twilight [6]: As Cali approaches the Orc Chief, he takes blow after blow until he is taken to his belly. Cali bites him on the back of his neck causing a tremendous wound.',
  'Player 3 of Twilight [6]: "The orc l