# Prepare the fireball dataset for training

In [2]:
from datasets import load_dataset, Dataset
from tqdm import tqdm



The dataset was extracted from the official .tar.gz archive, and the preprocessed version of the data is used here.
See the official GitHub repository of [FIREBALL](https://github.com/zhudotexe/FIREBALL).  

It is now stored on the Hugging Face Hub.

In [3]:
dataset = load_dataset("JeremyArancio/fireball", split="train")
dataset


Dataset({
    features: ['before_utterances', 'commands_norm', 'automation_results', 'after_utterances', 'utterance_history'],
    num_rows: 153829
})

In [4]:
it = iter(dataset)

In [54]:
next(it)

{'before_utterances': ['"c\'mon now we\'ve got em\'! just finish him off!"'],
 'commands_norm': ['!a Greataxe -t or4 adv'],
 'automation_results': ['Orance attacks with a Greataxe!\nOrance attacked OR4 and hit.\nOR4 took 14 damage.'],
 'after_utterances': [],
 'utterance_history': ['Player 3 of Twilight [6]: (not even in round 4',
  'Player 3 of Twilight [6]: (probably i havent seen it yet',
  'Fredbear (Zal 6)(Player 2 6): Hehehe let’s see if you can handle this weapon of mine?',
  'Fredbear (Zal 6)(Player 2 6): ‘She begin to put both her javelin and her shield away into her back while inrage then brought a big great axe and go for a double strike against this or4’',
  'Player 3 of Twilight [6]: "c\'mon now we\'ve got em\'! just finish him off!"']}

Action steps:
1. *before_utterances* come before the command (they are also stored in *utterance_history*).
2. The command itself (*commands_norm*) and its automated result (*automation_results*).
3. The narration of the result, found either in the next row **or** in *after_utterances*. In the latter case, the after utterance is not stored in *utterance_history*.


```txt
'before_utterances': ['The orc assesses the battle field, and sees all of his comrades have been slain. He looks at Cali with blood all over her mouth and takes a swing at her']

'commands_norm': ['!i aoo OR2 greataxe -t cali'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Cali Burn and hit.\nCali Burn took 8 damage.']

Next row
---------------------------------------------------
'before_utterances': [],
'commands_norm': ['!i a greataxe -t Cali'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Cali Burn and hit.\nCali Burn took 7 damage.']
'after_utterances': ['"I\'ll take you with me devil!" the orc screams as he hits Cali twice with his Greataxe'],

Next row
---------------------------------------------------
'before_utterances': ['Seeing his ineveitable doom as the party closes in on him, the Orc lets out a roar and beats his chest "For GLORY!"',
  '"Lets get it over with...slay this beast" *he says*',
  '"Put it right between his eyes!"',
  'She then position herself going for the right spot then she throws that javelin trying to aim for between the eyes']

'commands_norm': ['!a javelin -t or2 adv'],
'automation_results': ['Orance attacks with a Javelin!\nOrance attacked OR2 and hit.\nOR2 took 10 damage.']

'after_utterances': ['The javelin flies through the air, not hitting him in the face, but sticking him in the shoulder. He breaks it off as he pushes through the pain']

Next row
---------------------------------------------------
'before_utterances': ['Actually…you know what? Screw it I will hit him with my own Greataxe! ‘She suddenly move fast as soon she begin a FRENZY of two attacks!’']


'commands_norm': ['!a Frenzy'],
'automation_results': ['Orance uses Frenzy Rage!\nOrance gained Rage.']

'after_utterances': [],

Next row
---------------------------------------------------
'before_utterances': ['"Damn you! " he screams as he swings his axe at Orance'],

'commands_norm': ['!i aoo OR2 greataxe -t Orance'],
'automation_results': ['OR2 attacks with a Greataxe!\nOR2 attacked Orance and hit.\nOrance took 7 damage.']

'after_utterances': ['The Orc connects solidly, but Orance seemed unphased'],

```
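The ordering described in the action steps can be sketched as a small helper that lays out one row chronologically. This is only an illustration: `event_timeline` is a hypothetical name, and pairing each command with the result at the same position is an assumption about the data, not something verified here.

```python
def event_timeline(row):
    """Return the parts of one event as (stage, text) pairs, in chronological order."""
    timeline = [("before", u) for u in row["before_utterances"]]
    # Assumption: commands_norm and automation_results are aligned one-to-one.
    for command, result in zip(row["commands_norm"], row["automation_results"]):
        timeline.append(("command", command))
        timeline.append(("result", result))
    timeline.extend(("after", u) for u in row["after_utterances"])
    return timeline


# Second row of the example above: no before utterance, narration in after_utterances.
row = {
    "before_utterances": [],
    "commands_norm": ["!i a greataxe -t Cali"],
    "automation_results": [
        "OR2 attacks with a Greataxe!\nOR2 attacked Cali Burn and hit.\nCali Burn took 7 damage."
    ],
    "after_utterances": [
        '"I\'ll take you with me devil!" the orc screams as he hits Cali twice with his Greataxe'
    ],
}

for stage, text in event_timeline(row):
    print(f"[{stage}] {text}")
```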

In [7]:
df = dataset.to_pandas(batch_size=200)
df[100:110]

Unnamed: 0,before_utterances,commands_norm,automation_results,after_utterances,utterance_history
100,[],"[!a bop -t BB3 magical adv\n-title ""[name] rec...",[Riena recklessly attacks with a Bop!\nRiena a...,[Razor.*Riena would then try to pluck bb3 out ...,[Player 5: !a battle -t 3 -d 3 -d 4d6]
101,"[*Since riena doesnt want, not need to kill th...","[!a bop -t BB6 -rr 3 magical adv\n-title ""[nam...",[Riena recklessly attacks with a Bop!\nRiena a...,[],"[Player 5: !a battle -t 3 -d 3 -d 4d6, Player ..."
102,[*Only a few left shell **violently** put the ...,"[!a bop -t BB4 magical adv\n-title ""[name] rec...",[Riena recklessly attacks with a Bop!\nRiena a...,[],"[Player 5: !a battle -t 3 -d 3 -d 4d6, Player ..."
103,[],[!a trip -t bb8 -d -2],[Riena uses Maneuvers: Trip Attack!\nBB8 took ...,[*As the bee falls shell try to grab it and sa...,"[Player 5: !a battle -t 3 -d 3 -d 4d6, Player ..."
104,"[*turns and kisses Revas goodbye*\n""I'll be ba...",[!cast synaptic],[Valerie Black casts Synaptic Static!\n],[],"[Player 7: The drone falls to the ground, Play..."
105,[You do but the stats are worse than any of yo...,[!cast hex -t 1 -i],[Gwyn Woodborn casts Hex!\nGwyn Woodborn gaine...,[],[Player 3: You do but the stats are worse than...
106,[_pops out some claws and strikes at the beast_],"[!cast savagery -t 1 -d ""1d6 [hex necrotic]"" -...",[Gwyn Woodborn casts Primal Savagery!\nGwyn Wo...,[],"[Player 3: 8 hour buff if you got them, Player..."
107,"[""You know, I don't usually do dog meat, but.....",[!a greatsword -t BigDog1 -rr2 adv],"[Mick ""Ram"" Gordon attacks with a Forsaken Fab...",[],"[Player 3: 30ft away kick us off!, Player 3: ""..."
108,[_He pull out his gun and aim at the not cute ...,"[!a pepper eadv -t bigdog1 -d 3d6 -f ""Sneak At...",[The Jester attacks with a Pepperbox (Exandria...,[],"[Player 3: ""What the hell are those?!?!"", Lady..."
109,[goes to attack 1 recklessly],"[!a claw magical -phrase ""Once per turn when y...",[Gorb attacks with a Claws!\nGorb attacked Big...,[],[Lady Gwen_Player 4 Dd5|Mk5: _heads towards 1 ...


In [35]:
# How many rows have at least one after_utterance?
print(f"{dataset.num_rows = }")

def has_after_utterance(example):
    return len(example["after_utterances"]) > 0

print(f"Num rows of filtered data: {dataset.filter(has_after_utterance).num_rows}")

dataset.num_rows = 153829
Num rows of filtered data: 43372




This appears to match what the paper describes for next-utterance prediction (~44,000 examples): 43,372 of 153,829 rows, about 28%, have an after utterance.

## Observations

### Issues
1. Sometimes *before_utterances* or *after_utterances* is missing. Because the story can continue across rows (JSONL lines, i.e. events), we will build a dataset in which every event has the required triplet. A row may also contain no utterance at all.

2. Utterances (and the utterance history) are sometimes out-of-character; check the [paper](https://arxiv.org/pdf/2305.01528.pdf) to understand this point.

3. In *utterance_history*, player names do not follow any consistent structure.

4. Utterances contain "*" characters.

5. *before_utterances* and *utterance_history* can both be missing in the same event.

6. Avrae commands appear in *utterance_history*.

### Solutions
1. 
    * If there is no *before_utterances*, take the last utterance in *utterance_history*
    * If there is no *after_utterances*, drop the event (as in the paper)

2. ~ (no solution yet)

3. Remove player names from *utterance_history*

4. Remove "*" from utterances

5. Drop the event when both *before_utterances* and *utterance_history* are missing
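The solutions above can be sketched as follows. This is a minimal illustration, not the content of the actual preparation script: `strip_speaker`, `clean`, and `prepare_event` are hypothetical names, and splitting on the first `": "` is an assumed heuristic for removing player names. That heuristic will also truncate a speaker-less utterance that happens to contain `": "`. Out-of-character lines and Avrae commands in the history are not handled here.

```python
def strip_speaker(line):
    """Drop a leading 'Speaker: ' prefix (assumes ': ' separates name and text)."""
    _speaker, sep, text = line.partition(": ")
    return text if sep else line


def clean(text):
    """Remove '*' markers and surrounding whitespace (solution 4)."""
    return text.replace("*", "").strip()


def prepare_event(row):
    """Return a cleaned event dict, or None when the event should be dropped."""
    if not row["after_utterances"]:
        return None  # solution 1: no after utterance, drop the event
    history = [clean(strip_speaker(u)) for u in row["utterance_history"]]  # solution 3
    before = [clean(u) for u in row["before_utterances"]]
    if not before:
        if not history:
            return None  # solution 5: neither before utterance nor history
        before = [history[-1]]  # solution 1: fall back to the last history entry
    return {
        "before": "\n".join(before),
        "command_description": "\n".join(row["automation_results"]),
        "after": "\n".join(clean(u) for u in row["after_utterances"]),
        "history": history,
    }


example = {
    "before_utterances": [],
    "commands_norm": ["!a Greataxe -t or4 adv"],
    "automation_results": ["Orance attacks with a Greataxe!"],
    "after_utterances": ["*The Orc connects solidly*"],
    "utterance_history": ["Player 3 of Twilight [6]: just finish him off!"],
}
print(prepare_event(example))
```

With Hugging Face `datasets`, one way to apply such a function would be to `filter` on `prepare_event(row) is not None` and then `map` it over the remaining rows.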

## Process the dataset

The script that processes the dataset is located at `scripts/fireball_preparation.py`

## Dataset-V1 visualization 

In [31]:
from datasets import load_dataset
import json
from random import randint

dataset = load_dataset("JeremyArancio/fireball_v1", split="train")


In [35]:
n = randint(0, len(dataset) - 1)  # randint is inclusive on both ends
print(f"PROMPT:\n{dataset[n]['prompt']}")
print(f"PREDICTION:\n{dataset[n]['prediction']}")

PROMPT:

    Last utterance:
    3/6 it's in pings. I have marked how much it has left

    Command description:
    Kain Heisenberg uses Divine Smite!
Abaddon took 24 damage.
Abaddon took 28 damage.

    History:
    On Player 2's third to last strike, another one of the purple streaks on his wing dims. 2/4

He Glares at Player 2
"You have been causing me the most amount of problems."

You see one of the streaks on his wings light up, and another one of his threads burn up again

Another thread on its crown dims again

3/6 it's in pings. I have marked how much it has left
    
PREDICTION:
On Kain's first hit, another purple streak dims

You all see his sword become consumed in a pure void of emptiness as he brings it down into the ground, the feeling of the forbiddance now dissappearing
"Now we take this seriously... Today Kain, your soul will join the rest of those I've taken."

Reality around you begins to shift as it seems like the walls of the bunker and everything begins to crumb
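
The prompt layout shown in the output above could be produced by a template like the one below. This is a hedged sketch: only the three section labels come from the output, while `PROMPT_TEMPLATE`, `build_prompt`, and joining history entries with a blank line are assumptions. The indentation visible in the printed prompt is left out.

```python
# Hypothetical template; only the section labels are taken from the printed prompt.
PROMPT_TEMPLATE = (
    "Last utterance:\n{last_utterance}\n\n"
    "Command description:\n{command_description}\n\n"
    "History:\n{history}"
)


def build_prompt(last_utterance, command_description, history):
    """Assemble a prompt from the last utterance, the command result, and the history."""
    return PROMPT_TEMPLATE.format(
        last_utterance=last_utterance,
        command_description=command_description,
        # Assumption: history entries are separated by a blank line.
        history="\n\n".join(history),
    )


prompt = build_prompt(
    last_utterance="3/6 it's in pings. I have marked how much it has left",
    command_description="Kain Heisenberg uses Divine Smite!\nAbaddon took 24 damage.",
    history=[
        "Another thread on its crown dims again",
        "3/6 it's in pings. I have marked how much it has left",
    ],
)
print(prompt)
```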

In [27]:
print("hello\n\nworld")

hello

world
