# GOBLIN TIME PARTY TIME roll parser
### September 2023

My D&D group has had 2 hella long campaigns: "Gods' Blood Shed" (2012-2017) and "The Athenaeum" (2018-present). I eventually want to look at the rolls for both, and those notebooks will be basically identical to this one, but I wanted to start with the time I ran the "We Be Goblins" Pathfinder module ("humorously", imo, named GOBLIN TIME PARTY TIME on Roll20). This was a one-night game and I built in a lot of things for the players, so hopefully there's more consistency. It's also obviously much shorter, so things should run faster and give me a chance to work through issues without everything taking forever.

Generally speaking, in all of these campaigns we didn't really use Roll20's built-in character sheet options, so our rolls are more "raw" than they seem to be in the [inspiration for this project](https://github.com/axlan/roll20-chatlog-stats/tree/main). I should be able to reuse some of his work, but it'll need lots of modifications.

I'm choosing to ignore inline rolls from character sheets (for this test at least) since we have one (1) of those in this game and they're handled differently. I'm also only looking at players, not the GM (for both this test and in general) because for the other games I *wasn't* the GM and the chat logs I have don't include his rolls.

I've got the chat log downloaded from Roll20 as a single page and saved locally on my computer.

This is one roll
![screenshot of html](../a_roll.png)

Important note! Crit successes and crit fails look different from normal rolls (for d20 and d_other): note the `critsuccess` and `critfail` (will need to work out how to handle this)

![screenshot of nat20](../nat20.png)
![screenshot of nat1](../nat1.png)

From a roll I will eventually need
* the `data-playerid` matched to the player 
    - we weren't always consistent about the names, so the player id is going to be the only consistent way to do this
    - make a table of `data-playerid`s and `class="by"`s
* the `class="diceroll d20"` or `d##` whatever number
    - I want to look at all the different dice that got rolled, not just the d20s
    - will need to check `crit*` to see if that will matter in my scraping
* the `class="didroll"` text to get the actual roll result for each die
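A rough sketch of what pulling those pieces out of a single message might look like. The HTML here is a made-up, stripped-down stand-in built only from the class and attribute names listed above (the ids, the name, and the nesting are assumptions, not copied from the real log), so treat it as an illustration rather than the actual Roll20 markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified roll message: real Roll20 markup has more in it.
sample_html = """
<div class="message rollresult player--abc123" data-playerid="abc123">
  <span class="by">Example Goblin:</span>
  <div class="diceroll d20 critsuccess"><div class="didroll">20</div></div>
  <div class="diceroll d4"><div class="didroll">3</div></div>
</div>
"""

sample = BeautifulSoup(sample_html, 'html.parser')
message = sample.find('div', class_='rollresult')

player_id = message['data-playerid']                  # consistent per player
display_name = message.find('span', class_='by').text  # may change mid-game

rolls = []
for die_div in message.find_all('div', class_='diceroll'):
    classes = die_div['class']   # e.g. ['diceroll', 'd20', 'critsuccess']
    die = classes[1]             # die type sits right after 'diceroll'
    result = int(die_div.find('div', class_='didroll').text)
    # keep any crit marker as its own field instead of losing it
    crit = next((c for c in classes if c.startswith('crit')), None)
    rolls.append((die, result, crit))

print(player_id, display_name, rolls)
```

If the crit classes always come after the die type, as they do in this stand-in, they shouldn't get in the way of reading the die type by position, and they can be kept as a separate flag if they turn out to be interesting.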


In [92]:
from bs4 import BeautifulSoup

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn; seaborn.set()

# I locally saved the one page version of the chat log just to simplify
with open('../Chat Log for GOBLIN TIME PARTY TIME.html', mode='r', encoding="utf8") as chatlog:
    html_doc = chatlog.read()

soup = BeautifulSoup(html_doc, 'html.parser')

In [11]:
# every roll message is a div whose class list includes 'rollresult';
# class_ matches a div if any one of its classes equals the given name
roll_messages = soup.find_all('div', class_='rollresult')

In [70]:
player_id_ref = []
rollers = []

for message in roll_messages:
    # the "by" is not always populated if the person has sent
    # multiple messages in a row
    if message.find('span', {"class":"by"}) is not None:
        
        # in the roll messages the class list always looks like
        # ['message', 'rollresult', other modifiers, ..., player--letters_etc]
        # so grab the last one
        p_id = message["class"][-1]
        
        # get the display name
        r_name = message.find('span', {"class":"by"}).text
        
        player_id_ref.append(p_id)
        rollers.append(r_name)
        
id_to_name = pd.DataFrame({'player_id':player_id_ref,
                           'roller':rollers}).drop_duplicates().sort_values(by=['player_id'])

id_to_name

Unnamed: 0,player_id,roller
11,player--N_Z4XzpmqSD1Cmb6ESj,Meredith S. (GM):
0,player--Na9_XYpVB_KcdD88kR0,Chuffy:
34,player--Na9_XYpVB_KcdD88kR0,Chuffy Lickwound:
2,player--Na9ao63L8CSB_no01ch,Ryan N.:
6,player--Na9ao63L8CSB_no01ch,POOG:
89,player--Na9ao63L8CSB_no01ch,Poog of Zarongel:
1,player--Na9au2P-jbgj7brQ7hx,Mogmurch:
3,player--Na9avp1nLcwR0AVnI-f,Sasha S.:
31,player--Na9avp1nLcwR0AVnI-f,Reta Bigbad:


In [71]:
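# put in our actual human names,
# accounting for repeats/different spellings or the Roll20 names
# NOTE: this list is positional, so it has to match the row order
# of the sorted table above exactly
id_to_name['player']=['Mere (GM)', 
                      'Alex', 'Alex', 
                      'Ryan','Ryan','Ryan', 
                      'Mike', 
                      'Sasha', 'Sasha']

# drop the display names, collapse to one row per player id, renumber the rows
id_to_name = id_to_name.drop('roller', axis=1).drop_duplicates().reset_index(drop=True)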
id_to_name

Unnamed: 0,player_id,player
0,player--N_Z4XzpmqSD1Cmb6ESj,Mere (GM)
1,player--Na9_XYpVB_KcdD88kR0,Alex
2,player--Na9ao63L8CSB_no01ch,Ryan
3,player--Na9au2P-jbgj7brQ7hx,Mike
4,player--Na9avp1nLcwR0AVnI-f,Sasha
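The positional list in the cell above only works as long as the sorted table keeps the same row order, which probably won't hold for the longer campaigns. One possible alternative, sketched here with the ids and names from the table above, is to key the names on the player id directly. The names `player_names` and `id_to_name_alt` are made up for this example.

```python
import pandas as pd

# player id -> human name, copied from the table above
player_names = {
    'player--N_Z4XzpmqSD1Cmb6ESj': 'Mere (GM)',
    'player--Na9_XYpVB_KcdD88kR0': 'Alex',
    'player--Na9ao63L8CSB_no01ch': 'Ryan',
    'player--Na9au2P-jbgj7brQ7hx': 'Mike',
    'player--Na9avp1nLcwR0AVnI-f': 'Sasha',
}

# turn the dict into the same two-column shape as id_to_name
id_to_name_alt = (
    pd.Series(player_names, name='player')
      .rename_axis('player_id')
      .reset_index()
)
print(id_to_name_alt)
```

This doesn't care how many display names each person used or how the rows happen to be sorted, at the cost of having to copy the ids in by hand once.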


In [78]:
player_ids = [] #roll20 unique id to match
die_type = []
roll_result = []

for message in roll_messages:
    
    # in the roll messages the class list always looks like
    # ['message', 'rollresult', other modifiers, ..., player--alphanumeric]
    # so grab the last one
    p_id = message["class"][-1]
    
    # in the diceroll the class list always looks like
    # ['diceroll', 'dwhatever', other modifiers eg critfail, ...]
    # so grab the position 1 element in that list
    # will be a list of dice types depending on how many rolled
    dice = [div["class"][1] for div in message.find_all('div', class_="diceroll")]
    
    # the 'didroll' text contains the raw dice roll(s)
    # without any modifiers
    # will again be a list of the same length as above
    results = [div.text for div in message.find_all("div","didroll")]
    
    # want to keep the same player id for each of the rolls in this message
    # so zip up the dice and results and iterate
    for die, result in zip(dice, results):
        player_ids.append(p_id)
        
        die_type.append(die)
        
        roll_result.append(result)

id_and_rolls = pd.DataFrame({'player_id': player_ids,
                             'die_type': die_type,
                             'roll_result': roll_result})
id_and_rolls

Unnamed: 0,player_id,die_type,roll_result
0,player--Na9_XYpVB_KcdD88kR0,d20,19
1,player--Na9_XYpVB_KcdD88kR0,d20,17
2,player--Na9_XYpVB_KcdD88kR0,d20,7
3,player--Na9_XYpVB_KcdD88kR0,d20,14
4,player--Na9_XYpVB_KcdD88kR0,d20,10
...,...,...,...
239,player--Na9ao63L8CSB_no01ch,d20,10
240,player--Na9ao63L8CSB_no01ch,d20,13
241,player--Na9au2P-jbgj7brQ7hx,d20,5
242,player--Na9_XYpVB_KcdD88kR0,d20,4
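One thing to watch: `div.text` returns a string, so `roll_result` holds text rather than numbers even though it prints like numbers. Before doing any math or histograms it will probably need converting. A minimal sketch on a few toy rows (the `sides` column is an extra I'm assuming would be handy, it isn't in the table above):

```python
import pandas as pd

# toy rows shaped like id_and_rolls, with results as strings
rolls = pd.DataFrame({
    'die_type': ['d20', 'd4', 'd20'],
    'roll_result': ['19', '4', '7'],
})

# anything that isn't a clean number becomes NaN instead of raising
rolls['roll_result'] = pd.to_numeric(rolls['roll_result'], errors='coerce')

# number of sides, taken from the text after the leading 'd'
rolls['sides'] = rolls['die_type'].str[1:].astype(int)

print(rolls.dtypes)
print(rolls[rolls['die_type'] == 'd20']['roll_result'].mean())
```

Using `errors='coerce'` means a weird entry shows up as a missing value to inspect later rather than stopping the whole run.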


In [125]:
names_and_rolls = id_to_name.merge(id_and_rolls, on='player_id').drop('player_id', axis=1)
names_and_rolls.set_index(['player', 'die_type'], inplace=True)
names_and_rolls

player,die_type,roll_result
Mere (GM),d10,5
Mere (GM),d20,20
Mere (GM),d20,20
Mere (GM),d20,15
Mere (GM),d20,18
...,...,...
Sasha,d4,4
Sasha,d20,18
Sasha,d4,3
Sasha,d20,11
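`merge` does an inner join by default, so a roll whose `player_id` isn't in the name table would silently disappear. For the bigger logs it might be worth checking for that. A small sketch with invented ids and rolls (`p1`, `p2`, `p3` are placeholders, not real Roll20 ids):

```python
import pandas as pd

names = pd.DataFrame({'player_id': ['p1', 'p2'],
                      'player': ['Alex', 'Ryan']})
rolls = pd.DataFrame({'player_id': ['p1', 'p1', 'p2', 'p3'],
                      'die_type': ['d20', 'd4', 'd20', 'd20'],
                      'roll_result': [19, 3, 7, 12]})

# outer join keeps unmatched rows; indicator adds a '_merge' column saying
# where each row came from; validate raises if an id repeats in names
checked = names.merge(rolls, on='player_id', how='outer',
                      validate='one_to_many', indicator=True)

unmatched = checked[checked['_merge'] != 'both']
print(unmatched)
```

If `unmatched` comes back empty, the plain inner merge isn't losing anything.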


In [126]:
# drop rolls by the GM: in the other campaigns Mike GMed and his hidden rolls
# aren't in the logs, so leave GM rolls out here too for consistency
names_and_rolls.drop('Mere (GM)', axis=0, inplace=True)
names_and_rolls.reset_index(inplace=True)
names_and_rolls

Unnamed: 0,player,die_type,roll_result
0,Alex,d20,19
1,Alex,d20,17
2,Alex,d20,7
3,Alex,d20,14
4,Alex,d20,10
...,...,...,...
201,Sasha,d4,4
202,Sasha,d20,18
203,Sasha,d4,3
204,Sasha,d20,11


In [133]:
names_and_rolls.loc[(names_and_rolls.die_type == 'd20')]

Unnamed: 0,player,die_type,roll_result
0,Alex,d20,19
1,Alex,d20,17
2,Alex,d20,7
3,Alex,d20,14
4,Alex,d20,10
...,...,...,...
197,Sasha,d20,16
200,Sasha,d20,14
202,Sasha,d20,18
204,Sasha,d20,11


### Comparing luck

To compare people's luck, I think I want to make histograms for each die type and normalize each by the number of rolls the person made so they're actually comparable.

Since this was a one-shot, there aren't very many rolls and these histograms will be a little silly, but we're going proof-of-concept here.
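A minimal sketch of the normalization idea, using invented rolls rather than the real table: `pd.crosstab` with `normalize='index'` turns each player's counts into shares of their own rolls, so someone who rolled twice as often doesn't get bars twice as tall. The helper `plot_luck` is a hypothetical name and is only defined here, not called.

```python
import pandas as pd

# made-up rolls, same columns as names_and_rolls
toy = pd.DataFrame({
    'player': ['Alex', 'Alex', 'Alex', 'Alex', 'Sasha', 'Sasha'],
    'die_type': ['d20'] * 6,
    'roll_result': [19, 17, 7, 19, 18, 11],
})

d20 = toy[toy['die_type'] == 'd20']

# rows: players, columns: roll values, cells: share of that player's rolls
freq = pd.crosstab(d20['player'], d20['roll_result'], normalize='index')
print(freq)

def plot_luck(freq):
    """Grouped bar chart: one group per roll value, one bar per player."""
    import matplotlib.pyplot as plt  # imported here so the table works without it
    ax = freq.T.plot(kind='bar')
    ax.set_xlabel('roll result')
    ax.set_ylabel("share of player's rolls")
    return ax
```

Each row of `freq` sums to 1, so the bars for different players are directly comparable. The same thing would need repeating per die type, since a d4 and a d20 don't share an x-axis.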