# GOBLIN TIME PARTY TIME roll parser
### September 2023

My D&D group has had 2 hella long campaigns: "Gods' Blood Shed" (2012-2017) and "The Athenaeum" (2018-present). I eventually want to look at the rolls for both, and those notebooks will be basically identical to this one, but I wanted to start with the time I ran the "We Be Goblins" Pathfinder module ("humorously" imo, named GOBLIN TIME PARTY TIME on Roll20). This was a one-night game and I built in a lot of things for the players, so hopefully the log is more consistent. It's also obviously much shorter, so things should run faster and give me a chance to work through issues without everything taking forever.

Generally speaking, in all of these campaigns we didn't really use Roll20's built-in character sheet options, so our rolls are more "raw" than they seem to be in the [inspiration for this project](https://github.com/axlan/roll20-chatlog-stats/tree/main). I should be able to reuse some of his work, but it'll need lots of modifications.

I'm choosing to ignore inline rolls from character sheets (for this test at least) since we have one (1) of those in this game and they're handled differently. I'm also only looking at players, not the GM (for both this test and in general) because for the other games I *wasn't* the GM and the chat logs I have don't include his rolls.

I've got the chat log downloaded from Roll20 as a single page and saved locally on my computer.

This is what one roll looks like in the HTML:
![screenshot of html](../a_roll.png)

Important note! Crit successes and crit fails look different from normal rolls (for d20 and d_other): note the `critsuccess` and `critfail` (will need to work out how to handle this)

![screenshot of nat20](../nat20.png)
![screenshot of nat1](../nat1.png)
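
A rough sketch of one way to keep the crit classes from getting in the way. I haven't confirmed where `critsuccess`/`critfail` land in the class list, so instead of trusting a fixed position this picks the die type out by pattern. `split_dice_classes` is a made-up helper name, and the class lists below are hand-written examples, not copied from the log.

```python
import re

DIE_PATTERN = re.compile(r"^d\d+$")  # matches 'd20', 'd6', ... but not 'diceroll'


def split_dice_classes(classes):
    """Return (die_type, crit_flag) from a list of CSS class names.

    die_type is the first class that looks like 'd<number>' (or None);
    crit_flag is 'critsuccess', 'critfail', or None.
    """
    die_type = next((c for c in classes if DIE_PATTERN.match(c)), None)
    crit_flag = next((c for c in classes if c in ("critsuccess", "critfail")), None)
    return die_type, crit_flag


# Hand-written class lists (assumed shapes, not taken from the real log)
print(split_dice_classes(["diceroll", "d20"]))
print(split_dice_classes(["diceroll", "d20", "critsuccess"]))
print(split_dice_classes(["diceroll", "critfail", "d6"]))
```

If the crit class always turns out to sit after the die type, plain positional indexing would work too; matching by pattern just doesn't care either way.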

From a roll I will eventually need
* the `data-playerid` matched to the player 
    - we weren't always consistent about the names, so the player id is going to be the only consistent way to do this
    - make a table of `data-playerid`s and `class="by"`s
* the `class="diceroll d20"` (or `d##`, whatever the number is)
    - I want to look at all the different dice that got rolled, not just the d20s
    - will need to check `crit*` to see if that will matter in my scraping
* the `class="didroll"` text to get the actual roll result for each die
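
To make the list above concrete, here's a minimal sketch of pulling those pieces out of a single roll. The HTML is a hand-written stand-in, not the real markup from the screenshot: the nesting, the `EXAMPLEID` value, and having `data-playerid` sit on the message div are all assumptions, and `parse_roll` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one roll message. Attribute values and
# nesting are assumptions, not copied from the real chat log.
SAMPLE_HTML = """
<div class="message rollresult player--EXAMPLEID" data-playerid="EXAMPLEID">
  <span class="tstamp">July 24, 2023 7:14PM</span>
  <span class="by">Chuffy:</span>
  <div class="diceroll d20"><div class="didroll">19</div></div>
  <div class="diceroll d6"><div class="didroll">4</div></div>
</div>
"""


def parse_roll(message):
    """Pull the player id, roller name, and (die, result) pairs from one message tag."""
    by = message.find("span", class_="by")
    dice = []
    for die in message.find_all("div", class_="diceroll"):
        result = die.find("div", class_="didroll")
        if result is None:
            continue  # skip dice with no recorded result
        # the die type is whichever class looks like d<number>
        die_type = next(
            (c for c in die["class"] if c[0] == "d" and c[1:].isdigit()), None
        )
        dice.append((die_type, int(result.text)))
    return {
        "player_id": message.get("data-playerid"),
        "roller": by.text if by is not None else None,  # may be missing
        "dice": dice,
    }


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
roll = parse_roll(sample_soup.find("div", class_="rollresult"))
print(roll)
```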


In [2]:
from bs4 import BeautifulSoup
import pandas as pd

# I locally saved the one page version of the chat log just to simplify
with open('../Chat Log for GOBLIN TIME PARTY TIME.html', mode='r', encoding="utf8") as chatlog:
    html_doc = chatlog.read()

soup = BeautifulSoup(html_doc, 'html.parser')

In [3]:
# these are display names from the `by` spans, not Roll20 player ids
roller_names = set(span.text for span in soup.find_all('span', {'class':"by"}))
roller_names

{'(To GM):',
 'Chuffy Lickwound:',
 'Chuffy:',
 'Meredith S. (GM):',
 'Mogmurch:',
 'POOG:',
 'Poog of Zarongel:',
 'Reta Bigbad:',
 'Ryan N.:',
 'Sasha S.:'}

In [4]:

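dice_divs = soup.find_all('div', class_="diceroll")
# despite the name, d20rolls holds the result of every die, not just the d20s
d20rolls = [div.find('div', {"class":"didroll"}).text for div in dice_divs]
# the second class on each diceroll div is the die type (d20, d6, ...)
whichdie = [div["class"][1] for div in dice_divs]
set(whichdie)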


{'d10', 'd20', 'd24', 'd30', 'd4', 'd6', 'd8'}
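
Since I want to look at every die and not just the d20s, a quick tally per die type is probably the next step. This is only a sketch: `sample_whichdie` is a made-up list in the same shape as `whichdie`, not the real data.

```python
from collections import Counter

# Made-up sample in the same shape as `whichdie` (a list of die-type strings)
sample_whichdie = ["d20", "d20", "d6", "d20", "d4", "d6"]

die_counts = Counter(sample_whichdie)

# most_common() sorts from most to least rolled
for die, count in die_counts.most_common():
    print(die, count)
```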

In [5]:
# the player id is the last class on each rollresult div
player_ids = [div["class"][-1] for div in soup.find_all('div', class_='rollresult')]

set(player_ids)

{'player--N_Z4XzpmqSD1Cmb6ESj',
 'player--Na9_XYpVB_KcdD88kR0',
 'player--Na9ao63L8CSB_no01ch',
 'player--Na9au2P-jbgj7brQ7hx',
 'player--Na9avp1nLcwR0AVnI-f'}

In [7]:
# not every message div has a `by` span, so guard on the find() result
# before touching .text; one name is enough to identify the player
for div in soup.find_all('div', class_='player--N_Z4XzpmqSD1Cmb6ESj'):
    by = div.find("span", {"class":"by"})
    if by is not None:
        print(by.text)
        break

Meredith S. (GM):

In [8]:
messages = soup.find_all('div', 'message')


In [9]:
for message in messages:
    tstamp = message.find(class_='tstamp')

In [10]:
# message 10 has no timestamp span, so guard before reading .text
ts = messages[10].find(class_="tstamp")
print(pd.to_datetime(ts.text) if ts is not None else None)

None

In [11]:
# every div that carries the rollresult class, in document order
roll_messages = soup.find_all('div', class_='rollresult')

In [56]:
player_id_ref = []
rollers = []

for message in roll_messages:
    by = message.find('span', {"class":"by"})
    # some roll messages have no `by` span, so only record the ones that do
    if by is not None:
        player_id_ref.append(message["class"][-1])
        rollers.append(by.text)
        
id_to_name = pd.DataFrame({'player_id':player_id_ref,
                           'roller':rollers}).drop_duplicates().sort_values(by=['player_id'])

id_to_name

| | player_id | roller |
|---|---|---|
| 11 | player--N_Z4XzpmqSD1Cmb6ESj | Meredith S. (GM): |
| 0 | player--Na9_XYpVB_KcdD88kR0 | Chuffy: |
| 34 | player--Na9_XYpVB_KcdD88kR0 | Chuffy Lickwound: |
| 2 | player--Na9ao63L8CSB_no01ch | Ryan N.: |
| 6 | player--Na9ao63L8CSB_no01ch | POOG: |
| 89 | player--Na9ao63L8CSB_no01ch | Poog of Zarongel: |
| 1 | player--Na9au2P-jbgj7brQ7hx | Mogmurch: |
| 3 | player--Na9avp1nLcwR0AVnI-f | Sasha S.: |
| 31 | player--Na9avp1nLcwR0AVnI-f | Reta Bigbad: |
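
Because one player id shows up under several display names, it may help to collapse the table to one row per id. A small sketch using a few rows copied from the output above; `id_to_name_sample` and `names_by_id` are names I made up for the example.

```python
import pandas as pd

# A few rows copied from the id_to_name output
id_to_name_sample = pd.DataFrame({
    "player_id": [
        "player--Na9_XYpVB_KcdD88kR0",
        "player--Na9_XYpVB_KcdD88kR0",
        "player--Na9au2P-jbgj7brQ7hx",
    ],
    "roller": ["Chuffy:", "Chuffy Lickwound:", "Mogmurch:"],
})

# One entry per player id, holding every display name that id rolled under.
# rstrip(':') drops the trailing colon that comes with the `by` text.
names_by_id = (
    id_to_name_sample
    .assign(roller=id_to_name_sample["roller"].str.rstrip(":"))
    .groupby("player_id")["roller"]
    .agg(list)
)
print(names_by_id)
```

Picking which of those names counts as the "real" one for each player would still be a manual decision.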


In [127]:
n=1

#print(pd.to_datetime(roll_messages[n].find(class_="tstamp").text)) #timestamp
print(roll_messages[n].find(class_="tstamp")) #alt timestamp
print(roll_messages[n]["class"][-1]) #playerid
print(roll_messages[n].find('span', {'class':"by"})) # roller
print([div["class"][1] for div in roll_messages[n].find_all('div', class_="diceroll")]) #die_type
print([div.text for div in roll_messages[n].find_all("div","didroll")]) #roll result

None
player--Na9_XYpVB_KcdD88kR0
None
['d20', 'd20', 'd20']
['17', '7', '14']


In [131]:
test_ts = []
for n in range(10):
    ts = roll_messages[n].find(class_="tstamp")
    print(ts)
    print(roll_messages[n].find('span', {'class':"by"}))
    print(roll_messages[n]["class"][-1])
    print([div["class"][1] for div in roll_messages[n].find_all('div', class_="diceroll")]) #die_type
    print([div.text for div in roll_messages[n].find_all("div","didroll")]) #roll result
    print('_______________')
    if ts is not None:
        test_ts.append(pd.to_datetime(ts.text))
    else:
        re_ts = test_ts[-1]
        test_ts.append(re_ts)
test_ts

# not deleting this because some of it is useful
# BUT
# if you just look at roll messages and the player typed a general message before the roll,
# you won't be able to get the timestamp or the player name from the roll message
# but frankly, I didn't really care about the timestamps anyway

<span aria-hidden="true" class="tstamp">July 24, 2023 7:14PM</span>
<span class="by">Chuffy:</span>
player--Na9_XYpVB_KcdD88kR0
['d20']
['19']
_______________
None
None
player--Na9_XYpVB_KcdD88kR0
['d20', 'd20', 'd20']
['17', '7', '14']
_______________
None
None
player--Na9_XYpVB_KcdD88kR0
['d20', 'd20', 'd20']
['10', '19', '19']
_______________
<span aria-hidden="true" class="tstamp">July 24, 2023 7:39PM</span>
<span class="by">Mogmurch:</span>
player--Na9au2P-jbgj7brQ7hx
['d20']
['8']
_______________
<span aria-hidden="true" class="tstamp">July 24, 2023 7:39PM</span>
<span class="by">Ryan N.:</span>
player--Na9ao63L8CSB_no01ch
['d20']
['3']
_______________
<span aria-hidden="true" class="tstamp">July 24, 2023 7:39PM</span>
<span class="by">Sasha S.:</span>
player--Na9avp1nLcwR0AVnI-f
['d20']
['2']
_______________
<span aria-hidden="true" class="tstamp">July 24, 2023 7:40PM</span>
<span class="by">Chuffy:</span>
player--Na9_XYpVB_KcdD88kR0
['d20']
['2']
_______________
None
None
playe

[Timestamp('2023-07-24 19:14:00'),
 Timestamp('2023-07-24 19:14:00'),
 Timestamp('2023-07-24 19:14:00'),
 Timestamp('2023-07-24 19:39:00'),
 Timestamp('2023-07-24 19:39:00'),
 Timestamp('2023-07-24 19:39:00'),
 Timestamp('2023-07-24 19:40:00'),
 Timestamp('2023-07-24 19:40:00'),
 Timestamp('2023-07-24 19:43:00'),
 Timestamp('2023-07-24 19:43:00')]
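
The gap described in the code comment (roll messages with no timestamp or name of their own) could probably be handled by walking every message in order, not just the roll messages, and remembering the last timestamp and name seen. Here's a sketch with plain dicts standing in for parsed messages. The field names and sample rows are made up, and it assumes a message with no header came from the same sender as the message before it.

```python
def fill_forward(messages):
    """Copy the last seen 'tstamp' and 'by' onto messages that lack them."""
    last_tstamp, last_by = None, None
    filled = []
    for msg in messages:
        # remember the newest header values whenever a message carries them
        if msg.get("tstamp") is not None:
            last_tstamp = msg["tstamp"]
        if msg.get("by") is not None:
            last_by = msg["by"]
        filled.append({**msg, "tstamp": last_tstamp, "by": last_by})
    return filled


# Made-up messages: the second and third have no header of their own
sample_messages = [
    {"tstamp": "July 24, 2023 7:14PM", "by": "Chuffy:", "is_roll": False},
    {"tstamp": None, "by": None, "is_roll": True},
    {"tstamp": None, "by": None, "is_roll": True},
    {"tstamp": "July 24, 2023 7:39PM", "by": "Mogmurch:", "is_roll": True},
]

# filter down to rolls only after the headers have been filled in
rolls_only = [m for m in fill_forward(sample_messages) if m["is_roll"]]
for m in rolls_only:
    print(m["tstamp"], m["by"])
```

The player id class is still on every roll message, so it can double-check that a filled-in name belongs to the right person.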

In [None]:
timestamps = []
player_ids = [] #roll20 unique id
roller = [] #might be player or character name
die_type = []
roll_result = []

for message in roll_messages: