# IRC spelunking

Recovering images from several Noncesense Research Lab projects that were posted to IRC around 2019 and were not backed up elsewhere at the time.

Mitchell Krawiec-Thayer (@Isthmus), 2022-02

## Import libraries

In [1]:
import isthmuslib
from isthmuslib import pd, tqdm, time, pathlib, Dict, List, Tuple, Any  # (standard libraries)
import requests

## Set the data and output paths

In [2]:
data_path: pathlib.Path = pathlib.Path.cwd() / 'data' / 'version_controlled' / 'mrl_freenode.txt'
output_directory: pathlib.Path = pathlib.Path.cwd() / 'data' / 'version_controlled' / 'recovered_files'

## Read in the data

In [3]:
with open(data_path, 'r', encoding='utf-8') as f:
    text: str = f.read()
text[0:500]

"\ufeff#monero-research-lab\n[2018-06-01 15:48:07] → Joined channel #monero-research-lab\n[2018-06-01 15:48:07] * Channel mode is +cnt\n[2018-06-01 15:48:07] * Channel timestamp is 1499915714\n[2018-06-01 15:50:53] ← testtestlemon left (~m@209.58.129.97): \n[2018-06-01 16:30:11] → testtestlemon1 joined (~m@70.39.105.3)\n[2018-06-01 16:44:39] <UkoeHB> is transaction fee 8 bytes?\n[2018-06-01 16:46:36] <moneromooo> It is a 64 bit value. It is typically encoded as a varint, if that's what you're asking.\n[2018-0"

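The leading `\ufeff` in the preview above is a byte order mark (BOM). The following is a minimal sketch, assuming the log is UTF-8 encoded, of dropping it at read time with Python's `utf-8-sig` codec. The temporary file and its name are hypothetical stand-ins for illustration, not the real log.

```python
import pathlib
import tempfile

# Hypothetical stand-in for the real log: a UTF-8 file that starts with a BOM
with tempfile.TemporaryDirectory() as tmp:
    sample_path = pathlib.Path(tmp) / 'sample_log.txt'
    sample_path.write_bytes('\ufeff#monero-research-lab\n'.encode('utf-8'))

    # Plain 'utf-8' keeps the BOM as the first character of the text
    with open(sample_path, 'r', encoding='utf-8') as f:
        with_bom: str = f.read()

    # 'utf-8-sig' strips a leading BOM if one is present
    with open(sample_path, 'r', encoding='utf-8-sig') as f:
        without_bom: str = f.read()
```

The BOM does not affect the URL extraction in this notebook, so stripping it is optional here.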
## Parse the logs into a dataframe

In [4]:
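# Split the log on '] <' and keep whatever sits between the CDN prefix
# and the end of the line as the 'raw' token
df: pd.DataFrame = isthmuslib.extract_text_to_dataframe(
    input_string=text,
    record_delimiter='] <',
    tokens_dictionary={'raw': ((left := 'https://usercontent.irccloud-cdn.com'), '\n')},
)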

  0%|          | 0/87973 [00:00<?, ?it/s]
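For readers without `isthmuslib` installed, a rough standard-library approximation of this extraction step might look like the sketch below. It is not the library's implementation: it assumes each link runs to the end of its line (mirroring the `'\n'` right-hand token), and the function name `extract_url_tails` and the sample chat lines are made up for illustration.

```python
import re
from typing import List

PREFIX: str = 'https://usercontent.irccloud-cdn.com'


def extract_url_tails(log_text: str, prefix: str = PREFIX) -> List[str]:
    """Return the text that follows `prefix`, up to the end of the line, for each occurrence."""
    pattern = re.compile(re.escape(prefix) + r'([^\n]*)')
    return pattern.findall(log_text)


# Hypothetical sample lines in the same shape as the log
sample: str = (
    "[2018-06-01 16:44:39] <alice> see https://usercontent.irccloud-cdn.com/file/abc/plot.png\n"
    "[2018-06-01 16:46:36] <bob> no link here\n"
)
tails: List[str] = extract_url_tails(sample)
```

As in the notebook, the prefix is consumed by the match, so it has to be prepended again to rebuild the full URL.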

Reconstruct the full URL by prepending the prefix that the parser consumed

In [5]:
df['url'] = left + df['raw']
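The download log further down includes one entry that is a sentence rather than a file name, which suggests that some captured links carry trailing chat text, and several names are percent-encoded. A minimal sketch of a helper that screens such entries and decodes the file name is shown here. The name `file_name_from_url` and the example paths are hypothetical, and rejecting any URL that contains whitespace is an assumption about what counts as a clean link.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def file_name_from_url(url: str) -> Optional[str]:
    """Return a decoded file name for `url`, or None if it does not look like a clean link."""
    # A link followed by chat text on the same line will contain whitespace
    if any(ch.isspace() for ch in url):
        return None
    last_part: str = urlsplit(url).path.rsplit('/', 1)[-1]
    # Percent-decoding turns 'block%20weight.png' into 'block weight.png'
    return unquote(last_part) or None


# Hypothetical example URLs
clean = file_name_from_url('https://usercontent.irccloud-cdn.com/file/abc/block%20weight.png')
noisy = file_name_from_url('https://usercontent.irccloud-cdn.com/file/abc/plot.png some chat text')
```

Rows for which the helper returns `None` could be inspected by hand instead of being downloaded blindly.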

## Download all the images

In [6]:
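for url in tqdm(df['url']):
    # Use the last path segment of the URL as the file name
    print(f"\nDownloading: {(last_part := url.split('/')[-1])}")
    r: requests.models.Response = requests.get(str(url), allow_redirects=True)
    # Prefix a Unix timestamp so repeated names (e.g. image.png) do not overwrite each other
    with open(output_directory / f"{int(time.time())}_{last_part}", 'wb') as f:
        f.write(r.content)
    time.sleep(3)  # pause between requests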

  0%|          | 0/189 [00:00<?, ?it/s]


Downloading: main_chain_block_time

Downloading: Figure_1.png

Downloading: Figure_1-1.png

Downloading:  Looking at the wait times for the block above and below the Merlin blocks themselves, we see that it is skewed toward a longer interval afterward, suggesting that they are actually being retroactively timestamped.

Downloading: merlin_parent_child

Downloading: Oh-No-Thank-You.jpg

Downloading: hashvault.txt

Downloading: IMG_1180.PNG

Downloading: 20181009_214245.jpg

Downloading: 20181009_214304.jpg

Downloading: isthissapling.jpg

Downloading: Screen%20Shot%202018-10-29%20at%2010.11.42.png

Downloading: image.png

Downloading: 20181105_002023.jpg

Downloading: image.png

Downloading: image.png

Downloading: image.png

Downloading: image.png

Downloading: image.png

Downloading: image.png

Downloading: image.png

Downloading: s.png

Downloading: block%20weight%20v%20block%20height.png

Downloading: block%20weight%20by%20block%20height%20-%20articmine.png

Downloading: block%20we