# Account Labels Data Scraper

**[Johnnatan Messias](https://johnnatan-messias.github.io/), March 2025**

This notebook gathers labels from known sources that we can use to identify the owners of public wallet addresses.


In [1]:
import os
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

In [2]:
# Set directory paths
data_dir = os.path.abspath(
    os.path.join(os.getcwd(), "..", "data", "labels")) + os.sep

# Create directories if they don't exist
os.makedirs(data_dir, exist_ok=True)

## Etherscan Labels

[Etherscan](https://etherscan.io) is a popular and reliable blockchain explorer.
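The Etherscan labels come from the [Etherscan Labels](https://github.com/brianleect/etherscan-labels) project by [brianleect](https://github.com/brianleect). The cell below reads the combined label file from the `johnnatan-messias/etherscan-labels` repository on GitHub.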


In [3]:
etherscan_labels_url = "https://raw.githubusercontent.com/johnnatan-messias/etherscan-labels/main/data/etherscan/combined/combinedAllLabels.json"
etherscan_labels_df = pd.read_json(etherscan_labels_url, orient='index')
etherscan_labels_df = etherscan_labels_df.drop(
    columns=['labels']).reset_index()

etherscan_labels_df.columns = ['address', 'label']
etherscan_labels_df['address'] = etherscan_labels_df['address'].str.lower()
etherscan_labels_df = etherscan_labels_df.query('label != ""')
print("There are {} labels in the dataset.".format(len(etherscan_labels_df)))
etherscan_labels_df.head()

| | address | label |
|---|---|---|
| 0 | 0x22ff777ef6fe0690f1f74c6758126909653ad56a | Balancer: MLN/ETH 90/10 #2 |
| 2 | 0x689c56aef474df92d44a1b70850f808488f9769c | KuCoin 2 |
| 3 | 0xbd2f0cd039e0bfcf88901c98c0bfac5ab27566e3 | Dynamic Set Dollar |
| 4 | 0x2444973040210cccaad6010c64763d1abe06d8d0 | All.Me: Token Sale |
| 5 | 0xa3ccbd8d6da6fd3e38e3c9b67908ed4400129685 | SushiSwap: sUSD-yCREDIT |


In [4]:
# Persisting the data into a compressed CSV file
file_dir = os.path.join(data_dir, "etherscan_account_labels.csv.gz")
etherscan_labels_df.to_csv(file_dir, index=False, compression='gzip')
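
Downstream code can read the compressed file back without pandas. The following is a minimal sketch using only the standard library, assuming the two-column `address,label` layout written above. The helper name `load_label_lookup`, the demo file name, and the sample rows are hypothetical and serve only as illustration.

```python
import csv
import gzip
import os
import tempfile


def load_label_lookup(path):
    """Read a gzip-compressed CSV with 'address' and 'label' columns into a dict."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Lowercase the address so lookups match the normalization used above
        return {row["address"].lower(): row["label"] for row in reader}


# Demo with a temporary file holding two made-up rows (not real labels)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_account_labels.csv.gz")
    with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["address", "label"])
        writer.writerow(["0xAAAA000000000000000000000000000000000001", "Example Exchange"])
        writer.writerow(["0xbbbb000000000000000000000000000000000002", "Example Pool"])
    lookup = load_label_lookup(demo_path)

print(lookup.get("0xaaaa000000000000000000000000000000000001"))
```

Because the keys are lowercased on load, callers should lowercase an address before looking it up.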

## Sybil List

The Sybil list is maintained by [Uniswap Labs](https://uniswap.org/) and lets anyone link an identity to an Ethereum address by signing a message with that address. The signed message is then posted on Twitter, which ties the wallet address to the user's Twitter account. Once verified, the user can be added to the [Sybil-List](https://github.com/Uniswap/sybil-list) project.
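
The parsing code in this section relies on the shape of `verified.json`: a mapping from address to an object that may hold a `twitter` entry (with `handle` and `timestamp`), an `other` entry (with `name`), or both. The sketch below flattens a hand-written sample with that shape using only the standard library. The addresses, handle, and name are made up, and `flatten_sybil_entries` is a hypothetical helper, not part of the Sybil-List project.

```python
from datetime import datetime, timezone


def flatten_sybil_entries(verified):
    """Turn {address: {...}} into a list of {address, label, verified_at} rows."""
    rows = []
    for address, entry in verified.items():
        if "twitter" in entry:
            # Timestamps are treated as milliseconds since the Unix epoch
            ts_ms = entry["twitter"]["timestamp"]
            rows.append({
                "address": address.lower(),
                "label": entry["twitter"]["handle"],
                "verified_at": datetime.fromtimestamp(
                    ts_ms / 1000, tz=timezone.utc).date(),
            })
        if "other" in entry:
            # Entries of this kind carry a name but no verification timestamp
            rows.append({
                "address": address.lower(),
                "label": entry["other"]["name"],
                "verified_at": None,
            })
    return rows


# Made-up sample with the assumed structure
sample = {
    "0xABC0000000000000000000000000000000000001": {
        "twitter": {"handle": "example_handle", "timestamp": 1608000000000},
    },
    "0xabc0000000000000000000000000000000000002": {
        "other": {"name": "Example Org"},
    },
}
rows = flatten_sybil_entries(sample)
```

An address that has both a `twitter` and an `other` entry yields two rows, one per label.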


In [5]:
sybil_list_url = "https://raw.githubusercontent.com/Uniswap/sybil-list/master/verified.json"
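# Fail early on network or HTTP errors instead of parsing an error page
response = requests.get(sybil_list_url, timeout=30)
response.raise_for_status()
json_data = response.json()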
sybil_data = list(filter(lambda item: 'twitter' in item[1], json_data.items()))
sybil_data_other = list(
    filter(lambda item: 'other' in item[1], json_data.items()))

sybil_data = list(map(lambda item: {
    "address": item[0].lower(),
    "label": item[1]['twitter']['handle'],
    "verified_at": item[1]['twitter']['timestamp']
}, sybil_data))

sybil_data += list(map(lambda item: {
    "address": item[0].lower(),
    "label": item[1]['other']['name'],
    "verified_at": None
}, sybil_data_other))

sybil_list_df = pd.DataFrame(sybil_data)
sybil_list_df['verified_at'] = pd.to_datetime(
    sybil_list_df['verified_at'], unit='ms').dt.date

print("There are {} labels in the dataset.".format(len(sybil_list_df)))

sybil_list_df.head()

| | address | label | verified_at |
|---|---|---|---|
| 0 | 0x8d07d225a769b7af3a923481e1fdf49180e6a265 | MonetSupply | 2020-12-15 |
| 1 | 0x4306d8e8ac2a9c893ac1cd137a0cd6966fa6b6ff | pmriviere | 2020-12-15 |
| 2 | 0x88fb3d509fc49b515bfeb04e23f53ba339563981 | rleshner | 2020-12-15 |
| 3 | 0x965b813b302dfccdf6c2f676d59d7d3c960d3582 | nick_emmons | 2020-12-15 |
| 4 | 0x565b93a15d38acd79c120b15432d21e21ed274d6 | Flynnjamm | 2020-12-15 |


In [6]:
# Persisting the data into a compressed CSV file
file_dir = os.path.join(data_dir, "sybil_list_account_labels.csv.gz")
sybil_list_df.to_csv(file_dir, index=False, compression='gzip')

## Top 1500 Delegate Addresses Verified by Tally


### Tally Compound


In [7]:
def load_tally_delegates(file_dir):
    """Parse a saved Tally delegates HTML page into a DataFrame."""
    # Close the file handle once the page has been parsed
    with open(file_dir) as html_file:
        bs_data = bs(html_file, 'html.parser')
    delegate_elements = bs_data.find_all(
        'a', class_='chakra-link chakra-stack no-underline css-l6tukm')

    def text_or_none(element, tag, css_class):
        # Return the text of the first matching child, or None if it is absent
        match = element.find(tag, class_=css_class)
        return match.text if match else None

    # Iterate over each delegate element and extract the required fields
    delegates = []
    for element in delegate_elements:
        label = text_or_none(element, 'span', 'css-1baulvz')
        address = element['href'].split('/')[-1]
        voting_power = text_or_none(element, 'p', 'chakra-text css-6o3z7p')
        trusted_by = text_or_none(element, 'p', 'chakra-text css-x44qgv')
        # Raw string avoids the invalid-escape warning; tolerate a missing count
        trusted_by_digits = re.sub(r'[^0-9.]', '', trusted_by or '')

        delegates.append({
            'label': label,
            'address': address.lower(),
            'votin_power': voting_power,
            'trusted_by': int(trusted_by_digits) if trusted_by_digits else None
        })

    # Converting the list to a Pandas dataframe
    return pd.DataFrame(delegates)



In [8]:
file_dir = os.path.join(data_dir, "Tally _ Compound _ Delegates.html")
tally_labels_compound_df = load_tally_delegates(file_dir)
print("There are {} delegates in the Tally protocol".format(
    tally_labels_compound_df.shape[0])
)

tally_labels_compound_df.head(15)

There are 1488 delegates in the Tally protocol


| | label | address | votin_power | trusted_by |
|---|---|---|---|---|
| 0 | a16z | 0x9aa835bc7b8ce13b9b0c9764a52fbf71ac62ccf1 | 361.01K | 333 |
| 1 | 0x7E95...1318 | 0x7e959eab54932f5cfd10239160a7fd6474171318 | 170K | 3 |
| 2 | Geoffrey Hayes | 0x8169522c2c57883e8ef80c498aab7820da539806 | 101.01K | 25 |
| 3 | Gauntlet | 0x683a4f9915d6216f73d6df50151725036bd26c02 | 90.07K | 31 |
| 4 | bryancolligan | 0x2210dc066aacb03c9676c4f1b36084af14ccd02e | 85.63K | 7 |
| 5 | MonetSupply | 0x8d07d225a769b7af3a923481e1fdf49180e6a265 | 85K | 39 |
| 6 | blck | 0x54a37d93e57c5da659f508069cf65a381b61e189 | 80.09K | 78 |
| 7 | Wintermute Governanc… | 0xb933aee47c438f22de0747d57fc239fe37878dd1 | 80K | 60 |
| 8 | Franklin DAO | 0x070341aa5ed571f0fb2c4a5641409b1a46b4961b | 70K | 11 |
| 9 | 0x7d1a...b27f | 0x7d1a02c0ebcf06e1a36231a54951e061673ab27f | 65K | 5 |
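
The `votin_power` column holds abbreviated text such as `361.01K` or `170K`, which cannot be sorted or summed numerically as-is. The following is a minimal sketch of a converter, assuming the suffixes `K`, `M`, and `B` mean thousand, million, and billion. That mapping is an assumption based on the values shown here, and `parse_voting_power` is a hypothetical helper that is not used elsewhere in this notebook.

```python
from decimal import Decimal

# Assumed suffix meanings (thousand, million, billion)
SUFFIX_MULTIPLIERS = {
    "K": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "B": Decimal(10) ** 9,
}


def parse_voting_power(text):
    """Convert strings such as '361.01K' or '9.01M' to a float."""
    cleaned = text.strip().replace(",", "")
    suffix = cleaned[-1].upper() if cleaned else ""
    if suffix in SUFFIX_MULTIPLIERS:
        # Decimal keeps the scaling exact before the final float conversion
        return float(Decimal(cleaned[:-1]) * SUFFIX_MULTIPLIERS[suffix])
    # No recognized suffix: interpret the text as a plain number
    return float(Decimal(cleaned))


print(parse_voting_power("361.01K"))
print(parse_voting_power("170K"))
```

Applied to a DataFrame, this could be used as `df['votin_power'].map(parse_voting_power)`, provided every row has a non-empty value. An empty string raises an error rather than returning a default.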


In [9]:
mask = tally_labels_compound_df.apply(
    lambda x: '...' not in x['label'], axis=1)
tally_labels_compound_df = tally_labels_compound_df[mask].reset_index()
print("There are {} delegates in the Tally protocol after filtering out the '...' names".format(
    tally_labels_compound_df.shape[0])
)
tally_labels_compound_df.head()

There are 774 delegates in the Tally protocol after filtering out the '...' names


| | index | label | address | votin_power | trusted_by |
|---|---|---|---|---|---|
| 0 | 0 | a16z | 0x9aa835bc7b8ce13b9b0c9764a52fbf71ac62ccf1 | 361.01K | 333 |
| 1 | 2 | Geoffrey Hayes | 0x8169522c2c57883e8ef80c498aab7820da539806 | 101.01K | 25 |
| 2 | 3 | Gauntlet | 0x683a4f9915d6216f73d6df50151725036bd26c02 | 90.07K | 31 |
| 3 | 4 | bryancolligan | 0x2210dc066aacb03c9676c4f1b36084af14ccd02e | 85.63K | 7 |
| 4 | 5 | MonetSupply | 0x8d07d225a769b7af3a923481e1fdf49180e6a265 | 85K | 39 |
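
The filter above drops every label that contains `'...'`. In the rows shown, those are shortened addresses such as `0x7E95...1318`, which carry no identity information. A stricter check could match that pattern explicitly, so that a real name that happens to contain three dots is kept. The sketch below assumes the shortened form is always `0x`, four hex characters, `...`, and four hex characters, as in the displayed rows. `is_truncated_address` is a hypothetical helper.

```python
import re

# Assumed shape of a shortened address label, e.g. '0x7E95...1318'
TRUNCATED_ADDRESS = re.compile(r"0x[0-9a-fA-F]{4}\.\.\.[0-9a-fA-F]{4}")


def is_truncated_address(label):
    """Return True when the label is only a shortened address, not a name."""
    # Non-string labels (for example None) are never treated as addresses
    return isinstance(label, str) and TRUNCATED_ADDRESS.fullmatch(label) is not None


labels = ["a16z", "0x7E95...1318", "Wintermute Governanc…", None]
named_only = [label for label in labels if not is_truncated_address(label)]
```

Unlike the `apply`-based mask, this check also tolerates a missing label (`None`) instead of raising a `TypeError`.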


In [10]:
# Persisting the data into a compressed CSV file
file_dir = os.path.join(data_dir, "tally_compound_account_labels.csv.gz")
tally_labels_compound_df.to_csv(file_dir, index=False, compression='gzip')

### Tally Uniswap


In [11]:
file_dir = os.path.join(data_dir, "Tally _ Uniswap _ Delegates.html")
tally_labels_uniswap_df = load_tally_delegates(file_dir)
print("There are {} delegates in the Tally protocol".format(
    tally_labels_uniswap_df.shape[0])
)

tally_labels_uniswap_df.head(15)

There are 1512 delegates in the Tally protocol


| | label | address | votin_power | trusted_by |
|---|---|---|---|---|
| 0 | 0x8E4E...a42E | 0x8e4ed221fa034245f14205f781e0b13c5bd6a42e | 9.01M | 22 |
| 1 | 0x5368...3132 | 0x53689948444cfd03d2ad77266b05e61b8eed3132 | 9M | 5 |
| 2 | jessewldn | 0xe7925d190aea9279400cd9a005e33ceb9389cc2b | 8M | 59 |
| 3 | 0x1d8F...6452 | 0x1d8f369f05343f5a642a78bd65ff0da136016452 | 8M | 5 |
| 4 | 0xe024...26bF | 0xe02457a1459b6c49469bf658d4fe345c636326bf | 7.3M | 8 |
| 5 | 0x88E1...4c05 | 0x88e15721936c6eba757a27e54e7ae84b1ea34c05 | 7.25M | 2 |
| 6 | Consensys | 0x8962285faac45a7cbc75380c484523bb7c32d429 | 7.03M | 62 |
| 7 | 0xcb70...e638 | 0xcb70d1b61919dae81f5ca620f1e5d37b2241e638 | 7M | 1 |
| 8 | Robert Leshner | 0x88fb3d509fc49b515bfeb04e23f53ba339563981 | 5.34M | 49 |
| 9 | Gauntlet | 0x683a4f9915d6216f73d6df50151725036bd26c02 | 5.25M | 52 |


In [12]:
mask = tally_labels_uniswap_df.apply(lambda x: '...' not in x['label'], axis=1)
tally_labels_uniswap_df = tally_labels_uniswap_df[mask].reset_index()
print("There are {} delegates in the Tally protocol after filtering out the '...' names".format(
    tally_labels_uniswap_df.shape[0])
)
tally_labels_uniswap_df.head()

There are 530 delegates in the Tally protocol after filtering out the '...' names


| | index | label | address | votin_power | trusted_by |
|---|---|---|---|---|---|
| 0 | 2 | jessewldn | 0xe7925d190aea9279400cd9a005e33ceb9389cc2b | 8M | 59 |
| 1 | 6 | Consensys | 0x8962285faac45a7cbc75380c484523bb7c32d429 | 7.03M | 62 |
| 2 | 8 | Robert Leshner | 0x88fb3d509fc49b515bfeb04e23f53ba339563981 | 5.34M | 49 |
| 3 | 9 | Gauntlet | 0x683a4f9915d6216f73d6df50151725036bd26c02 | 5.25M | 52 |
| 4 | 10 | hi_Reverie | 0xb55a948763e0d386b6defcd8070a522216ae42b1 | 5.07M | 42 |


In [13]:
# Persisting the data into a compressed CSV file
file_dir = os.path.join(data_dir, "tally_uniswap_account_labels.csv.gz")
tally_labels_uniswap_df.to_csv(file_dir, index=False, compression='gzip')