# Exercise 5 - Automatically find Exchange links for DWM Query

In this demo we combine DWM and Iknaio to automatically find connections to exchanges, starting from a set of crypto addresses mentioned on a particular category of darkweb sites. Our topic today is CP.

## Preparations

First, we install the graphsense-python package and define an API key. An API key for the [GraphSense](https://graphsense.github.io/) instance hosted by [Iknaio](https://www.ikna.io/) can be requested by sending an email to [contact@iknaio.com](mailto:contact@iknaio.com).

In [3]:
!pip install graphsense-python seaborn tqdm json-api-doc openpyxl

import graphsense
from graphsense.api import bulk_api, general_api

import json
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime

# dwm and gs are helper modules for this exercise and are not installed by the pip command above;
# they are assumed to be available as dwm.py and gs.py next to this notebook.

import dwm
import gs

def ts_to_pds(ts):
    return datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')



In [4]:
# Expected structure of the file
# {
#     "gs-api-key" : "",
#     "dwm-credentials" : {"username": "somename@somedomain.io", "password": ""}
# }
with open("secrets.json") as f:
    secrets = json.load(f)

# We only work with BTC in this example
CURRENCY = 'btc'
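
A missing or misspelled key in `secrets.json` only surfaces later, when the first API call fails. The following is a minimal sketch of an early check, assuming the file structure shown in the comment above. The helper name `load_secrets` and the constant `REQUIRED_KEYS` are hypothetical and not part of the exercise code.

```python
import json
import os
import tempfile

# Top-level keys expected by this notebook (taken from the structure comment above)
REQUIRED_KEYS = ("gs-api-key", "dwm-credentials")


def load_secrets(path):
    """Load the secrets file and fail early if an expected key is missing."""
    with open(path) as f:
        secrets = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    if missing:
        raise KeyError(f"secrets file is missing keys: {missing}")
    for field in ("username", "password"):
        if field not in secrets["dwm-credentials"]:
            raise KeyError(f"dwm-credentials is missing '{field}'")
    return secrets


# Usage with a throwaway file holding dummy values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "secrets.json")
    with open(demo_path, "w") as f:
        json.dump(
            {"gs-api-key": "dummy",
             "dwm-credentials": {"username": "u", "password": "p"}},
            f,
        )
    demo_secrets = load_secrets(demo_path)
```

In the notebook, `secrets = load_secrets("secrets.json")` could stand in for the plain `json.load` call.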

# 1. Load Starting Addresses from DWM

In [3]:
# Request authentication token
headers = dwm.authenticate_api(secrets['dwm-credentials'])

## Load Domains

In [4]:
# Collect domains related to title
title = "Alice with violence CP"

df_domains_all = dwm.get_domains_by_title(title, headers)
df_domains_all

Processed 1 out of 3 pages
Processed 2 out of 3 pages
Processed 3 out of 3 pages


Unnamed: 0,type,id,domain_url,title,status,uptime,page_count,clearnet_cohost_count,darknet_cohost_count,inbound_count,outbound_count,discovered_at
0,torv3,15979412,http://x5w2vdx4lmvha27xjgnnnceudiqd6f3gjuegadu...,Alice with violence CP,online,93,4,0,0,2,0,2024-10-29T20:28:53.000Z
1,torv3,15979413,http://x5cj2bvcxngjohqi7hpkf67fqbqg7wkptcqa2sa...,Alice with violence CP,online,96,4,0,0,2,0,2024-10-29T20:28:53.000Z
2,torv3,15979414,http://vvniruuxyborklcc3i7s5mlerjuysw2rwrd6svr...,Alice with violence CP,online,94,4,0,0,1,0,2024-10-29T20:28:53.000Z
3,torv3,15979415,http://bv34z4lb4mr7djs7y7y62db6pocnmwoa7suxs4o...,Alice with violence CP,online,98,4,0,0,1,0,2024-10-29T20:28:53.000Z
4,torv3,15979416,http://bzuk5hv4r2z3n3asimysuxzwctm75eq3fzcd2ah...,Alice with violence CP,online,99,4,0,0,1,0,2024-10-29T20:28:53.000Z
...,...,...,...,...,...,...,...,...,...,...,...,...
2378,torv2,213940,http://ai4gvgc3syetwn4q.onion,Alice with violence CP,offline,83,4,0,0,4,0,2020-03-31T19:20:52.000Z
2379,torv2,213884,http://4cw2nl4jpeaekp2x.onion,Alice with violence CP,offline,83,4,0,0,4,0,2020-03-31T19:19:16.000Z
2380,torv3,206429,http://c5u4kpqwzbns7ikojebppox22mic44ewokk2mxl...,Alice with violence CP,offline,79,3,0,0,34,0,2020-03-13T01:57:51.000Z
2381,torv2,171377,http://yt33fue5lk4j7bks.onion,Alice with violence CP,offline,5,3,0,0,0,0,2020-02-04T09:18:26.000Z


### Only Keep Online Domains

In [14]:
# only keep online domains
df_domains = df_domains_all.query("status=='online'")
nr_domains = len(df_domains)
print(f"We have found {nr_domains} online domains with title: {title}")

We have found 590 online domains with title: Alice with violence CP


## Get Crypto Addresses on the Domains

In [6]:
df_cryptos_all = dwm.get_crypto_addresses_for_domains(df_domains, headers)

Processing domains: 100%|██████████| 590/590 [05:17<00:00,  1.86it/s]


### Only Keep BTC Addresses

In [17]:
df_cryptos = df_cryptos_all.query(f"type=='{CURRENCY.upper()}'")
unique_addresses = len(df_cryptos["address"].unique())
print(f"We have found {len(df_cryptos)} addresses on these domains, {unique_addresses} of which are unique")

We have found 8483 addresses on these domains, 6226 of which are unique
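
The gap between 8483 rows and 6226 unique addresses means that some addresses were reported more than once, possibly on several domains. The sketch below shows one way to list addresses that appear on more than one domain. It works on plain dictionaries with toy values. The `domain` key is an assumption, since the actual columns of `df_cryptos` are not shown here.

```python
from collections import defaultdict


def domains_per_address(rows):
    """Map each address to the set of domains it was seen on."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["address"]].add(row["domain"])
    return seen


# Toy rows; in the notebook these would come from the DWM result
rows = [
    {"address": "addr_A", "domain": "site1"},
    {"address": "addr_A", "domain": "site2"},
    {"address": "addr_B", "domain": "site1"},
    {"address": "addr_A", "domain": "site1"},  # repeated on the same domain
]

seen = domains_per_address(rows)

# Keep only addresses that occur on at least two different domains
shared = {addr: sorted(doms) for addr, doms in seen.items() if len(doms) > 1}

print(len(rows), "rows,", len(seen), "unique addresses")
print(shared)
```

Addresses shared across domains can hint that the sites are operated by the same party, although that conclusion needs further evidence.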


In [49]:
# save output in an excel file
with pd.ExcelWriter("alice_dwm.xlsx") as writer:
    df_domains.to_excel(writer, sheet_name="Domains", index=False)
    df_cryptos.to_excel(writer, sheet_name="Crypto-Assets", index=False)

# Save unique addresses in a CSV file (index=False avoids writing an extra unnamed index column)
df_cryptos[["address"]].drop_duplicates(subset=["address"]).to_csv("addresses.csv", index=False)

# 2. Finding Exchanges with Iknaio

In [5]:
configuration = graphsense.Configuration(
    host = "https://api.ikna.io/",
    api_key = {
        'api_key': secrets["gs-api-key"]
    }
)

We can test whether our client works by checking what data the GraphSense endpoint provides.

In [6]:
with graphsense.ApiClient(configuration) as api_client:
    api_instance = general_api.GeneralApi(api_client)
    api_response = api_instance.get_statistics()
    display({x['name']:x['no_blocks'] for x in api_response['currencies']})

{'btc': 879070,
 'bch': 880832,
 'ltc': 2826502,
 'zec': 2784742,
 'eth': 21615371,
 'trx': 68632363}

# Q1: How many of the addresses are used?

Instead of querying each address individually, we pass the full list of known addresses to the bulk API in a single request.

In [8]:
seed_addresses = pd.read_csv("addresses.csv")

with graphsense.ApiClient(configuration) as clnt:
    blkapi = bulk_api.BulkApi(clnt)

    # documentation about available bulk operations can be found
    # here https://api.ikna.io/#/bulk/bulk_csv
    rcsv = blkapi.bulk_csv(
                CURRENCY,
                operation="get_address",
                body={
                    'address': seed_addresses['address'].to_list()
                },
                num_pages=1,
                _preload_content=False
              )
    respAddrDF = pd.read_csv(rcsv)

# Keep the columns of interest; dropna() removes rows with missing values,
# which leaves only the addresses for which the API returned statistics.
used_addresses = respAddrDF[[
    "address", "balance_eur", "total_received_eur", "total_spent_eur",
    "in_degree", "out_degree", "no_incoming_txs", "no_outgoing_txs",
    "first_tx_timestamp", "last_tx_timestamp", "entity"
]].dropna()
used_addresses[["address"]].to_csv("used_addresses.csv")
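
The bulk request above sends every seed address in one call. Whether the API limits the size of such a request is not stated here. If the list grows much larger, one option is to split it into batches and concatenate the results. Below is a minimal sketch of the batching step only; the helper name `chunked` and the batch size are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy address list; in the notebook this would be seed_addresses['address'].to_list()
addresses = [f"addr_{i}" for i in range(7)]

batches = list(chunked(addresses, 3))
for batch in batches:
    # Each batch would be passed as body={'address': batch} to one bulk request
    print(len(batch), batch)
```

The per-batch results could then be combined with `pd.concat` before further filtering.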

In [18]:
print(f"{len(used_addresses)} addresses received {sum(used_addresses['total_received_eur']):.2f} EUR, Balance {sum(used_addresses['balance_eur']):.2f} EUR")
print(f"Activity period of the addresses was: {ts_to_pds(min(used_addresses['first_tx_timestamp']))} to {ts_to_pds(max(used_addresses['last_tx_timestamp']))}")

53 addresses received 10044.19 EUR, Balance 546.49 EUR
Activity period of the addresses was: 2020-10-26 06:29:13 to 2025-01-04 22:50:09
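
The helper `ts_to_pds` calls `datetime.fromtimestamp` without a time zone, so it formats timestamps in the local time zone of the machine running the notebook. The printed activity period can therefore shift by a few hours between machines. The sketch below is a UTC variant, assuming the timestamps are Unix seconds. The name `ts_to_utc_str` is hypothetical.

```python
from datetime import datetime, timezone


def ts_to_utc_str(ts):
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')


print(ts_to_utc_str(0))              # 1970-01-01 00:00:00
print(ts_to_utc_str(1_600_000_000))  # 2020-09-13 12:26:40
```

Using UTC keeps reported periods comparable when several analysts run the same notebook.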


# Q2: Are there direct links to exchanges?

In [19]:
with graphsense.ApiClient(configuration) as clnt:
    blkapi = bulk_api.BulkApi(clnt)

    # documentation about available bulk operations can be found
    # here https://api.ikna.io/#/bulk/bulk_csv
    rcsv = blkapi.bulk_csv(
                CURRENCY,
                operation="list_address_neighbors",
                body={
                    'address': used_addresses['address'].to_list(),
                    'direction': 'out',
                    'include_labels': True
                },
                num_pages=1,
                _preload_content=False
              )
    respAddrNbrDF = pd.read_csv(rcsv)

with_label = respAddrNbrDF.query("labels.notnull()")

with_outgoing = respAddrNbrDF.query("_info != 'no data'")

print(f"We have found {len(with_outgoing)} outgoing neighbors, {len(with_label)} are known")

We have found 126 outgoing neighbors, 0 are known


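# Q3: Can I find links to exchanges via Clusters?

Each used address belongs to an entity, that is, a cluster of addresses that GraphSense attributes to the same owner. We now fetch summary statistics for each of these entities.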

In [30]:
with graphsense.ApiClient(configuration) as clnt:
    # use the client opened by this with-statement (clnt), not a client from an earlier cell
    blkapi = bulk_api.BulkApi(clnt)
    rcsv = blkapi.bulk_csv(
                CURRENCY,
                operation="get_entity",
                body={
                    'entity': used_addresses['entity'].drop_duplicates().to_list(),
                    'exclude_best_address_tag': True
                },
                num_pages=1,
                _preload_content=False
              )
    respEntityDF = pd.read_csv(rcsv)

clusters = respEntityDF[
    ["best_address_tag_label",
     "root_address",
     "no_addresses",
     "balance_eur",
     "total_received_eur",
     "total_spent_eur",
     "first_tx_timestamp",
     "last_tx_timestamp"]
     ]

print(f"{sum(clusters['no_addresses'])-len(used_addresses)} new addresses have been found in {len(clusters)} clusters. They received {sum(clusters['total_received_eur']):.2f} EUR, Balance {sum(clusters['balance_eur']):.2f} EUR")
print(f"Activity period of the cluster addresses was: {ts_to_pds(min(clusters['first_tx_timestamp']))} to {ts_to_pds(max(clusters['last_tx_timestamp']))}")
clusters.query("best_address_tag_label.notnull()")

4640 new addresses have been found in 32 clusters. They received 543251.10 EUR, Balance 3370.23 EUR
Activity period of the cluster addresses was: 2014-08-09 23:48:57 to 2025-01-12 13:50:05


Unnamed: 0,best_address_tag_label,root_address,no_addresses,balance_eur,total_received_eur,total_spent_eur,first_tx_timestamp,last_tx_timestamp


In [1]:
with graphsense.ApiClient(configuration) as clnt:
    blkapi = bulk_api.BulkApi(clnt)

    # documentation about available bulk operations can be found
    # here https://api.ikna.io/#/bulk/bulk_csv
    rcsv = blkapi.bulk_csv(
                CURRENCY,
                operation="list_entity_neighbors",
                body={
                    'entity': used_addresses['entity'].drop_duplicates().to_list(),
                    'direction': 'out',
                    'include_labels': True
                },
                num_pages=1,
                _preload_content=False
              )
    respAddrDF = pd.read_csv(rcsv)

with_label = respAddrDF.query("labels.notnull()")

with_outgoing = respAddrDF.query("_info != 'no data'")

print(f"We have found {len(with_outgoing)} outgoing neighbors, {len(with_label)} are known")
with_label[["_request_entity","entity_root_address", "labels"]]


In [34]:
with_outgoing.columns

Index(['_error', '_info', '_request_entity', 'entity_actors',
       'entity_balance_eur', 'entity_balance_usd', 'entity_balance_value',
       'entity_best_address_tag_abuse', 'entity_best_address_tag_actor',
       'entity_best_address_tag_address', 'entity_best_address_tag_category',
       'entity_best_address_tag_concepts',
       'entity_best_address_tag_concepts_count',
       'entity_best_address_tag_confidence',
       'entity_best_address_tag_confidence_level',
       'entity_best_address_tag_currency', 'entity_best_address_tag_entity',
       'entity_best_address_tag_inherited_from',
       'entity_best_address_tag_is_cluster_definer',
       'entity_best_address_tag_label', 'entity_best_address_tag_lastmod',
       'entity_best_address_tag_source', 'entity_best_address_tag_tag_type',
       'entity_best_address_tag_tagpack_creator',
       'entity_best_address_tag_tagpack_is_public',
       'entity_best_address_tag_tagpack_title',
       'entity_best_address_tag_tagpack_uri

## Get Cluster Addresses

In [28]:
with graphsense.ApiClient(configuration) as clnt:
    # use the client opened by this with-statement (clnt), not a client from an earlier cell
    blkapi = bulk_api.BulkApi(clnt)
    rcsv = blkapi.bulk_csv(
                CURRENCY,
                operation="list_entity_addresses",
                body={
                    'entity': used_addresses['entity'].drop_duplicates().to_list()
                },
                num_pages=1,
                _preload_content=False
              )
    addresses_cluster = pd.read_csv(rcsv)
addresses_cluster

Unnamed: 0,_error,_info,_request_entity,actors,address,balance_eur,balance_usd,balance_value,currency,entity,...,status,token_balances,total_received_eur,total_received_usd,total_received_value,total_spent_eur,total_spent_usd,total_spent_value,total_tokens_received,total_tokens_spent
0,,,1.353672e+09,,3NV97tRYA74egEFfMen9JD7JzbM5U8hjYc,30.34,31.26,33000,btc,1353671569,...,clean,,31.86,33.45,33000,0.00,0.00,0,,
1,,,1.329882e+09,,3E2YaXkyp3uxSRVLqwx81mveAowHGEWSQw,53.33,54.95,58000,btc,1329881738,...,clean,,31.87,35.18,58000,0.00,0.00,0,,
2,,,1.348621e+09,,3Ettx8jZGs1AK7WLs48sSEcjX1nz5EoeLt,33.10,34.11,36000,btc,1348621007,...,clean,,33.07,34.74,36000,0.00,0.00,0,,
3,,,1.346625e+09,,3BRfy6BnyFb5XEdzCGrhjbqwMK71egoRDu,0.94,0.96,1017,btc,1346624817,...,clean,,0.95,1.00,1017,0.00,0.00,0,,
4,,,1.357103e+09,,3BHqn4f41ee6Rhb7mXTsoL65BmyBsqjcRL,34.07,35.10,37052,btc,1357103221,...,clean,,35.18,36.57,37052,0.00,0.00,0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4314,,,1.303348e+09,,bc1q4q85hvdgk5w7dh0qtgqyh8dzykzykmvug7ccuq,0.00,0.00,0,btc,1303348096,...,clean,,31.80,33.48,34385,31.63,33.23,34385,,
4315,,,1.303348e+09,,bc1q9fvep89ae69plwwx8vrtmsyp62v50e387r75k6,0.00,0.00,0,btc,1303348096,...,clean,,81.10,85.21,88162,81.10,85.21,88162,,
4316,,,1.303348e+09,,bc1q7zx7qhgltprxjhkczqkjd2pqc79azrns3fc3ar,0.00,0.00,0,btc,1303348096,...,clean,,31.23,32.81,33947,31.23,32.81,33947,,
4317,,,1.303348e+09,,bc1q6wcqacy2ferk9y8c3futrhxk0q23gyr0agq36r,0.00,0.00,0,btc,1303348096,...,clean,,35.91,37.73,39040,35.91,37.73,39040,,
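
The cluster address list contains both the seed addresses and the addresses that were only discovered through clustering. The sketch below separates the two groups using plain lists with toy values. The helper name `split_seed_and_new` is hypothetical.

```python
def split_seed_and_new(cluster_addresses, seed_addresses):
    """Split cluster addresses into already-known seeds and newly found ones."""
    seeds = set(seed_addresses)  # set lookup keeps the membership test fast
    known = [a for a in cluster_addresses if a in seeds]
    new = [a for a in cluster_addresses if a not in seeds]
    return known, new


# Toy values; in the notebook these would be addresses_cluster['address']
# and used_addresses['address']
cluster_addresses = ["addr_A", "addr_B", "addr_C", "addr_D"]
seed_addresses = ["addr_B", "addr_D"]

known, new = split_seed_and_new(cluster_addresses, seed_addresses)
print("known seeds:", known)
print("new via clustering:", new)
```

With pandas, the same split can be written as `addresses_cluster[~addresses_cluster["address"].isin(used_addresses["address"])]` for the newly found addresses.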



# Q4: What if I look at multiple hops (using QL)? Are there any exchanges?

In [9]:
addresses_used_list = used_addresses["address"].to_list()

x = gs.get_QL_results_many(addresses_used_list, CURRENCY, {"Authorization": secrets["gs-api-key"]})

with open("traces.json","w") as f:
    json.dump(x, f)

[y for y in x if y['nr_pathes_found'] > 0]

Searching centralized exchange connections for addresses: 100%|██████████| 53/53 [02:03<00:00,  2.33s/it]


[]
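
Because the results are saved to `traces.json`, they can be re-checked later without repeating the search. The sketch below filters a result list after a JSON round trip. Only the key `nr_pathes_found` is taken from the notebook. The `address` key, the toy values, and the helper name `results_with_paths` are assumptions for illustration.

```python
import json


def results_with_paths(results, key="nr_pathes_found"):
    """Return the result entries for which at least one path was found."""
    return [r for r in results if r.get(key, 0) > 0]


# Toy results; the real structure is produced by gs.get_QL_results_many
demo_results = [
    {"address": "addr_A", "nr_pathes_found": 0},
    {"address": "addr_B", "nr_pathes_found": 2},
    {"address": "addr_C", "nr_pathes_found": 0},
]

# Round trip through JSON, as happens when writing and re-reading traces.json
reloaded = json.loads(json.dumps(demo_results))

hits = results_with_paths(reloaded)
print(f"{len(hits)} of {len(reloaded)} addresses have at least one path")
```

An empty list, as in the run above, means that the search found no path to a centralized exchange for any of the 53 addresses.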

In [16]:
addresses_used_list

['3Nccrr73vWyHuD5Vc8H1xMRYB8YQG2Ddn5',
 '3JtFjtp3Ue7n7WQDatJp4q47aNnUvU9wDY',
 '3GDXFck1vspFmXrVEDGNeSpSP3imB5FJBq',
 '3E1p3nmMKVEfphVDszy6DyuWc7YM6paBQS',
 '3CmTgpAcgGU8XwzX7Sqowrj2AA2chLajCK',
 '33DtDzEKB9Syr9zfeP3CmhLas5Lnmbwjbn',
 '3DKx1zDQw3F7zz5Gwo7G9qbrikSmWjvqHP',
 '3JVCDJvFKNHHK6uweTAP6gsTiN2bFsLCXk',
 '1QPLoDFimKyBUc263uHxqEw3i17e1Cic2',
 '3Lz4F4VfgLZkxaz3dx6U7NsvyK2ASiPC2m',
 '3Ettx8jZGs1AK7WLs48sSEcjX1nz5EoeLt',
 '353oksE8cRvh1vJJd27p5HiaZ1VcGJyoCz',
 '32LQ3Czqe1xJCv2DpXLrgU7ZJq7Bqt6Uce',
 '1KCbzQFC9K8AiVVWvyfCnsdDUwf2z1Bh1R',
 '35y6CbGGe2UgtMeTtHYLk8QeDyWzmFNeCi',
 '3EC9u8Jz9RzFAfinDBttDjfHnNiq3c8cPi',
 '39Z9THdYa9fmqLfK1ooxnFf7vbpGviNjDe',
 '3MnjbkxCdkTYgZD3BkWbC8JA38sGS374To',
 '1EPqUMJHavCWg94k3wu5UGd8533vyogExP',
 '3DphXPUBMaBArrNfaXKLLXBDZbopSJzsQj',
 '3JTyjTbGArVvziGewvNRjBfg7PeMzMwQvb',
 '37MVTJ315tNA1fcEUiGDBkrkyvPLxyrqsT',
 '3DDQPXjcGjqWZZMh4EqkskWuGxNxjyUMrU',
 '115x1X24is6YpoykBEeh5SEwMTf94wSRhq',
 '38ssYetNyyfGDpquhh8Q5u8febjnHW65Gq',
 '3FnReq8YbjoZkttwFjoQvque
