## Heuristic 3 - Transactions outside TCash

### Overview
The main goal of this heuristic is to link Ethereum accounts which interacted with TCash by inspecting Ethereum transactions outside it. 

This is done by constructing two sets, one with the unique TCash deposit addresses ($S_{D}$) and one with the unique TCash withdraw addresses ($S_{W}$), and then querying for transactions between addresses of the two sets.

When a transaction between two such addresses is found, the TCash deposit transactions made by the deposit address are linked to all the TCash withdraw transactions made by the withdraw address. These two sets of linked transactions are then filtered, keeping only the links that are consistent. For example, if a deposit address A is linked to a withdraw address B, but A deposited into the 1 ETH pool and B withdrew from the 10 ETH pool, the link is discarded. Moreover, for a given link between deposit and withdraw transactions, deposits made after the latest withdraw are removed from the deposit set.
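
The two filters described above can be sketched in a few lines of Julia. This is a minimal illustration on made-up data, not the implementation used in this notebook: the helper name `link_pool_transactions`, the record layout and the hashes are assumptions introduced here.

```julia
using Dates

# Hypothetical toy records: TCash transactions of one deposit address and one
# withdraw address, grouped by pool.
deposits = Dict(
    "1 ETH"  => [(hash = "0xd1", timestamp = DateTime(2021, 1, 1)),
                 (hash = "0xd2", timestamp = DateTime(2021, 3, 1))],
    "10 ETH" => [(hash = "0xd3", timestamp = DateTime(2021, 1, 5))],
)
withdraws = Dict(
    "1 ETH" => [(hash = "0xw1", timestamp = DateTime(2021, 2, 1))],
)

# Keep only pools used on both sides, then drop deposits made after the
# latest withdraw in that pool.
function link_pool_transactions(deposits, withdraws)
    linked = Dict{String, Any}()
    for pool in intersect(keys(deposits), keys(withdraws))
        latest_withdraw = maximum(w.timestamp for w in withdraws[pool])
        kept = [d.hash for d in deposits[pool] if d.timestamp < latest_withdraw]
        isempty(kept) && continue
        linked[pool] = (deposit_hashes = kept,
                        withdraw_hashes = [w.hash for w in withdraws[pool]])
    end
    linked
end

linked = link_pool_transactions(deposits, withdraws)
```

In this toy case only the 1 ETH pool survives, and the deposit `0xd2` is dropped because it happened after the only withdraw.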

### Data
The public BigQuery dataset was queried as follows:

```
INSERT `tornado_cash_transactions.transactions_between_withdraw_and_deposit_addresses` 
SELECT * FROM `bigquery-public-data.crypto_ethereum.transactions`
WHERE 
    (
       (`from_address` IN ( SELECT `from_address` FROM `tornado_cash_transactions.deposit_addresses`))
       AND 
       (`to_address` IN ( SELECT `withdraw_address` FROM `tornado_cash_transactions.withdraw_addresses`))
    )
    OR
    (
       (`from_address` IN (  SELECT `withdraw_address` FROM `tornado_cash_transactions.withdraw_addresses`))
       AND 
       (`to_address` IN ( SELECT `from_address` FROM `tornado_cash_transactions.deposit_addresses`))
    )
```

The query returns every column of the matching transactions, but only two are used here: **from_address** and **to_address**. Each row corresponds to a transaction between a TCash deposit address and a TCash withdraw address.
From this table, we want to know which of the two addresses made the deposit and which one made the withdraw, so that the corresponding deposit and withdraw transactions can be linked.

For example, consider this entry from the resulting table,

| from_address  | to_address  |
|---------------|-------------|
| address1      | address2    |

Suppose that `address1` is an address that deposited in TCash and `address2` one that made a withdraw. Then, all deposits made by `address1` will be linked to all the withdraws made by `address2`, and filters will be applied to the linked deposit and withdraw transactions between them, as already described. 

In short, we want to transform this table into one with columns **deposit_address** and **withdraw_address**, like so,

| deposit_address | withdraw_address |
|-----------------|------------------|
| address1        | address2         |

With this new table, it is straightforward to link the TCash transactions.


### Some definitions 
A problem arises when an address belongs to both sets of TCash addresses, $S_{D}$ and $S_{W}$.
We say that an address is of type `D` when it belongs to $S_{D}$ and not to $S_{W}$.
Likewise, an address is of type `W` when it belongs to $S_{W}$ and not to $S_{D}$.
Finally, an address that belongs to both sets is of type `DW`.
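
As a quick illustration of these definitions, the following sketch classifies addresses against two toy sets. The function name `address_type`, the `:none` fallback and the short addresses are assumptions made for this example only.

```julia
# Hypothetical toy sets standing in for S_D and S_W.
S_D = Set(["0xaa", "0xbb"])
S_W = Set(["0xbb", "0xcc"])

# Return :D, :W or :DW following the definitions above,
# or :none for an address that never interacted with TCash.
function address_type(address, S_D, S_W)
    in_d = address in S_D
    in_w = address in S_W
    if in_d && in_w
        :DW
    elseif in_d
        :D
    elseif in_w
        :W
    else
        :none
    end
end

types = Dict(a => address_type(a, S_D, S_W) for a in union(S_D, S_W))
```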

For outside-TCash transactions of type `D -> W` (i.e., a transaction from a *D* type address to a *W* type address) or `W -> D` (i.e., a transaction from a *W* type address to a *D* type address), transforming the entry into the new table is trivial.

In the cases where we have transactions of type `DW -> W`, `DW -> D`, `W -> DW` and `D -> DW`, it is also straightforward to transform the corresponding entries. For example, consider again this particular entry of the Ethereum transactions table,


| from_address  | to_address  |
|---------------|-------------|
| address1      | address2    |


Suppose now that `address1` is of type `DW` and `address2` of type `D`. Then, `address2` is trivially placed in the `deposit_address` column and, by elimination, `address1` is placed in the `withdraw_address` column,


| deposit_address | withdraw_address |
|-----------------|------------------|
| address2        | address1         |


When we have a transaction of type `DW -> DW`, it cannot be known which address deposited and which one made the withdraw, so the two combinations are considered. Considering again the same entry, the resulting table will be as follows,

| deposit_address | withdraw_address |
|-----------------|------------------|
| address1        | address2         |
| address2        | address1         |


Then, deposits of `address1` are linked to withdraws of `address2` and deposits of `address2` are linked to withdraws of `address1`.
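
The case analysis above can be condensed into two membership checks: an address can play the deposit role if it is in $S_{D}$ and the withdraw role if it is in $S_{W}$. The sketch below is an equivalent reformulation under that reading, not the code used later in this notebook; `deposit_withdraw_pairs` and the toy addresses are hypothetical.

```julia
# Hypothetical toy sets: 0xaa is D, 0xcc is W, 0xbb and 0xdd are DW.
S_D = Set(["0xaa", "0xbb", "0xdd"])
S_W = Set(["0xbb", "0xcc", "0xdd"])

# Map one outside-TCash transaction to its (deposit_address, withdraw_address) rows.
function deposit_withdraw_pairs(from, to, S_D, S_W)
    result = NamedTuple{(:deposit_address, :withdraw_address), Tuple{String, String}}[]
    # `from` can act as the depositor and `to` as the withdrawer.
    if from in S_D && to in S_W
        push!(result, (deposit_address = from, withdraw_address = to))
    end
    # `to` can act as the depositor and `from` as the withdrawer.
    if to in S_D && from in S_W
        push!(result, (deposit_address = to, withdraw_address = from))
    end
    result
end

d_to_w   = deposit_withdraw_pairs("0xaa", "0xcc", S_D, S_W)  # D -> W
dw_to_d  = deposit_withdraw_pairs("0xbb", "0xaa", S_D, S_W)  # DW -> D
dw_to_dw = deposit_withdraw_pairs("0xbb", "0xdd", S_D, S_W)  # DW -> DW
```

Only the `DW -> DW` case satisfies both checks, which is why it produces two rows.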


### Results data structure
The results of this heuristic are returned in a dictionary whose keys are named tuples holding a deposit address in the first field and a withdraw address in the second. Each value is another dictionary, keyed by TCash pool, whose values are named tuples holding a list of deposit transaction hashes in the first field and a list of withdraw transaction hashes in the second.

In summary, this data structure returns the linked deposit and withdraw transactions in every pool, for every pair of deposit and withdraw addresses from TCash.

A simplified example of a member of this resulting data structure could be:

```
(deposit_address="0x8879", withdraw_address="0x9c8s") => Dict("1 ETH" => (deposit_hashes=["0x892m", "0x24mk"], withdraw_hashes=["0x57jd"]))
```

In plain terms, this means that the TCash deposit transactions with hashes `0x892m` and `0x24mk`, made by address `0x8879`, are linked to the TCash withdraw transaction `0x57jd`, made by address `0x9c8s`, and that these linked TCash transactions belong to the 1 ETH pool.
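
To make the shape of this structure concrete, here is a small sketch that builds the same made-up entry as a Julia `Dict` and reads it back. The variable names are illustrative, and the addresses and hashes are the placeholder values from the example, not real data.

```julia
# Keys are (deposit_address, withdraw_address) named tuples; values map a pool
# name to the linked deposit and withdraw hashes.
results = Dict(
    (deposit_address = "0x8879", withdraw_address = "0x9c8s") => Dict(
        "1 ETH" => (deposit_hashes = ["0x892m", "0x24mk"], withdraw_hashes = ["0x57jd"]),
    ),
)

# Look up the links of one address pair in one pool.
key = (deposit_address = "0x8879", withdraw_address = "0x9c8s")
links = results[key]["1 ETH"]

# Flatten the entry into individual (deposit_hash, withdraw_hash) links.
hash_pairs = [(d, w) for d in links.deposit_hashes for w in links.withdraw_hashes]
```

Named tuples compare and hash by value, so a key built separately from the same two addresses retrieves the entry.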

In [1]:
using CSV
using DataFrames
using ProgressBars

In [34]:
deposit_txs = CSV.read("../data/lighter_complete_deposit_txs.csv", DataFrame)
withdraw_txs = CSV.read("../data/lighter_complete_withdraw_txs.csv", DataFrame)

const unique_deposit_addresses = Set(deposit_txs[!, :from_address])
const unique_withdraw_addresses = Set(withdraw_txs[!, :recipient_address])

outside_tcash_txs = CSV.read("../data/transactions_between_deposit_and_withdraw_addresses.csv", DataFrame)
address_and_withdraw_df = outside_tcash_txs[!, [:from_address, :to_address]];

### Data preprocessing

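The data obtained by the query is filtered so that each unordered pair of addresses appears only once and transactions from an address to itself are discarded.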

In [3]:
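# Collect each transaction as an unordered pair of addresses, so that
# A -> B and B -> A collapse into a single entry.
function unique_and_permuted(address_and_withdraw_df)
    unique_and_permuted_set = Set()
    
    for row in eachrow(address_and_withdraw_df)
        push!(unique_and_permuted_set, Set([row.from_address, row.to_address]))
    end
  
    unique_and_permuted_set
end

# Turn the set of address pairs back into a two-column DataFrame.
# A Set is unordered, so the original transaction direction is not preserved.
function dataframe_from_set_of_sets(set_of_sets)
    df = DataFrame(from_address=[], to_address=[])
    for set in set_of_sets
        push!(df, collect(set))
    end
    df
end

# Deduplicate the pairs and drop self-transactions, whose set has a single element.
function preprocess_data(address_and_withdraw_df)
    set = filter(x -> length(x) == 2, unique_and_permuted(address_and_withdraw_df))
    dataframe_from_set_of_sets(set)
end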

preprocess_data (generic function with 1 method)

In [36]:
clean_addresses_df = preprocess_data(address_and_withdraw_df)

| Row | from_address (Any) | to_address (Any) |
|-----|--------------------|------------------|
| 1 | 0x0c63d55a244657f5606d62856bd9f1ff227c05f2 | 0x0e54db73f82bd9fde34ebce53ea83bd197e9044c |
| 2 | 0xd9ee088c6ca2a90d6f0d059af17c2ec2c908bb0f | 0xc73ef94bc339a2cb9a1b67820af46bf47484a1ed |
| 3 | 0xbf7c205febae32f7874b28b9f371fe522e1fd97a | 0xe5b5df72187f7d867973615f5e1144b7a95b495f |
| 4 | 0x35f081bdf4740ffa8a56ff98e4b971fbcb7d82a7 | 0x09fe8f71f8e14b3d6b6456fbafaaef4a27f042cd |
| 5 | 0xa8308e994d180ca87c6a784fcb8612dec9ede03d | 0x46ba0af6bc60e6fabd9957744c057d031c720ace |
| 6 | 0xf62e92b2452d8a0fbb2c4b03424d679c86660001 | 0xf94571dbdff33446dabd17040cd6236b0d2c2545 |
| 7 | 0xce91fddab3c544b59ebac665a7635561043a7def | 0x865ec62a7f46aab0976ad22573fcf319c3f939ce |
| 8 | 0x134b9eab4aa4c1489687c18c10d7338656fde32d | 0x68a99f89e475a078645f4bac491360afe255dff1 |
| 9 | 0xcd1690b5ae49b4bd1ac5d201dccb461887a76dcd | 0x8a83716acd66d9e1fb18c9b79540b72e04f80ac0 |
| 10 | 0xc77fa6c05b4e472feee7c0f9b20e70c5bf33a99b | 0x4e1ce0b96fc37f81f5508c6608687af4f78f23b2 |


In [5]:
# The three predicates below read the global sets `unique_deposit_addresses`
# and `unique_withdraw_addresses` defined above.

# Type D: the address only deposited in TCash.
function is_D_type(address)
    address ∈ unique_deposit_addresses && address ∉ unique_withdraw_addresses
end

# Type W: the address only withdrew from TCash.
function is_W_type(address)
    address ∉ unique_deposit_addresses && address ∈ unique_withdraw_addresses
end

# Type DW: the address both deposited and withdrew.
function is_DW_type(address)
    address ∈ unique_deposit_addresses && address ∈ unique_withdraw_addresses
end

is_DW_type (generic function with 1 method)

In [6]:
function is_D_W_tx(from_address, to_address)
    is_D_type(from_address) && is_W_type(to_address)
end

function is_W_D_tx(from_address, to_address)
    is_W_type(from_address) && is_D_type(to_address)
end

function is_D_DW_tx(from_address, to_address)
    is_D_type(from_address) && is_DW_type(to_address)
end

function is_DW_D_tx(from_address, to_address)
    is_DW_type(from_address) && is_D_type(to_address)
end

function is_W_DW_tx(from_address, to_address)
    is_W_type(from_address) && is_DW_type(to_address)
end

function is_DW_W_tx(from_address, to_address)
    is_DW_type(from_address) && is_W_type(to_address)
end

function is_DW_DW_tx(from_address, to_address)
    is_DW_type(from_address) && is_DW_type(to_address)
end 

is_DW_DW_tx (generic function with 1 method)

In [7]:
function create_deposit_and_withdraw_df(address_and_withdraw_df, unique_deposit_addresses, unique_withdraw_addresses)
    
    # D | W
    # Start from an empty matrix with two columns, so that every row holds a real address pair.
    deposit_and_withdraw_matrix = Matrix{Any}(undef, 0, 2)
    
    for row in ProgressBar(eachrow(address_and_withdraw_df), printing_delay=0.1)
        if is_D_W_tx(row.from_address, row.to_address) || is_D_DW_tx(row.from_address, row.to_address) || is_DW_W_tx(row.from_address, row.to_address)
            deposit_and_withdraw_matrix = vcat(deposit_and_withdraw_matrix, [row.from_address row.to_address])
            
        elseif is_W_D_tx(row.from_address, row.to_address) || is_W_DW_tx(row.from_address, row.to_address) || is_DW_D_tx(row.from_address, row.to_address)
            deposit_and_withdraw_matrix = vcat(deposit_and_withdraw_matrix, [row.to_address row.from_address])
            
        elseif is_DW_DW_tx(row.from_address, row.to_address)
            # Both role assignments are possible, so both rows are added.
            deposit_and_withdraw_matrix = vcat(deposit_and_withdraw_matrix, [row.from_address row.to_address; row.to_address row.from_address])
            
        else
            error("The transaction is not from any of the types: D_W, W_D, D_DW, DW_D, W_DW, DW_W, DW_DW")
            
        end
    end
    
    DataFrame(deposit_and_withdraw_matrix, [:deposit_address, :withdraw_address])
end

create_deposit_and_withdraw_df (generic function with 1 method)

In [8]:
D_W_df = create_deposit_and_withdraw_df(clean_addresses_df, unique_deposit_addresses, unique_withdraw_addresses);

In [9]:
D_W_df

| Row | deposit_address (Any) | withdraw_address (Any) |
|-----|-----------------------|------------------------|
| 1 | 0x0c63d55a244657f5606d62856bd9f1ff227c05f2 | 0x0e54db73f82bd9fde34ebce53ea83bd197e9044c |
| 2 | 0xd9ee088c6ca2a90d6f0d059af17c2ec2c908bb0f | 0xc73ef94bc339a2cb9a1b67820af46bf47484a1ed |
| 3 | 0xbf7c205febae32f7874b28b9f371fe522e1fd97a | 0xe5b5df72187f7d867973615f5e1144b7a95b495f |
| 4 | 0xe5b5df72187f7d867973615f5e1144b7a95b495f | 0xbf7c205febae32f7874b28b9f371fe522e1fd97a |
| 5 | 0x09fe8f71f8e14b3d6b6456fbafaaef4a27f042cd | 0x35f081bdf4740ffa8a56ff98e4b971fbcb7d82a7 |
| 6 | 0xa8308e994d180ca87c6a784fcb8612dec9ede03d | 0x46ba0af6bc60e6fabd9957744c057d031c720ace |
| 7 | 0xf62e92b2452d8a0fbb2c4b03424d679c86660001 | 0xf94571dbdff33446dabd17040cd6236b0d2c2545 |
| 8 | 0x865ec62a7f46aab0976ad22573fcf319c3f939ce | 0xce91fddab3c544b59ebac665a7635561043a7def |
| 9 | 0x134b9eab4aa4c1489687c18c10d7338656fde32d | 0x68a99f89e475a078645f4bac491360afe255dff1 |
| 10 | 0x8a83716acd66d9e1fb18c9b79540b72e04f80ac0 | 0xcd1690b5ae49b4bd1ac5d201dccb461887a76dcd |


In [10]:
tornado_addresses = Dict(
    "0xd4b88df4d29f5cedd6857912842cff3b20c8cfa3" => "100 DAI",
    "0xfd8610d20aa15b7b2e3be39b396a1bc3516c7144" => "1000 DAI",
    "0x07687e702b410fa43f4cb4af7fa097918ffd2730" => "10000 DAI",
    "0x23773e65ed146a459791799d01336db287f25334" => "100000 DAI",
    "0x12d66f87a04a9e220743712ce6d9bb1b5616b8fc" => "0.1 ETH",
    "0x47ce0c6ed5b0ce3d3a51fdb1c52dc66a7c3c2936" => "1 ETH",
    "0x910cbd523d972eb0a6f4cae4618ad62622b39dbf" => "10 ETH",
    "0xa160cdab225685da1d56aa342ad8841c3b53f291" => "100 ETH",
    "0xd96f2b1c14db8458374d9aca76e26c3d18364307" => "100 USDC",
    "0x4736dcf1b7a3d580672cce6e7c65cd5cc9cfba9d" => "1000 USDC",
    "0x169ad27a470d064dede56a2d3ff727986b15d52b" => "100 USDT",
    "0x0836222f2b2b24a3f36f98668ed8f0b38d1a872f" => "1000 USDT",
    "0x178169b423a011fff22b9e3f3abea13414ddd0f1" => "0.1 WBTC",
    "0x610b717796ad172b316836ac95a2ffad065ceab4" => "1 WBTC",
    "0xbb93e510bbcd0b7beb5a853875f9ec60275cf498" => "10 WBTC",
    "0x22aaa7720ddd5388a3c0a3333430953c68f1849b" => "5000 cDAI",
    "0x03893a7c7463ae47d46bc7f091665f1893656003" => "50000 cDAI",
    "0x2717c5e28cf931547b621a5dddb772ab6a35b701" => "500000 cDAI",
    "0xd21be7248e0197ee08e0c20d4a96debdac3d20af" => "5000000 cDAI"
    );

In [28]:
function get_tcash_transactions_done(address, tornado_transactions_df, tornado_addresses; transaction_type)
    
    # Column that holds the user's address: the sender for deposits, the recipient for withdraws.
    address_field =
        if transaction_type == :deposit
            "from_address"
        elseif transaction_type == :withdraw
            "recipient_address"
        else
            error("Transaction type parameter error")
        end
    
    # Maps each pool name to the list of (hash, timestamp) entries of the
    # transactions that `address` made in that pool.
    transactions_dict = Dict()
    
    address_transactions = filter(row -> row[address_field] == address, tornado_transactions_df)
    
    # Group the transactions of `address` by the pool they interacted with.
    for row ∈ eachrow(address_transactions)
        pool = tornado_addresses[row.tornado_cash_address]
        if haskey(transactions_dict, pool)
            push!(transactions_dict[pool], (hash=row.hash, timestamp=row.block_timestamp))
        else
            transactions_dict[pool] = [(hash=row.hash, timestamp=row.block_timestamp)]
        end
    end

    Dict(address => transactions_dict)
end

get_tcash_transactions_done (generic function with 1 method)

In [12]:
get_tcash_transactions_done("0x0c63d55a244657f5606d62856bd9f1ff227c05f2", deposit_txs, tornado_addresses; transaction_type=:deposit)

Dict{String, Dict{Any, Any}} with 1 entry:
  "0x0c63d55a244657f5606d6… => Dict("100 ETH"=>NamedTuple{(:hash, :timestamp), …

In [13]:
first(D_W_df, 5)

| Row | deposit_address (Any) | withdraw_address (Any) |
|-----|-----------------------|------------------------|
| 1 | 0x0c63d55a244657f5606d62856bd9f1ff227c05f2 | 0x0e54db73f82bd9fde34ebce53ea83bd197e9044c |
| 2 | 0xd9ee088c6ca2a90d6f0d059af17c2ec2c908bb0f | 0xc73ef94bc339a2cb9a1b67820af46bf47484a1ed |
| 3 | 0xbf7c205febae32f7874b28b9f371fe522e1fd97a | 0xe5b5df72187f7d867973615f5e1144b7a95b495f |
| 4 | 0xe5b5df72187f7d867973615f5e1144b7a95b495f | 0xbf7c205febae32f7874b28b9f371fe522e1fd97a |
| 5 | 0x09fe8f71f8e14b3d6b6456fbafaaef4a27f042cd | 0x35f081bdf4740ffa8a56ff98e4b971fbcb7d82a7 |


In [14]:
function get_addresses_transactions(addresses, tornado_transactions_df, tornado_addresses; transaction_type)
    addresses_transactions = Dict()
    for address in ProgressBar(addresses, printing_delay=2)
        merge!(addresses_transactions, get_tcash_transactions_done(address, tornado_transactions_df, tornado_addresses; transaction_type=transaction_type))
    end
    addresses_transactions
end

get_addresses_transactions (generic function with 1 method)
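
`get_tcash_transactions_done` filters the full transaction table once per address, so the total work grows with the number of addresses times the number of transactions. A possible alternative, sketched here on made-up rows, builds the same nested `address => pool => transactions` dictionary in a single pass. The function name `group_transactions_by_address`, the toy pool addresses and the plain named-tuple rows are assumptions for illustration; with a `DataFrame`, `eachrow` would be expected to supply rows exposing the same fields.

```julia
# Hypothetical rows mimicking the columns used above.
rows = [
    (from_address = "0xaa", tornado_cash_address = "0xpool1",  hash = "0x01", block_timestamp = "2021-01-01 00:00:00 UTC"),
    (from_address = "0xaa", tornado_cash_address = "0xpool1",  hash = "0x02", block_timestamp = "2021-01-02 00:00:00 UTC"),
    (from_address = "0xbb", tornado_cash_address = "0xpool10", hash = "0x03", block_timestamp = "2021-01-03 00:00:00 UTC"),
]

# Hypothetical mapping from pool contract address to pool name.
pools = Dict("0xpool1" => "1 ETH", "0xpool10" => "10 ETH")

# Visit every row once and file it under its address and pool.
function group_transactions_by_address(rows, pools; address_field)
    grouped = Dict{String, Dict{String, Vector{Any}}}()
    for row in rows
        address = getproperty(row, address_field)
        pool = pools[row.tornado_cash_address]
        by_pool = get!(grouped, address, Dict{String, Vector{Any}}())
        push!(get!(by_pool, pool, Any[]), (hash = row.hash, timestamp = row.block_timestamp))
    end
    grouped
end

grouped = group_transactions_by_address(rows, pools; address_field = :from_address)
```

For withdraws, the same function would be called with `address_field = :recipient_address`.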

In [35]:
deposit_addresses_tcash_transactions = get_addresses_transactions(unique_deposit_addresses, deposit_txs, tornado_addresses; transaction_type=:deposit)

In [16]:
const withdraw_addresses_tcash_transactions = get_addresses_transactions(unique_withdraw_addresses, withdraw_txs, tornado_addresses; transaction_type=:withdraw)

Dict{Any, Any} with 32528 entries:
  "0xab17da946b4ee971e6cd9… => Dict{Any, Any}("0.1 ETH"=>NamedTuple{(:hash, :ti…
  "0x3db8a6a96a8e3711c30c8… => Dict{Any, Any}("0.1 ETH"=>NamedTuple{(:hash, :ti…
  "0x5d57f2e5f61b484eadc14… => Dict{Any, Any}("1 ETH"=>NamedTuple{(:hash, :time…
  "0xce41f28db174e9684c84a… => Dict{Any, Any}("1 ETH"=>NamedTuple{(:hash, :time…
  "0xe8c4d0b45cae962ea5fab… => Dict{Any, Any}("10 ETH"=>NamedTuple{(:hash, :tim…
  "0xad4ece462b6620d0e9bf9… => Dict{Any, Any}("0.1 ETH"=>NamedTuple{(:hash, :ti…
  "0x09e5a3c43133e15a08924… => Dict{Any, Any}("1 ETH"=>NamedTuple{(:hash, :time…
  "0xb232e7c376462dcc96004… => Dict{Any, Any}("100 ETH"=>NamedTuple{(:hash, :ti…
  "0x786a619524f77daee2d86… => Dict{Any, Any}("100 ETH"=>NamedTuple{(:hash, :ti…
  "0x1cbfd11c477bb948742ef… => Dict{Any, Any}("1 ETH"=>NamedTuple{(:hash, :time…
  "0x9bd7f7330c607406a5aea… => Dict{Any, Any}("0.1 ETH"=>NamedTuple{(:hash, :ti…
  "0x8eb0ad50dacfba1b45a55… => Dict{Any, Any}("10 ETH"=>NamedTuple{(:hash,

In [17]:
function filter_deposits(address1, address2)
    
    deposit_transactions = deposit_addresses_tcash_transactions[address1]
    withdraw_transactions = withdraw_addresses_tcash_transactions[address2]
    
    pools = filter(pool -> pool ∈ keys(withdraw_transactions), keys(deposit_transactions))
    
    linked_transactions_dict = Dict()
    
    for pool in pools
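        # Timestamps are fixed-width "YYYY-MM-DD HH:MM:SS UTC" strings, so comparing
        # them lexicographically is equivalent to comparing them chronologically.
        latest_withdraw = maximum(map(w_t -> w_t.timestamp, withdraw_transactions[pool]))
        
        # Keep only the deposits made before the latest withdraw in this pool.
        filtered_deposits = filter(d_t -> d_t.timestamp < latest_withdraw, deposit_transactions[pool])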
        
        if !isempty(filtered_deposits)
            linked_transactions_dict[pool] = (deposit_hashes = map(d_t -> d_t.hash, filtered_deposits),
                    withdraw_hashes = map(w_t -> w_t.hash, withdraw_transactions[pool]))
        end
    end
    
    Dict((deposit_address=address1, withdraw_address=address2) => linked_transactions_dict)
end

filter_deposits (generic function with 1 method)

In [18]:
d = deposit_addresses_tcash_transactions["0x000000001d94b2612380854e74c32548d3ce4720"]
[t.timestamp for t in d["1 ETH"]]

2-element Vector{String31}:
 "2021-10-22 22:21:14 UTC"
 "2021-10-22 22:56:07 UTC"

In [19]:
w = withdraw_addresses_tcash_transactions["0x000000007cbf74626927365e961cc697ef8fed32"]
[t.timestamp for t in w["1 ETH"]]

1-element Vector{String31}:
 "2020-12-26 10:47:20 UTC"

In [20]:
filter_deposits("0x000000001d94b2612380854e74c32548d3ce4720", "0x000000007cbf74626927365e961cc697ef8fed32")

Dict{NamedTuple{(:deposit_address, :withdraw_address), Tuple{String, String}}, Dict{Any, Any}} with 1 entry:
  (deposit_address = "0x000000001d94b2612380854e74c32548d3ce4720", wi… => Dict()

In [21]:
function first_neightbors_heuristic(deposit_and_withdraw_linked_addresses)
    
    addresses_linked_transactions = Dict()
    
    for row in ProgressBar(eachrow(deposit_and_withdraw_linked_addresses), printing_delay=2)
        merge!(addresses_linked_transactions, filter_deposits(row.deposit_address, row.withdraw_address))
    end
    
    filter(element -> !isempty(last(element)), addresses_linked_transactions)
end     

first_neightbors_heuristic (generic function with 1 method)

In [22]:
d = first_neightbors_heuristic(D_W_df)

Dict{Any, Any} with 5861 entries:
  (deposit_address = "0x32… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0x7…
  (deposit_address = "0xeb… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0xe…
  (deposit_address = "0x9c… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0xe…
  (deposit_address = "0xe2… => Dict{Any, Any}("100 ETH"=>(deposit_hashes = ["0x…
  (deposit_address = "0x92… => Dict{Any, Any}("1 ETH"=>(deposit_hashes = ["0xa6…
  (deposit_address = "0xf3… => Dict{Any, Any}("100 DAI"=>(deposit_hashes = ["0x…
  (deposit_address = "0x06… => Dict{Any, Any}("1 ETH"=>(deposit_hashes = ["0x9b…
  (deposit_address = "0xc4… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0x8…
  (deposit_address = "0x3a… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0x4…
  (deposit_address = "0xb9… => Dict{Any, Any}("1 ETH"=>(deposit_hashes = ["0xb5…
  (deposit_address = "0x22… => Dict{Any, Any}("10 ETH"=>(deposit_hashes = ["0x8…
  (deposit_address = "0x50… => Dict{Any, Any}("0.1 ETH"=>(deposit_hashes = 

In [23]:
t = (deposit_address = "0x32192ec423152d5f2428adcecb1525e4eb366e90", withdraw_address = "0xce7cf36d9b8bcacd4c80024a10aadd8f7d1173a2")

(deposit_address = "0x32192ec423152d5f2428adcecb1525e4eb366e90", withdraw_address = "0xce7cf36d9b8bcacd4c80024a10aadd8f7d1173a2")

In [24]:
d[t]["10 ETH"]

(deposit_hashes = ["0x79524bf15538e277ffe74de2cc7980eb247a65747ef5cc3f0053281d550f6716"], withdraw_hashes = ["0xb528dbdf806625de1819dcbd954d01e844b6de0e989c0ebfa1069289364cc6c2"])

In [25]:
keys(d) |> collect

5861-element Vector{Any}:
 (deposit_address = "0x32192ec423152d5f2428adcecb1525e4eb366e90", withdraw_address = "0xce7cf36d9b8bcacd4c80024a10aadd8f7d1173a2")
 (deposit_address = "0xebfcef1eda60358d7e0e81db5bef89dfaaa5f3f5", withdraw_address = "0xf19958f76f0f40f0a14006b7e22be03ad5eae104")
 (deposit_address = "0x9c1c21ec0e88b8334ebc5d388a310f59eca0e381", withdraw_address = "0x25f2af3b84d6a36d38dda369c8f7e7a7b0258941")
 (deposit_address = "0xe2ca7390e76c5a992749bb622087310d2e63ca29", withdraw_address = "0x000000000cc7e508b4b115e64d71ef374cfb7703")
 (deposit_address = "0x92f29100cc4dca707359d8eb78402eb3acfd87d3", withdraw_address = "0x6cad680b13397a0f1719af6e209d5dd2d228fa4f")
 (deposit_address = "0xf3a6e54248f51fdf85008e02f3ada2dfc66c6c24", withdraw_address = "0x6fa0f8cc1501756bc40c06b841fc908045a1bd7f")
 (deposit_address = "0x06b1bf28c962363f212878bdf87417ebd0316220", withdraw_address = "0xc82547fc22a11b52d2507e28ace078a187194956")
 (deposit_address = "0xc47e04fa576be089a742aa38a2f1215b01

In [31]:
t2 = (deposit_address = "0x1c6558bb4f4dda8fad8c42cc5b0fb69f1c9115a1", withdraw_address = "0x4cb7e575aa19dfc998120a9b056b3b13ca7b0e1c")
d[t2]

Dict{Any, Any} with 2 entries:
  "1 ETH"   => (deposit_hashes = ["0xd1f0e2026d268c855c522332f7777228206f27ae7c…
  "0.1 ETH" => (deposit_hashes = ["0xc103932505443a4efefb4c0500af92b6b4187a7a7f…

In [32]:
d[t2]["0.1 ETH"]

(deposit_hashes = ["0xc103932505443a4efefb4c0500af92b6b4187a7a7fa7b8aa794a9a5239b47396", "0x4b446f425a21af3d0c5aaad088392e59024e45ce7535728f48f6b902826e0515"], withdraw_hashes = ["0xcee0c1d85ad881ec7c33e2f5117bca93d58c91c026a84593f8bf6f63ef5a1687"])