## Second heuristic - Preliminary implementation

### Description

If a deposit and a withdrawal transaction share the same **unique** gas price (e.g., 3.1415926 Gwei), we consider the two transactions linked. The linked deposit transaction can then be removed from the anonymity set of every other withdrawal transaction.
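As a minimal sketch of this rule, the toy example below uses made-up hashes and gas prices in wei (none of them come from the dataset). It links a withdrawal to a deposit only when the deposit's gas price occurs exactly once among the deposits; timestamps are ignored here for brevity.

```julia
# Toy data (hypothetical values): 3.1415926 Gwei = 3_141_592_600 wei.
deposits = [(hash = "0xd1", gas_price = 3_141_592_600),
            (hash = "0xd2", gas_price = 20_000_000_000),
            (hash = "0xd3", gas_price = 20_000_000_000)]
withdrawals = [(hash = "0xw1", gas_price = 3_141_592_600),
               (hash = "0xw2", gas_price = 20_000_000_000)]

# Count how many deposits use each gas price.
price_counts = Dict{Int,Int}()
for d in deposits
    price_counts[d.gas_price] = get(price_counts, d.gas_price, 0) + 1
end

# Link a withdrawal to a deposit only when they share a gas price
# and that gas price occurs exactly once among the deposits.
links = [(d.hash, w.hash) for w in withdrawals for d in deposits
         if d.gas_price == w.gas_price && price_counts[d.gas_price] == 1]
```

Only the pair `("0xd1", "0xw1")` is linked; the 20 Gwei withdrawal stays unlinked because two deposits share that price.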

In [1]:
using DataFrames
using CSV

In [2]:
withdraw_transactions_df = CSV.read("../data/tornado_withdraw_df.csv", DataFrame)
deposit_transactions_df = CSV.read("../data/deposit_transactions.csv", DataFrame);

In [3]:
ENV["COLUMNS"]=10000
ENV["LINES"]=10;

### Function summary: filter_by_unique_gas_price

Filters a transaction DataFrame, keeping only the transactions (rows) whose gas_price value occurs exactly once across all transactions.

In [4]:
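# Filters a transaction DataFrame, leaving only the rows whose gas_price occurs exactly once.
function filter_by_unique_gas_price(transactions_df)
    # Count how many transactions use each gas price.
    gas_price_counts = combine(groupby(transactions_df, :gas_price), nrow => :count)
    # Collect the gas prices that appear exactly once; a Set gives fast membership tests.
    unique_gas_prices = Set(filter(row -> row.count == 1, gas_price_counts)[!, "gas_price"])
    filter(row -> row.gas_price ∈ unique_gas_prices, transactions_df)
end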

filter_by_unique_gas_price (generic function with 1 method)
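To make the grouping logic easier to follow, here is a step-by-step sketch of the same idea on a toy DataFrame. The hashes and prices are hypothetical, and the intermediate names (`counts`, `unique_prices`, `unique_rows`) are chosen for illustration only.

```julia
using DataFrames

# Toy transactions: the gas price 20 is shared by two rows.
toy = DataFrame(hash = ["0xa", "0xb", "0xc", "0xd"],
                gas_price = [10, 20, 20, 30])

# Step 1: one row per distinct gas price, with the number of transactions using it.
counts = combine(groupby(toy, :gas_price), nrow => :count)

# Step 2: keep the gas prices that occur exactly once.
unique_prices = Set(counts.gas_price[counts.count .== 1])

# Step 3: keep the transactions whose gas price is in that set.
unique_rows = filter(row -> row.gas_price in unique_prices, toy)
```

In this toy case the rows `0xa` and `0xd` survive, while `0xb` and `0xc` are dropped because they share a gas price.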

In [5]:
unique_gasp_deposits = filter_by_unique_gas_price(deposit_transactions_df)
first(unique_gasp_deposits, 5)

Row,hash,nonce,transaction_index,from_address,to_address,value,gas,gas_price,input,receipt_cumulative_gas_used,receipt_gas_used,receipt_contract_address,receipt_root,receipt_status,block_timestamp,block_number,block_hash,max_fee_per_gas,max_priority_fee_per_gas,transaction_type,receipt_effective_gas_price
,String,Int64,Int64,String,String,Int128,Int64,Int64,String,Int64,Int64,Missing,Missing,Int64,String31,Int64,String,Int64?,Int64?,Int64?,Int64
1,0xbd83053f8afa7777f54a4aca6b8e112fa31b888922dc5b9a9a65eb66e9a6996f,7,63,0x6c6e4816ecfa4481472ff88f32a3e00f2eaa95a1,0x12d66f87a04a9e220743712ce6d9bb1b5616b8fc,100000000000000000,800000,30838446643,0xb214faa527a20ba920c8ae877d67ce1ebd7420dafb3150e001eca78166fd6d66a5fd253e,6222489,800000,missing,missing,0,2020-05-27 03:30:44 UTC,10145408,0x837b3482443f027f6f045644bf002243f72304686015a2d6265676b2a2fc630b,missing,missing,missing,30838446643
2,0x6c416af65ea3a4bc096663c94f5b1fb0cba91607f61703657094f1f5441a3a12,3,51,0x27972d10f153099b3649ea8546a11d91315455e5,0x0836222f2b2b24a3f36f98668ed8f0b38d1a872f,0,1200000,71302125000,0xb214faa52d8f3e8b9934e70c0fbf5bb3cf355027815a11191fb5e2f1d3f6f84beb2a7c35,4108630,992258,missing,missing,1,2020-09-26 08:10:08 UTC,10937092,0x32f0f0fd04d3af8210d2eb956fdec21e09ebff5a55209c0a2c559bb1c034b158,missing,missing,missing,71302125000
3,0x830dbd534d13cd43cb078b7cad8a9c5137bb19aa8bf38e0c3b0e222b688d8340,0,63,0x43eefeb3db479e7b22e015572f38b6af633a43ff,0x47ce0c6ed5b0ce3d3a51fdb1c52dc66a7c3c2936,1000000000000000000,275000,595000000000,0xb214faa500fe7a849e7374f033604417bf6ccd369ea65e027f5327f58fcb8667cb469769,4390453,274947,missing,missing,0,2020-09-17 20:50:21 UTC,10881994,0x5270afd78906cc7264620b2c6fbf8c9221cf8053c8111edc0d7457966340dd81,missing,missing,missing,595000000000
4,0x3fc1bccbcb3d55967104809fa30b97e841d1cac61259e4de0553680802320985,90,49,0x1f28f2ef476178baa1bdd52a7dd666046d87288f,0x12d66f87a04a9e220743712ce6d9bb1b5616b8fc,100000000000000000,1200000,2940860215,0xb214faa509ab186ab3c0fc40dd60fcd3387d430adad51c31cae01a2b92280c0c7b580087,4928980,978691,missing,missing,1,2020-02-22 16:58:22 UTC,9534369,0x275673e2c9caec96e76b9c86e7485ea7ec707ee2c57b38cd43b001e2c330cb37,missing,missing,missing,2940860215
5,0x90b978750a56c400bae91ad65e1cf2abe45b1218defe2a22f7f26bb22321b32d,26,57,0x823c54e4fc30665ebcf045d85aa0dae04015670d,0x12d66f87a04a9e220743712ce6d9bb1b5616b8fc,100000000000000000,1200000,2047069131,0xb214faa50c8acbec5f24055cdc9600ad6523ebd062da038769e974d4f489a63ef9610c67,2905958,978691,missing,missing,1,2020-01-24 23:45:33 UTC,9347529,0x4ef1e187ceae83f51f847f39c9355a124317fc7696701273d1309dd90960b76e,missing,missing,missing,2047069131


### Function summary: same_gas_price_heuristic

This function receives a single withdrawal transaction and a DataFrame containing the deposits with unique gas prices. A deposit matches when it has the same gas price as the withdrawal and an earlier block timestamp.

It returns a tuple:
* $(true, deposit\_hash)$ when a matching deposit transaction is found.
* $(false, nothing)$ when no such deposit is found.

In [6]:
# Given a withdrawal transaction and a DataFrame with unique gas_price deposit transactions, checks
# if there is a deposit transaction with the same gas_price as the withdrawal transaction.
function same_gas_price_heuristic(withdrawal_transaction, unique_gas_price_deposit_dataframe)
    for row in eachrow(unique_gas_price_deposit_dataframe)
        if withdrawal_transaction.gas_price == row.gas_price && withdrawal_transaction.block_timestamp > row.block_timestamp
            return (true, row.hash)
        end
    end
    (false, nothing)
end

same_gas_price_heuristic (generic function with 1 method)

In [7]:
same_gas_price_heuristic(withdraw_transactions_df[1,:], unique_gasp_deposits)

(false, nothing)
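The `block_timestamp` comparison inside `same_gas_price_heuristic` operates on the values as loaded; the deposit table above shows that column as a string type. The sketch below illustrates, under that assumption, why comparing such strings still orders them chronologically, and how they could be parsed explicitly. The helper `parse_block_timestamp` is hypothetical, and the second sample timestamp merely stands in for a withdrawal.

```julia
using Dates

# Two sample timestamps in the format of the block_timestamp column.
deposit_ts = "2020-05-27 03:30:44 UTC"
withdrawal_ts = "2020-09-26 08:10:08 UTC"

# String comparison is lexicographic; for this fixed-width, zero-padded format
# it agrees with chronological order.
later_as_string = withdrawal_ts > deposit_ts

# Hypothetical helper: parse the first 19 characters ("yyyy-mm-dd HH:MM:SS")
# and ignore the trailing " UTC" suffix.
parse_block_timestamp(ts) = DateTime(first(ts, 19), dateformat"yyyy-mm-dd HH:MM:SS")

later_as_datetime = parse_block_timestamp(withdrawal_ts) > parse_block_timestamp(deposit_ts)
```

Both comparisons agree here. Parsing is the safer choice if the two CSV files might store timestamps in different formats.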

### Function summary: apply_same_gas_price_heuristic

Applies the heuristic to every row of the withdraw_transactions DataFrame. Returns a list of tuples, each pairing a linked deposit transaction hash with its withdrawal transaction hash.

In [8]:
# Applies the same-gas-price check to every row of the withdrawal data.
# Returns a list of tuples, each with the deposit transaction hash in the first position and
# the withdrawal transaction hash in the second position.
function apply_same_gas_price_heuristic(deposit_dataframe, withdraw_dataframe)
    unique_gas_price_deposits = filter_by_unique_gas_price(deposit_dataframe)
    linked_pairs = Tuple{String,String}[]
    for withdraw_row in eachrow(withdraw_dataframe)
        found, deposit_hash = same_gas_price_heuristic(withdraw_row, unique_gas_price_deposits)
        if found
            push!(linked_pairs, (deposit_hash, withdraw_row.hash))
        end
    end
    linked_pairs
end

apply_same_gas_price_heuristic (generic function with 1 method)

In [9]:
@time link_same_gas = apply_same_gas_price_heuristic(deposit_transactions_df, withdraw_transactions_df);

  3.824365 seconds (111.35 M allocations: 1.667 GiB, 10.52% gc time, 1.77% compilation time)
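The loop above scans the unique-gas-price deposits once per withdrawal. As an alternative sketch (not the notebook's method), the same links could be computed with a join on `gas_price`. The function name `link_by_unique_gas_price` and the renamed columns are hypothetical; the sketch assumes both DataFrames have `hash`, `gas_price` and `block_timestamp` columns with no missing gas prices, and the order of the resulting rows may differ from the loop version.

```julia
using DataFrames

function link_by_unique_gas_price(deposits, withdrawals)
    # Keep deposits whose gas price occurs exactly once.
    counts = combine(groupby(deposits, :gas_price), nrow => :count)
    unique_prices = Set(counts.gas_price[counts.count .== 1])
    unique_deposits = filter(:gas_price => in(unique_prices), deposits)

    # Rename columns so both sides can live in one joined table.
    d = select(unique_deposits, :gas_price, :hash => :deposit_hash,
               :block_timestamp => :deposit_timestamp)
    w = select(withdrawals, :gas_price, :hash => :withdrawal_hash,
               :block_timestamp => :withdrawal_timestamp)

    # Pair every withdrawal with the deposit that has the same gas price,
    # then keep the pairs where the withdrawal happened after the deposit.
    joined = innerjoin(d, w, on = :gas_price)
    linked = filter([:withdrawal_timestamp, :deposit_timestamp] => (wt, dt) -> wt > dt, joined)
    select(linked, :deposit_hash, :withdrawal_hash)
end

# Toy data with hypothetical hashes and timestamps.
toy_deposits = DataFrame(
    hash = ["d1", "d2", "d3", "d4"],
    gas_price = [10, 20, 20, 30],
    block_timestamp = ["2020-01-01 00:00:00 UTC", "2020-01-02 00:00:00 UTC",
                       "2020-01-03 00:00:00 UTC", "2020-06-01 00:00:00 UTC"])
toy_withdrawals = DataFrame(
    hash = ["w1", "w2", "w3", "w4"],
    gas_price = [10, 20, 30, 10],
    block_timestamp = ["2020-02-01 00:00:00 UTC", "2020-02-02 00:00:00 UTC",
                       "2020-05-01 00:00:00 UTC", "2020-03-01 00:00:00 UTC"])

toy_links = link_by_unique_gas_price(toy_deposits, toy_withdrawals)
```

In the toy data, `w2` is not linked because its gas price is shared by two deposits, and `w3` is not linked because it precedes the matching deposit `d4`.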


In [10]:
# A DataFrame with the linked transactions (one row per deposit/withdrawal pair).
linked_transactions_df = DataFrame("deposit_hash"=>[pair[1] for pair in link_same_gas],"withdrawl_hash"=>[pair[2] for pair in link_same_gas])

Row,deposit_hash,withdrawl_hash
,String,String
1,0x4e6f643e8a8c1fb123ab4921fc7260ddccce3c9823c1b1b25de6d2658be46350,0x607e9843ffa508428c61d8493f630135eab69c50884cef0c8050eaac7f7fadd7
2,0x194ad37f2a3bf793747de0317ad01aa510388e22a6fd10d317a98df09839fdb9,0xfae1528493709f9154436933ecdfb754ef36692a59b0b927af99896c52494ace
3,0xbf31dc241dabc09f93816fdc7669063362ec2e05cf6e0df324a1c7685581bf98,0xf98448ed1ca40e2ba70965488f3b91a1459815965fc39dd1cd30d8881b157a90
4,0xbf31dc241dabc09f93816fdc7669063362ec2e05cf6e0df324a1c7685581bf98,0x9b1516dd5f14bc930034bea7ad5d0d61a428b7f34ac9cf24eb27e77503e8b4ea
5,0x4cb1619f2d51cc7ecb926905747804e589765634ea737d957a19383a7c32162f,0x212b85e840b1789d00e4ea41ff2317e146ec1fe7df6994b3c323f507ef549f9b
6,0xfdebd5f3c2ac6136d10a4a9300efc62b6342c7844d394a91942838023f74a639,0x28634e07c08e81d19b5e0b9ac8da400596761cd0467d13b99fd6c26fd4ff4a25
7,0xfdebd5f3c2ac6136d10a4a9300efc62b6342c7844d394a91942838023f74a639,0x8ec8ef9523bfa6adb0d55a752cbf9e85228c878e74f957c7c7426dcd6b5c6880
8,0xd120233faeb19b05ff6848d7f91fc3901df59204926a00b30130738184e90ee0,0x42da702eaa292b394108931c05f4a3dc910193e6b6beee39dd6b39558958976c
9,0x558ddd6e4d38d381340bac54243a5a0ed66459e729bf73743ddbedaf7f5118fa,0x1f6bd28ef71b40cc98167ec02a1f714ba1c18b768462712a3a0644ea0049e514
10,0xf8ae02f996a4ca7a1c6d4385e3a29e078684a894c91d6a902f4189612979485a,0x72b04009a5764c59aa3a134791ba9d8e7cd145aaf1ea9bf6aa1a7ad7b9064b35
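In the output above, rows 3 and 4 share a deposit hash, as do rows 6 and 7: only deposits are filtered for unique gas prices, so several withdrawals can match the same deposit. A minimal sketch for spotting such cases is shown below. It uses a small stand-in table with abbreviated, hypothetical hashes instead of `linked_transactions_df`, and the names `per_deposit` and `ambiguous` are illustrative.

```julia
using DataFrames

# Stand-in for linked_transactions_df, with shortened hypothetical hashes.
links = DataFrame(deposit_hash = ["0x4e6f", "0x194a", "0xbf31", "0xbf31", "0x4cb1"],
                  withdrawl_hash = ["0x607e", "0xfae1", "0xf984", "0x9b15", "0x212b"])

# Count how many withdrawals were linked to each deposit.
per_deposit = combine(groupby(links, :deposit_hash), nrow => :n_withdrawals)

# Deposits linked to more than one withdrawal are ambiguous under this heuristic.
ambiguous = filter(:n_withdrawals => >(1), per_deposit)
```

One possible refinement, consistent with the description at the top of this section, would be to require the gas price to be unique among withdrawals as well.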
