This query determines how many smart contracts were deployed on the network. In addition, it can be used to build a graph with applications and senders as nodes and the transactions from senders to applications as edges.
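
A minimal sketch of the network described above, in plain Python: senders and applications are nodes, and every application call adds weight to a directed edge from sender to application. The sample rows, the `app:` node prefix, and the helper name `build_edges` are assumptions for illustration; only the column names `TXN_SND` and `TXN_APID` come from the transaction table loaded in this notebook.

```python
from collections import Counter


def build_edges(rows):
    """Count application calls per (sender, application) pair.

    rows: iterable of dicts with the keys TXN_SND and TXN_APID.
    Rows without an application id are skipped.
    """
    edges = Counter()
    for row in rows:
        app_id = row.get("TXN_APID")
        if app_id is None:
            continue
        # Prefix application ids so they cannot collide with sender addresses.
        edges[(row["TXN_SND"], f"app:{app_id}")] += 1
    return edges


# Hypothetical sample rows, not taken from the real data set.
sample_rows = [
    {"TXN_SND": "alice", "TXN_APID": 42},
    {"TXN_SND": "alice", "TXN_APID": 42},
    {"TXN_SND": "bob", "TXN_APID": 42},
    {"TXN_SND": "bob", "TXN_APID": None},
]

edges = build_edges(sample_rows)
nodes = {node for edge in edges for node in edge}
number_of_apps = sum(1 for node in nodes if node.startswith("app:"))
```

The weighted edge list could then be handed to a graph library such as networkx, for example by adding each pair as an edge with its count as the weight.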

In [10]:
spark.stop()

In [1]:
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib as mpt
import numpy as np

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row
from graphframes import *
from delta import *
from delta.tables import *

import pyspark.sql.functions as fn
from pyspark.sql.types import StringType, BooleanType, DateType

Define the configuration that should be used

In [2]:
config = pyspark.SparkConf().setAll([
    ('spark.executor.memory', '12g'), 
    ('spark.executor.cores', '3'), 
    ('spark.cores.max', '6'),
    ('spark.driver.memory','1g'),
    ('spark.executor.instances', '1'),
    ('spark.dynamicAllocation.enabled', 'true'),
    ('spark.dynamicAllocation.shuffleTracking.enabled', 'true'),
    ('spark.dynamicAllocation.executorIdleTimeout', '60s'),
    ('spark.dynamicAllocation.minExecutors', '0'),
    ('spark.dynamicAllocation.maxExecutors', '2'),
    ('spark.dynamicAllocation.initialExecutors', '1'),
    ('spark.dynamicAllocation.executorAllocationRatio', '1')
])

In [3]:
builder = pyspark.sql.SparkSession.builder \
    .appName("SmartContractRelatedTXNetwork") \
    .master("spark://172.23.149.212:7077") \
    .config(conf=config) \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

22/05/26 13:55:11 WARN Utils: Your hostname, algorand-druid-and-spark resolves to a loopback address: 127.0.0.1; using 172.23.149.212 instead (on interface ens3)
22/05/26 13:55:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address


:: loading settings :: url = jar:file:/home/ubuntu/.local/lib/python3.8/site-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /home/ubuntu/.ivy2/cache
The jars for the packages stored in: /home/ubuntu/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1bd8fbe0-fe0e-48a5-a622-687156e7bb68;1.0
	confs: [default]
	found io.delta#delta-core_2.12;1.1.0 in central
	found org.antlr#antlr4-runtime;4.8 in central
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
:: resolution report :: resolve 736ms :: artifacts dl 5ms
	:: modules in use:
	io.delta#delta-core_2.12;1.1.0 from central in [default]
	org.antlr#antlr4-runtime;4.8 from central in [default]
	org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]

Read the transactions_flat table from the Delta Lake

In [56]:
# flat transaction table; columns that are not needed for this analysis are dropped below
dfTx = spark.read.format("delta").load("/mnt/delta/bronze/algod_indexer_public_txn_flat")
dfTx = dfTx.drop("key", "EXTRA", "RR", "SIG", "TXN_FEE", "TXN_FV", "TXN_GH", "TXN_LV", "TXN_GEN", "TXN_GRP", "TXN_LX", "TXN_NOTE", "TXN_REKEY", "TXN_CLOSE", "TXN_VOTEKEY", "TXN_SELKEY", "TXN_VOTELST", "TXN_VOTEKD", "TXN_NONPART", "TXN_CAID", "TXN_APAR", "TXN_XAID", "TXN_AAMT", "TXN_ASND", "TXN_ARCV", "TXN_ACLOSE", "TXN_FADD", "TXN_FAID", "TXN_AFRZ", "TXN_SIG", "TXN_MSIG", "TXN_LSIG")

print("Number of total transactions", dfTx.count())
dfTx.show(1, vertical=True, truncate=False)

                                                                                

Number of total transactions 47244794
-RECORD 0---------------------------------------------------
 ROUND       | 215151                                       
 TXID        | null                                         
 INTRA       | 2                                            
 TYPEENUM    | 1                                            
 ASSET       | null                                         
 TXN_SND     | 7OUGoX3hg950O7LF51T+7uRwEWw9PMlVpp0axxDgZY0= 
 TXN_TYPE    | pay                                          
 TXN_RCV     | 5LMyUgikZxnv5RI3OLDyxPqFLo+9EfNYXR5LduhmmCY= 
 TXN_AMT     | 1                                            
 TXN_VOTEFST | null                                         
 TXN_APID    | null                                         
 TXN_APAN    | null                                         
 TXN_APAT    | null                                         
 TXN_APAP    | null                                         
 TXN_APAA    | null                            

In [57]:
print(dfTx.rdd.getNumPartitions())

1995


In [58]:
dfTxtest = dfTx.coalesce(6)
dfTxtest.rdd.getNumPartitions()

6

In [59]:
dfUseCaseDistribution = dfTxtest.groupBy("TXN_TYPE").count()
dfUseCaseDistribution.show()

22/05/26 17:34:01 WARN TaskSetManager: Stage 209 contains a task of very large size (1454 KiB). The maximum recommended task size is 1000 KiB.

+--------+--------+
|TXN_TYPE|   count|
+--------+--------+
|    acfg|     853|
|  keyreg|     507|
|   axfer|36773572|
|     pay|10469849|
|    afrz|      13|
+--------+--------+



                                                                                

In [52]:
dfTX2 = dfTx.select("ROUND")

In [46]:
dfTX2.show(5)



+------+
| ROUND|
+------+
|215151|
|215151|
|215151|
|215151|
|215151|
+------+
only showing top 5 rows



In [38]:
dfTest = spark.range(1, 100000000)
print(dfTest.rdd.getNumPartitions())

6


In [39]:
dfTest.show(5)

+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
|  5|
+---+
only showing top 5 rows



In [42]:
dfTest.agg(fn.sum("id")).show()

+----------------+
|         sum(id)|
+----------------+
|4999999950000000|
+----------------+



In [54]:
dfTX2.agg(fn.sum("ROUND")).show()

[Stage 177:>                                                        (0 + 6) / 6]

+-------------------+
|         sum(ROUND)|
+-------------------+
|4.02294943404127E14|
+-------------------+



                                                                                

### Use Cases

We want to distinguish between the different use cases of transactions. To do so, we use the type of each transaction and distinguish between the following transaction types:
- PaymentTx, type = "pay", typeEnum = 1. This transaction sends Algos from one account to another.
- KeyRegistrationTx, type = "keyreg", typeEnum = 2. This transaction is done to register an account either online or offline. A transaction is an online transaction if it has participation-key related fields, namely votekey, selkey, votekd, votefst and votelst. A transaction is an offline transaction if these fields are missing. The moment a key registration transaction is confirmed by the network it takes 320 rounds for the change to take effect. In other words, if a key registration is confirmed in round 1000, the account will not start participating until round 1320.
- AssetConfigTx, type = "acfg", typeEnum = 3. This transaction is used to create an asset, modify certain parameters of an asset, or destroy an asset. 
- AssetTransferTx, type = "axfer", typeEnum = 4. An Asset Transfer Transaction is used to opt-in to receive a specific type of Algorand Standard Asset, transfer an Algorand Standard asset, or revoke an Algorand Standard Asset from a specific account.
- AssetFreezeTx, type = "afrz", typeEnum = 5. An Asset Freeze Transaction is issued by the Freeze Address and results in the asset receiver address losing or being granted the ability to send or receive the frozen asset.
- ApplicationCallTx, type = "appl", typeEnum = 6. An Application Call Transaction is submitted to the network with an AppId and an OnComplete method. The AppId specifies which App to call and the OnComplete method is used in the contract to determine what branch of logic to execute.

More information can be found at https://developer.algorand.org/docs/get-details/transactions/
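
The type list above can be captured in a small lookup, together with the online/offline rule and the 320-round delay for key registrations. This is only a sketch in plain Python: the names `TYPE_ENUM`, `is_online_keyreg`, and `participation_start_round` are made up for this example, and the field names follow the lowercase spelling used in the list rather than the column names of the table.

```python
# Transaction type string -> typeEnum, as listed above.
TYPE_ENUM = {"pay": 1, "keyreg": 2, "acfg": 3, "axfer": 4, "afrz": 5, "appl": 6}

# Participation-key related fields that mark an online key registration.
PARTICIPATION_FIELDS = ("votekey", "selkey", "votekd", "votefst", "votelst")

# Rounds between confirmation of a key registration and the change taking effect.
KEYREG_DELAY_ROUNDS = 320


def is_online_keyreg(txn):
    """Return True if every participation-key field is present and not None.

    Assumption: a transaction carrying only some of the fields is treated
    as offline here; the text above does not cover that case.
    """
    return all(txn.get(field) is not None for field in PARTICIPATION_FIELDS)


def participation_start_round(confirmed_round):
    """Round from which a confirmed key registration takes effect."""
    return confirmed_round + KEYREG_DELAY_ROUNDS


# Hypothetical key registration transactions for illustration.
online_txn = {field: "placeholder" for field in PARTICIPATION_FIELDS}
offline_txn = {}

print(is_online_keyreg(online_txn), is_online_keyreg(offline_txn))
print(participation_start_round(1000))
```

With the example from the list, a key registration confirmed in round 1000 yields a start round of 1320.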

In [25]:
dfUseCaseDistribution = dfTx.groupBy("TXN_TYPE").count()
dfUseCaseDistribution.show()



+--------+-------+
|TXN_TYPE|  count|
+--------+-------+
|    acfg|    175|
|    afrz|      3|
|  keyreg|    337|
|   axfer|  37983|
|     pay|1273597|
+--------+-------+



                                                                                

Create a cross join to add a column holding the total count to each row, so that each count can be divided by it later

In [26]:
dfUseCaseDistribution = dfUseCaseDistribution.crossJoin(dfUseCaseDistribution.groupby().agg(fn.sum('count').alias('sum_count')))

In [27]:
dfUseCaseDistribution = dfUseCaseDistribution.select('TXN_TYPE', 'count', (fn.col('count') / fn.col('sum_count')).alias("percent"))

In [28]:
dfUseCaseDistribution.show(truncate=False)



+--------+-------+---------------------+
|TXN_TYPE|count  |percent              |
+--------+-------+---------------------+
|acfg    |175    |1.3336442629672136E-4|
|afrz    |3      |2.286247307943795E-6 |
|keyreg  |337    |2.5682178092568627E-4|
|axfer   |37983  |0.028946177165876386 |
|pay     |1273696|0.9706613503795932   |
+--------+-------+---------------------+
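
The arithmetic behind the cross join can be checked in plain Python. The sketch below copies the counts from the table above, computes the total once, and divides each count by it, which is what the `sum_count` column makes possible on every row. The variable names are chosen for this example only. Note that the `percent` column holds fractions between 0 and 1, not percentages.

```python
# Counts per transaction type, copied from the table above.
counts = {"acfg": 175, "afrz": 3, "keyreg": 337, "axfer": 37983, "pay": 1273696}

# The cross join attaches this single total to every row.
total = sum(counts.values())

# Share of each transaction type as a fraction of all transactions.
shares = {txn_type: count / total for txn_type, count in counts.items()}

for txn_type, share in shares.items():
    # Print type, count, and share formatted as a percentage.
    print(f"{txn_type:<7}{counts[txn_type]:>9}{share:>12.4%}")
```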



                                                                                

In [None]:
dfMiners.write.format("delta").mode("overwrite").save("/mnt/delta/silver/queries/network/miner.addresses")
dfUsers.write.format("delta").mode("overwrite").save("/mnt/delta/silver/queries/network/user.addresses")