
# Geth Transactions to MaprDB Demo

This notebook walks through pulling Ethereum blockchain data through a local geth client (running in a docker container), performing light transformations to obtain valid JSON transaction records, and pushing the data into an existing MapR-DB cluster (using the data-access-gateway RESTful interface). We then use the REST API to send a query to MapR-DB and retrieve selected attributes of "interesting" transactions (for example, those whose creators significantly overpaid to prioritize them) for further analysis.

### Before you begin
For best results, this jupyter server should be running in a docker container (as testuser, with a preconfigured python environment) on an "edge node"* of a secured MapR6.0.1-MEP5.0.0 cluster. In addition:
- geth client must be connected to its peers & accessible over private IP (replace 172.16.9.41 with your own IP)
- testuser should exist on all nodes, and have a home directory on mfs where it can create the maprdb table
- one or more maprdb rest gateways should be accessible over private IP (replace 172.16.9.42 and 172.16.9.238 with your own IPs)
- mapr cluster must be alive and stay alive - you might want to keep an eye on it during the load

*An "edge node" here means a Linux host (I'm using CentOS 7.4) capable of running docker containers; no special MapR packages or configurations are required. Optionally, this notebook can be securely persisted to MapR-FS by starting the docker container with a volume mount on top of a mapr-loopbacknfs client (on the underlying host) using testuser's mapr ticket, but this is not required for the demo.
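
The network prerequisites listed above can be checked before running the rest of the notebook. The following is a minimal sketch, assuming the ports that appear later in this notebook (8545 for the geth RPC endpoint and 8243 for the data access gateway). `is_reachable` and `report` are hypothetical helpers, not part of any MapR or geth tooling, and an open TCP port only shows that something is listening, not that the service is healthy.

```python
import socket

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        return False

# Endpoints used in this notebook (replace with your own private IPs)
ENDPOINTS = {
    'geth RPC': ('172.16.9.41', 8545),
    'rest gateway': ('172.16.9.42', 8243),
}

def report(endpoints=ENDPOINTS):
    """Print one line per endpoint; call report() manually to run the check."""
    for name, (host, port) in endpoints.items():
        status = 'reachable' if is_reachable(host, port) else 'NOT reachable'
        print(name, host, port, status)
```

Calling `report()` in a cell prints the status of each endpoint; nothing is contacted until it is called.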


## Authenticate to MapR-DB Rest Gateway to obtain JWT Token for testuser
To avoid hardcoding testuser's password throughout, we use it once below to obtain a bearer token, and pass that token in the header to authenticate in the rest of the application. Note that bearer tokens expire by default every 30 minutes, a property which can be configured in **/opt/mapr/data-access-gateway/conf/properties.cfg** on the host of the rest gateway that generates the token below. On the upside, the jwt token can be shared between both gateways, since it is based on the cluster's maprserverticket key (and not the default example key :-)

In [50]:
import requests
from requests.auth import HTTPBasicAuth
import json

mapr_rest_auth = 'https://172.16.9.42:8243/auth/v2/token'
headers = {'content-type': 'application/json'}
bearerToken = None

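try:
    response = requests.post(
        mapr_rest_auth,
        headers=headers, verify=False,
        auth=HTTPBasicAuth('testuser', 'testuser')
    )
    # Surface 401/5xx responses here instead of failing later on a missing 'token' key
    response.raise_for_status()
    bearerToken = response.json()
except requests.exceptions.RequestException as e:
    # Do not swallow the error silently: the cells below need a valid token
    print('Could not obtain a bearer token:', e)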

**Tip:** bearer tokens expire by default every 30 minutes, a property which can be configured in **/opt/mapr/data-access-gateway/conf/properties.cfg** on the host of the rest gateway that generated the token above. To decode a jwt token (for debugging purposes), you can paste it into https://jwt.io/

In [51]:
# Optional: print the bearer token to see what it looks like
bearerToken

{'token': 'eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJ0ZXN0dXNlciIsImF1ZCI6IndlYiIsImV4cCI6MTUyMzQ0OTA3NSwiaWF0IjoxNTIzNDMxMDc1fQ.SLx5yfw7WLavavshot_mlkBy2pExAtmcWkGKx06M2Vg9xNY_WsBxoejsGKppoAZO3m-nNWBHUphO6p3Vw3rmxw'}
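
As an offline alternative to pasting the token into a website, the payload of a jwt token can be decoded locally, since it is only base64url-encoded JSON. The sketch below uses the sample token printed above. `decode_jwt_payload` is a hypothetical helper that does not verify the signature, so it is suitable for debugging only.

```python
import base64
import json
from datetime import datetime, timezone

def decode_jwt_payload(token):
    """Decode the (unverified) payload segment of a JWT into a dict."""
    payload = token.split('.')[1]
    # base64url segments in a JWT drop their '=' padding; restore it before decoding
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

# Sample token copied from the output above
sample_token = (
    'eyJhbGciOiJIUzUxMiJ9.'
    'eyJzdWIiOiJ0ZXN0dXNlciIsImF1ZCI6IndlYiIsImV4cCI6MTUyMzQ0OTA3NSwiaWF0IjoxNTIzNDMxMDc1fQ.'
    'SLx5yfw7WLavavshot_mlkBy2pExAtmcWkGKx06M2Vg9xNY_WsBxoejsGKppoAZO3m-nNWBHUphO6p3Vw3rmxw'
)

claims = decode_jwt_payload(sample_token)
issued = datetime.fromtimestamp(claims['iat'], tz=timezone.utc)
expires = datetime.fromtimestamp(claims['exp'], tz=timezone.utc)
print(claims)
print('issued :', issued.isoformat())
print('expires:', expires.isoformat())
print('lifetime (minutes):', (claims['exp'] - claims['iat']) / 60)
```

Comparing `exp` with `iat` shows the lifetime the gateway actually granted, which is a quick way to confirm the expiry setting in properties.cfg.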

### Construct a header around your jwt token
The token response does not include the "Bearer" keyword that the Authorization header requires, so we build a custom header that prepends it to testuser's token, and use that header throughout the app. The token works across multiple gateways of a mapr cluster, as it is generated based on each cluster's maprserverticket (and not the default example key :-)

In [52]:
headers = { 
'content-type': 'application/json', 
'Authorization': 'Bearer '+bearerToken['token'] 
} 
headers

{'Authorization': 'Bearer eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJ0ZXN0dXNlciIsImF1ZCI6IndlYiIsImV4cCI6MTUyMzQ0OTA3NSwiaWF0IjoxNTIzNDMxMDc1fQ.SLx5yfw7WLavavshot_mlkBy2pExAtmcWkGKx06M2Vg9xNY_WsBxoejsGKppoAZO3m-nNWBHUphO6p3Vw3rmxw',
 'content-type': 'application/json'}
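
Since tokens expire, it can help to wrap the header construction above in a function that can be called again whenever a fresh token has been obtained. A minimal sketch, assuming the token response has the `{'token': ...}` shape shown earlier; `build_auth_headers` is a hypothetical name.

```python
def build_auth_headers(token_response):
    """Build request headers from the gateway's token response (a dict with a 'token' key)."""
    if not token_response or 'token' not in token_response:
        # Fail early with a clear message instead of a TypeError on None
        raise ValueError('no bearer token available; re-run the authentication cell')
    return {
        'content-type': 'application/json',
        'Authorization': 'Bearer ' + token_response['token'],
    }
```

With this in place, `headers = build_auth_headers(bearerToken)` produces the same dictionary as the cell above.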

**Tip:** Suppress warnings about the self-signed certificate of the maprdb data access gateway, so we don't OOM the notebook browser on inserts.

In [53]:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

## Create all_transactions_table in MapR-DB

In [58]:
transaction_put_url = 'https://172.16.9.42:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table'
response = None

try:
    response = requests.put(
            transaction_put_url, 
            headers=headers, verify=False
        )
    print(response)
except requests.exceptions.ConnectionError as e:
    # Report the failure instead of silently ignoring it
    print('Could not reach the rest gateway:', e)

# Note: a 409 response means the table already exists (which is good if you're running this for the second time)
# 201 means table created successfully, and 401 is most likely caused by an expired token

<Response [201]>
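
The table path in the URL above is percent-encoded, with each `/` written as `%2F`. Rather than encoding paths by hand, the standard library can do it. A minimal sketch; `table_url` is a hypothetical helper, and the gateway address is the one used in this notebook.

```python
from urllib.parse import quote

def table_url(gateway, table_path):
    """Return the REST URL for a MapR-DB table path such as /user/testuser/eth/my_table."""
    # safe='' forces '/' to be encoded as %2F, matching the URL used in the cell above
    return gateway + '/api/v2/table/' + quote(table_path, safe='')

transaction_put_url = table_url('https://172.16.9.42:8243',
                                '/user/testuser/eth/all_transactions_table')
print(transaction_put_url)
```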


## Prepare geth as a data source to populate MaprDB table 
- geth container should be accessible from private IP of docker host (replace 172.16.9.41)
- web3 is a python library for interacting with Ethereum (http://web3py.readthedocs.io/en/stable/); it should be installed in the python environment provided to this kernel

In [40]:
# The following code connects to my geth container (replace with your own private IP).  
from web3 import Web3, HTTPProvider, IPCProvider

gethRPCUrl='http://172.16.9.41:8545'
web3 = Web3(HTTPProvider(gethRPCUrl))

In [None]:
# Optional - print out one block to see what the data looks like
dict(web3.eth.getBlock(5417612))
# notice it's not very interesting, since it mostly contains transaction hashes

In [41]:
# Define a function to retrieve all transactions for a given block
def getAllTransactions(block):
    allTransactions = []
    
    for transaction in dict(web3.eth.getBlock(block,full_transactions=True))['transactions']:
        allTransactions.append((dict(transaction)))
        
    return allTransactions

In [30]:
# Optional: print transactions for a given block to see what the data looks like (and to make sure the function works)
getAllTransactions(5412388)

[{'blockHash': '0xc5d2d1453989b3b58a8d2979c5a0be382be1d06b9ecf271416e9e99a7a6b9660',
  'blockNumber': 5412388,
  'from': '0x390dE26d772D2e2005C6d1d24afC902bae37a4bB',
  'gas': 45000,
  'gasPrice': 111000000000,
  'hash': '0x569b4588e9838e0d809a33764725a16ec926dbb281887dc47925c08fa329ad66',
  'input': '0x',
  'nonce': 320275,
  'r': '0xb9cc1a9b716e81083f84432ec66467d8850bb61d89b033b3ccdb38c3e6241123',
  's': '0x792437a1cf9836d36f7a4aea9ba8d18b0b767a909333e6a76e934f70cfa359fc',
  'to': '0x6C59e03ea0D816bCc48EC8385815858cd2FB642F',
  'transactionIndex': 0,
  'v': '0x25',
  'value': 178000000000000000000},
 {'blockHash': '0xc5d2d1453989b3b58a8d2979c5a0be382be1d06b9ecf271416e9e99a7a6b9660',
  'blockNumber': 5412388,
  'from': '0x16F21113329B7F9cb7b81497bBe043ebB3adba18',
  'gas': 60000,
  'gasPrice': 100000000000,
  'hash': '0x5dc7292e3d127f60622aa578ae54bdaa6a516f3263f0a1c260ca2594ac979cdd',
  'input': '0xa9059cbb0000000000000000000000002f0505c4ad8b9fc70c7d7fc9c5c5dfabdea85e110000000000000
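
As a rough local illustration of what "overpaid" could mean for records like these, the sketch below flags transactions whose `gasPrice` is more than a chosen multiple of the median `gasPrice` in the same list. The median baseline, the factor of 2 and the name `overpaying_transactions` are all assumptions made for this example; they are not the criteria used by the query against MapR-DB.

```python
from statistics import median

WEI_PER_GWEI = 10**9

def overpaying_transactions(transactions, factor=2.0):
    """Return the transactions whose gasPrice is more than `factor` times the median gasPrice."""
    prices = [tx['gasPrice'] for tx in transactions]
    if not prices:
        return []
    cutoff = factor * median(prices)
    return [tx for tx in transactions if tx['gasPrice'] > cutoff]

# Toy input shaped like the records above; the hashes and the third gas price are made up
sample = [
    {'hash': '0xaa', 'gasPrice': 111000000000},
    {'hash': '0xbb', 'gasPrice': 100000000000},
    {'hash': '0xcc', 'gasPrice': 900000000000},
]
for tx in overpaying_transactions(sample):
    print(tx['hash'], tx['gasPrice'] / WEI_PER_GWEI, 'gwei')
```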

### Define a helper function to insert transactions (for specified block range) into (specified) MaprDB table

In [61]:
def getTransactionsAndInsertToDB(blockstart,blockend,txstable):
    # Build the table URL once; the table path is percent-encoded (%2F stands for '/')
    rest_put_txs_url = 'https://172.16.9.238:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2F'+txstable

    # Note: range() excludes blockend itself
    for block in range(blockstart,blockend):
        txsLastBlock=getAllTransactions(block)

        try:
            for transaction in txsLastBlock:
                # Use the transaction hash as the document's _id
                transaction['_id']=transaction['hash']
                response = requests.post(
                    rest_put_txs_url,
                    headers=headers, verify=False,
                    data=json.dumps(transaction)
                )
        except Exception as e:
            # An error skips the remaining transactions of this block and moves on to the next block
            print(e)
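
The helper above ignores the status code of each insert, so an expired token (a 401 response) would go unnoticed during a long load. One way to handle this is to retry once with fresh headers. The sketch below is only an illustration: `post_with_token_refresh` and `refresh_headers` are hypothetical names, `post` is expected to behave like `requests.post`, and how the fresh headers are obtained (for example by repeating the authentication step) is left to the caller.

```python
import json

def post_with_token_refresh(post, refresh_headers, url, document, headers):
    """POST one document; on a 401 response, refresh the headers once and retry.

    `post` is a callable with the keyword arguments of requests.post, and
    `refresh_headers` is a callable returning new headers (both supplied by the caller).
    Returns (response, headers) so the caller can keep using the refreshed headers.
    """
    body = json.dumps(document)
    response = post(url, headers=headers, verify=False, data=body)
    if response.status_code == 401:
        # Most likely an expired token: fetch new headers and try exactly once more
        headers = refresh_headers()
        response = post(url, headers=headers, verify=False, data=body)
    return response, headers
```

Passing `requests.post` as `post` keeps the function easy to exercise without a live gateway, since a stand-in callable can be substituted.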

## Insert transactions (for latest N blocks) into all_transactions_table in MaprDB

In [None]:
# retrieve the latest block number, so we can get a recent range of blocks
currentblock = web3.eth.getBlock('latest').number

getTransactionsAndInsertToDB(blockstart=currentblock-100000,
                           blockend=currentblock,
                           txstable="all_transactions_table")

#### Note: This kernel will be locked while the cell above is running. Start up the second notebook to query the data as it loads.  
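
Because a load of this size can outlive a bearer token, one option is to split the block range into smaller pieces and refresh the headers between pieces. A minimal sketch; `block_ranges` is a hypothetical helper and the chunk size of 1000 is an arbitrary choice for illustration.

```python
def block_ranges(blockstart, blockend, chunk=1000):
    """Yield (start, end) pairs that together cover range(blockstart, blockend)."""
    for start in range(blockstart, blockend, chunk):
        yield start, min(start + chunk, blockend)

# Possible use with the helper defined in this notebook (left commented out, as it starts a long load):
# for start, end in block_ranges(currentblock - 100000, currentblock):
#     getTransactionsAndInsertToDB(blockstart=start, blockend=end,
#                                  txstable="all_transactions_table")
```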