
# Retrieve filtered transactions from MapR-DB

The previous notebook contains the flow to pull the Ethereum blockchain through a local geth client (running in a Docker container), apply light transformations to obtain valid JSON transaction records, and push the data into an existing MapR-DB cluster (using the data-access-gateway RESTful interface). While the data loads, this notebook uses the REST API to send a query to MapR-DB and retrieve selected attributes of "interesting" transactions (for example, those whose creators significantly overpaid for gas to get prioritized) for further analysis.

## Authenticate to MapR-DB REST Gateway
You could reuse the JWT token from an earlier session, but it is probably easier to obtain a new one in this kernel.

In [28]:
import requests
from requests.auth import HTTPBasicAuth
import json

# Connect to any MapR-DB REST gateway and obtain a token
mapr_rest_auth = 'https://172.16.9.42:8243/auth/v2/token'
headers = {'content-type': 'application/json'}
bearerToken = None

try:
    bearerToken = requests.post(
            mapr_rest_auth, 
            headers=headers, verify=False,
            auth=HTTPBasicAuth('testuser', 'testuser')
        ).json()
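except requests.exceptions.ConnectionError as e:
    # Fail loudly: with bearerToken left as None, the header construction below would raise a confusing TypeError
    raise RuntimeError('Could not reach the MapR-DB REST gateway at ' + mapr_rest_auth) from e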

# Construct a header around your JWT token, same as in the previous notebook
headers = { 
'content-type': 'application/json', 
'Authorization': 'Bearer '+bearerToken['token'] 
} 

# Suppress warnings about the self-signed certificate of the MapR-DB data access gateway, so we don't OOM the notebook browser
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
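
If you do want to reuse a token from an earlier session, it helps to check first whether it has already expired. The following is a minimal sketch, assuming the gateway issues standard three-part JWTs that carry an `exp` claim in the payload; `jwt_expiry` and `jwt_is_expired` are hypothetical helpers, and they only decode the payload without verifying the signature.

```python
import base64
import json
import time

def jwt_expiry(token):
    """Return the 'exp' claim (seconds since epoch) of a JWT, without verifying its signature."""
    payload_b64 = token.split('.')[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload['exp']

def jwt_is_expired(token, now=None):
    """Return True if the token's 'exp' claim is not in the future."""
    now = time.time() if now is None else now
    return jwt_expiry(token) <= now
```

With the token obtained above, `jwt_is_expired(bearerToken['token'])` would indicate whether a fresh token needs to be requested.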

## Retrieve all the data from a MapR-DB table
> **Tip:** You can limit the results returned by the query by passing a **limit** parameter at the end of the REST call (to avoid OOM-ing your browser). Alternatively, you can limit the results brought back to the gateway by setting **rest.result.limit** in **/opt/mapr/data-access-gateway/conf/properties.cfg** on each data-access-gateway, then restarting them using the maprcli interface:   
>```maprcli node services -nodes `hostname` -name data-access-gateway -action restart```

**A quick way to smoke test whether data got inserted is to paste the URL directly into the browser.**
> https://172.16.9.42:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table?limit=10

**Here is an example of running the same query from the CLI (of an edge node that can access the MapR-DB data access gateway), passing in the JWT token obtained earlier.**

> ```curl -k -X GET 'https://172.16.9.238:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table?limit=5' -H 'Authorization: Bearer eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJ0ZXN0dXNlciIsImF1ZCI6IndlYiIsImV4cCI6MTUyMzUwMTQ0NywiaWF0IjoxNTIzNDgzNDQ3fQ.gvSBGxjgBQo-r7uWHdspf10IZI16EGTYjARLBK2Owb3tfL1Fv5ilPVnu3rR44vfviyDQN8V2V3J9iH5wgE5_xg'```
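
The `%2F` sequences in the URLs above are the percent-encoded slashes of the table path `/user/testuser/eth/all_transactions_table`. As a sketch, the URL can be assembled with the standard library instead of by hand; `table_url` is a hypothetical helper, and the gateway address is simply the one used in this notebook.

```python
from urllib.parse import quote

def table_url(gateway, table_path, limit=None):
    """Build a data-access-gateway table URL, percent-encoding the table path."""
    # safe='' makes quote() encode '/' as %2F too, so the whole path becomes a single URL segment
    url = gateway + '/api/v2/table/' + quote(table_path, safe='')
    if limit is not None:
        url += '?limit=' + str(int(limit))
    return url

print(table_url('https://172.16.9.42:8243', '/user/testuser/eth/all_transactions_table', limit=10))
```

Building the path this way avoids accidentally doubling or dropping a `%2F` when the table name changes.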


In [7]:
# For the demo, we can define a function that retrieves the results back to notebook 
def retrieveDataFromMaprdb(tablename):
    rest_get_trades_url = 'https://172.16.9.238:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2F'+tablename+'?limit=1'

    try:
        table = requests.get(
            rest_get_trades_url, 
            headers=headers, verify=False
        )
        return table
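    except requests.exceptions.ConnectionError as e:
        # Gateway unreachable: report it and return None so the caller can check for it
        print('Connection to the gateway failed:', e)
        return None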

In [29]:
retrieved_table = retrieveDataFromMaprdb('all_transactions_table')
print(retrieved_table.json())

{'DocumentStream': [{'gasPrice': 15000000000, 'blockHash': '0xfee9a9d6362d8b000a491f40e561d4152827eefc96a322f29489201b9ff2f121', 'v': '0x25', 'transactionIndex': 21, 'input': '0x', 's': '0x3914f12e2444c2887234464827ca2af6bda506aa3a0c0c65f9f53d8489d8bf29', 'r': '0xad26f88cc83fb7fd6fa4b740ce8b359e60e1633cb2a84f333cf0b23d54ec4066', 'hash': '0x00006da2ad44391e3d961ef1be7674dcf008f5dbbf35bcec81cd6e3094ddb3ef', '_id': '0x00006da2ad44391e3d961ef1be7674dcf008f5dbbf35bcec81cd6e3094ddb3ef', 'to': '0x27ab8f51Eb866A755bD05CeC73CD96AFE33f5e34', 'gas': 90000, 'nonce': 629144, 'value': 2997280000000000000, 'from': '0x2B5634C42055806a59e9107ED44D43c426E58258', 'blockNumber': 5322844}]}


## Retrieve filtered data from MapR-DB table with conditions and projections

What would be really interesting is to see who is burning the most ETH on gas. Since we cannot filter data in MapR-DB directly on a computed value (gas * gasPrice), the next best thing is to find out who is seriously overpaying for gas (more than 100x the usual gas price) and see whether any of those transactions are big enough to be worth tracking down on Etherscan.
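
Although the gateway cannot filter on gas * gasPrice, the product is easy to compute on the client once records have been retrieved. Here is a small sketch using only the standard library, with values taken from the sample record printed earlier; `max_fee_in_ether` is a hypothetical helper. Note that `gas` on a transaction record is the gas limit, so the result is an upper bound on the fee, not the amount actually consumed.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 ether = 10^18 wei

def max_fee_in_ether(gas, gas_price_wei):
    """Upper bound on the transaction fee: gas limit times gas price, converted from wei to ether."""
    return Decimal(gas) * Decimal(gas_price_wei) / WEI_PER_ETHER

# Values from the sample record above: gas=90000, gasPrice=15000000000 wei
print(max_fee_in_ether(90000, 15000000000))
```

Using `Decimal` rather than floats keeps the wei-to-ether conversion exact.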

**Example of querying all_transactions_table from a web browser, where gasPrice is unusually (200x) high**
> https://172.16.9.42:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table?condition={"$gt":{"gasPrice":800000000000}}

**Same query with a projection (selected fields to return), a limit, and an orderBy (which seems to require a limit)**
> https://172.16.9.42:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table?condition={"$gt":{"gasPrice":800000000000}}&fields=gas,gasPrice,to,from&limit=100&orderBy=gas

**And here's an example of running the query above from the CLI.** Note the **-g** option, which disables URL globbing (so curl does not choke on the braces in the query). See https://stackoverflow.com/questions/25435798/how-to-curl-post-with-json-parameters
> ```curl -g -k -X GET 'https://172.16.9.238:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2Fall_transactions_table?condition={"$gt":{"gasPrice":800000000000}}&fields=gas,gasPrice,to,from&limit=100&orderBy=gas' -H 'Authorization: Bearer eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJ0ZXN0dXNlciIsImF1ZCI6IndlYiIsImV4cCI6MTUyMzUwMTQ0NywiaWF0IjoxNTIzNDgzNDQ3fQ.gvSBGxjgBQo-r7uWHdspf10IZI16EGTYjARLBK2Owb3tfL1Fv5ilPVnu3rR44vfviyDQN8V2V3J9iH5wgE5_xg'```
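
Hand-writing the JSON condition inside a URL is error-prone. As a sketch, the same condition string can be generated with `json.dumps`; `gas_price_condition` is a hypothetical helper, and `$gt` is the operator already used in the queries above.

```python
import json

def gas_price_condition(threshold_wei):
    """Return the condition string selecting documents whose gasPrice exceeds threshold_wei."""
    condition = {"$gt": {"gasPrice": int(threshold_wei)}}
    # Compact separators reproduce the whitespace-free form used in the URLs above
    return json.dumps(condition, separators=(',', ':'))

print(gas_price_condition(800000000000))
```

The resulting string can be passed as the `condition` argument of the `retrieveFilteredDataFromMaprdb` function defined below.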



In [10]:
# for a more sustainable way to query with conditions, we can create a function
# appending localparams this way allows us to get around encoding issues for special characters

def retrieveFilteredDataFromMaprdb(tablename, condition, projection):
    rest_get_trades_url = 'https://172.16.9.42:8243/api/v2/table/%2Fuser%2Ftestuser%2Feth%2F'+tablename
    localparams='condition='+condition
    localparams+='&fields='+projection

    
    try:
        table = requests.get(
            rest_get_trades_url, 
            headers=headers, verify=False,
            params=localparams
        )
        return table
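    except requests.exceptions.ConnectionError as e:
        # Gateway unreachable: report it and return None so the caller can check for it
        print('Connection to the gateway failed:', e)
        return None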


In [30]:
# let's query for the guys really overpaying - 200x the usual price of gas
filtered_table = retrieveFilteredDataFromMaprdb("all_transactions_table",
                                                '{"$gt":{"gasPrice":8000000000000}}',
                                                "")
#filtered_table.json()

### Enrich locally and print out pretty

In [19]:
from web3 import Web3, HTTPProvider, IPCProvider

# connect to your geth node to convert wei to eth
gethRPCUrl='http://172.16.9.41:8545'
web3 = Web3(HTTPProvider(gethRPCUrl))

# query filtering for same overpaid transactions, only bringing back selected fields
filtered_table_projection = retrieveFilteredDataFromMaprdb("all_transactions_table",
                                                '{"$gt":{"gasPrice":8000000000000}}',
                                                "gasPrice,gas,hash")

# Create a new empty list to hold the enriched transactions (loaded into a local dataframe later)
PriceSanitizedMeow=[]
filtered_table_projection=filtered_table_projection.json()
for originalTransaction in filtered_table_projection['DocumentStream']:
    
    # Add a new column 'ActualEtherUsed'
    originalTransaction['ActualEtherUsed'] = originalTransaction['gas'] * web3.fromWei(originalTransaction['gasPrice'],unit='ether')
    
    # Append the enriched transaction to PriceSanitizedMeow
    PriceSanitizedMeow.append(originalTransaction)

In [32]:
# Optional - print the enriched json to see what the data looks like
PriceSanitizedMeow

In [33]:
# Pretty it up and sort it locally
import pandas as pd
pd.set_option('display.max_colwidth', None)  # None disables truncation in pandas >= 1.0 (older versions used -1)
prettydf = pd.DataFrame(PriceSanitizedMeow)
prettydf['hash'] = 'https://etherscan.io/tx/'+prettydf['hash']
# Sort by column name rather than position, since column order can differ between pandas versions
prettydf.sort_values(by='ActualEtherUsed', ascending=False)

| | ActualEtherUsed | gas | gasPrice | hash |
|---|---|---|---|---|
| 2 | 12.348 | 21000 | 588000000000000 | https://etherscan.io/tx/0x1245123378858a051be8c95769aa2c6bfcdd0f2b0896679ad1d35cdf911f3b83 |
| 6 | 12.348 | 21000 | 588000000000000 | https://etherscan.io/tx/0x482d539f636b4f20730029f2b1e3a94834e4b8c098739c1c4d97e1d26719f97c |
| 9 | 12.348 | 21000 | 588000000000000 | https://etherscan.io/tx/0x92fbdca84ff8158ba2d970d587c407b9c8befc89907a13975246d54302227f30 |
| 13 | 12.348 | 21000 | 588000000000000 | https://etherscan.io/tx/0xd67ecb01ba9da5bee17edb69725e62f60d0688c6d7188bfaf9cbeb363f543683 |
| 0 | 5.0 | 250000 | 20000000000000 | https://etherscan.io/tx/0x00c818b5bd20a69715ce4e70e235e7c2275b8eba4b15be0c07a7605fb841deac |
| 3 | 5.0 | 250000 | 20000000000000 | https://etherscan.io/tx/0x21e1f70fd42393e2b6d95ee932aa1c9e1189bb755cfeebdf7b5fdb7a99d52464 |
| 5 | 5.0 | 250000 | 20000000000000 | https://etherscan.io/tx/0x39f308b90258bd72670f62e87919f614f0554d24f2c0543e675a65f05e839403 |
| 8 | 5.0 | 250000 | 20000000000000 | https://etherscan.io/tx/0x7984edb036ab2f7c8808644fe4f33796a2698b4c1173db384b2c8276451d7b96 |
| 12 | 5.0 | 250000 | 20000000000000 | https://etherscan.io/tx/0xd13499923870d208f77b41e2cd0961ff86f7f4562c034beb382f77d6bb9e0613 |
| 4 | 2.5 | 250000 | 10000000000000 | https://etherscan.io/tx/0x2a53cb1636d7306921553818a28c3bb04362af68f5da6179e0292745f3a70da1 |


#### You can follow the etherscan links above to continue stalking our overpaid transactions