# Web scraping on-chain data via The Graph

- Author: Yichen Luo
- Date: 2024-10-01

## Dependencies

```bash
pip install requests
```

In [4]:
from pprint import pprint
import requests

## The Graph

The Graph is a decentralized protocol for indexing and querying blockchain data, starting with Ethereum. In practice, it makes specific on-chain data easier to retrieve: instead of scanning blocks and decoding events yourself, you query data that has already been indexed into a subgraph through a GraphQL API, in keeping with web3's emphasis on decentralization and reliability.
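To make the idea of "querying" concrete, here is a minimal sketch of how a GraphQL query is packaged as a JSON body before it is sent to a subgraph endpoint. The helper `build_payload` and the operation name `FirstProtocols` are hypothetical names chosen for illustration; the `protocols(first: ...)` field is the one used later in this notebook, and passing `first` through a GraphQL variable is an assumption about how you might want to parameterize it.

```python
import json
from typing import Optional


def build_payload(query: str, variables: Optional[dict] = None) -> dict:
    """Package a GraphQL query (and optional variables) as a JSON-serializable body."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return payload


# Illustrative query: the operation name and the $n variable are assumptions.
QUERY = """
query FirstProtocols($n: Int!) {
  protocols(first: $n) {
    id
  }
}
"""

payload = build_payload(QUERY, {"n": 5})
print(json.dumps(payload, indent=2))
```

No request is sent here; the resulting dictionary is what would be passed as the `json=` argument of `requests.post`.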

## Aave V2
Aave is a decentralized money market protocol that allows users to lend and borrow a wide range of cryptocurrencies.

- [Subgraph for Aave V2](https://thegraph.com/explorer/subgraphs/8wR23o1zkS4gpLqLNU4kG3JHYVucqGyopL5utGxP2q1N?view=Query&chain=arbitrum-one)

Step 1: Try out the query in the explorer to see what data you can get.

<img src="./fig/playground.png" width="1400">

Step 2: Right-click the page, choose Inspect, and open the Network tab to see the request the playground sends when you run the query.

<img src="./fig/scrape.png" width="1400">

Step 3: Copy the request as cURL and convert it to a Python request.

<img src="./fig/curl.png" width="1400">
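The conversion in Step 3 can be done by hand, but it can also be roughly automated. The sketch below splits a copied cURL command into its URL, headers, and body using the standard-library `shlex` module. The function `parse_curl` is a hypothetical helper, the sample command uses a placeholder URL, and only the `-H`/`--header` and `-d`/`--data`/`--data-raw` flags are handled; other flags are ignored.

```python
import json
import shlex


def parse_curl(command: str) -> dict:
    """Split a copied cURL command into url, headers, and raw body."""
    # Join backslash-newline continuations so shlex sees a single line.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    parsed = {"url": None, "headers": {}, "data": None}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header") and i + 1 < len(tokens):
            # Split on the first colon only, so URL values keep their colons.
            name, _, value = tokens[i + 1].partition(":")
            parsed["headers"][name.strip().lower()] = value.strip()
            i += 2
        elif tok in ("-d", "--data", "--data-raw") and i + 1 < len(tokens):
            parsed["data"] = tokens[i + 1]
            i += 2
        elif not tok.startswith("-") and parsed["url"] is None:
            parsed["url"] = tok
            i += 1
        else:
            i += 1  # flags this sketch does not handle are skipped
    return parsed


# Placeholder command for illustration; not the real gateway request.
sample = (
    "curl 'https://example.com/graphql' \\\n"
    "  -H 'content-type: application/json' \\\n"
    "  --data-raw '{\"query\":\"{ protocols(first: 5) { id } }\"}'"
)

parsed = parse_curl(sample)
body = json.loads(parsed["data"])  # the body is itself a JSON document
print(parsed["url"], parsed["headers"], body["query"])
```

The parsed pieces map directly onto `requests.post(parsed["url"], headers=parsed["headers"], json=body)`.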

Here is the cURL command:

```bash
curl 'https://gateway.thegraph.com/api/deployments/id/QmdEuhCPTFx5q1Vf7jPQDVGQDpC34KYry82yb3NPc9sK6a' \
  -H 'accept: application/json, multipart/mixed' \
  -H 'accept-language: en' \
  -H 'authorization: Bearer 944b560e76f53abf0739468966998887' \
  -H 'content-type: application/json' \
  -H 'origin: https://thegraph.com' \
  -H 'priority: u=1, i' \
  -H 'referer: https://thegraph.com/' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  --data-raw '{"query":"{\n  protocols(first: 5) {\n    id\n    pools {\n      id\n    }\n  }\n  contractToPoolMappings(first: 5) {\n    id\n    pool {\n      id\n    }\n  }\n}"}'
```
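Many of the headers in the copied command, such as the `sec-fetch-*` family, are added by the browser and describe the browsing context rather than the query. The sketch below shows one way to trim a copied header set down to a short allow-list. Which headers the gateway actually requires is an assumption here: `authorization` and `content-type` are kept by default, and `origin` may also need to stay if the API key is restricted to a particular domain. The helper `minimal_headers` and the placeholder token are hypothetical.

```python
# Assumption: these are the only headers the endpoint needs. Verify this
# against your own requests before relying on it.
ESSENTIAL = frozenset({"authorization", "content-type"})


def minimal_headers(headers: dict, keep=ESSENTIAL) -> dict:
    """Return only the headers whose (case-insensitive) name is in `keep`."""
    return {k: v for k, v in headers.items() if k.lower() in keep}


# Headers shaped like the copied request; the token is a placeholder.
copied = {
    "accept": "application/json, multipart/mixed",
    "authorization": "Bearer <YOUR_API_KEY>",
    "content-type": "application/json",
    "origin": "https://thegraph.com",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site",
}

trimmed = minimal_headers(copied)
# Keep `origin` as well, in case the key is tied to a domain.
with_origin = minimal_headers(copied, keep=ESSENTIAL | {"origin"})
print(trimmed)
print(with_origin)
```

If a trimmed request starts failing, restoring the removed headers one at a time is a simple way to find out which ones the server depends on.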

Convert it to Python:

In [5]:
import requests

url = 'https://gateway.thegraph.com/api/deployments/id/QmdEuhCPTFx5q1Vf7jPQDVGQDpC34KYry82yb3NPc9sK6a'

headers = {
    'accept': 'application/json, multipart/mixed',
    'accept-language': 'en',
    'authorization': 'Bearer 944b560e76f53abf0739468966998887',
    'content-type': 'application/json',
    'origin': 'https://thegraph.com',
    'priority': 'u=1, i',
    'referer': 'https://thegraph.com/',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
}

data = {
    "query": """
    {
      protocols(first: 5) {
        id
        pools {
          id
        }
      }
      contractToPoolMappings(first: 5) {
        id
        pool {
          id
        }
      }
    }
    """
}

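# Send the GraphQL query as a JSON body; the timeout avoids hanging forever.
response = requests.post(url, headers=headers, json=data, timeout=30)
# Raise an exception on HTTP errors (for example, an invalid or expired key).
response.raise_for_status()

pprint(response.json())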

{'data': {'contractToPoolMappings': [{'id': '0x00ad8ebf64f141f1c81e9f8f792d3d1631c6c684',
                                      'pool': {'id': '0xb53c1a33016b2dc2ff3653530bff1848a515c8c5'}},
                                     {'id': '0x01c0eb1f8c6f1c1bf74ae028697ce7aa2a8b0e92',
                                      'pool': {'id': '0xb53c1a33016b2dc2ff3653530bff1848a515c8c5'}},
                                     {'id': '0x028171bca77440897b824ca71d1c56cac55b68a3',
                                      'pool': {'id': '0xb53c1a33016b2dc2ff3653530bff1848a515c8c5'}},
                                     {'id': '0x02aaeb4c7736177242ee0f71f6f6a0f057aba87d',
                                      'pool': {'id': '0xacc030ef66f9dfeae9cbb0cd1b25654b82cfa8d5'}},
                                     {'id': '0x030ba81f1c18d280636f32af80b9aad02cf0854e',
                                      'pool': {'id': '0xb53c1a33016b2dc2ff3653530bff1848a515c8c5'}}],
          'protocols': [{'id': '1',
        