
Annotated queries


Here are two past data investigations, with annotated GraphQL queries showing how we used the data API itself to answer specific questions.

Diff lag alert investigation

Investigating whether an alert tracking diff processing lag indicated a true error or a false positive caused by an outlier period without protocol activity. (It was a false positive, and our response was to decrease the sensitivity of this alert.)

{
  # allHeaders Alert Investigation 
  #
  # The alarm triggered around 1:15pm ET and resolved around 2pm ET. Let's broaden the window a bit and
  # look at the storage diffs we have from 1pm ET to 2:15pm ET. Converted to UTC, that's 18:00 to 19:15.
  # Converted to block height, that's blocks 12025165 to 12025525. For each diff, let's look at the status
  # (should be mostly "TRANSFORMED"), the blockHeight, and the created time:

  allStorageDiffs(last: 1000, filter: {blockHeight: {greaterThan: "12025165", lessThan: "12025525"}}) {
    nodes {
      status
      blockHeight
      created
    }
  }
  
  # Looking at the difference between each subsequent blockHeight, most of these are close, with one
  # obvious exception:
  # 
  #        - 12025170 to 12025195:  25 blocks 
  #        - 12025195 to 12025196:   1 block
  #        - 12025196 to 12025208:  12 blocks
  #        - 12025208 to 12025231:  23 blocks
  #        - 12025231 to 12025239:   8 blocks
  #        - 12025239 to 12025429: 190 blocks!
  #        - 12025429 to 12025464:  35 blocks
  #        - 12025464 to 12025482:  18 blocks
  #        - 12025482 to 12025487:   5 blocks
  #        - 12025487 to 12025492:   5 blocks
  #        - 12025492 to 12025503:  11 blocks
  #
  # Let's narrow down the query to just the diffs between 12025238 and 12025430 (bumping the range by 1 on
  # each side since the greaterThan/lessThan filter operators are exclusive). We'll add some extra attributes
  # to see if anything looks weird:

  # largeGapDiffs:allStorageDiffs(last: 1000, filter: {blockHeight: {greaterThan: "12025238", lessThan: "12025430"}}) {
  #   nodes {
  #     status
  #     blockHeight
  #     address
  #     fromBackfill
  #     created
  #     updated
  #   }
  # }

  # Nothing looks too far off here. The things I'm looking at:
  # 
  #        - Are any of these diffs from backfills? No. That's good, because we're not running any.
  #        - Do the created times roughly match the block heights? Yes. (Compare the block timestamp
  #          from Etherscan with the created time, or pull it from the API; see the sketch after this list.
  #          Insert time will be a little later, on the order of tens of seconds, but shouldn't be minutes
  #          or tens of minutes.)
  #        - Are the created and updated times close together? The updated time usually represents when
  #          the diff's status changed from "NEW" to "TRANSFORMED". The difference between created and
  #          updated is the time the diff was waiting untransformed. Usually this should be seconds or
  #          tens of seconds.
  #        - What contracts are these diffs originating from? They look like they come from two:
  #              
  #                - Diffs from block 12025239 from 0x35d1b3f3d7966a1dfe207aa4514c12a259a0492b
  #                - Diffs from block 12025429 from both 0x35d1b3f3d7966a1dfe207aa4514c12a259a0492b 
  #                  and 0x197e90f9fad81970ba7976f33cbd77088e5d7cf7
  #
  #          You can check these addresses against the latest deployed contracts at changelog.makerdao.com
  #          to figure out what they are. 0x35d1 is the Vat and 0x197 is the Pot.
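  #
  # (A sketch of the Etherscan comparison using the API itself: pull the header for one of these diff blocks and
  # compare its blockTimestamp to the diff's created time. The alias is made up; the condition and fields are the
  # same ones used in the allHeaders queries elsewhere on this page.)
  #
  # headerForDiffBlock:allHeaders(first: 1, condition: {blockNumber: "12025239"}) {
  #   nodes {
  #     blockNumber
  #     blockTimestamp
  #   }
  # }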
  
  # Storage diffs represent "changes to storage values in the Maker contracts Vulcanize is watching." So, are
  # we missing storage changes for Maker protocol activity that took place during this 190-block gap? It's hard to
  # prove a negative, but one thing we can look at is events vs storage. Most storage changes are associated with
  # a corresponding log event. We read log events from a different source (geth0) than storage diffs (geth-statediff nodes),
  # so if we were missing storage diffs for activity we care about, we should see events that took place during these blocks.

  # We can look at this through allHeaders: let's get all the block headers from the blocks we're investigating, and
  # embed the associated event logs. If there are any, let's also embed the contract address, so we can tell what kind
  # of event it might be. It's important to note that we store headers for every block, whether or not there were any
  # Maker-related transactions, but we only store events related to the Maker contracts. So it's possible to see blocks
  # without any events.
  
  # allHeaders(first: 200, filter: {blockNumber: {greaterThanOrEqualTo: "12025239", lessThanOrEqualTo: "12025429"}}) {
  #   nodes {
  #     blockNumber
  #     eventLogsByHeaderId(first: 100) {
  #       totalCount
  #       nodes {
  #         addressByAddress {
  #           address
  #         }
  #       }
  #     }
  #   }
  # }
  
  # Here I see a bunch of blocks with no event logs, bookended by the two blocks that match the storage diffs above. The
  # contract addresses of those events match the contracts from the storage diffs: Vat and Pot. We track a _lot_ of events,
  # but if you wanted to track down exactly what events are associated with these logs, it's possible by embedding event
  # associations like "vatFrobsByHeaderId", or if you're really good at reading the Matrix, by looking at the log topics. 
  # The allHeaders query is pretty low-level data, so unfortunately it's a little painful to work with.  
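  #
  # (For example, a sketch embedding the vatFrobsByHeaderId association mentioned above, with just a count per block.
  # The exact shape of that association is an assumption; check the schema for the available fields.)
  #
  # allHeaders(first: 200, filter: {blockNumber: {greaterThanOrEqualTo: "12025239", lessThanOrEqualTo: "12025429"}}) {
  #   nodes {
  #     blockNumber
  #     vatFrobsByHeaderId(first: 10) {
  #       totalCount
  #     }
  #   }
  # }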
  
  # To pull this all together: is it possible that there were 190 blocks without any Maker activity we're following? That's
  # about 41 minutes of wall clock time. I'm always skeptical when the answer is "the code is right and the alarm is wrong,"
  # but the evidence looks like this might be a false positive. 
}

Checking Urn history

Verifying that the full history was present for an urn over a specific period of time.

{
  
# I wanted to write up this data investigation using the Data API itself. Here's exactly how I investigated
# this issue, with the associated queries. I was happy to find that the data I needed is all exposed through 
# the API!

# Error message: 
# "error inserting cdpi 21929 urn 0x1f5209F1c2b5b978F803C84544176b5692a632fD from diff ID 8448193"
#
# Let's check the CDP by ID, make sure it exists, the urn address is correct, and it reflects
# all known activity. I use the Oasis portal or CDP viewer here for a quick check: https://defiexplore.com/cdp/21929
# If something looks off, we can use Pymaker or Geth to tie out against chain data directly.
#
# Here, I'll query for all the basic attributes of this urn: its address, ilk, ink, and art values, plus its
# history of frob events.

managedCdpById(id: "21929") {
  usr
  urnIdentifier
  ilkIdentifier
  ilk {
    id
  }
  urn {
    ink
    art
    blockHeight
    frobs(first:20) {
      totalCount
      nodes {
        dink
        dart
        tx {
          blockHeight
          blockHash
        }
      }
    }
  }
}
  
# This looks good: I see two frobs and correct dink/dart/ink/art data for this vault.
 
# Now let's verify that we have state snapshots for both frobs. These are at blocks 12016366 and 12016564,
# according to the blockHeight attributes embedded in the frob events above.

frob1Snapshot:urnSnapshotByUrnIdentifierAndIlkIdentifierAndBlockHeight(urnIdentifier: "0x1f5209F1c2b5b978F803C84544176b5692a632fD", ilkIdentifier: "WBTC-A", blockHeight: "12016366") {
  ink
  art
  blockHeight
}
  
frob2Snapshot:urnSnapshotByUrnIdentifierAndIlkIdentifierAndBlockHeight(urnIdentifier: "0x1f5209F1c2b5b978F803C84544176b5692a632fD", ilkIdentifier: "WBTC-A", blockHeight: "12016564") {
  ink
  art
  blockHeight
}

# These look good, too.

# Now let's check on the storage diff mentioned in the error message above. Does it
# exist, and was it successfully transformed?

storageDiffById(id: "8448193")  {
    id
    status
    blockHeight
    blockHash
}

# Yes, this shows status "TRANSFORMED", with a block height and hash that match the first
# frob event up above. Since we know that the frob at block 12016366 was the initial one that
# created this urn, I expect that there is a cdpManagerUrn value associated with this diff. 
# We can embed it in the query to check:

storageDiffWithCdpManagerUrnEntry:storageDiffById(id: "8448193")  {
    id
    status
    cdpManagerUrnsByDiffId(first: 10) {
      totalCount
      nodes {
        cdpi
        urn
      }
    }
}
  
# Yep, there is. I added totalCount intentionally here, to make sure there's only one.
# All in all, this looks good: we have what looks like a complete event and state history for this urn, and the
# storage diff in our error message was transformed correctly.

# Let's now also check to make sure we have a complete set of headers. We saw two error messages related to headers
# with specific blocks:
#
# 1. "error inserting header for block 12017497: pq: more than one row returned by a subquery used as an expression"
# 2. "error inserting header for block 12017498: pq: more than one row returned by a subquery used as an expression"
#
# Let's first check that we have correct data for these two:
  
block12017497Headers:allHeaders(first: 10, condition: {blockNumber: "12017497"}) {
  nodes {
    id
    ethNodeId
    hash
    raw
  } 
}
  
block12017498Headers:allHeaders(first: 10, condition: {blockNumber: "12017498"}) {
  nodes {
    id
    ethNodeId
    hash
    raw
  } 
}


# The header data here looks correct. You can compare the hash and other data against Etherscan or Geth to confirm.
# There is one unusual thing going on here: we read each of these headers twice, from different nodes. The entries
# are identical, except for different ethNodeId values. That suggests to me that we read them once from the old node
# and once from the upgraded one. Node details aren't really exposed through the API, but I can take a closer look
# in the DB. This is unusual, but not a problem: we intentionally allow reading headers from multiple nodes.

# Finally, let's also check that we have a complete set of headers from during the node downtime. We can filter by
# Unix timestamp. The first 502 in the header-sync logs is at 2021-03-11T04:00:36 ET. The last error message was
# at 2021-03-11T13:22:43 ET. Padding the window by 30 minutes on each side, that's Unix time 1615451436 to 1615488720.

allHeaders(first: 2500, filter: {blockTimestamp: {greaterThan: "1615451436", lessThan: "1615488720"}}) {
  totalCount
  nodes {
    ethNodeId
    blockNumber
    blockTimestamp
  }
}

# This query will return a very long list of data, but if you scroll through it you can see two things:
# 
# 1. There's a continuous sequence (good, that means we have all the headers)
# 2. You can actually see the cutover by looking at ethNodeId. The change from 1813 to 2216 happens
#    at block height 12017497, Unix time 1615468640, UTC 2021-03-11T13:17:20Z

# There are no errors or evidence that storage diffs were impacted, but let's check them for good measure. 
# Unlike headers, we don't expect a continuous blockHeight sequence here—we typically only see a storage
# diff we care about every few blocks. But there should be no major gaps, and we should see most of them 
# in the "TRANSFORMED" and "UNRECOGNIZED" state. If there are huge gaps or many "NONCANONICAL" diffs,
# that indicates a problem.

allStorageDiffs(first: 1500, filter: {blockHeight: {greaterThan: "12016197", lessThan: "12018588"}}) {
  totalCount
  nodes {
    blockHeight
    status
  }
}
  
}