Gateway tracking whether requested content is in Database #14

vasco-santos · 2022-02-10T14:33:12Z

We want to know whether CIDs requested through the gateway are root CIDs stored in the Content table (and also whether they are pinned).

Requirements:

Keep state with a counter for each of:
- requested CIDs stored
- requested CIDs pinned
- requested CIDs pinQueued
- requested CIDs not stored
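
A minimal sketch of the four counters above, in Python for illustration only. It assumes a hypothetical lookup that returns `None` when the CID is not a root CID in the Content table and a pin status string otherwise. The counter names and the `"Pinned"` / `"PinQueued"` values are assumptions, as is the choice to count pinned and pinQueued CIDs as stored too.

```python
from collections import Counter
from typing import Optional

# Hypothetical counter names, one per bullet in the requirements list.
STORED = "requested_cids_stored"
PINNED = "requested_cids_pinned"
PIN_QUEUED = "requested_cids_pin_queued"
NOT_STORED = "requested_cids_not_stored"


def record_request(counters: Counter, pin_status: Optional[str]) -> None:
    """Update the counters for one gateway request.

    pin_status is None when the CID is not in the Content table; otherwise
    it is the pin status string returned by the (assumed) lookup.
    """
    if pin_status is None:
        counters[NOT_STORED] += 1
        return
    # Assumption: anything found in the Content table counts as stored,
    # whatever its pin status is.
    counters[STORED] += 1
    if pin_status == "Pinned":
        counters[PINNED] += 1
    elif pin_status == "PinQueued":
        counters[PIN_QUEUED] += 1


counters = Counter()
for status in ["Pinned", "PinQueued", None, "Pinned", "PinError"]:
    record_request(counters, status)
```

If the four groups should be mutually exclusive instead, the `STORED` increment would move into a final `else` branch.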

vasco-santos · 2022-02-10T14:37:12Z

@dchoi27 let me know if you have other thoughts/ideas of things we should look into in this context.

Probably a special case if we fail to request but content is in the DB?
Or track how "old" the requested content is? Maybe a histogram with buckets like 0.5h, 1h, 2h, 4h, 12h, 24h, 3 days, ... +Inf
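
A rough sketch of the age histogram idea, assuming we have both the upload time and the request time for a CID. Only the bucket bounds come from the list above; the function name is hypothetical, and the buckets hidden behind the "..." are left out rather than guessed.

```python
import bisect
import math
from datetime import datetime, timedelta, timezone

# Upper bounds in hours, taken from the comment above; 72 is "3 days".
# Whatever the "..." before +Inf stands for is not filled in here.
BUCKET_BOUNDS_HOURS = [0.5, 1, 2, 4, 12, 24, 72, math.inf]


def bucket_for_age(uploaded_at: datetime, requested_at: datetime) -> float:
    """Return the upper bound (in hours) of the bucket the content age falls in."""
    age_hours = (requested_at - uploaded_at).total_seconds() / 3600
    # bisect_left picks the first bound that is >= age_hours.
    index = bisect.bisect_left(BUCKET_BOUNDS_HOURS, age_hours)
    return BUCKET_BOUNDS_HOURS[index]


histogram = {bound: 0 for bound in BUCKET_BOUNDS_HOURS}
now = datetime(2022, 2, 10, 12, 0, tzinfo=timezone.utc)
for age in [timedelta(minutes=10), timedelta(hours=3), timedelta(days=10)]:
    histogram[bucket_for_age(now - age, now)] += 1
```

Each request lands in exactly one bucket here; a cumulative histogram would increment every bucket whose bound is at or above the age.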

dchoi27 · 2022-02-10T18:51:58Z

Yes, for sure how old the content is (when it was requested vs. when it was first uploaded).
Could you tell me more about "if we fail to request but content is in the DB"? Like if a user requests data we have but we can't fetch it?

Can we track the metrics around the response for each of the groups above? E.g. if it's pinQueued, does it take longer / less reliable to fetch?

vasco-santos · 2022-02-11T09:20:29Z

> Could you tell me more about "if we fail to request but content is in the DB"? Like if a user requests data we have but we can't fetch it?

Yes, so this would be targeting the incomplete uploads.

> Can we track the metrics around the response for each of the groups above? E.g. if it's pinQueued, does it take longer / less reliable to fetch?

Yes, that's a good idea
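
A small sketch of per-group response metrics, assuming each request is already labelled with one of the groups above and that we know its duration and whether it succeeded. `GroupStats` and `observe` are hypothetical names, not part of the gateway.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GroupStats:
    requests: int = 0
    failures: int = 0
    total_ms: float = 0.0

    @property
    def mean_ms(self) -> float:
        # Mean response time; 0.0 when nothing has been observed yet.
        return self.total_ms / self.requests if self.requests else 0.0

    @property
    def failure_rate(self) -> float:
        # Fraction of failed requests; 0.0 when nothing has been observed yet.
        return self.failures / self.requests if self.requests else 0.0


stats = defaultdict(GroupStats)


def observe(group: str, duration_ms: float, ok: bool) -> None:
    """Record one gateway response for the given group."""
    entry = stats[group]
    entry.requests += 1
    entry.total_ms += duration_ms
    if not ok:
        entry.failures += 1


# Made-up sample observations, for illustration only.
observe("pinned", 120.0, True)
observe("pinQueued", 900.0, True)
observe("pinQueued", 3000.0, False)
```

Comparing `mean_ms` and `failure_rate` across groups is one way to see whether pinQueued content is slower or less reliable to fetch.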

dchoi27 · 2022-02-11T17:27:59Z

Awesome, SGTM

JeffLowe · 2022-02-11T21:41:38Z

This sounds very similar to the needs and plans we have for niftysave (discussed as recently as today with @mikeal). I'm pulling in @the-simian here. You two may want to sync up on the roadmap to implement this in a way that meets both needs.

olizilla · 2022-03-17T11:52:13Z

@dchoi27 how important are these stats to us? In order to make this work, nftstorage/nft.storage#1386 adds logic to hit the nftstorage db for every single CID that is requested from the gateway. That seems like an amplification point, where a spike in traffic to the gateway causes a spike in requests to the nftstorage db... two systems that are currently isolated from each other become co-dependent.

In the worst case, a sustainable increase in gateway traffic could be an unsustainable increase in nftstorage db reads... we can and will continue to optimise and grow that db, but I'd feel more comfortable if we ditched these metrics and kept the gateways separate from the nft.storage api.

Also notable: adding these stats makes the current gateway impl less reusable, or in need of more customisation, to be used as a web3.storage gateway.

dchoi27 · 2022-03-17T13:48:19Z

So I think the main goals of these stats would be to:

- See if we can draw patterns for when we have performance issues (i.e., get some more visibility into Cluster as a black box)
- Understand user behavior so we can better optimize for it when warming the cache

The former probably gets solved by IPFS Elastic Provider in the long-run, so if there are good reasons not to do a live lookup for every CID to understand its pin status at the time, it's probably not worth doing. But for the latter, it'd be great to at least be able to have periodic datasets with samples showing a CID and when it was requested vs. when it was uploaded if there's a way to do that asynchronously, and in a way that doesn't risk the performance of the entire database.

vasco-santos · 2022-03-17T13:52:00Z

> if there's a way to do that asynchronously, and in a way that doesn't risk the performance of the entire database.

The solution here is to go through the logs and get the metrics from a separate analyser, such as a DigitalOcean App similar to the checkup tool Alan built.
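
A very rough sketch of what such an offline analyser could do, assuming the gateway logs are JSON lines with `cid` and `ts` fields and that upload times are fetched separately in one batch. The log format, the field names and `request_ages` are all assumptions, not the actual gateway output.

```python
import json
from datetime import datetime
from typing import Dict, Iterable, Iterator, Tuple


def request_ages(
    log_lines: Iterable[str], uploaded_at: Dict[str, datetime]
) -> Iterator[Tuple[str, float]]:
    """Yield (cid, age_in_hours) for each logged request whose CID is known."""
    for line in log_lines:
        entry = json.loads(line)
        cid = entry["cid"]
        if cid not in uploaded_at:
            # Not in the batch of known uploads, so there is no age to report.
            continue
        requested = datetime.fromisoformat(entry["ts"])
        yield cid, (requested - uploaded_at[cid]).total_seconds() / 3600


# Made-up log lines and upload times, for illustration only.
logs = [
    '{"cid": "bafy-example-1", "ts": "2022-03-17T13:00:00+00:00"}',
    '{"cid": "bafy-unknown", "ts": "2022-03-17T13:05:00+00:00"}',
]
uploads = {"bafy-example-1": datetime.fromisoformat("2022-03-17T10:00:00+00:00")}
ages = list(request_ages(logs, uploads))
```

Since this runs over logs after the fact, the gateway request path never touches the database; the upload times can come from a periodic export or a single batched query.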

vasco-santos added the kind/enhancement and need/triage labels Feb 10, 2022

vasco-santos mentioned this issue Feb 10, 2022

NFT.Storage gateway cache MVP nftstorage/nft.storage#823

Closed

28 tasks

vasco-santos assigned vasco-santos and unassigned vasco-santos Feb 11, 2022

dchoi27 changed the title ~~Gateway tracking wether requested content is in Database~~ Gateway tracking whether requested content is in Database Feb 11, 2022

vasco-santos mentioned this issue Feb 16, 2022

feat: gateway tracking requested cids in database nftstorage/nft.storage#1386

Closed

vasco-santos transferred this issue from nftstorage/nft.storage Apr 7, 2022
