Generic Time Series Queries #191

Open
hexonaut opened this issue May 14, 2020 · 5 comments

@hexonaut

hexonaut commented May 14, 2020

Continuing on from here: #179

I think a common use case for this data is going to be displaying time series of these values/events.

I think we should provide a common, generic API for querying stats with 3 parameters: a start datetime, an end datetime, and a granularity bucket size. I've added these three for collateral price, savings Dai, and collateral locked:

create index if not exists headers_block_timestamp_idx ON public.headers (block_timestamp);

drop function if exists api.spot_poke_time;
drop type if exists api.spot_poke_time_result cascade;
create type api.spot_poke_time_result AS (
    date timestamptz,
    ilk_id int4,
    value numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.spot_poke_time(start_time date, end_time date, granularity interval)
    returns setof api.spot_poke_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        ilk_values AS (
            SELECT input.d AS date, ilk_id, MAX(s.header_id) AS header_id
            FROM maker.spot_poke s
            CROSS JOIN input
            LEFT JOIN public.headers h ON (h.id = s.header_id)
            WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
            GROUP BY input.d, ilk_id
        )
        SELECT i.date, i.ilk_id, s.value, h.block_number, h.block_timestamp
        FROM ilk_values i
        LEFT JOIN maker.spot_poke s ON (s.header_id = i.header_id AND s.ilk_id = i.ilk_id)
        LEFT JOIN public.headers h ON (h.id = s.header_id)
    $$ language sql stable;

drop function if exists api.pot_pie_time;
drop type if exists api.pot_pie_time_result cascade;
create type api.pot_pie_time_result AS (
    date timestamptz,
    pie numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.pot_pie_time(start_time date, end_time date, granularity interval)
    returns setof api.pot_pie_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        pie_values AS (
            SELECT input.d AS date, MAX(p.header_id) AS header_id
            FROM maker.pot_pie p
            CROSS JOIN input
            LEFT JOIN public.headers h ON (h.id = p.header_id)
            WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
            GROUP BY input.d
        )
        SELECT i.date, p.pie, h.block_number, h.block_timestamp
        FROM pie_values i
        LEFT JOIN maker.pot_pie p ON (p.header_id = i.header_id)
        LEFT JOIN public.headers h ON (h.id = p.header_id)
    $$ language sql stable;

drop function if exists api.ilk_ink_time;
drop type if exists api.ilk_ink_time_result cascade;
create type api.ilk_ink_time_result AS (
    date timestamptz,
    ilk_id int4,
    ink numeric,
    block_number int8,
    block_timestamp numeric
);
create function api.ilk_ink_time(start_time date, end_time date, granularity interval)
    returns setof api.ilk_ink_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        latest_urns AS (
            SELECT DISTINCT ON (urn_id) ink, header_id, urn_id
            FROM maker.vat_urn_ink u
            LEFT JOIN public.headers h ON (h.id = u.header_id)
            ORDER BY urn_id, block_number DESC
        )
        SELECT input.d AS date, ilk_id, SUM(u.ink) AS ink,
            MAX(h.block_number) AS block_number, MAX(h.block_timestamp) AS block_timestamp
        FROM latest_urns u
        CROSS JOIN input
        LEFT JOIN maker.urns urns ON (urns.id = u.urn_id)
        LEFT JOIN public.headers h ON (h.id = u.header_id)
        WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
        GROUP BY input.d, ilk_id
    $$ language sql stable;

I'm not the best with Postgres, so maybe there is a cleaner way to write those, but I think this pattern can be generalized to produce aggregations over most stats. For single values we can provide the latest value in the bucket (as in the examples above). For event-based stats such as Draws/Wipes we can provide both SUM and AVERAGE aggregators.
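
As a rough sketch of the event-based case (the maker.dai_draw table and its wad column below are placeholders, not actual tables in this repo), a bucketed SUM/AVG aggregator could look something like:

create function api.dai_draw_time(start_time date, end_time date, granularity interval)
    returns table (date timestamptz, total_wad numeric, avg_wad numeric) as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        )
        -- For each bucket, aggregate over the events whose header timestamp falls inside the bucket
        SELECT input.d AS date, SUM(e.wad) AS total_wad, AVG(e.wad) AS avg_wad
        FROM input
        LEFT JOIN public.headers h ON (
            h.block_timestamp >= extract(epoch FROM input.d)
            AND h.block_timestamp < extract(epoch FROM (input.d + granularity))
        )
        LEFT JOIN maker.dai_draw e ON (e.header_id = h.id)
        GROUP BY input.d
    $$ language sql stable;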

Also, does it make sense to have all these definitions in one giant schema.sql file? Maybe they should be broken apart into more manageable pieces?

@KentonPrescott

Hi Sam. For those who aren't as familiar with SQL but are still curious about what is proposed, could you provide a pseudo-example response from each query?

@hexonaut

So each of these takes the form <table>_time(start, end, granularity). For example, spot_poke_time('2020-05-01'::DATE, '2020-06-01'::DATE, '1 day'::INTERVAL) returns the ilk price over the month of May in 1-day buckets.

Similarly, ilk_ink_time('2020-05-01'::DATE, '2020-06-01'::DATE, '1 day'::INTERVAL) returns the amount of collateral each ilk has over the month of May in 1-day buckets.
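
Called directly in SQL, that looks like:

SELECT * FROM api.ilk_ink_time('2020-05-01'::DATE, '2020-06-01'::DATE, '1 day'::INTERVAL);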

Here is an example in GraphQL form:

Query:

{
  ilkInkTime(startTime:"2020-05-01", endTime:"2020-06-01", granularity:{days:1}) {
    nodes {
      date,
      ilkId,
      ink,
      blockNumber,
      blockTimestamp
    }
  }
}

Response:

{
  "data": {
    "ilkInkTime": {
      "nodes": [
        {
          "date": "2020-05-28T00:00:00+00:00",
          "ilkId": 10,
          "ink": "14243787879208000000000000",
          "blockNumber": "10070473",
          "blockTimestamp": "1589542052"
        },
        {
          "date": "2020-05-11T00:00:00+00:00",
          "ilkId": 1,
          "ink": "648226674586864668452888",
          "blockNumber": "10047822",
          "blockTimestamp": "1589238672"
        },
        {
          "date": "2020-05-12T00:00:00+00:00",
          "ilkId": 10,
          "ink": "13323780487907000000000000",
          "blockNumber": "10052841",
          "blockTimestamp": "1589306670"
        },
        {
          "date": "2020-05-05T00:00:00+00:00",
          "ilkId": 9,
          "ink": "282783589266214677304525",
          "blockNumber": "10007975",
          "blockTimestamp": "1588705424"
        },
        {
          "date": "2020-05-10T00:00:00+00:00",
          "ilkId": 508622,
          "ink": "57743128310000000000",
          "blockNumber": "10041360",
          "blockTimestamp": "1589152472"
        },
        {
          "date": "2020-05-04T00:00:00+00:00",
          "ilkId": 1,
          "ink": "90936570123725291376599",
          "blockNumber": "10002705",
          "blockTimestamp": "1588634910"
        },
        ...
      ]
    }
  }
}

Does this explain better?

It also probably makes sense to add an optional ilk filter and to order the result by date.
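
As a rough sketch of that (the filter_ilk_id parameter name is just illustrative), spot_poke_time could take an optional, defaulted argument and order the outer select by date:

create function api.spot_poke_time(start_time date, end_time date, granularity interval, filter_ilk_id int4 default null)
    returns setof api.spot_poke_time_result as $$
        WITH input AS (
            SELECT generate_series(start_time, end_time, granularity) AS d
        ),
        ilk_values AS (
            SELECT input.d AS date, ilk_id, MAX(s.header_id) AS header_id
            FROM maker.spot_poke s
            CROSS JOIN input
            LEFT JOIN public.headers h ON (h.id = s.header_id)
            WHERE h.block_timestamp < extract(epoch FROM (input.d + granularity))
            GROUP BY input.d, ilk_id
        )
        SELECT i.date, i.ilk_id, s.value, h.block_number, h.block_timestamp
        FROM ilk_values i
        LEFT JOIN maker.spot_poke s ON (s.header_id = i.header_id AND s.ilk_id = i.ilk_id)
        LEFT JOIN public.headers h ON (h.id = s.header_id)
        -- Return everything when no ilk is supplied, otherwise restrict to the given ilk
        WHERE filter_ilk_id IS NULL OR i.ilk_id = filter_ilk_id
        ORDER BY i.date
    $$ language sql stable;

If the GraphQL layer is PostGraphile, the defaulted argument should come through as an optional GraphQL argument.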

@rmulhol

rmulhol commented May 15, 2020

Thanks for the input! Will put some thought into how we could support these kinds of queries 👍

@hexonaut

hexonaut commented Jun 3, 2020

Wrote a PR for ilk_snapshot: #202

I think this format can be generalized to many of the other trigger/storage tables.

In particular, I am prefixing time to indicate it is a bucketed, time-series query and requiring bucket_start, bucket_end, and bucket_interval parameters.
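
For illustration only (the real definitions are in #202, and the function name here is hypothetical), a call following that convention would look like:

SELECT * FROM api.time_ilk_snapshot(
    bucket_start => '2020-05-01'::DATE,
    bucket_end => '2020-06-01'::DATE,
    bucket_interval => '1 day'::INTERVAL
);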

@hexonaut

PR for auctions: #211
