feat: Kinesis delivery stream and Athena query infrastructure to enable queries over the UCAN logs #191

Merged on Oct 11, 2023 (6 commits)

@vasco-santos vasco-santos commented Apr 14, 2023

This PR has an implementation of option 2 from #190 (comment)

This encompasses a fair amount of functionality: partitioning the UCAN logs in S3, configuring a Glue database and tables, adding example queries to Athena, and more. A partial list of functionality follows:

  • implement UCAN log partitioning in S3

    • first partition by type - everything in "workflows" shows up in "receipts" so this reduces the amount of data scanned by ~50%
    • next partition by op to allow us to create tables that only query a specific operation (i.e., store/add or provider/add) - this lets us add operation-specific Glue table schemas with much less clutter in result types than we'd need if we tried to define all possible inputs and outputs in a single table
    • finally partition by date to allow queries to only load recent data
  • use these partitions to implement standalone tables for receipts in general and for the store/add, upload/add and provider/add UCANs specifically

  • add queries that demonstrate the use of all of these tables

  • add dynamo connector so we can join the UCAN logs to our Dynamo tables in queries

  • add queries that demonstrate using the Dynamo and Glue tables together
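
The partition order described above (type, then op, then date) can be sketched as a key layout plus a pruning step. This is a minimal illustration, not the layout this PR actually writes: the `logs/` root, the Hive-style `type=`/`op=`/`day=` segment names, and the replacement of `/` in op names are all assumptions made for the example.

```python
from datetime import date


def partition_prefix(record_type: str, op: str, day: date) -> str:
    """Build a Hive-style prefix: type first, then op, then day (assumed layout)."""
    # Op names such as "store/add" contain "/", which would add a path level,
    # so it is replaced here. The real encoding is an assumption.
    safe_op = op.replace("/", "_")
    return f"logs/type={record_type}/op={safe_op}/day={day.isoformat()}/"


def parse_partitions(key: str) -> dict:
    """Extract the name=value segments from an object key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def prune(keys, record_type, op=None, since=None):
    """Keep only the keys a query restricted to these partitions would read."""
    kept = []
    for key in keys:
        parts = parse_partitions(key)
        if parts.get("type") != record_type:
            continue  # e.g. skip "workflow" objects when only receipts are needed
        if op is not None and parts.get("op") != op.replace("/", "_"):
            continue  # operation-specific tables only read their own op
        if since is not None and date.fromisoformat(parts["day"]) < since:
            continue  # date partition lets queries load only recent data
        kept.append(key)
    return kept
```

Filtering on `type` alone already drops every workflow object, which is where the roughly 50% reduction in scanned data comes from; the `op` and `day` filters narrow the set further.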

@seed-deploy seed-deploy bot temporarily deployed to pr191 April 14, 2023 08:55 Inactive

seed-deploy bot commented Apr 14, 2023

View stack outputs
  • pr191-w3infra-CarparkStack

    Name Value
    BucketName carpark-pr191-0
    Region us-east-2
  • pr191-w3infra-SatnavStack

    Name Value
    BucketName satnav-pr191-0
    Region us-east-2
  • pr191-w3infra-UploadApiStack

    Name Value
    ApiEndpoint https://r9wabro5ee.execute-api.us-east-2.amazonaws.com
    CustomDomain https://pr191.up.web3.storage
  • pr191-w3infra-BusStack

  • pr191-w3infra-ReplicatorStack

  • pr191-w3infra-UcanFirehoseStack

  • pr191-w3infra-UcanInvocationStack

  • pr191-w3infra-UploadDbStack

@seed-deploy seed-deploy bot temporarily deployed to pr191 April 14, 2023 16:12 Inactive
@vasco-santos commented

See grafana integration ✨
image

@heyjay44 heyjay44 mentioned this pull request Apr 14, 2023
23 tasks
merging into #191 - mostly put this up so @vasco-santos can review
before I push it into the PR he opened!

- implement date based partitioning
- expand `value` and `out` struct definitions to allow for more
interesting queries
Plus example query joining dynamo to ucan logs.

Additionally, with input from @dchoi27, two major optimizations:

1. partition by `type` and rework the ucan table to only look at
"receipts" - everything in "workflows" shows up in receipts so this
reduces the amount of data scanned by ~50%
2. partition by `op` to allow us to create tables that only query a
specific operation (ie, `store/add` or `provider/add`) - this lets us
add operation-specific schemas with much less clutter in result types
 
Using these optimizations, I've added standalone tables for `store/add`,
`upload/add` and `provider/add` UCANs, and reworked the saved queries
to use them.
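
As a rough illustration of how a saved query might target one of these standalone tables, the sketch below builds an Athena-style SQL string. The table names in `TABLES` and the `day` partition column (assumed to hold ISO date strings) are hypothetical; the actual Glue table and column names are not shown in this conversation.

```python
from datetime import date

# Hypothetical mapping from operation to its standalone table name.
TABLES = {
    "store/add": "store_add",
    "upload/add": "upload_add",
    "provider/add": "provider_add",
}


def build_query(op: str, since: date, limit: int = 100) -> str:
    """Return SQL that reads one op-specific table, restricted to recent days."""
    if op not in TABLES:
        raise ValueError(f"no standalone table for operation {op!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # The op-specific table is assumed to be bound to the receipt type and a
    # single op already, so only the date partition needs a predicate here.
    return (
        f"SELECT * FROM {TABLES[op]} "
        f"WHERE day >= '{since.isoformat()}' "
        f"LIMIT {limit}"
    )
```

Because ISO dates sort lexicographically, the string comparison on `day` selects the same partitions a date comparison would, under the assumption that the partition values are stored as `YYYY-MM-DD` strings.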

---------

Co-authored-by: Vasco Santos <santos.vasco10@gmail.com>
@travis travis marked this pull request as ready for review October 4, 2023 18:31
@travis travis changed the title feat: kinesis delivery stream feat: Kinesis delivery stream and Athena query infrastructure to enable queries over the UCAN logs Oct 4, 2023

seed-deploy bot commented Oct 5, 2023

Stack outputs updated


@vasco-santos vasco-santos left a comment

This looks great to me! Thanks for moving this forward @travis ❤️

@travis travis merged commit 9f0c50d into main Oct 11, 2023
3 checks passed
@travis travis deleted the feat/kinesis-delivery-stream branch October 11, 2023 10:17
2 participants