feat: Kinesis delivery stream and Athena query infrastructure to enable queries over the UCAN logs #191
merging into #191 - mostly put this up so @vasco-santos can review before I push it into the PR he opened!
- implement date based partitioning
- expand `value` and `out` struct definitions to allow for more interesting queries
Plus example query joining dynamo to ucan logs.
Additionally, with input from @dchoi27, two major optimizations:
1. partition by `type` and rework the ucan table to only look at "receipts" - everything in "workflows" shows up in receipts so this reduces the amount of data scanned by ~50%
2. partition by `op` to allow us to create tables that only query a specific operation (i.e., `store/add` or `provider/add`) - this lets us add operation-specific schemas with much less clutter in result types
Using these optimizations, I've added standalone tables for `store/add`, `upload/add` and `provider/add` UCANs and reworked the saved queries to use them.
---------
Co-authored-by: Vasco Santos <santos.vasco10@gmail.com>
This looks great to me! Thanks for moving this forward @travis ❤️
This PR has an implementation of option 2 from #190 (comment)
This encompasses a fair amount of functionality - partitioning the UCAN logs in S3, configuring a Glue database and tables, adding example queries to Athena and more. A partial list of functionality follows:
- implement UCAN log partitioning in S3
  - partition by `type` - everything in "workflows" shows up in "receipts" so this reduces the amount of data scanned by ~50%
  - partition by `op` to allow us to create tables that only query a specific operation (i.e., `store/add` or `provider/add`) - this lets us add operation-specific Glue table schemas with much less clutter in result types than we'd need if we tried to define all possible inputs and outputs in a single table
- use these partitions to implement standalone tables for receipts in general and the `store/add`, `upload/add` and `provider/add` UCANs specifically
- add queries that demonstrate the use of all of these tables
- add dynamo connector so we can join the UCAN logs to our Dynamo tables in queries
- add queries that demonstrate using the Dynamo and Glue tables together
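
To make the partitioning idea above concrete, here is a rough Python sketch of how a Hive-style key prefix could be derived from a log record's `type`, `op` and date, and how filtering on those keys narrows what has to be scanned. This is an illustration only, not the layout used in this PR: the `logs` root, the `day` key name, the replacement of `/` with `_` in operation names, and both function names are assumptions.

```python
from datetime import datetime, timezone


def partition_prefix(record_type: str, op: str, ts: datetime, root: str = "logs") -> str:
    """Build a Hive-style prefix: <root>/type=<type>/op=<op>/day=<YYYY-MM-DD>/.

    `root` and the `day` key are hypothetical names chosen for this sketch.
    `ts` is expected to be a timezone-aware datetime.
    """
    # Operation names such as "store/add" contain "/", which would add
    # extra path levels, so it is replaced here (an assumed convention).
    safe_op = op.replace("/", "_")
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{root}/type={record_type}/op={safe_op}/day={day}/"


def matches(prefix: str, **wanted: str) -> bool:
    """Return True if every requested partition value appears in the prefix.

    This mimics partition pruning: a query that filters on `type` or `op`
    only needs to read the prefixes for which this returns True.
    """
    parts = dict(
        segment.split("=", 1)
        for segment in prefix.strip("/").split("/")
        if "=" in segment
    )
    return all(parts.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    when = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
    prefixes = [
        partition_prefix("receipt", "store/add", when),
        partition_prefix("receipt", "provider/add", when),
        partition_prefix("workflow", "store/add", when),
    ]
    # Only the receipt/store_add prefix survives both filters.
    print([p for p in prefixes if matches(p, type="receipt", op="store_add")])
```

Filtering on `type` alone already drops the "workflow" prefix, which is the effect the ~50% scan reduction relies on; adding `op` narrows the scan further to a single operation.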