-
Notifications
You must be signed in to change notification settings - Fork 16
feat: graceful URI -> URN migration #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
|
Looks like the problem is in duckdb extension's json to substrait conversion, it is too strict and fails on unknown json fields. you can get around it by switching to use that being said, we should probably open a ticket on duckdb extension as well to loosen json conversion checks. |
Opened up an issue here: substrait-io/duckdb-substrait-extension#166 |
e3fe8cd to
6c8cddb
Compare
|
@benbellick Can you introduce |
becf09b to
70b193f
Compare
135e586 to
e24c5a9
Compare
d145e28 to
f6201bb
Compare
tokoko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
P.S. it'd be good to run make antlr after substrait upgrades as well, but it needs some setting up (like in devcontainer) so we can do that separately.
I guess this step is still missing in this PR before we can merge? |
Note, this is not passing any of the tests, and more work is necessary.
Had to temporarily disable the duckdb extension tests until the dependency on the duckdb-substrait-extension can handle URNs.
This PR introduces handling of migrating from the usage of URI to URN for extension references. As an intermediate step, both URI and URN and emitted from produced plans. Closes #95
the duckdb extension throws an error when unexpected fields are present in JSON on invocation of `from_substrait_json`. So we instead switch to using `from_substrait`.
It is not actually set anywhere and isn't in the substrait spec.
dce4c13 to
975952d
Compare
|
I have rebased off of main and run |
| from google.protobuf import symbol_database as _symbol_database | ||
| from google.protobuf.internal import builder as _builder | ||
| _runtime_version.ValidateProtobufRuntimeVersion(_runtime_version.Domain.PUBLIC, 5, 29, 5, '', 'proto/algebra.proto') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this line got removed it seems you did not using the same protoc version as the devcontainer has been configured to use. we had the same observation in this closed PR: #113 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the devcontainer uses protoc v29.5 which is it build for protobuf v5.29.5 which is the most recent compatible version with the protoletariat tool used in the proto code generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could add a Github Actions check that runs the codegen and tests whether the git workspace does not contain any uncommitted changes to help prevent missing to run codegen with the right versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's too much code generation going on, we'll probably have hard time keeping track of all the versions in two places unless we build/run devcontainer in a gh action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, makes sense to do it via the devcontainer by building that using the Dockerfile during Github Actions and run the codegen and verification inside the container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this issue for updating the github actions build: #120
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we can improve the build in this PR that would be amazing. otherwise we do it in a follow-up
| # generated by datamodel-codegen: | ||
| # filename: simple_extensions_schema.yaml | ||
| # timestamp: 2025-06-06T08:43:35+00:00 | ||
| # timestamp: 2025-10-24T17:55:30+00:00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could disable adding the timestamp in Makefile by adding the --disable-timestamp flag to make repeated make codegen-extensions runs produce reproducible outputs
b29f1d2 to
9e7cb06
Compare
251389e to
d2f638a
Compare
| # Generate the new python protobuf files | ||
| buf generate | ||
| protol --in-place --create-package --python-out "$dest_dir" buf | ||
| uv run protol --in-place --create-package --python-out "$dest_dir" buf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to make CI/CD not complain about missing protol
nielspardon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for also helping with adding the codegen verification
tokoko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to merge, thanks for bearing with us :)
Introduces a graceful migration from the previous usage of URIs to URNs
ExtensionRegistryCloses #95, #120, #121