Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for inline-ing metastore over Flight #531

Merged
merged 4 commits into from
Jun 19, 2024
Merged

Conversation

gruuya
Copy link
Contributor

@gruuya gruuya commented Jun 18, 2024

What

Adds support for discrete, temporary, metastores inlined with queries called over arrow flight. These can be used to resolve identifiers parsed from the query in an ad-hoc manner.

How

  • introduce a MemoryStore and implement all store traits for it
  • encapsulate the ListSchemaResponse from regular clade metastore in a new message called InlineMetastoreCommandStatementQuery that also holds the query
  • now when get_sql_info is called arrow fails to pattern match against known commands and uses the fallback method
  • there we simply instantiate the MemoryStore for a one-off context that serves the query

let config_text = format!(
r#"
[object_store]
type = "memory"
{object_store_section}

[catalog]
type = "sqlite"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the only thing left in this PR is making this catalog optional (expect the metastore to be provided inline if it isn't specified) and erroring out if the user tries to run DDL (creating/deleting tables), which we might be getting automatically because of the not_impl() thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making it optional and default to in-mem sqlite?

One can still provide the inline metastores, but we also support the full DDL spectrum, albeit ephemerally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, but kind of awkward? Since if you first create a table with the inline metastore in use (e.g. use some CREATE TABLE AS statement), you wouldn't be able to query it unless you stopped providing the inline metastore, so I don't yet see how useful that would be? It's the responsibility of the caller to be the metastore/catalog if the inline metastore is being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, well you can't create a table with the inline metastore at all, only reads are supported. This is because we implement only the basic SchemaStore for the MemoryStore, so we the table creation metastore calls would error out (due to not_impl).

Copy link
Contributor

@mildbyte mildbyte Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you meant just defaulting to the in-memory metastore if the inline metastore isn't provided. I don't really have strong feelings about it. I guess in the Seafowl configuration where you're using it with the inline metastore it'd surface errors to the developer faster if they forgot to provide it on some code path instead of quietly using the in-mem SQLite one -- that would be my only argument for erroring out.

Comment on lines +122 to +126
let cmd = InlineMetastoreCommandStatementQuery {
query,
schemas: Some(schemas),
};
get_flight_batches_inner(client, cmd.as_any()).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see that the usage example for this is very straightforward

@@ -74,6 +77,14 @@ impl SeafowlFlightHandler {
self.context.clone()
};

if let Some(memory_store) = memory_store {
// If metastore was inlined with the query use it for resolving tables/locations
ctx = ctx.with_metastore(Arc::new(Metastore::new_from_memory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we might need to "merge" the in-memory and the real from-config metastore and fall back to the latter metastore if there's a table we can't find in the former, but we can do this in the separate PR if need be

@gruuya gruuya merged commit 48262cb into main Jun 19, 2024
1 check passed
@gruuya gruuya deleted the inline-metastore branch June 19, 2024 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants