Config datafusion query with schema #99

Merged
nitisht merged 2 commits into parseablehq:main from s3_query_listing on Sep 8, 2022

trueleo (Contributor) commented Sep 8, 2022

Description

Instead of pre-listing all valid prefixes before query execution and letting DataFusion infer the schema, it is better to pass the schema we already have for a given stream and let DataFusion do the heavy lifting. If the stream info has no schema for a given stream, we return early with an error telling the user to post events to this logstream first.

Goal

  • Avoid unnecessary network calls: DataFusion already makes them internally and ignores any file that is not present.

Solution

Check for the schema in the stream metadata and attach it to the query itself, so it can be used during query execution.
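
The check described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: `Schema`, `Query`, `QueryError`, and `build_query` are hypothetical names, and the schema is a simplified stand-in for the Arrow schema that would actually be handed to DataFusion.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical, simplified stand-in for a stream's stored schema:
/// a list of (column name, type name) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

/// Hypothetical error returned when a stream has no schema yet.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    UninitializedStream(String),
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::UninitializedStream(stream) => write!(
                f,
                "no schema found for logstream '{}'; post events to this logstream first",
                stream
            ),
        }
    }
}

/// Hypothetical query type that carries the schema along with the SQL text,
/// so the execution layer does not need to infer it from stored files.
#[derive(Debug, PartialEq)]
pub struct Query {
    pub sql: String,
    pub stream_name: String,
    pub schema: Schema,
}

/// Look up the schema in the in-memory stream metadata and attach it to the
/// query. A missing stream and a stream without a schema are treated the
/// same way: return early with an error instead of listing storage prefixes.
pub fn build_query(
    metadata: &HashMap<String, Option<Schema>>,
    stream_name: &str,
    sql: &str,
) -> Result<Query, QueryError> {
    let schema = metadata
        .get(stream_name)
        .and_then(|schema| schema.clone())
        .ok_or_else(|| QueryError::UninitializedStream(stream_name.to_string()))?;

    Ok(Query {
        sql: sql.to_string(),
        stream_name: stream_name.to_string(),
        schema,
    })
}
```

With this shape, a query against a stream that has never received events fails before any storage or network access happens, and every successful `Query` already carries the schema that the execution engine needs.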


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

trueleo requested a review from nitisht on September 8, 2022 14:31
nitisht merged commit a7abb5d into parseablehq:main on Sep 8, 2022
trueleo deleted the s3_query_listing branch on September 9, 2022 04:07