Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add different output support to queries #24616

Merged
merged 1 commit into from Feb 12, 2024
Merged

Conversation

mgattozzi
Copy link
Contributor

@mgattozzi mgattozzi commented Jan 31, 2024

This commit adds the ability to choose the output format of a query via
the v3 api so that a user can choose, whether by Accept headers or the
format url param, how the data will be returned to them.

Prior to this commit the default was a pretty printed text format, but
that instead has been changed to json as the default.

There are multiple formats one can choose:

  1. json
  2. csv
  3. pretty printed text
  4. parquet

I've tested each of these out and it works well. In particular the
parquet output is exciting as users will be able to perform a query and
receive back parquet data that they can then load into say a Python
script or something else to work on and operate it. As we extend what
data can be queried, as well as persisting it, what people will be able
to do with Edge will be really cool and I'm interested to see how users
will end up using this functionality in the future.

Note @pauldix that I'm opening this up as a draft until #24605 is merged as this work depends on it. I'll rebase and remove that code from here once it is. The most relevant bits of the PR are in influxdb3_server/src/http.rs if you want to look at it before the other PR is in.

Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but can you add tests to confirm the responses? You should be able to mock out the WriteBuffer and the QueryExecutor to use fake data.

@mgattozzi mgattozzi marked this pull request as ready for review February 1, 2024 16:26
@mgattozzi
Copy link
Contributor Author

I updated the test to check each format. However, I'm only uncertain about the check on the host for the parquet file one. The other values seem to change fine and I can test for them if I change the input data, but if I change the host name it does not work. I'm okay with the changes overall, but that test I feel a bit iffy on, but I can also serialize the parquet data to a RecordBatch so overall I think it's fine.

let res = query(&server, "foo", "select * from cpu", "json", None).await;
let body = body::to_bytes(res.into_body()).await.unwrap();
let actual = std::str::from_utf8(body.as_bytes()).unwrap();
let expected = r#"[{"host":"a","time":"1970-01-01T00:00:00.000000123","val":1}]"#;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON output is actually supposed to be JSONL, which doesn't have the wrapping brackets []. To really confirm that output, you'd ideal have at least two rows come back in the result so you can validate that it outputs one row per line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this offline and decided to just open an issue for this -> #24654

@mgattozzi
Copy link
Contributor Author

Hey @pauldix I updated the PR and it now successfully handles the check for parquet files as well. The data was just stored a little deeper it seems. We should be all good now!

This commit adds the ability to choose the output format of a query via
the v3 api so that a user can choose, whether by Accept headers or the
format url param, how the data will be returned to them.

Prior to this commit the default was a pretty printed text format, but
that instead has been changed to json as the default.

There are multiple formats one can choose:

1. json
2. csv
3. pretty printed text
4. parquet

I've tested each of these out and it works well. In particular the
parquet output is exciting as users will be able to perform a query and
receive back parquet data that they can then load into say a Python
script or something else to work on and operate it. As we extend what
data can be queried, as well as persisting it, what people will be able
to do with Edge will be really cool and I'm interested to see how users
will end up using this functionality in the future.
@mgattozzi
Copy link
Contributor Author

Rebased off main and fixed the lint problem with clippy. I'll merge afterwords

@mgattozzi mgattozzi merged commit b555ddf into main Feb 12, 2024
11 checks passed
@mgattozzi mgattozzi deleted the mgattozzi/formats branch February 12, 2024 17:04
@mgattozzi mgattozzi added the v3 label Feb 12, 2024
hiltontj pushed a commit that referenced this pull request Feb 12, 2024
This commit adds the ability to choose the output format of a query via
the v3 api so that a user can choose, whether by Accept headers or the
format url param, how the data will be returned to them.

Prior to this commit the default was a pretty printed text format, but
that instead has been changed to json as the default.

There are multiple formats one can choose:

1. json
2. csv
3. pretty printed text
4. parquet

I've tested each of these out and it works well. In particular the
parquet output is exciting as users will be able to perform a query and
receive back parquet data that they can then load into say a Python
script or something else to work on and operate it. As we extend what
data can be queried, as well as persisting it, what people will be able
to do with Edge will be really cool and I'm interested to see how users
will end up using this functionality in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants