New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add different output support to queries #24616
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, but can you add tests to confirm the responses? You should be able to mock out the WriteBuffer and the QueryExecutor to use fake data.
c23d6ac
to
16c4475
Compare
I updated the test to check each format. However, I'm only uncertain about the check on the host for the parquet file one. The other values seem to change fine and I can test for them if I change the input data, but if I change the host name it does not work. I'm okay with the changes overall, but that test I feel a bit iffy on, but I can also serialize the parquet data to a RecordBatch so overall I think it's fine. |
let res = query(&server, "foo", "select * from cpu", "json", None).await; | ||
let body = body::to_bytes(res.into_body()).await.unwrap(); | ||
let actual = std::str::from_utf8(body.as_bytes()).unwrap(); | ||
let expected = r#"[{"host":"a","time":"1970-01-01T00:00:00.000000123","val":1}]"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON output is actually supposed to be JSONL, which doesn't have the wrapping brackets []
. To really confirm that output, you'd ideal have at least two rows come back in the result so you can validate that it outputs one row per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON Lines: https://jsonlines.org/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this offline and decided to just open an issue for this -> #24654
7874dea
to
10d7d2f
Compare
Hey @pauldix I updated the PR and it now successfully handles the check for parquet files as well. The data was just stored a little deeper it seems. We should be all good now! |
This commit adds the ability to choose the output format of a query via the v3 api so that a user can choose, whether by Accept headers or the format url param, how the data will be returned to them. Prior to this commit the default was a pretty printed text format, but that instead has been changed to json as the default. There are multiple formats one can choose: 1. json 2. csv 3. pretty printed text 4. parquet I've tested each of these out and it works well. In particular the parquet output is exciting as users will be able to perform a query and receive back parquet data that they can then load into say a Python script or something else to work on and operate it. As we extend what data can be queried, as well as persisting it, what people will be able to do with Edge will be really cool and I'm interested to see how users will end up using this functionality in the future.
10d7d2f
to
2c56d32
Compare
Rebased off main and fixed the lint problem with clippy. I'll merge afterwords |
This commit adds the ability to choose the output format of a query via the v3 api so that a user can choose, whether by Accept headers or the format url param, how the data will be returned to them. Prior to this commit the default was a pretty printed text format, but that instead has been changed to json as the default. There are multiple formats one can choose: 1. json 2. csv 3. pretty printed text 4. parquet I've tested each of these out and it works well. In particular the parquet output is exciting as users will be able to perform a query and receive back parquet data that they can then load into say a Python script or something else to work on and operate it. As we extend what data can be queried, as well as persisting it, what people will be able to do with Edge will be really cool and I'm interested to see how users will end up using this functionality in the future.
This commit adds the ability to choose the output format of a query via
the v3 api so that a user can choose, whether by Accept headers or the
format url param, how the data will be returned to them.
Prior to this commit the default was a pretty printed text format, but
that instead has been changed to json as the default.
There are multiple formats one can choose:
I've tested each of these out and it works well. In particular the
parquet output is exciting as users will be able to perform a query and
receive back parquet data that they can then load into say a Python
script or something else to work on and operate it. As we extend what
data can be queried, as well as persisting it, what people will be able
to do with Edge will be really cool and I'm interested to see how users
will end up using this functionality in the future.
Note @pauldix that I'm opening this up as a draft until #24605 is merged as this work depends on it. I'll rebase and remove that code from here once it is. The most relevant bits of the PR are ininfluxdb3_server/src/http.rs
if you want to look at it before the other PR is in.