feat: Add different output support to queries #24616

mgattozzi · 2024-01-31T17:26:54Z

This commit adds the ability to choose the output format of a query via
the v3 API, so that a user can choose, whether by Accept header or the
format URL param, how the data will be returned to them.

Prior to this commit the default was a pretty printed text format; the
default is now json.

There are multiple formats one can choose:

json
csv
pretty printed text
parquet

I've tested each of these and they work well. The parquet output is
particularly exciting: users will be able to run a query and receive
parquet data that they can then load into, say, a Python script or
another tool and operate on it. As we extend what data can be queried,
as well as persisting it, what people will be able to do with Edge will
be really cool, and I'm interested to see how users end up using this
functionality in the future.
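As a rough illustration of the format selection described above, here is a minimal sketch, not the code in influxdb3_server/src/http.rs. The names `QueryFormat` and `resolve_format` are hypothetical, as are the MIME types and every parameter spelling other than "json" (the only one that appears in this PR's test). Having the format param take precedence over the Accept header is also an assumption.

```rust
/// Output formats named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryFormat {
    Json,
    Csv,
    Pretty,
    Parquet,
}

impl QueryFormat {
    /// Parse the `format` URL parameter. Only "json" appears in the PR's
    /// test; the other spellings are assumptions.
    pub fn from_param(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "json" => Some(Self::Json),
            "csv" => Some(Self::Csv),
            "pretty" => Some(Self::Pretty),
            "parquet" => Some(Self::Parquet),
            _ => None,
        }
    }

    /// Parse a single-valued Accept header. The MIME types are assumptions.
    pub fn from_accept(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "application/json" | "*/*" => Some(Self::Json),
            "text/csv" => Some(Self::Csv),
            "text/plain" => Some(Self::Pretty),
            "application/vnd.apache.parquet" => Some(Self::Parquet),
            _ => None,
        }
    }
}

/// Pick the output format: an explicit `format` param wins (assumed
/// precedence), then the Accept header, then the JSON default.
pub fn resolve_format(
    format_param: Option<&str>,
    accept: Option<&str>,
) -> Result<QueryFormat, String> {
    if let Some(p) = format_param {
        return QueryFormat::from_param(p)
            .ok_or_else(|| format!("unknown format parameter: {}", p));
    }
    if let Some(a) = accept {
        return QueryFormat::from_accept(a)
            .ok_or_else(|| format!("unsupported Accept header: {}", a));
    }
    Ok(QueryFormat::Json)
}
```

With this sketch, `resolve_format(None, None)` falls back to JSON, matching the new default described above, and an unrecognized value is reported as an error rather than silently defaulting.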

Note @pauldix that I'm opening this up as a draft until #24605 is merged as this work depends on it. I'll rebase and remove that code from here once it is. The most relevant bits of the PR are in influxdb3_server/src/http.rs if you want to look at it before the other PR is in.

pauldix

Overall looks good, but can you add tests to confirm the responses? You should be able to mock out the WriteBuffer and the QueryExecutor to use fake data.

mgattozzi · 2024-02-01T17:10:09Z

I updated the test to check each format. The one thing I'm uncertain about is the check on the host in the parquet test. The other values change as expected and I can test for them if I change the input data, but if I change the host name it does not work. I'm okay with the changes overall. That test feels a bit iffy to me, but I can also read the parquet data into a RecordBatch, so overall I think it's fine.

pauldix · 2024-02-05T15:56:05Z

influxdb3_server/src/lib.rs

+        let res = query(&server, "foo", "select * from cpu", "json", None).await;
+        let body = body::to_bytes(res.into_body()).await.unwrap();
+        let actual = std::str::from_utf8(body.as_bytes()).unwrap();
+        let expected = r#"[{"host":"a","time":"1970-01-01T00:00:00.000000123","val":1}]"#;


The JSON output is actually supposed to be JSONL, which doesn't have the wrapping brackets []. To really confirm that output, you'd ideally have at least two rows come back in the result so you can validate that it outputs one row per line.

JSON Lines: https://jsonlines.org/

We talked about this offline and decided to just open an issue for this -> #24654
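To make the difference concrete, here is a minimal sketch contrasting the bracketed JSON array the test currently expects with JSON Lines output. The `Row` struct and the three functions are hypothetical, the second row is made-up illustration data, and the hand-rolled formatting assumes values that need no JSON escaping.

```rust
/// One row of the `cpu` table, shaped like the row in the test above.
pub struct Row {
    pub host: &'static str,
    pub time: &'static str,
    pub val: i64,
}

/// Format a row as a JSON object. Assumes the strings need no escaping.
pub fn row_to_json(row: &Row) -> String {
    format!(
        r#"{{"host":"{}","time":"{}","val":{}}}"#,
        row.host, row.time, row.val
    )
}

/// Current behavior in the test: all rows wrapped in a single JSON array.
pub fn to_json_array(rows: &[Row]) -> String {
    let objects: Vec<String> = rows.iter().map(row_to_json).collect();
    format!("[{}]", objects.join(","))
}

/// JSON Lines: one object per line, with no wrapping brackets.
pub fn to_json_lines(rows: &[Row]) -> String {
    rows.iter().map(|row| row_to_json(row) + "\n").collect()
}
```

With two or more rows, a test can split the JSON Lines body on newlines and check each line separately, which is the validation suggested in the review comment.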

mgattozzi · 2024-02-08T19:59:03Z

Hey @pauldix I updated the PR and it now successfully handles the check for parquet files as well. The data was just stored a little deeper it seems. We should be all good now!

mgattozzi · 2024-02-12T16:01:40Z

Rebased off main and fixed the lint problem with clippy. I'll merge afterwards.

mgattozzi requested a review from pauldix January 31, 2024 17:26

pauldix requested changes Feb 1, 2024

mgattozzi force-pushed the mgattozzi/formats branch from c23d6ac to 16c4475 Compare February 1, 2024 16:15

mgattozzi marked this pull request as ready for review February 1, 2024 16:26

mgattozzi requested a review from pauldix February 2, 2024 16:02

pauldix reviewed Feb 5, 2024

mgattozzi mentioned this pull request Feb 8, 2024

Support JSON Lines in InfluxDB Edge #24654

Open

mgattozzi requested a review from pauldix February 8, 2024 19:56

mgattozzi force-pushed the mgattozzi/formats branch from 7874dea to 10d7d2f Compare February 8, 2024 19:58

pauldix approved these changes Feb 9, 2024

mgattozzi force-pushed the mgattozzi/formats branch from 10d7d2f to 2c56d32 Compare February 12, 2024 16:01

mgattozzi merged commit b555ddf into main Feb 12, 2024
11 checks passed

mgattozzi deleted the mgattozzi/formats branch February 12, 2024 17:04

mgattozzi added the v3 label Feb 12, 2024
