Handle records and arrays #145

ArielSSchwartz · 2017-03-25T05:43:11Z

Can extract_data be changed to optionally return structured JSON data instead of a data.frame? This is useful when the query results include ARRAYs and STRUCTs.


hadley · 2017-04-18T13:19:39Z

I think it should return a data frame with list columns in that scenario. Could you provide a minimal reprex of such data?

ArielSSchwartz · 2017-04-19T22:13:10Z

Here is an example query:

SELECT * FROM [bigquery-public-data:samples.github_nested] WHERE repository.owner IN ('hadley') LIMIT 100

Rather than having to flatten repeated fields (ARRAYS), I would like to capture the JSON output and load it into a structured list in R.
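To illustrate what "a structured list in R" could look like, here is a minimal sketch that parses nested JSON text with jsonlite. The JSON record is made up for illustration (it is not the real github_nested schema), and `simplifyVector = FALSE` is used so that nothing gets flattened into vectors or data frames.

```r
library(jsonlite)

# Hypothetical record: field names and values are invented for this sketch.
json <- '[
  {
    "repository": {"owner": "hadley", "name": "example-repo"},
    "tags": ["a", "b"]
  }
]'

# simplifyVector = FALSE keeps the nesting: records become named lists,
# arrays become unnamed lists.
rows <- fromJSON(json, simplifyVector = FALSE)

owner  <- rows[[1]]$repository$owner   # nested record field
n_tags <- length(rows[[1]]$tags)       # repeated field stays together
str(rows)
```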

hadley · 2017-04-19T22:31:59Z

Thanks, that's useful. I think it should be trivial to return this data as a list-column.
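For readers unfamiliar with list-columns, a rough base R sketch follows. The column names and values are hypothetical, and this only shows the general shape, not what the package actually returns.

```r
# Two rows; the ordinary column is atomic, the other two are lists.
df <- data.frame(owner = c("hadley", "hadley"), stringsAsFactors = FALSE)

# Repeated field (ARRAY): each cell holds a vector of its own length.
df$tags <- list(c("a", "b", "c"), character(0))

# Record (STRUCT): each cell holds a named list.
df$license <- list(
  list(key = "mit", name = "MIT"),
  list(key = "gpl-3", name = "GPL-3")
)

lengths(df$tags)        # number of elements per row
df$license[[1]]$key     # reach into the record of the first row
```

The point of this layout is that the table keeps one row per result row, however many elements a repeated field has.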

hadley · 2017-04-20T12:40:50Z

Even with the query explorer, and this simpler query:

SELECT repository 
FROM [bigquery-public-data.samples.github_nested]
WHERE repository.owner IN ('hadley') LIMIT 100

I had to check "allow large results" and uncheck "flatten" results.

Do you have an example of this working elsewhere? (I just want to double check that it is really this hard; I thought it would be easier)

ArielSSchwartz · 2017-04-20T19:07:38Z

Yes. This is the only way I know to get nested JSON results from the BigQuery Web UI.
Using the command line tools you can try something like:
bq --format prettyjson query "SELECT repository.* FROM [bigquery-public-data.samples.github_nested] WHERE repository.owner IN ('hadley') LIMIT 100"

craigcitro · 2017-04-20T20:46:29Z

You can also use the UI -- use standard SQL, and click the button to view results as JSON.

The --format flag was, in fact, designed for precisely this use case, so kudos for finding it. 😁

hadley · 2018-04-06T21:45:25Z

Reprex with latest (in-dev) API:

tb <- bq_project_query(bq_test_project(), 
  "SELECT * FROM publicdata.samples.github_nested WHERE repository.owner IN ('hadley') LIMIT 100",
  use_legacy_sql = FALSE
)
df <- bq_table_download(tb)

It's a bit harder than I had anticipated thanks to the extreme nesting of the JSON.

hadley · 2018-04-08T17:20:02Z

Focussing the query a little to get a smaller selection of repeated and non-repeated records.

tb <- bq_project_query(bq_test_project(), 
  "SELECT repository, type, payload.member, payload.shas
    FROM publicdata.samples.github_nested 
    WHERE repository.owner IN ('hadley') 
    LIMIT 100",
  use_legacy_sql = FALSE
)

zippeurfou · 2018-04-12T17:37:09Z

Here is another example:

query <- "SELECT APPROX_QUANTILES(x, 4) AS output, 2 as t FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x"
mydf <- query_exec(query, project = project, use_legacy_sql = FALSE, max_page = Inf)

> mydf
  output  t
1      1 25
2     50 75
3    100  2

vs. what it looks like in the UI

and the JSON:

[
  {
    "output": [
      "1",
      "25",
      "50",
      "75",
      "100"
    ],
    "t": "2"
  }
]

Running the same query the way @hadley did triggers an error:

> tb <- bq_project_query(project,query,use_legacy_sql=FALSE)
Complete
Billed: 0 B
> df <- bq_table_download(tb)
Error in bq_tabledata_to_list(json) :
  embedded nul in string: '\001\0\0\0\001'
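As a side note on the APPROX_QUANTILES example above: the three-row output printed by query_exec looks consistent with the six returned values being filled row-wise into two columns. That mechanism is a guess and is not confirmed anywhere in this thread. The sketch below reproduces the printed table that way in base R, and contrasts it with a one-row data frame that keeps the array in a list-column, which is the shape the JSON suggests.

```r
# Values taken from the JSON above, converted to numbers for readability.
output  <- c(1, 25, 50, 75, 100)
t_value <- 2

# Assumed mechanism: all six values poured row-wise into two columns.
flattened <- as.data.frame(matrix(c(output, t_value), ncol = 2, byrow = TRUE))
names(flattened) <- c("output", "t")
flattened

# Shape implied by the JSON: one row, with the array kept in a list-column.
nested <- data.frame(t = t_value)
nested$output <- list(output)
nrow(nested)
lengths(nested$output)
```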

hadley · 2018-04-13T00:06:25Z

Thanks @zippeurfou, here's an even more minimal example for testing:

tb1 <- bq_project_query(bq_test_project(), "SELECT STRUCT(1 AS a, 'abc' AS b) as x")
bq_table_download(tb1)

tb2 <- bq_project_query(bq_test_project(), "SELECT GENERATE_ARRAY(1, 10) as x")
bq_table_download(tb2)

Parse JSON in C++: this considerably improves performance and adds full support for arrays, records, and arrays of records. Fixes #145

bbhoss · 2019-02-26T15:37:19Z

It seems that the schema generator doesn't support saving lists back to BigQuery as a repeated field? Am I missing something or should I open a ticket?

hadley · 2019-02-26T15:55:51Z

@bbhoss please file a new issue with a simple reprex.

ArielSSchwartz changed the title ~~Return JSON structured data instead of a data_frame~~ Return JSON structured data instead of a data.frame Mar 25, 2017

hadley added the reprex needs a minimal reproducible example label Apr 18, 2017

hadley added feature a feature request or enhancement and removed reprex needs a minimal reproducible example labels Apr 20, 2017

This was referenced Mar 6, 2018

Error: don't know how to convert type record #183

Closed

Error while trying to get nested data #168

Closed

hadley changed the title ~~Return JSON structured data instead of a data.frame~~ Handle records and arrays Apr 12, 2018

hadley mentioned this issue Apr 17, 2018

Parse JSON in C++ #228

Merged

hadley closed this as completed in #228 Apr 17, 2018

hadley added a commit that referenced this issue Apr 17, 2018

Merge pull request #228 from r-dbi/full-parse

164d8c0


zippeurfou mentioned this issue Apr 18, 2018

package dependency not met? #230

Closed

nbenn mentioned this issue Nov 1, 2021

Specialised data types in R r-dbi/dbi3#22

Open
