
query causes bigrquery to fail, likely due to page_size memory issue #169

Closed
ras44 opened this issue Jul 17, 2017 · 9 comments
Labels
api 🕸️ · bug (an unexpected problem or unintended behavior)

Comments

@ras44
Contributor

ras44 commented Jul 17, 2017

I have a table with 128 columns and about 10,000 rows in BigQuery. In RStudio running on a Google Cloud instance, I have been assigning the result of a query to a variable with:

tm_orig <- query_exec(
  "SELECT * FROM my_dataset.my_table;",
  project = "my-project"
)

This has worked for the past few weeks with no problems. Today, I received the error:

0 bytes processed
Error: Internal Error

Solution:
If we change page_size to a smaller value (the default is 10,000), the query succeeds, presumably because the results are spread across multiple page requests, so whatever is causing the internal error (possibly a memory issue?) is not triggered. I'm skeptical that it's a system memory issue, because I can continue to assign values to other variables in RStudio with no problems. It seems like there is an issue with iterating through the pages in the response.
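
For illustration, a minimal sketch of the workaround, assuming the same table and project as in the original report (the value 1000 is arbitrary, just smaller than the default):

tm_orig <- query_exec(
  "SELECT * FROM my_dataset.my_table;",
  project   = "my-project",
  page_size = 1000  # smaller than the 10,000 default
)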

If this paging exception were caught, outputting a "decrease page_size" suggestion might help others.
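
A user-side sketch of what catching it might look like (the wrapper and message here are hypothetical, not bigrquery's actual behavior):

library(bigrquery)

sql <- "SELECT * FROM my_dataset.my_table"
result <- tryCatch(
  query_exec(sql, project = "my-project"),
  error = function(e) {
    # Re-raise with a hint, since the bare "Internal Error" gives no guidance.
    stop("Query failed; try a smaller page_size (the default is 10,000). Original error: ",
         conditionMessage(e), call. = FALSE)
  }
)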

@YvesCR

YvesCR commented Aug 9, 2017

I had the same error, and changing page_size to a smaller value solved my issue.

@mineshkumar

Changing the page_size is not helping me. I have 20,000 rows with 175 columns. It processes some megabytes and then shows:

Error: Internal Error

@hadley
Member

hadley commented Mar 28, 2018

Can someone please attempt to create a reprex using publicly available data? Without one, the chances of my being able to locate and fix the underlying issue are unfortunately slim.

@hadley added the reprex (needs a minimal reproducible example) label Mar 28, 2018
@hadley
Member

hadley commented Mar 28, 2018

Oh, this is actually a duplicate of #209.

@hadley closed this as completed Mar 28, 2018
@ras44
Contributor Author

ras44 commented Apr 3, 2018

#209 is actually an issue related to handling the response from the GBQ API. It was resolved (98dec6f) by outputting the "responseTooLarge" reason received from the API.

This issue relates to a memory error on the local machine while parsing the response received with the default page_size of 10,000. Whether a reprex triggers the error depends on available memory, but by increasing page_size I was able to reproduce it with the following on a machine with 4 GB of RAM:


library(bigrquery)

project <- "my-project"  # your own billing project id

query <- "
  SELECT *
  FROM `fh-bigquery.dbpedia.place`
"

query_exec(
  project        = project,
  query          = query,
  max_pages      = Inf,
  page_size      = 500000,
  use_legacy_sql = FALSE
)

@hadley added the bug (an unexpected problem or unintended behavior) and api 🕸️ labels and removed the reprex (needs a minimal reproducible example) label Apr 3, 2018
@hadley reopened this Apr 3, 2018
@hadley
Member

hadley commented Apr 3, 2018

@ras44 thanks! I'll look into it, but it's unlikely I'll be able to do much more than give an informative error.

@ras44
Contributor Author

ras44 commented Apr 6, 2018

@hadley I attempted to reproduce this again today and it looks like it's not a memory issue: bigrquery is actually receiving a response from the GBQ API. With the results of #209 and 98dec6f, bigrquery now outputs the "Response too large" reason, which should help people understand that they need to reduce the page_size. I think we can close this one again if you agree.

@ras44 closed this as completed Apr 6, 2018
@hadley
Member

hadley commented Apr 6, 2018

It's also possible that you need to explicitly save to a temporary table: https://cloud.google.com/bigquery/docs/writing-results#large-results
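
For example, a sketch reusing the query and project from the reprex above, assuming a dataset of your own that you can write to (the destination table name here is illustrative):

query_exec(
  query             = query,
  project           = project,
  destination_table = "my_dataset.place_results",  # hypothetical writable table
  use_legacy_sql    = FALSE
)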

@ras44
Contributor Author

ras44 commented Apr 9, 2018

I can confirm that in both cases (whether I create a temporary table or just run the query) I receive the "responseTooLarge" reason from the API. It's as if the API has an internal limit by which it decides whether a response is too large. Either way, I think it's safe to say this is not an issue with R, and the response is now handled appropriately, with the reason included in the output.
