Bug: Metabase truncates CSV/xlsx downloads with large data volume. #6960

Open

hden opened this issue Feb 15, 2018 · 5 comments

hden commented Feb 15, 2018

Metabase truncates CSV/xlsx downloads with large data volume.

Minimal required SQL

#standardsql
select *
from `bigquery-public-data.samples.github_timeline`
limit 20000

Running wc -l query_result_2018-02-15T09_17_51.084Z.csv against the download in our system shows 1,759 rows where 20,000 were expected.


  • Your browser and the version: Chrome 63.0
  • Your operating system: OS X 10
  • Your databases: BigQuery
  • Metabase version: v0.28.0
  • Metabase hosting environment: jar
  • Metabase internal database: MySQL

hden commented Jul 10, 2018

@salsakran Any update on this?

almegeddon commented Mar 5, 2019

We're seeing the same issue when trying to download large-ish .csv files (~77k rows, well under the 1 million row limit for CSVs).

Digging into the BigQuery driver (bigquery.clj), it looks like the post-process-native function is not properly handling paged results. I hacked in this logging statement to see if I could prove the theory:

(log/info (u/format-color 'cyan (str "BigQuery QueryResponse details, Total rows: " (.getTotalRows response) " Fetched Rows: " (.size (.getRows response)) " Page Token: " (.getPageToken response))))

and I'm seeing fewer fetched rows than the total rows available when downloading full results via .csv:

03-05 13:44:03 INFO driver.bigquery :: BigQuery QueryResponse details, Total rows: 77699 Fetched Rows: 31818 Page Token: BFSFOYSPNEAQAAASA4EAAEEAQCAAKGQIBDFPQAIQUCGQMIFQVYKQ====

The process-native and post-process-native functions need to check the response for a page token and, when one is present, fetch the remaining pages (a sketch of such a loop follows the quote below). My guess is that in my case (as well as @hden's) the result set is truncated by BigQuery's default per-page result size limit, documented under Jobs:query.maxResults:

[Optional] The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
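
To illustrate what that pagination could look like, here is a minimal, untested sketch against the google-api-services-bigquery client the driver wraps. fetch-all-rows is a hypothetical helper, not the driver's actual code, and it assumes the query job has already completed:

(import '(com.google.api.services.bigquery Bigquery)
        '(com.google.api.services.bigquery.model GetQueryResultsResponse))

;; Hypothetical sketch, not the actual Metabase fix: follow the page token
;; returned by jobs.getQueryResults until every page has been fetched.
(defn fetch-all-rows
  "Accumulate the rows from every result page of a finished BigQuery job."
  [^Bigquery client ^String project-id ^String job-id]
  (loop [page-token nil, rows []]
    (let [^GetQueryResultsResponse resp (-> (.jobs client)
                                            (.getQueryResults project-id job-id)
                                            (.setPageToken page-token)
                                            (.execute))
          rows (into rows (.getRows resp))]
      (if-let [token (.getPageToken resp)]
        (recur token rows)   ; another page remains
        rows))))             ; no token => last page fetched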

This issue is a blocker for BigQuery, even when results are limited by the UI's imposed 10k row cap: because responses are also capped at 10 MB, it is entirely feasible, depending on the width of your rows, for a result set of fewer than 10k rows to be truncated (rows averaging ~2 KB would hit the 10 MB cap at roughly 5,000 rows).

setyven commented Mar 13, 2019

Any visibility on when this will be looked into / fixed?

almegeddon commented Mar 13, 2019

I've implemented changes locally to page results for csv/xlsx/json downloads. Hoping to create a pull request today.

setyven commented Mar 13, 2019

Nice! Hopefully it can be added here and make it into the 0.32 release.
