Option for downloading CSV with no rows limit #3810

ant1j · 2016-11-24T22:30:07Z

It would be great to facilitate data extraction through Metabase.

A few potential enhancements to do so:

Choose columns to display/export (already covered by Allow users to select fields to be displayed in the table #1445)
Have a LIMIT to the query for visualisation update, and choose to remove this LIMIT when exporting
=> this is possible today (but not so user-friendly) by setting the limit, hit "get answer" to check data, and remove the LIMIT options before clicking on the export to CSV icon
Allow additional options to CSV export (remove LIMIT condition)

ibushong · 2016-12-16T23:31:35Z

Not being able to download all rows is a pretty big issue for us. Having something like a "download all" button would be great.

axb21 · 2016-12-21T17:18:23Z

I'd also like an option to download CSV files with at least a higher row limit than the current 10,000. I can see the rationale for the limit, but I am exploring using metabase as a way to deliver insights to non-technical people in my company, and every now and then they'll generate a result set with slightly more than 10,000 rows. The most recent one was 13,000 rows or so, and it was frustrating that they could not hit a button and download the whole thing. Just a little too big.

I am not familiar at all with the metabase codebase, having literally just looked at it minutes ago. However, I hacked a bit and found that changing one line in src/metabase/api/dataset.clj (as of commit 4844ecc) sufficed for my purposes, at least as far as I'm able to tell so far. I have attached a patch if you're curious. The runnable jar file is easy to build following the developer notes and so far it's running fine in my environment and people are happily downloading CSV files with >10k rows. Again, I have no clue whether this will wreak havoc on the rest of the codebase, so word of warning.

up-download-row-limit..txt

hugocar · 2016-12-23T10:26:26Z

The fact the limit wasn't enforced in version 0.20 is causing troubles for me. Users, where I work, were beginning to get used to export big data sets and are seeing this limit as a bug/regression. I'm not sure what to tell them.

tlrobinson · 2016-12-23T16:25:35Z

Confirmed this is a regression in v0.21.0. Downloads of the Orders table in the sample dataset are limited to 2,000 rows even though the copy in the UI says "The maximum download size is 1 million rows.". In v0.20.3 I get all 17,624 rows.

salsakran · 2016-12-31T02:10:50Z

@tlrobinson I think this is happening on the frontend.

I'm fairly certain the query dictionary being serialized for POST params in https://github.com/metabase/metabase/blob/master/frontend/src/metabase/query_builder/components/DownloadWidget.jsx#L32 includes the limits.

tlrobinson · 2017-01-01T18:30:59Z

@salsakran I think you're right for unsaved questions. It looks like we started using json_query from the results, which includes the constraints object. If we strip out that object it downloads all rows. PR at #4037.

However, saved questions are still limited to 2k rows by the backend (the frontend doesn't send the query at all in that case).

axb21 · 2017-01-01T18:43:11Z

@tlrobinson I think you're right about that. If you have a look at the patch I posted earlier, you'll see I changed code on the backend that allowed me to retrieve >2,000 rows from a saved question; I neglected to mention in my previous comment that my observations concerned saved questions. Anyway, I'm not steeped in this codebase, but this all suggests to me that at least in some cases the backend is controlling how many rows are ever sent to the frontend.

tlrobinson · 2017-01-01T18:52:02Z

@axb21 Thanks for looking into this. The constant you modified is the limit for normal queries (in the query builder and dashboards), but we want to ignore that for CSV downloads and use the 1 million row limit instead. Also I think your change will only increase the limit for aggregated queries, not "bare rows", which has a separate limit of 2000. We don't actually want to modify either of those limits.

axb21 · 2017-01-01T19:01:14Z

@tlrobinson gotcha, that makes sense. One thing I can say that might be of interest: I searched through the source for the number 10,000--the largest number of rows I was ever able to download--and only found that number in a handful of places:

$ find src/ -type f -exec egrep -nil '\b10000\b' {} \;
src/metabase/util/stats.clj
src/metabase/api/dataset.clj
src/metabase/driver/googleanalytics/query_processor.clj
src/metabase/driver.clj

The patch I posted changes that number in src/metabase/api/dataset.clj, which I understand isn't going to work. The other places where 10000 appears did not look (to my naive eyes) like the right places to make a change like this.

salsakran · 2017-01-01T20:50:25Z

@tlrobinson I don't think the front end should control how many rows get sent back. This is a limit in place for the backend to remain stable, and it should control how many rows get sent back.

I'll keep poking, I stopped there as I thought that was the root bug. =(

tlrobinson · 2017-01-01T21:13:46Z

@salsakran My PR already prevents the frontend from sending the constraints object. Are you saying the backend should not respect constraints passed in the request as well?

The following endpoints override constraints with the hard-coded query-constraints (10k grouped rows / 2k bare rows)

POST /api/dataset - unsaved question qb/dashboard endpoint
POST /api/card/:card-id/query - saved question qb/dashboard endpoint
POST /api/card/:card-id/query/csv - saved question CSV download
POST /api/card/:card-id/query/json - saved question JSON download

(those last 2 are why the saved question download endpoints are still limited to 10k/2k rows)

And the following do not (allowing constraints to pass through from the request)

POST /api/dataset/csv - unsaved question CSV download
POST /api/dataset/json - unsaved question JSON download

Perhaps we should just change all the CSV/JSON download endpoints to discard constraints and not replace with query-constraints, so the query processor will use absolute-max-results (1048576, apparently the same as Excel https://github.com/metabase/metabase/blob/master/src/metabase/query_processor.clj#L28-L33 )

tlrobinson · 2017-01-01T21:20:06Z

@axb21 The 10000 constant you found (max-results) is the correct one, but I'm saying that limit shouldn't be applied for the download endpoints at all.

mmerchant · 2017-01-09T22:32:31Z

@tlrobinson Just to clarify -- on the CSV export it is intended to limit the export to 10,000 rows? Or is that constraint removed? I tried on the v0.22.0-rc1 tag and do still run into the 10K row limit on CSV exports.

tlrobinson · 2017-01-09T22:36:04Z

@mmerchant Yes but I think this was merged after RC1. We'll put out another RC later today.

luqmansyauqi · 2017-01-26T02:58:20Z

@tlrobinson is this already implemented in v0.22.1?

SchwiftyOne · 2017-08-01T15:43:06Z

@tlrobinson @axb21 .I know this is an old issue, but does changing (max-results) allow the front-end and/or the CSV export to send more than 10,000 rows? I'm very interested in using Metabase in our corporate environment but the 10,000 row limit is major problem for how we'd want to use it and if changing the constant in the source is fixes this that would solve a big problem for us.

salsakran · 2017-08-01T15:45:35Z

How do you want to use it?

We're kind of at a mostly stable settling point where most people find 10k enough, while some people on small instances still manage to crash their server once every so often. In general, we think of Metabase as a tool for interactive BI vs pushing big files around. There are a lot of little assumptions on that regard.

If you can give us a clear sense of what you're trying to do with it, we can perhaps find a way to get it done. In general though, we're not trying to be a "pull massive files out of databases" play, and at a certain size (depending on your data warehouse, server specs and browser), you'll want to have a dedicated data export tool in conjunction with us.

SchwiftyOne · 2017-08-01T19:37:26Z

We're not looking to download huge datasets (1million rows+), but the queries we'd like to have exported are around 200k to 300k rows or so. I understand that Metabase is primarily a BI visualization tool, so I don't expect it to do huge data set exports but something in that range would be great for us.

mazameli · 2017-08-01T19:48:27Z

Wait, so the limit for the number of downloadable rows is 10k and not 1 million rows? If so, I'm going to change this message in the UI:

axb21 · 2017-08-01T19:49:04Z

@SchwiftyOne yes. I've been happily using a modified version of metabase for over a year now and a higher limit than 10k rows. I couldn't tell you the largest result set someone's downloaded, but it's well over 10k rows and metabase has worked fine.

@salsakran Yes, it makes sense that metabase is not tuned to query and return millions of rows. But an arbitrary, software-selected-at-deploy-time, hard limit on the number of returned rows seems unnecessarily constraining. My preference would be that I, as the person installing and administering a metabase instance, have the power to override this limit if I so choose, and it's on me if I create havoc. Really the request here is to turn a hardcoded limit into a configurable option and it's hard to understand why that'd be controversial unless there's no one available to implement it.

SchwiftyOne · 2017-08-01T20:38:36Z

@axb21 Thanks for the feedback! That's great that you can modify it. My only problem now is how to compile it. Unfortunately I don't have a mac, and reading up on how to compile the metabase source to jar, it seems you need a mac to do so. You wouldn't happen to have a compiled metabase jar with your modifications available for download somewhere?

salsakran · 2017-08-01T20:44:55Z

Sorry, I misspoke.

The limits are
2k for raw table in-browser display
10k for aggregations in-browser display
1M rows for downloads.

@SchwiftyOne it seems like the number you mention (200-300k) should happen without too much drama. You will want to make sure your instance has enough memory, and you might need to tweak some config settings if you start seeing download failures.

axb21 · 2017-08-01T20:51:37Z

@SchwiftyOne according to @salsakran's last response you shouldn't need my version of the jar. Honestly I'm several minor versions behind and I hacked in what I needed, so I'd suggest sticking with an official metabase release if it can handle what you need. I'd be happy to share offline what I hacked if it turns out the official release doesn't work for you. Good luck!

SchwiftyOne · 2017-08-01T21:04:40Z

I just tested it out and it worked!. Didn't realize that the display limit was 10k but the CSV limit was 1M. Thanks @salsakran and @axb21

MiBaDK · 2017-08-17T10:05:19Z

Only getting 100.000 rows. Version 0.25.2. Has this been changed from the 1M?

paoliniluis · 2024-01-31T21:14:47Z

#28144

camsaul changed the title ~~Ease data subset export~~ Option for CSV download without limit Dec 13, 2016

camsaul added the Proposal label Dec 13, 2016

camsaul changed the title ~~Option for CSV download without limit~~ Option for downloading CSV ignoring rows limit of query Dec 13, 2016

camsaul changed the title ~~Option for downloading CSV ignoring rows limit of query~~ Option for downloading CSV with no rows limit Dec 13, 2016

camsaul mentioned this issue Dec 13, 2016

CSV download should not be limited to 10,000 #3936

Closed

aagarwal-gtr mentioned this issue Dec 16, 2016

CSV download limited to 10,000 rows #3967

Closed

tlrobinson added the Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness label Dec 23, 2016

tlrobinson added this to the 0.22.0 milestone Dec 23, 2016

tlrobinson added Type:Bug Product defects Querying/Processor and removed Proposal labels Dec 28, 2016

tlrobinson mentioned this issue Jan 1, 2017

CSV/JSON download limits #4037

Merged

2 tasks

tlrobinson self-assigned this Jan 3, 2017

tlrobinson closed this as completed Jan 6, 2017

tlrobinson mentioned this issue Mar 21, 2017

Create admin-overridable constants #4564

Open

mazameli mentioned this issue Aug 1, 2017

We should clarify how many rows a user is downloading #5657

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option for downloading CSV with no rows limit #3810

Option for downloading CSV with no rows limit #3810

ant1j commented Nov 24, 2016 •

edited

Loading

ibushong commented Dec 16, 2016

axb21 commented Dec 21, 2016 •

edited

Loading

hugocar commented Dec 23, 2016

tlrobinson commented Dec 23, 2016

salsakran commented Dec 31, 2016

tlrobinson commented Jan 1, 2017

axb21 commented Jan 1, 2017

tlrobinson commented Jan 1, 2017

axb21 commented Jan 1, 2017 •

edited

Loading

salsakran commented Jan 1, 2017 •

edited

Loading

tlrobinson commented Jan 1, 2017 •

edited

Loading

tlrobinson commented Jan 1, 2017

mmerchant commented Jan 9, 2017

tlrobinson commented Jan 9, 2017

luqmansyauqi commented Jan 26, 2017

SchwiftyOne commented Aug 1, 2017 •

edited

Loading

salsakran commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

mazameli commented Aug 1, 2017

axb21 commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

salsakran commented Aug 1, 2017

axb21 commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

MiBaDK commented Aug 17, 2017

paoliniluis commented Jan 31, 2024

Option for downloading CSV with no rows limit #3810

Option for downloading CSV with no rows limit #3810

Comments

ant1j commented Nov 24, 2016 • edited Loading

ibushong commented Dec 16, 2016

axb21 commented Dec 21, 2016 • edited Loading

hugocar commented Dec 23, 2016

tlrobinson commented Dec 23, 2016

salsakran commented Dec 31, 2016

tlrobinson commented Jan 1, 2017

axb21 commented Jan 1, 2017

tlrobinson commented Jan 1, 2017

axb21 commented Jan 1, 2017 • edited Loading

salsakran commented Jan 1, 2017 • edited Loading

tlrobinson commented Jan 1, 2017 • edited Loading

tlrobinson commented Jan 1, 2017

mmerchant commented Jan 9, 2017

tlrobinson commented Jan 9, 2017

luqmansyauqi commented Jan 26, 2017

SchwiftyOne commented Aug 1, 2017 • edited Loading

salsakran commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

mazameli commented Aug 1, 2017

axb21 commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

salsakran commented Aug 1, 2017

axb21 commented Aug 1, 2017

SchwiftyOne commented Aug 1, 2017

MiBaDK commented Aug 17, 2017

paoliniluis commented Jan 31, 2024

ant1j commented Nov 24, 2016 •

edited

Loading

axb21 commented Dec 21, 2016 •

edited

Loading

axb21 commented Jan 1, 2017 •

edited

Loading

salsakran commented Jan 1, 2017 •

edited

Loading

tlrobinson commented Jan 1, 2017 •

edited

Loading

SchwiftyOne commented Aug 1, 2017 •

edited

Loading