-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option for downloading CSV with no rows limit #3810
Comments
Not being able to download all rows is a pretty big issue for us. Having something like a "download all" button would be great. |
I'd also like an option to download CSV files with at least a higher row limit than the current 10,000. I can see the rationale for the limit, but I am exploring using metabase as a way to deliver insights to non-technical people in my company, and every now and then they'll generate a result set with slightly more than 10,000 rows. The most recent one was 13,000 rows or so, and it was frustrating that they could not hit a button and download the whole thing. Just a little too big. I am not familiar at all with the metabase codebase, having literally just looked at it minutes ago. However, I hacked a bit and found that changing one line in src/metabase/api/dataset.clj (as of commit 4844ecc) sufficed for my purposes, at least as far as I'm able to tell so far. I have attached a patch if you're curious. The runnable jar file is easy to build following the developer notes and so far it's running fine in my environment and people are happily downloading CSV files with >10k rows. Again, I have no clue whether this will wreak havoc on the rest of the codebase, so word of warning. |
The fact the limit wasn't enforced in version 0.20 is causing troubles for me. Users, where I work, were beginning to get used to export big data sets and are seeing this limit as a bug/regression. I'm not sure what to tell them. |
Confirmed this is a regression in |
@tlrobinson I think this is happening on the frontend. I'm fairly certain the query dictionary being serialized for POST params in https://github.com/metabase/metabase/blob/master/frontend/src/metabase/query_builder/components/DownloadWidget.jsx#L32 includes the limits. |
@salsakran I think you're right for unsaved questions. It looks like we started using However, saved questions are still limited to 2k rows by the backend (the frontend doesn't send the query at all in that case). |
@tlrobinson I think you're right about that. If you have a look at the patch I posted earlier, you'll see I changed code on the backend that allowed me to retrieve >2,000 rows from a saved question; I neglected to mention in my previous comment that my observations concerned saved questions. Anyway, I'm not steeped in this codebase, but this all suggests to me that at least in some cases the backend is controlling how many rows are ever sent to the frontend. |
@axb21 Thanks for looking into this. The constant you modified is the limit for normal queries (in the query builder and dashboards), but we want to ignore that for CSV downloads and use the 1 million row limit instead. Also I think your change will only increase the limit for aggregated queries, not "bare rows", which has a separate limit of 2000. We don't actually want to modify either of those limits. |
@tlrobinson gotcha, that makes sense. One thing I can say that might be of interest: I searched through the source for the number 10,000--the largest number of rows I was ever able to download--and only found that number in a handful of places:
The patch I posted changes that number in |
@tlrobinson I don't think the front end should control how many rows get sent back. This is a limit in place for the backend to remain stable, and it should control how many rows get sent back. I'll keep poking, I stopped there as I thought that was the root bug. =( |
@salsakran My PR already prevents the frontend from sending the The following endpoints override
(those last 2 are why the saved question download endpoints are still limited to 10k/2k rows) And the following do not (allowing
Perhaps we should just change all the CSV/JSON download endpoints to discard |
@axb21 The 10000 constant you found ( |
@tlrobinson Just to clarify -- on the CSV export it is intended to limit the export to 10,000 rows? Or is that constraint removed? I tried on the v0.22.0-rc1 tag and do still run into the 10K row limit on CSV exports. |
@mmerchant Yes but I think this was merged after RC1. We'll put out another RC later today. |
@tlrobinson is this already implemented in v0.22.1? |
@tlrobinson @axb21 .I know this is an old issue, but does changing ( |
How do you want to use it? We're kind of at a mostly stable settling point where most people find 10k enough, while some people on small instances still manage to crash their server once every so often. In general, we think of Metabase as a tool for interactive BI vs pushing big files around. There are a lot of little assumptions on that regard. If you can give us a clear sense of what you're trying to do with it, we can perhaps find a way to get it done. In general though, we're not trying to be a "pull massive files out of databases" play, and at a certain size (depending on your data warehouse, server specs and browser), you'll want to have a dedicated data export tool in conjunction with us. |
We're not looking to download huge datasets (1million rows+), but the queries we'd like to have exported are around 200k to 300k rows or so. I understand that Metabase is primarily a BI visualization tool, so I don't expect it to do huge data set exports but something in that range would be great for us. |
@SchwiftyOne yes. I've been happily using a modified version of metabase for over a year now and a higher limit than 10k rows. I couldn't tell you the largest result set someone's downloaded, but it's well over 10k rows and metabase has worked fine. @salsakran Yes, it makes sense that metabase is not tuned to query and return millions of rows. But an arbitrary, software-selected-at-deploy-time, hard limit on the number of returned rows seems unnecessarily constraining. My preference would be that I, as the person installing and administering a metabase instance, have the power to override this limit if I so choose, and it's on me if I create havoc. Really the request here is to turn a hardcoded limit into a configurable option and it's hard to understand why that'd be controversial unless there's no one available to implement it. |
@axb21 Thanks for the feedback! That's great that you can modify it. My only problem now is how to compile it. Unfortunately I don't have a mac, and reading up on how to compile the metabase source to jar, it seems you need a mac to do so. You wouldn't happen to have a compiled metabase jar with your modifications available for download somewhere? |
Sorry, I misspoke. The limits are @SchwiftyOne it seems like the number you mention (200-300k) should happen without too much drama. You will want to make sure your instance has enough memory, and you might need to tweak some config settings if you start seeing download failures. |
@SchwiftyOne according to @salsakran's last response you shouldn't need my version of the jar. Honestly I'm several minor versions behind and I hacked in what I needed, so I'd suggest sticking with an official metabase release if it can handle what you need. I'd be happy to share offline what I hacked if it turns out the official release doesn't work for you. Good luck! |
I just tested it out and it worked!. Didn't realize that the display limit was 10k but the CSV limit was 1M. Thanks @salsakran and @axb21 |
Only getting 100.000 rows. Version 0.25.2. Has this been changed from the 1M? |
It would be great to facilitate data extraction through Metabase.
A few potential enhancements to do so:
=> this is possible today (but not so user-friendly) by setting the limit, hit "get answer" to check data, and remove the LIMIT options before clicking on the export to CSV icon
The text was updated successfully, but these errors were encountered: