RStudio View() slow with very wide data (>100 columns) #1771
Comments
Can confirm this. It was extremely slow when I tried to view a 20-row x 4,000-column data frame. It would be great if you could add an option for this. Thanks!
Accidentally clicking on a big data frame (1M+ rows, tens of columns) can cause RStudio Server to hang forever.
I don't know if the behaviour I'm seeing is this or not. I find it slow to View data frames even when they're quite small; today's is 163 rows and 22 variables. I have 16 GB RAM and 8 cores, so it's not the hardware. I'm running Linux Mint 18.0, RStudio 1.1.442, R 3.4.4. I'm guessing it runs okay with such a small data frame on other machines, does it?
Yes, it does, @ChrisBeeley. I am able to view larger data frames without problems; I've worked with one of 1M+ rows and around 115 columns. But there is one specific data frame with around 2,800 rows and 100 columns that is very slow. I'm using Ubuntu 16.04, 16 GB RAM, a Ryzen 1700, and the latest RStudio available (not sure about the version, as I'm not on that computer).
Confirmed: very slow on Ubuntu Linux 18.04 on a powerful computer.
Having a global option to limit the nrows and ncols that View() displays would help. Or, if there were a way for it to time out, the user could realize their error and view a subset without having to terminate R. In the meantime I am using a wrapper adapted from an example I found elsewhere, in case it is useful to anyone else.
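The original snippet did not survive the thread formatting. A minimal sketch of the same idea, truncating the data before handing it to the viewer, could look like the following (`truncate_for_view` and `view_head` are hypothetical names, not the commenter's original code):

```r
# Hypothetical sketch: cap what View() has to render.
# truncate_for_view() keeps at most max_rows x max_cols of a
# data frame or matrix; view_head() passes the result on to
# the stock viewer.
truncate_for_view <- function(x, max_rows = 1000, max_cols = 100) {
  x[seq_len(min(nrow(x), max_rows)),
    seq_len(min(ncol(x), max_cols)),
    drop = FALSE]
}

view_head <- function(x, max_rows = 1000, max_cols = 100) {
  utils::View(truncate_for_view(x, max_rows, max_cols),
              title = deparse(substitute(x)))
}
```

With something like this, an accidental `view_head(huge_df)` renders at most 1000 x 100 cells instead of the whole table.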
Good tip @see24. But wouldn't it be possible for the viewer to only load the part of the data frame that is actually being viewed? So initially only the 20-30 or so rows that fit the window are loaded, and when scrolling down, only the rows in view are loaded rather than everything else. It's possible to view 100k+ rows in Microsoft Excel at once, and RStudio ought to be able to perform better than Excel in that respect, IMHO!
RStudio freezes when trying to View() 550 columns.
Indeed, numerous columns are definitely slower than numerous rows.
A frustrating discovery: it turns out the culprit is a preference set in rstudio/src/cpp/desktop/DesktopMain.cpp (lines 395 to 400 at commit e123311).
In particular, with this preference active, rendering of the table cells is effectively synchronized with the movement of the header, and laying out the DataTable header is evidently very slow. Performance appears a lot better in a browser because the table contents are rendered quickly on the fly while the header lags behind and catches up later as needed. Do we know why the DT headers are so slow to scroll? It seems a bit frustrating that this is the case.
For reference, one can test with e.g. `View(matrix(1:1E5, nrow = 1E3))`. Scrolling is completely smooth in v1.0, 'okay' in v1.1, and now rather choppy in v1.2. It's worth noting that the above comment is only true on retina displays: regular-DPI displays always have choppy scrolling, regardless of whether the preference is set.
On my Ubuntu 18.04 LTS system
I am not sure if I have the same problem with the View command. It used to work normally, but today when I reopened one of my files (just 120 columns), it hung for a few minutes. I tried a smaller file (54 columns) and it was still the same. I use Ubuntu 18.04 LTS. Another thing to note: when I tried to open these files on my laptop (a MacPro, a less powerful system), it worked totally fine.
@NgocTNguyen Can you give me a minimal repro of your issue? With dummy data I'm not able to get any noticeable slowness until I get up to around 1,000 columns. I've only tested on Windows 10 and OS X High Sierra so far, but I'll make sure to test on Ubuntu as well. I wouldn't be surprised if the Chromium engine has extra issues rendering on Linux.
@adamconroy: I tried to read and view the file raw_data.tdf (0.8 MB, see attached). If I read only the first 10-20 columns of the file it seems OK, but when the table has more than 20 columns, View() starts delaying. `View(matrix(1:1E5, nrow = 1E4))` and `View(matrix(1:1E5, nrow = 5000))` were OK, but `View(matrix(1:1E5, nrow = 2500))` starts delaying, and `View(matrix(1:1E5, nrow = 1000))` is terrible for displaying the table and scrolling it. However, RStudio works perfectly on my laptop when reading and viewing the same data. Thanks in advance for any time and help.
@NgocTNguyen I've just opened a pull request that should hopefully help with your problem. I can ping you once it's merged, and then you can check out a daily build if you like.
Has this been fixed? I found a post from 5 years ago with the same issue. Today I imported a file with 20 rows and 17k columns, and guess what? Why isn't there a simple option to define what View() should do unless the user specifically asks to load the entire file? For example, View() could show only the first 20 rows and first 20 columns; if the user wants to see more, they could ask with View(x, all) or something like that. It shouldn't have taken 5 years to make life easier. Or just let me turn View off forever, because even an accidental click on View ruins everything and the only option is a restart!
@drupadt This should be fixed in the latest preview release.
If your data.frame is very wide, you can use `utils::View(mydf)`.
I am also using this solution. I usually work with data frames that have many columns, so I just replaced the default View function with this one. The default View is useless for me anyway, or to put it better, it acts as a function to crash my R server: `View <- function(x) { utils::View(x) }`. So whenever I click on a data frame in the Environment pane, it uses the utils package. This is how I found a workaround that works for me.
I'm experiencing the same with the data.table format; a plain data.frame works just fine, though.
We've had a few reports that View() can make RStudio very slow when data is very wide. We knew about this performance issue in RStudio 1.0, so we truncated data to 100 columns in that release in order to keep performance at acceptable levels. However, many people with wide data indicated that they'd prefer to see all their columns (even if it was slow), so in 1.1 we lifted the 100-column limit. Predictably, it's now slow for wide data.
There are a few approaches we could take to addressing this:
1. Do some performance analysis to see why DataTables (the underlying library that renders the data to the DOM) is so slow as the number of columns increases. (We might not be able to do anything about a potential bottleneck without contributing a patch to DataTables.)
2. Add code to do column virtualization; just as rows are paged in dynamically today, we could show a small set of columns in the DOM and then page in more as the user scrolls left or right. (It may or may not be possible to add columns on the fly smoothly.)
3. Restrict the number of columns visible to a small number (say, 50) and give the user an affordance for manually selecting the columns to view. (This UX might not be ideal.)
4. Implement the grid using a different library that scales better for wide data. (This would take a long time, but on the other hand we already do so much custom data fetching and rendering that our dependency on DataTables is smaller than it might seem.)
None of these options is great, but wide data is common enough (and performance poor enough) that we should address this in the upcoming release if we can.
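Until something lands in the IDE itself, the restrict-and-select-columns idea can be approximated on the R side; `view_cols` below is a hypothetical helper for users, not part of RStudio:

```r
# Hypothetical user-side version of the "select columns to view"
# approach: return a window of n columns starting at `from`, so
# the grid never has to lay out thousands of headers at once.
view_cols <- function(df, from = 1, n = 50) {
  last <- min(from + n - 1, ncol(df))
  df[, seq(from, last), drop = FALSE]
}

# Usage: View(view_cols(wide_df))            # columns 1-50
#        View(view_cols(wide_df, from = 51)) # columns 51-100
```

This keeps the viewer responsive at the cost of the user paging through column windows by hand, which is roughly the UX trade-off described above.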