
RStudio View() slow with very wide data (>100 columns) #1771

Closed
jmcphers opened this issue Nov 21, 2017 · 21 comments · Fixed by #3878

jmcphers (Member) commented Nov 21, 2017

We've had a few reports that View() can make RStudio very slow when data is very wide. We knew about this performance issue in RStudio 1.0, and so we truncated data to 100 columns in that release in order to keep performance at acceptable levels.

However, many people with wide data indicated that they'd prefer to see all their columns (even if it was slow), so in 1.1 we lifted the 100 column limit. Predictably, it's now slow for wide data.

There are a few approaches we could take to addressing this:

  1. Do some performance analysis to see why DataTables (the underlying library that renders the data to the DOM) is so slow as the number of columns increases. (We might not be able to do anything about a potential bottleneck without contributing a patch to DataTables.)

  2. Add code to do column virtualization; just as today rows are paged in dynamically, we could show a small set of columns in the DOM and then page in more as the user scrolls left or right. (It may or may not be possible to add columns on the fly smoothly).

  3. Restrict the number of columns visible to a small number (say, 50), and give the user an affordance for manually selecting the columns to view. (This UX might not be ideal.)

  4. Implement the grid using a different library that scales better for wide data. (This would take a long time, but on the other hand we already do so much custom data fetching and rendering that our dependency on DT is smaller than it might seem.)

None of these options is ideal, but wide data is common enough (and performance poor enough) that we should address this in the upcoming release if we can.
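For a rough sense of the shape of data involved, a wide-but-short data frame like the one below is enough to make the viewer sluggish (the sizes here are illustrative, not taken from any specific report):

```r
# Illustrative repro: a short but very wide data frame (sizes are arbitrary).
wide <- as.data.frame(matrix(runif(50 * 1000), nrow = 50))  # 50 rows x 1,000 cols

# In RStudio, rendering this grid is where the slowness shows up:
# View(wide)
dim(wide)
```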

tungmilan commented Jan 31, 2018

Can confirm this. It was extremely slow when I tried to view a 20-row by 4,000-column data frame.

It would be great to have an option in Global Settings to set the maximum number of columns for viewing, together with a warning that mentions this bug.

Thanks!

renkun-ken commented Apr 16, 2018

Accidentally clicking on a big data frame (1M+ rows, tens of columns) can cause RStudio Server to hang forever.

ChrisBeeley commented May 1, 2018

I don't know if the behaviour I'm seeing is this or not. I find it slow to View data frames even when they're quite small; today's is 163 rows and 22 variables. I have 16 GB RAM and 8 cores, so it's not the hardware.

I'm running Linux Mint 18.0, RStudio 1.1.442, R 3.4.4. I'm guessing it runs okay with such a small data frame on other machines, does it?

vgastaldi commented May 11, 2018

Yes, it does, ChrisBeeley.

I am able to view larger data frames without problems; I've worked with one of 1M+ rows and around 115 columns. But there is one specific data frame of around 2,800 rows and 100 columns that is very slow.

I'm using Ubuntu 16.04, 16 GB RAM, and a Ryzen 1700, with the latest RStudio available (not sure about the version as I'm not at that computer).

KasperSkytte commented Jun 27, 2018

Confirmed very slow on Ubuntu Linux 18.04, on a powerful computer.

see24 commented Jul 12, 2018

Having a Global Option to limit the number of rows and columns that View() will attempt to load would be great. I have definitely had the problem of RStudio Server hanging forever after trying to view data frames with more than 300,000 rows.

Better still would be a way for it to time out, so the user can realize their error and view a subset without having to terminate R. This would be even easier if View() had an argument for the number of rows (or columns), since I often find myself doing head(2000) %>% View(), but then the object's name is lost from the tab title.

In the meantime I am using this (adapted from here):

RStudioView <- View
View <- function(x) {
  name <- deparse(substitute(x))
  if (is.data.frame(x)) {
    # head() avoids the NA rows that x[1:1000, ] creates when nrow(x) < 1000
    RStudioView(head(x, 1000), name)
  } else {
    RStudioView(x)
  }
}

In case that is useful to anyone else.

KasperSkytte commented Jul 12, 2018

Good tip @see24. But wouldn't it be possible for the viewer to load only the part of the data frame that's actually in view? Initially only the 20-30 or so rows that fit the window would be loaded, and as the user scrolls, only the rows coming into view would be fetched, not everything else. It's possible to view 100k+ rows in Microsoft Excel at once, and RStudio ought to perform better than Excel in that respect, IMHO!

moouad commented Jul 18, 2018

RStudio freezes when trying to View() 550 columns.
The issue persists even with a small number of rows (100).

KasperSkytte commented Jul 18, 2018

Indeed, many columns are definitely slower than many rows. For example, View(matrix(1:1e4, ncol = 2)) displays instantly with smooth scrolling, while View(matrix(1:1e4, nrow = 2)) takes around 14 seconds to display, and scrolling stutters.

System details

Software:
Ubuntu Linux 18.04 LTS
R version: 3.5.0
RStudio version: built from GitHub source as of 16-Jul-2018

Hardware:
Intel i7-7820HQ vPro, 8 cores
32 GB RAM
SSD storage

kevinushey (Contributor) commented Aug 9, 2018

A frustrating discovery: it turns out the --disable-prefer-compositing-to-lcd-text preference is partially to blame for the Data Viewer slowness seen in RStudio v1.2 (at least on macOS):

// don't prefer compositing to LCD text rendering. when enabled, this causes the compositor to
// be used too aggressively on Retina displays on macOS, with the side effect that the
// scrollbar doesn't auto-hide because a compositor layer is present.
// https://github.com/rstudio/rstudio/issues/1953
static char disableCompositorPref[] = "--disable-prefer-compositing-to-lcd-text";
arguments.push_back(disableCompositorPref);

In particular, with this preference active, rendering of the table cells is effectively synchronized with the movement of the header, and laying out the DataTable header is evidently very slow. Performance appears a lot better in a browser because the table contents are rendered quickly on-the-fly while the header lags behind but later catches up as needed.

Do we know why the DT headers are so slow to scroll? It seems a bit frustrating that this is the case.

kevinushey (Contributor) commented Sep 20, 2018

For reference, one can test with e.g.

View(matrix(1:1E5, nrow = 1E3))

Scrolling is completely smooth in v1.0, 'okay' in v1.1, and now rather choppy in v1.2.

It's worth noting that the above comment is only true on Retina displays: regular-DPI displays always have choppy scrolling, regardless of whether the --disable-prefer-compositing-to-lcd-text preference is set. (Perhaps this is because macOS performs a different kind of anti-aliasing on Retina displays vs. regular-DPI displays?)

KasperSkytte commented Sep 21, 2018

On my Ubuntu 18.04 LTS system, View(matrix(1:1E5, nrow = 1E3)) is also choppy; View(matrix(1:1E5, nrow = 1E4)) is much smoother, and View(matrix(1:1E5, nrow = 1E2)) is quite horrible. Running the same in the browser via RStudio Server is generally a bit smoother, but takes more time to load. Same system details as in my last comment.

adamconroy self-assigned this Sep 21, 2018
NgocTNguyen commented Sep 21, 2018

I am not sure if I have the same problem with the View() command. It used to work normally, but today when I reopened one of my files (just 120 columns) it hung for a few minutes. I tried a smaller file (54 columns) and it was still the same. I use Ubuntu 18.04 LTS. Another thing to note: when I tried to open these files on my laptop (a MacBook Pro, a less powerful system), it worked totally fine.

adamconroy (Contributor) commented Sep 21, 2018

@NgocTNguyen Can you give me a minimal repro of your issue? With dummy data I'm not able to get any noticeable slowness until I get up to around 1,000 columns. I've only tested on Windows 10 and OS X High Sierra so far, but I'll make sure to test on Ubuntu as well. I wouldn't be surprised if the Chromium engine has extra rendering issues on Linux.

NgocTNguyen commented Sep 24, 2018

@adamconroy: I tried to read and view the file raw_data.tdf (0.8 MB, see attached). I typed:

x = read.table('raw_count.txt', col.names = TRUE)
View(x)

Then RStudio just froze for about 10 seconds.

If I read the first 10-20 columns of the file it seems OK, but when the table has more than 20 columns, View() starts delaying.

View(matrix(1:1E5, nrow = 1E4)) and View(matrix(1:1E5, nrow = 5000)) were OK, but View(matrix(1:1E5, nrow = 2500)) starts delaying, and View(matrix(1:1E5, nrow = 1000)) is terrible for displaying the table and scrolling it.

I use R version 3.5.1, RStudio 1.1.456:
Ubuntu 18.06 LTS
64 GB RAM
SSD storage
8-core Intel Xeon

However, RStudio works perfectly on my laptop when reading and viewing these data:
Mac OS X El Capitan version 10.11.6
Intel Core i5
8 GB RAM
SSD storage

Thanks in advance for any time and help.

adamconroy (Contributor) commented Nov 13, 2018

@NgocTNguyen I've just opened a pull request that hopefully helps you with your problem. I can ping you once it's merged in and you can check out a daily build if you like.

trivedi-group commented Jan 17, 2019

Has this been fixed? I found a post from 5 years ago reporting the same issue. Today I imported a file with 20 rows and 17k columns and, guess what, it happened again.

Why isn't there a simple option to define what View() should do unless the user specifically asks to load the entire file? For example, View() would show only the first 20 rows and first 20 columns; if the user wants to see more, they could ask with View(x, all) or something like that. It shouldn't have taken 5 years to make life easier. Or just let me turn View off forever, because even an accidental click on View ruins everything and the only option is a restart!
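In the absence of such an option, a wrapper along these lines approximates the behaviour being asked for; limit_dims and view_limited are hypothetical names, not part of any RStudio API, and the 20 x 20 defaults are only the example sizes mentioned above:

```r
# Hypothetical helper: keep only the first max_rows x max_cols cells.
limit_dims <- function(x, max_rows = 20, max_cols = 20) {
  x[seq_len(min(nrow(x), max_rows)),
    seq_len(min(ncol(x), max_cols)),
    drop = FALSE]
}

# Hypothetical wrapper: shadow View() with a truncating version, passing the
# original expression along as the viewer tab title.
view_limited <- function(x, ...) {
  title <- deparse(substitute(x))
  if (is.data.frame(x) || is.matrix(x)) x <- limit_dims(x, ...)
  View(x, title)
}
```

Calling view_limited(huge_df) would then render at most a 20 x 20 slice while keeping huge_df's name on the tab.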

kfeinauer (Contributor) commented Jan 17, 2019

@drupadt This should be fixed in the latest preview release.

vlarmet commented Mar 26, 2019

If your data.frame is very wide, you can use utils::View(mydf)

mtdinc commented Jul 31, 2019

If your data.frame is very wide, you can use utils::View(mydf)

I am also using this solution. I usually work with data frames that have many columns, so I just replaced the default View function with this. The default View is useless for me anyway; to put it a better way, it acts as a function to crash my R server.

View <- function(x) { utils::View(x) }

So whenever I click on a data frame in the Environment pane, it uses the utils package. This is the workaround I found.

matthewgson commented May 8, 2020

I'm experiencing the same with the data.table format; a plain data.frame works just fine, though.
