Dataset Page: Should lazy load thumbnails to better support slower storage media and improve page responsiveness. #4091

kcondon · 2017-08-18T12:47:13Z

The dataset page needs to be more responsive when using slower storage media such as s3 and a number of files have thumbnails (pdfs, images).

One way that is used elsewhere in search results is to load the card with a default icon so the page can load and then load the thumbnail as it becomes available.

Otherwise page load is slow as is scrolling through pages of files.

My Data could potentially benefit from this but it is currently only displaying icons rather than thumbnails.

landreev · 2017-09-13T15:15:11Z

OK, looks like I was able to get it to work with less code than I was going to write originally.
Made the PR: #4135
moving into Code Review.

kcondon · 2017-09-14T20:24:55Z

Generally works and is an improvement but found that cannot do some actions while images are loading: download file, search file. Download file gets queued until images finish loading but in my lack of feedback I kept pressing download and got several copies.

landreev · 2017-09-18T22:04:18Z

(moved this into dev. temporarily; to further experiment with the lazy loading/address last comment from @kcondon)

…he concerns that came up during QA. (#4091)

landreev · 2017-09-19T22:47:18Z

Please re-test.
This is what has changed:

As the page is being redrawn to add thumbnails (after the initial, "lazy" pass), it is now only the files table that is subject to locking up. All the other buttons, outside the files table, are clickable and functioning.
The download/explore/etc. buttons inside the files table are now disabled while the thumbnails are being added.

And, once again, I hope that once both the Dataverse instance and S3 are on the same cloud, reading that thumbnail from storage will not be as slow anymore.

landreev · 2017-09-22T16:59:06Z

I need more info... How many images were there in your test?
And what browser was that?
So far I can't reproduce the condition in Firefox at all (everything is displayed as it should; with a few pages worth of thumbnails...).
And in Chrome it appears that I'm seeing a different issue altogether, seemingly unrelated to anything done in this branch...
Very puzzling and annoying.

kcondon · 2017-09-22T17:03:45Z

Chrome (and now FF), 91 images. I can show you the actual dataset too, Test S3 Images on internal.
After demonstrating the issue to Leonid, we've discovered a key step to make it happen: once some thumbails appear, the download buttons for those images are clickable. If I click download before all thumbnails are downloaded, the remaining thumbnails don't download and the download buttons remain disabled until you refresh the page and try again.

So, it takes approx 20 seconds for the first page of thumbnails to appear and 69 seconds for all thumbnails to appear (with my trying to scroll, some scrolling issue there). If I click download between 20 and 69 seconds, the problem happens. I tried multiple times around 30-35 seconds. Did not get very precise with the timing but can do more if needed.

landreev · 2017-09-22T17:53:00Z

So here's my take on it: the lazy implementation appears to be working, if you just let it scroll lazily, without trying to download files. Clicking a download button in the middle of it appears to be breaking it - I have no idea why, it doesn't seem to make any logical sense. So we are, once again, encountering an idiosyncratic behavior of this auto-scroll table.

I propose we simply abandon trying to implement this lazy-loading optimization for this page. We decided to try it because there was hope it would be very easy; and actually buy us something. If we get something pretty-looking appear on the page faster, but the end result is that any attempt to actually do something on the page breaks it in a confusing way - that doesn't provide any extra value.

At the very least, I would put this on hold until we get a chance to test the performance with both the Dataverse and S3 storage on the same cloud - as there is some glimmer of hope that the performance would improve enough for the whole thing to become a non issue in the first place.

landreev · 2017-09-22T20:57:26Z

@djbrooke @scolapasta @kcondon
A followup update/summary:
The report this morning was not 100% accurate; in that it wasn't just the number of image files on the page/in the dataset; you also needed to click the download button on a file while the page was in the lazy load state. That event somehow prevents the update to complete and leaves the page in a broken state. I have no explanation for why it happens. It's probably a primefaces bug.

I may have been able to work around it by adding another - seemingly random - update= attribute to the component. I can't possibly call it a "fix". The scroller is now working consistently, in that you can go through all the files, and all the thumbnails are showing and all the buttons are enabled. There is still some weirdness on the page. Specifically, there's still something wrong with that first download - there appears to be a delay before it actually starts, that's seemingly proportionate to how long it took to render the table (???). I'll experiment some more.

(I'll add more info in a sec.; just trying to catch you before you leave for the weekend)

landreev · 2017-09-22T21:02:37Z

@kcondon @djbrooke @scolapasta

So, I'm still very much in favor of abandoning this "optimization", if I cannot fully resolve this by the end of the day. I would want you to make this decision though.

Once again, I'm hoping that once both the Dataverse and S3 are on the same cloud, the performance will improve greatly.

However, if it doesn't improve, I honestly don't think we can live with it. Whether we get this lazy loading to work or not, no amount of lazy is going to make this tolerable - if we really have to wait 1 to 3 seconds to read just a few kb of image data from S3. If it really ends up being this bad, we'll need to do something else. See if there's something awfully inefficient in our S3 driver; not use S3 for thumbnails; etc. (Once again, it was most likely a mistake to make thumbnails stored via the DataAccessIO in the first place).

djbrooke · 2017-09-22T21:05:02Z

Thanks @landreev @kcondon for the efforts and the updates. Let's get together on Monday.

landreev · 2017-09-22T21:11:30Z

@kcondon @djbrooke @scolapasta

Finally, just to say this one more time. Every time I work on anything thumbnails-related, I end up feeling like we must be doing something wrong; if this particular feature - not the most important feature we have, by any means - costs us that much dev. effort and grief.

There must be some way to simplify/dumb down the whole thing. Like, maybe declare that thumbnails are only supported on publicly files, this will eliminate any permission issues and allow us to move the thumbnails to some static storage server... Just a random idea - but there gotta be something out there, that's not as complex and expensive as what we are doing now, but still somewhat satisfactory...

…e properly rendered. (#4091)

… downloads). (#4091)

landreev · 2017-09-25T03:30:32Z

@kcondon @djbrooke @scolapasta

(I deleted my last comment; the comment below should be reflecting the current state of the issue more accurately:)

Please re-test.
As in, hitting that download button, at any point, should NOT be messing anything up. You should then be able to see all the thumbnails and all the buttons as you keep scrolling down. The download itself should be starting right away after you click the button (and NOT be taking forever, as it was before).

The above should be true not just for the "straight" download, but also for various other download/explore buttons, with or without the guestbook popup required. However, I've only tested it carefully with the straightforward download - non-tabular file, no popup required. (All the different buttons and links for different download and explore scenarios are separately defined and rendered, unfortunately).

landreev · 2017-09-25T03:44:10Z

@kcondon @djbrooke @scolapasta

Gustavo, this issue, of the first download attempting to process the entire
table-full of files for the dataset, is indeed existing and showing in production. Going to a prod. dataset with a large number of images, for example:

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/G3BNQZ (834 images, 835 files total)

without scrolling past the first 25 files, click any of the download buttons. It takes forever, to get the popup, then get the file.

Adding process="@this" to both the download button in the table, and the button that actually starts the download, inside the guestbook popup, speeds this first download up by 30 seconds (!).

landreev · 2017-09-25T20:34:37Z

Thinking about it some more, I'm still not convinced this was a good idea. This "lazy loading" gimmick works ok on the dataverse/search results page; but that page is essentially read-only; plus there's always only 10 results ("cards") to show. On the dataset page, there's really too many things to disable/re-enable, and too many things the user can click while the page is loading... My gut feeling is we should NOT try to merge this just yet; but wait till we have a chance to try both this branch, and the current develop branch, on a real AWS node. And then proceed based on the results we see.

And I feel like it should be perfectly acceptable, to declare something we try a dead end and give up on it, if it's proving to be more trouble than its worth. But yes, let's see how it behaves on the Amazon cloud.

kcondon · 2017-09-25T20:44:44Z

OK, tested, it functions to a point but the page still has issues that are likely as much to do with load speed as much as page design -under which circumstances we try to load stuff. For instance, when I mark a file as restricted using the files tab edit/restrict, it is very sluggish changing state/saving. Also, moving between files/terms/versions tabs when spinner is running is sluggish and sometimes switches back and forth. Same when you edit terms requirements -view does not appear right away, then switches back and forth and is slow on save. I reproduced much of the same behavior on s3 without lazy loading with the same dataset with lots of images/thumbnails. Might just have to accept it for now. Also noticed a scrolling pause where I can't see the very last set of files until all images are loaded but I seemed to be able to see the first couple of pages while thumbnails loaded.

… that was slowing down the first download, on a dataset with many datafiles. (followup to #4091)

…- the "performance fixes mini release". Combines the Dataset page queries fix (#4173) and the S3/thumbnails improvements made as part of #4091.

landreev · 2017-10-11T23:10:39Z

The plan/consensus is to close this issue, and the pull request. The branch will be kept alive, if we ever want to revisit the "lazy" loading for the dataset page. See below for more info.

The issue was opened once we noticed that dataset pages with multiple thumbnails were loading slowly with the files store on S3. Applying the same "lazy", delayed thumbnail load used on the dataverse (search) page was the solution we had in mind.

However, as we were working on this we realized that the dataset page was already seriously slow, even in production, with no S3. Plus we found a way to actually improve the performance of S3 in the thumbnails scenario (multiple small files).

The following improvements came out of this investigation:

The "process" fix: In the absence of the "process" attribute on the components representing various download/explore buttons, it was defaulting to "process=@Form". And to process the entire "form", the page would generate the html for the entire list of files in the table. (Meaning, not just for the files already live-scrolled - but for all the files int the version). This resulted in unnecessarily long delays between clicking a download button and actually getting the file or the terms popup. Was fixed with "processed=@this" added to the components in file-download-buttons.xhtml. This fix was incorporated into 4.8 2 weeks ago.
S3 improvements for thumbnails. Serious savings were achieved by skipping multiple S3 metadata lookups in the process of retrieving these auxiliary files. For short files, these lookups appear to be more expensive than actually reading the bytes. These changes were first made in this branch. But were eventually merged and incorporated into 4.8.1 in the "4.8-performance-fixes" branch.
The dataset page was found to be very seriously slow down by a method that was sending multiple unnecessary queries. This was fixed in the "4.8-performance-fixes" branch and incorporated into 4.8.1.

So all that's left in this branch ("4091-lazy-thumbnails-datasetpage") is the actual "lazy" hack. We have concluded that it was more trouble than it was worth. With the page having been made much more usable by the fixes above, it does not seem to provide enough extra value to justify the extra complexity and the extra time needed to complete the second pass.

We'll keep the branch alive, just in case we ever want to revisit the solution.

@kcondon please close the issue and the PR.

mheppler · 2017-10-12T14:59:43Z

@landreev @kcondon -- I have added this as a related issue to Dataset - File UI Improvements #3404 because it is a feature we should consider adopting when we try to redesign and improve the UI of the files table on the dataset pg.

djbrooke added the Status: Backlog label Aug 18, 2017

djbrooke added this to the 4.8 - AWS S3 Support milestone Aug 24, 2017

landreev removed the Status: Backlog label Aug 29, 2017

djbrooke added Status: Backlog and removed Status: Backlog labels Aug 29, 2017

landreev added Status: Development and removed Status: This/Next Sprint labels Sep 7, 2017

landreev self-assigned this Sep 7, 2017

landreev added Status: Code Review and removed Status: Development labels Sep 13, 2017

landreev removed their assignment Sep 13, 2017

djbrooke assigned scolapasta Sep 13, 2017

landreev mentioned this issue Sep 13, 2017

4091 - Lazy loading for datafile thumbnails on the dataset page. #4135

Closed

5 tasks

scolapasta added Status: QA and removed Status: Code Review labels Sep 13, 2017

kcondon self-assigned this Sep 13, 2017

djbrooke unassigned scolapasta Sep 14, 2017

scolapasta unassigned kcondon Sep 14, 2017

kcondon self-assigned this Sep 14, 2017

djbrooke assigned landreev Sep 15, 2017

landreev added Status: Development and removed Status: QA labels Sep 18, 2017

landreev added a commit that referenced this issue Sep 19, 2017

re-engineered the initial lazy load of the dataset page; to address t…

75bc45e

…he concerns that came up during QA. (#4091)

landreev added Status: QA and removed Status: Development labels Sep 19, 2017

djbrooke assigned landreev Sep 22, 2017

landreev added a commit that referenced this issue Sep 22, 2017

current state of the "lazy" hack. the thumbnails and buttons should b…

9a5a051

…e properly rendered. (#4091)

landreev added a commit that referenced this issue Sep 22, 2017

Fixes the first-download-takes-forever issue (but only for "straight"…

2e6a335

… downloads). (#4091)

landreev added a commit that referenced this issue Sep 25, 2017

Added process="@this" to all the download buttons. (#4091)

23221f6

djbrooke modified the milestones: 4.8 - AWS S3 Support and Large Data Upload Integration, 5.0 - DataTags and Support for Sensitive Data Sep 26, 2017

landreev added a commit that referenced this issue Sep 26, 2017

The fixes for inefficient files table processing on the dataset page,…

06d3049

… that was slowing down the first download, on a dataset with many datafiles. (followup to #4091)

landreev mentioned this issue Sep 26, 2017

Fix for file table processing on the dataset page #4159

Merged

5 tasks

djbrooke removed this from the 4.8 - AWS S3 Support and Large Data Upload Integration milestone Sep 27, 2017

landreev mentioned this issue Oct 3, 2017

Investigate performance of dataset page, on datasets with hundreds of files. #4173

Closed

landreev added a commit that referenced this issue Oct 5, 2017

Creating a performance improvements branch, as a candidate for 4.8.1 …

86113a2

…- the "performance fixes mini release". Combines the Dataset page queries fix (#4173) and the S3/thumbnails improvements made as part of #4091.

landreev removed their assignment Oct 11, 2017

mheppler mentioned this issue Oct 12, 2017

Dataset - File UI Improvements #3404

Closed

kcondon closed this as completed Oct 12, 2017

djbrooke added this to QA 👍 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dataset Page: Should lazy load thumbnails to better support slower storage media and improve page responsiveness. #4091

Dataset Page: Should lazy load thumbnails to better support slower storage media and improve page responsiveness. #4091

kcondon commented Aug 18, 2017 •

edited

landreev commented Sep 13, 2017

kcondon commented Sep 14, 2017

landreev commented Sep 18, 2017

landreev commented Sep 19, 2017

landreev commented Sep 22, 2017

kcondon commented Sep 22, 2017 •

edited

landreev commented Sep 22, 2017

landreev commented Sep 22, 2017

landreev commented Sep 22, 2017

djbrooke commented Sep 22, 2017

landreev commented Sep 22, 2017

landreev commented Sep 25, 2017

landreev commented Sep 25, 2017

landreev commented Sep 25, 2017 •

edited

kcondon commented Sep 25, 2017 •

edited

landreev commented Oct 11, 2017

mheppler commented Oct 12, 2017

Dataset Page: Should lazy load thumbnails to better support slower storage media and improve page responsiveness. #4091

Dataset Page: Should lazy load thumbnails to better support slower storage media and improve page responsiveness. #4091

Comments

kcondon commented Aug 18, 2017 • edited

landreev commented Sep 13, 2017

kcondon commented Sep 14, 2017

landreev commented Sep 18, 2017

landreev commented Sep 19, 2017

landreev commented Sep 22, 2017

kcondon commented Sep 22, 2017 • edited

landreev commented Sep 22, 2017

landreev commented Sep 22, 2017

landreev commented Sep 22, 2017

djbrooke commented Sep 22, 2017

landreev commented Sep 22, 2017

landreev commented Sep 25, 2017

landreev commented Sep 25, 2017

landreev commented Sep 25, 2017 • edited

kcondon commented Sep 25, 2017 • edited

landreev commented Oct 11, 2017

mheppler commented Oct 12, 2017

kcondon commented Aug 18, 2017 •

edited

kcondon commented Sep 22, 2017 •

edited

landreev commented Sep 25, 2017 •

edited

kcondon commented Sep 25, 2017 •

edited