Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INT-1162] Infinite scroll in file explorer #9715

Merged
merged 19 commits into from Feb 9, 2024
Merged

Conversation

emmajsadams
Copy link
Contributor

@emmajsadams emmajsadams commented Feb 7, 2024

This PR implements infinite scroll pagination in our custom JuypterLab FileBrowser. It does so by setting up a scroll event listener that increments and decrements an index when user scrolls to the upper 20% and lower 20% respectively of the FileBrowser Contents DOM node. When this index changes the FileBrowser is re-rendered. Each index represents an offset of 200 Contents from the start of the overall cached Contents. Since only a 700 Contents are visible at any time regardless of index position or total Contents count this solution can scale with a large number of files (tested with 500k) since the low Contents limit of the Juypterlab FileBrowser is always respected.

As part of testing this I also made a few changes to improve the performance of the each re-render of the FileBrowser.

  • Cached the initial shallowResponse after navigating to a directory and reuse that in each call of _get. Making an GET request on each re-render is unnecessarily slow and is quite noticeable when infinite scroll causes a re-render.
  • Only re-render for infinite scroll not on each paginated request for Contents. These requests can happen safely in the background now without needing to update the pagination UI.

One feature currently not implemented is the ability to jump to anywhere in the Contents list. Since this solution works off an index this should be possible in a follow up PR. Dynamically changing the scrollbar height of the Contents DOM node to match the total count of cached Contents then determining where the user clicks to set the index should accomplish this. However this is tricky and worth splitting into a separate story.

Demo

new.infinite.scroll.mov

Why no library was used for infinite scroll

Most mature infinite scroll libraries (TanStack scroll, react-window, react-virtualized) require some sort of framework like React or Svelte and modify the DOM nodes as the user scrolls. This does not work with the Juypterlab FileBrowser code since even if we mounted a ReactWidget that output lumino styled li those li would not have the same event handlers setup by the FileBrowser. On top of that the FileBrowser breaks completely if Contents DOM nodes are modified outside the expected render loop.

Copy link
Contributor

@bbonenfant bbonenfant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but the portions were a bit outside my expertise. If you'd like more substantial feed back we might be able to convince someone from console to take a look?

// How many items to request per page
const PAGINATION_NUMBER = 400;
// How many Content li are visible at a given time for the user to scroll.
const VISIBLE_CONTENT_LIS = 500;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to not do this, but maybe VISIBLE_CONTENT_LI_COUNT would be a clearer name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that name better will change!

// How many items to request per page.
const PAGINATION_NUMBER = 1000;
// How tall in pixels a Content li is.
const LI_HEIGHT_PX = 25;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to set as a constant? This seems like something we don't really have control over -- could we extract this info from the browser at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I can grab the clientHeight from the first contentsNode.childNodes[0] instead of hard coding this.

private _cache: {
key: string | null;
now: number | null;
contents: Contents.IModel[];
filteredContents: Contents.IModel[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarification, why this addition to the cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_cache.contents refers to all the files for the current directory whereas _cache.filteredContents refers to the contents currently filtered by the value the user has typed into the FilterBox above the FileBrowser.

Since _get is called each time the user changes the _index by scrolling, we want to avoid filtering the cached contents each call since the cached contents could be huge. Thus the filtering only occurs when the user changes directory or changes the filtering then is cached.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh ok, the piece I was missing was that _get is called for every scroll. thanks!

@@ -18,7 +18,9 @@ describe('mount', () => {
cy.findAllByText('/ pfs').should('have.length', 2);
cy.findAllByText('Load').first().click();
cy.findAllByText('Unload').should('have.length', 1);
cy.findAllByText('default_images').first().click();
cy.get('#jupyterlab-pachyderm-browser-pfs')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly sure why this change is needed now, but the problem with searching findByText without an id is that default_images also appears as text in the repo list. When I debugged this locally the text is the element that was clicked which obviously does nothing. Changing this to findByText in the FileBrowser ID scope fixes this issue.

This applies to all other places where I added the get('#jupyterlab-pachyderm-browser-pfs') call.

cy.findAllByText('liberty.png').should('have.length', 1);
cy.findAllByText('default_images_branch').first().click();
cy.get('#jupyterlab-pachyderm-browser-pfs').findByText('/ pfs').click();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was really strange I honestly do not know how this test passed before this change. What this test is doing is clicking on a mounted repo folder, checking the file is there, then repeating that for a second mounted repo folder. The problem with the test is that you have to navigate back to the root directory to see the second mounted repo folder, but the test does not do that. Thus I added one step that clicks on the / pfs breadcrumb to make it pass.

@emmajsadams
Copy link
Contributor Author

Ok I responded to your comments and included two new changes @bbonenfant.

  • Disabled FileBrowser restore so that the user scrolling immediately isn't interrupted.
  • Cached the initial shallowResponse after navigating to a directory and reuse that in each call of _get. Making an GET request on each re-render is unnecessarily slow and is quite noticeable when infinite scroll causes a re-render.

I commented on the changes that fix the tests to explain why they were needed. I did figure out the issue with running Cypress tests locally, but it requires us migrating to the latest cypress. I will split that out into a separate PR since it results in a bunch of changes.

As far as having console approval maybe lets wait until the next PR which allows the user to jump to anywhere in the Contents list via the scroll bar https://pachyderm.atlassian.net/browse/INT-1203?

@bbonenfant
Copy link
Contributor

Those cypress changes are weird. Hopefully moving to a more recent version will make these tests easier to work with. LGTM -- merge when you're ready

@emmajsadams emmajsadams merged commit 41ab138 into master Feb 9, 2024
12 checks passed
@emmajsadams emmajsadams deleted the eadams/INT-1162 branch February 13, 2024 03:29
bbonenfant pushed a commit that referenced this pull request Feb 22, 2024
This PR implements infinite scroll pagination in our custom JuypterLab
FileBrowser. It does so by setting up a scroll event listener that
increments and decrements an index when user scrolls to the upper 20%
and lower 20% respectively of the FileBrowser Contents DOM node. When
this index changes the FileBrowser is re-rendered. Each index represents
an offset of 200 Contents from the start of the overall cached Contents.
Since only a 700 Contents are visible at any time regardless of index
position or total Contents count this solution can scale with a large
number of files (tested with 500k) since the low Contents limit of the
Juypterlab FileBrowser is always respected.

As part of testing this I also made a few changes to improve the
performance of the each re-render of the FileBrowser.
* Cached the initial shallowResponse after navigating to a directory and
reuse that in each call of _get. Making an GET request on each re-render
is unnecessarily slow and is quite noticeable when infinite scroll
causes a re-render.
* Only re-render for infinite scroll not on each paginated request for
Contents. These requests can happen safely in the background now without
needing to update the pagination UI.

One feature currently not implemented is the ability to jump to anywhere
in the Contents list. Since this solution works off an index this should
be possible in a follow up PR. Dynamically changing the scrollbar height
of the Contents DOM node to match the total count of cached Contents
then determining where the user clicks to set the index should
accomplish this. However this is tricky and worth splitting into a
separate story.

### Demo

https://github.com/pachyderm/pachyderm/assets/401518/0ee94933-49be-461f-b5ec-cb144d9a2f06

### Why no library was used for infinite scroll
Most mature infinite scroll libraries (TanStack scroll, react-window,
react-virtualized) require some sort of framework like React or Svelte
and modify the DOM nodes as the user scrolls. This does not work with
the Juypterlab FileBrowser code since even if we mounted a ReactWidget
that output lumino styled `li` those li would not have the same event
handlers setup by the FileBrowser. On top of that the FileBrowser breaks
completely if Contents DOM nodes are modified outside the expected
render loop.
bbonenfant added a commit that referenced this pull request Feb 23, 2024
- Lumino tabs (#9692)
- Uninstantiated python client bugfix (#9705)
- Prevent bad config from being stored (#9711)
- Remove dev_server (#9713)
- UNKNOWN cluster status state (#9719)
- Prevent error popups bugfix (#9721)
- Infinite scroll in file explorer (#9715)
- Fix local cypress tests (#9728)
- Infinite scroll w/ scrollbar (#9746)
- Clear mounted datums before new datums (#9716)
- Make input spec non-resizeable (#9720)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants