Update ag-grid and implement getRowId to improve runs table performance #5725

Merged
merged 15 commits into from
May 6, 2022

@adamreeve (Contributor) commented Apr 20, 2022

What changes are proposed in this pull request?

Upgrades to ag-grid 27.2.0 and makes the following changes to improve the performance of the runs table when loading more rows:

  • Move the "Load more" button out of the grid and render it in a separate element below the grid
  • Implement getRowId using the run uuid so that previously rendered rows aren't re-rendered (see https://www.ag-grid.com/react-data-grid/row-ids/)
    • To re-render cells correctly when data changes, every column definition now specifies a field that corresponds to an actual field in the row data. E.g. previously the models column referred to a models field that didn't exist, so models were always considered equal and weren't re-rendered when they changed. Similarly, the date column's renderer used many different fields to decide how to render the cell, so comparing only startTime values wasn't sufficient to decide whether the cell should be re-rendered.
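The two getRowId-related bullets above can be illustrated with a small sketch. It assumes hypothetical row objects with runUuid, runName and models properties, not MLflow's actual row data. getRowId and the column-level field option are real ag-grid options, but cellChanged is a hypothetical helper that only approximates ag-grid's change detection: read each cell value via field, then compare with the column's equals function if one is given, otherwise with ===.

```javascript
// Sketch only: row property names (runUuid, runName, models) are assumptions.
const gridOptions = {
  // ag-grid (27.1+) passes a params object whose `data` is the row data.
  // A stable, application-assigned ID lets the grid reuse rendered rows.
  getRowId: (params) => params.data.runUuid,
  columnDefs: [
    { headerName: 'Run Name', field: 'runName' },
    // `field` must name a property that really exists in each row object.
    { headerName: 'Models', field: 'models' },
  ],
};

// Hypothetical helper approximating how a cell decides whether to refresh;
// this is not ag-grid's actual implementation.
function cellChanged(colDef, oldRow, newRow) {
  const oldValue = oldRow[colDef.field];
  const newValue = newRow[colDef.field];
  if (colDef.equals) {
    return !colDef.equals(oldValue, newValue);
  }
  return oldValue !== newValue; // default: reference equality
}
```

A column whose field is missing from the row data (say, a hypothetical { field: 'modelList' }) compares undefined with undefined, so cellChanged always reports no change, which is the Models column bug described above. Conversely, under === a freshly built array with identical contents counts as changed, which is why columns with object or array values also need an equals function.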

Fixes #5653 (see that issue for some performance numbers)

How is this patch tested?

Existing unit tests were run, and the UI was tested manually to check that its behaviour is correct.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the next step; otherwise, fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Improve the performance of the runs table when loading a large number of runs.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

(I'm not sure about this classification, feel free to change it)

Signed-off-by: Adam Reeve <adreeve@gmail.com>
@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/bug-fix Mention under Bug Fixes in Changelogs. labels Apr 20, 2022
@dbczumar (Collaborator)

@harupy @sunishsheth2009 Can you take a look?

@sunishsheth2009 (Collaborator) left a comment

I have some questions about how these changes improve performance.

I thought setting the getRowId to runId is enough. What are all the other changes needed for?

@@ -179,10 +176,26 @@ export class ExperimentRunsTableMultiColumnView2 extends React.Component {
},
{
headerName: ATTRIBUTE_COLUMN_LABELS.DATE,
- field: 'startTime',
+ field: 'runDateInfo',
Collaborator

I'm curious about this: are we getting the same data from a new field now?

Contributor Author

Sort of. I've added a corresponding runDateInfo property to the data returned from getRowData. It is an object that aggregates all of the data needed to render the date cell, rather than just the start time. DateCellRenderer now accesses these values through its value prop instead of the data prop, which contains the data for the whole row. This is needed so that we can correctly implement the equals property, which ag-grid uses to decide whether the cell needs to be re-rendered.
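A minimal sketch of the approach described in this reply, under stated assumptions: the property names match the equals function quoted in this review, but the input runInfo shape, getRowDataSketch and dateCellText are hypothetical stand-ins for MLflow's getRowData and DateCellRenderer, written as plain functions rather than React components.

```javascript
// Hypothetical: build one object holding everything the date cell renders.
function buildRunDateInfo(runInfo, rowState, referenceTime) {
  return {
    referenceTime,
    startTime: runInfo.startTime,
    experimentId: runInfo.experimentId,
    runUuid: runInfo.runUuid,
    runStatus: runInfo.status,
    isParent: rowState.isParent,
    hasExpander: rowState.hasExpander,
    expanderOpen: rowState.expanderOpen,
    childrenIds: rowState.childrenIds,
  };
}

// Hypothetical stand-in for getRowData: the aggregate is stored under the
// date column's field, so ag-grid passes it to the renderer as `value`.
function getRowDataSketch(runInfos, referenceTime) {
  return runInfos.map((runInfo) => {
    const rowState = { isParent: false, hasExpander: false, expanderOpen: false, childrenIds: [] };
    return {
      runUuid: runInfo.runUuid,
      runDateInfo: buildRunDateInfo(runInfo, rowState, referenceTime),
    };
  });
}

// The renderer reads only props.value, so everything it depends on is
// covered by the column's equality check.
function dateCellText(props) {
  const { startTime, runStatus } = props.value;
  return `${new Date(startTime).toISOString()} (${runStatus})`;
}
```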

Comment on lines 186 to 197
equals: (dateInfo1, dateInfo2) => {
return (
dateInfo1.referenceTime === dateInfo2.referenceTime &&
dateInfo1.startTime === dateInfo2.startTime &&
dateInfo1.experimentId === dateInfo2.experimentId &&
dateInfo1.runUuid === dateInfo2.runUuid &&
dateInfo1.runStatus === dateInfo2.runStatus &&
dateInfo1.isParent === dateInfo2.isParent &&
dateInfo1.hasExpander === dateInfo2.hasExpander &&
dateInfo1.expanderOpen === dateInfo2.expanderOpen &&
_.isEqual(dateInfo1.childrenIds, dateInfo2.childrenIds)
);
Collaborator

Why do we need to check equality here? Where is this function invoked from, and how is it used?

Contributor Author

It isn't called directly by MLflow code; ag-grid uses it to decide whether a cell needs to be re-rendered. After the data is updated, ag-grid can skip re-rendering a cell if its row ID is unchanged and the old and new cell values are considered equal. Without this, many cells would be re-rendered unnecessarily, because ag-grid uses reference equality by default.

This wasn't needed previously, when row IDs were assigned by ag-grid, because a new set of IDs was assigned every time the data was updated.
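To illustrate why the default comparison isn't enough: getRowData builds new value objects on every update, so two date values with identical contents are still different references. The sketch below restates the field-by-field comparison from the quoted equals function as a standalone function. sameIds is a hypothetical stand-in for lodash's _.isEqual, assuming childrenIds is an array of string IDs or undefined.

```javascript
// Hypothetical stand-in for _.isEqual on arrays of primitive IDs.
function sameIds(a, b) {
  if (a === b) return true;
  if (!Array.isArray(a) || !Array.isArray(b)) return false;
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// Same checks as the equals() function quoted in this review.
function dateInfoEquals(a, b) {
  return (
    a.referenceTime === b.referenceTime &&
    a.startTime === b.startTime &&
    a.experimentId === b.experimentId &&
    a.runUuid === b.runUuid &&
    a.runStatus === b.runStatus &&
    a.isParent === b.isParent &&
    a.hasExpander === b.hasExpander &&
    a.expanderOpen === b.expanderOpen &&
    sameIds(a.childrenIds, b.childrenIds)
  );
}

// Two updates producing structurally identical values for the same run.
const makeValue = (expanderOpen) => ({
  referenceTime: 1000, startTime: 0, experimentId: '0', runUuid: 'r1',
  runStatus: 'RUNNING', isParent: true, hasExpander: true, expanderOpen,
  childrenIds: ['c1', 'c2'],
});
const before = makeValue(false);
const after = makeValue(false);
console.log(before === after); // false: reference equality would re-render the cell
console.log(dateInfoEquals(before, after)); // true: the custom equals lets ag-grid skip it
```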

Collaborator

Since we're already using _.isEqual here, what's stopping us from doing _.isEqual(dateInfo1, dateInfo2)?

Contributor Author

I was concerned that _.isEqual might add more performance overhead, but I haven't tested that. Using _.isEqual(dateInfo1, dateInfo2) would definitely simplify things, so I'll switch to it if I don't see a big performance difference.

Contributor Author

Using _.isEqual for the top-level comparisons is slower, but the difference is insignificant compared to the time everything else takes. For example, when loading 100 more rows with 1000 already loaded, the value comparisons take 25 ms with the current code and 100 ms when all of the comparisons use _.isEqual, while the whole loading operation takes multiple seconds. So I'll make this change.
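A sketch of the simplification agreed here, with assumptions flagged. With lodash available, the column definition reduces to equals: (a, b) => _.isEqual(a, b). To keep the example self-contained, deepEqual below is a hypothetical, minimal stand-in that handles only JSON-like values (primitives, arrays and plain objects), whereas _.isEqual also covers Dates, Maps, Sets and more. The timing helper is likewise illustrative; its numbers depend on the machine and data.

```javascript
// Hypothetical minimal deep equality for JSON-like values; not a full _.isEqual.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key]),
  );
}

// One generic comparison replaces the field-by-field list.
const dateColumn = { field: 'runDateInfo', equals: (a, b) => deepEqual(a, b) };

// Rough timing of many comparisons, in milliseconds (illustrative only).
function timeComparisons(equals, pairs) {
  const start = Date.now();
  for (const [a, b] of pairs) {
    equals(a, b);
  }
  return Date.now() - start;
}
```

The trade-off matches the measurement above: a generic deep comparison does more work per call than a hand-written list of === checks, but it cannot silently miss a newly added field.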

@adamreeve (Contributor Author)

I thought setting the getRowId to runId is enough. What are all the other changes needed for?

Implementing getRowId is the main change that improves performance. Most of the other changes are needed so that the grid works correctly with application-assigned row IDs and re-renders cells when needed.

For example, the value of a cell in the "Start Time" column used to be the startTime field, but DateCellRenderer actually used a number of other properties from the row data to render the cell. So when the data was updated, "Start Time" cells were never re-rendered even if properties such as expanderOpen had changed, because ag-grid only compared startTime values to decide whether to re-render the cell. I also had to add a referenceDate field so that cells are re-rendered as time passes, because the start time is rendered as a relative time such as "x seconds ago".
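The "Start Time" problem can be reproduced in a few lines. This is a hypothetical sketch: oldStartTimeValue models the previous column value (just startTime), newStartTimeValue models an aggregated value with a small subset of the real fields plus a reference time (named referenceTime, as in the equals function quoted earlier), and shallowEqual stands in for the column's equality check. With the old value, toggling the expander leaves the value unchanged, so the cell is never refreshed; with the aggregated value, both the toggle and a later reference time produce unequal values.

```javascript
// Previously: the cell value was only the start time.
function oldStartTimeValue(row) {
  return row.startTime;
}

// Now (simplified): the value includes what the renderer reads, plus the
// reference time used to render relative text such as "x seconds ago".
function newStartTimeValue(row, referenceTime) {
  return { startTime: row.startTime, expanderOpen: row.expanderOpen, referenceTime };
}

// Shallow comparison of flat objects, standing in for the column's equals.
function shallowEqual(a, b) {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => a[k] === b[k]);
}

// The same run before and after the user expands it.
const collapsed = { startTime: 0, expanderOpen: false };
const expanded = { startTime: 0, expanderOpen: true };
```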

Similarly, the "Models" column was configured to use a non-existent models field, which always evaluated to undefined, so values in this column were always considered equal and never re-rendered. I've added a new models field to the row data to fix the rendering behaviour of this column.

Other columns have similar changes, so that every column is mapped to a value field that ag-grid can test for equality to decide whether its cells need to be re-rendered when the data is updated. This wasn't a problem previously with grid-assigned row IDs, because new IDs were assigned whenever the data was updated and every row was completely re-rendered.

Two changes aren't directly related to implementing getRowId: a change to the runInfosByUuid reducer, which is a minor performance improvement I added after profiling showed it taking a fairly long time, and moving the LoadMore button outside of the grid.

Moving the load more button out of the grid improved performance on its own, presumably because ag-grid no longer had to check each row to see whether it needed the FullWidthCellRenderer, which simplifies its rendering logic. I'm not sure whether this would still give as big an improvement on top of the getRowId change. When I tried implementing getRowId with ag-grid 27.1.0 without moving the load more button, I got errors from within ag-grid. These may have been fixed in 27.2.0, but I'd argue it's still better to keep the load more button outside of the grid, as rendering it within a row adds complexity without any obvious benefit.

@sunishsheth2009 (Collaborator) left a comment

Thanks for the detailed explanation. It makes sense. :)
Also, thank you for making these changes and the upgrade. I appreciate your help.

@xanderwebs, can you take a look at it as well?

@xanderwebs (Collaborator) left a comment

Thanks for the contribution. The code generally looks good; the only thing from my end is that it would be good to keep the function signature of Utils.renderSource the same (even if the parameters are unused here).

@@ -103,8 +103,8 @@ class Utils {
return dateFormat(d, format);
}

- static timeSinceStr(date) {
- const seconds = Math.max(0, Math.floor((new Date() - date) / 1000));
+ static timeSinceStr(date, referenceDate) {
Collaborator

Since this changes the function signature, would you mind giving referenceDate a default value, so that it falls back to new Date() when someone calls it the old way?
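A sketch of the suggested default parameter. Only the signature and the seconds calculation come from the diff above; the formatting branches are a hypothetical simplification, not Utils.timeSinceStr's actual output format, and the class wrapper is omitted.

```javascript
// referenceDate defaults to the current time, so one-argument calls keep working.
function timeSinceStr(date, referenceDate = new Date()) {
  // Subtracting Date objects yields milliseconds; clamp future dates to zero.
  const seconds = Math.max(0, Math.floor((referenceDate - date) / 1000));
  if (seconds < 60) return `${seconds} seconds ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes} minutes ago`;
  return `${Math.floor(minutes / 60)} hours ago`;
}
```

With the default in place, existing one-argument callers still measure from new Date(), while a caller such as the runs table can pass one shared reference date so that every row in a render is measured from the same instant.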

*/
static renderSource(tags, queryParams, runUuid) {
Collaborator

For edge purposes, it would be better to keep the signature of the function the same.

@@ -276,7 +276,7 @@ export class RunViewImpl extends Component {
>
<div style={{ display: 'flex', alignItems: 'center' }}>
{Utils.renderSourceTypeIcon(tags)}
{Utils.renderSource(tags, queryParams, runUuid)}
Collaborator

Again, it would be good to keep this for edge purposes.

Comment on lines +706 to +707
const { experimentId } = props.data;
const { name, basename } = props.value;
Collaborator

What's the difference between data and value here? The question behind this question is really: why do we read experimentId off value in DateCellRenderer, but off data here?

Contributor Author

data contains the data for the whole row, whereas value is the value for this cell (selected from the row data using the field configured for the column). The value here doesn't include experimentId because I'm using the experiment name values directly from the map returned by Utils.getExperimentNameMap in getRowData.

I could create new value objects that also include experimentId, which might be tidier, but that seems unnecessary because the experiment ID for a row never changes. (If it could change, it would be important to include it so that the equality comparison was correct.)
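The distinction can be sketched as follows. This is hypothetical: experimentNameCell is a plain-function stand-in for the experiment name cell renderer, the experimentName row property is an assumed field name, and the href format is illustrative. What comes from the review is that ag-grid passes the whole row as data and this cell's value (selected via the column's field) as value, plus the two destructuring lines quoted above.

```javascript
// Hypothetical renderer logic, written as a plain function instead of a React component.
function experimentNameCell(props) {
  // Safe to read from the row: a run's experimentId never changes, so the
  // column's equality check (which only sees `value`) cannot miss a change here.
  const { experimentId } = props.data;
  // Everything that can change for this cell lives in `value`.
  const { name, basename } = props.value;
  return { href: `#/experiments/${experimentId}`, title: name, text: basename };
}
```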

Contributor Author

I've added a comment in the function to explain this too.

@adamreeve (Contributor Author)

Thanks for the feedback, @xanderwebs. I think I've addressed all of your comments now.

@xanderwebs (Collaborator) left a comment

LGTM!

@dbczumar dbczumar merged commit 0a88eab into mlflow:master May 6, 2022
@adamreeve adamreeve deleted the grid-update branch May 8, 2022 22:03
Successfully merging this pull request may close these issues.

[BUG] Loading more runs in the experiment UI becomes very slow with a large number of rows
4 participants