-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ag-grid and implement getRowId to improve runs table performance #5725
Conversation
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
@harupy @sunishsheth2009 Can you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions on how do we improve performance here?
I thought setting the getRowId
to runId is enough. What are all the other changes needed for?
@@ -179,10 +176,26 @@ export class ExperimentRunsTableMultiColumnView2 extends React.Component { | |||
}, | |||
{ | |||
headerName: ATTRIBUTE_COLUMN_LABELS.DATE, | |||
field: 'startTime', | |||
field: 'runDateInfo', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious on this, Are we getting the same data from a new field now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of. I've added a corresponding new runDateInfo
property in the data returned from getRowData
, which is an object aggregating all of the data needed to render the date cell rather than just the start time. And DateCellRenderer
now accesses these values through its value
prop instead of the data
prop that contains data for the whole row. This is needed so that we can correctly implement the equals property to handle deciding whether the cell needs to be re-rendered.
equals: (dateInfo1, dateInfo2) => { | ||
return ( | ||
dateInfo1.referenceTime === dateInfo2.referenceTime && | ||
dateInfo1.startTime === dateInfo2.startTime && | ||
dateInfo1.experimentId === dateInfo2.experimentId && | ||
dateInfo1.runUuid === dateInfo2.runUuid && | ||
dateInfo1.runStatus === dateInfo2.runStatus && | ||
dateInfo1.isParent === dateInfo2.isParent && | ||
dateInfo1.hasExpander === dateInfo2.hasExpander && | ||
dateInfo1.expanderOpen === dateInfo2.expanderOpen && | ||
_.isEqual(dateInfo1.childrenIds, dateInfo2.childrenIds) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to check the equals here. Where is this function evoked from and how do we use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't used directly within mlflow code, but is used by ag-grid to decide whether a cell needs to be re-rendered. After the data is updated, ag-grid can decide not to re-render a cell if the row id is the same and the cell values are considered to be equal. If we didn't implement this then a lot of cells would unnecessarily be re-rendered because by default reference equality is used.
This wasn't needed previously when row ids were assigned by ag-grid as a new set of ids is assigned after data is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're using _.equals here already, what's stopping us from doing _.isEqual(dateInfo1, dateInfo2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was concerned there might be more performance overhead of using _.isEqual
but I haven't tested that to verify. Using _.isEqual(dateInfo1, dateInfo2)
would definitely simplify things so I'll switch to that if I don't see a big performance difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah using _.isEqual
for the top-level comparisons is slower but it's pretty insignificant compared to the time everything else takes. Eg. when loading 100 more rows with 1000 already loaded I get 25 ms in the value comparisons with the current code and then 100 ms when changing all the comparisons to just use _.isEqual
, but the whole loading operation takes multiple seconds so I'll make this change.
Implementing Eg. previously for the "Start Time" column, the value of the cell was set to the Similarly for the "Models" column, this was configured to use a non-existent There are similar changes to other columns so that every column is mapped to a value field that can be tested for equality to decide whether cells need to be re-rendered when data is updated. This wasn't a problem previously when using grid assigned row ids as new ids were assigned whenever data was updated and every row was completely re-rendered. The two changes that aren't directly related to implementing Moving the load more button out gave a performance improvement on its own, as I guess ag-grid then didn't have to test each row to see if it needed to use the |
Signed-off-by: Adam Reeve <adreeve@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the detailed explanation. It makes sense. :)
Also thank you for making these changes and the upgrade. Appreciate your help
@xanderwebs can you take a look at it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, code generally looks good, the only thing from my end is that it would be good to keep the function signature of Utils.renderSource
the same (even if the params are unused here).
@@ -103,8 +103,8 @@ class Utils { | |||
return dateFormat(d, format); | |||
} | |||
|
|||
static timeSinceStr(date) { | |||
const seconds = Math.max(0, Math.floor((new Date() - date) / 1000)); | |||
static timeSinceStr(date, referenceDate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this changes the function signature, do you mind adding a default here to referenceDate such that it will default to new Date()
if someone calls it the old way?
*/ | ||
static renderSource(tags, queryParams, runUuid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For edge purposes, it would be better to keep the signature of the function the same.
@@ -276,7 +276,7 @@ export class RunViewImpl extends Component { | |||
> | |||
<div style={{ display: 'flex', alignItems: 'center' }}> | |||
{Utils.renderSourceTypeIcon(tags)} | |||
{Utils.renderSource(tags, queryParams, runUuid)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, would be good to keep this for edge purposes.
equals: (dateInfo1, dateInfo2) => { | ||
return ( | ||
dateInfo1.referenceTime === dateInfo2.referenceTime && | ||
dateInfo1.startTime === dateInfo2.startTime && | ||
dateInfo1.experimentId === dateInfo2.experimentId && | ||
dateInfo1.runUuid === dateInfo2.runUuid && | ||
dateInfo1.runStatus === dateInfo2.runStatus && | ||
dateInfo1.isParent === dateInfo2.isParent && | ||
dateInfo1.hasExpander === dateInfo2.hasExpander && | ||
dateInfo1.expanderOpen === dateInfo2.expanderOpen && | ||
_.isEqual(dateInfo1.childrenIds, dateInfo2.childrenIds) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're using _.equals here already, what's stopping us from doing _.isEqual(dateInfo1, dateInfo2)
?
const { experimentId } = props.data; | ||
const { name, basename } = props.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between data
and value
here? ==> The question behind this question is really, why are we reading experimentId
in DateCellRenderer
off of value
there, but off of data
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data
contains the data for the whole row, whereas value
is the value for this cell (selected from the row data using the field
configured for the column). The value
here doesn't include the experimentId
because I'm using the experiment name values directly from the map returned by Utils.getExperimentNameMap
in getRowData
.
I could probably create new value objects that also include the experimentId
which might be tidier, but that seems unnecessary as the experiment id for a row is never going to change (if it could change it would be important to include it so that the equality comparison was correct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment in the function to explain this too.
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
…erer Signed-off-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Adam Reeve <adreeve@gmail.com>
Thanks for the feedback @xanderwebs, I think I've addressed all of your comments now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
What changes are proposed in this pull request?
Upgrades to ag-grid 27.2.0 and makes the following changes to improve the performance of the runs table when loading more rows:
getRowId
using the run uuid so that previously rendered rows aren't re-rendered (see https://www.ag-grid.com/react-data-grid/row-ids/)models
column referred to themodels
field that didn't actually exist, so models were always considered equal and weren't re-rendered when they changed. Similarly, thedate
column actually used many different fields that affected how the cell should be rendered, so just comparingstartTime
values wasn't sufficient to decide whether the cell should be re-rendered.Fixes #5653 (see that issue for some performance numbers)
How is this patch tested?
Running existing unit tests, manual testing of the UI to check that behaviour seems correct.
Does this PR change the documentation?
ci/circleci: build_doc
check. If it's successful, proceed to thenext step, otherwise fix it.
Details
on the right to open the job page of CircleCI.Artifacts
tab.docs/build/html/index.html
.Release Notes
Is this a user-facing change?
Improve the performance of the runs table when loading a large number of runs.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes(I'm not sure about this classification, feel free to change it)