[data] Ray Data jobs detail table #40756

Merged 13 commits on Nov 3, 2023

@Zandew Zandew commented Oct 27, 2023

Why are these changes needed?

Creates a table under the jobs page to display dataset-level metrics.

The `_StatsActor` now stores dataset metadata such as `state`, `progress`, and `start/end_time` for each executed dataset, which the new `data_head` dashboard API queries directly. This API also queries the Prometheus server for the other metrics displayed in the table.

Screenshot 2023-10-31 at 4 39 10 PM

Cluster env for testing on workspaces: jobs-data-overview.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Zandew Zandew force-pushed the data-head-api branch 2 times, most recently from 2e3356a to bcdd57d Compare October 30, 2023 17:40
Comment on lines +24 to +27
    MAX = (
        "max",
        "max_over_time(sum({}) by (dataset)[" + f"{MAX_TIME_WINDOW}:{SAMPLE_RATE}])",
    )
Contributor Author

Is there a simpler way to get the max of a metric?

Contributor

What do the brackets ([]) do?

Contributor Author

I think something like [1h:1s] in this case means that we're querying the last hour at 1-second intervals.
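As a rough illustration of the bracket syntax discussed above, the sketch below builds the same kind of query string in Python. In PromQL, `[<range>:<resolution>]` after an instant-vector expression is a subquery: the inner expression is evaluated over the given range at the given step, and `max_over_time` then takes the maximum of those samples. The values `"1h"` and `"1s"` are assumed from the `[1h:1s]` example in this thread, and `DatasetQuery`, `build_query`, and `example_metric` are hypothetical names used only for illustration, not names confirmed by the PR.

```python
from enum import Enum

# Assumed values, taken from the "[1h:1s]" example in this thread.
MAX_TIME_WINDOW = "1h"  # subquery range: look back over the last hour
SAMPLE_RATE = "1s"      # subquery resolution: evaluate every second


class DatasetQuery(Enum):
    """Hypothetical enum of (label, PromQL template) pairs.

    The "{}" placeholder is filled in with a metric name via str.format.
    """

    MAX = (
        "max",
        "max_over_time(sum({}) by (dataset)[" + f"{MAX_TIME_WINDOW}:{SAMPLE_RATE}])",
    )


def build_query(query: DatasetQuery, metric: str) -> str:
    """Fill the metric name into the query's PromQL template."""
    return query.value[1].format(metric)


if __name__ == "__main__":
    # "example_metric" is a placeholder metric name.
    print(build_query(DatasetQuery.MAX, "example_metric"))
```

Splitting the template into a plain string plus an f-string, as the snippet under review does, keeps the `{}` placeholder out of the f-string so it survives until `format` is called.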

Comment on lines 56 to 62
for metric, queries in DATASET_METRICS.items():
    for query in queries:
        result = await self._query_prometheus(query.value[1].format(metric))
        for res in result["data"]["result"]:
            dataset, value = res["metric"]["dataset"], res["value"][1]
            if dataset in datasets:
                datasets[dataset][metric][query.value[0]] = value
Contributor Author

Could we store these values back in the _StatsActor for completed datasets, so we don't have to query Prometheus for datasets that have finished executing?

Contributor

yeah, let's do this in a future PR (feel free to leave a TODO comment or open an issue)
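For readers unfamiliar with the response shape that the loop in this thread walks, here is a minimal, self-contained sketch. It assumes the standard Prometheus instant-query JSON layout (`data.result` is a list of series, each with a `metric` label map and a `[timestamp, "value"]` pair). The function name `merge_prometheus_result`, the sample response, and the dataset and metric names are all hypothetical. Unlike the snippet under review, it uses `setdefault` so it does not rely on the per-metric dict already existing.

```python
from typing import Any, Dict


def merge_prometheus_result(
    datasets: Dict[str, Dict[str, Any]],
    metric: str,
    label: str,
    response: Dict[str, Any],
) -> None:
    """Copy per-dataset values from a Prometheus-style response into `datasets`.

    Series whose "dataset" label is not already a key of `datasets` are skipped.
    """
    for res in response["data"]["result"]:
        dataset = res["metric"].get("dataset")
        value = res["value"][1]  # Prometheus returns sample values as strings
        if dataset in datasets:
            datasets[dataset].setdefault(metric, {})[label] = value


# Hypothetical metadata, standing in for what the stats actor might return.
datasets = {"dataset_a": {"state": "RUNNING"}}

# Hand-written sample in the instant-query response layout.
response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"dataset": "dataset_a"}, "value": [1698700000, "42"]},
            {"metric": {"dataset": "not_tracked"}, "value": [1698700000, "7"]},
        ],
    },
}

merge_prometheus_result(datasets, "example_metric", "max", response)
```

After the call, only `dataset_a` gains an `example_metric` entry; the untracked series is ignored, which mirrors the `if dataset in datasets` guard in the reviewed code.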

@scottsun94
Contributor

Can we detect if people use Ray Data in a Ray job? If so, I'd like to move this table to the top, above the "Task/actor overview", and hide it when people don't use Ray Data.

@Zandew
Contributor Author

Zandew commented Oct 30, 2023

> Can we detect if people use Ray Data in a Ray job? If so, I'd like to move this table to the top, above the "Task/actor overview", and hide it when people don't use Ray Data.

Yeah we can just hide the table if there aren't any datasets.

@scottsun94
Contributor

> > Can we detect if people use Ray Data in a Ray job? If so, I'd like to move this table to the top, above the "Task/actor overview", and hide it when people don't use Ray Data.
>
> Yeah we can just hide the table if there aren't any datasets.

Great! Can you update the screenshot?

@Zandew Zandew marked this pull request as ready for review October 31, 2023 16:12
Contributor

@raulchen raulchen left a comment

data changes look good to me

@scottsun94
Contributor

LGTM. Thanks!

Signed-off-by: Andrew Xue <andewzxue@gmail.com>
Contributor

@scottjlee scottjlee left a comment

nice work!

Contributor

@c21 c21 left a comment

LGTM for data side.

@c21
Contributor

c21 commented Nov 2, 2023

Hi @alanwguo - do you wanna do a final review on dashboard side? Thanks.

@c21 c21 merged commit f08498e into ray-project:master Nov 3, 2023
80 of 90 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023