[Datasets] Remove stats actor #31571
Labels: data, data-observability, stale
What happened + What you expected to happen
The stats actor is responsible for keeping the stats for read tasks only, and the driver accesses those stats by making an RPC to the stats actor. This model has an odd timing issue: if `ds.stats()` is called immediately after a read, the stats may not be up to date. Read tasks send their metadata to the stats actor via RPC, the read stats are stored in the stats actor, and the driver fetches them into its own memory when `ds.stats()` is called. If `ds.stats()` runs before the stats actor has received the metadata from the read tasks, the driver gets incomplete stats. The plan is to remove the stats actor and keep the read stats directly in driver memory.
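The race and the proposed fix can be illustrated with a minimal sketch in plain Python. This is not Ray's implementation: threads stand in for Ray tasks and the actor, and `StatsActor`, `read_with_stats_actor`, `read_task`, and `read_with_driver_stats` are hypothetical names. A `threading.Event` models metadata RPCs that are still in transit, so the "stats called too early" case is deterministic rather than timing-dependent.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StatsActor:
    """Hypothetical stand-in for the stats actor: read tasks push
    metadata to it, and the driver pulls a snapshot from it later."""

    def __init__(self):
        self._stats = {}
        self._lock = threading.Lock()

    def record(self, task_id, metadata):
        with self._lock:
            self._stats[task_id] = metadata

    def get(self):
        with self._lock:
            return dict(self._stats)


def read_with_stats_actor(num_tasks):
    """Current model (simulated): metadata travels by a separate RPC."""
    actor = StatsActor()
    rpc_delivered = threading.Event()  # models RPCs still in transit

    def send_metadata(task_id):
        rpc_delivered.wait()  # the RPC has not reached the actor yet
        actor.record(task_id, {"num_rows": 10})

    threads = [threading.Thread(target=send_metadata, args=(i,))
               for i in range(num_tasks)]
    for t in threads:
        t.start()
    # The read is "done", and the driver asks for stats right away.
    early = actor.get()
    rpc_delivered.set()  # the RPCs eventually arrive
    for t in threads:
        t.join()
    late = actor.get()
    return early, late


def read_task(task_id):
    """Hypothetical read task that returns its metadata with its block."""
    block = list(range(task_id * 10, task_id * 10 + 10))
    return block, {"num_rows": len(block)}


def read_with_driver_stats(num_tasks):
    """Proposed model (simulated): stats live in driver memory."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(read_task, range(num_tasks)))
    # Metadata arrived together with the blocks, so no RPC can lag behind.
    return {i: meta for i, (_, meta) in enumerate(results)}


if __name__ == "__main__":
    early, late = read_with_stats_actor(4)
    print(len(early), len(late))  # stats seen too early vs. after delivery
    print(len(read_with_driver_stats(4)))
```

The design point is that once metadata is returned with each task's result, the driver holds the stats as soon as the read completes, so there is no second channel whose delivery can trail the data.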
Versions / Dependencies
master
Reproduction script
On master, this script may or may not print execution stats, depending on timing:
Issue Severity
Medium: It is a significant difficulty but I can work around it.