Labels: good first issue, help wanted
Description
To profile and optimize the current inference server architecture and to tune its hyperparameters for various applications, it would be very useful for AlphaZero.jl to have a mode that outputs a debugging timeline, making it easy to visualize when each worker submits an inference request, when it receives an answer, and when inference actually runs on the GPU (along with the batch size that was used).
More concretely, I imagine adding a `profile_inference_timeline` option to the `simulate` function. When this option is used:
- Every time a worker sends an inference request or receives an answer, the current worker id and the wall-clock time are recorded (along with the id of the thread hosting the current worker?)
- We also record the wall-clock time every time the inference server sends a batch to the GPU and every time it gets an answer back (logging the batch size that was used would also be useful).
- This data could be dumped into a big JSON file and then visualized using another tool.
- One possible visualization would be a profiling timeline similar to the ones rendered by "chrome://tracing", with one track per worker and a separate track for the inference server. Maybe we could even generate JSON output that is directly compatible with "chrome://tracing" (this is what the PyTorch profiler does, for example); see the sketch after this list.
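To make this concrete, here is a minimal sketch of what the recording and the dump could look like. Everything here is hypothetical: `TimelineEvent`, `record_event!`, `dump_chrome_trace`, and the use of JSON3 are illustration choices, not existing API. The output follows the Chrome Trace Event Format ("complete" events with `"ph": "X"`, timestamps in microseconds), which chrome://tracing loads directly:

```julia
using JSON3  # assuming JSON3 (or any JSON writer) for the dump

# Hypothetical event record: one entry per request/answer pair on a worker,
# or per batch submitted to the GPU by the inference server.
struct TimelineEvent
    name::String      # e.g. "inference_request" or "gpu_batch"
    track::Int        # worker id; use 0 for the inference server track
    thread::Int       # id of the thread hosting the worker
    start_ns::UInt64  # wall-clock time from time_ns() when the event starts
    stop_ns::UInt64   # wall-clock time when the answer / batch result arrives
    batch_size::Int   # batch size for server events, 0 for worker events
end

const TIMELINE = TimelineEvent[]
const TIMELINE_LOCK = ReentrantLock()

# Called by workers (around each request/answer) and by the inference server
# (around each GPU batch). The lock makes recording thread-safe.
function record_event!(name, track, start_ns, stop_ns; batch_size = 0)
    ev = TimelineEvent(name, track, Threads.threadid(), start_ns, stop_ns, batch_size)
    lock(() -> push!(TIMELINE, ev), TIMELINE_LOCK)
end

# Dump all events in the Chrome Trace Event Format so that chrome://tracing
# displays one track ("tid") per worker plus one for the inference server.
function dump_chrome_trace(file = "inference_timeline.json")
    isempty(TIMELINE) && return
    t0 = minimum(ev.start_ns for ev in TIMELINE)
    events = map(TIMELINE) do ev
        (name = ev.name, ph = "X", pid = 1, tid = ev.track,
         ts = (ev.start_ns - t0) / 1000,          # ns -> microseconds
         dur = (ev.stop_ns - ev.start_ns) / 1000,
         args = (thread = ev.thread, batch_size = ev.batch_size))
    end
    open(io -> JSON3.write(io, events), file, "w")
end
```

Recording only raw `time_ns()` values on the hot path and deferring all formatting to the dump keeps the overhead per event to a struct push under a lock, which should not noticeably perturb the timings being measured.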
In particular, such a tool would be invaluable for investigating issues such as this one.
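For illustration, end-to-end usage could look like this (again assuming the hypothetical names above, and assuming `simulate` keeps its current arguments alongside the new option):

```julia
# Hypothetical usage: run simulations with the proposed option enabled,
# then write the timeline and load it via "Load" in chrome://tracing.
simulate(simulator, gspec, params; profile_inference_timeline = true)
dump_chrome_trace("inference_timeline.json")
```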