Optimize Horovod Timeline and add Cycle Markers #782
This change introduces a separate thread that writes the timeline to disk and receives timeline records from the main thread through a lock-free queue.
With this change, tensor accounting during negotiation takes 1.0-1.2 microseconds per tensor per rank, down from more than 3 microseconds before.