Move to Async Read/Writes #121

blester125 · 2022-12-25T20:24:07Z

This PR updates our clean and smudge filters to use async for I/O-bound operations (mostly tensorstore serialization and running git-lfs-(clean|smudge)). Subprocesses are run through an async wrapper that mimics subprocess.run. The PR also includes tools for running an async function on every (k, v) pair in a dictionary so that the calls happen concurrently.
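For readers who want a concrete picture, here is a minimal sketch of what an async wrapper shaped like subprocess.run and a concurrent dictionary map could look like. The names `async_run` and `map_dict_async` are hypothetical and chosen for illustration; they are not necessarily the helpers added in this PR.

```python
import asyncio
import subprocess
from typing import Any, Awaitable, Callable, Dict, Optional, Sequence


async def async_run(
    command: Sequence[str], input: Optional[bytes] = None
) -> subprocess.CompletedProcess:
    """Hypothetical async analogue of subprocess.run (illustrative only)."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() feeds stdin, closes it, and waits for the process to exit
    # without blocking the event loop.
    stdout, stderr = await proc.communicate(input=input)
    return subprocess.CompletedProcess(list(command), proc.returncode, stdout, stderr)


async def map_dict_async(
    fn: Callable[[Any, Any], Awaitable[Any]], mapping: Dict[Any, Any]
) -> Dict[Any, Any]:
    """Run fn(k, v) for every pair concurrently and rebuild a dict of results."""
    keys = list(mapping)
    # gather() schedules all coroutines at once and preserves argument order.
    values = await asyncio.gather(*(fn(k, mapping[k]) for k in keys))
    return dict(zip(keys, values))
```

Returning a `subprocess.CompletedProcess` keeps call sites close to what they would look like with the synchronous `subprocess.run`, which is the point of mimicking it.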

The main async entry point is in the clean and smudge filters, where the cleaning (or smudging) function is applied to each leaf in the parameter tree concurrently.

This also updates the Update API a little. The change makes async possible (i.e. some methods are converted from def to async def) and makes it easier to write new update plugins. Additionally, the serializer is now passed to the update class and used internally instead of externally. Also, things like communication with git-lfs are now done explicitly in the write method instead of being part of LfsMetadata creation.
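To make the shape of that change easier to picture, here is a hedged sketch of an update class that receives its serializer at construction time and does its storage I/O inside an `async def write`. Every name here (`JsonSerializer`, `InMemoryBlobStore`, `Update`, `DenseUpdate`, the `oid`/`size` metadata keys) is a stand-in invented for illustration; the in-memory store merely plays the role of git-lfs and none of this is the actual git_theta API.

```python
import asyncio
import hashlib
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class JsonSerializer:
    """Stand-in serializer (hypothetical; the PR uses tensorstore-based I/O)."""

    async def serialize(self, value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    async def deserialize(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class InMemoryBlobStore:
    """Stand-in for the git-lfs round trip (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    async def put(self, data: bytes) -> Dict[str, Any]:
        oid = hashlib.sha256(data).hexdigest()
        self._blobs[oid] = data
        return {"oid": oid, "size": len(data)}

    async def get(self, metadata: Dict[str, Any]) -> bytes:
        return self._blobs[metadata["oid"]]


class Update(ABC):
    """The serializer is injected and used internally by the update."""

    def __init__(self, serializer: JsonSerializer, store: InMemoryBlobStore) -> None:
        self.serializer = serializer
        self.store = store

    @abstractmethod
    async def write(self, param: Any) -> Dict[str, Any]:
        """Serialize and store a parameter, returning its metadata."""

    @abstractmethod
    async def read(self, metadata: Dict[str, Any]) -> Any:
        """Load a parameter back from its metadata."""


class DenseUpdate(Update):
    async def write(self, param: Any) -> Dict[str, Any]:
        data = await self.serializer.serialize(param)
        # Storage communication happens here, explicitly, inside write().
        return await self.store.put(data)

    async def read(self, metadata: Dict[str, Any]) -> Any:
        data = await self.store.get(metadata)
        return await self.serializer.deserialize(data)
```

With this layout a new plugin only has to subclass `Update` and implement `write` and `read`; callers never touch the serializer directly.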

There are still some changes to make with respect to documentation, typing, and the exact Updater API, but I wanted to get eyes on it sooner rather than later.

git_theta/file_io.py

git_theta/git_utils.py

git_theta/params.py

git_theta/updates/base.py

nkandpa2 · 2022-12-28T06:45:57Z

No issue for me with the direction of the Update API and async clean/smudge/push.

blester125 · 2023-01-03T16:50:15Z

closes #61


blester125 requested a review from nkandpa2 December 25, 2022 20:24

nkandpa2 reviewed Dec 28, 2022


This was referenced Dec 29, 2022

Fix broken tests and add test cases for new classes #119

Merged

Refactor Update plugins #112

Closed

nkandpa2 approved these changes Jan 5, 2023


blester125 force-pushed the feat/async branch 3 times, most recently from e7c69ba to 7aff100 January 6, 2023 13:32

blester125 force-pushed the feat/async branch from 7aff100 to 0422c68 January 6, 2023 13:40

blester125 merged commit 8222c05 into r-three:main Jan 9, 2023

blester125 deleted the feat/async branch January 9, 2023 19:39

This was referenced Jan 10, 2023

use tensorstore async for writing out parameter group files #61

Closed

Update Update plugin classes #107

Closed
