[Tracking] Model loading/caching enhancements #284
Comments
Great! I was thinking of grouping all of them too. I would implement a facade and then we can implement specific storages. I can definitely take this one! I'm actually taking it on for a project of mine.
@DavidGOrtega Thanks for offering help! You are referring to item C3 right?
I think we should make the changes in TVMjs. I wouldn't be too worried about getting things merged there :) Also cc @DiegoCao, who is looking into IndexedDB as well.
Hi David, thanks for offering to help! I'm looking into TVMjs as well, and we need to make the changes in the TVMjs module.
@CharlieFRuan I think all of them are in TVMjs: parallelise downloads and change the cache layer to something much more agnostic. As I said, I would use level because we can then make it work with different caches. It's a facade.
Let me know what works for you @DiegoCao. What do you want to pick?
I think we can go with your suggestion and use level here. Looking forward to the change!
@DavidGOrtega I can work on C2 first, and then on C4 after you have built the IndexedDB and level layers.
Perfect, so I'll do a PR for C1 and another for C3.
Sorry for chiming in late. For the caching layer, ideally we would like something that comes with minimal dependencies. Specifically, we should:
This way the default implementation won't come with extra dependencies via the IndexedDB API.
Received, will do a PR for C0 to migrate the old ArtifactCache to an interface plus an implementation of the existing Basic Approach.
Another nice feature, especially if you're doing C4, would be to allow inspecting items in the cache and "downloading" them, i.e. copying them out of the cache to disk (so that users don't have to find them wherever their browser is storing files).
I would be interested in contributing to C4, for both web-llm and web-sd. @DiegoCao unless you already have some progress on it, I can take a crack at it this weekend. Lmk if I'm late to the party.
Hi @ethrx, thanks! It requires some changes on the TVM side, and I have started working on it.
Hi! Thank you for this issue, quite helpful to see a glimpse of the future here. Are there any plans to allow for resumable downloads and/or add the ability to cancel a download?
Hi @germain-gg! I believe downloads are currently resumable, as weights are broken into shards (e.g. ~105 shards for Llama-3-8B). Each shard is cached as soon as it finishes downloading. To see the effect, try loading a model on the demo page, then refresh or close the browser and reload; you'll see the download resume rather than start over.
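To make the shard-level resumption above concrete, and to show how it could combine with the parallel download in C1, here is a minimal sketch. It is not the TVMjs implementation: `ShardStore`, `FetchShard`, and `downloadShards` are hypothetical names, the store is a plain in-memory `Map` standing in for a browser cache, and the fetcher is injected so the sketch does not depend on any network API.

```typescript
// Hypothetical names throughout: ShardStore, FetchShard, and downloadShards
// are illustrative only and are not part of web-llm or TVMjs.
type ShardStore = Map<string, Uint8Array>;
type FetchShard = (url: string) => Promise<Uint8Array>;

// Download every shard that is not already in the store, running at most
// `concurrency` fetches at a time. Returns the URLs fetched in this call.
async function downloadShards(
  urls: string[],
  store: ShardStore,
  fetchShard: FetchShard,
  concurrency = 4,
): Promise<string[]> {
  // Skipping shards that are already cached is what makes a reload resume
  // instead of starting over.
  const pending = urls.filter((url) => !store.has(url));
  const fetched: string[] = [];
  let next = 0;

  // Each worker keeps taking the next pending URL until none remain.
  async function worker(): Promise<void> {
    while (next < pending.length) {
      const url = pending[next++];
      const data = await fetchShard(url);
      store.set(url, data); // cache as soon as this shard completes
      fetched.push(url);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, pending.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return fetched;
}
```

If a fetch fails partway through, the shards that already completed stay in the store, so calling `downloadShards` again with the same URL list only requests the missing ones.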
Overview
There have been many great suggestions from the community regarding loading and caching model weights. This tracker issue compiles the suggestions and keeps track of the progress.
Action Items
C0: Make ArtifactCache (https://github.com/apache/tvm/blob/main/web/src/runtime.ts#L991) an interface ArtifactCache in a new file, artifact_cache.ts
C1: Parallelize weight shards download on tvmjs side
C2: Add a helper function to delete cache storage (part of C0)
C3: Switch to IndexedDB for caching
C4: Allow using local models
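As a rough illustration of what C0 and C2 could look like together, the sketch below defines a small storage-agnostic cache interface, one dependency-free implementation, and a delete helper written only against the interface. Everything here is an assumption made for illustration: `ArtifactCacheSketch`, `InMemoryArtifactCache`, and `deleteArtifacts` are hypothetical names, and the method set is not the actual TVMjs `ArtifactCache` API. An IndexedDB-backed class (C3) would be another implementation of the same interface.

```typescript
// Hypothetical interface: the method names are illustrative and do not
// mirror the real ArtifactCache in TVMjs.
interface ArtifactCacheSketch {
  has(key: string): Promise<boolean>;
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
  // Resolves to true if an entry was actually removed.
  delete(key: string): Promise<boolean>;
}

// Dependency-free implementation backed by a Map. Methods are async so
// that callers do not need to change when a browser-backed store is used.
class InMemoryArtifactCache implements ArtifactCacheSketch {
  private entries = new Map<string, Uint8Array>();

  async has(key: string): Promise<boolean> {
    return this.entries.has(key);
  }

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.entries.get(key);
  }

  async put(key: string, data: Uint8Array): Promise<void> {
    this.entries.set(key, data);
  }

  async delete(key: string): Promise<boolean> {
    return this.entries.delete(key);
  }
}

// Helper in the spirit of C2: remove a set of keys (for example, all
// shards of one model) and report how many entries were deleted.
async function deleteArtifacts(
  cache: ArtifactCacheSketch,
  keys: string[],
): Promise<number> {
  let removed = 0;
  for (const key of keys) {
    if (await cache.delete(key)) {
      removed++;
    }
  }
  return removed;
}
```

Because `deleteArtifacts` depends only on the interface, it would work unchanged with any storage backend that implements the same four methods.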