Oh hell yes #2
I'm finally attempting to replace llama-cpp-wasm with wllama, and I was wondering if you have suggestions on how to replace some callbacks:
Unrelated: have you by chance tried running Phi 3 with Wllama? I know the 128K context is not officially supported yet, but there does seem to be some success with getting a 64K context. I'm personally really looking forward to when Phi 3 128K is supported, as I suspect it would be the ultimate small "do it all" model for browser-based contexts.
More questions as I'm going along:
Oh darn, the advanced example answers a lot of my questions, apologies: https://github.com/ngxson/wllama/blob/master/examples/advanced/index.html

Also, is this a bug in the example? It sets the same property twice: wllama/examples/advanced/index.html, line 56 at commit 2450545.
Cool project! Thanks for paying attention to wllama.
I planned to add one (and cache-control options), but there are still some issues. If you want, you can implement your own download function (with a callback), then pass the final buffer to the model-loading function (line 125 at commit 2450545).
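For anyone who wants to do this before built-in support lands, here is a minimal sketch of such a download function. The streaming-fetch part is standard browser API; how the resulting buffer is handed to wllama is the part to check against the line referenced above:

```js
// Download a model file while reporting progress, then return the bytes.
// What you do with the returned buffer (passing it to wllama's
// model-loading function) depends on the wllama API at your version.
async function downloadWithProgress(url, onProgress) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const total = Number(response.headers.get('Content-Length')) || 0;
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    onProgress(received, total); // total is 0 if Content-Length is missing
  }
  // Concatenate the chunks into a single contiguous buffer
  const buffer = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    buffer.set(chunk, offset);
    offset += chunk.length;
  }
  return buffer;
}

// Usage:
// const buf = await downloadWithProgress(modelUrl,
//   (done, total) => console.log(`${done}/${total} bytes`));
// ...then pass `buf` to the model-loading function mentioned above.
```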
If you want to have more control over the response, you can implement your own completion function (line 224 at commit 2450545). By implementing your own, you decide how each token is produced and surfaced; a rough sketch follows.
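For illustration only, here is roughly what token-by-token control looks like through the high-level API. The option and callback names (createCompletion, nPredict, sampling, onNewToken) follow wllama's documented usage, but verify them against the code at the referenced line:

```js
// Sketch: stream the response token by token instead of waiting for the
// full completion. Names follow wllama's documented high-level API and
// may differ at this commit — treat this as an assumption.
const fullText = await wllama.createCompletion(prompt, {
  nPredict: 256,
  sampling: { temp: 0.7, top_k: 40, top_p: 0.9 },
  onNewToken: (token, piece, currentText) => {
    // Update the page incrementally as tokens arrive
    document.getElementById('output').textContent = currentText;
  },
});
```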
Yes, since models are loaded into RAM, it's better to unload the model before loading a new one to prevent running out of RAM.
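Concretely, the swap could look like this (exit() and loadModelFromUrl() are wllama's documented methods, but confirm the names against the version you're on):

```js
// Swap models without holding both in RAM at once.
await wllama.exit();                          // unload the current model
await wllama.loadModelFromUrl(nextModelUrl);  // then load the replacement
```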
No, because many default options are defined inside llama.cpp (C++ code, not at the JavaScript level). I'm planning to copy them into this project in the future. That requires parsing the C++ code and either converting the values into TS/JS or simply generating markdown documentation from them; either way it will be quite complicated. For now, you can see the default values in the llama.h file: https://github.com/ggerganov/llama.cpp/blob/master/llama.h
Yes, it's a typo. Because the index.html file is not TypeScript, I don't get any suggestions from the IDE. One should be top_p and the other should be top_k.
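So the fixed config would look something like this (the values here are placeholders; the point is only that the two keys must differ):

```js
// Corrected sampling options: the example set the same property twice,
// but one should be top_k and the other top_p. Values are hypothetical.
const samplingConfig = {
  top_k: 40,
  top_p: 0.95,
};
```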
Whoop! I've got an initial implementation working! Now to get to the details.
I went ahead and created a very minimal implementation of a download progress callback in a PR. It should hold me over until your preferred implementation is done, at which point I'll update.
By looking at the advanced example I found what I needed. I'm going to see if I can hack in an abort button next :-)
It seems I can simply call:
I created an extremely minimalist way to interrupt the inference here:
Wllama now has a built-in interruption ability.
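Assuming the mechanism is exposed through the onNewToken callback (an assumption on my part; check the current wllama docs for the exact shape), aborting mid-generation would look roughly like this:

```js
// Hedged sketch of interrupting generation: assumes onNewToken receives
// an optionals object with an abortSignal() function to stop decoding.
// The 'stop-btn' element is a hypothetical UI hook for this example.
let stopRequested = false;
document.getElementById('stop-btn').onclick = () => { stopRequested = true; };

await wllama.createCompletion(prompt, {
  nPredict: 512,
  onNewToken: (token, piece, currentText, optionals) => {
    if (stopRequested) {
      optionals.abortSignal(); // stop generating further tokens
    }
  },
});
```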
Thank you for this! I've been using llama-cpp-wasm and the 2GB size restriction was a real stumbling block.