-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New llama-run #17554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
New llama-run #17554
Conversation
94a98cf to
500a90c
Compare
|
@angt @ggerganov @CISC @ngxson PTAL |
|
Considering these points:
So, I think it's better to repurpose one of the 3 binaries mentioned in my first point instead. |
|
f839cfb to
5b0c817
Compare
|
I think what's better doing at this point is improve the usability of |
5b0c817 to
fddbeed
Compare
I’m on board with the idea, but can we push ahead with this PR and plan the refactor for a later stage? To be honest, I think you’re the best person to handle the refactoring. You know exactly how you want the architecture to look, and it’s hard for others to guess those specifics. If someone else tries, we risk getting stuck in a loop of corrections, so it’s probably more efficient if you drive that part. |
For most parts, server code already refactored to be reused in another tool/example. I'm not sure what kind of refactoring you're talking about |
Moving the relevant code to llama-cli, llama-run, etc. or some other place. |
|
I think your
This way, you simply make In the future, we can make server as a static (or maybe shared) lib target and reuse it in both server and run |
0e3f326 to
f7a7775
Compare
f347cb1 to
2619c11
Compare
|
People should give this a shot, the UX is quite neat: |
|
Seems to be a better direction now. I'll see if it's worth splitting some changes specific to One question though: why do we need to split |
|
Btw maybe a TODO, we can now add multimodal support to the CLI too |
2619c11 to
f9ae221
Compare
I've received mixed feedback in the past regarding granular file separation versus consolidation, so I am unsure of the preferred direction here. run-chat.cpp and run.cpp seems like a reasonable split between chat-focused activities and other code required to get the tool running. |
- Added readline.cpp include - Created run_chat_mode(): - Initializes readline with command history - Maintains conversation history - Applies chat templates to format messages - Submits completion tasks to the server queue - Displays assistant responses interactively Signed-off-by: Eric Curtin <eric.curtin@docker.com>
f9ae221 to
a449065
Compare
|
If somebody could restart the failing builds I'd appreciate it, I don't have any sort of maintainer access anymore, as limited as a random contributor |
|
The failed builds are not related to server, we can ignore them anw. Beside, I usually open a mirror PR on my fork to skip the long waiting line of CIs on main repo, you can give that a try. |
Uh oh!
There was an error while loading. Please reload this page.