
Simple webchat for server #1998

Merged: 18 commits merged into ggerganov:master on Jul 4, 2023

Conversation

@tobi (Sponsor Collaborator) commented Jun 26, 2023

I put together a simple web chat that demonstrates how to use the SSE(ish) streaming in the server example. I also went ahead and served it from the root URL, to make the server a bit more approachable.

I tried to match the spirit of llama.cpp, so I used minimalistic JS dependencies and went with the ozempic CSS style of ggml.ai.

Initially I went for no JS dependencies, but gave up and used a few minimal ones that I'm importing from JS CDNs instead of adding them here. Let me know if you agree with this approach. I needed Microsoft's fetch-event-source to use EventSource over POST (super disappointed that browsers don't support that, actually) and preact+htm for keeping my sanity with all this state. The upshot is that everything is in one small HTML file. Speaking of which, there is probably a better (and less fragile) way to include server.html in the C++ binary, but it's been 25 years since I worked with C++ tooling.

(updated screenshot)

@tobi changed the title from "Server simple web" to "Simple webchat for server" on Jun 26, 2023
@ggerganov (Owner) left a comment


Love this!

Initially I went for no JS dependencies, but gave up and used a few minimal ones that I'm importing from JS CDNs instead of adding them here. Let me know if you agree with this approach.

I think it's good.

Speaking of which, there is probably a better (and less fragile) way to include server.html in the C++ binary, but it's been 25 years since I worked with C++ tooling.

I guess it would be useful to specify the HTTP root path from the command line arguments instead of hard coding the path. But we can fix this later.

Approving and letting the "server team" take a look and merge

@slaren (Collaborator) commented Jun 26, 2023

I think this is a good idea, but the HTML file should be embedded in the binary. As it stands, this will not work with the automatic builds because they don't include the contents of the examples directory.

@IgnacioFDM (Contributor) commented Jun 26, 2023

IMHO having the JS dependencies locally would be better: it works without an internet connection and avoids the risk of malicious JS.

@SlyEcho (Sponsor Collaborator) commented Jun 26, 2023

I have done something like this before, serving HTML files from a C HTTP server. There was a CMake option to either build them in or read them from disk. Reading from files is useful for development because you don't need to rebuild and restart the server, but building them in requires creating a small program that can hexdump the file into a C array definition. Overall pretty complex, and then we have the Makefile as well...

using event-source over POST (super disappointed that browsers don't support that, actually)

Maybe we could add an endpoint with GET and query parameters?

@Green-Sky (Collaborator) commented:

requires creating a small program that can hexdump the file into a C array definition.

Should be pretty simple. I don't touch Makefiles directly very often; how bad are custom targets?

Or we ship the generated files.

(more reading here https://thephd.dev/finally-embed-in-c23)

@SlyEcho (Sponsor Collaborator) commented Jun 26, 2023

Should be pretty simple. I don't touch Makefiles directly very often; how bad are custom targets?

I'd say Makefiles are a lot easier for this than CMake but it's just added complexity.

@tobi, How hard would it be for you to jam the HTML file contents into the .cpp file?

@Green-Sky (Collaborator) commented Jun 26, 2023

After some thinking, I realized that we have pure text file(s), which means we only need to prefix and postfix the file with raw string literal markers, e.g.:

echo "R\"htmlraw(" > html_build.cpp
cat index.html >> html_build.cpp
echo ")htmlraw\"" >> html_build.cpp

in server.cpp:

const char* html_str =
#include "html_build.cpp"
;

edit: resulting html_build.cpp:

R"htmlraw(<html></html>)htmlraw"

@Green-Sky (Collaborator) commented:

OK, gave it a go (running it) and found an issue: whenever the window loses focus (switching windows), it restarts the prompt that is currently being generated. See screencap.
[screenshot]
(I switched back and forth a couple of times.)

@tobi (Sponsor Collaborator, Author) commented Jun 26, 2023

@tobi, How hard would it be for you to jam the HTML file contents into the .cpp file?

Simple enough. I was hoping to hear that there is some kind of #embed thing that works in all the C++ compilers we care about. Crazy that it took until C23 to get that into the standard.

I can just include it. I can also just embed one dependency's JS and call it a day.

The best argument for keeping it in the HTML file is that it makes it easier for people to hack on. I think this could become a really good chatbot UX if we are welcoming to contributors. It's got good bones 😄

@SlyEcho (Sponsor Collaborator) commented Jun 26, 2023

#embed is not gonna work because it's too new.

Yes, it will be harder to develop, but you can also run a simple web server (e.g. with Python) while developing it.

We can improve it later.

@howard0su (Collaborator) commented:

Check this cmake script:
https://gist.github.com/sivachandran/3a0de157dccef822a230

I am also wondering whether we should use the same technique to embed the OpenCL kernels. The current approach, which mixes kernel and normal C code, will become more of a maintenance headache.

@SlyEcho (Sponsor Collaborator) commented Jun 26, 2023

cmake script

Cool but we also have to support pure Makefile.

@Green-Sky (Collaborator) commented:

I feel ignored 😅
We are not dealing with binary files here, so my #1998 (comment) solution is simply 3 text file concats. Pretty sure it won't get much simpler :)

@SlyEcho (Sponsor Collaborator) commented Jun 26, 2023

3 text file concats

Does it work in Windows?

@Green-Sky (Collaborator) commented:

3 text file concats

Does it work in Windows?

If you use make on Windows, you likely also have some coreutils installed (echo and cat).

CMake has built-in functions to read/write/append files :)

@ggerganov (Owner) commented:

For me, the greatest value of this example is that it demonstrates a minimalistic way to implement a basic HTML/JS client that communicates with the server using just a browser, without having to install node or curl. How the client is served can be solved in many different ways, depending on the needs of the specific project. I recommend merging the example as it is and potentially adding improvements later on master.

@Green-Sky (Collaborator) commented:

For me, the greatest value of this example is that it demonstrates a minimalistic way to implement a basic HTML/JS client that communicates with the server using just a browser, without having to install node or curl. How the client is served can be solved in many different ways, depending on the needs of the specific project. I recommend merging the example as it is and potentially adding improvements later on master.

Agree, except we should really not hard-code the path to the HTML. We basically ship the server, and that would look funky.

@tobi would it be too much to ask to implement the HTML root CLI parameter for the server executable?
Or, as a fast track: if the hardcoded .html file could not be loaded (!file.is_open()), fall back to the previous HTML string?
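For illustration, here is a minimal sketch of that fallback idea, assuming a hypothetical helper along these lines (load_index_html, public_path and index_html_baked are illustrative names, not the actual server.cpp code):

#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: try to read index.html from disk; if that fails,
// fall back to the HTML string baked into the binary.
static std::string load_index_html(const std::string & public_path,
                                   const std::string & index_html_baked) {
    std::ifstream file(public_path + "/index.html");
    if (!file.is_open()) {
        return index_html_baked; // baked-in fallback, no file on disk needed
    }
    std::stringstream ss;
    ss << file.rdbuf(); // read the whole file
    return ss.str();
}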

@tobi (Sponsor Collaborator, Author) commented Jun 26, 2023

Sure, I'll try to do that tonight.

@tobi (Sponsor Collaborator, Author) commented Jun 27, 2023

OK, so I did basically all of those things. There is now a --path param that you can point at any directory, and static files will be served from it. I also added a deps.sh which just bakes index.html and index.js into .hpp files (as per @Green-Sky's suggestion). So you can launch ./server from the llama.cpp folder and it will use the ./examples/server/public directory, copy the ./server binary to tmp and it will just use the baked-in files, or use --path to work on your own UX.

The only downside is that we duplicate some files in git here because of the baked .cpp files. But the deps are so small that it probably doesn't matter. It would be slightly cleaner to make deps.sh a build step in CMake and the Makefile, but... well... I ran out of courage.

@tobi (Sponsor Collaborator, Author) commented Jun 27, 2023

@ggerganov server is in reasonably good shape overall. Maybe time to include it in the default build?

@IgnacioFDM (Contributor) commented:

I like the current approach, with the website embedded in the binary for simplicity, but also the option to serve from a directory, which improves iteration time and allows user customization without recompiling. It also includes the JS dependencies locally.

I agree with merging this in its current state. Further improvements can be done in future PRs.

🚢

@SlyEcho (Sponsor Collaborator) left a comment


Actually, could you move the generated files next to the source files in the public folder? If we get more files, keeping the same directory structure will keep things neater.

@ggerganov (Owner) commented:

server is in reasonably good shape overall. Maybe time to include it in the default build?

Yes, let's do that. Originally, I insisted on putting it behind an option since it brought in the Boost library as a dependency, which is a very big burden. Now that the implementation is so self-contained and minimal, we should enable the build by default and maintain it long term.

@Green-Sky (Collaborator) commented:

If you pull from master, the CI issues should go away.

@tobi (Sponsor Collaborator, Author) commented Jun 27, 2023

Actually, could you move the generated files next to the source files in the public folder? If we get more files, keeping the same directory structure will keep things neater.

That would make it possible to request the files at /index.html.cpp. Still want that?

@Green-Sky merged commit 7ee76e4 into ggerganov:master on Jul 4, 2023 (22 checks passed)
@rain-1 commented Jul 5, 2023

Is it chat only or is there also a text completion UI?

@SlyEcho (Sponsor Collaborator) commented Jul 5, 2023

It is just chat for now.

@jarombouts commented:

I tried to match the spirit of llama.cpp, so I used minimalistic JS dependencies and went with the ozempic CSS style of ggml.ai.

I have nothing of use to add to this discussion, but I just want to point out that your use of "ozempic" as an adjective is blowing my mind.

@YannickFricke commented:

@tobi
Is there a specific reason why you went with POST requests for SSE?

And there is (pretty good) support for SSE in browsers: https://developer.mozilla.org/en-US/docs/Web/API/EventSource#browser_compatibility

@Green-Sky (Collaborator) commented:

Warning: When not used over HTTP/2, SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). The issue has been marked as "Won't fix" in Chrome and Firefox.

@countzero commented:

@tobi I love it! Very functional and a great addition!

I compiled it successfully on Windows (https://github.com/countzero/windows_llama.cpp) and everything works except one CLI setting: The server example does not support the --n-predict option. Is this an oversight / bug or intended?

I expected the model options of server to be symmetrical with those of main.

@SlyEcho (Sponsor Collaborator) commented Jul 6, 2023

n_predict is an API request parameter. The web chat currently has a hardcoded number.

@shametim commented Jul 6, 2023

FWIW, on Windows 10 running MSYS2 UCRT64, I had to add LDFLAGS += -lws2_32 (links the Windows Sockets API) in llama.cpp/Makefile to resolve build errors like this:

server.cpp:(.text$_ZN7httplib6Server24process_and_close_socketEy[_ZN7httplib6Server24process_and_close_socketEy]+0x10d): undefined reference to `__imp_closesocket'

@YannickFricke commented Jul 6, 2023

Warning: When not used over HTTP/2, SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). The issue has been marked as "Won't fix" in Chrome and Firefox.

@Green-Sky

Yeah, SSE isn't that good compared to WebSockets.

So you suggest switching over to WebSockets? But it would be a pain to implement them, as the current httplib doesn't really support them (you would have to do everything on your own).

And when is it ever the case that a user has more than 6 concurrently open connections to llama?

Another approach could be that the EventSource is only open while we stream the completions, so you could completely circumvent this issue :)

@rain-1 mentioned this pull request Jul 7, 2023

@rain-1 commented Jul 7, 2023

dd1df3f#diff-045455b121ce797624fc9116aaab984486750bee48f02e246708cb168964ec41

I don't like this. Can we please include this as text, not octets? It is obfuscated.

An option is to generate the .hpp file from the plain-text JS during compilation.

@rain-1 commented Jul 7, 2023

There is a crash if the user continues to enter text and hits return to send it while the model is streaming tokens back.

I think it is due to the lock being released in the /completion handler. It may need to be passed inside the thread that is spun off: const auto chunked_content_provider = [&](size_t, DataSink & sink) {

actually

            const auto chunked_content_provider = [&](size_t, DataSink & sink) {
                auto my_lock = llama.lock();

seems to fix this.
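A rough sketch of the pattern rain-1 describes, assuming the lock() helper hands out a std::unique_lock over a mutex shared by all requests (names are illustrative, not the actual server.cpp types):

#include <mutex>

// Illustrative stand-in for the shared llama context used by the handlers.
struct llama_server_context_sketch {
    std::mutex mutex_completion;
    std::unique_lock<std::mutex> lock() {
        return std::unique_lock<std::mutex>(mutex_completion);
    }
};

// Inside the /completion handler, the key point is acquiring the lock inside
// the streaming callback, so a second request blocks until the current stream
// finishes instead of racing the model state:
//
//     const auto chunked_content_provider = [&](size_t, DataSink & sink) {
//         auto my_lock = llama.lock(); // held for the lifetime of the stream
//         // ... generate tokens and sink.write(...) ...
//         return true;
//     };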

@Green-Sky (Collaborator) commented:

I don't like this. Can we please include this as text, not octets? It is obfuscated.

An option is to generate the .hpp file from the plain-text JS during compilation.

You can find more context further up in this thread.

@tobi (Sponsor Collaborator, Author) commented Jul 8, 2023

We should definitely add the xxd runs to the Makefile instead and remove the generated files from the repo soon, because the octet output will balloon the git history otherwise. I do think it's very important to bake the files into the binary, though. Honestly, it's crazy that it took C and C++ so long to add a standard way of doing this.

We did this in ASM in the 90s all the time.
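For context, xxd -i index.html emits a plain C array, roughly like the following (the symbol names are derived from the input file name, so treat this as an illustrative sketch):

// Approximate output of `xxd -i index.html` for a file containing "<html>":
unsigned char index_html[] = {
    0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e
};
unsigned int index_html_len = 6;

// The server can then serve the buffer straight from memory, e.g. with
// cpp-httplib (illustrative, not the exact call site in server.cpp):
//     res.set_content(reinterpret_cast<const char *>(index_html),
//                     index_html_len, "text/html");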

@rain-1 commented Jul 8, 2023

We should definitely add the xxd runs to the Makefile instead and remove the generated files from the repo soon, because the octet output will balloon the git history otherwise. I do think it's very important to bake the files into the binary, though. Honestly, it's crazy that it took C and C++ so long to add a standard way of doing this.

We did this in ASM in the 90s all the time.

Another option may be to use ld -b binary, whatever you prefer :)

@SlyEcho (Sponsor Collaborator) commented Jul 8, 2023

xxd is not a standard tool; it is part of the vim editor. The issue is portability. It would be better to have our own hex converter in something like Python or C.

@Green-Sky (Collaborator) commented:

xxd is not a standard tool; it is part of the vim editor. The issue is portability. It would be better to have our own hex converter in something like Python or C.

I think he means https://en.cppreference.com/w/c/preprocessor/embed :)
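For reference, the C23 #embed usage linked above looks roughly like this; it's a sketch of the future option rather than something usable here, since compiler support was still far too new at the time:

// C23 (and, as an extension, some C++ compilers): embed the file's bytes
// directly into an array at compile time, no xxd or build script needed.
static const unsigned char index_html[] = {
#embed "index.html"
};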

YellowRoseCx added a commit to YellowRoseCx/koboldcpp-rocm that referenced this pull request Jul 10, 2023
A review comment on this snippet from the web UI:

<div>
<label for="nPredict">Predictions</label>
<input type="range" id="nPredict" min="1" max="2048" step="1" name="n_predict" value="${params.value.n_predict}" oninput=${updateParamsFloat} />

Should we make min="-1" an option for infinity? Thanks!

@arch-btw commented:

Oops sorry @tobi & @Green-Sky I didn't realize it had already been merged.
