Server Example Refactor and Improvements #1570

Merged 161 commits on Jun 17, 2023
161 commits
1c3fdf8
Add all generation parameters to server.cpp and allow resetting context
digiwombat May 23, 2023
2071d73
Forgot to remove some testing code.
digiwombat May 23, 2023
421e66b
Update examples/server/server.cpp
digiwombat May 23, 2023
add5f1b
Update examples/server/server.cpp
digiwombat May 23, 2023
3537ad1
Merge branch 'ggerganov:master' into master
digiwombat May 23, 2023
8d7b28c
Fixed some types in the params.
digiwombat May 23, 2023
c2b55cc
Added LoRA Loading
digiwombat May 25, 2023
48cb16a
Merge branch 'ggerganov:master' into master
digiwombat May 27, 2023
66ed19d
Corrected dashes in the help lines.
digiwombat May 27, 2023
36c86d7
Automate Context resetting and minor fixes
digiwombat May 27, 2023
d20f36b
Removed unnecessary last_prompt_token set
digiwombat May 27, 2023
fdce895
Merge branch 'ggerganov:master' into master
digiwombat May 27, 2023
e84b802
Change top_k type.
digiwombat May 28, 2023
1f40a78
Didn't see the already defined top_k var.
digiwombat May 28, 2023
51e0994
server rewrite
SlyEcho May 27, 2023
f93fe36
Add all generation parameters to server.cpp and allow resetting context
digiwombat May 23, 2023
df0e0d0
Forgot to remove some testing code.
digiwombat May 23, 2023
549291f
keep processed from the beginning
SlyEcho May 28, 2023
177868e
Changed to params/args
digiwombat May 28, 2023
e8efd75
Initial timeout code and expanded json return on completion.
digiwombat May 28, 2023
23928f2
Added generation_settings to final json object.
digiwombat May 28, 2023
2e5c5ee
Changed JSON names to match the parameter name rather than the variab…
digiwombat May 28, 2023
dda915c
Added capturing the stopping word and sending it along with the final…
digiwombat May 28, 2023
7740301
Set unspecified generation settings back to default. (Notes below)
digiwombat May 28, 2023
7186d65
seed and gen params
SlyEcho May 28, 2023
15ddc49
Merge remote-tracking branch 'slyecho/server_refactor'
digiwombat May 28, 2023
74c6f36
Editorconfig suggested fixes
SlyEcho May 28, 2023
2c9ee7a
Apply suggestions from code review
digiwombat May 28, 2023
655899d
Add ignore_eos option to generation settings.
digiwombat May 28, 2023
b38d41e
--memory_f32 flag to --memory-f32 to match common.cpp
digiwombat May 28, 2023
6c58f64
--ctx_size flag to --ctx-size to match common.cpp
digiwombat May 28, 2023
33b6957
Fixed failing to return result on stopping token.
digiwombat May 28, 2023
42cf4d8
Merge branch 'master' into master
SlyEcho May 28, 2023
03ea8f0
Fix for the regen issue.
digiwombat May 30, 2023
d6fff56
add streaming via server-sent events
May 30, 2023
3292f05
Changed to single API endpoint for streaming and non.
digiwombat May 30, 2023
38eaf2b
Removed testing fprintf calls.
digiwombat May 30, 2023
a25f830
Default streaming to false if it's not set in the request body.
digiwombat May 31, 2023
2533878
Merge branch 'master' into sse
digiwombat May 31, 2023
e6de69a
Merge pull request #3 from anon998/sse
digiwombat May 31, 2023
7a853dc
prevent the server from swallowing exceptions in debug mode
May 31, 2023
aa0788b
add --verbose flag and request logging
May 31, 2023
9197674
Merge pull request #4 from anon998/logging
digiwombat May 31, 2023
b6f536d
Cull to end of generated_text when encountering a stopping string in …
digiwombat May 31, 2023
7a8104f
add missing quote when printing stopping strings
May 31, 2023
3a079d5
stop generating when the stream is closed
May 31, 2023
9f2424a
Merge pull request #5 from anon998/stop-stream
digiwombat May 31, 2023
c1cbde8
print error when server can't bind to the interface
May 31, 2023
2c08f29
make api server use only a single thread
May 31, 2023
284bc29
reserve memory for generated_text
May 31, 2023
f1710b9
add infinite generation when n_predict is -1
May 31, 2023
aa2bbb2
fix parameter type
May 31, 2023
27911d6
fix default model alias
May 31, 2023
dd30219
buffer incomplete multi-byte characters
May 31, 2023
40e1380
print timings + build info
May 31, 2023
d58e486
default penalize_nl to false + format
May 31, 2023
3edaf6b
print timings by default
May 31, 2023
96fa480
Merge pull request #6 from anon998/fix-multibyte
digiwombat May 31, 2023
7332b41
Simple single-line server log for requests
digiwombat May 31, 2023
dda4c10
Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as re…
digiwombat May 31, 2023
86337e3
Server console logs now come in one flavor: Verbose.
digiwombat May 31, 2023
1b96df2
Spacing fix. Nothing to see here.
digiwombat May 31, 2023
276fa99
Misunderstood the instructions, I think. Back to the raw JSON output …
digiwombat May 31, 2023
43d295f
filter empty stopping strings
May 31, 2023
1bd7cc6
reuse format_generation_settings for logging
May 31, 2023
497160a
remove old log function
May 31, 2023
f2e1130
Merge pull request #7 from anon998/logging-reuse
digiwombat May 31, 2023
9104fe5
Change how the token buffers work.
SlyEcho May 31, 2023
8478e59
Merge pull request #8 from SlyEcho/server_refactor
digiwombat May 31, 2023
bed308c
Apply suggestions from code review
SlyEcho May 31, 2023
342604b
Added a super simple CORS header as default for all endpoints.
digiwombat May 31, 2023
e9b1f0b
fix stopping strings
May 31, 2023
5f6e16d
Merge pull request #9 from anon998/stopping-strings
digiwombat Jun 1, 2023
f7882e2
Fixed a crash caused by erasing from empty last_n_tokens
digiwombat Jun 1, 2023
5bbc030
Add Options enpoints and Access-Control-Allow-Headers to satisfy CORS…
cirk2 Jun 1, 2023
8c6a5fc
last tokens fixes
SlyEcho Jun 1, 2023
9531ae6
Add logit bias support
SlyEcho Jun 1, 2023
797155a
Merge pull request #10 from cirk2/master
digiwombat Jun 1, 2023
af71126
Merge pull request #11 from SlyEcho/server_refactor
digiwombat Jun 1, 2023
49a18bd
remove unused parameter warning
Jun 1, 2023
6025476
default penalize_nl back to true
Jun 1, 2023
8cbc4be
clear logit_bias between requests + print
Jun 1, 2023
d29b6d5
Merge pull request #12 from anon998/clear-logit-bias
digiwombat Jun 1, 2023
0bc0477
Apply suggestions from code review
SlyEcho Jun 2, 2023
731ecc0
fix typo
Jun 2, 2023
ebfead6
remove unused variables
Jun 2, 2023
1488a0f
make functions that never return false void
Jun 2, 2023
49dce94
make types match gpt_params exactly
Jun 2, 2023
a8a9f19
small fixes
Jun 2, 2023
2932db1
avoid creating element in logit_bias accidentally
Jun 2, 2023
47efbb5
use std::isinf to check if ignore_eos is active
Jun 2, 2023
88cc7bb
Stuff with logits
SlyEcho Jun 2, 2023
abb7782
Merge branch 'master' into small-fixes
anon998 Jun 2, 2023
bebea65
Merge pull request #13 from anon998/small-fixes
digiwombat Jun 2, 2023
8f9e546
trim partial stopping strings when not streaming
Jun 2, 2023
f820740
move multibyte check to doCompletion
Jun 2, 2023
f5d5e70
Merge pull request #14 from anon998/do-completion-update
digiwombat Jun 2, 2023
1bd52c8
Merge branch 'ggerganov:master' into master
digiwombat Jun 2, 2023
3df0192
improve long input truncation
SlyEcho Jun 2, 2023
28cc0cd
Merge pull request #15 from SlyEcho/server_refactor
digiwombat Jun 2, 2023
3ff27d3
Fixed up a few things in embedding mode.
digiwombat Jun 2, 2023
41bb71b
replace invalid characters instead of crashing
Jun 2, 2023
4dd72fc
Merge pull request #16 from anon998/fix-log-json
digiwombat Jun 2, 2023
16e1c98
Removed the embedding api endpoint and associated code.
digiwombat Jun 2, 2023
7cebe2e
Merge branch 'master' of https://github.com/digiwombat/llama.cpp
digiwombat Jun 2, 2023
bcd6167
improve docs and example
SlyEcho Jun 2, 2023
de6df48
Removed embedding from README
digiwombat Jun 2, 2023
310bf61
Merge pull request #17 from SlyEcho/server_refactor
digiwombat Jun 2, 2023
5758e9f
Removed embedding from flags.
digiwombat Jun 2, 2023
e1e2be2
remove --keep from help text
Jun 2, 2023
a6ed390
update readme
Jun 2, 2023
05a5a48
make help text load faster
Jun 2, 2023
98ae2de
parse --mlock and --no-mmap + format
Jun 2, 2023
df2ecc9
Merge pull request #18 from anon998/update-readme
digiwombat Jun 2, 2023
64a0653
Merge remote-tracking branch 'upstream/master'
digiwombat Jun 7, 2023
61befcb
Apply suggestions from code review
SlyEcho Jun 8, 2023
ccd85e0
Apply suggestions from code review
SlyEcho Jun 8, 2023
a9c3477
Spaces to 4 and other code style cleanup. Notes in README.
digiwombat Jun 9, 2023
cc2b336
Missed a pair of catch statements for formatting.
digiwombat Jun 9, 2023
23a1b18
Merge branch 'ggerganov:master' into master
digiwombat Jun 9, 2023
7580427
Resolving some review comments
digiwombat Jun 9, 2023
889d904
Merge branch 'master' of https://github.com/digiwombat/llama.cpp
digiwombat Jun 9, 2023
7cdeb08
More formatting cleanup
digiwombat Jun 9, 2023
1a9141b
Remove model assign in main(). Clarified stop in README.
digiwombat Jun 9, 2023
917540c
Clarify build instructions in README.
lesaun Jun 10, 2023
d6d263f
Merge pull request #19 from lesaun/master
digiwombat Jun 10, 2023
bac0ddb
Merge branch 'ggerganov:master' into master
digiwombat Jun 10, 2023
2c00bf8
more formatting changes
SlyEcho Jun 11, 2023
9612d12
big logging update
SlyEcho Jun 11, 2023
6518f9c
build settings
SlyEcho Jun 11, 2023
eee8b28
Merge pull request #20 from SlyEcho/server_refactor
digiwombat Jun 11, 2023
4148b9b
remove void
SlyEcho Jun 12, 2023
dff11a1
json parsing improvements
SlyEcho Jun 12, 2023
13cf692
more json changes and stop info
SlyEcho Jun 12, 2023
b91200a
javascript chat update.
SlyEcho Jun 12, 2023
1510337
fix make flags propagation
SlyEcho Jun 12, 2023
fc4264d
api url
SlyEcho Jun 12, 2023
28694f7
add a simple bash script too
SlyEcho Jun 12, 2023
429ed95
move CPPHTTPLIB settings inside server
SlyEcho Jun 12, 2023
f344d09
streaming shell script
SlyEcho Jun 12, 2023
50e7c54
Merge pull request #21 from SlyEcho/server_refactor
digiwombat Jun 12, 2023
fc78910
Merge branch 'ggerganov:master' into master
digiwombat Jun 12, 2023
6d72f0f
Make chat shell script work by piping the content out of the subshell.
digiwombat Jun 12, 2023
9d564db
trim response and trim trailing space in prompt
Jun 13, 2023
9099709
Merge pull request #22 from anon998/bash-trim
digiwombat Jun 13, 2023
b8b8a6e
Add log flush
SlyEcho Jun 13, 2023
6627a02
Allow overriding the server address
SlyEcho Jun 13, 2023
1f39452
remove old verbose variable
Jun 13, 2023
99ef967
add static prefix to the other functions too
Jun 13, 2023
575cf23
remove json_indent variable
Jun 13, 2023
7df316b
fix linter warnings + make variables const
Jun 13, 2023
7a48ade
fix comment indentation
Jun 13, 2023
6075d78
Merge pull request #23 from anon998/fix-linter-warnings
digiwombat Jun 13, 2023
546f850
Update examples/server/server.cpp
SlyEcho Jun 14, 2023
bd81096
fix typo in readme + don't ignore integers
Jun 14, 2023
5e107c2
Merge pull request #24 from anon998/logit-bias
digiwombat Jun 14, 2023
f858cd6
Merge remote-tracking branch 'upstream/master'
digiwombat Jun 14, 2023
aee8595
Update README.md
digiwombat Jun 15, 2023
488c62a
Merge remote-tracking branch 'upstream/master'
digiwombat Jun 15, 2023
fb49c05
Merge branch 'ggerganov:master' into master
digiwombat Jun 16, 2023
1b4b93a
Merge branch 'ggerganov:master' into master
digiwombat Jun 17, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -36,6 +36,7 @@ models/*
/train-text-from-scratch
/benchmark-matmult
/vdot
/server
/Pipfile
/libllama.so

2 changes: 2 additions & 0 deletions Makefile
@@ -3,6 +3,8 @@ BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot train-tex

ifdef LLAMA_BUILD_SERVER
BUILD_TARGETS += server
LLAMA_SERVER_VERBOSE ?= 1
server: private CXXFLAGS += -DSERVER_VERBOSE=$(LLAMA_SERVER_VERBOSE)
endif

default: $(BUILD_TARGETS)
4 changes: 4 additions & 0 deletions examples/server/CMakeLists.txt
@@ -1,6 +1,10 @@
set(TARGET server)
option(LLAMA_SERVER_VERBOSE "Build verbose logging option for Server" ON)
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
add_executable(${TARGET} server.cpp json.hpp httplib.h)
target_compile_definitions(${TARGET} PRIVATE
    SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
318 changes: 91 additions & 227 deletions examples/server/README.md

Large diffs are not rendered by default.

89 changes: 89 additions & 0 deletions examples/server/chat.mjs
@@ -0,0 +1,89 @@
import * as readline from 'node:readline'
import { stdin, stdout } from 'node:process'

const API_URL = 'http://127.0.0.1:8080'

const chat = [
    {
        human: "Hello, Assistant.",
        assistant: "Hello. How may I help you today?"
    },
    {
        human: "Please tell me the largest city in Europe.",
        assistant: "Sure. The largest city in Europe is Moscow, the capital of Russia."
    },
]

const instruction = `A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.`

function format_prompt(question) {
    return `${instruction}\n${
        chat.map(m =>`### Human: ${m.human}\n### Assistant: ${m.assistant}`).join("\n")
    }\n### Human: ${question}\n### Assistant:`
}

async function tokenize(content) {
    const result = await fetch(`${API_URL}/tokenize`, {
        method: 'POST',
        body: JSON.stringify({ content })
    })

    if (!result.ok) {
        return []
    }

    return (await result.json()).tokens // result.json() returns a Promise, so await it before reading .tokens
}

const n_keep = (await tokenize(instruction)).length // await the token array before taking its length

async function chat_completion(question) {
    const result = await fetch(`${API_URL}/completion`, {
        method: 'POST',
        body: JSON.stringify({
            prompt: format_prompt(question),
            temperature: 0.2,
            top_k: 40,
            top_p: 0.9,
            n_keep: n_keep,
            n_predict: 256,
            stop: ["\n### Human:"], // stop completion after generating this
            stream: true,
        })
    })

    if (!result.ok) {
        return
    }

    let answer = ''

    for await (const chunk of result.body) {
        const t = Buffer.from(chunk).toString('utf8')
        if (t.startsWith('data: ')) {
            const message = JSON.parse(t.substring(6))
            answer += message.content
            process.stdout.write(message.content)
            if (message.stop) {
                if (message.truncated) {
                    chat.shift() // drop the oldest exchange once the server reports truncation
                }
                break
            }
        }
    }

    process.stdout.write('\n')
    chat.push({ human: question, assistant: answer.trimStart() })
}

const rl = readline.createInterface({ input: stdin, output: stdout });

const readlineQuestion = (rl, query, options) => new Promise((resolve, reject) => {
    rl.question(query, options, resolve)
});

while(true) {
    const question = await readlineQuestion(rl, '> ')
    await chat_completion(question)
}
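
The streaming loop in chat.mjs treats every network chunk as exactly one `data: ` line. A chunk can in principle carry several events, or end partway through one, so a more defensive reader could buffer the text and split it on newlines. The following is a minimal sketch under that assumption, using the `data: <json>` framing seen in the loop; `createSseParser` and `feed` are hypothetical names and not part of this PR.

```javascript
// Hypothetical helper (not part of the PR): accumulates streamed text and
// returns one parsed JSON message per complete "data: ..." line.
function createSseParser() {
    let buffer = ''
    return function feed(text) {
        buffer += text
        const lines = buffer.split('\n')
        // The last element is either '' or an incomplete line; keep it for the next call.
        buffer = lines.pop()
        const messages = []
        for (const line of lines) {
            if (line.startsWith('data: ')) {
                messages.push(JSON.parse(line.substring(6)))
            }
        }
        return messages
    }
}

// Usage: the first event is split across two chunks.
const feed = createSseParser()
const first = feed('data: {"content":"Hel')
const second = feed('lo","stop":false}\n\ndata: {"content":"!","stop":true}\n\n')
console.log(first.length, second.map(m => m.content).join(''))
```

In the loop, each decoded chunk would be passed to `feed` and the returned messages handled one by one. Decoding with a single `TextDecoder` and `{ stream: true }` instead of `Buffer.from(chunk).toString('utf8')` would also avoid splitting a multi-byte character across chunks.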
77 changes: 77 additions & 0 deletions examples/server/chat.sh
@@ -0,0 +1,77 @@
#!/bin/bash

API_URL="${API_URL:-http://127.0.0.1:8080}"

CHAT=(
    "Hello, Assistant."
    "Hello. How may I help you today?"
    "Please tell me the largest city in Europe."
    "Sure. The largest city in Europe is Moscow, the capital of Russia."
)

INSTRUCTION="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions."

trim() {
    shopt -s extglob
    set -- "${1##+([[:space:]])}"
    printf "%s" "${1%%+([[:space:]])}"
}

trim_trailing() {
    shopt -s extglob
    printf "%s" "${1%%+([[:space:]])}"
}

format_prompt() {
    echo -n "${INSTRUCTION}"
    printf "\n### Human: %s\n### Assistant: %s" "${CHAT[@]}" "$1"
}

tokenize() {
    curl \
        --silent \
        --request POST \
        --url "${API_URL}/tokenize" \
        --data-raw "$(jq -ns --arg content "$1" '{content:$content}')" \
    | jq '.tokens[]'
}

N_KEEP=$(tokenize "${INSTRUCTION}" | wc -l)

chat_completion() {
    PROMPT="$(trim_trailing "$(format_prompt "$1")")"
    DATA="$(echo -n "$PROMPT" | jq -Rs --argjson n_keep $N_KEEP '{
        prompt: .,
        temperature: 0.2,
        top_k: 40,
        top_p: 0.9,
        n_keep: $n_keep,
        n_predict: 256,
        stop: ["\n### Human:"],
        stream: true
    }')"

    ANSWER=''

    while IFS= read -r LINE; do
        if [[ $LINE = data:* ]]; then
            CONTENT="$(echo "${LINE:5}" | jq -r '.content')"
            printf "%s" "${CONTENT}"
            ANSWER+="${CONTENT}"
        fi
    done < <(curl \
        --silent \
        --no-buffer \
        --request POST \
        --url "${API_URL}/completion" \
        --data-raw "${DATA}")

    printf "\n"

    CHAT+=("$1" "$(trim "$ANSWER")")
}

while true; do
    read -r -e -p "> " QUESTION
    chat_completion "${QUESTION}"
done
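
`format_prompt` in chat.sh relies on bash `printf` reusing its format string until all arguments are consumed: the flat `CHAT` array is read as human/assistant pairs, and the final question is paired with an empty assistant slot because missing arguments are treated as empty strings. `trim_trailing` then removes the space left after the last `### Assistant:`. The same pairing can be sketched in JavaScript as follows; `formatPrompt` and its parameters are illustrative names and not part of this PR.

```javascript
// Illustrative sketch (hypothetical names, not part of the PR).
// turns is a flat array: [human1, assistant1, human2, assistant2, ...].
function formatPrompt(instruction, turns, question) {
    const items = [...turns, question]
    let prompt = instruction
    for (let i = 0; i < items.length; i += 2) {
        // A missing assistant reply becomes an empty string, as with bash printf.
        const assistant = items[i + 1] ?? ''
        prompt += `\n### Human: ${items[i]}\n### Assistant: ${assistant}`
    }
    // Mirror trim_trailing: drop trailing whitespace before sending.
    return prompt.trimEnd()
}

const prompt = formatPrompt('Be helpful.', ['Hi.', 'Hello.'], 'Why?')
console.log(prompt)
```

The result ends with `### Assistant:` and no trailing space, which is the form the shell script sends to `/completion`.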