json-schema-to-grammar improvements (+ added to server) #5978

Merged on Mar 21, 2024 (106 commits)
Changes from 67 commits
09248e0
json: fix arrays (disallow `[,1]`)
ochafik Mar 1, 2024
2d9580a
json: support tuple types (`[number, string]`)
ochafik Mar 1, 2024
3c339ce
json: support additionalProperties (`{[k: string]: [string,number][]}`)
ochafik Mar 1, 2024
1daaf30
json: support required / optional properties
ochafik Mar 1, 2024
1428a85
json: add support for pattern
ochafik Mar 1, 2024
12f0d7e
json: resolve $ref (and support https schema urls)
ochafik Mar 1, 2024
ea4244e
json: fix $ref resolution
ochafik Mar 1, 2024
82ade9f
join: support union types (mostly for nullable types I think)
ochafik Mar 1, 2024
f6f851b
json: support allOf + nested anyOf
ochafik Mar 1, 2024
bc0e0d9
json: support any (`{}` or `{type: object}`)
ochafik Mar 1, 2024
148555c
json: fix merge
ochafik Mar 1, 2024
ed24688
json: temp fix for escapes
ochafik Mar 1, 2024
c5bc154
json: spaces in output and unrestricted output spaces
ochafik Mar 1, 2024
5827ff4
json: add typings
ochafik Mar 1, 2024
be13247
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 3, 2024
06b04e9
json:fix typo
ochafik Mar 3, 2024
21ac451
Create ts-type-to-grammar.sh
ochafik Mar 3, 2024
a78eb4a
json: fix _format_literal (json.dumps already escapes quotes)
ochafik Mar 5, 2024
d5ef412
json: merge lit sequences and handle negatives
ochafik Mar 5, 2024
4e7c26c
json: handle pattern repetitions
ochafik Mar 5, 2024
660e832
Update json-schema-to-grammar.mjs
ochafik Mar 8, 2024
add8fee
Create regex-to-grammar.py
ochafik Mar 10, 2024
1cde8de
json: extract repeated regexp patterns to subrule
ochafik Mar 10, 2024
259f350
Update json-schema-to-grammar.py
ochafik Mar 10, 2024
b061de5
Update json-schema-to-grammar.py
ochafik Mar 10, 2024
ba57964
Update json-schema-to-grammar.py
ochafik Mar 10, 2024
f37ad0a
json: handle schema from pydantic Optional fields
ochafik Mar 10, 2024
307110a
Update json-schema-to-grammar.py
ochafik Mar 10, 2024
ee492c9
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 10, 2024
5764d9f
Update json-schema-to-grammar.py
ochafik Mar 10, 2024
364bf9e
Update ts-type-to-grammar.sh
ochafik Mar 10, 2024
8597caa
Update ts-type-to-grammar.sh
ochafik Mar 10, 2024
dab2ea9
json: simplify nullable fields handling
ochafik Mar 10, 2024
ade339d
json: accept duplicate identical rules
ochafik Mar 10, 2024
e8b78c2
json: revert space to 1 at most
ochafik Mar 10, 2024
37b59d1
json: reuse regexp pattern subrules
ochafik Mar 10, 2024
e8f25d6
json: handle uuid string format
ochafik Mar 10, 2024
54291e1
json: fix literal escapes
ochafik Mar 10, 2024
f57b467
json: add --allow-fetch
ochafik Mar 10, 2024
d1fda6f
json: simplify range escapes
ochafik Mar 10, 2024
478f62e
json: support negative ranges in patterns
ochafik Mar 10, 2024
27b1fef
Delete commit.txt
ochafik Mar 10, 2024
0e94941
json: custom regex parser, adds dot support & JS-portable
ochafik Mar 11, 2024
11813a6
json: rm trailing spaces
ochafik Mar 11, 2024
5389820
Update json-schema-to-grammar.mjs
ochafik Mar 11, 2024
4e2d06c
json: updated server & chat `( cd examples/server && ./deps.sh )`
ochafik Mar 11, 2024
c8254e5
json: port fixes from mjs to python
ochafik Mar 11, 2024
56b8744
Update ts-type-to-grammar.sh
ochafik Mar 11, 2024
d736e92
json: support prefixItems alongside array items
ochafik Mar 11, 2024
9a61802
json: add date format + fix uuid
ochafik Mar 11, 2024
e1ed7a0
json: add date, time, date-time formats
ochafik Mar 11, 2024
b816734
json: preserve order of props from TS defs
ochafik Mar 11, 2024
d0dd75c
json: port schema converter to C++, wire in ./server
ochafik Mar 11, 2024
51ca7cb
json: nits
ochafik Mar 11, 2024
cb364ef
Merge branch 'json-fixes' into json-fixes-cpp
ochafik Mar 11, 2024
d934adc
Update json-schema-to-grammar.cpp
ochafik Mar 11, 2024
8caaf16
Update json-schema-to-grammar.cpp
ochafik Mar 12, 2024
8fee84b
Update json-schema-to-grammar.cpp
ochafik Mar 12, 2024
0be059d
json: fix mjs implementation + align outputs
ochafik Mar 12, 2024
a740bfa
Update json-schema-to-grammar.mjs.hpp
ochafik Mar 12, 2024
192a58a
json: test C++, JS & Python versions
ochafik Mar 12, 2024
7e1440c
Merge branch 'json-fixes-cpp' into json-fixes
ochafik Mar 12, 2024
917b5d2
json: nits + regen deps
ochafik Mar 12, 2024
ee6166a
json: cleanup test
ochafik Mar 12, 2024
6165c55
json: revert from c++17 to 11
ochafik Mar 12, 2024
bed826f
json: nit fixes
ochafik Mar 12, 2024
59c899d
json: dirty include for test
ochafik Mar 12, 2024
3feac66
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 14, 2024
f216550
json: fix zig build
ochafik Mar 14, 2024
5a7deb2
json: pass static command to std::system in tests (fixed temp files)
ochafik Mar 15, 2024
3b3ad94
json: fix top-level $refs
ochafik Mar 15, 2024
235ff68
json: don't use c++20 designated initializers
ochafik Mar 15, 2024
daceced
nit
ochafik Mar 15, 2024
5714487
json: basic support for reserved names `{number:{number:{root:number}}}`
ochafik Mar 15, 2024
af31aa2
Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
ochafik Mar 16, 2024
842eb83
json: re-ran server deps.sh
ochafik Mar 16, 2024
5602a8b
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 16, 2024
f30d6c2
json: simplify test
ochafik Mar 16, 2024
391b17e
json: support mix of additional props & required/optional
ochafik Mar 16, 2024
64799ba
json: add tests for some expected failures
ochafik Mar 17, 2024
5c50ffa
json: fix type=const in c++, add failure expectations for non-str con…
ochafik Mar 17, 2024
84e383c
json: test (& simplify output of) empty schema
ochafik Mar 17, 2024
3e1bf44
json: check parsing in test + fix value & string refs
ochafik Mar 17, 2024
edbd2e9
json: add server tests for OAI JSON response_format
ochafik Mar 17, 2024
20869ed
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 17, 2024
6182478
json: test/fix top-level anyOf
ochafik Mar 18, 2024
bbd7080
json: improve grammar parsing failures
ochafik Mar 18, 2024
dd922a4
json: test/fix additional props corner cases
ochafik Mar 18, 2024
24f0b94
json: fix string patterns (was missing quotes)
ochafik Mar 18, 2024
bd96df4
json: ws nit
ochafik Mar 18, 2024
05fd7e3
json: fix json handling in server when there's no response_format
ochafik Mar 18, 2024
e7de643
json: catch schema conversion errors in server
ochafik Mar 19, 2024
02e3bde
json: don't complain about unknown format type in server if unset
ochafik Mar 19, 2024
263a86e
json: cleaner build of test
ochafik Mar 19, 2024
874599e
json: create examples/json-schema-pydantic-example.py
ochafik Mar 19, 2024
7fc759b
json: fix date pattern
ochafik Mar 19, 2024
7628bd8
json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
ochafik Mar 20, 2024
10ee30f
json: indent 4 spaces
ochafik Mar 20, 2024
6dcf856
Merge remote-tracking branch 'origin/master' into json-fixes
ochafik Mar 20, 2024
df00efb
json: fix naming of top-level c++ function (+ drop unused one)
ochafik Mar 20, 2024
d0600d9
json: avoid using namespace std
ochafik Mar 20, 2024
9260350
json: fix zig build
ochafik Mar 20, 2024
b8c0025
Update server.feature
ochafik Mar 20, 2024
ad6c475
json: iostream -> fprintf
ochafik Mar 21, 2024
c26e7b8
json: space before & refs for consistency
ochafik Mar 21, 2024
4c46aec
json: nits
ochafik Mar 21, 2024
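
Several of the commit titles above name JSON Schema features: tuple types, required and optional properties, `pattern`, the `uuid` and `date` string formats, and nullable unions. As a rough illustration only, the sketch below builds one input schema that touches each of them. The field names are hypothetical, and whether the converter accepts this exact combination is an assumption rather than something the commit list confirms.

```python
import json

# Hypothetical schema for illustration; none of these field names come from the PR.
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "created": {"type": "string", "format": "date"},
        "code": {"type": "string", "pattern": "^[A-Z]{3}-[0-9]+$"},
        # Tuple type: a number followed by a string.
        "point": {
            "type": "array",
            "prefixItems": [{"type": "number"}, {"type": "string"}],
        },
        # Nullable field expressed as a union.
        "note": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "required": ["id", "point"],
    "additionalProperties": False,
}


def optional_properties(s):
    """Return property names not listed under "required", in declaration order."""
    required = set(s.get("required", []))
    return [name for name in s.get("properties", {}) if name not in required]


if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
```

The printed JSON is the kind of document a schema-to-grammar converter takes as input; `optional_properties` only makes the required/optional split visible.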
14 changes: 12 additions & 2 deletions Makefile
@@ -9,7 +9,8 @@ TEST_TARGETS = \
tests/test-llama-grammar tests/test-grammar-parser tests/test-double-float tests/test-grad0 tests/test-opt \
tests/test-quantize-fns tests/test-quantize-perf tests/test-sampling tests/test-tokenizer-0-llama \
tests/test-tokenizer-0-falcon tests/test-tokenizer-1-llama tests/test-tokenizer-1-bpe tests/test-rope \
- tests/test-backend-ops tests/test-model-load-cancel tests/test-autorelease
+ tests/test-backend-ops tests/test-model-load-cancel tests/test-autorelease \
+ tests/test-json-schema-to-grammar

# Code coverage output files
COV_TARGETS = *.gcno tests/*.gcno *.gcda tests/*.gcda *.gcov tests/*.gcov lcov-report gcovr-report
@@ -653,6 +654,11 @@ console.o: common/console.cpp common/console.h
grammar-parser.o: common/grammar-parser.cpp common/grammar-parser.h
$(CXX) $(CXXFLAGS) -c $< -o $@

+ json-schema-to-grammar.o: examples/server/json-schema-to-grammar.cpp examples/server/json-schema-to-grammar.h
+ $(CXX) $(CXXFLAGS) -c $< -o $@
+ # $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<) -DLLAMA_BUILD_JSON_SCHEMA_CONVERTER=1
+ # $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

train.o: common/train.cpp common/train.h
$(CXX) $(CXXFLAGS) -c $< -o $@

@@ -728,7 +734,7 @@ save-load-state: examples/save-load-state/save-load-state.cpp ggml.o llama.o $(C
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

- server: examples/server/server.cpp examples/server/utils.hpp examples/server/httplib.h examples/server/json.hpp examples/server/index.html.hpp examples/server/index.js.hpp examples/server/completion.js.hpp common/stb_image.h ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS)
+ server: examples/server/server.cpp examples/server/utils.hpp examples/server/httplib.h examples/server/json.hpp examples/server/index.html.hpp examples/server/index.js.hpp examples/server/completion.js.hpp json-schema-to-grammar.o common/stb_image.h ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h %.hpp $<,$^) -Iexamples/server $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS) $(LWINSOCK2)

@@ -844,6 +850,10 @@ tests/test-double-float: tests/test-double-float.cpp ggml.o $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

+ tests/test-json-schema-to-grammar: tests/test-json-schema-to-grammar.cpp
+ $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
+ $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

tests/test-grad0: tests/test-grad0.cpp ggml.o $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
482 changes: 432 additions & 50 deletions examples/json-schema-to-grammar.py

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions examples/regex-to-grammar.py
@@ -0,0 +1,19 @@
import json, subprocess, sys, os

assert len(sys.argv) >= 2
[_, pattern, *rest] = sys.argv

print(subprocess.check_output(
    [
        "python",
        os.path.join(
            os.path.dirname(os.path.realpath(__file__)),
            "json-schema-to-grammar.py"),
        *rest,
        "-",
    ],
    text=True,
    input=json.dumps({
        "type": "string",
        "pattern": pattern,
    }, indent=2)))
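
The wrapper above wraps the regular expression in a single string schema and feeds it to `json-schema-to-grammar.py` on stdin (the `-` argument). A minimal sketch of just the schema-building step follows; `pattern_to_schema` is a hypothetical helper name, since the script builds the same dictionary inline.

```python
import json


def pattern_to_schema(pattern: str) -> str:
    """Serialize a string schema whose only constraint is `pattern`.

    Hypothetical helper for illustration; not a function from the PR.
    """
    return json.dumps({"type": "string", "pattern": pattern}, indent=2)


if __name__ == "__main__":
    # Backslashes in the regex are escaped by json.dumps and restored on load.
    print(pattern_to_schema(r"^a\d+$"))
```

Because `json.dumps` handles the escaping, the pattern can be passed through from the command line unchanged.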
9 changes: 8 additions & 1 deletion examples/server/CMakeLists.txt
@@ -2,7 +2,14 @@ set(TARGET server)
option(LLAMA_SERVER_VERBOSE "Build verbose logging option for Server" ON)
option(LLAMA_SERVER_SSL "Build SSL support for the server" OFF)
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
- add_executable(${TARGET} server.cpp utils.hpp json.hpp httplib.h)
+ add_executable(${TARGET}
+     server.cpp
+     utils.hpp
+     json.hpp
+     httplib.h
+     json-schema-to-grammar.cpp
+     json-schema-to-grammar.h
+ )
install(TARGETS ${TARGET} RUNTIME)
target_compile_definitions(${TARGET} PRIVATE
SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
5 changes: 3 additions & 2 deletions examples/server/chat.mjs
@@ -26,8 +26,9 @@ const propOrder = grammarJsonSchemaPropOrder

let grammar = null
if (grammarJsonSchemaFile) {
- const schema = JSON.parse(readFileSync(grammarJsonSchemaFile, 'utf-8'))
- const converter = new SchemaConverter(propOrder)
+ let schema = JSON.parse(readFileSync(grammarJsonSchemaFile, 'utf-8'))
+ const converter = new SchemaConverter({prop_order: propOrder, allow_fetch: true})
+ schema = await converter.resolveRefs(schema, grammarJsonSchemaFile)
converter.visit(schema, '')
grammar = converter.formatGrammar()
}
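
In this hunk, `resolveRefs` runs before `visit`, so `$ref` entries are dealt with ahead of grammar generation. As a generic illustration of what resolving a local reference involves, the Python sketch below inlines `#/...` JSON pointers. It is not the PR's implementation: the function names are hypothetical, remote (https) references are not handled, and inlining is an assumed strategy that may differ from how the converter tracks references internally.

```python
def resolve_pointer(root, ref):
    """Follow a local JSON pointer such as "#/definitions/foo" from `root`."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    node = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def inline_local_refs(node, root):
    """Return a copy of `node` with every local $ref replaced by its target.

    Sibling keys next to "$ref" are dropped, and recursive schemas would
    recurse forever; both are simplifications of this sketch.
    """
    if isinstance(node, dict):
        if "$ref" in node:
            return inline_local_refs(resolve_pointer(root, node["$ref"]), root)
        return {key: inline_local_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_local_refs(value, root) for value in node]
    return node


if __name__ == "__main__":
    schema = {
        "definitions": {"n": {"type": "number"}},
        "type": "array",
        "items": {"$ref": "#/definitions/n"},
    }
    print(inline_local_refs(schema, schema)["items"])
```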
4,532 changes: 2,281 additions & 2,251 deletions examples/server/index.html.hpp

Large diffs are not rendered by default.