
common: llama_load_model_from_url using --model-url #6098

Merged
merged 53 commits into master from hp/download-model-from-hf on Mar 17, 2024

Conversation

@phymbert (Collaborator) commented Mar 16, 2024

Motivation

Since GGUF is officially supported on Hugging Face, it can be useful to start the CLI directly from an HTTP URL.

We considered this at some point in #4735, but it raised a number of security concerns because the core library would have to execute commands.
So we settled on #5501, which should be good enough for most cases, but it cannot work when no shell is available (for example, in a Docker distroless container).

Changes

  • llama_load_model_from_url first downloads the file if it does not exist locally or the remote is newer, then calls llama_load_model_from_file in common (a sketch of this flow follows the list below)
  • introduce --model-url, which triggers the download to the --model path in llama_init_from_gpt_params
  • enable this feature with -DLLAMA_CURL=ON in the cmake and make toolchains
  • server tests use this parameter in embeddings.feature
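As a rough illustration of that flow, here is a minimal C++ sketch using libcurl. The helper name download_if_missing and the simplified error handling are assumptions made for this sketch, not the exact code merged into common/common.cpp:

#include <curl/curl.h>
#include <cstdio>
#include <sys/stat.h>

static size_t write_data(void * ptr, size_t size, size_t nmemb, void * stream) {
    // stream is the FILE * opened for the target model path
    return fwrite(ptr, size, nmemb, (FILE *) stream);
}

// Hypothetical helper: fetch `url` into `path` unless the file already exists.
// The real implementation also checks ETag/Last-Modified freshness (see the task list below).
static bool download_if_missing(const char * url, const char * path) {
    struct stat st;
    if (stat(path, &st) == 0) {
        return true; // already present locally
    }
    CURL * curl = curl_easy_init();
    if (curl == nullptr) {
        return false;
    }
    FILE * out = fopen(path, "wb");
    if (out == nullptr) {
        curl_easy_cleanup(curl);
        return false;
    }
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // Hugging Face redirects to a CDN
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
    const CURLcode res = curl_easy_perform(curl);
    fclose(out);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

With a helper along these lines, llama_load_model_from_url reduces to the download step followed by the existing llama_load_model_from_file call.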

Attention points

  • common will be dynamically linked to libcurl.
  • It can later be moved to the llama API if there is positive feedback from the community.
  • It saves ${model_path}.etag or ${model_path}.lastModified files alongside the model_path.
  • It is not possible to pass a custom CA; system CAs are always used in this first implementation, on both POSIX and Windows platforms.

Task

  • download is triggered only if the file has changed, based on the ETag and Last-Modified HTTP headers (see the sketch after this list)
  • CI build is passing
  • Server CI build is passing
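For the first task, a plausible shape of the ETag gating in C++ with libcurl is sketched below. The sidecar file layout (${model_path}.etag) follows the attention points above; the helper names and the HEAD-request approach are assumptions of this sketch:

#include <curl/curl.h>
#include <fstream>
#include <string>

// Read the cached ETag saved next to the model file; empty if absent.
static std::string read_cached_etag(const std::string & model_path) {
    std::ifstream f(model_path + ".etag");
    std::string etag;
    std::getline(f, etag);
    return etag;
}

// Hypothetical helper: true if the remote file differs from the cached local copy.
static bool remote_is_newer(const std::string & url, const std::string & model_path) {
    CURL * curl = curl_easy_init();
    if (curl == nullptr) {
        return true; // cannot check; assume a download is needed
    }
    struct curl_slist * headers = nullptr;
    const std::string etag = read_cached_etag(model_path);
    if (!etag.empty()) {
        headers = curl_slist_append(headers, ("If-None-Match: " + etag).c_str());
    }
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); // HEAD request: headers only
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    long status = 0;
    if (curl_easy_perform(curl) == CURLE_OK) {
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    }
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return status != 304; // 304 Not Modified: the cached file is up to date
}

After a successful download, the new ETag (and/or Last-Modified value) would be written back to the sidecar file so the next run can skip the transfer.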

References

@phymbert phymbert added demo Demonstrate some concept or idea, not intended to be merged need feedback Testing and feedback with results are needed labels Mar 16, 2024
@phymbert phymbert requested a review from ggerganov March 16, 2024 11:35
@phymbert (Collaborator, Author)

@ggerganov Georgi, if you approve the proposal with the libcurl dependency, I can continue further to support ETag headers to avoid downloading the file each time.

@ggerganov (Owner) left a comment

Yes, this can work. The curl dependency is optional and the implementation is isolated in common. Looks great.

@Artefact2 (Collaborator)

Supporting split GGUF files (for models above 50GB) would be nice too. Ideally you'd merge files with copy_file_range() when available to avoid extra disk I/O.
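
To make the copy_file_range() suggestion concrete, here is a Linux-specific C++ sketch of appending one split onto an already-open output descriptor; the helper name is hypothetical:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Append the file at `src_path` to the end of `dst_fd`. copy_file_range()
// lets the kernel move the bytes without going through a userspace buffer.
static bool append_split(int dst_fd, const char * src_path) {
    const int src_fd = open(src_path, O_RDONLY);
    if (src_fd < 0) {
        return false;
    }
    struct stat st;
    if (fstat(src_fd, &st) != 0) {
        close(src_fd);
        return false;
    }
    off_t remaining = st.st_size;
    while (remaining > 0) {
        // Null offsets: the kernel advances both file positions itself.
        const ssize_t n = copy_file_range(src_fd, nullptr, dst_fd, nullptr, remaining, 0);
        if (n <= 0) {
            close(src_fd);
            return false; // real code would fall back to read()/write() here
        }
        remaining -= n;
    }
    close(src_fd);
    return true;
}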

@phymbert (Collaborator, Author)

Supporting split GGUF files (for models above 50GB) would be nice too. Ideally you'd merge files with copy_file_range() when available to avoid extra disk I/O.

Yes, I am working on this in a separate branch:

@phymbert (Collaborator, Author)

Can someone help with the error in the Windows build?

D:\a\llama.cpp\llama.cpp\common\common.cpp(906,52): error C1061: compiler limit: blocks nested too deeply [D:\a\llama.cpp\llama.cpp\build\common\common.vcxproj]

Also @ggerganov, please double-check my Makefile changes, as I am not used to them.

@dranger003 (Contributor)

Can someone help with the error in the Windows build?

D:\a\llama.cpp\llama.cpp\common\common.cpp(906,52): error C1061: compiler limit: blocks nested too deeply [D:\a\llama.cpp\llama.cpp\build\common\common.vcxproj]

Also @ggerganov, please double-check my Makefile changes, as I am not used to them.

I submitted PR #6101 to fix this issue.
If you are in a hurry, you can try this workaround: #6096
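
For context, MSVC's C1061 fires when block nesting exceeds the compiler's limit (roughly 128 levels), and a long else-if chain can hit it because each else if compiles as else { if (...) { ... } }. A generic illustration of the flattening pattern, not the actual change in #6101:

#include <string>

// A flat helper with early returns keeps the nesting depth constant
// no matter how many options are added.
static bool parse_one_arg(const std::string & arg) {
    if (arg == "-m")          { /* handle -m */          return true; }
    if (arg == "--model-url") { /* handle --model-url */ return true; }
    return false; // unknown argument
}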

@phymbert (Collaborator, Author)

I submitted PR #6101 to fix this issue. If you are in a hurry, you can try this workaround: #6096

Thanks a lot. No, let's wait for your PR to be merged first.

@phymbert phymbert changed the title proposal: common: llama_load_model_from_url common: llama_load_model_from_url Mar 16, 2024
@phymbert phymbert removed demo Demonstrate some concept or idea, not intended to be merged need feedback Testing and feedback with results are needed labels Mar 16, 2024
@phymbert phymbert changed the title common: llama_load_model_from_url common: llama_load_model_from_url using --model-url Mar 16, 2024
@phymbert (Collaborator, Author)

Need to fix the Windows server CI tests; they are not passing at the moment.

@phymbert (Collaborator, Author)

It finally works in both the Windows and Linux CI tests. Happy to merge when CI passes.

@phymbert (Collaborator, Author)

@ggerganov Georgi, I suffered a little on Windows, but this implementation works on both platforms. Would you please do another review, as I changed the logic a little? Sorry, I still need to find a good linter for CLion.

@phymbert phymbert merged commit d01b3c4 into master Mar 17, 2024
54 of 63 checks passed
@phymbert phymbert deleted the hp/download-model-from-hf branch March 17, 2024 18:12
Comment on lines +79 to +83
try:
    psutil.Process(pid).kill()
except psutil.NoSuchProcess:
    return False
return True
(Collaborator)

@phymbert This calls TerminateProcess(handle, SIGTERM) on Windows and os.kill(pid, signal.SIGKILL) on Unix. os.kill(pid, signal.SIGTERM) also calls TerminateProcess on Windows. psutil really seems like overkill here.

@phymbert (Collaborator, Author)

Again, feel free to open a PR.

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Successfully merging this pull request may close these issues.

Download model gguf file from hf