
Enhance autogptq backend to support VL models #1860

Merged · 7 commits · Mar 26, 2024
Conversation

@thiner (Contributor) commented Mar 19, 2024

Description

Because its prompt template is incompatible with the current prompt-processing flow, I removed support for internlm/internlm-xcomposer2-vl-7b-4bit in this PR. I may add that support in a separate PR.

This PR fixes #1812 and enhances the AutoGPTQ backend to support the Qwen/Qwen-VL-Chat-Int4 and internlm/internlm-xcomposer2-vl-7b-4bit models.
The AutoGPTQ version is updated to v0.7.1, and the use_marlin flag is enabled by default for internlm/internlm-xcomposer2-vl-7b-4bit. Please make sure you have the correct model in place. Ref: https://github.com/IST-DASLab/marlin

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

netlify bot commented Mar 19, 2024

Deploy Preview for localai canceled.

🔨 Latest commit: afa16d3
🔍 Latest deploy log: https://app.netlify.com/sites/localai/deploys/6602b29e2761660008b670c6

@thiner (Contributor, author) commented Mar 19, 2024

I have no idea what the difference is between autogptq.yml and the transformers-*.yml files in the common-env folder. If autogptq shares the same dependencies as transformers, does that mean there is no need to create a separate autogptq conda env?

@thiner (Contributor, author) left a review comment on backend/python/autogptq/autogptq.yml (resolved):

Just trying to remove the "pending" status.
@thiner (Contributor, author) commented Mar 20, 2024

How do I retrigger the test build?

@thiner (Contributor, author) commented Mar 20, 2024

@mudler The code snippet below shows that images in the payload are converted to base64 StringImages. I think that is why calling the models kept failing.

```go
if pp.Type == "image_url" {
	// Detect if pp.ImageURL is a URL; if it is, download the image and encode it in base64:
	base64, err := getBase64Image(pp.ImageURL.URL)
	if err == nil {
		input.Messages[i].StringImages = append(input.Messages[i].StringImages, base64) // TODO: make sure that we only return base64 stuff
		// set a placeholder for each image
		input.Messages[i].StringContent = fmt.Sprintf("[img-%d]", index) + input.Messages[i].StringContent
		index++
	}
}
```

I found a relevant question in the Qwen-VL issues; this answer, QwenLM/Qwen-VL#112 (comment), reveals that Qwen-VL doesn't support base64 as image input. It supports only a URL or an image file path.
To solve the problem, I think we have two options:

  1. Refactor request.go to pass image_url to the template directly. This solution requires updating the template evaluation engine to allow referencing the image_url and text from the API payload.
  2. Keep the request.go logic as is, but convert the base64 string back into an image in autogptq.py and save it to a temporary folder. Then we can pass the image file path to the model. But the question is: how do we get the StringImages value in autogptq.py? A rough sketch of this option follows below.
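For illustration, option 2 could look roughly like this on the Python side (a minimal sketch; the helper name save_base64_image and the /tmp location are assumptions, not existing LocalAI code):

```python
import base64
import tempfile

def save_base64_image(b64_string: str) -> str:
    """Decode a base64-encoded image and write it to a temporary file.

    Returns the file path, which could then be embedded in the prompt,
    e.g. inside Qwen-VL's <img>...</img> tag.
    """
    data = base64.b64decode(b64_string)
    # delete=False so the model can read the file later; the caller is
    # responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False, dir="/tmp") as f:
        f.write(data)
        return f.name
```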

Please share your thoughts.

@mudler (Owner) commented Mar 21, 2024

> @mudler The code snippet below shows that images in the payload are converted to base64 StringImages. I think that is why calling the models kept failing.
>
> *(Go snippet from request.go quoted above.)*
>
> I found a relevant question in the Qwen-VL issues; this answer, QwenLM/Qwen-VL#112 (comment), reveals that Qwen-VL doesn't support base64 as image input. It supports only a URL or an image file path. To solve the problem, I think we have two options:
>
> 1. Refactor `request.go` to pass `image_url` to the template directly. This solution requires updating the template evaluation engine to allow referencing the `image_url` and `text` from the API payload.
>
> 2. Keep the `request.go` logic as is, but convert the base64 string back into an image in `autogptq.py` and save it to a temporary folder. Then we can pass the image file path to the model. But the question is: how do we get the StringImages value in `autogptq.py`?
>
> Please share your thoughts.

Option 1 would be what I'd try first; however, it might be more complex to achieve. You need to be careful to keep the logic consistent between the golang API and the llama.cpp grpc-server. If you are not much into putting your hands on C++, that might be harder to explore.

Option 2 sounds easier to do. In the llama.cpp backend we do:

```cpp
for (int i = 0; i < predict->images_size(); i++) {
    data["image_data"].push_back(json
        {
            {"id", i},
            {"data", predict->images(i)},
        });
}
```

because `Images` inside the request is an array, and each entry contains a base64-encoded image (see here for the golang counterpart). Each `[img-0]`, `[img-1]` placeholder is the index the backend uses to retrieve the corresponding image.
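In a Python backend, the analogous placeholder-to-image mapping might look like this (a sketch under assumptions: the request carries a repeated base64 `Images` field, the prompt contains `[img-N]` placeholders, and `resolve_image_placeholders` is a hypothetical helper, not LocalAI code):

```python
import re

def resolve_image_placeholders(prompt: str, images: list[str]) -> list[tuple[int, str]]:
    """Collect (index, base64_data) pairs for every [img-N] placeholder
    found in the prompt, mirroring the C++ loop above."""
    image_data = []
    for match in re.finditer(r"\[img-(\d+)\]", prompt):
        i = int(match.group(1))
        if i < len(images):  # guard against out-of-range placeholders
            image_data.append((i, images[i]))
    return image_data
```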

@mudler added the `enhancement` (New feature or request) label on Mar 25, 2024
@thiner (Contributor, author) commented Mar 26, 2024

@mudler I have finally implemented this feature for the Qwen-VL model. Please review.

I tested this feature with the configuration below.

  1. Model: Qwen/Qwen-VL-Chat-Int4
  2. The model configuration, qwen-vl.yaml:
```yaml
# Model name.
# The model name is used to identify the model in the API calls.
- name: gpt-4-vision-preview
  # Default model parameters.
  # These options can also be specified in the API calls
  parameters:
    model: /opt/models/qwen-vl-chat-int4
    temperature: 0.7
    top_k: 85
    top_p: 0.7

  # Default context size
  context_size: 4096
  # Default number of threads
  threads: 16
  backend: autogptq

  # define chat roles
  roles:
    user: "user:"
    assistant: "assistant:"
    system: "system:"
  template:
    chat: &template |
      {{.Input}}
    completion: *template
  # Enable F16 if backend supports it
  f16: true
  embeddings: false
  # Enable debugging
  debug: true

  # GPU Layers (only used when built with cublas)
  gpu_layers: -1

  # Diffusers/transformers
  cuda: true
```
  3. The gpt-vision request:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe the image?"},
        {"type": "image_url", "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}}
      ]
    }
  ],
  "temperature": 0.2
}'
```
  4. The output:

```json
{
    "created": 1711430450,
    "object": "chat.completion",
    "id": "23add6fd-009e-43b4-a361-341cb2386672",
    "model": "gpt-4-vision-preview",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "A woman sitting on a beach with a dog and giving it a high five."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0
    }
}
```

@mudler (Owner) left a comment

Looking good here, thanks @thiner!

@mudler merged commit b7ffe66 into mudler:master on Mar 26, 2024 (6 of 21 checks passed).

@mudler mentioned this pull request on Mar 26, 2024.
@thiner (Contributor, author) commented Apr 3, 2024

@mudler I just found that this feature doesn't work when the conversation has multiple rounds. Below is the error message:

```
[10.70.2.129]:50410 200 - GET /readyz
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr generated_text: Instruct: user:g'day
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr assistant:G'day! How can I assist you today?
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr user:who are you
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr assistant:I am Buddy, your AI assistant, designed to provide assistance, answer questions, and engage in conversations on a wide range of topics. How can I help you today?
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr user:<img>/tmp/vl-1712132673400.jpg</img>,describe the image
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr Output:
8:24AM DBG GRPC(/opt/models/llava/qwen-vl-chat-int4-127.0.0.1:37607): stderr A large building with a lot of windows next to a body of water.

Error rpc error: code = Unknown desc = Exception iterating responses: 'Result' object is not an iterator
```

This is the relevant code snippet:

```python
t = pipeline(compiled_prompt)[0]["generated_text"]
print(f"generated_text: {t}", file=sys.stderr)

if compiled_prompt in t:
    t = t.replace(compiled_prompt, "")

# housekeeping: remove the image files from the /tmp folder
for img_path in prompt_images[1]:
    try:
        os.remove(img_path)
    except Exception as e:
        print(f"Error removing image file: {img_path}, {e}", file=sys.stderr)

return backend_pb2.Result(message=bytes(t, encoding='utf-8'))
```

According to the code structure and the error message, I suspect the error was thrown from the return statement. Maybe something is wrong with how the result is packed into protobuf?
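For context: "Exception iterating responses: '...' object is not an iterator" is the error grpcio raises when a response-streaming handler returns a single message instead of yielding messages. A minimal sketch of the distinction (the servicer shape is illustrative, patterned on the snippet above, not the actual LocalAI backend code):

```python
import backend_pb2  # generated protobuf module, as in the snippet above

class BackendServicer:
    def Predict(self, request, context):
        # Unary RPC: returning one message is correct.
        return backend_pb2.Result(message=b"full reply")

    def PredictStream(self, request, context):
        # Server-streaming RPC: gRPC iterates over the return value, so the
        # handler must be a generator. A plain `return backend_pb2.Result(...)`
        # here produces exactly "'Result' object is not an iterator".
        for chunk in (b"partial ", b"reply"):
            yield backend_pb2.Result(message=chunk)
```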

truecharts-admin added a commit to truecharts/charts that referenced this pull request on Apr 9, 2024:

…2.1 by renovate (#20490)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0-cublas-cuda11-ffmpeg-core` -> `v2.12.1-cublas-cuda11-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0-cublas-cuda11-core` -> `v2.12.1-cublas-cuda11-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0-cublas-cuda12-ffmpeg-core` -> `v2.12.1-cublas-cuda12-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0-cublas-cuda12-core` -> `v2.12.1-cublas-cuda12-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0-ffmpeg-core` -> `v2.12.1-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.11.0` -> `v2.12.1` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.12.1`](https://togithub.com/mudler/LocalAI/releases/tag/v2.12.1)

[Compare Source](https://togithub.com/mudler/LocalAI/compare/v2.12.0...v2.12.1)

I'm happy to announce the v2.12.1 LocalAI release is out!

##### 🌠 Landing page and Swagger

Ever wondered what to do after LocalAI is up and running? Integration with a simple web interface has been started, and you can now see a landing page when hitting the LocalAI front page:

![Screenshot from 2024-04-07 14-43-26](https://togithub.com/mudler/LocalAI/assets/2420543/e7aea8de-4385-45ae-b52e-db8154495493)

You can also now enjoy Swagger to try out the API calls directly:

![swagger](https://togithub.com/mudler/LocalAI/assets/2420543/6405ab11-2908-45ff-b635-38e4456251d6)

##### 🌈 AIO images changes

The default model for CPU images is now https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF, pre-configured for functions and tools API support!
If you are an Intel-GPU owner, the Intel profile for AIO images is now available too!

##### 🚀 OpenVINO and transformers enhancements

There is now support for OpenVINO, and transformers got token streaming support, thanks to [@fakezeta](https://togithub.com/fakezeta)!

To try OpenVINO, you can use the example available in the documentation: https://localai.io/features/text-generation/#examples

##### 🎈 Lots of small improvements behind the scenes!

Thanks to our outstanding community, we have enhanced several areas:

- The build time of LocalAI was sped up significantly, thanks to [@cryptk](https://togithub.com/cryptk) for the efforts in enhancing the build system
- [@thiner](https://togithub.com/thiner) worked hard to get Vision support for AutoGPTQ
- ... and much more! See below for a full list, and be sure to star LocalAI and give it a try!

##### 📣 Spread the word!

First off, a massive thank you (again!) to each and every one of you who've chipped in to squash bugs and suggest cool new features for LocalAI. Your help, kind words, and brilliant ideas are truly appreciated - more than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing. We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate sponsors behind it. It's all us, folks. So, if you've found value in what we're building together and want to keep the momentum going, consider showing your support. A little shoutout on your favorite social platforms using @LocalAI_OSS and @mudler_it or joining our sponsors can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and here's to more exciting times ahead with LocalAI!

##### What's Changed

##### Bug fixes 🐛

- fix: downgrade torch by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1902
- fix(aio): correctly detect intel systems by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1931
- fix(swagger): do not specify a host by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1930
- fix(tools): correctly render tools response in templates by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1932
- fix(grammar): respect JSONmode and grammar from user input by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1935
- fix(hermes-2-pro-mistral): add stopword for toolcall by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1939
- fix(functions): respect when selected from string by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1940
- fix: use exec in entrypoint scripts to fix signal handling by [@cryptk](https://togithub.com/cryptk) in mudler/LocalAI#1943
- fix(hermes-2-pro-mistral): correct stopwords by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1947
- fix(welcome): stable model list by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1949
- fix(ci): manually tag latest images by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1948
- fix(seed): generate random seed per-request if -1 is set by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1952
- fix regression [#1971](https://togithub.com/mudler/LocalAI/issues/1971) by [@fakezeta](https://togithub.com/fakezeta) in mudler/LocalAI#1972

##### Exciting New Features 🎉

- feat(aio): add intel profile by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1901
- Enhance autogptq backend to support VL models by [@thiner](https://togithub.com/thiner) in mudler/LocalAI#1860
- feat(assistant): Assistant and AssistantFiles api by [@christ66](https://togithub.com/christ66) in mudler/LocalAI#1803
- feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA by [@fakezeta](https://togithub.com/fakezeta) in mudler/LocalAI#1892
- feat: Token Stream support for Transformer, fix: missing package for OpenVINO by [@fakezeta](https://togithub.com/fakezeta) in mudler/LocalAI#1908
- feat(welcome): add simple welcome page by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1912
- fix(build): better CI logging and correct some build failure modes in Makefile by [@cryptk](https://togithub.com/cryptk) in mudler/LocalAI#1899
- feat(webui): add partials, show backends associated to models by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1922
- feat(swagger): Add swagger API doc by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1926
- feat(build): adjust number of parallel make jobs by [@cryptk](https://togithub.com/cryptk) in mudler/LocalAI#1915
- feat(swagger): update by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1929
- feat: first pass at improving logging by [@cryptk](https://togithub.com/cryptk) in mudler/LocalAI#1956
- fix(llama.cpp): set better defaults for llama.cpp by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1961

##### 📖 Documentation and examples

- docs(aio-usage): update docs to show examples by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1921

##### 👒 Dependencies

- ⬆️ Update docs version mudler/LocalAI by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1903
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1904
- ⬆️ Update M0Rf30/go-tiny-dream by [@M0Rf30](https://togithub.com/M0Rf30) in mudler/LocalAI#1911
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1913
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1914
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1923
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1924
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1928
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1933
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1934
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1937
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1941
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1953
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1958
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1959
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1964

##### Other Changes

- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1927
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1960
- fix(hermes-2-pro-mistral): correct dashes in template to suppress newlines by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1966
- ⬆️ Update ggerganov/whisper.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1969
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1970
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1973

##### New Contributors

- [@thiner](https://togithub.com/thiner) made their first contribution in mudler/LocalAI#1860

**Full Changelog**: mudler/LocalAI@v2.11.0...v2.12.1

### [`v2.12.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.12.0)

[Compare Source](https://togithub.com/mudler/LocalAI/compare/v2.11.0...v2.12.0)

The release notes for v2.12.0 are identical to the v2.12.1 notes above.

**Full Changelog**: mudler/LocalAI@v2.11.0...v2.12.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).

Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close this issue: AutoGPTQ backend can't load local model files
2 participants