
Support for sending images into OpenAI chat API #4827

Merged
merged 8 commits into oobabooga:dev from kabachuha:mm-api on Dec 23, 2023

Conversation

kabachuha
Contributor

This PR handles 'image_url's (base64 data URIs or remote files) supplied in the message history when a user wants to use GPT-Vision-like features, converting them to the base64 HTML img tags supported by the multimodal extension.

closes #4603
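
Conceptually, the conversion looks something like the sketch below (this is not the PR's actual code; the helper name and the JPEG MIME assumption are illustrative):

import base64

import requests


def image_url_to_img_tag(image_url: str) -> str:
    # Illustrative helper, not the PR's real function name.
    # Data URIs are embedded as-is; remote files are downloaded and
    # re-encoded so the multimodal extension sees a base64 <img> tag.
    if image_url.startswith("data:"):
        return f'<img src="{image_url}">'
    img_bytes = requests.get(image_url, timeout=30).content
    img_base64 = base64.b64encode(img_bytes).decode("utf-8")
    # Assumes JPEG; the real code may detect the content type instead.
    return f'<img src="data:image/jpeg;base64,{img_base64}">'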

@oobabooga
Owner

Could you create a simple curl command with an example for testing purposes?

@kabachuha
Contributor Author

Of course. I'm planning to test this on Friday; Thursday is a busy day for me.

@kabachuha
Contributor Author

kabachuha commented Dec 8, 2023

Works! (Note: my API port is 5001, for compatibility.)

curl http://127.0.0.1:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "image_url": "https://avatars.githubusercontent.com/u/112222186?v=4"
      },
      {
        "role": "user",
        "content": "What is unusual about this image?"
      }
    ],
    "mode": "chat",
    "character": "Example"
  }'

Raw response:

{"id":"chatcmpl-1702041177702761984","object":"chat.completions","created":1702041177,"model":"TheBloke_llava-v1.5-13B-GPTQ","choices":[{"index":0,"finish_reason":"length","message":{"role":"assistant","content":"Well, for one thing there is a computer engineer wearing a blue shirt with a frog on one should. And then, there's an actual frog strapped to the other shoulder, which makes this sight look unusual."}}],"usage":{"prompt_tokens":30216,"completion_tokens":49,"total_tokens":30265}}

Extracted message:

Well, for one thing there is a computer engineer wearing a blue shirt with a frog on one should. And then, there's an actual frog strapped to the other shoulder, which makes this sight look unusual.


@oobabooga

@kabachuha kabachuha marked this pull request as ready for review December 8, 2023 13:15
@kabachuha
Contributor Author

Base64 works too

import base64

import requests

# Read the image and encode it as a base64 data URI.
with open('image.jpg', 'rb') as img:
    img_base64 = base64.b64encode(img.read()).decode('utf-8')

data = {
    "messages": [
        {
            "role": "user",
            "image_url": f"data:image/jpeg;base64,{img_base64}"
        },
        {
            "role": "user",
            "content": "what is unusual about this image?"
        }
    ],
    "mode": "chat",
    "character": "Example"
}

response = requests.post('http://127.0.0.1:5001/v1/chat/completions', json=data)
print(response.text)

Raw response:
{"id":"chatcmpl-1702111944216246528","object":"chat.completions","created":1702111944,"model":"TheBloke_llava-v1.5-13B-GPTQ","choices":[{"index":0,"finish_reason":"length","message":{"role":"assistant","content":"Yeah, this is big brain time."}}],"usage":{"prompt_tokens":27118,"completion_tokens":9,"total_tokens":27127}}

Extracted message:

Yeah, this is big brain time.
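
For reference, the "extracted message" is just the assistant content pulled out of the response JSON; continuing the script above:

print(response.json()["choices"][0]["message"]["content"])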


@kabachuha
Contributor Author

@oobabooga are you ready to test/merge this PR?

@ercanozer
Contributor


This is looking great! Any reason to wait for this to get merged?

@oobabooga oobabooga changed the base branch from main to dev December 23, 2023 01:34
@oobabooga
Owner

Sorry for taking so long to review! The PR is perfect and I'm impressed that you managed to add multimodal functionality to the API with so few added lines. Well done.

I used these commands for testing:

  • Load the model:
python server.py \
  --model liuhaotian_llava-v1.5-13b \
  --load-in-4bit \
  --multimodal-pipeline llava-v1.5-13b \
  --api
  • HTTP request:
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "image_url": "https://avatars.githubusercontent.com/u/112222186?v=4"
      },
      {
        "role": "user",
        "content": "What is unusual about this image?"
      }
    ]
  }'

@kabachuha a related open problem is that multimodal doesn't work with the llama.cpp loader at the moment. For transformers, the extension gets the embeddings and does the whole process manually; for llama.cpp it may be doable by adapting the following code in llama-cpp-python somehow:

https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#multi-modal-models
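
For reference, the multimodal usage in that README looks roughly like this (a sketch based on the linked docs; the model and mmproj paths are placeholders, and exact argument names may differ between llama-cpp-python versions):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths: a LLaVA GGUF model plus its CLIP projector file.
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./llava-v1.5-13b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # larger context to fit the image embedding
    logits_all=True,  # the README notes this is needed for the LLaVA handler
)

result = llm.create_chat_completion(messages=[
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://avatars.githubusercontent.com/u/112222186?v=4"}},
            {"type": "text", "text": "What is unusual about this image?"},
        ],
    }
])
print(result["choices"][0]["message"]["content"])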

For llamacpp_HF it would be necessary to get the correct embeddings method from https://llama-cpp-python.readthedocs.io/en/latest/api-reference/ and attach it to the llamacpp_HF class.

Both of these are very difficult, but if you are interested in digging into that, it would be welcome.

@oobabooga oobabooga merged commit dbe4385 into oobabooga:dev Dec 23, 2023
@kabachuha kabachuha deleted the mm-api branch December 23, 2023 12:14
@Victorivus
Contributor

Victorivus commented Apr 5, 2024

I am trying to test this, and it seems the models I've tried ignore the images. I have tried Fuyu-8B variants such as McGill-NLP_fuyu-8b-weblinx, adept_fuyu-8b, and adept-hf-collab_fuyu-8b.

So I tried the ones you mentioned, but I can't manage to get them running. How did you do it?

ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.

Edit: Solved it with PR #5038. There is a way to solve this by modifying modules/models.py; it will probably be fixed soon.
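
For anyone hitting the same ValueError: AutoModelForCausalLM does not handle LlavaConfig, and transformers provides a LLaVA-specific class for such checkpoints. A minimal standalone illustration (outside the webui, assuming a transformers-format LLaVA checkpoint such as llava-hf/llava-1.5-7b-hf; PR #5038's actual change to modules/models.py may work differently):

from transformers import AutoProcessor, LlavaForConditionalGeneration

# Illustrative model id; the webui's own fix lives in modules/models.py.
model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)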
