
Support for sending images into OpenAI chat API #4827

Merged
merged 8 commits into oobabooga:dev from kabachuha:mm-api on Dec 23, 2023

Conversation

kabachuha
Contributor

This PR handles 'image_url's (base64 data URIs or remote files) supplied in the message history when a user wants to use GPT-Vision-like features, converting them to the base64 HTML img tags supported by the multimodal extension.

closes #4603
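
Conceptually, the conversion looks something like the sketch below (this is not the PR's actual code; the helper name and the JPEG MIME assumption are illustrative):

import base64

import requests


def image_url_to_img_tag(image_url: str) -> str:
    # Illustrative helper, not the PR's real function name.
    # Data URIs are embedded as-is; remote files are downloaded and
    # re-encoded so the multimodal extension sees a base64 <img> tag.
    if image_url.startswith("data:"):
        return f'<img src="{image_url}">'
    img_bytes = requests.get(image_url, timeout=30).content
    img_base64 = base64.b64encode(img_bytes).decode("utf-8")
    # Assumes JPEG; the real code may detect the content type instead.
    return f'<img src="data:image/jpeg;base64,{img_base64}">'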

@oobabooga
Owner

Could you create a simple curl command with an example for testing purposes?

@kabachuha
Contributor Author

Of course. I'm planning to test this on Friday; Thursday is a busy day for me.

@kabachuha
Contributor Author

kabachuha commented Dec 8, 2023

Works! (Note: my API port is 5001, for compatibility.)

curl http://127.0.0.1:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "image_url": "https://avatars.githubusercontent.com/u/112222186?v=4"
      },
      {
        "role": "user",
        "content": "What is unusual about this image?"
      }
    ],
    "mode": "chat",
    "character": "Example"
  }'

Raw response:

{"id":"chatcmpl-1702041177702761984","object":"chat.completions","created":1702041177,"model":"TheBloke_llava-v1.5-13B-GPTQ","choices":[{"index":0,"finish_reason":"length","message":{"role":"assistant","content":"Well, for one thing there is a computer engineer wearing a blue shirt with a frog on one should. And then, there's an actual frog strapped to the other shoulder, which makes this sight look unusual."}}],"usage":{"prompt_tokens":30216,"completion_tokens":49,"total_tokens":30265}}

Extracted message:

Well, for one thing there is a computer engineer wearing a blue shirt with a frog on one should. And then, there's an actual frog strapped to the other shoulder, which makes this sight look unusual.


@oobabooga

@kabachuha kabachuha marked this pull request as ready for review December 8, 2023 13:15
@kabachuha
Contributor Author

Base64 works too

import base64

import requests

# Read the image and encode it as a base64 data URI.
with open('image.jpg', 'rb') as img:
    img_base64 = base64.b64encode(img.read()).decode('utf-8')

data = {
    "messages": [
        {
            "role": "user",
            "image_url": f"data:image/jpeg;base64,{img_base64}"
        },
        {
            "role": "user",
            "content": "what is unusual about this image?"
        }
    ],
    "mode": "chat",
    "character": "Example"
}

response = requests.post('http://127.0.0.1:5001/v1/chat/completions', json=data)
print(response.text)

Raw response:
{"id":"chatcmpl-1702111944216246528","object":"chat.completions","created":1702111944,"model":"TheBloke_llava-v1.5-13B-GPTQ","choices":[{"index":0,"finish_reason":"length","message":{"role":"assistant","content":"Yeah, this is big brain time."}}],"usage":{"prompt_tokens":27118,"completion_tokens":9,"total_tokens":27127}}

Extracted message:

Yeah, this is big brain time.
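
For reference, the "extracted message" is just the assistant content pulled out of the response JSON; continuing the script above:

print(response.json()["choices"][0]["message"]["content"])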


@kabachuha
Contributor Author

@oobabooga are you ready to test/merge this PR?

@ercanozer
Contributor


This is looking great! Any reason to wait for this to get merged?

@oobabooga oobabooga changed the base branch from main to dev December 23, 2023 01:34
@oobabooga
Owner

Sorry for taking so long to review! The PR is perfect and I'm impressed that you managed to add multimodal functionality to the API with so few added lines. Well done.

I used these commands for testing:

  • Load the model:
python server.py \
  --model liuhaotian_llava-v1.5-13b \
  --load-in-4bit \
  --multimodal-pipeline llava-v1.5-13b \
  --api
  • HTTP request:
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "image_url": "https://avatars.githubusercontent.com/u/112222186?v=4"
      },
      {
        "role": "user",
        "content": "What is unusual about this image?"
      }
    ]
  }'

@kabachuha a related open problem is that multimodal doesn't work with the llama.cpp loader at the moment. For transformers, the extension gets the embeddings and does the whole process manually; for llama.cpp it may be doable by adapting the following code in llama-cpp-python somehow:

https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#multi-modal-models
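
For reference, the multimodal usage in that README looks roughly like this (a sketch based on the linked docs; the model and mmproj paths are placeholders, and exact argument names may differ between llama-cpp-python versions):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths: a LLaVA GGUF model plus its CLIP projector file.
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./llava-v1.5-13b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # larger context to fit the image embedding
    logits_all=True,  # the README notes this is needed for the LLaVA handler
)

result = llm.create_chat_completion(messages=[
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://avatars.githubusercontent.com/u/112222186?v=4"}},
            {"type": "text", "text": "What is unusual about this image?"},
        ],
    }
])
print(result["choices"][0]["message"]["content"])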

For llamacpp_HF it would be necessary to get the correct embeddings method from https://llama-cpp-python.readthedocs.io/en/latest/api-reference/ and attach it to the llamacpp_HF class.

Both of these are very difficult, but if you are interested in digging into that, it would be welcome.

@oobabooga oobabooga merged commit dbe4385 into oobabooga:dev Dec 23, 2023
@kabachuha kabachuha deleted the mm-api branch December 23, 2023 12:14
@Victorivus
Contributor

Victorivus commented Apr 5, 2024

I am trying to test this, and it seems the models I've tried ignore the images. I have tried Fuyu-8B variants such as McGill-NLP_fuyu-8b-weblinx, adept_fuyu-8b, and adept-hf-collab_fuyu-8b.

So I tried the ones you mentioned, but I can't manage to get them running. How did you do it?

ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.

Edit: Solved it with PR #5038. There is a way to solve this by modifying modules/models.py; it will probably be fixed soon.
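
For anyone hitting the same ValueError: AutoModelForCausalLM does not handle LlavaConfig, and transformers provides a LLaVA-specific class for such checkpoints. A minimal standalone illustration (outside the webui, assuming a transformers-format LLaVA checkpoint such as llava-hf/llava-1.5-7b-hf; PR #5038's actual change to modules/models.py may work differently):

from transformers import AutoProcessor, LlavaForConditionalGeneration

# Illustrative model id; the webui's own fix lives in modules/models.py.
model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)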
