for llava, how to send image rather than image_path? #212

Open
lss15151161 opened this issue Feb 21, 2024 · 2 comments

Comments

@lss15151161

The current example (srt_example_llava.py) sends the request like this:

    state = image_qa.run(
        image_path="./images/cat.jpeg",
        question="What is this?",
        max_new_tokens=64,
    )

But what should I do if I want to send the image contents themselves, as in the request below?

    request = {
        "text": ["What is this?".encode()],
        "images": [open("./images/cat.jpeg", "rb").read()],
    }

@mostlyuseful

According to sglang.utils.encode_image_base64, image_path is interpreted in one of three ways:

  • A str instance is interpreted as the path to an image
  • bytes can be used to inject arbitrary data (e.g. from a Path(IMAGE_PATH).read_bytes() call)
  • Any other instance type is interpreted as a Pillow image and will be first converted to PNG before dumping its bytes.

This means that any of these calls should work:

from PIL import Image  # only needed for the third call

image_qa.run(image_path="./path/to/image.jpg", ...)
image_qa.run(image_path=open("./path/to/image.jpg", "rb").read(), ...)
image_qa.run(image_path=Image.open("./path/to/image.jpg"), ...)

I recommend setting a breakpoint in sglang.lang.interpreter.StreamExecutor._execute_image to see what is happening under the hood.
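The three cases described above can be sketched roughly as follows. This is an illustrative re-creation based only on the description in this comment, not the sglang source; the function name `encode_image_sketch` is hypothetical, and the third branch simply assumes the object has a Pillow-style `save(fp, format=...)` method.

```python
import base64
import io


def encode_image_sketch(image):
    """Illustrative re-creation of the three cases described above.

    This is NOT the sglang source; see sglang.utils.encode_image_base64
    for the real implementation.
    """
    if isinstance(image, str):
        # Case 1: a str is treated as a path to an image file on disk.
        with open(image, "rb") as image_file:
            data = image_file.read()
    elif isinstance(image, bytes):
        # Case 2: bytes are used as-is (e.g. Path(...).read_bytes()).
        data = image
    else:
        # Case 3: anything else is assumed to be a Pillow image,
        # which is serialized to PNG in memory first.
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        data = buffer.getvalue()
    # Base64 text is safe to embed in a JSON request body.
    return base64.b64encode(data).decode("utf-8")
```

Under these assumptions, passing a path or the bytes read from that same path yields the same base64 string, which is why the first two calls above should behave identically.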

@matallanas

If you want to send the request directly, you need to encode the image in base64 so that the payload is JSON-serializable. For example, you can write a function to encode the image:

import base64

def encode_image(image_path):
    with open(image_path, 'rb') as image_file:
        # Read the file
        image_data = image_file.read()
        # Encode the image
        encoded_image = base64.b64encode(image_data)
        # Convert to a UTF-8 string
        encoded_string = encoded_image.decode('utf-8')
        return encoded_string

And then you can create the payload and send it to the sglang server like this:

import requests

# `url` is the base address of the running sglang server.
json_data = {
    "text": '<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is this?<|im_end|><|im_start|>assistant\n',
    "image_data": encode_image("./images/cat.jpeg"),
    "sampling_params": {
        "temperature": 0,
        "max_new_tokens": 256,
    },
}
response = requests.post(
    url + "/generate",
    json=json_data,
)

Remember that LLaVA uses the ChatML template for messages. With this, you can send the request directly.
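If the image is already in memory as bytes (as in the original question) rather than on disk, the same payload can be built without a file path. The following is a minimal sketch, assuming the server accepts the same "text", "image_data", and "sampling_params" fields as in the payload above; the helper name `build_generate_payload` is hypothetical, and the placeholder bytes stand in for real image contents.

```python
import base64
import json


def build_generate_payload(image_bytes, question, max_new_tokens=256):
    """Build a JSON-serializable payload from in-memory image bytes."""
    # Raw bytes cannot go into JSON, so convert them to base64 text first.
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    # Same ChatML-style prompt layout as in the payload above.
    prompt = (
        "<|im_start|>system\nAnswer the questions.<|im_end|>"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>"
        "<|im_start|>assistant\n"
    )
    return {
        "text": prompt,
        "image_data": image_b64,
        "sampling_params": {"temperature": 0, "max_new_tokens": max_new_tokens},
    }


# Stand-in bytes; in practice these would be the real image contents,
# e.g. open("./images/cat.jpeg", "rb").read().
payload = build_generate_payload(b"not a real image", "What is this?")
body = json.dumps(payload)  # works because the image is now a plain str
```

The resulting `payload` can then be passed as `json=payload` to the same `requests.post(url + "/generate", ...)` call shown above.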
