
Support multi-modal models #746

Closed
arian81 opened this issue Oct 10, 2023 · 21 comments
Labels
feature request New feature or request

Comments

@arian81

arian81 commented Oct 10, 2023

This is one of the best open source multi-modal models based on Llama 7B currently. It would be nice to be able to host it in ollama.
https://llava-vl.github.io/

@ryansereno

ryansereno commented Oct 10, 2023

Came here looking for this, to see if the discussion around it had begun.
Curious to see what will be required to make this happen.

Edit: Progress is being made upstream in llama.cpp to support this.

@jmorganca jmorganca added the feature request New feature or request label Oct 11, 2023
@spielhoelle

The PR @ryansereno mentioned is merged and in master now. How can we run this in ollama?

@marscod

marscod commented Oct 15, 2023

I could successfully run llava-v1.5-7b and it is available at: https://ollama.ai/marscod/llava but I have to map an image parameter to llama.cpp's image parameter. Maybe within the prompt?

@chigkim

chigkim commented Oct 16, 2023

It would be good to have a file reader command in the prompt, like /read file.jpg, for this.

@hugh-min

I could successfully run llava-v1.5-7b and it is available at: https://ollama.ai/marscod/llava but I have to map an image parameter to llama.cpp's image parameter. Maybe within the prompt?

Could you elaborate on how to map an image within ollama?

@jmorganca jmorganca changed the title Support llava multi modal model Support multi-modal models Oct 24, 2023
@Bortus-AI

I could successfully run llava-v1.5-7b and it is available at: https://ollama.ai/marscod/llava but I have to map an image parameter to llama.cpp's image parameter. Maybe within the prompt?

Could you elaborate on how to map an image within ollama?

I would like to know as well. Thanks

@tmc

tmc commented Oct 29, 2023

It seems a couple of interface design decisions are at play: 1) how to represent this in the HTTP API and 2) what the user/CLI interface should be.

I want to note/highlight that the folks hacking on iTerm2 have done some work that may be relevant in the CLI context here: https://iterm2.com/documentation-images.html

For the HTTP interface, I'd suggest taking some inspiration from how OpenAI is folding in image data. I did a bit of protocol decoding, and the TL;DR of how they do it is: upload to a blob store, then include a special message type in the completion message list.

There's also the consideration of whether it's an ollama concern to allow annotation of an incoming image to support highlighting part of the image. That feels a bit out of scope to start, but perhaps the design should keep that in mind.
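
For concreteness, a rough sketch of the message shape described above, based on OpenAI's public chat-completions vision format (the model name, file name, and the inline base64 data URL are illustrative; the blob-store upload path mentioned above is not shown):

import base64
import json

with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

# A single user turn carries both a text part and an image part.
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }
    ],
}
print(json.dumps(payload)[:200])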

@sausheong

I could successfully run llava-v1.5-7b and it is available at: https://ollama.ai/marscod/llava but I have to map an image parameter to llama.cpp's image parameter. Maybe within the prompt?

Could you elaborate on how to map an image within ollama?

I would like to know as well. Thanks

Me too, can someone explain how to map an image within ollama?

@itsPreto

Love that this is marked as closed but everyone is still clueless over here lol

@orkutmuratyilmaz

@marscod thanks for importing the model. Could you add an example API call on the model page?

@mangiucugna

mangiucugna commented Nov 21, 2023

So I figured out how to use it; here's the code snippet:

with open("image.jpg", "rb") as f:
      encoded_string = base64.b64encode(f.read()).decode('utf-8')
  data = {"model": "marscod/llava", "prompt": f"USER: {encoded_string} {prompt}\nASSISTANT:", }
  try:
    response = requests.post(url="http://127.0.0.1:11434/api/generate", headers={"Content-Type": "application/json"}, json=data, stream=True)
  except Exception as e:
   # manage exception
  output = ""
  for chunk in response.text.split('\n'):
    chunk = json_repair.loads(chunk)
    if isinstance(chunk, dict):
      output += chunk.get("response") or ""

However, it also throws this error: {"error":"error reading llm response: bufio.Scanner: token too long"}

For reference, I prefer using llama.cpp directly with bakllava-1 (way more precise), and the syntax there looks like this:

with open("image.jpg", "rb") as f:
      encoded_string = base64.b64encode(f.read()).decode('utf-8')
  image_data = [{"data": encoded_string, "id": 42}]
  data = {"prompt": f"USER:[img-42] {prompt}.\nASSISTANT:", "n_predict": 4000, "image_data": image_data, "stream": True}
  try:
    response = requests.post(url="http://localhost:8080/completion", headers={"Content-Type": "application/json"}, json=data, stream=True)
  except Exception as e:
    # Manage exception
  output = ""
  for chunk in response.iter_content(chunk_size=128):
    content = chunk.decode().strip().split('\n\n')[0]
    try:
        content_split = content.split('data: ')
        if len(content_split) > 1:
            content_json = json_repair.loads(content_split[1])
            output += content_json["content"]
            yield output
    except Exception as e:
       # Manage exception

This is taken from: https://github.com/mangiucugna/local_multimodal_ai

Hope this helps!

@ryansereno

@mangiucugna thank you, will give it a try.
Hadn't heard of Bakllava before, very excited to try it.

@mangiucugna

I imported bakllava-1 locally and did some tests, and it performs so badly compared to the llama.cpp implementation that it is unusable.
I suspect that something is going wrong, that the data arriving at the model is corrupted, and that {"error":"error reading llm response: bufio.Scanner: token too long"} is somehow related.

Happy to share my Modelfile and a link to the gguf for anyone who wants to try to reproduce this.

@Kreijstal

llamafile (https://github.com/Mozilla-Ocho/llamafile) supports llava-1.5; it would be nice if ollama supported it too.

@mak448a

mak448a commented Dec 15, 2023

Now that this is added, I can't figure out how to upload an image to the model. When I follow the instructions at https://github.com/jmorganca/ollama/releases/tag/v0.1.15, it describes something completely different from what was in the picture. I'm on Linux.

@arian81
Author

arian81 commented Dec 15, 2023

Now that this is added, I can't figure out how to upload an image to the model. When I follow the instructions at https://github.com/jmorganca/ollama/releases/tag/v0.1.15, it describes something completely different from what was in the picture. I'm on Linux.

You probably haven't updated to the latest version of Ollama if you're getting a bunch of Chinese characters as the output.

@orkutmuratyilmaz

I guess that we can consider this issue as completed :)

@pdevine pdevine closed this as completed Dec 16, 2023
@prologic

When I try this I get:

$ ollama run llama2
>>> What's in this image? /Users/prologic/Downloads/IMG_1325.png

I cannot directly view or analyze the image you provided as it is a personal file located on a local computer. However, I can provide some general
information about images and how they can be analyzed.
...

And I'm using the latest version of ollama:

$ ollama --version
ollama version is 0.1.17

@pdevine
Contributor

pdevine commented Dec 26, 2023

@prologic llama2 isn't a multimodal model. You should try:

$ ollama run llava
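
If you want to hit the HTTP API directly instead, here is a minimal sketch along the lines of the v0.1.15 release notes linked above, assuming the `images` field of base64-encoded image data described there (the model name and file path are placeholders):

import base64
import json
import requests

with open("IMG_1325.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

# The image goes into the "images" list as base64, separate from the prompt text.
data = {"model": "llava", "prompt": "What's in this image?", "images": [img_b64]}
response = requests.post("http://127.0.0.1:11434/api/generate", json=data, stream=True)
for line in response.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)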

@prologic

Ahh! Thanks. When I tried to search for multimodal models, the search turned up empty, which is why I wasn't able to figure this out so easily :/ There should be a way to list and search for multimodal models, even with ollama search (does this sub-command exist?)

@schuster-rainer

If you want to use it with LangChain, here is what you need to add to the HumanMessage:

HumanMessage(
    content=[
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": f"data:image/jpeg;base64,{img_base64}",
        },
    ]
)
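
To round that out, a rough, self-contained sketch of sending such a message through ChatOllama from langchain_community, assuming that integration translates the image_url content part into ollama's image input (the model name and image path are placeholders):

import base64

from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage

with open("image.jpg", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode("utf-8")

llm = ChatOllama(model="bakllava")  # placeholder model name
message = HumanMessage(
    content=[
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{img_base64}"},
    ]
)
print(llm.invoke([message]).content)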
