LLaVA support #1487

Wojtab · 2023-04-23T02:58:48Z

Ok, multimodality is here. To support LLaVA I created an extension, while I can separate it to a different repo with only an extension, I needed text-generation-webui to support overriding the input_ids/input_embeds. While I was at it, I changed extension handling a bit (there should be no need to update anything in the existing extensions, it's mostly backend changes).

To try it:

download my 4-bit quant from huggingface (I haven't tested it with non-quantized version, it works on 3090, but maybe it will even fit on 12GB of VRAM)
run the webui with my extension enabled python3 server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava
Select LLaVA in instruct mode (should also work in chat, but the template is for instruct)
Add "\n###" to custom stopping strings

Here's a video of it in action:

2023-04-23.04-56-35.mp4

Wojtab · 2023-04-23T04:12:14Z

BTW: don't merge it yet!

If it should be merged as a built-in extension then I want to clean up script.py(it's feature complete, but can use some work), and if it shouldn't get merged as a built-in extension I need to remove script.py

BadisG · 2023-04-23T04:36:11Z

I tried it and it doesn't look like you can talk with the model without image, you are obligated to give him one to get it going. That's a shame because I've heard that training the Vicuna model with picture made him smarter, and I wanted to try it out with regular chat

x-legion · 2023-04-23T06:49:21Z

will it work with ggml models?

CarlKenner · 2023-04-23T09:23:46Z

Gradio HTTP request redirected to localhost :)
Loading llava-13b-4bit-128g...
Could not find the quantized model in .pt or .safetensors format, exiting...

Done!
Press any key to continue . . .

Probably because download-model.bat automatically named it wojtab_llava-13b-v0-4bit-128g

CarlKenner · 2023-04-23T09:29:25Z

Using wojtab_llava-13b-v0-4bit-128g instead of llava-13b-4bit-128g, I'm now getting this error:

Gradio HTTP request redirected to localhost :)
Loading wojtab_llava-13b-v0-4bit-128g...
Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
Traceback (most recent call last):
  File "D:\AI\oobabooga-windows\text-generation-webui\server.py", line 921, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\models.py", line 148, in load_model
    model = load_quantized(model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 44, in _load_quant
    model = AutoModelForCausalLM.from_config(config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
    return model_class._from_config(config, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
    model = cls(config, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 255, in __init__
    self.self_attn = LlamaAttention(config=config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 178, in __init__
    self.v_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 52428800 bytes.

Done!
Press any key to continue . . .

CarlKenner · 2023-04-23T09:53:44Z

Added more virtual memory in Window's Advanced System Settings, by also using my second hard drive for virtual memory.
Now I get this error instead:

Gradio HTTP request redirected to localhost :)
Loading wojtab_llava-13b-v0-4bit-128g...
Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
Loading model ...
Done.
Traceback (most recent call last):
  File "D:\AI\oobabooga-windows\text-generation-webui\server.py", line 921, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\models.py", line 148, in load_model
    model = load_quantized(model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 197, in load_quantized
    model = model.to(torch.device('cuda:0'))
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.42 GiB already allocated; 0 bytes free; 3.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Done!
Press any key to continue . . .

CyberTimon · 2023-04-23T11:03:13Z

To create a public link, set share=Trueinlaunch(). Traceback (most recent call last): File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict output = await app.get_blocks().process_api( File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api result = await self.call_function( File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 898, in call_function prediction = await anyio.to_thread.run_sync( File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run result = context.run(func, *args) File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 549, in async_iteration return next(iterator) File "/home/cybertimon/Repositorys/text-generation-webui/modules/chat.py", line 222, in cai_chatbot_wrapper for history in chatbot_wrapper(text, state): File "/home/cybertimon/Repositorys/text-generation-webui/modules/chat.py", line 154, in chatbot_wrapper for reply in generate_reply(f"{prompt}{' ' if len(cumulative_reply) > 0 else ''}{cumulative_reply}", state, eos_token=eos_token, stopping_strings=stopping_strings): File "/home/cybertimon/Repositorys/text-generation-webui/modules/text_generation.py", line 225, in generate_reply question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None) File "/home/cybertimon/Repositorys/text-generation-webui/modules/extensions.py", line 91, in apply_extensions return EXTENSION_MAP[typ](*args, **kwargs) File "/home/cybertimon/Repositorys/text-generation-webui/modules/extensions.py", line 74, in _apply_tokenizer_extensions prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds) File "/home/cybertimon/Repositorys/text-generation-webui/extensions/llava/script.py", line 172, in tokenizer_modifier new_input_embeds.append(cur_new_input_embeds) UnboundLocalError: local variable 'cur_new_input_embeds' referenced before assignment

I think it works except for this error. I can load the model, talk to it but when I select an image, I get this

jparmstr · 2023-04-23T13:34:01Z

I'm able to use this with 8 GB VRAM (Geforce 3060 Ti) with the following arguments:

python server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava --pre_layer 29

Reduce "Max prompt size in tokens" to 500 or less, otherwise you'll get OOM errors after the first response.

After further testing I found it's best to use 0 Max Prompt Size, otherwise there's a chance it will respond to multiple images at once. Although the Max Prompt Size setting is 0, the model still gets around 360 tokens of context each time. I guess this must be hard coded.

Edit:
The merged version runs out of memory on my GPU. I added this to settings.json as suggested by the author, and it's working again:

"llava-clip_device": "cpu",
"llava-projector_device": "cpu"

…n the embedding

Wojtab · 2023-04-23T14:32:54Z

@BadisG - I added a commit ~30mins before your last message which fixed it, you could've pulled before that
@faisalhr1997 - maybe, but you will need to convert llama part of LLaVA to ggml, I haven't tried it
@CarlKenner - looks like both error are OOM, first one CPU, second one GPU. Try vicuna-13b without llava extension first to see if it is caused by my changes, or something in your setup
@CyberTimon - I think it could've happened if the prompt had a truncated image, try the most recent version
@jparmstr - nice, as for the prompt - default template has 101 tokens, and each image takes up 258 tokens (2 for start/end, and 256 of actual image embeddings)

jepjoo · 2023-04-23T14:39:33Z

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:

"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Aborted"

WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

Wojtab · 2023-04-23T15:15:13Z

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:

"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory Aborted"

WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

https://discuss.pytorch.org/t/libcudnn-cnn-infer-so-8-library-can-not-found/164661

jparmstr · 2023-04-23T15:20:13Z

@jparmstr - nice, as for the prompt - default template has 101 tokens, and each image takes up 258 tokens (2 for start/end, and 256 of actual image embeddings)

That makes sense, I figured the image must take some number of tokens.

I did notice that even with context length 0, the model responds to my questions. For example "What is funny about this image?" it will start with "This image is funny because". I wonder how it's ingesting my prompt when I don't leave any space for it in the context.

jepjoo · 2023-04-23T15:34:19Z

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:
"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory Aborted"
WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

https://discuss.pytorch.org/t/libcudnn-cnn-infer-so-8-library-can-not-found/164661

Thanks alot, that fixed it! Should have googled this harder myself...

Wojtab · 2023-04-23T18:52:49Z

Ok, at this point it's cleaned up enough to where I wanted it, so it could maybe get merged. Also, I added a possibility to run CLIP/projector on CPU(or at 32bit in cuda, which is now the new default).
To run them on CPU, add:

    "llava-clip_device": "cpu",
    "llava-projector_device": "cpu"

to settings.json.
To run 16-bit on cuda(old behaviour), add:

    "llava-clip_bits": 16,
    "llava-projector_bits": 16

Clip doesn't look like it supports run_in_8bit, and I feel like the projector doesn't need it, so there is only 16/32 bit (and 32 bit only for CPU).

@jparmstr - you might be able to squeeze some more tokens with CPU CLIP. As for the prompt, you can add print(prompt) after print(f'Embedded {total_embedded} image(s) in {time.time()-start_ts:.2f}s') in script.py

CyberTimon · 2023-04-23T18:59:43Z

`(base) cybertimon@server:~/Repositorys/text-generation-webui$ python3 server.py --model llava-13b-4bit-128g --gpu-memory 12 --wbits 4 --model_type llama --groupsize 128 --listen-host 0.0.0.0 --listen --xformers --extension llava --chat --listen-port 21129
Gradio HTTP request redirected to localhost :)
Loading settings from settings.json...
Loading llava-13b-4bit-128g...
Found the following quantized model: models/llava-13b-4bit-128g/pytorch_model.safetensors
Loading model ...
Done.
Using the following device map for the quantized model: {'': 0}
Replaced attention with xformers_attention
Loaded the model in 4.27 seconds.
Loading the extension "llava"... Ok.
Loading the extension "gallery"... Ok.
{'add_all_images_to_prompt': False, 'clip_device': None, 'clip_bits': 32, 'projector_device': None, 'projector_bits': 32} cuda:0 torch.float32 cuda:0 torch.float32
Running on local URL: http://0.0.0.0:21129

To create a public link, set share=True in launch().
Embedded 0 image(s) in 0.99s`

I get embedded 0 images in 0.99s. Maybe this is the problem from earlier. Also it answers only: 88888888....

CyberTimon · 2023-04-23T19:02:46Z

Also, when I change the settings to use cpu ´{'add_all_images_to_prompt': False, 'clip_device': 'cpu', 'clip_bits': 32, 'projector_device': 'cpu', 'projector_bits': 32} cpu torch.float32 cpu torch.float32´

I still get only 888888 as answer.

Wojtab · 2023-04-23T19:17:03Z

@CyberTimon remove settings.json, then restart webui, clear the history, and try with this image: https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/examples/extreme_ironing.jpg, with "What is unusual about this image?" prompt, exactly as in my video.
If it still gives a garbage output, then try if this model works for you(without llava extension enabled): https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

oobabooga · 2023-04-23T19:22:09Z

Very impressive @Wojtab, I'll try to review and merge it soon. Quick question: I remember reading on the LLaVA README that a custom version of transformers was needed. How did you get it working with the standard transformers?

Wojtab · 2023-04-23T19:33:15Z

@oobabooga If you load the original LLaVA on standard transformers it works, but instead of loading the entire model, it just loads LLaMA part, so it can be used for text-based inference without any modifications.
The modified transformers add image input, then the projector, and then it feeds the embeddings to standard finetuned LLaMA. As there were no modifications to LLaMA architecture, I load it as a standard model, in standard transformers, and just use custom embeddings, by running the image->CLIP->projector pipeline by myself in LLaVAEmbedder, instead of in modified transformers

CyberTimon · 2023-04-23T19:43:42Z

@CyberTimon remove settings.json, then restart webui, clear the history, and try with this image: https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/examples/extreme_ironing.jpg, with "What is unusual about this image?" prompt, exactly as in my video. If it still gives a garbage output, then try if this model works for you(without llava extension enabled): https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

Your a hero! Works perfect now. I had to delete the settings.json

CyberTimon · 2023-04-23T19:48:17Z

Oh I saw what the issue was. When selecting max_new_tokens over 1600 it generates only garbage.

oobabooga · 2023-04-23T20:23:30Z

It worked for me, but I had to use the tokenizer files that come with wojtab/llava-13b-v0-4bit-128g tokenizer instead of the generic LLaMA tokenizer described here. The web UI has the option of loading the same tokenizer for all LlamaForCausalLM from models/llama-tokenizer as a way of ensuring that the files are up to date (many models on Hugging Face use outdated tokenizer files). Does LLaVA use a custom tokenizer?

Base LLaMa tokenizer	wojtab/llava-13b-v0-4bit-128g tokenizer

The modifications to the extensions framework look good to me and are highly appreciated, thanks for taking the time to read the existing code base in detail.
About merging/not merging script.py itself into the repository: I think that this is a good example that future extensions can use as a starting point, so I vote for merging it.

Wojtab · 2023-04-23T21:09:17Z

Regarding tokenizer: there are 4 new tokens, so I don't think the generic one will work:

{
  "<im_end>": 32002,
  "<im_patch>": 32000,
  "<im_start>": 32001,
  "[PAD]": 32003
}

IMO we can merge it here now, give me like 30 minutes, I'll add a description of the extension.
I just fixed the issue @CyberTimon had, the image could be truncated in the middle. It is still broken in vast majority of cases, unless the prompt is like that:

but at least there is a warning in logs, so maybe there won't be 20 issues about it. (btw, you can set the image placement inside prompt by adding <image>)

oobabooga · 2023-04-23T21:18:11Z

A comment: the Extensions doc page says

Additionally, the extension can set value to be a callback, in the form of def cb(text: str, visible_text: str) -> [str, str]. See the send_pictures extension above for an example.

But the send_pictures extension does not use a callback.

Wojtab · 2023-04-23T22:35:16Z

@oobabooga ok, I added the docs, also reworded it in Extensions. One more change: I set it to auto-recognize LLaVA as llama-based model

oobabooga · 2023-04-23T23:32:07Z

I have removed this addition because I found it unnecessary, as the chatbot_wrapper function already updates the history (if that was a mistake, please let me know and I'll revert it).

         yield chat_html_wrapper(shared.history['visible'], state['name1'], state['name2'], state['mode'])
     else:
         # Yield ' ...'
-        last_visible_user = shared.history['visible'][-1][0]
         yield chat_html_wrapper(shared.history['visible'][:-1] + [[shared.history['visible'][-1][0], shared.history['visible'][-1][1] + ' ...']], state['name1'], state['name2'], state['mode'])
         for history in chatbot_wrapper(shared.history['internal'][-1][0], state, _continue=True):
-            shared.history['visible'][-1] = [last_visible_user, history[-1][1]]
             yield chat_html_wrapper(shared.history['visible'], state['name1'], state['name2'], state['mode'])

Also made some minor changes and improvements. Thanks for submitting this PR, I would never have come up with the LLaVA adaptation on my own and the reworked extensions framework is a huge improvement to this project.

oobabooga · 2023-04-23T23:40:40Z

For reference, these are the commands to download and run the model:

python download-model.py wojtab/llava-13b-v0-4bit-128g
python3 server.py --model wojtab_llava-13b-v0-4bit-128g --chat  --extensions llava

VRAM usage peaked at 11106MiB for a single generation.

Wojtab · 2023-04-23T23:54:49Z

@oobabooga thanks for the review and merge. Now, this addition was necessary, for some reason continue replaces visible_text with internal text on the message from user, so now instead of it being <img src="data:image/jpeg;base64,{base64string}> visible text becomes the internal representation: <image:{base64string}>.
Now, thinking about it, it might've been a stupid idea to separate them, as I can parse both of them as easily, but if you stop the prompt, then click continue, the image will disappear for the user

Wojtab · 2023-04-23T23:59:17Z

@oobabooga actually, instead of reverting it, I'll open a separate PR where both of the representations are the same

oobabooga · 2023-04-24T00:04:57Z

I'll wait for your PR then. I might have used ['internal'] instead of ['visible'] somewhere.

Wojtab · 2023-04-24T00:09:35Z

#1507

Ph0rk0z · 2023-04-24T12:57:44Z

I get some errors sometimes, depending on what I choose for parameters.

LLaVA - Embedded 1 image(s) in 0.05s
Traceback (most recent call last):
  File "/home/mint/gptq2/text-generation-webui-testing/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/mint/gptq2/text-generation-webui-testing/modules/text_generation.py", line 257, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2537, in sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 92, in __call__
    scores = processor(input_ids, scores)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 231, in __call__
    score = torch.gather(scores, 1, self.encoder_input_ids)
RuntimeError: gather(): Expected dtype int64 for index

With characters this would be really interesting if it could be wrangled and less of an "AALM"

it works if I add "Assistant" to the stopping strings! Not perfectly of course but its a trip.. it can describe the items as the character or just output nonsense. Luck of the draw.

jackylee1 · 2023-04-25T03:33:01Z

I'm able to use this with 8 GB VRAM (Geforce 3060 Ti) with the following arguments:

python server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava --pre_layer 29

Reduce "Max prompt size in tokens" to 500 or less, otherwise you'll get OOM errors after the first response.

After further testing I found it's best to use 0 Max Prompt Size, otherwise there's a chance it will respond to multiple images at once. Although the Max Prompt Size setting is 0, the model still gets around 360 tokens of context each time. I guess this must be hard coded.

Edit: The merged version runs out of memory on my GPU. I added this to settings.json as suggested by the author, and it's working again:
"llava-clip_device": "cpu",
"llava-projector_device": "cpu"

hi,can you share the setting.json file.i have same hardware

yhyu13 · 2023-05-02T05:35:00Z

@oobabooga @Wojtab

Dockerfile is not updated with LLaVA, we still get this error

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory

The solution is as https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-infer-so-8/175139/1 suggested
is to run apt install cuda-11-8

But cuda will prompt something like country code, which cannot be handled by "-y" by apt-get, so we need

ENV DEBIAN_FRONTEND=noninteractive

The final docker file changes looks like something below

# yhyu13 : add cmd non interactive due to cuda requires some prompt input
# https://stackoverflow.com/questions/63476497/docker-build-with-ubuntu-18-04-image-hangs-after-prompting-for-country-of-origin
ENV DEBIAN_FRONTEND=noninteractive

# yhyu13 : install additional packages 
# https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-infer-so-8/175139/2
RUN apt-get update && \
    apt-get install -y cuda-11-8 && \
    apt-get install -y libportaudio2 libasound-dev git python3 python3-pip make g++

The result :

Then, I am able to run LLaVA in within the local docker

Wojciech Bacza added 3 commits April 23, 2023 04:32

initial LLaVA support

03ee6b8

cleanup

4490b21

remove print

60af089

This was referenced Apr 23, 2023

Request: LLaVA: Large Language and Vision Assitant Visual instruction tuning towards large language and vision models with GPT-4 level capabilities. #1355

Closed

I made it work on a single 3090 haotian-liu/LLaVA#40

Closed

fix crash when there are no images

5719c1d

small fixes, also add a checkbox whether to include multiple images i…

e5ceec2

…n the embedding

add some settings, do some cleanup

064cdbb

remove leftover debug print, replace it with proper logs on init

2727c6a

fix 2 small bugs

1780724

Wojtab added 2 commits April 24, 2023 00:28

Add docs, remove the need to specify --model for LLaVA

7002fcb

Merge branch 'main' into llava-support

df9b2b7

oobabooga added 4 commits April 23, 2023 19:55

Auto-set instruct mode, don't apply the universal tokenizer

dc19d4d

Sort imports

747a010

Remove unnecessary addition

0997cb1

Clear the picture after submitting with Enter

51af7d8

oobabooga merged commit 12212cf into oobabooga:main Apr 23, 2023

Wojtab mentioned this pull request Apr 25, 2023

Fix dtypes returned by llava extension tokenizer_modifier #1547

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLaVA support #1487

LLaVA support #1487

Wojtab commented Apr 23, 2023

Wojtab commented Apr 23, 2023 •

edited

Loading

BadisG commented Apr 23, 2023

x-legion commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

jparmstr commented Apr 23, 2023 •

edited

Loading

Wojtab commented Apr 23, 2023

jepjoo commented Apr 23, 2023 •

edited

Loading

Wojtab commented Apr 23, 2023

jparmstr commented Apr 23, 2023

jepjoo commented Apr 23, 2023

Wojtab commented Apr 23, 2023 •

edited

Loading

CyberTimon commented Apr 23, 2023 •

edited

Loading

CyberTimon commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 24, 2023

Wojtab commented Apr 24, 2023

Ph0rk0z commented Apr 24, 2023 •

edited

Loading

jackylee1 commented Apr 25, 2023

yhyu13 commented May 2, 2023

LLaVA support #1487

LLaVA support #1487

Conversation

Wojtab commented Apr 23, 2023

Wojtab commented Apr 23, 2023 • edited Loading

BadisG commented Apr 23, 2023

x-legion commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CarlKenner commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

jparmstr commented Apr 23, 2023 • edited Loading

Wojtab commented Apr 23, 2023

jepjoo commented Apr 23, 2023 • edited Loading

Wojtab commented Apr 23, 2023

jparmstr commented Apr 23, 2023

jepjoo commented Apr 23, 2023

Wojtab commented Apr 23, 2023 • edited Loading

CyberTimon commented Apr 23, 2023 • edited Loading

CyberTimon commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

CyberTimon commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 23, 2023

oobabooga commented Apr 23, 2023

Wojtab commented Apr 23, 2023

Wojtab commented Apr 23, 2023

oobabooga commented Apr 24, 2023

Wojtab commented Apr 24, 2023

Ph0rk0z commented Apr 24, 2023 • edited Loading

jackylee1 commented Apr 25, 2023

yhyu13 commented May 2, 2023

Wojtab commented Apr 23, 2023 •

edited

Loading

jparmstr commented Apr 23, 2023 •

edited

Loading

jepjoo commented Apr 23, 2023 •

edited

Loading

Wojtab commented Apr 23, 2023 •

edited

Loading

CyberTimon commented Apr 23, 2023 •

edited

Loading

Ph0rk0z commented Apr 24, 2023 •

edited

Loading