LLaVA support #1487

Merged: 14 commits merged into oobabooga:main on Apr 23, 2023

Conversation

@Wojtab (Contributor) commented Apr 23, 2023

OK, multimodality is here. To support LLaVA I created an extension. While I could split it off into a separate repo containing only the extension, I needed text-generation-webui itself to support overriding input_ids/input_embeds. While I was at it, I also changed extension handling a bit (existing extensions should not need any updates; the changes are mostly on the backend).

To try it:

  1. Download my 4-bit quant from Hugging Face (I haven't tested the non-quantized version; the quant works on a 3090, and it may even fit in 12 GB of VRAM).
  2. Run the webui with my extension enabled: python3 server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava
  3. Select LLaVA in instruct mode (chat mode should also work, but the template is written for instruct).
  4. Add "\n###" to the custom stopping strings.

Here's a video of it in action:

2023-04-23.04-56-35.mp4

@Wojtab (Contributor, Author) commented Apr 23, 2023

BTW: don't merge it yet!

If it is going to be merged as a built-in extension, I want to clean up script.py first (it's feature-complete, but could use some work); if it isn't, I need to remove script.py.

@BadisG (Contributor) commented Apr 23, 2023

I tried it, and it doesn't look like you can talk to the model without an image; you have to give it one to get it going. That's a shame, because I've heard that training the Vicuna model on pictures made it smarter, and I wanted to try that out in regular chat.

@x-legion:

Will it work with GGML models?

@CarlKenner (Contributor):

Gradio HTTP request redirected to localhost :)
Loading llava-13b-4bit-128g...
Could not find the quantized model in .pt or .safetensors format, exiting...

Done!
Press any key to continue . . .

Probably because download-model.bat automatically named it wojtab_llava-13b-v0-4bit-128g.

@CarlKenner (Contributor):

Using wojtab_llava-13b-v0-4bit-128g instead of llava-13b-4bit-128g, I'm now getting this error:

Gradio HTTP request redirected to localhost :)
Loading wojtab_llava-13b-v0-4bit-128g...
Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
Traceback (most recent call last):
  File "D:\AI\oobabooga-windows\text-generation-webui\server.py", line 921, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\models.py", line 148, in load_model
    model = load_quantized(model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 44, in _load_quant
    model = AutoModelForCausalLM.from_config(config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
    return model_class._from_config(config, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
    model = cls(config, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 255, in __init__
    self.self_attn = LlamaAttention(config=config)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 178, in __init__
    self.v_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 52428800 bytes.

Done!
Press any key to continue . . .

@CarlKenner (Contributor):

I added more virtual memory in Windows' Advanced System Settings, by also using my second hard drive for virtual memory.
Now I get this error instead:

Gradio HTTP request redirected to localhost :)
Loading wojtab_llava-13b-v0-4bit-128g...
Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
Loading model ...
Done.
Traceback (most recent call last):
  File "D:\AI\oobabooga-windows\text-generation-webui\server.py", line 921, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\models.py", line 148, in load_model
    model = load_quantized(model_name)
  File "D:\AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 197, in load_quantized
    model = model.to(torch.device('cuda:0'))
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "D:\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.42 GiB already allocated; 0 bytes free; 3.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Done!
Press any key to continue . . .

@CyberTimon:

To create a public link, set share=True in launch().
Traceback (most recent call last):
  File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 898, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/cybertimon/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/cybertimon/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "/home/cybertimon/Repositorys/text-generation-webui/modules/chat.py", line 222, in cai_chatbot_wrapper
    for history in chatbot_wrapper(text, state):
  File "/home/cybertimon/Repositorys/text-generation-webui/modules/chat.py", line 154, in chatbot_wrapper
    for reply in generate_reply(f"{prompt}{' ' if len(cumulative_reply) > 0 else ''}{cumulative_reply}", state, eos_token=eos_token, stopping_strings=stopping_strings):
  File "/home/cybertimon/Repositorys/text-generation-webui/modules/text_generation.py", line 225, in generate_reply
    question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
  File "/home/cybertimon/Repositorys/text-generation-webui/modules/extensions.py", line 91, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
  File "/home/cybertimon/Repositorys/text-generation-webui/modules/extensions.py", line 74, in _apply_tokenizer_extensions
    prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
  File "/home/cybertimon/Repositorys/text-generation-webui/extensions/llava/script.py", line 172, in tokenizer_modifier
    new_input_embeds.append(cur_new_input_embeds)
UnboundLocalError: local variable 'cur_new_input_embeds' referenced before assignment

I think it works except for this error. I can load the model and talk to it, but as soon as I select an image, I get the traceback above.

@jparmstr (Contributor) commented Apr 23, 2023

I'm able to use this with 8 GB VRAM (Geforce 3060 Ti) with the following arguments:

python server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava --pre_layer 29

Reduce "Max prompt size in tokens" to 500 or less, otherwise you'll get OOM errors after the first response.

After further testing I found it's best to use a max prompt size of 0, otherwise there's a chance it will respond to multiple images at once. Even with the Max Prompt Size setting at 0, the model still gets around 360 tokens of context each time; I guess this must be hard-coded.

Edit:
The merged version runs out of memory on my GPU. I added this to settings.json as suggested by the author, and it's working again:

"llava-clip_device": "cpu",
"llava-projector_device": "cpu"

@Wojtab (Contributor, Author) commented Apr 23, 2023

@BadisG - I pushed a commit ~30 minutes before your last message that fixed this; you may have pulled before it landed.
@faisalhr1997 - maybe, but you would need to convert the LLaMA part of LLaVA to GGML; I haven't tried it.
@CarlKenner - it looks like both errors are OOM, the first on CPU and the second on GPU. Try vicuna-13b without the llava extension first to see whether it is caused by my changes or by something in your setup.
@CyberTimon - I think that could have happened if the prompt had a truncated image; try the most recent version.
@jparmstr - nice. As for the prompt: the default template is 101 tokens, and each image takes up 258 tokens (2 for the start/end markers plus 256 actual image embeddings).
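
To put those numbers in context, here is a small illustrative sketch (not code from the extension; the 2048-token context window is the standard LLaMA limit and is an assumption here):

    # Illustrative prompt-budget arithmetic using the numbers above.
    CONTEXT_LENGTH = 2048    # standard LLaMA context window (assumption)
    TEMPLATE_TOKENS = 101    # default LLaVA instruct template
    IMAGE_TOKENS = 258       # 2 start/end tokens + 256 image embeddings

    def text_budget(num_images: int, max_new_tokens: int) -> int:
        """Rough number of prompt tokens left for plain text."""
        return CONTEXT_LENGTH - max_new_tokens - TEMPLATE_TOKENS - num_images * IMAGE_TOKENS

    print(text_budget(num_images=1, max_new_tokens=200))  # 1489

Note that 101 + 258 = 359, which matches the ~360 tokens of context @jparmstr observed even with the max prompt size set to 0.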

@jepjoo commented Apr 23, 2023

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:

"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Aborted"

WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

@Wojtab (Contributor, Author) commented Apr 23, 2023

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:

"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory Aborted"

WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

https://discuss.pytorch.org/t/libcudnn-cnn-infer-so-8-library-can-not-found/164661

@jparmstr (Contributor):

@jparmstr - nice, as for the prompt - default template has 101 tokens, and each image takes up 258 tokens (2 for start/end, and 256 of actual image embeddings)

That makes sense, I figured the image must take some number of tokens.

I did notice that even with the context length at 0, the model responds to my questions. For example, to "What is funny about this image?" it will start with "This image is funny because". I wonder how it's ingesting my prompt when I don't leave any space for it in the context.

@jepjoo commented Apr 23, 2023

I can chat with the model without an image but as soon as I enter an image and prompt it, it crashes:
"Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory Aborted"
WSL2 installation, ubuntu 22.04. RTX 4090 and plenty of VRAM left unused.

https://discuss.pytorch.org/t/libcudnn-cnn-infer-so-8-library-can-not-found/164661

Thanks a lot, that fixed it! I should have googled this harder myself...

@Wojtab (Contributor, Author) commented Apr 23, 2023

OK, at this point it's cleaned up to where I wanted it, so it could be merged. Also, I added the option to run CLIP/the projector on the CPU (or in 32-bit on CUDA, which is now the new default).
To run them on CPU, add:

    "llava-clip_device": "cpu",
    "llava-projector_device": "cpu"

to settings.json.
To run them in 16-bit on CUDA (the old behaviour), add:

    "llava-clip_bits": 16,
    "llava-projector_bits": 16

CLIP doesn't look like it supports run_in_8bit, and I don't think the projector needs it, so only 16-bit and 32-bit are available (and CPU only supports 32-bit).

@jparmstr - you might be able to squeeze out some more tokens by running CLIP on the CPU. As for the prompt, you can add print(prompt) after the line print(f'Embedded {total_embedded} image(s) in {time.time()-start_ts:.2f}s') in script.py to see it.
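
In other words, something like this in extensions/llava/script.py (the first line is the existing log statement quoted above; only the second line is new):

    print(f'Embedded {total_embedded} image(s) in {time.time()-start_ts:.2f}s')
    print(prompt)  # dump the full prompt, including the embedded image placeholders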

@CyberTimon commented Apr 23, 2023

(base) cybertimon@server:~/Repositorys/text-generation-webui$ python3 server.py --model llava-13b-4bit-128g --gpu-memory 12 --wbits 4 --model_type llama --groupsize 128 --listen-host 0.0.0.0 --listen --xformers --extension llava --chat --listen-port 21129
Gradio HTTP request redirected to localhost :)
Loading settings from settings.json...
Loading llava-13b-4bit-128g...
Found the following quantized model: models/llava-13b-4bit-128g/pytorch_model.safetensors
Loading model ...
Done.
Using the following device map for the quantized model: {'': 0}
Replaced attention with xformers_attention
Loaded the model in 4.27 seconds.
Loading the extension "llava"... Ok.
Loading the extension "gallery"... Ok.
{'add_all_images_to_prompt': False, 'clip_device': None, 'clip_bits': 32, 'projector_device': None, 'projector_bits': 32} cuda:0 torch.float32 cuda:0 torch.float32
Running on local URL: http://0.0.0.0:21129

To create a public link, set share=True in launch().
Embedded 0 image(s) in 0.99s

I get "Embedded 0 image(s) in 0.99s". Maybe this is the problem from earlier. Also, it only answers "88888888....".

@CyberTimon:

Also, when I change the settings to use the CPU ({'add_all_images_to_prompt': False, 'clip_device': 'cpu', 'clip_bits': 32, 'projector_device': 'cpu', 'projector_bits': 32} cpu torch.float32 cpu torch.float32), I still only get 888888 as the answer.

@Wojtab (Contributor, Author) commented Apr 23, 2023

@CyberTimon remove settings.json, then restart the webui, clear the history, and try with this image: https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/examples/extreme_ironing.jpg, with the prompt "What is unusual about this image?", exactly as in my video.
If it still gives garbage output, then check whether this model works for you (without the llava extension enabled): https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

@oobabooga (Owner):

Very impressive @Wojtab, I'll try to review and merge it soon. Quick question: I remember reading on the LLaVA README that a custom version of transformers was needed. How did you get it working with the standard transformers?

@Wojtab (Contributor, Author) commented Apr 23, 2023

@oobabooga If you load the original LLaVA with standard transformers it works, but instead of loading the entire model it just loads the LLaMA part, so it can be used for text-only inference without any modifications.
The modified transformers adds the image input and the projector, and then feeds the embeddings to the standard finetuned LLaMA. Since there were no modifications to the LLaMA architecture, I load it as a standard model in standard transformers and just use custom embeddings, running the image -> CLIP -> projector pipeline myself in LLaVAEmbedder instead of inside modified transformers.
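
As a rough sketch of that flow (conceptual only, not the actual LLaVAEmbedder code; splice_image_embeddings is a hypothetical helper standing in for the placeholder-replacement logic):

    def generate_with_image(llama, clip, projector, prompt_ids, images):
        """Conceptual sketch: run image -> CLIP -> projector outside transformers,
        then feed the combined embeddings to an unmodified LLaMA."""
        # 1. Text tokens go through LLaMA's normal embedding table.
        text_embeds = llama.get_input_embeddings()(prompt_ids)

        # 2. Images go through the CLIP vision tower and then the LLaVA projector,
        #    yielding 256 embedding vectors per image in LLaMA's embedding space.
        image_features = clip(images)
        image_embeds = projector(image_features)

        # 3. The image embeddings replace the <im_patch> placeholder positions,
        #    and generation runs on inputs_embeds instead of input_ids.
        inputs_embeds = splice_image_embeddings(text_embeds, image_embeds)  # hypothetical
        return llama.generate(inputs_embeds=inputs_embeds)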

@CyberTimon:

@CyberTimon remove settings.json, then restart webui, clear the history, and try with this image: https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/examples/extreme_ironing.jpg, with "What is unusual about this image?" prompt, exactly as in my video. If it still gives a garbage output, then try if this model works for you(without llava extension enabled): https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

You're a hero! It works perfectly now. I had to delete settings.json.

@CyberTimon:

Oh, I see what the issue was: when selecting max_new_tokens over 1600, it generates only garbage.

@oobabooga (Owner):

  • It worked for me, but I had to use the tokenizer files that come with wojtab/llava-13b-v0-4bit-128g instead of the generic LLaMA tokenizer described here. The web UI has the option of loading the same tokenizer for all LlamaForCausalLM models from models/llama-tokenizer as a way of ensuring that the files are up to date (many models on Hugging Face ship outdated tokenizer files). Does LLaVA use a custom tokenizer?
    [Screenshots comparing output with the base LLaMA tokenizer vs. the wojtab/llava-13b-v0-4bit-128g tokenizer]
  • The modifications to the extensions framework look good to me and are highly appreciated, thanks for taking the time to read the existing code base in detail.
  • About merging/not merging script.py itself into the repository: I think that this is a good example that future extensions can use as a starting point, so I vote for merging it.

@Wojtab (Contributor, Author) commented Apr 23, 2023

Regarding the tokenizer: there are 4 new tokens, so I don't think the generic one will work:

{
  "<im_end>": 32002,
  "<im_patch>": 32000,
  "<im_start>": 32001,
  "[PAD]": 32003
}
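
For reference, a minimal sketch (not code from this PR) of what that means with stock transformers; the token names are the ones listed above, the repo name is the quantized model from this thread, and everything else is illustrative:

    from transformers import LlamaTokenizer

    # The model's own tokenizer files already include the four extra tokens.
    tokenizer = LlamaTokenizer.from_pretrained("wojtab/llava-13b-v0-4bit-128g")
    print(len(tokenizer))  # 32004 if the extra tokens are present

    # A generic LLaMA tokenizer would need them registered explicitly, e.g.:
    # tokenizer.add_tokens(["<im_patch>", "<im_start>", "<im_end>"], special_tokens=True)
    # tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # model.resize_token_embeddings(len(tokenizer))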

IMO we can merge it here now; give me about 30 minutes and I'll add a description of the extension.
I just fixed the issue @CyberTimon had: the image could get truncated in the middle. Truncation is still broken in the vast majority of cases, unless the prompt looks like this:
[screenshot of an example prompt]
but at least there is now a warning in the logs, so maybe there won't be 20 issues about it. (BTW, you can set the image placement inside the prompt by adding <image>.)

@oobabooga (Owner):

A comment: the Extensions doc page says

Additionally, the extension can set value to be a callback, in the form of def cb(text: str, visible_text: str) -> [str, str]. See the send_pictures extension above for an example.

But the send_pictures extension does not use a callback.

@Wojtab (Contributor, Author) commented Apr 23, 2023

@oobabooga OK, I added the docs and also reworded that part in Extensions. One more change: I set it to auto-recognize LLaVA as a llama-based model.

@oobabooga (Owner):

I have removed this addition because I found it unnecessary, as the chatbot_wrapper function already updates the history (if that was a mistake, please let me know and I'll revert it).

         yield chat_html_wrapper(shared.history['visible'], state['name1'], state['name2'], state['mode'])
     else:
         # Yield ' ...'
-        last_visible_user = shared.history['visible'][-1][0]
         yield chat_html_wrapper(shared.history['visible'][:-1] + [[shared.history['visible'][-1][0], shared.history['visible'][-1][1] + ' ...']], state['name1'], state['name2'], state['mode'])
         for history in chatbot_wrapper(shared.history['internal'][-1][0], state, _continue=True):
-            shared.history['visible'][-1] = [last_visible_user, history[-1][1]]
             yield chat_html_wrapper(shared.history['visible'], state['name1'], state['name2'], state['mode'])

I also made some minor changes and improvements. Thanks for submitting this PR; I would never have come up with the LLaVA adaptation on my own, and the reworked extensions framework is a huge improvement to this project.

@oobabooga merged commit 12212cf into oobabooga:main on Apr 23, 2023
@oobabooga (Owner):

For reference, these are the commands to download and run the model:

python download-model.py wojtab/llava-13b-v0-4bit-128g
python3 server.py --model wojtab_llava-13b-v0-4bit-128g --chat  --extensions llava

VRAM usage peaked at 11106MiB for a single generation.

@Wojtab (Contributor, Author) commented Apr 23, 2023

@oobabooga thanks for the review and merge. That addition was actually necessary: for some reason, continue replaces visible_text with the internal text on the user's message, so instead of being <img src="data:image/jpeg;base64,{base64string}"> the visible text becomes the internal representation <image:{base64string}>.
Thinking about it now, it might have been a bad idea to separate the two representations, since I can parse both just as easily, but with the current code, if you stop generation and then click continue, the image will disappear for the user.
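
To illustrate the mismatch, a tiny sketch (illustrative only; the two tag formats are the ones quoted above, the helper itself is hypothetical):

    import re

    # internal history: <image:{base64string}>
    # visible history:  <img src="data:image/jpeg;base64,{base64string}">

    def internal_to_visible(text: str) -> str:
        """Hypothetical helper: turn the internal image tag back into the HTML one."""
        return re.sub(r'<image:([A-Za-z0-9+/=]+)>',
                      r'<img src="data:image/jpeg;base64,\1">',
                      text)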

@Wojtab (Contributor, Author) commented Apr 23, 2023

@oobabooga actually, instead of reverting it, I'll open a separate PR where both of the representations are the same

@oobabooga (Owner):

I'll wait for your PR then. I might have used ['internal'] instead of ['visible'] somewhere.

@Wojtab (Contributor, Author) commented Apr 24, 2023

#1507

@Ph0rk0z (Contributor) commented Apr 24, 2023

I get some errors sometimes, depending on what I choose for parameters.

LLaVA - Embedded 1 image(s) in 0.05s
Traceback (most recent call last):
  File "/home/mint/gptq2/text-generation-webui-testing/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/mint/gptq2/text-generation-webui-testing/modules/text_generation.py", line 257, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2537, in sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 92, in __call__
    scores = processor(input_ids, scores)
  File "/home/mint/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 231, in __call__
    score = torch.gather(scores, 1, self.encoder_input_ids)
RuntimeError: gather(): Expected dtype int64 for index

This would be really interesting with characters, if it could be wrangled into being less of an "AALM".

bonkz

It works if I add "Assistant" to the stopping strings! Not perfectly, of course, but it's a trip: it can describe the items as the character, or just output nonsense. Luck of the draw.

@jackylee1:

I'm able to use this with 8 GB VRAM (Geforce 3060 Ti) with the following arguments:

python server.py --model llava-13b-4bit-128g --wbits 4 --group 128 --chat --model_type=llama --extensions llava --pre_layer 29

Reduce "Max prompt size in tokens" to 500 or less, otherwise you'll get OOM errors after the first response.

After further testing I found it's best to use 0 Max Prompt Size, otherwise there's a chance it will respond to multiple images at once. Although the Max Prompt Size setting is 0, the model still gets around 360 tokens of context each time. I guess this must be hard coded.

Edit: The merged version runs out of memory on my GPU. I added this to settings.json as suggested by the author, and it's working again:

"llava-clip_device": "cpu",
"llava-projector_device": "cpu"

Hi, can you share the settings.json file? I have the same hardware.

@yhyu13 (Contributor) commented May 2, 2023

@oobabooga @Wojtab

The Dockerfile has not been updated for LLaVA; we still get this error:

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory

The solution, as https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-infer-so-8/175139/1 suggests, is to run apt install cuda-11-8.

But the CUDA install prompts for things like the country code, which cannot be handled by apt-get's "-y", so we need:

ENV DEBIAN_FRONTEND=noninteractive

The final Dockerfile changes look something like this:

# yhyu13 : add cmd non interactive due to cuda requires some prompt input
# https://stackoverflow.com/questions/63476497/docker-build-with-ubuntu-18-04-image-hangs-after-prompting-for-country-of-origin
ENV DEBIAN_FRONTEND=noninteractive

# yhyu13 : install additional packages 
# https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-infer-so-8/175139/2
RUN apt-get update && \
    apt-get install -y cuda-11-8 && \
    apt-get install -y libportaudio2 libasound-dev git python3 python3-pip make g++ 

The result:

[screenshot: 2023-05-02_13-33]

Then I am able to run LLaVA within the local Docker container.
