
[Question]: How to use a multi-modal large model in a local environment based on llama_index? #9483

Closed
1 task done
llf10811020205 opened this issue Dec 13, 2023 · 3 comments
Labels
question (Further information is requested)

Comments

@llf10811020205

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

How can I use a multi-modal large model in a local environment based on llama_index?

@llf10811020205 added the question label (Further information is requested) Dec 13, 2023

dosubot bot commented Dec 13, 2023

🤖

Hello @llf10811020205!

Welcome to the LlamaIndex repository. I'm Dosu, your friendly bot assistant here to lend a hand while our human maintainers are away. I'm equipped to assist with bug resolution, answer questions, and help you find your way around the project. Just to keep things clear, I'm not a software developer or technical support rep - I'm a bot here to guide you.

Ready when you are to dive into your issue! Let's get started.

To use a multi-modal large model in a local environment with LlamaIndex, you can use the OpenAIMultiModal class provided by the framework. Here's a basic example of how you can use it:

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.schema import ImageDocument

# Initialize the OpenAIMultiModal class
multi_modal = OpenAIMultiModal(
    model="gpt-4-vision-preview",  # The multi-modal model to use from OpenAI
    temperature=0.5,  # The temperature to use for sampling
    max_new_tokens=300,  # The maximum number of tokens to generate
    image_detail="low",  # The level of detail for images in API calls
    api_key="your_openai_api_key",  # The OpenAI API key
    api_base="https://api.openai.com/v1",  # The base URL for the OpenAI API
)

# Define the prompt and image documents
prompt = "Describe the image"
image_documents = [ImageDocument(image_url="image_url")]  # Replace "image_url" with the actual URL of the image

# Use the complete method to get the model's response
response = multi_modal.complete(prompt, image_documents)

# Print the response text
print(response.text)

In this example, we're using the complete method which takes a text prompt and a list of ImageDocument objects (representing the images to be processed) and returns a CompletionResponse object. The text attribute of the CompletionResponse object contains the model's response.

Please replace "your_openai_api_key" with your actual OpenAI API key and "image_url" with the actual URL of the image you want to process. You can refer to the source code of the OpenAIMultiModal class in the LlamaIndex repository for more details.
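
Since the question is specifically about a local environment, here is a small variation on the example above as a hedged sketch: the ./images paths and file names are placeholders, and it assumes the default image reader is available so SimpleDirectoryReader can load image files as ImageDocument objects.

from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.schema import ImageDocument

# Point an ImageDocument at a file on disk instead of a URL
image_documents = [ImageDocument(image_path="./images/photo.jpg")]

# Or let SimpleDirectoryReader collect every image in a folder
# (it returns ImageDocument objects for supported image file types):
# image_documents = SimpleDirectoryReader("./images").load_data()

multi_modal = OpenAIMultiModal(
    model="gpt-4-vision-preview",
    api_key="your_openai_api_key",
    max_new_tokens=300,
)

response = multi_modal.complete(
    prompt="Describe the image",
    image_documents=image_documents,
)
print(response.text)

The rest of the call is unchanged; OpenAIMultiModal should base64-encode local image files before sending them to the API, so only the way the ImageDocument is constructed differs.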

Before you start, make sure you have the prerequisites for setting up a local environment for LlamaIndex. You need Python version 3.8.1 or higher but less than 4.0, and several Python packages. You can find the full list of prerequisites in the pyproject.toml file in the LlamaIndex repository.

If you encounter any issues or need further assistance, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@hatianzhang
Contributor

You can use llama.cpp to run some models locally via llama-cpp-python.
For LLaVA: https://github.com/abetlen/llama-cpp-python/blob/8e44a32075de4aba2fc9877d4a2a34a0e7314c0d/README.md?plain=1#L248

Feel free to make a PR adding llama.cpp support for the LLaVA model.
You can follow https://github.com/run-llama/llama_index/blob/main/llama_index/llms/llama_cpp.py
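
To make the llama.cpp route concrete, here is a rough sketch of running LLaVA fully locally with llama-cpp-python (outside llama_index). The GGUF model and CLIP projector file paths are placeholders you would download yourself; see the README linked above for the exact files and any version-specific details.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths: download a LLaVA GGUF model and its CLIP projector first
chat_handler = Llava15ChatHandler(clip_model_path="./models/mmproj-model-f16.gguf")

llm = Llama(
    model_path="./models/llava-v1.5-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to leave room for image tokens
    logits_all=True,  # the LLaVA chat handler needs logits for every token
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])

Per the linked README, the separate CLIP projector file and logits_all=True are what allow the chat handler to embed images; everything runs on your own machine with no API key.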

@dosubot dosubot bot added the stale label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) Mar 13, 2024

dosubot bot commented Mar 13, 2024

Hi, @llf10811020205,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, you were seeking guidance on using a multi-modal large model in a local environment based on llama_index. Dosu-bot provided a detailed response on how to use the OpenAIMultiModal class in the LlamaIndex framework, including a code example and references to relevant files in the repository. Additionally, hatianzhang suggested using llama.cpp to run local models and provided links for reference.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

@dosubot dosubot bot closed this as not planned Mar 20, 2024
@dosubot dosubot bot removed the stale label Mar 20, 2024