
[Question]: How to use a multi-modal large model in a local environment based on llama_index? #9483

Closed
1 task done
llf10811020205 opened this issue Dec 13, 2023 · 3 comments
Labels
question (Further information is requested)

Comments

@llf10811020205

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

How can I use a multi-modal large model in a local environment based on llama_index?

@llf10811020205 added the question label (Further information is requested) Dec 13, 2023

dosubot bot commented Dec 13, 2023

🤖

Hello @llf10811020205!

Welcome to the LlamaIndex repository. I'm Dosu, your friendly bot assistant here to lend a hand while our human maintainers are away. I'm equipped to assist with bug resolution, answer questions, and help you find your way around the project. Just to keep things clear, I'm not a software developer or technical support rep - I'm a bot here to guide you.

Ready when you are to dive into your issue! Let's get started.

To use a multi-modal large model in a local environment with LlamaIndex, you can use the OpenAIMultiModal class provided by the framework. Here's a basic example of how you can use it:

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.schema import ImageDocument

# Initialize the OpenAIMultiModal class
multi_modal = OpenAIMultiModal(
    model="gpt-4-vision-preview",  # The multi-modal model to use from OpenAI
    temperature=0.5,  # The temperature to use for sampling
    max_new_tokens=300,  # The maximum number of tokens to generate
    image_detail="low",  # The level of detail for images in API calls
    api_key="your_openai_api_key",  # The OpenAI API key
    api_base="https://api.openai.com/v1",  # The base URL for the OpenAI API
)

# Define the prompt and image documents
prompt = "Describe the image"
image_documents = [ImageDocument(image_url="image_url")]  # Replace "image_url" with the actual URL of the image

# Use the complete method to get the model's response
response = multi_modal.complete(prompt, image_documents)

# Print the response text
print(response.text)

In this example, we're using the complete method which takes a text prompt and a list of ImageDocument objects (representing the images to be processed) and returns a CompletionResponse object. The text attribute of the CompletionResponse object contains the model's response.

Please replace "your_openai_api_key" with your actual OpenAI API key and "image_url" with the actual URL of the image you want to process. You can refer to the source code of the OpenAIMultiModal class in the LlamaIndex repository for more details.
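
Since the question is specifically about a local environment, here is a small variation on the example above as a hedged sketch: the ./images paths and file names are placeholders, and it assumes the default image reader is available so SimpleDirectoryReader can load image files as ImageDocument objects.

from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.schema import ImageDocument

# Point an ImageDocument at a file on disk instead of a URL
image_documents = [ImageDocument(image_path="./images/photo.jpg")]

# Or let SimpleDirectoryReader collect every image in a folder
# (it returns ImageDocument objects for supported image file types):
# image_documents = SimpleDirectoryReader("./images").load_data()

multi_modal = OpenAIMultiModal(
    model="gpt-4-vision-preview",
    api_key="your_openai_api_key",
    max_new_tokens=300,
)

response = multi_modal.complete(
    prompt="Describe the image",
    image_documents=image_documents,
)
print(response.text)

The rest of the call is unchanged; OpenAIMultiModal should base64-encode local image files before sending them to the API, so only the way the ImageDocument is constructed differs.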

Before you start, make sure you have the prerequisites for setting up a local environment for LlamaIndex. You need Python version 3.8.1 or higher but less than 4.0, and several Python packages. You can find the full list of prerequisites in the pyproject.toml file in the LlamaIndex repository.

If you encounter any issues or need further assistance, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@hatianzhang
Contributor

You can use llama.cpp to run some models locally via llama-cpp-python.
For LLaVA: https://github.com/abetlen/llama-cpp-python/blob/8e44a32075de4aba2fc9877d4a2a34a0e7314c0d/README.md?plain=1#L248

Feel free to make a PR adding llama.cpp support for the LLaVA model.
You can follow https://github.com/run-llama/llama_index/blob/main/llama_index/llms/llama_cpp.py
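
To make the llama.cpp route concrete, here is a rough sketch of running LLaVA fully locally with llama-cpp-python (outside llama_index). The GGUF model and CLIP projector file paths are placeholders you would download yourself; see the README linked above for the exact files and any version-specific details.

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths: download a LLaVA GGUF model and its CLIP projector first
chat_handler = Llava15ChatHandler(clip_model_path="./models/mmproj-model-f16.gguf")

llm = Llama(
    model_path="./models/llava-v1.5-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to leave room for image tokens
    logits_all=True,  # the LLaVA chat handler needs logits for every token
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])

Per the linked README, the separate CLIP projector file and logits_all=True are what allow the chat handler to embed images; everything runs on your own machine with no API key.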

@dosubot dosubot bot added the stale label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) Mar 13, 2024

dosubot bot commented Mar 13, 2024

Hi, @llf10811020205,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, you were seeking guidance on using a multi-modal large model in a local environment based on llama_index. Dosu-bot provided a detailed response on how to use the OpenAIMultiModal class in the LlamaIndex framework, including a code example and references to relevant files in the repository. Additionally, hatianzhang suggested using llama.cpp to run local models and provided links for reference.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

@dosubot dosubot bot closed this as not planned Mar 20, 2024
@dosubot dosubot bot removed the stale label Mar 20, 2024