[Question]: How to use Own Model with Chroma Vector DB #8745

Closed
1 task done
taynotfound opened this issue Nov 7, 2023 · 3 comments
Labels
question Further information is requested

Comments

@taynotfound

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Basically what the title says.
I'm on a MacBook Pro 2019 (i9, 8th gen, 32 GB RAM, Vega Pro 20).

Here is my code:

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

model_url = "https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q5_K_M.gguf"
llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path="/home/tay/AGVMastril/em_german_13b_v01-GGUF.gguf",
    temperature=0.1,
    max_new_tokens=4048,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    # model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    verbose=True,
)
response_iter = llm.stream_complete(
    "Schreib mir eine kleine Geschichte über eine Katze."
)
for response in response_iter:
    print(response.delta, end="", flush=True)
print(response.text)

@taynotfound taynotfound added the question Further information is requested label Nov 7, 2023

dosubot bot commented Nov 7, 2023

🤖

Hello @taynotfound,

I'm Dosu, a bot designed to assist you with LlamaIndex. I can help you debug issues, answer any questions you might have, and guide you on how to contribute to the repository. While we're waiting for a human maintainer to join us, I'm here to assist you.

I'm currently reviewing your issue and will get back to you with a response soon.


dosubot bot commented Nov 7, 2023

🤖

Based on the code you've provided, it seems like you're trying to use a custom model with the LlamaCPP class in the LlamaIndex library. The LlamaCPP class is designed to interact with custom models, specifically those that are compatible with the llama_cpp library.

In your code, you've specified the model_path parameter in the LlamaCPP constructor. This should be the path to your custom model. If you want to use a model from Hugging Face, you can use the model_url parameter instead.

Here's a brief explanation of some of the parameters you're using:

  • temperature: This controls the randomness of the output. A higher value will make the output more random, while a lower value will make it more deterministic.
  • max_new_tokens: This is the maximum number of tokens to generate.
  • context_window: This is the maximum number of context tokens for the model. It's set lower than the model's actual context window to allow for some wiggle room.
  • generate_kwargs: These are keyword arguments used for generation. You can pass any additional arguments for the model's generate method here (a short sketch follows this list).
  • verbose: This controls whether to print verbose output.
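
For example, here is a rough sketch of non-default generate_kwargs. The specific keys (top_p, repeat_penalty, stop) come from llama-cpp-python's completion parameters and may differ between versions, and the model path is reused from your snippet:

from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="/home/tay/AGVMastril/em_german_13b_v01-GGUF.gguf",
    temperature=0.1,
    max_new_tokens=512,
    context_window=3900,
    # forwarded to the completion call in llama-cpp-python;
    # these keys are examples, check your installed version for the exact names
    generate_kwargs={"top_p": 0.9, "repeat_penalty": 1.1, "stop": ["</s>"]},
    # forwarded to the model constructor; n_gpu_layers > 0 offloads layers to the GPU
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)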

The stream_complete method is used to generate completions for a given prompt. In your case, the prompt is "Schreib mir eine kleine Geschichte über eine Katze." The method returns an iterator of CompletionResponse objects, which you can print to see the generated text.

Regarding the use of your own model with Chroma Vector DB, you can refer to the code provided in the context. It shows how to use the ChromaReader class to load data from a persisted Chroma collection and query it using a SummaryIndex.
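
As a rough, unverified sketch of that ChromaReader + SummaryIndex pattern (the collection name, persist directory, placeholder query vector, and the query_vector/limit parameter names are assumptions based on the docs of that era, so double-check them against your installed llama_index version):

from llama_index import SummaryIndex
from llama_index.readers.chroma import ChromaReader

# Load nodes from an already-persisted Chroma collection.
reader = ChromaReader(
    collection_name="my_collection",  # placeholder name
    persist_directory="./chroma_db",  # placeholder path
)

# ChromaReader retrieves by embedding, so the query vector should come from the
# same embedding model that was used to build the collection.
query_vector = [0.0] * 384  # placeholder; use a real embedding here
documents = reader.load_data(query_vector=query_vector, limit=5)

# Build a small index over the retrieved documents and query it.
# Pass service_context=... (with your LlamaCPP llm) to avoid the default OpenAI models.
index = SummaryIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What are these documents about?"))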

If you're still having trouble, could you please provide more details about the issue you're facing? For example, are you getting any error messages? What is the expected output and how does it differ from the actual output? This information will help me provide a more accurate solution.


@logan-markewich
Collaborator

@taynotfound you just have to put the LLM into the service context and off you go. You probably also want to use a local embedding model.

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-base-en-v1.5")

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
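
For completeness, here is a rough end-to-end sketch of combining that service context with a Chroma-backed vector store (assuming llama-index ~0.9 and the chromadb client; the ./chroma_db and ./data paths and the collection name are placeholders):

import chromadb
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.vector_stores import ChromaVectorStore

# Persist Chroma locally and wrap the collection for llama_index.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# llm is the LlamaCPP instance from the original snippet; the embedding model
# runs locally, so no hosted API is needed.
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-base-en-v1.5",
)

# Embed the documents into Chroma and query them with the local LLM.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)

query_engine = index.as_query_engine()
print(query_engine.query("Summarize the documents briefly."))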
