This repo holds helper functions for using Ollama with LangChain.

It has been tested on an Ubuntu 24 server. You'll need:
- Ollama installed
- the following Ollama models pulled:

```python
default_llm_model_name = "ajindal/llama3.1-storm:8b"
default_long_context_llm_model_name = "mistral-nemo"
default_reasoning_llm_model_name = "gemma2:9b"
default_sql_llm_model_name = "qwen2.5-coder"
default_tools_llm_model_name = "qwen2.5"
```

The `num_ctx` parameter is the size of the context window used to generate the next token; the values used here fit a 20GB VRAM GPU.
- Python 3 with `venv` and `pip` installed
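With Ollama installed and running, the models listed above can be pulled ahead of time (these are standard `ollama pull` invocations; download sizes and times will vary):

```shell
# pull each model referenced by yollama
ollama pull ajindal/llama3.1-storm:8b
ollama pull mistral-nemo
ollama pull gemma2:9b
ollama pull qwen2.5-coder
ollama pull qwen2.5
```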
Install the package with:

```shell
pip install yollama
```
`get_llm` takes two parameters:
- the 1st is the use case (`Literal["default", "long-context", "reasoning", "sql", "tools"]`), which determines the model and context size;
- the 2nd is a boolean determining whether the LLM should return a JSON object (`True`, the default) or a string.
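Internally, the use case has to resolve to a model name and a context size. Here is a minimal sketch of what that selection could look like — the model names come from the list above, but the `num_ctx` values and the `select_model` helper are illustrative assumptions, not yollama's actual implementation:

```python
from typing import Literal, Tuple

UseCase = Literal["default", "long-context", "reasoning", "sql", "tools"]

# Hypothetical mapping: model names are taken from the README above;
# the num_ctx values are illustrative guesses for a 20GB VRAM GPU.
MODEL_CONFIG: dict[str, Tuple[str, int]] = {
    "default": ("ajindal/llama3.1-storm:8b", 8192),
    "long-context": ("mistral-nemo", 32768),
    "reasoning": ("gemma2:9b", 8192),
    "sql": ("qwen2.5-coder", 8192),
    "tools": ("qwen2.5", 8192),
}

def select_model(use_case: UseCase = "default") -> Tuple[str, int]:
    """Return the (model_name, num_ctx) pair for a given use case."""
    return MODEL_CONFIG[use_case]

print(select_model("long-context"))  # ('mistral-nemo', 32768)
```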
```python
from yollama import get_llm

llm = get_llm()

# invoke default model (ajindal/llama3.1-storm:8b) in JSON mode
print(llm.invoke({"input_query": some_user_input}))

# invoke Mistral-Nemo model in text mode with some lengthy context
print(get_llm("long-context", False).invoke({"document": some_big_text}))

# invoke Gemma2 model in text mode, to get answers requiring more reasoning abilities
print(get_llm("reasoning", False).invoke({"question": """Owen left a tray of lemon cakes unattended in the staff room for an hour and one of the cakes went missing. Three people are suspects, and here are the facts:
Person A is allergic to citrus but was seen in the area
Person B loves lemon cake but wasn't near the location
Person C has stolen food before and similar incidents increased after they arrived
Who is most likely responsible and why?"""}))

# invoke Qwen2.5-Coder model in text mode, to get a raw SQL query
print(get_llm("sql", False).invoke({"query": "create a SQL query to get the latest headlines"}))
```

I've decided to let go of OpenAI to:
- preserve my privacy
- save money
As local models keep getting better, I'm confident the experience will only improve, even with my small 20GB VRAM GPU. 🤔 Of course, it's not fast, but hey, everything comes with tradeoffs, right? 💪
🤝 I haven't made up my mind on contribution guidelines yet. I guess we'll update them as you contribute!