This repository was archived by the owner on Aug 1, 2025. It is now read-only.
Yollama

what is this?

This repo holds helper functions for using Ollama with LangChain.

prerequisites

This has been tested on an Ubuntu 24 server. You'll need:

  • Ollama installed
  • the following Ollama models pulled:
    default_llm_model_name = "ajindal/llama3.1-storm:8b"
    default_long_context_llm_model_name = "mistral-nemo"
    default_reasoning_llm_model_name = "gemma2:9b"
    default_sql_llm_model_name = "qwen2.5-coder"
    default_tools_llm_model_name = "qwen2.5"

The num_ctx parameter sets the size of the context window used to generate the next token; the values used here are sized for a 20GB VRAM GPU.

  • Python 3 with venv and pip installed
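Under the hood, each use case maps to one of the models listed above. Here is a minimal sketch of that mapping (the function name `model_for` is hypothetical, not part of the yollama API):

```python
# Hypothetical mapping from use case to Ollama model name,
# mirroring the module-level defaults listed above.
DEFAULT_MODELS = {
    "default": "ajindal/llama3.1-storm:8b",
    "long-context": "mistral-nemo",
    "reasoning": "gemma2:9b",
    "sql": "qwen2.5-coder",
    "tools": "qwen2.5",
}

def model_for(use_case: str = "default") -> str:
    """Return the Ollama model name for a given use case."""
    if use_case not in DEFAULT_MODELS:
        raise ValueError(f"unknown use case: {use_case}")
    return DEFAULT_MODELS[use_case]
```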

basic usage

pip install yollama

get_llm()

The first parameter is the use case (Literal["default", "long-context", "reasoning", "sql", "tools"]), which determines the model and context size.

The second parameter is a boolean indicating whether the LLM should return a JSON object or a plain string.

from yollama import get_llm

llm = get_llm()

# invoke default model (ajindal/llama3.1-storm:8b) in JSON mode
print(llm.invoke({"input_query": some_user_input}))
# invoke Mistral-Nemo model in text mode with some lengthy context
print(get_llm("long-context", False).invoke({"document": some_big_text}))
# invoke Gemma2 model in text mode, to get answers requiring more reasoning abilities
print(get_llm("reasoning", False).invoke({"question": """Owen left a tray of lemon cakes unattended in the staff room for an hour and one of the cakes went missing. Three people are suspects, and here are the facts:
Person A is allergic to citrus but was seen in the area
Person B loves lemon cake but wasn't near the location
Person C has stolen food before and similar incidents increased after they arrived
Who is most likely responsible and why?"""}))
# invoke Qwen2.5-Coder model in text mode, to get a raw SQL query
print(get_llm("sql", False).invoke({"query": "create a SQL query to get the latest headlines"}))

philosophy

fully local models

I've decided to let go of OpenAI to:

  • preserve my privacy
  • save money

As local models get better and better, I'm confident the experience will keep improving, even on my small 20GB VRAM GPU. 🤔 Of course, it's not fast, but hey, everything comes with tradeoffs, right? 💪

contribution guidelines

🤝 I haven't made up my mind on contribution guidelines yet. I guess we'll update them as you contribute!
