<a href="https://colab.research.google.com/github/gitmystuff/AgenticAI/blob/main/06%20-%20Exploring_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring LLMs

## Ollama

**Ollama** is an open-source tool that allows you to easily **download, run, and manage large language models (LLMs) locally on your own computer** (macOS, Windows, or Linux).

Think of it as a simple command-line interface (with a local HTTP API) that works like **Docker for LLMs**: you pull a model by name and run it. It handles the setup for you, including serving pre-quantized model files (quantization lowers numeric precision to make models smaller and faster), so you can quickly run popular open models like Llama, Mistral, and Gemma privately, without relying on cloud services.

* https://ollama.com/
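
As a small illustration of the API side, the sketch below asks a local Ollama server which models have already been pulled. It assumes the server listens on the default `http://localhost:11434` and that its `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field; the helper names `parse_model_names` and `list_local_models` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def parse_model_names(payload):
    """Extract model names from a parsed /api/tags response body."""
    return [entry["name"] for entry in payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


# Offline illustration with a hand-written payload in the assumed shape
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "mistral:7b"}]}
print(parse_model_names(sample))
```

With Ollama running, `list_local_models()` performs the actual request; the parsing step is kept separate so it can be checked without a server.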


## LM Studio

**LM Studio** is a user-friendly, all-in-one desktop application with a graphical user interface (GUI) for running LLMs locally on your computer (Windows, macOS, or Linux).

It simplifies the entire process by allowing you to:
1.  **Discover and download** pre-quantized, open-source models (like Llama, Mistral, and Gemma) directly from Hugging Face.
2.  **Chat and experiment** with these models in a dedicated, private chat interface.
3.  **Serve a local API** that is compatible with the OpenAI API format, making it easy to integrate the local LLMs into agentic frameworks like LangChain and CrewAI.

It is particularly popular for its **ease of use** and strong focus on **data privacy**, as all processing happens entirely on your machine.

* https://lmstudio.ai/
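
Because both tools expose an OpenAI-style endpoint, the same request shape can target either one. The sketch below only builds the request (URL, headers, JSON body) so the structure is visible without a running server. The base URLs and the model name are the ones used later in this notebook; `build_chat_request` and `LOCAL_SERVERS` are hypothetical names introduced for illustration.

```python
import json

# Base URLs as used later in this notebook; both are assumed to serve an OpenAI-style /v1 API
LOCAL_SERVERS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
}


def build_chat_request(server, model, prompt):
    """Return (url, headers, body) for an OpenAI-style chat completion request."""
    url = f"{LOCAL_SERVERS[server]}/chat/completions"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body


url, headers, body = build_chat_request("lmstudio", "mistral_instruct_7b", "Say hello.")
print(url)
```

In practice the `openai` client does this work for you once `base_url` points at the local server, which is the approach the later cells take.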

In [None]:
import os
import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

## LLM Setup

In [None]:
load_dotenv(override=True)

def is_service_running(url):
    """
    Return True if an HTTP GET to `url` answers with status 200, else False.
    """
    try:
        response = requests.get(url, timeout=5)
    except requests.exceptions.RequestException:
        # Connection refused, timeout, or any other request failure: treat as not running
        return False
    # Ollama replies "Ollama is running" on its base URL; a 200 status code means the server is up
    return response.status_code == 200

# Check for Ollama
ollama_url = 'http://localhost:11434'
if is_service_running(ollama_url):
    print("Ollama is running")
else:
    print("Ollama is not running")

# Check for LM Studio
lmstudio_url = 'http://localhost:1234'
if is_service_running(lmstudio_url):
    print("LM Studio is running")
else:
    print("LM Studio is not running")

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
hf_token = os.getenv('HF_TOKEN')

# Report which credentials were found, without printing the secrets themselves
credentials = {
    "OpenAI API Key": openai_api_key,
    "Anthropic API Key": anthropic_api_key,
    "Google API Key": google_api_key,
    "DeepSeek API Key": deepseek_api_key,
    "Groq API Key": groq_api_key,
    "Hugging Face Token": hf_token,
}

for name, value in credentials.items():
    print(f"{name} exists" if value else f"{name} not set")


Ollama is running
LM Studio is running
OpenAI API Key exists
Anthropic API Key exists
Google API Key exists
DeepSeek API Key not set
Groq API Key exists
Hugging Face Token exists
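
One way to act on these checks is to fold them into a single lookup that later cells can consult before calling a provider, so a missing key (DeepSeek in the run above) is skipped rather than causing an error. This is only a sketch: `usable_providers` is a hypothetical helper, and the key strings below are placeholders, not real credentials.

```python
def usable_providers(api_keys, local_services):
    """Return the sorted names of providers that look safe to call.

    api_keys: mapping of provider name -> key string or None
    local_services: mapping of provider name -> bool (server reachable)
    """
    ready = [name for name, key in api_keys.items() if key]
    ready += [name for name, running in local_services.items() if running]
    return sorted(ready)


# Placeholder values mirroring the output above; real code would pass os.getenv(...) results
demo_keys = {"openai": "placeholder", "anthropic": "placeholder", "deepseek": None}
demo_services = {"ollama": True, "lmstudio": True}
print(usable_providers(demo_keys, demo_services))
```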


## Evaluating LLMs

In [None]:
# make a request
llms = []
responses = []

request = "You are an AI tasked with writing a single, one-sentence instruction for a human to prevent a paradox in a time-travel scenario. What is that instruction? "
messages = [{"role": "user", "content": request}]
messages

[{'role': 'user',
  'content': 'You are an AI tasked with writing a single, one-sentence instruction for a human to prevent a paradox in a time-travel scenario. What is that instruction? '}]

In [None]:
# openai
openai = OpenAI() # assumes OPENAI_API_KEY is set in environment
model = "gpt-4o-mini"

response = openai.chat.completions.create(model=model, messages=messages)
result = response.choices[0].message.content

display(Markdown(result))
llms.append(model)
responses.append(result)

Do not engage with your past self under any circumstances, as any interaction could lead to unforeseen consequences that may alter your own existence.

In [None]:
# claude
model = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model, messages=messages, max_tokens=1000)
result = response.content[0].text

display(Markdown(result))
llms.append(model)
responses.append(result)

Avoid interacting with your past self or altering events you know have already occurred, as this could create a temporal paradox.

In [None]:
# gemini; you may need to enable the Generative AI API in the Google Cloud Console
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model, messages=messages)
result = response.choices[0].message.content

display(Markdown(result))
llms.append(model)
responses.append(result)

Do not interact in any way that would prevent your own birth or your decision to time travel.


In [None]:
# groq
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model, messages=messages)
result = response.choices[0].message.content

display(Markdown(result))
llms.append(model)
responses.append(result)

If you are about to interact with your past self or any event that has already occurred, do not give or receive any information that would cause your past self or anyone else to take an action that you have already witnessed or know to be a part of established events.

In [None]:
# ollama (local OpenAI-compatible endpoint; the api_key is a required placeholder)
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model = "llama3.2"

response = ollama.chat.completions.create(model=model, messages=messages)
result = response.choices[0].message.content

display(Markdown(result))
llms.append(model)
responses.append(result)

To avoid a paradox arising from interacting with your earlier time-self while on a journey through the timestream, refrain from altering any events or taking actions that could have been the immediate cause of initiating this same trip.

In [None]:
# lm studio (local OpenAI-compatible endpoint serving a Mistral model)
mistral = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
model = "mistral_instruct_7b"

response = mistral.chat.completions.create(model=model, messages=messages)
result = response.choices[0].message.content

display(Markdown(result))
llms.append(model)
responses.append(result)

 Do not create an event in the past that would alter the circumstances leading to your presence in the current moment in time.
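
The OpenAI-compatible cells above all repeat the same three steps: call `chat.completions.create`, read `choices[0].message.content`, and append to the two lists. A minimal sketch of factoring that into helpers follows. The names `ask`, `collect`, and `FakeClient` are hypothetical, and `FakeClient` only imitates the shape of the client so the example runs offline. The Anthropic cell uses a different call (`messages.create`) and would still need its own branch.

```python
from types import SimpleNamespace


def ask(client, model, messages):
    """Send `messages` to an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def collect(providers, messages):
    """Query each (client, model) pair in order; return parallel lists of models and replies."""
    models, replies = [], []
    for client, model in providers:
        replies.append(ask(client, model, messages))
        models.append(model)
    return models, replies


class FakeClient:
    """Hypothetical stand-in mimicking the OpenAI client's shape, for offline runs."""

    def __init__(self, reply):
        message = SimpleNamespace(content=reply)
        self._response = SimpleNamespace(choices=[SimpleNamespace(message=message)])
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        return self._response


demo_messages = [{"role": "user", "content": "ping"}]
demo_providers = [(FakeClient("pong"), "model-a"), (FakeClient("pang"), "model-b")]
demo_llms, demo_responses = collect(demo_providers, demo_messages)
print(demo_llms, demo_responses)
```

With real clients, `providers` could be a list such as `[(openai, "gpt-4o-mini"), (groq, "llama-3.3-70b-versatile")]`, reusing the objects created in the cells above.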

In [None]:
# Concatenate all responses into one block, labeled by 1-based position
text_prep = ""

for index, answer in enumerate(responses, start=1):
    text_prep += f"# Response from llm {index}\n\n"
    text_prep += answer + "\n\n"

## Evaluation

In [None]:
evaluator = f"""You are evaluating responses from {len(llms)} LLMs.
Each model has been given this question:

{request}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best llm number", "second best llm number", "third best llm number", ...]}}

Here are the responses from each llm:

{text_prep}

Now, please respond with the ranked order of the llms using JSON, nothing else. Do not include markdown formatting or code blocks."""

print(evaluator)

You are evaluating responses from 6 LLMs.
Each model has been given this question:

You are an AI tasked with writing a single, one-sentence instruction for a human to prevent a paradox in a time-travel scenario. What is that instruction? 

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best llm number", "second best llm number", "third best llm number", ...]}

Here are the responses from each llm:

# Response from llm 1

Do not engage with your past self under any circumstances, as any interaction could lead to unforeseen consequences that may alter your own existence.

# Response from llm 2

Avoid interacting with your past self or altering events you know have already occurred, as this could create a temporal paradox.

# Response from llm 3

Do not interact in any way that would prevent your own birth or your decision to time travel.


# Re

In [None]:
evaluator_messages = [{"role": "user", "content": evaluator}]

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=evaluator_messages,
)
results = response.choices[0].message.content
print(results)

{"results": ["1", "2", "6", "4", "5", "3"]}
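
The judge is asked for bare JSON, but models sometimes wrap the reply in a Markdown code fence or return an incomplete ranking. The sketch below shows one defensive way to parse the reply before using it. `strip_code_fence` and `parse_ranking` are hypothetical helpers, and the check that the ranking is a permutation of 1..n is an added assumption about what counts as a valid answer.

```python
import json

FENCE = "`" * 3  # a Markdown code-fence marker, built indirectly to keep this listing readable


def strip_code_fence(text):
    """Remove a surrounding Markdown code fence (and optional 'json' tag), if present."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        if text.lstrip().lower().startswith("json"):
            text = text.lstrip()[4:]
    return text.strip()


def parse_ranking(raw, n_models):
    """Parse the judge's reply into a list of 1-based ints, or raise ValueError."""
    ranks = [int(r) for r in json.loads(strip_code_fence(raw))["results"]]
    if sorted(ranks) != list(range(1, n_models + 1)):
        raise ValueError(f"ranking is not a permutation of 1..{n_models}: {ranks}")
    return ranks


demo_reply = '{"results": ["1", "2", "6", "4", "5", "3"]}'
print(parse_ranking(demo_reply, 6))
```

Malformed JSON also surfaces as a `ValueError`, since `json.JSONDecodeError` is a subclass of it, so a caller can catch one exception type and retry the judge.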


In [None]:
# Map each 1-based response number in the ranking back to its model name
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    llm = llms[int(result)-1]
    print(f"Rank {index+1}: {llm}")

Rank 1: gpt-4o-mini
Rank 2: claude-3-7-sonnet-latest
Rank 3: mistral_instruct_7b
Rank 4: llama-3.3-70b-versatile
Rank 5: llama3.2
Rank 6: gemini-2.0-flash
