# Ollama REST API Notes
This notebook contains beginner-friendly notes on using the Ollama REST API, its features, and examples for working with Llama models locally.


## What is the Ollama REST API?

- The Ollama REST API lets you communicate with Llama models using HTTP requests.
- Instead of typing commands into the CLI, you can send prompts and receive responses programmatically.
- It allows you to integrate Llama models into web apps, scripts, or automation pipelines.


## Why use the REST API?

- **Automation:** Interact with models via scripts instead of typing manually.
- **Integration:** Connect models to applications or dashboards.
- **Consistency:** Same model behavior across CLI, Python, or REST API.
- **Flexibility:** Accessible from any environment that supports HTTP requests.


## REST API Structure

- **Endpoint:** URL where you send requests.
- **Method:** Usually `POST` for sending prompts.
- **Headers:** Set `Content-Type: application/json`. An API key is only needed if your server is secured; a default local install does not require one.
- **Body:** JSON object with conversation and parameters.
- Example components:
    - `model`: which Llama model to use.
    - `messages`: conversation history (system, user, assistant).
    - `temperature`: randomness of responses.
    - `max_tokens`: limit for response length.
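
The components above can be assembled in a few lines of Python. This is a minimal sketch, not an official client: the helper name `build_request`, its defaults, and the `OLLAMA_URL` constant are illustrative assumptions (the URL assumes a default local install and the OpenAI-compatible chat endpoint).

```python
import json

# Assumed endpoint for a default local install.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_request(user_text, system_text="You are a helpful assistant.",
                  model="llama3.2", temperature=0.7, max_tokens=256,
                  api_key=None):
    """Return (headers, body) for a POST request (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only add the header when a key is actually configured
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return headers, body


headers, body = build_request("Explain REST API in simple terms.")
print("POST", OLLAMA_URL)
print(json.dumps(body, indent=2))  # the JSON text that would be sent
```

Nothing is sent over the network here; the pair returned by `build_request` is what you would hand to an HTTP library.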


## Roles in Messages

- `system`: sets instructions for the model, e.g., "You are concise and helpful."
- `user`: what you input, e.g., questions or prompts.
- `assistant`: what the model outputs. Usually handled by the API.
- Properly setting `system` messages guides the model’s behavior consistently.
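
Because each request carries the conversation history, a multi-turn chat means appending every new turn to the `messages` list before sending it again. The sketch below illustrates that bookkeeping; `add_turn` and `VALID_ROLES` are hypothetical names, and the assistant reply is hard-coded rather than fetched from a model.

```python
VALID_ROLES = {"system", "user", "assistant"}


def add_turn(history, role, content):
    """Return a new history list with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are concise and helpful."}]
history = add_turn(history, "user", "What is REST?")
# After a reply arrives, store it as an assistant turn so the next
# request includes the whole conversation so far.
history = add_turn(history, "assistant", "A style of HTTP-based API.")
history = add_turn(history, "user", "Give one example.")
print([message["role"] for message in history])
```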


## Example JSON Request

```json
{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain REST API in simple terms."}
  ],
  "temperature": 0.7,
  "max_tokens": 256
}


## Making a Request using Python

- Use the Python `requests` library to send JSON payloads to the REST API.
- Example:

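```python
import os

import requests

# Optional: a default local Ollama server does not require a key.
API_KEY = os.getenv("OLLAMA_API_KEY")
# Requests that carry `messages` go to the chat completions endpoint.
url = "http://localhost:11434/v1/chat/completions"

data = {
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Explain REST API in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
}

headers = {"Content-Type": "application/json"}
if API_KEY:  # avoid sending "Bearer None" when no key is set
    headers["Authorization"] = f"Bearer {API_KEY}"

response = requests.post(url, json=data, headers=headers, timeout=120)
response.raise_for_status()  # surface HTTP errors before reading the body
print(response.json()["choices"][0]["message"]["content"])
```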

## Key Parameters

- `temperature`: randomness of responses, typically between 0 and 1. Lower values are more deterministic; higher values are more varied.
- `max_tokens`: maximum tokens generated in a single response.
- `messages`: list of dictionaries representing conversation history.
- `top_p` (optional): nucleus sampling for token selection.
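- `top_k` (optional): limits selection to the top k most likely tokens. In Ollama this is a model option that can be set, for example, with `PARAMETER top_k` in a Modelfile.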
- `stop` (optional): tokens where generation should stop.
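
One way to handle the optional parameters is to add them to the request body only when they are set, so the server's defaults apply otherwise. The following is a sketch under that assumption; `with_sampling` is a hypothetical helper, and `top_k` is left out of it.

```python
def with_sampling(payload, temperature=None, top_p=None, stop=None,
                  max_tokens=None):
    """Return a copy of payload plus only the options that were given."""
    if temperature is not None and temperature < 0:
        raise ValueError("temperature must be non-negative")
    options = {
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "max_tokens": max_tokens,
    }
    chosen = {name: value for name, value in options.items() if value is not None}
    return {**payload, **chosen}


base = {"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}]}
request_body = with_sampling(base, temperature=0.2, stop=["\n\n"])
print(sorted(request_body))  # → ['messages', 'model', 'stop', 'temperature']
```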

## Practical Tips

- Test locally before deploying to production.
- Always use `system` messages to guide model behavior.
- Start with small `temperature` for predictable results.
- Use `max_tokens` to control the response length.
- If you call a shared or hosted deployment, check its usage limits and quotas to avoid errors; a local server is limited by your hardware instead.


## Common Use Cases for REST API

1. **Chatbots:** Build conversational interfaces.
2. **Summarization:** Shorten long text documents.
3. **Question Answering:** Answer questions based on input context.
4. **Automation Scripts:** Generate emails, reports, or code snippets programmatically.


## REST API Response Structure

When you send a request, the API responds with JSON.  
Example:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1690000000,
  "model": "llama3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "REST API is a way for apps to communicate using HTTP."
      },
      "finish_reason": "stop"
    }
  ]
}
```

- `id`: unique ID for the request.
- `model`: which model was used.
- `choices`: list of generated responses.
- `finish_reason`: why the response ended (`stop`, `length`, etc.).
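
Reading these fields in Python is mostly dictionary access. The sketch below uses a hypothetical helper, `extract_reply`, and runs against a hand-written copy of the example response rather than a live server.

```python
def extract_reply(response):
    """Return (text, finish_reason) from a chat-completion style dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    return first["message"]["content"], first.get("finish_reason")


sample = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1690000000,
    "model": "llama3.2",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "REST API is a way for apps to communicate using HTTP.",
            },
            "finish_reason": "stop",
        }
    ],
}

text, reason = extract_reply(sample)
if reason == "length":
    # The reply hit the max_tokens limit and may be cut off mid-sentence.
    print("(truncated)")
print(text)
```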

## Streaming Responses

- Instead of waiting for the full answer, you can stream tokens as they are generated.
- Useful for chat apps where you want to show text "typing out" live.
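- Example (assumes `url`, `data`, and `headers` from the request example above):

```python
stream_data = {**data, "stream": True}  # ask the server to stream its reply
with requests.post(url, json=stream_data, headers=headers, stream=True) as r:
    for line in r.iter_lines():
        if line:  # skip blank keep-alive lines
            print(line.decode("utf-8"))
```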
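
The raw streamed lines still need parsing before they can be shown as text. The sketch below assumes the OpenAI-style streaming format, where each event is a `data: {...}` line carrying a `delta` and the stream ends with `data: [DONE]`. The helper name `iter_stream_text` is hypothetical, and the sample lines are hand-written for illustration, not captured from a server.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from server-sent-event lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    b'data: {"choices": [{"index": 0, "delta": {"content": "REST "}}]}',
    b"",
    b'data: {"choices": [{"index": 0, "delta": {"content": "API"}}]}',
    b"data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # → REST API
```

With `requests`, the iterator returned by `r.iter_lines()` can be passed in as `lines`, printing each fragment as it arrives.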

## Handling Errors

Common issues:
- **Invalid model name:** Ensure you pulled the model locally.
- **Too many tokens:** Reduce `max_tokens` or shorten input.
- **Missing API key:** Only applies if your server is secured; set `OLLAMA_API_KEY` in your environment (for example via `.env`).
- **Connection error:** Ensure Ollama is running (`ollama serve`).

Always wrap your code with `try/except` in Python to catch errors gracefully.
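
A sketch of that `try/except` pattern follows. It uses the standard library's `urllib` so it runs without extra packages; the same structure applies to the exceptions raised by `requests`. The function names are hypothetical, and the mapping from status codes to hints (for example, treating 404 as a missing model) is an assumption to check against your own server.

```python
import json
import urllib.error
import urllib.request


def describe_failure(exc):
    """Map an exception from an HTTP call to a short troubleshooting hint."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "Not found: check the URL and that the model was pulled."
        if exc.code in (401, 403):
            return "Rejected: check the API key if your server requires one."
        return f"Server returned HTTP {exc.code}."
    if isinstance(exc, urllib.error.URLError):
        return "Could not connect: is `ollama serve` running?"
    if isinstance(exc, TimeoutError):
        return "Timed out: the model may still be loading."
    return f"Unexpected error: {exc}"


def chat(url, payload, timeout=30):
    """POST payload as JSON; return the parsed reply, or None on failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        print(describe_failure(exc))
        return None
```

Returning `None` keeps the sketch short; a larger script might re-raise or retry instead.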

## Authentication

- By default, Ollama runs locally without needing an API key.
- If secured, use an environment variable:

```bash
export OLLAMA_API_KEY="your_key_here"
```

- Read it in Python:

```python
import os
API_KEY = os.getenv("OLLAMA_API_KEY")
```

- Then pass it in headers:

```python
headers = {"Authorization": f"Bearer {API_KEY}"}
```

## Example Task — Summarization

Prompt:
```json
{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "Summarize text into 2 sentences."},
    {"role": "user", "content": "Artificial intelligence is transforming industries by automating tasks, improving efficiency, and enabling new capabilities like autonomous vehicles and personalized recommendations."}
  ],
  "max_tokens": 100
}
```

## Example Task — Sentiment Analysis

Prompt:
```json
{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "Classify text sentiment as Positive, Negative, or Neutral."},
    {"role": "user", "content": "I love how simple this API is to use!"}
  ],
  "max_tokens": 50
}
```
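
Models do not always answer with the bare label, so a script that consumes the classification may want to normalize the reply first. The following is a minimal sketch of one such post-processing step; `normalize_sentiment` is a hypothetical helper and the matching rule (the earliest label mentioned wins) is an assumption, not a guarantee about model output.

```python
LABELS = ("Positive", "Negative", "Neutral")


def normalize_sentiment(reply):
    """Return the first expected label mentioned in reply, or None."""
    lowered = reply.lower()
    hits = [
        (lowered.find(label.lower()), label)
        for label in LABELS
        if label.lower() in lowered
    ]
    # min() picks the label that appears earliest in the reply.
    return min(hits)[1] if hits else None


print(normalize_sentiment("Sentiment: **Positive**"))
print(normalize_sentiment("I am not sure."))
```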

## Customizing with a Modelfile

- Ollama lets you define **custom models** with a Modelfile.
- Example `Modelfile`:
```bash
FROM llama3.2
PARAMETER temperature 0.3
PARAMETER num_predict 200
SYSTEM "You are a sarcastic assistant who always roasts users."
```


- Then run:

```bash
ollama create roastbot -f Modelfile
ollama run roastbot
```
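
If you maintain several personas, the Modelfile text can be generated from Python before running `ollama create`. This is a sketch only: `render_modelfile` is a hypothetical helper that covers just the `FROM`, `PARAMETER`, and `SYSTEM` lines shown above, and it rejects system prompts containing double quotes rather than guessing at escaping rules.

```python
def render_modelfile(base, system=None, parameters=None):
    """Build Modelfile text from a base model, parameters, and system prompt."""
    lines = [f"FROM {base}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    if system:
        if '"' in system:
            # Simplification: quoting rules are not handled in this sketch.
            raise ValueError("system prompt must not contain double quotes")
        lines.append(f'SYSTEM "{system}"')
    return "\n".join(lines) + "\n"


text = render_modelfile(
    "llama3.2",
    system="You are a sarcastic assistant who always roasts users.",
    parameters={"temperature": 0.3, "num_predict": 200},
)
print(text)
```

Writing `text` to a file named `Modelfile` gives the input expected by the `ollama create` command above.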


## CLI vs REST API

- **CLI:** Quick testing, small experiments, manual usage.
- **REST API:** Programmatic, scalable, integrates with apps.

Example CLI:
```bash
ollama run llama3.2 "Explain APIs"
```

Example REST:
```python
requests.post("http://localhost:11434/v1/chat/completions", json=data)
```


## Industry Use Cases of Ollama + REST API

- **Customer Support:** Local chatbots without sending data to cloud.
- **Healthcare:** Private summarization of medical records.
- **Finance:** Risk analysis and compliance checks on sensitive data.
- **Education:** Offline tutors and content generation tools.
- **Startups:** Build SaaS products with custom AI assistants.

## Recap of REST API Basics

1. REST API lets apps talk to Ollama models programmatically.
2. You send prompts as JSON, and the model replies with JSON.
3. Key components: the `system`, `user`, and `assistant` roles, plus parameters such as `temperature` and `max_tokens`.
4. Supports tasks like **summarization, Q&A, chat, sentiment analysis**.
5. Can be extended with **custom Modelfiles** for unique personas.

