## LLM on a CPU - Ollama

```
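# Install Ollama using the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Start the server; it runs in the foreground, so use a second terminal for the next commands
ollama start

# Download the phi3 model, then open an interactive session with it
ollama pull phi3
ollama run phi3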
```
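
Before moving on to the notebook cells, it can help to confirm that the server is actually listening. The snippet below is a minimal sketch using only the standard library. It assumes the default address `http://localhost:11434` (the same port used by the client later in this notebook) and that the server answers a plain `GET /` with status 200. The helper name `ollama_is_up` is hypothetical and not part of Ollama.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url.

    Hypothetical helper: it only checks that something responds on the port,
    not that a particular model has been pulled.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


print(ollama_is_up())
```

If this prints `False`, the server is probably not running yet, or it is listening on a different host or port.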

In [1]:
import os

In [2]:
# os.getenv('HF_TOKEN')

In [3]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-07-03 18:06:17--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: ‘minsearch.py’


2024-07-03 18:06:17 (13.9 MB/s) - ‘minsearch.py’ saved [3832/3832]



In [4]:
import requests 
import minsearch

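# Download the course FAQ documents (JSON, grouped by course)
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early on a bad HTTP status
documents_raw = docs_response.json()

# Flatten into a single list, tagging each document with its course name
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)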

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x74b8a2116470>

In [5]:
def search(query):
    # Weight matches in 'question' higher and in 'section' lower than in 'text'
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results

In [10]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

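def llm(prompt):
    # `client` is the OpenAI-compatible client created in a later cell (In [13]);
    # that cell must run before llm() is called
    response = client.chat.completions.create(
        # model='gpt-4o',
        # Use phi3 served by the local Ollama instance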
        model='phi3',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
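
To see what the model actually receives, the context assembly can be tried on its own, without the index or a running model. This is a minimal sketch: `sample_results` is a made-up FAQ entry (hypothetical, not taken from `documents.json`), the template is shortened, and `format_context` is an illustrative helper that joins entries instead of concatenating in a loop.

```python
# Hypothetical search result, shaped like the documents indexed above
sample_results = [
    {
        "section": "General course-related questions",
        "question": "Can I join after the start date?",
        "text": "Yes, you can still submit the homework.",
    }
]

# Shortened version of the template used in build_prompt
prompt_template = """
QUESTION: {question}

CONTEXT:
{context}
""".strip()


def format_context(search_results):
    """Render each result as section/question/answer, separated by blank lines."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    return "\n\n".join(entries)


prompt = prompt_template.format(
    question="Can I still join the course?",
    context=format_context(sample_results),
)
print(prompt)
```

Printing the prompt like this is a quick way to check that the retrieved entries are relevant before spending time on a slow CPU-bound generation.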

In [11]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

### Ollama is a drop-in replacement for the OpenAI API. Let's connect to the local Ollama instance now.

In [13]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # the client requires a value; Ollama does not check it
)
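
The `llm()` call path can also be exercised without any server by swapping in a stand-in client. The sketch below is an assumption-laden illustration: `FakeCompletions` and `fake_client` are hypothetical names, and they only mimic the small part of the response shape that `llm()` reads (`response.choices[0].message.content`). Here `llm()` takes the client as a parameter, unlike the notebook version, which uses the global `client`.

```python
from types import SimpleNamespace


class FakeCompletions:
    """Hypothetical stand-in for client.chat.completions."""

    def create(self, model, messages):
        # Echo the last message back instead of calling a real model
        text = f"[{model}] echo: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))


def llm(prompt, client=fake_client, model="phi3"):
    """Same call shape as the notebook's llm(), with the client passed in."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


reply = llm("Write that this is a test")
print(reply)
```

Passing the client as an argument makes it easy to switch between the stand-in, a local Ollama instance, and a hosted API without editing the function body.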

In [14]:
llm('Write that this is a test')

" I'm sorry, but it seems you may have accidentally triggered an AI response instead. To clarify the purpose of your inquiry and provide appropriate assistance or information, could you please rephrase as such? It would be most helpful if you could describe what specific topic, idea, question, or subject this is supposed to address in a clearer manner for me to assist effectively."

In [15]:
print(_)

 I'm sorry, but it seems you may have accidentally triggered an AI response instead. To clarify the purpose of your inquiry and provide appropriate assistance or information, could you please rephrase as such? It would be most helpful if you could describe what specific topic, idea, question, or subject this is supposed to address in a clearer manner for me to assist effectively.


### Ollama in Docker

```
docker run -it \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

* Pull and run phi3 with Ollama inside the Docker container

```
# Open a shell inside the running container, then pull the model from there
docker exec -it ollama bash
ollama pull phi3
```

In [17]:
llm('Write that this is a test')

' This is a simple text-based instruction, asking for the content to be as follows: "This is a test." It implies creating or displaying this exact phrase. An output would simply involve typing out or visualizing these words on screen, paper, etc., with no additional information required beyond confirming that such content has been generated successfully and accurately reflects what was instructed – essentially conducting an internal check to ensure the text "This is a test" appears as expected before any real testing procedures would begin.'

In [18]:
print(_)

 This is a simple text-based instruction, asking for the content to be as follows: "This is a test." It implies creating or displaying this exact phrase. An output would simply involve typing out or visualizing these words on screen, paper, etc., with no additional information required beyond confirming that such content has been generated successfully and accurately reflects what was instructed – essentially conducting an internal check to ensure the text "This is a test" appears as expected before any real testing procedures would begin.
