## Preparations

### Imports

In [1]:
import os

from dotenv import load_dotenv
from huggingface_hub import login
from openai import OpenAI
from transformers import AutoTokenizer

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


### Preparing the OpenAI client

In [2]:
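# Point the OpenAI client at the local Ollama server's OpenAI-compatible endpoint.
# The client requires an API key, but Ollama ignores its value.
client = OpenAI(
    api_key='ollama',
    base_url='http://localhost:11434/v1/',
)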

In [3]:
def llm(prompt: str, temperature: float = 1.0) -> str:
    """
    Generate a response from the 'gemma:2b' model for the given prompt.

    This function sends the prompt as a single user message in a chat completion
    request to the 'gemma:2b' model, passes `temperature` to control randomness,
    and returns the content of the first choice in the response.

    Parameters
    ----------
    prompt : str
        The user's message or prompt to which the model will respond.
    temperature : float, optional
        Controls the randomness of the model's output, where lower values make the output more deterministic.
        Default is 1.0.

    Returns
    -------
    str
        The content of the model's response as a string.
    """
    response = client.chat.completions.create(
        model='gemma:2b',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=temperature,
    )
    
    return response.choices[0].message.content

### Preparing the Hugging Face environment

In [4]:
load_dotenv('../.env')
login(token=os.getenv('HF_TOKEN'))

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/nikolai/.cache/huggingface/token
Login successful
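
If `HF_TOKEN` is missing from `../.env`, `os.getenv` returns `None` and the cause of any resulting login problem is not obvious. A minimal sketch of a guard that fails early with a clear message; the helper name `require_env` is hypothetical and not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f'Environment variable {name!r} is not set; check the .env file.'
        )
    return value


# Possible usage in the cell above (assumes load_dotenv has already run):
# login(token=require_env('HF_TOKEN'))
```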


## [Homework questions](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md)

### [Q1. Running Ollama with Docker](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q1-running-ollama-with-docker)

Q1 answer: Ollama version 0.1.48

### [Q2. Downloading an LLM](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q2-downloading-an-llm)

### [Q3. Running the LLM](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q3-running-the-llm)

In [5]:
query = '10*10'

In [6]:
print(llm(query))

Sure. Here's a rewritten response that adheres to the given guidelines:




### [Q4. Downloading the weights](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q4-donwloading-the-weights)

Q4 answer: 1.6 G (the closest option is 1.7 G)
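
The notebook does not show how the size was measured. As a rough sketch, assuming the weights sit in a local folder, the total can be computed by summing file sizes. The helper names `dir_size_bytes` and `human_readable` are hypothetical, and the demo runs on a temporary directory rather than on the real weights. Note that this sums apparent file sizes, so the result can differ slightly from what a disk-usage tool such as `du` reports.

```python
import tempfile
from pathlib import Path


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob('*') if p.is_file())


def human_readable(num_bytes: int) -> str:
    """Format a byte count with binary units (1K = 1024 bytes), capped at G."""
    size = float(num_bytes)
    for unit in ('B', 'K', 'M', 'G'):
        if size < 1024 or unit == 'G':
            return f'{size:.1f}{unit}'
        size /= 1024


# Demo on a throwaway directory: 1024 + 512 bytes spread over two levels.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.bin').write_bytes(b'\0' * 1024)
    sub = Path(tmp) / 'sub'
    sub.mkdir()
    (sub / 'b.bin').write_bytes(b'\0' * 512)
    demo_total = dir_size_bytes(tmp)

print(demo_total, human_readable(demo_total))
```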

### [Q5. Adding the weights](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q5-adding-the-weights)

Q5 answer: `ollama_files /root/.ollama`

### [Q6. Serving it](https://github.com/DataTalksClub/llm-zoomcamp/blob/main/cohorts/2024/02-open-source/homework.md#q6-serving-it)

In [7]:
prompt = "What's the formula for energy?"

In [8]:
result = llm(prompt, temperature=0.0)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
tokens = tokenizer.encode(result)

print(f'Q6 answer: {len(tokens)} completion tokens received.')

Q6 answer: 281 completion tokens received.
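
The cell above counts tokens by re-encoding the returned text with the Hugging Face tokenizer. An alternative is to read the count the server reports: OpenAI-style chat completion responses carry a `usage` object with a `completion_tokens` field. The sketch below assumes the Ollama endpoint populates that field, which this notebook does not verify. The helper name `completion_tokens` is hypothetical, and the demo uses a stand-in object with made-up numbers instead of a live response.

```python
from types import SimpleNamespace


def completion_tokens(response) -> int:
    """Return the completion token count reported in a chat completion response."""
    usage = getattr(response, 'usage', None)
    if usage is None:
        raise ValueError('Response carries no usage information.')
    return usage.completion_tokens


# Stand-in shaped like a chat completion response; the numbers are made up.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=5, completion_tokens=12, total_tokens=17)
)
print(completion_tokens(fake_response))
```

To use this with a live response, `llm` would need to return the full `response` object rather than only `response.choices[0].message.content`. The two counts may not match exactly, for example because `tokenizer.encode` adds special tokens by default.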
