# Client-server example

In this script, we use the Ollama framework instead of the raw Hugging Face (HF) approach. Ollama serves the model as a web service (WS), so the client only sends it HTTP requests.

## Client-server using JSON API

First, we ask a model running on a remote server to generate a program that solves a specific problem, here computing Fibonacci numbers. This uses a client-server architecture: the client sends a request to the server, and the server returns the result. Both the request and the response are JSON.

```python
import requests
import time

# Ollama's generate endpoint on the remote server
url = 'http://deeperthought.cse.chalmers.se:5050/api/generate'
data = {
    "model": "llama3.1:70b",
    "prompt": "Write a program to calculate Fibonacci numbers in Python.",
    "stream": False  # return the whole answer as a single JSON object
}

start_time = time.time()  # Start the timer

# json= serializes `data` and sets the Content-Type: application/json header
response = requests.post(url, json=data)
response.raise_for_status()  # stop here on HTTP errors instead of failing later

end_time = time.time()  # End the timer

json_data = response.json()

strProgram = json_data['response']  # the generated text

print(strProgram)
print(f"Time taken for the request: {end_time - start_time:.2f} seconds")
```
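The cell above sets `"stream": False`, so the server sends one JSON object only after generation has finished. With `"stream": true`, Ollama instead sends newline-delimited JSON: one object per text fragment, each carrying a `response` string and a `done` flag. The sketch below assumes exactly that format; the function name `collect_streamed_response` is hypothetical, and it runs offline on hand-written lines.

```python
import json
from typing import Iterable, Union


def collect_streamed_response(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the text fragments of a streamed /api/generate reply.

    Assumes each non-empty line is one JSON object with a "response"
    text fragment and a boolean "done" flag (newline-delimited JSON).
    """
    parts = []
    for line in lines:
        if not line:  # skip blank keep-alive lines
            continue
        chunk = json.loads(line)  # json.loads accepts both str and bytes
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # the final object marks the end of the answer
            break
    return "".join(parts)


# Offline usage with hand-written lines in the assumed format
fake_stream = [
    b'{"response": "def fib(n):", "done": false}',
    b'',
    b'{"response": " ...", "done": false}',
    b'{"response": "", "done": true}',
]
print(collect_streamed_response(fake_stream))
```

Against the live server, something like `collect_streamed_response(requests.post(url, json={**data, "stream": True}, stream=True).iter_lines())` should work: `stream=True` tells `requests` not to download the whole body up front, and `iter_lines()` yields it line by line as bytes. Streaming mainly helps responsiveness, since with a 70B model the first fragments can arrive well before the full answer is ready.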

## Using a local model

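Now, we do the same thing with a local model. Note that this model is much smaller than the 70B-parameter Llama 3.1 model on the server, because of the limitations of the local hardware.

This code uses `microsoft/Phi-3-mini-128k-instruct`, the instruction-tuned variant of Phi-3-mini with a 128k-token context window. Phi-3-mini (about 3.8B parameters) is the smallest language model in Microsoft's Phi-3 family.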
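Both cells time their call by reading `time.time()` before and after it. A small context manager built on `time.perf_counter()`, a monotonic clock intended for measuring intervals, avoids repeating that pattern. This is a sketch using only the standard library; the name `timed` and the dict it yields are hypothetical, and `sum(...)` stands in for the real `pipe(...)` or `requests.post(...)` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the enclosed block and print the elapsed time in seconds.

    time.perf_counter() is monotonic, so unlike time.time() it is not
    affected by adjustments to the system clock.
    """
    result = {}
    start = time.perf_counter()
    try:
        yield result  # the caller can read result["seconds"] afterwards
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.2f} seconds")


# Usage: wrap only the call that should be measured
with timed("Time taken for the request") as t:
    total = sum(range(1_000_000))  # stand-in for pipe(...) or requests.post(...)
```

Be aware of what each measurement covers: in the local cell the timer starts only after the model has been loaded, so loading time is excluded, while the remote timing includes the network round trip and possibly the time the server needs to load the model on the first request.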

```python
# Download the Phi-3-mini model from HF
# and use it through the text-generation pipeline

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import time

modelP = "microsoft/Phi-3-mini-128k-instruct"

modelInstr = AutoModelForCausalLM.from_pretrained(
    modelP,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation='eager',  # flash_attn is not installed
)

tokenizerInstr = AutoTokenizer.from_pretrained(modelP)

messages = [
    {"role": "user", "content": "Write a program to calculate Fibonacci numbers in Python."},
]

# Pass the model object loaded above, not the name string modelP:
# with the name, the pipeline loads a second copy of the model that
# ignores the device_map and attn_implementation settings above
pipe = pipeline(
    "text-generation",
    model=modelInstr,
    tokenizer=tokenizerInstr,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,  # return only the newly generated text
    "temperature": 0.0,
    "do_sample": False,  # greedy decoding, deterministic output
}

start_time = time.time()  # Start the timer

output = pipe(messages, **generation_args)

end_time = time.time()  # End the timer

print(output)
print(f"Time taken for the request: {end_time - start_time} seconds")
```

`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are not running the flash-attention implementation, expect numerical differences.

[{'generated_text': ' Certainly! Below is a Python program that calculates Fibonacci numbers. This program includes both an iterative approach and a recursive approach for calculating Fibonacci numbers. The iterative approach is more efficient in terms of time complexity, especially for large numbers.\n\n### Iterative Approach\n\n```python\ndef fibonacci_iterative(n):\n    """\n    Calculate the nth Fibonacci number using an iterative approach.\n    \n    :param n: Index of the Fibonacci sequence to calculate\n    :return: The nth Fibonacci number\n    """\n    if n <= 0:\n        return "Input must be a positive integer"\n    elif n == 1:\n        return 0\n    elif n == 2:\n        return 1\n    \n    a, b = 0, 1\n    for _ in range(2, n):\n        a, b = b, a + b\n    return b\n\n# Example usage\nn = 10\nprint(f"The {n}th Fibonacci number (iterative): {fibonacci_iterative(n)}")\n```\n\n### Recursive Approach\n\n```python\ndef fibonacci_recursive(n):\n    """\n    Calculate the nth F