# End of week 1 exercise

To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question,  
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [12]:
# imports
import os
import ollama
from openai import OpenAI
from IPython.display import display, Markdown, update_display
from dotenv import load_dotenv

In [16]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

def set_prompt(question):
    return [
        {
            'role':'system',
            'content':'You are computer science and programming tutor,\
            your job is give explanations or solution along with easy understandable examples,\
            give response as markdown format especially proper markdown to highlight code'
        },
        {
            'role':'user',
            'content':question
        }
    ]

In [6]:
# set up environment
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key.startswith('sk-proj-') and len(api_key)>10:
    print('Api key is Okay')
else:
    print('something wrong with apikey')
openai = OpenAI()

def printer(data):
    display(Markdown(data))

Api key is Okay


In [20]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [17]:
# Get gpt-4o-mini to answer, with streaming
def stream_GPT(question):
    stream = openai.chat.completions.create(
        model=MODEL_GPT,
        messages=set_prompt(question),
        stream=True
    )

    response = ''
    display_handle = display(Markdown(''), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace('```', '').replace('markdown', '')
        update_display(Markdown(response),display_id=display_handle.display_id)
stream_GPT(question)

Certainly! Let's break down the code you've provided to understand its purpose and functionality.

### Code Explanation

The line of code in question is:

python
yield from {book.get("author") for book in books if book.get("author")}


#### Components Breakdown:

1. **Set Comprehension**: 
   The expression inside the curly braces `{}` is a **set comprehension**. A set comprehension creates a set, which is a collection type that holds only unique items. The expression generates a set of authors from the `books` collection:
   
   - **`book.get("author")`**: This retrieves the value associated with the key `"author"` from each `book` dictionary in the `books` list. If the key does not exist, `get` will return `None`.
   
2. **Looping through `books`**:
   The `for book in books` part goes through each `book` in the `books` iterable (which is assumed to be a list of dictionaries).

3. **Conditional Filtering**:
   The `if book.get("author")` portion ensures that only books with an existing author (non-`None` or non-empty strings) are included in the set. This means you won't get any empty or `None` authors in your final set.

4. **Yielding Values**:
   Lastly, `yield from` is a Python statement that is typically used within a generator function. It allows the generator to yield all values from another iterable. In this case, it will yield each unique author from the set that was generated.

### Full Summary

1. This line creates a set of unique authors from a list of books (ignoring duplicates and any books that do not have an author).
2. It then yields those authors one by one.

### Example

Here's a complete example to illustrate how it works:

python
def get_unique_authors(books):
    yield from {book.get("author") for book in books if book.get("author")}

# Example books list
books = [
    {"title": "Book One", "author": "Author A"},
    {"title": "Book Two", "author": "Author B"},
    {"title": "Book Three", "author": "Author A"},  # Duplicate author
    {"title": "Book Four"},  # No author
    {"title": "Book Five", "author": "Author C"},
]

# Using the generator function
for author in get_unique_authors(books):
    print(author)


### Expected Output
In this example, the output will be:


Author A
Author B
Author C


Here, each unique author is printed once despite "Author A" being present twice in the list of books and "Book Four" being ignored as it does not have an author.

In [21]:
# Get Llama 3.2 to answer
def call_llama(question):
    llama_response = ollama.chat(
        model=MODEL_LLAMA,
        messages=set_prompt(question)
    )
    return llama_response['message']['content']
    
printer(call_llama(question))

**Generator Expression Explanation**
=====================================

The given code snippet is a generator expression that uses the `yield from` keyword to delegate iteration to another iterable. Let's break it down:

### Code Explanation

```python
yield from {book.get("author") for book in books if book.get("author")}
```

This line of code is using several Python features:

1. **Generator Expression**: The expression `{...}` is a generator expression, which is similar to a list comprehension but returns an iterator instead of a list.
2. **`yield from`**: This keyword is used to delegate iteration to another iterable. When `yield from` encounters the first `yield` in the nested iterable, it will yield its value and then move on to the next item in the outer generator expression.
3. **Dictionary Comprehension**: The inner `{...}` is a dictionary comprehension, which creates a new dictionary with key-value pairs based on the given conditions.

### What the Code Does

In summary, this code does the following:

* Iterates over the `books` iterable (assuming it's a list or similar).
* For each book, checks if the "author" key exists in the book's dictionary using `.get("author")`. If it doesn't exist, the book is skipped.
* For books that do have an "author", it yields the author's value.

### Why `yield from`?

Using `yield from` is beneficial when you want to write a generator function that iterates over another iterable. This approach has several advantages:

* **Efficient**: Generator expressions are memory-efficient because they don't store all the values in memory at once.
* **Flexible**: You can use `yield from` with any iterable, making it a powerful tool for complex iteration scenarios.

### Example Use Case

Here's an example use case where this code might be used:
```python
books = [
    {"title": "Book 1", "author": "Author A"},
    {"title": "Book 2", "author": None},
    {"title": "Book 3", "author": "Author C"}
]

authors = yield from {book.get("author") for book in books if book.get("author")}
print(authors)  # Output: ["Author A", "Author C"]
```
In this example, the code generates a list of authors from the `books` iterable using the dictionary comprehension and `yield from`.

In [22]:
question = """
Can you explain me how streaming works which is causing typewriter like animation in below openai code?
def stream_GPT(question):
    stream = openai.chat.completions.create(
        model=MODEL_GPT,
        messages=set_prompt(question),
        stream=True
    )

    response = ''
    display_handle = display(Markdown(''), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace('```', '').replace('markdown', '')
        update_display(Markdown(response),display_id=display_handle.display_id)

Can I generate similiar typewriter animation when calling `llama model` using `ollama` library?
"""

In [23]:
printer(call_llama(question))

**Streaming in OpenAI's GPT Model**
=====================================

The code you provided uses OpenAI's GPT model to create a chat completion service. The `stream=True` parameter in the `completions.create()` method allows for streaming, which enables the server to return the response one chunk at a time, rather than waiting for the entire response.

**How Streaming Works**
------------------------

Here's a simplified explanation of how streaming works:

1.  **Chunking**: The GPT model breaks down the response into smaller chunks, typically around 4096 characters.
2.  **Sending chunks**: The server sends each chunk to your application as soon as it is generated.
3.  **Receiving and processing chunks**: Your application receives each chunk, processes it (e.g., removes markdown formatting), and updates the display accordingly.

**Typewriter Animation**
------------------------

The typewriter animation you see in the example code is created by updating the `response` variable on each chunk received from the server. This causes the text to appear as if it's being typed.

Here's a breakdown of how this works:

*   The initial response is an empty string (`''`).
*   On each iteration, the current response is updated with the new chunk.
*   Any markdown formatting or code blocks are removed from the response using `replace('```', '').replace('markdown', '')`.
*   The updated response is then displayed on the screen.

**Streaming in LLaMA Model Using Ollama Library**
---------------------------------------------------

To create a similar typewriter animation when calling an LLaMA model using the Ollama library, you can use a streaming approach as well. Here's an example:

```python
import ollama

# Initialize the LLaMA model and chat context
model = ollama.LlamaModel("LLaMA")
context = {}

# Set up the display handle
display_handle = ollama.display.Display()

while True:
    # Prompt the user for input
    question = input("Enter a question: ")

    # Create a new chat session
    context = model.create_session(question)

    # Initialize the response
    response = ""

    # Iterate over each chunk of the response
    for chunk in context:
        # Append the chunk to the response
        response += chunk["text"]

        # Remove any markdown formatting or code blocks from the response
        response = response.replace("```", "").replace("# ", "")

        # Update the display handle with the new response
        display_handle.update(Markdown(response))

    # Wait for a short period before checking again
    import time
    time.sleep(0.5)
```

In this example, we use an infinite loop to continuously prompt the user for input and create new chat sessions with the LLaMA model. On each iteration, we append the response chunk to the `response` variable, remove any markdown formatting or code blocks, and update the display handle with the new response.

By using a streaming approach, we can achieve the same typewriter animation effect as in the original example code.

In [24]:
stream_GPT(question)

Certainly! In the provided code, the concept of streaming is used to create a typewriter-like animation effect when displaying text generated by an AI model (in this case, OpenAI's models). Let's break down the code and understand how it works:

### Explanation of the Code

1. **Streaming Response**:
   python
   stream = openai.chat.completions.create(
       model=MODEL_GPT,
       messages=set_prompt(question),
       stream=True
   )
   
   This line initiates a streaming call to the OpenAI API. The `stream=True` argument means that the response will be sent back in chunks rather than all at once.

2. **Display Setup**:
   python
   display_handle = display(Markdown(''), display_id=True)
   
   This sets up an initial display that can be updated later, using the Jupyter notebook's display functionality.

3. **Processing the Stream**:
   python
   for chunk in stream:
       response += chunk.choices[0].delta.content or ''
       response = response.replace('', '').replace('', '')
       update_display(Markdown(response), display_id=display_handle.display_id)
   
   - This loop processes each chunk of the response as it arrives.
   - Each chunk's content is added to a `response` string.
   - The display is updated in real-time with the current content of `response`, creating the typewriter effect as the AI produces text.

### Creating a Similar Animation with `Ollama` Library

If you want to create a similar typewriter animation effect when calling a Llama model using the `ollama` library, you can do so by implementing a similar strategy. Here's a general outline of how you could achieve this:

1. **Initialize Streaming**:
   You need a streaming option with the `ollama` library, similar to how you did it with the OpenAI API.

2. **Display Content in Chunks**:
   While you receive the chunks, concatenate them and update the display incrementally.

Here’s an illustrative example of how you might implement this:

python
import ollama  # Assume Ollama provides necessary imports
from IPython.display import display, Markdown, update_display

def stream_llama(question):
    stream = ollama.api.chat(
        model="llama",
        messages=[{"role": "user", "content": question}],
        stream=True
    )

    response = ''
    display_handle = display(Markdown(''), display_id=True)

    for chunk in stream:
        # Assuming each chunk has a similar structure as in OpenAI's API
        response += chunk['choices'][0]['delta']['content'] or ''
        response = response.replace('', '').replace('', '')
        update_display(Markdown(response), display_id=display_handle.display_id)

# Usage
stream_llama("What is the future of AI?")


### Key Considerations
- **Ensure Streaming is Supported**: Check if the version of the `ollama` library you are using supports streaming responses.
- **Chunk Handling**: Ensure that the structure of the chunks matches the one expected in the loop (i.e., `chunk['choices'][0]['delta']['content']`).
- **Display Mechanism**: This example uses a Jupyter Notebook for display. If you are working in a different environment (e.g., a web application), the display mechanisms might differ.

By following the above structure, you should be able to achieve a typewriter-like animation with the `ollama` library similar to the one seen with the OpenAI GPT model.

In [28]:
def stream_llama(question):
    stream = ollama.chat(
        model=MODEL_LLAMA,
        messages=set_prompt(question),
        stream = True
    )

    response = ''
    display_handle = display(Markdown(''), display_id=True)
    for chunk in stream:
        response += chunk['message']['content'] or ''
        # response = response.replace('```','').replace('markdown','')
        update_display(Markdown(response),display_id=display_handle.display_id)
stream_llama(question)

Streaming and Typewriter Animation
=====================================

The typewriter-like animation in the provided OpenAI code is achieved through streaming. Streaming allows you to receive the response from the API one chunk at a time, rather than waiting for the entire response.

Here's what happens when we create a stream using `openai.chat.completions.create(stream=True)`:

1.  The API sends the first chunk of the response.
2.  Our code processes this chunk and updates the display with the new text.
3.  We receive the next chunk from the API.
4.  We repeat steps 2-3 until we've received the entire response.

This creates a seamless, continuous updating effect on the screen, similar to a typewriter animation.

Streaming in Python
--------------------

In Python, you can use the `stream` module or libraries like `asyncio` and `aiohttp` to create streams. However, for simplicity, we'll focus on using the existing API methods provided by OpenAI.

To achieve a similar effect when calling an `llama model` using the `transformers` library (which provides a Python interface for the Hugging Face models), you can use the following code:

```python
import torch
from transformers import LLaMAForCausalLM, LLaMAScheduler

def stream_llama(question):
    # Initialize the model and scheduler
    model = LLaMAForCausalLM.from_pretrained('llama')
    scheduler = LLaMAScheduler()

    # Create an iterator for streaming responses
    response_iterator = iter(model.generate(
        input_ids=torch.tensor([question]), 
        max_length=100, 
        num_returned_sequences=1, 
        no_repeat_ngram_size=2, 
        early_stopping=True, 
        max_model_size=1024 * 1024 * 50
    ))

    response = ''
    for i, response in enumerate(response_iterator):
        # Process the response and update the display (assuming display is a function that updates the UI)
        response_text = model.decode_ids(response[0])
        if response_text:
            response += ' '.join(response_text) + '\n'

# Example usage
question = "What is the capital of France?"
stream_llama(question)
```

In this code:

*   We use `model.generate` to create an iterator for streaming responses.
*   The `max_length`, `num_returned_sequences`, and `early_stopping` arguments are used to control the output length, sequence generation, and stopping criteria, respectively.

Note that the above example uses a simple approach to process the response. You can enhance this by adding more sophisticated text processing techniques, such as filtering or summarization.

By using streaming, you can achieve a similar typewriter-like animation when calling an `llama model` using the `transformers` library.

In [29]:
stream_llama("What is a factory in terms of design patterns of OOPS programming?")

**Factory Design Pattern**
==========================

The Factory design pattern is a creational design pattern that provides an interface for creating objects without specifying the exact class of object that will be created.

**Problem**
-----------

In traditional object-oriented programming (OOP), we would typically create objects by instantiating classes. However, this approach has several drawbacks:

*   It can lead to tightly coupled code, where the client code is dependent on a specific class implementation.
*   It can make it difficult to add new functionality or change existing behavior without modifying existing code.

**Solution**
------------

The Factory design pattern solves these problems by introducing an abstraction layer between the client code and the concrete object classes. This abstraction allows us to create objects without specifying their exact class, decoupling the client code from specific implementations.

**Key Components**
-------------------

A typical Factory design pattern consists of:

*   **Factory Interface**: Defines the interface for creating objects.
*   **Concrete Factories**: Implement the factory interface and provide specific object creation logic.
*   **Product Classes**: Represent the actual objects being created.
*   **Client Code**: Uses the factory interface to create objects without knowing their concrete class.

**Example**
------------

Suppose we want to write a program that creates different types of vehicles (e.g., cars, trucks, motorcycles) without specifying the exact implementation details. We can use the Factory design pattern as follows:

```markdown
# Vehicle.java (Factory Interface)

public interface Vehicle {
    void drive();
}
```

```markdown
# Car.java (Product Class)

public class Car implements Vehicle {
    @Override
    public void drive() {
        System.out.println("Driving a car");
    }
}
```

```markdown
# Truck.java (Product Class)

public class Truck implements Vehicle {
    @Override
    public void drive() {
        System.out.println("Driving a truck");
    }
}
```

```markdown
# Motorcycle.java (Product Class)

public class Motorcycle implements Vehicle {
    @Override
    public void drive() {
        System.out.println("Driving a motorcycle");
    }
}
```

```markdown
# VehicleFactory.java (Concrete Factory)

public class VehicleFactory {
    public static Vehicle createVehicle(String type) {
        if (type.equals("car")) {
            return new Car();
        } else if (type.equals("truck")) {
            return new Truck();
        } else if (type.equals("motorcycle")) {
            return new Motorcycle();
        } else {
            throw new RuntimeException("Unsupported vehicle type");
        }
    }
}
```

```markdown
# ClientCode.java

public class Main {
    public static void main(String[] args) {
        Vehicle car = VehicleFactory.createVehicle("car");
        car.drive(); // Output: Driving a car

        Vehicle truck = VehicleFactory.createVehicle("truck");
        truck.drive(); // Output: Driving a truck
    }
}
```

In this example:

*   The `Vehicle` interface defines the contract for all vehicles.
*   The `Car`, `Truck`, and `Motorcycle` classes implement the `Vehicle` interface to provide specific implementations.
*   The `VehicleFactory` class implements the factory interface, providing a way to create objects without specifying their concrete class.
*   The client code uses the `VehicleFactory` to create vehicles of different types without knowing their exact implementation details.

**Benefits**
------------

The Factory design pattern provides several benefits:

*   **Decoupling**: It decouples object creation from specific implementations, making it easier to add new functionality or change existing behavior.
*   **Flexibility**: It allows for the creation of objects with different properties and behaviors without modifying existing code.
*   **Reusability**: It promotes reusing code across multiple contexts by providing a standardized interface for object creation.

By using the Factory design pattern, you can create more flexible, maintainable, and scalable software systems that are easier to extend and modify over time.