In [1]:
import httpx
import os
from openai import OpenAI
from api_key import api_key

endpoint="https://models.arcee.ai/v1"

model="coder"

client = OpenAI(
    base_url=endpoint,
    api_key=api_key,
    http_client=httpx.Client(http2=True)
)

In [2]:
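def print_streaming_response(response):
    # Counts streamed chunks that carry text. A chunk is not guaranteed to be
    # exactly one token, so the final count is only an approximation.
    num_tokens = 0
    for message in response:
        if len(message.choices) > 0:
            content = message.choices[0].delta.content
            # delta.content can be None (for example on the final chunk); skip it
            # so that the literal string "None" is never printed.
            if content:
                num_tokens += 1
                print(content, end="")
    print(f"\n\nNumber of tokens: {num_tokens}")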
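The streaming loop can be exercised without calling the endpoint. The sketch below uses stand-in chunk objects built with `SimpleNamespace`; `make_chunk` and `collect_stream` are hypothetical helpers introduced only for illustration, and the chunk shape (`choices[0].delta.content`) is assumed to match what the helper above reads. It shows why chunks with no choices or with `content` set to `None` should be skipped.

```python
from types import SimpleNamespace


def make_chunk(content):
    """Build a stand-in for one streamed chunk (hypothetical helper)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks):
    """Return (full text, number of chunks that carried text)."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        content = chunk.choices[0].delta.content
        if content:  # skips both None and empty strings
            parts.append(content)
    return "".join(parts), len(parts)


fake_stream = [
    make_chunk("Hello"),
    make_chunk(", world"),
    make_chunk(None),             # e.g. a final chunk without text
    SimpleNamespace(choices=[]),  # e.g. a chunk with an empty choices list
]

text, num_chunks = collect_stream(fake_stream)
print(text)
print(f"Chunks with text: {num_chunks}")
```

Returning the text instead of printing it makes the loop easy to check in isolation; the printing version above can stay as the interactive helper.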

In [3]:
response = client.chat.completions.create(
  model=model,
  messages=[
      {'role': 'user', 
       'content': """Explain the difference between logit-based distillation and hidden state distillation. Show an example for both with Pytorch code, 
       with BERT-Large as the teacher model, and BERT-Base as the student model.
       """
      }   
  ],
  temperature=0.9,
  stream=True,
  max_tokens=16384
)

print_streaming_response(response)

Certainly! Knowledge Distillation is a technique used to transfer knowledge from a large, complex model (teacher) to a smaller, simpler model (student). Two common methods of knowledge distillation in the context of transformer models like BERT are **logit-based distillation** and **hidden state distillation**.

### Logit-Based Distillation

In logit-based distillation, the knowledge is transferred by matching the teacher's and student's logits (the pre-softmax activations) rather than their final softmax probabilities. The idea is to train the student to produce logits that are close to the teacher's logits using a temperature-scaled cross-entropy loss.
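
A framework-free sketch of the loss may help before the PyTorch version. It assumes one common formulation, in which both sets of logits are divided by a temperature `T`, passed through softmax, and compared with KL divergence scaled by `T**2`. The function names and the example logits are illustrative assumptions, not part of the original example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Illustrative logits for a 3-class problem (made-up values).
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.5, 1.5, 0.2]

print(distillation_loss(student_logits, teacher_logits))
```

The loss is zero when the student's softened distribution matches the teacher's and grows as the two diverge. In practice it is usually combined with the ordinary cross-entropy loss on the ground-truth labels.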

#### Example with PyTorch

Here's an example of logit-based distillation using BERT-Large as the teacher and BERT-Base as the student.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig

# Load teacher and student models
teacher_mo

In [4]:
code_example = """
def print_streaming_response(response):
    num_tokens=0
    for message in response:
        if len(message.choices) > 0:
            num_tokens+=1
            print(message.choices[0].delta.content, end="")
    print(f"\n\nNumber of tokens: {num_tokens}")
"""

response = client.chat.completions.create(
  model=model,
  messages=[
      {'role': 'user', 
       'content': f"Improve the following code: {code_example}. Explain why your changes are an improvement."
      }   
  ],
  temperature=0.9,
  stream=True,
  max_tokens=2048
)

print_streaming_response(response)

Certainly! Let's improve the provided code for printing a streaming response. Here's a revised version:

```python
def print_streaming_response(response):
    num_tokens = 0
    for message in response:
        choices = message.get('choices', [])
        for choice in choices:
            delta_content = choice.get('delta', {}).get('content', '')
            if delta_content:
                num_tokens += 1
                print(delta_content, end="")
    print(f"\n\nNumber of tokens: {num_tokens}")
```

### Explanation of Improvements:

1. **Error Handling with `get`:**
   - **Original Code:** `if len(message.choices) > 0:` assumes `message.choices` always exists, which can cause a `KeyError` if `choices` is missing.
   - **Improved Code:** `choices = message.get('choices', [])` safely retrieves the `choices` list, defaulting to an empty list if `choices` is not present. This prevents potential `KeyError`.

2. **Iterating Over Multiple Choices:**
   - **Original Code:** Assumes there