# ChatGPT API Interaction

In [31]:
import json
import os

import weave
from openai import OpenAI

openai_key = os.environ.get("OPENAI_API_KEY")

@weave.op() # 🐝 Decorator to track requests
def extract_fruit(sentence: str) -> dict:
    
    client = OpenAI(api_key=openai_key)
    system_prompt = "Parse sentences into a JSON dict with keys: verb count, adj count and total word count."

    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sentence}
      ],
      temperature=0.7,
      response_format={"type": "json_object"}
    )

    extracted = response.choices[0].message.content

    return json.loads(extracted)

weave.init('intro-example') # 🐝

sentence = "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy."

extract_fruit(sentence)

🍩 https://wandb.ai/golyuvalg-ben-gurion-university-of-the-negev/intro-example/r/call/0194e789-1011-7053-92ce-2d2a20200f16


{'verb count': 3, 'adj count': 4, 'total word count': 26}
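
Nothing guarantees that the model's JSON reply actually contains the requested fields, so a small post-check can be useful. The following is a minimal sketch, assuming the three keys shown in the output above; `EXPECTED_KEYS` and `parse_counts` are hypothetical names, not part of the OpenAI or Weave APIs.

```python
import json

# Hypothetical helper: key names are taken from the output shown above.
EXPECTED_KEYS = ("verb count", "adj count", "total word count")

def parse_counts(raw: str) -> dict:
    """Parse the model's JSON reply and check it has the expected integer fields."""
    data = json.loads(raw)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in EXPECTED_KEYS:
        if not isinstance(data[key], int):
            raise ValueError(f"{key!r} should be an integer, got {data[key]!r}")
    return {key: data[key] for key in EXPECTED_KEYS}

reply = '{"verb count": 3, "adj count": 4, "total word count": 26}'
counts = parse_counts(reply)
print(counts)
```

In the notebook, `parse_counts(extracted)` could stand in for the bare `json.loads(extracted)` call if stricter checking is wanted.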

# **LLM** : from the inside

- ### Module Types

    - Parameters
    - Gradients
    - Optimizer States
    - Activations

- ### Layer Types

    - Convolutional Layer
    - Subsampling Layer
    - Functional Layer
    - Fully connected Layer
    - Gaussian Layer
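
The four items listed under Module Types are also the main consumers of GPU memory during training. Below is a back-of-the-envelope sketch for the first three, assuming FP32 storage (4 bytes per value) and an Adam-style optimizer that keeps two extra values per parameter. Activations are left out because they depend on batch size and sequence length. The function name and the 66M parameter count (roughly DistilBERT-sized) are illustrative assumptions.

```python
def training_memory_gib(num_params: int, bytes_per_value: int = 4) -> dict:
    """Rough training-memory estimate in GiB, excluding activations."""
    gib = 1024 ** 3
    parameters = num_params * bytes_per_value
    gradients = num_params * bytes_per_value              # one gradient per parameter
    optimizer_states = 2 * num_params * bytes_per_value   # Adam: first and second moments
    total = parameters + gradients + optimizer_states
    return {
        "parameters": parameters / gib,
        "gradients": gradients / gib,
        "optimizer_states": optimizer_states / gib,
        "total": total / gib,
    }

estimate = training_memory_gib(66_000_000)
for name, value in estimate.items():
    print(f"{name}: {value:.2f} GiB")
```

Under these assumptions, full fine-tuning needs about four times the memory of the weights alone, which is the cost PEFT methods try to avoid.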


# **PEFT** : Parameter-Efficient Fine-Tuning




- ### **PEFT Types**

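    - **Selective** - fine-tune only a subset of the existing layers or parameters

    - **Additive** - add new trainable layers or parameters (e.g. adapters) while the base model stays frozen

    - **Reparameterization** - re-express the weight updates in a low-rank form so that far fewer parameters are trained

        - LoRA - freeze the original weights and train only small low-rank matrices added alongside them

        - QLoRA - LoRA applied on top of a quantized (typically 4-bit) base model

        - DoRA - decomposes each weight into magnitude and direction, and applies the low-rank update to the direction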
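
To make the LoRA idea concrete: the frozen weight `W` is left untouched, and a trainable low-rank product `B @ A`, scaled by `alpha / r`, is added to its output. Here is a minimal pure-Python sketch with tiny hand-written matrices; the helper names and all numbers are illustrative assumptions, not the PEFT library's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) without ever forming B @ A."""
    base = matvec(W, x)                  # frozen pretrained path
    update = matvec(B, matvec(A, x))     # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# 3x3 frozen weight with a rank r = 1 adapter (A is r x 3, B is 3 x r)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.0]]
B_zero = [[0.0], [0.0], [0.0]]       # B starts at zero, so the adapter is initially a no-op
B_trained = [[1.0], [0.0], [-1.0]]   # made-up values standing in for a trained adapter
x = [2.0, 4.0, 6.0]

print(lora_forward(W, A, B_zero, x, alpha=2, r=1))
print(lora_forward(W, A, B_trained, x, alpha=2, r=1))
```

With `B` at zero the output equals the base model's output, which is why adding a fresh adapter does not change predictions until training begins.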

<br>

- ### **Quantization Types**

    | Quantization Type                 | Explanation                                                                                               |
    |-----------------------------------|-----------------------------------------------------------------------------------------------------------|
    | NF4                               | 4-bit NormalFloat: a 4-bit data type designed for normally distributed weights (used by QLoRA)            |
    | FP16 (half precision)             | 16-bit floating point (half the memory of FP32)                                                           |
    | BF16 (BFloat16)                   | 16-bit floating point with the same exponent range as FP32 (better numerical stability, less precision)   |
    | Int8                              | 8-bit integers; often applied post-training (sometimes with calibration steps)                            |
    | 4/8-bit quantization              | mainly used for very large models (aggressive compression can cost accuracy)                              |
    |                                   |                                                                                                           |
    | Dynamic Quantization              | weights are quantized ahead of time; activations are quantized on the fly during inference                |
    | Static Quantization (PTQ)         | quantize weights and activations after the model is fully trained, using calibration data                 |
    | Quantization Aware Training (QAT) | simulate quantization during training; generally yields the best accuracy when retraining is possible     |
    | Mixed Precision Training          | use lower precision for most calculations while preserving higher precision where needed                  |
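
As a concrete illustration of what "quantizing" means in the table above, here is a minimal sketch of symmetric (absmax) Int8 quantization in pure Python. It is a simplified teaching example with hypothetical function names and made-up weights, not how bitsandbytes or NF4 work internally.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single absmax scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127 if absmax > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 1.27, -1.0]   # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

print(q)
print([round(v, 3) for v in restored])
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that quantization trades for memory.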

<br>

- ### **Others**

    - Soft Prompts
    - LoftQ


## **PEFT Demo**

Load Pretrained Model

In [13]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model_pretrained = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {label: idx for idx, label in id2label.items()}

Transform to Quantized Pretrained Model

In [None]:

from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
import torch

# quantization configuration

quantization_configuration = BitsAndBytesConfig(
    load_in_4bit=True,                                      # quantize the model weights to 4 bits when loading
    bnb_4bit_quant_type="nf4",                              # NF4 (NormalFloat) data type, designed for normally distributed weights
    bnb_4bit_use_double_quant=True,                         # nested quantization: also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,                  # run computations in bfloat16 for speed
    llm_int8_skip_modules=["classifier", "pre_classifier"]  # keep the "classifier" and "pre_classifier" layers unquantized
)

# NOT ready for quantized training yet ... 

model_quantized = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased",
                                                                 id2label=id2label,
                                                                 label2id=label2id,
                                                                 num_labels=2,
                                                                 quantization_config=quantization_configuration
                                                                )

# Ready for quantized training

model_quantized = prepare_model_for_kbit_training(model_quantized)


Transform to QLoRA Pretrained Model

In [None]:
from peft import LoraConfig, TaskType, get_peft_model

# LoRA configurations

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,                 # Specify the task type as sequence classification
    r=8,                                        # Rank of the low-rank matrices
    lora_alpha=16,                              # Scaling factor
    lora_dropout=0.1,                           # Dropout rate  
    target_modules=['q_lin','k_lin','v_lin']    # Modules to adapt (DistilBERT attention query/key/value projections)
)

# Apply LoRA on the quantized model

model_qlora = get_peft_model(model_quantized, lora_config)
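
A rough sense of how few weights this configuration trains can be had with plain arithmetic. The sketch below assumes the standard DistilBERT-base shape (6 transformer layers, hidden size 768) and counts only the LoRA matrices. It ignores the classification head, which PEFT also keeps trainable for `SEQ_CLS`. The function name is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

hidden_size = 768      # assumed DistilBERT-base hidden size
num_layers = 6         # assumed DistilBERT-base depth
target_modules = ["q_lin", "k_lin", "v_lin"]
rank = 8

per_module = lora_param_count(hidden_size, hidden_size, rank)
lora_total = per_module * len(target_modules) * num_layers
full_total = hidden_size * hidden_size * len(target_modules) * num_layers

print(f"LoRA parameters per module: {per_module}")
print(f"LoRA parameters in total:   {lora_total}")
print(f"Frozen weights they adapt:  {full_total}")
print(f"Ratio: {lora_total / full_total:.2%}")
```

On the real model, `model_qlora.print_trainable_parameters()` reports the exact figure, including the classification head.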

Apply LoftQ 

In [None]:
from peft import replace_lora_weights_loftq

replace_lora_weights_loftq(model_qlora)

Train the model

In [14]:
from transformers import TrainingArguments, Trainer

# Placeholders: replace with real tokenized datasets and a metrics function before training
tokenized_train, tokenized_test, compute_metrics = None, None, None

# training arguments

training_args = TrainingArguments(
    output_dir="./results_qlora",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
    weight_decay=0.01
)

# QLoRA trainer

trainer_qlora = Trainer(
    model=model_qlora,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics)

# Train the model

trainer_qlora.train()
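
The `compute_metrics` placeholder above has to be replaced before the `Trainer` can report anything useful. Below is a minimal sketch of an accuracy metric, assuming the evaluation input unpacks into `(logits, labels)` in the way the `Trainer` passes it. It is written in plain Python so it also runs on nested lists, and the example logits are made up.

```python
def compute_metrics(eval_pred):
    """Return accuracy from (logits, labels); logits holds one row of class scores per example."""
    logits, labels = eval_pred
    correct = 0
    total = 0
    for row, label in zip(logits, labels):
        scores = list(row)
        predicted = scores.index(max(scores))   # argmax over the class scores
        correct += int(predicted == int(label))
        total += 1
    return {"accuracy": correct / total if total else 0.0}

# Toy check with made-up logits for three examples
example_logits = [[0.1, 2.3], [1.5, -0.2], [0.0, 0.4]]
example_labels = [1, 0, 0]
metrics = compute_metrics((example_logits, example_labels))
print(metrics)
```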


## Task Types

- Text Generation

- Text Translation

- Text Summarization

- Text Classification

- Question Answering

- Zero-Shot Classification : classify text into categories without task-specific labelled training data

- Named Entity Recognition (NER) : extracting structured info out of unstructured text

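- Fill-Mask : predict the token hidden behind a mask placeholder (e.g. `[MASK]`)

- Feature Extraction : return the model's hidden-state vectors (embeddings) for the input text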


## Prepare For Tasks

In [None]:
# imports

import torch
from transformers import pipeline
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# suppress warnings

import warnings

def warn(*args, **kwargs):
    pass

warnings.warn = warn
warnings.filterwarnings('ignore')



## Text Classification

### No pipeline - sentiment analysis

In [4]:
# Load the tokenizer and model

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Sample text

text = "Congratulations! You've won a free ticket to the Bahamas. Reply WIN to claim."

# Tokenize the input text

inputs = tokenizer(text, return_tensors="pt")

# Run inference (gradients are only needed for training, so disable them)

with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities

probs = torch.softmax(outputs.logits, dim=-1)

# Get the predicted class

predicted_class = torch.argmax(probs, dim=-1)

# Map the predicted class to the label

labels = ["NEGATIVE", "POSITIVE"]
predicted_label = labels[predicted_class.item()]  # .item() converts the 1-element tensor to a plain int

print(f"Predicted label: {predicted_label}")

Predicted label: POSITIVE
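
The softmax step is what turns the model's raw logits into the probabilities that `argmax` then reads. Here is a small standard-library sketch of that conversion; the logit values are made up for illustration and are not the model's actual output.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
example_logits = [-1.2, 3.4]            # hypothetical logits for one sentence
probs = softmax(example_logits)
predicted = labels[probs.index(max(probs))]

print([round(p, 4) for p in probs], predicted)
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow for large scores.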


### Pipeline - sentiment analysis


In [6]:
# Load a general text classification model
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

# Classify a sample text
result = classifier("Congratulations! You've won a free ticket to the Bahamas. Reply WIN to claim.")
print(result)

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9997586607933044}]


### Pipeline - language detection


In [7]:
from transformers import pipeline

classifier = pipeline("text-classification", model="papluca/xlm-roberta-base-language-detection")
result = classifier("Bonjour, comment ça va?")
print(result)

Device set to use cpu


[{'label': 'fr', 'score': 0.9934879541397095}]


## Text Generation

### No Pipeline - GPT2 

In [None]:
# Load the tokenizer and model

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Prompt

prompt = "Once upon a time"

# Tokenize the input text

inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output_ids = model.generate(
    inputs.input_ids, 
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_length=50, 
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generated_text)
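
When `generate` is called without sampling options, it usually performs greedy decoding (unless the model's generation config says otherwise): it appends the single most likely next token at every step. The toy sketch below mimics that loop with a hand-made next-token table, entirely hypothetical and standing in for a real model, and shows why greedy output can become repetitive.

```python
# Hypothetical next-token probabilities; a real model would compute these.
NEXT_TOKEN_PROBS = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.6, "place": 0.4},
    "time": {"the": 0.7, "a": 0.3},
    "the": {"world": 0.8, "time": 0.2},
    "world": {"was": 0.9, "is": 0.1},
    "was": {"a": 0.85, "the": 0.15},
}

def greedy_generate(prompt_tokens, max_length):
    """Repeatedly append the most probable next token until max_length is reached."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break   # no known continuation for this token
        tokens.append(max(candidates, key=candidates.get))
    return tokens

generated = greedy_generate(["once", "upon", "a", "time"], max_length=14)
print(" ".join(generated))
```

Because the same token always leads to the same choice, the sequence falls into a loop. Sampling options such as `do_sample=True` break the loop by sometimes picking less likely tokens.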

### Pipeline - GPT2


In [8]:
# Initialize the text generation pipeline with GPT-2
generator = pipeline("text-generation", model="gpt2")

# Generate text based on a given prompt
prompt = "Once upon a time"
result = generator(prompt, max_length=50, num_return_sequences=1, truncation=True)

# Print the generated text
print(result[0]['generated_text'])

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time he looked like a man and a lion, and once upon a time he was such as would have to be put at the head of another human being. He had the blood of a great lion and the blood of a wolf and


### Pipeline - T5


In [9]:
# Initialize the text generation pipeline with T5
generator = pipeline("text2text-generation", model="t5-small")

# Generate text based on a given prompt
prompt = "translate English to French: How are you?"
result = generator(prompt, max_length=50, num_return_sequences=1)

# Print the generated text
print(result[0]['generated_text'])

Device set to use cpu


Comment êtes-vous?


## Fill Mask

### Pipeline - Bert

In [12]:
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
prompt = "The capital of France is [MASK]."
result = fill_mask(prompt)
print(result)


[{'score': 0.4167883098125458, 'token': 3000, 'token_str': 'paris', 'sequence': 'the capital of france is paris.'}, {'score': 0.07141656428575516, 'token': 22479, 'token_str': 'lille', 'sequence': 'the capital of france is lille.'}, {'score': 0.06339260935783386, 'token': 10241, 'token_str': 'lyon', 'sequence': 'the capital of france is lyon.'}, {'score': 0.04444736987352371, 'token': 16766, 'token_str': 'marseille', 'sequence': 'the capital of france is marseille.'}, {'score': 0.030297178775072098, 'token': 7562, 'token_str': 'tours', 'sequence': 'the capital of france is tours.'}]
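
The pipeline returns a list of candidate fills, each a dict with `score`, `token`, `token_str` and `sequence`. Here is a short sketch of post-processing that structure, using the first three entries copied from the output above (scores rounded) so it runs without downloading the model. The 0.07 threshold is an arbitrary, hypothetical choice.

```python
# Entries copied (with rounded scores) from the fill-mask output above
result = [
    {"score": 0.4168, "token": 3000, "token_str": "paris", "sequence": "the capital of france is paris."},
    {"score": 0.0714, "token": 22479, "token_str": "lille", "sequence": "the capital of france is lille."},
    {"score": 0.0634, "token": 10241, "token_str": "lyon", "sequence": "the capital of france is lyon."},
]

# Pick the highest-scoring candidate
best = max(result, key=lambda candidate: candidate["score"])
print(f"Top prediction: {best['token_str']} ({best['score']:.1%})")

# Keep only candidates above a (hypothetical) confidence threshold
confident = [c["token_str"] for c in result if c["score"] >= 0.07]
print(confident)
```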


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a
