`LLM` - Large Language Models

`RNN` - Recurrent Neural Networks, the earlier approach to generating text

`Transformers` - new approach based on `encoding` and `decoding` the text

Each word in the text is split into one or more `TOKEN`s (tokenization), and each `TOKEN` is mapped to a `vector` (embedding).
The model analyses the relations between tokens (self-attention).
The output of the model is a vector of probabilities, one for each token in the vocabulary.

String -> TOKEN -> vector

'_teacher' -> 3145 -> [-0.0335, 0.0167, 0.0484, ...]
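
A minimal sketch of the String -> TOKEN -> vector pipeline above. The five-entry vocabulary, the ids and the random 4-dimensional embedding table are all hypothetical; real models use a learned vocabulary of tens of thousands of (often sub-word) tokens and train the embedding vectors.

```python
import random

# Hypothetical toy vocabulary; "_" marks the start of a word, loosely
# mimicking SentencePiece-style tokenizers.
VOCAB = {"<unk>": 0, "_the": 1, "_teacher": 2, "_teaches": 3, "_student": 4}
EMBED_DIM = 4

# Embedding table: one vector per token id (random here, learned in a real model).
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text: str) -> list[int]:
    """String -> token ids (whole-word lookup; unknown words map to <unk>)."""
    return [VOCAB.get("_" + word.lower(), VOCAB["<unk>"]) for word in text.split()]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Token ids -> vectors, by picking rows of the embedding table."""
    return [EMBEDDINGS[i] for i in token_ids]


ids = tokenize("The teacher teaches the student")
print(ids)                    # [1, 2, 3, 1, 4]
print(len(embed(ids)[0]))     # 4
```

The ids are just positions in the vocabulary; the information the model works with lives in the vectors.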

`Encoder` - encodes input sequences into a deep representation of the structure and meaning of the input

`Decoder` - uses the encoder's contextual understanding to generate new tokens, one at a time in a loop, until a `stop condition` is reached (e.g. an end-of-sequence token or a token limit).
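
The decoder's generation loop can be sketched as below. `toy_next_token_probs` is a hypothetical stand-in for a trained decoder (a real one would also condition on the encoder's output); the loop takes the most probable token at each step (greedy decoding) and stops on an end-of-sequence token or a length limit.

```python
# Hypothetical stand-in for a trained decoder: it "wants" to produce this script.
SCRIPT = ["Tom", "has", "plenty", "of", "time", "<eos>"]


def toy_next_token_probs(tokens):
    """Return a probability for every token in the tiny vocabulary, given the tokens so far."""
    target = SCRIPT[len(tokens)] if len(tokens) < len(SCRIPT) else "<eos>"
    other = 0.1 / (len(SCRIPT) - 1)
    return {tok: 0.9 if tok == target else other for tok in SCRIPT}


def greedy_decode(next_token_probs, max_new_tokens=10, eos="<eos>"):
    tokens = []
    for _ in range(max_new_tokens):            # stop condition 1: token limit
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)       # greedy: take the most probable token
        if best == eos:                        # stop condition 2: end-of-sequence token
            break
        tokens.append(best)                    # the new token becomes part of the input
    return tokens


print(greedy_decode(toy_next_token_probs))                    # ['Tom', 'has', 'plenty', 'of', 'time']
print(greedy_decode(toy_next_token_probs, max_new_tokens=2))  # ['Tom', 'has']
```

Replacing the `max` with random sampling gives the sampling-based decoding controlled by the inference parameters below.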

`Prompt` - the text that we feed to the model. Improving the prompt may be necessary to get better results; this is called prompt engineering. A prompt can contain an `example prompt-completion pair` to help the model respond to another, similar prompt (one-shot). For smaller models, a couple of examples may help more (few-shot).
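
Zero-, one- and few-shot prompts differ only in how many solved examples precede the real task. A small sketch, assuming a hypothetical helper `build_prompt` and a dialogue-summarisation template ("Dialogue: ... What was going on?"):

```python
from collections.abc import Sequence


def build_prompt(dialogue: str, examples: Sequence[tuple[str, str]] = ()) -> str:
    """Zero-shot with no examples, one-shot with one (dialogue, summary) pair, few-shot with more."""
    parts = []
    for example_dialogue, example_summary in examples:
        # A solved example: the prompt template followed by its completion.
        parts.append(f"Dialogue:\n\n{example_dialogue}\n\nWhat was going on?\n\n{example_summary}\n")
    # The actual task: same template, completion left for the model to generate.
    parts.append(f"Dialogue:\n\n{dialogue}\n\nWhat was going on?\n")
    return "\n".join(parts)


zero_shot = build_prompt("#Person1#: Hi!\n#Person2#: Hello.")
one_shot = build_prompt(
    "#Person1#: Hi!\n#Person2#: Hello.",
    [("#Person1#: Bye.\n#Person2#: See you.", "They say goodbye.")],
)
print(one_shot)
```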

`Inference` - act of generating text

`Completion` - output text

`Context window` - the limit on the number of tokens (roughly, words) that the model can take as input.

`Inference parameters` - controls that adjust the model's behaviour:
- max new tokens - `limits the number of tokens` the model will generate
- top-k - the model samples only from the k tokens with the `highest probability` (this keeps some randomness in responses and helps avoid repetition, while excluding very unlikely tokens)
- top-p - the model samples only from the most likely tokens whose `combined probability` reaches p
- temperature - changes the shape of the probability distribution the model samples from (the scores are divided by the temperature before the softmax). `The higher the temperature, the higher the randomness.`
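
A minimal pure-Python sketch of how these controls can act on one decoding step, assuming a hypothetical 3-token vocabulary with made-up scores (logits). It is an illustration of temperature, top-k and top-p (nucleus) sampling, not any library's implementation:

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [z / temperature for z in logits]   # T > 1 flattens, T < 1 sharpens
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Return the index of one sampled token."""
    rng = rng if rng is not None else random.Random()
    probs = softmax(logits, temperature)
    # Token indices ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        ranked = ranked[:top_k]                  # keep only the k most likely tokens
    if top_p is not None:
        kept, mass = [], 0.0
        for i in ranked:                         # smallest prefix whose mass reaches p
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        ranked = kept

    # random.choices normalises the weights, i.e. renormalises over the kept tokens.
    return rng.choices(ranked, weights=[probs[i] for i in ranked], k=1)[0]


logits = [2.0, 1.0, 0.1]                         # hypothetical scores for a 3-token vocabulary
print(softmax(logits, temperature=0.5))          # peakier
print(softmax(logits, temperature=2.0))          # flatter
print(sample_next_token(logits, top_k=1))        # always 0 (greedy)
print(sample_next_token(logits, temperature=1.5, top_p=0.9, rng=random.Random(0)))
```

Setting `top_k=1` is equivalent to greedy decoding. In Hugging Face `transformers`, the same controls are passed to `model.generate` (or a `GenerationConfig`) as `max_new_tokens`, `do_sample=True`, `top_k`, `top_p` and `temperature`.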

#### Adapt and align model:
- `Prompt engineering` (zero-shot, one-shot, few-shot)
- `Fine tuning`
- `Align with human feedback` - RLHF

#### Models:
- `encoder only` (autoencoder models) sentiment analysis, word classification, e.g. BERT
- `encoder-decoder` (sequence-to-sequence models) translation, text summarization, question answering, e.g. T5, BART
- `decoder only` (autoregressive models) text generation, e.g. GPT
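
To make "autoencoder" vs "autoregressive" concrete, here is a simplified sketch of how training examples can be built from one token sequence for two pre-training objectives: masked language modelling (BERT-style) and next-token prediction (GPT-style). The helper names are hypothetical and BERT's real masking recipe has extra details. Encoder-decoder models such as T5 use a related objective (span corruption, where spans are replaced by sentinel tokens), which is not shown here.

```python
import random

MASK = "<mask>"


def masked_lm_example(tokens, mask_prob=0.15, rng=None):
    """Autoencoding: hide some tokens; the model must reconstruct them
    using context on BOTH sides (simplified masking)."""
    rng = rng if rng is not None else random.Random(0)
    inputs, targets = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[pos] = tok            # original token to recover at this position
        else:
            inputs.append(tok)
    return inputs, targets


def causal_lm_examples(tokens):
    """Autoregressive: predict each token from the tokens BEFORE it only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sentence = ["the", "teacher", "teaches", "the", "student"]
print(masked_lm_example(sentence, mask_prob=0.3))
for context, target in causal_lm_examples(sentence):
    print(context, "->", target)
```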

#### Quantization
`Reduction of memory` required to `store and train` the model by `reducing the precision` of the model `weights`.
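
A minimal sketch of one common scheme, symmetric ("absmax") int8 quantization, assuming the weights are a plain Python list: every weight is divided by one shared scale and rounded to an integer in [-127, 127], so it takes 1 byte instead of the 4 bytes of a 32-bit float. This illustrates the idea only; libraries use more refined variants (e.g. per-block scales).

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: floats -> ints in [-127, 127] plus one float scale."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127 if absmax > 0 else 1.0    # avoid dividing by zero for all-zero weights
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Approximate reconstruction: each value is off by at most scale / 2."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.0]                  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
print(q)                                            # small integers, 1 byte each
print(dequantize(q, scale))                         # close to, but not exactly, the originals

# Memory for the weights alone of a 1-billion-parameter model:
n_params = 1_000_000_000
print(n_params * 4 / 1e9, "GB at FP32")             # 4 bytes per parameter
print(n_params * 1 / 1e9, "GB at INT8")             # 1 byte per parameter
```

The precision lost per weight is bounded by half the scale, which is the trade-off for a 4x smaller footprint.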

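#### DDP
`Distributed Data Parallel` - a PyTorch feature that copies the model onto `multiple GPUs`; each GPU processes its own sub-batch of data, which speeds up training on large datasets (the whole model must still fit on each GPU). After each batch is processed, the gradients are synchronized (averaged) across GPUs, so every copy of the model applies the same weight update and the copies stay identical.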
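
A pure-Python simulation of the idea (no GPUs, processes or `torch.distributed`), assuming a hypothetical one-parameter model `y_hat = w * x` trained with mean squared error: each "GPU" computes a gradient on its sub-batch, the gradients are averaged (the all-reduce step that PyTorch's `torch.nn.parallel.DistributedDataParallel` performs during the backward pass), and every replica applies the same update.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for a one-parameter model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def ddp_step(w, batch, n_workers, lr=0.01):
    """One simulated data-parallel training step."""
    shards = [batch[i::n_workers] for i in range(n_workers)]   # one sub-batch per "GPU"
    replicas = [w] * n_workers                                   # identical model copies
    local_grads = [grad_mse(r, s) for r, s in zip(replicas, shards)]
    avg_grad = sum(local_grads) / n_workers                     # the "all-reduce" step
    replicas = [r - lr * avg_grad for r in replicas]            # same update on every copy
    assert len(set(replicas)) == 1                              # copies stay in sync
    return replicas[0]


batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]        # hypothetical (x, y) pairs, y ~ 2x
w = 0.0
for _ in range(200):
    w = ddp_step(w, batch, n_workers=2)
print(round(w, 2))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the result matches single-GPU training on the whole batch.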

#### Chinchilla scaling laws
Increasing the number of parameters is not the only way to improve a model. For a fixed compute budget, a good alternative is to `increase the size of the training dataset`, e.g. LLaMA.
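
A rough sketch of the arithmetic, assuming the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter and the standard approximation of about 6 FLOPs per parameter per training token (both approximations). Chinchilla itself was a 70B-parameter model trained on 1.4T tokens, i.e. 20 tokens per parameter.

```python
TOKENS_PER_PARAM = 20      # rule of thumb derived from the Chinchilla paper (approximate)


def compute_optimal_tokens(n_params):
    """Approximate compute-optimal number of training tokens for a model size."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params, n_tokens):
    """Common approximation: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


for name, n_params in [("1B", 1e9), ("70B", 70e9)]:
    tokens = compute_optimal_tokens(n_params)
    print(f"{name}: ~{tokens / 1e9:,.0f}B tokens, ~{training_flops(n_params, tokens):.1e} FLOPs")
```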

### PROMPT ENGINEERING

```python
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

# DialogSum: dialogues paired with human-written summaries
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)

example_indices = [41, 205]

# FLAN-T5 is an encoder-decoder (sequence-to-sequence) model
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
```

```python
sentence = "What time is it, Tom?"

# String -> token ids (as PyTorch tensors)
sentence_encoded = tokenizer(sentence, return_tensors='pt')

# Token ids -> string; skip_special_tokens drops the end-of-sequence token </s> (id 1)
sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0],
    skip_special_tokens=True,
)

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)
```

```
ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

DECODED SENTENCE:
What time is it, Tom?
```

```python
index = 205
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

# Baseline: feed the raw dialogue to the model, with no instruction at all
inputs = tokenizer(dialogue, return_tensors='pt')
response = model.generate(inputs["input_ids"], max_new_tokens=50)
output = tokenizer.decode(response[0], skip_special_tokens=True)

print(f'INPUT PROMPT:\n{dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print('---')
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')
```

```
INPUT PROMPT:
#Person1#: Oh dear, my weight has gone up again.
#Person2#: I am not surprised, you eat too much.
#Person1#: And I suppose sitting at the desk all day at the office doesn't help.
#Person2#: No, I wouldn't think so.
#Person1#: I do wish I could lose weight.
#Person2#: Well, why don't you go on a diet?
#Person1#: I've tried diets before but they've never worked.
#Person2#: Perhaps you should exercise more. Why don't you go to an exercise class.
#Person1#: Yes, maybe I should.
---
BASELINE HUMAN SUMMARY:
#Person2# offers #Person1# some suggestions to lose weight.
---
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
#Person1#: I'm not surprised, you eat too much. #Person2#: I'm not surprised, you eat too much. #Person1#: I wish I could lose weight. #
```

#### Zero Shot Inference with an Instruction Prompt

```python
index = 205
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

# Zero-shot: wrap the dialogue in an instruction, with no solved examples
prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

inputs = tokenizer(prompt, return_tensors='pt')
response = model.generate(inputs["input_ids"], max_new_tokens=50)
output = tokenizer.decode(response[0], skip_special_tokens=True)

print(f'INPUT PROMPT:\n{dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print('---')
print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')
```

```
INPUT PROMPT:
#Person1#: Oh dear, my weight has gone up again.
#Person2#: I am not surprised, you eat too much.
#Person1#: And I suppose sitting at the desk all day at the office doesn't help.
#Person2#: No, I wouldn't think so.
#Person1#: I do wish I could lose weight.
#Person2#: Well, why don't you go on a diet?
#Person1#: I've tried diets before but they've never worked.
#Person2#: Perhaps you should exercise more. Why don't you go to an exercise class.
#Person1#: Yes, maybe I should.
---
BASELINE HUMAN SUMMARY:
#Person2# offers #Person1# some suggestions to lose weight.
---
MODEL GENERATION - ZERO SHOT:
#Person1#: I'm not sure what to do.
```

#### Zero Shot Inference with the Prompt Template from FLAN-T5

```python
index = 205
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

# Zero-shot with a FLAN-T5-style template ("Dialogue: ... What was going on?")
prompt = f"""
Dialogue:

{dialogue}

What was going on?
    """

inputs = tokenizer(prompt, return_tensors='pt')
response = model.generate(inputs["input_ids"], max_new_tokens=50)
output = tokenizer.decode(response[0], skip_special_tokens=True)

print(f'INPUT PROMPT:\n{dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print('---')
print(f'MODEL GENERATION - ZERO SHOT (FLAN-T5 TEMPLATE):\n{output}\n')
```

```
INPUT PROMPT:
#Person1#: Oh dear, my weight has gone up again.
#Person2#: I am not surprised, you eat too much.
#Person1#: And I suppose sitting at the desk all day at the office doesn't help.
#Person2#: No, I wouldn't think so.
#Person1#: I do wish I could lose weight.
#Person2#: Well, why don't you go on a diet?
#Person1#: I've tried diets before but they've never worked.
#Person2#: Perhaps you should exercise more. Why don't you go to an exercise class.
#Person1#: Yes, maybe I should.
---
BASELINE HUMAN SUMMARY:
#Person2# offers #Person1# some suggestions to lose weight.
---
MODEL GENERATION - ZERO SHOT (FLAN-T5 TEMPLATE):
Person1 is overweight and has a lot of food.
```

#### One Shot Inference

```python
# One-shot: one solved example (dialogue + human summary) precedes the real task
example_index = 40
example_dialogue = dataset['test'][example_index]['dialogue']
example_summary = dataset['test'][example_index]['summary']
index = 205
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Dialogue:

{example_dialogue}

What was going on?

{example_summary}

Dialogue:

{dialogue}

What was going on?
    """

inputs = tokenizer(prompt, return_tensors='pt')
response = model.generate(inputs["input_ids"], max_new_tokens=50)
output = tokenizer.decode(response[0], skip_special_tokens=True)

print(f'INPUT PROMPT:\n{example_dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{example_summary}')
print('---')
print(f'INPUT PROMPT:\n{dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print('---')
print(f'MODEL GENERATION - ONE SHOT:\n{output}\n')
```

```
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---
INPUT PROMPT:
#Person1#: Oh dear, my weight has gone up again.
#Person2#: I am not surprised, you eat too much.
#Person1#: And I suppose sitting at the desk all day at the office doesn't help.
#Person2#: No, I wouldn't think so.
#Person1#: I do wish I could lose weight.
#Person2#: Well, why don't you go on a diet?
#Person1#: I've tried diets before but they've never worked.
#Person2#: Perhaps you should exercise more. Why don't you go to an exercise class.
#Person1#: Yes, maybe I should.
---
BASELINE HUM
```

#### Few Shot Inference

```python
# Few-shot: several solved examples (dialogue + human summary) precede the real task
example_index = [40, 80]
prompt = ""
examples = dict()
for i in example_index:
    example_dialogue = dataset['test'][i]['dialogue']
    example_summary = dataset['test'][i]['summary']
    # keep each example so it can be printed below
    examples[f'dialogue{i}'] = example_dialogue
    examples[f'summary{i}'] = example_summary
    prompt += f"""
    Dialogue:

    {example_dialogue}

    What was going on?

    {example_summary}
    """

# The actual task: same template, completion left for the model
index = 205
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt += f"""
Dialogue:

{dialogue}

What was going on?
    """

inputs = tokenizer(prompt, return_tensors='pt')
response = model.generate(inputs["input_ids"], max_new_tokens=50)
output = tokenizer.decode(response[0], skip_special_tokens=True)

print(f'INPUT PROMPT:\n{examples["dialogue40"]}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{examples["summary40"]}')
print('---')
print(f'INPUT PROMPT:\n{examples["dialogue80"]}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{examples["summary80"]}')
print('---')
print(f'INPUT PROMPT:\n{dialogue}')
print('---')
print(f'BASELINE HUMAN SUMMARY:\n{summary}')
print('---')
print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')
```

INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---
INPUT PROMPT:
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Perso