##Generative AI: Summarizing dialogues from the DialogSum dataset
#####Exploring how different input texts affect the output through prompt engineering: zero-shot, one-shot, and few-shot inference.
#####At the end, exploring different generation configuration parameters and their impact on the output.

###1. Installing Python libraries

In [0]:
!pip install --disable-pip-version-check torch

In [0]:
!pip install --disable-pip-version-check torchdata

In [0]:
!pip install transformers

In [0]:
!pip install datasets==2.18.0 --quiet

In [0]:
import datasets
print(datasets.__version__)

In [0]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

###2. Summarize Dialogue without Prompt engineering

In [0]:
hugging_dataset = 'knkarthick/dialogsum'
dataset = load_dataset(hugging_dataset)

In [0]:
### Print two sample dialogues with their human-written summaries
sample_indices = [40, 200]
dash_line = '_' * 99  # same 99-character separator as joining 100 empty strings
for i, index in enumerate(sample_indices):
  print(dash_line)
  print('sample ', i+1)
  print(dash_line)
  print('Input dialogue:')
  print(dataset['test'][index]['dialogue'])
  print(dash_line)
  print('human summary:')
  print(dataset['test'][index]['summary'])
  print(dash_line)
  print()

In [0]:
## Reading Flan T5 model
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

In [0]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = True)

In [0]:
## Encode a sample sentence into token IDs, then decode the IDs back into text
sentence = "What time is it, Matt?"
sentence_encoded = tokenizer(sentence, return_tensors='pt')
sentence_decoded = tokenizer.decode(sentence_encoded['input_ids'][0], skip_special_tokens=True)
print('Encoded sentence : ')
print(sentence_encoded['input_ids'][0])
print('\n decoded sentence: ')
print(sentence_decoded)

In [0]:
for i, index in enumerate(sample_indices):
  dialogue = dataset['test'][index]['dialogue']
  summary = dataset['test'][index]['summary']

  inputs = tokenizer(dialogue, return_tensors='pt')
  output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)

  print(dash_line)
  print('Sample ', i+1)
  print(dash_line)
  print('Input prompt: \n', dialogue)
  print(dash_line)
  print('Baseline human summary: \n', summary)
  print(dash_line)
  print('Model generation - without prompt engineering: \n', output)

##### Observation: without an instruction, the model does not produce a good-quality summary

###3. Zero shot inference with an instruction prompt

In [0]:
for i, index in enumerate(sample_indices):
  dialogue = dataset['test'][index]['dialogue']
  summary = dataset['test'][index]['summary']

  prompt = f"""
  Summarize the following conversation.
  {dialogue}
  Summary :
  """

  inputs = tokenizer(prompt, return_tensors='pt')
  output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)

  print(dash_line)
  print('Sample ', i+1)
  print(dash_line)
  print('Input prompt: \n', prompt)
  print(dash_line)
  print('Baseline human summary: \n', summary)
  print(dash_line)
  print('Model generation - zero shot: \n', output)

In [0]:
for i, index in enumerate(sample_indices):
  dialogue = dataset['test'][index]['dialogue']
  summary = dataset['test'][index]['summary']

  prompt = f"""
  Dialogue:
  {dialogue}
  What was going on?
  """

  inputs = tokenizer(prompt, return_tensors='pt')
  output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=50,)[0], skip_special_tokens=True)

  print(dash_line)
  print('Sample ', i+1)
  print(dash_line)
  print('Input prompt: \n', prompt)
  print(dash_line)
  print('Baseline human summary: \n', summary)
  print(dash_line)
  print('Model generation - zero shot: \n', output)

##### Observation: zero-shot prompting with an instruction still does not give a good-quality summary
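The triple-quoted f-strings in the two cells above carry the loop's source indentation into the prompt text. A minimal sketch of building the same two zero-shot templates without it is shown below. The names `TEMPLATES` and `build_zero_shot_prompt` are hypothetical, and the dialogue is made up purely to show the layout.

```python
# Hypothetical helper: builds the two zero-shot prompts used above without the
# leading indentation that a triple-quoted f-string inside a loop carries along.
TEMPLATES = {
    "instruction": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:",
    "question": "Dialogue:\n\n{dialogue}\n\nWhat was going on?",
}

def build_zero_shot_prompt(dialogue: str, style: str = "instruction") -> str:
    """Return a zero-shot prompt for one dialogue using the chosen template."""
    if style not in TEMPLATES:
        raise ValueError(f"unknown style {style!r}; expected one of {sorted(TEMPLATES)}")
    return TEMPLATES[style].format(dialogue=dialogue.strip())

# Made-up dialogue, only to show the resulting prompt layout.
toy_dialogue = "#Person1#: What time is it, Matt?\n#Person2#: It's almost noon."
zero_shot_prompt = build_zero_shot_prompt(toy_dialogue, style="question")
print(zero_shot_prompt)
```

The returned string can be passed to `tokenizer(..., return_tensors='pt')` exactly like the `prompt` variable in the cells above.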

###4. One shot inference

In [0]:
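def make_promt(sample_indices, sample_index_to_summarize):
  """Build a prompt from solved examples plus one dialogue left for the model to summarize."""
  prompt = ''
  # Each shot: a dialogue followed by its human-written summary
  for index in sample_indices:
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    prompt += f"""
    Dialogue:
    {dialogue}
    What was going on?
    {summary}
    """
  # The dialogue to summarize, with the answer left open for the model
  dialogue = dataset['test'][sample_index_to_summarize]['dialogue']
  prompt += f"""
  Dialogue:
  {dialogue}
  What was going on?
  """
  return prompt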

In [0]:
sample_indices = [40]
sample_index_to_summarize = 200

one_shot_prompt = make_promt(sample_indices, sample_index_to_summarize)
print(one_shot_prompt)

In [0]:
summary = dataset['test'][sample_index_to_summarize]['summary']
inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens = 50)[0], skip_special_tokens = True)
print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - one shot:\n', output)

##### Observation: the summary is better and more specific than with zero shot

###5. Few shot inference

In [0]:
### Adding three shots
sample_indices = [40, 80, 120]
sample_index_to_summarize = 200
few_shot_prompt = make_promt(sample_indices, sample_index_to_summarize)
print(few_shot_prompt)

In [0]:
summary = dataset['test'][sample_index_to_summarize]['summary']
inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens = 50)[0], skip_special_tokens = True)
print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

##### Observation: the summary is indeed better than with zero shot, but not much different from one shot

#### Conclusion on adding shots: adding shots to the prompt can improve the summary, but the result is still not as good as expected
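One possible reason extra shots stop helping is prompt length, since every shot adds a full dialogue to the input. The sketch below drops the earliest shots until the prompt fits a token budget. The function names are hypothetical, the whitespace counter is only a crude stand-in for the real tokenizer, and the default budget of 512 is an assumption based on the usual `model_max_length` of T5 tokenizers; check `tokenizer.model_max_length` in your own environment.

```python
from typing import Callable, List, Tuple

def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())

def fit_shots_to_budget(
    shots: List[Tuple[str, str]],
    target_dialogue: str,
    max_tokens: int = 512,  # assumed budget, not read from the model
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> str:
    """Build a few-shot prompt, dropping the earliest shots until it fits."""
    kept = list(shots)
    while True:
        parts = [f"Dialogue:\n{d}\nWhat was going on?\n{s}\n" for d, s in kept]
        parts.append(f"Dialogue:\n{target_dialogue}\nWhat was going on?\n")
        prompt = "\n".join(parts)
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt
        kept.pop(0)  # drop the oldest example and try again

# Made-up (dialogue, summary) pairs, for illustration only.
toy_shots = [
    ("A: Is the report ready? B: Almost, one more hour.", "B needs an hour to finish the report."),
    ("A: Lunch at noon? B: Sure, see you there.", "A and B agree to have lunch at noon."),
]
toy_target = "A: Did you fix the printer? B: Yes, it works now."
trimmed_prompt = fit_shots_to_budget(toy_shots, toy_target, max_tokens=40)
print(trimmed_prompt)
```

To count with the notebook's tokenizer instead, one could pass `count_tokens=lambda t: len(tokenizer(t)['input_ids'])`.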

###6. Generation configuration parameters for inference

In [0]:
generation_config = GenerationConfig(max_new_tokens = 50)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
### Reduce the maximum output length by lowering max_new_tokens
generation_config = GenerationConfig(max_new_tokens = 30)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
### Reduce the maximum output length further by lowering max_new_tokens again
generation_config = GenerationConfig(max_new_tokens = 10)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

#### Conclusion on changing max_new_tokens:
#####`max_new_tokens` caps the length of the generated sequence, not counting the tokens in the prompt. It is better to tune it to the output length end users want, without cutting off valuable information.
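The effect of the cap can be shown with a toy decoding loop. This is a sketch, not how `model.generate` is implemented: the "model" is a fixed script of tokens, and `toy_generate` is a hypothetical name. Generation stops either when the end-of-sequence token appears or when the cap is reached, whichever comes first.

```python
from typing import List

EOS = "</s>"  # T5-style end-of-sequence marker, used here only as a plain string

def toy_generate(scripted_tokens: List[str], max_new_tokens: int) -> List[str]:
    """Emit tokens from a fixed script until EOS or the max_new_tokens cap."""
    generated: List[str] = []
    for token in scripted_tokens:
        if len(generated) >= max_new_tokens:
            break  # cap reached: the output is cut off mid-sentence
        if token == EOS:
            break  # the scripted "model" finished on its own
        generated.append(token)
    return generated

# Made-up token script standing in for a model's output.
script = ["Tom", "is", "late", "because", "of", "traffic", ".", EOS]
for cap in (50, 5, 3):
    print(cap, toy_generate(script, cap))
```

With a generous cap the full sentence comes out; with a small cap the sentence is truncated, which is the risk of setting `max_new_tokens` too low.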

In [0]:
generation_config = GenerationConfig(max_new_tokens = 50, do_sample= True, temperature = 0.1)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
### Raise the temperature; do_sample=True switches from the default greedy search to sampling
generation_config = GenerationConfig(max_new_tokens = 50, do_sample= True, temperature = 0.5)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
### Raise the temperature further; do_sample=True switches from the default greedy search to sampling
generation_config = GenerationConfig(max_new_tokens = 50, do_sample= True, temperature = 1.0)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

#### Conclusion on changing the temperature parameter:
#####Increasing the temperature makes responses more varied and creative. Decreasing it makes them more conservative: the probability distribution becomes sharper, so sampling behaves more and more like greedy search, which always picks the highest-probability token, and running the code repeatedly tends to give the same response. The temperature should not be set too high, otherwise the output becomes semantically inconsistent.
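Temperature works by dividing the model's logits before the softmax. The sketch below uses made-up logits for three candidate tokens and a hypothetical helper name; it only illustrates how the distribution sharpens at low temperature and flattens at high temperature.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.1 almost all probability mass sits on the top token, so sampling rarely picks anything else. At 2.0 the three tokens are much closer together, so unlikely tokens are drawn more often.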

In [0]:
## Greedy search as default search
generation_config = GenerationConfig(max_new_tokens = 50)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
## multinomial sampling
generation_config = GenerationConfig(max_new_tokens = 50, do_sample=True, num_beams=1)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)
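The difference between greedy search and multinomial sampling comes down to how the next token is picked from the model's probability distribution. A minimal sketch with a made-up distribution follows; `pick_next_token` is a hypothetical name, and its `do_sample` argument only mirrors the flag used above, it is not the library's implementation.

```python
import random
from typing import Dict, Optional

def pick_next_token(probs: Dict[str, float], do_sample: bool,
                    rng: Optional[random.Random] = None) -> str:
    """Greedy search takes the argmax; sampling draws in proportion to probability."""
    if not do_sample:
        return max(probs, key=probs.get)
    rng = rng or random.Random()
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

toy_probs = {"meeting": 0.6, "party": 0.3, "trip": 0.1}  # made-up distribution
greedy_picks = [pick_next_token(toy_probs, do_sample=False) for _ in range(5)]
rng = random.Random(0)  # fixed seed so the run is repeatable
sampled_picks = [pick_next_token(toy_probs, do_sample=True, rng=rng) for _ in range(1000)]
print(greedy_picks)
print({t: sampled_picks.count(t) for t in toy_probs})
```

Greedy search returns the same token every time, while sampling returns each token roughly as often as its probability suggests.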

In [0]:
## Beam search
generation_config = GenerationConfig(max_new_tokens = 50, num_beams=5)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
## Beam search multinomial sampling
generation_config = GenerationConfig(max_new_tokens = 50, num_beams=5, do_sample=True)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

In [0]:
## Contrastive search
generation_config = GenerationConfig(max_new_tokens = 50, penalty_alpha=0.6, top_k=4)

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(model.generate(inputs['input_ids'], generation_config=generation_config)[0], skip_special_tokens=True)

print(dash_line)
print('Baseline human summary:\n', summary)
print(dash_line)
print('Model generation - few shots:\n', output)

#### Conclusion on different text generation methods:
#####Deterministic approaches such as greedy search and beam search can lead to text degeneration and undesirable repetition. Stochastic approaches, by contrast, use randomness during generation, which tends to produce non-repetitive yet coherent long outputs.
More information about text generation methods: https://huggingface.co/docs/transformers/generation_strategies
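To make the difference between greedy search and beam search concrete, here is a toy sketch over a hand-written table of next-token probabilities. The table, the tokens, and the `beam_search` helper are all made up for illustration; this is not the library's implementation. With `num_beams=1` the loop reduces to greedy search.

```python
import math
from typing import Dict, List, Tuple

# Made-up next-token probabilities, keyed by the sequence generated so far.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"the": 0.5, "a": 0.4, "an": 0.1},
    ("the",): {"dog": 0.4, "cat": 0.3, "end": 0.3},
    ("a",): {"dog": 0.9, "cat": 0.05, "end": 0.05},
    ("an",): {"dog": 0.3, "cat": 0.3, "end": 0.4},
}

def beam_search(num_beams: int, steps: int = 2) -> Tuple[Tuple[str, ...], float]:
    """Return the best sequence and its probability; num_beams=1 is greedy search."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]  # (tokens, log-prob)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in TOY_MODEL[tokens].items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]  # keep only the best hypotheses
    best_tokens, best_score = beams[0]
    return best_tokens, math.exp(best_score)

greedy_result = beam_search(num_beams=1)
beam_result = beam_search(num_beams=2)
print("greedy:", greedy_result)
print("beam  :", beam_result)
```

Greedy search commits to "the" because it is the most likely first token and ends with a sequence of probability 0.2. Beam search keeps "a" alive as a second hypothesis and finds "a dog" with probability 0.36. Both procedures are deterministic: running them again gives the same result.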