<a href="https://colab.research.google.com/github/sanjeevhalyal/machinelearningpartice/blob/main/Pegasus_Paraphrasing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In December 2019, the Google Research team introduced [PEGASUS](https://github.com/google-research/pegasus), a state-of-the-art summarization model whose name stands for Pre-training with Extracted Gap-sentences for Abstractive Summarization.

We finetune the same model for the paraphrasing task and convert the TF checkpoints to PyTorch using [this](https://github.com/huggingface/transformers/blob/master/src/transformers/convert_pegasus_tf_to_pytorch.py) script from the [transformers](https://github.com/huggingface/transformers) library by Hugging Face.

# **Finetuning**

Please check out the official source library [here](https://github.com/google-research/pegasus#finetuning-on-downstream-datasets).

The same finetuning script can be used to train PEGASUS on the paraphrasing task with minimal changes.


**input:** Text_to_paraphrase

**target:** paraphrased_text
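
As a minimal sketch, the input/target pairs above might be assembled like this before being converted to whatever dataset format the finetuning script expects. The JSON Lines layout, the field names `input` and `target` (borrowed from the labels above), and the helpers `build_pairs` and `write_jsonl` are assumptions for illustration; they are not part of the PEGASUS repository.

```python
import json


def build_pairs(sources, paraphrases):
    """Pair each source sentence with its paraphrase as an input/target record."""
    if len(sources) != len(paraphrases):
        raise ValueError("sources and paraphrases must have the same length")
    records = []
    for src, tgt in zip(sources, paraphrases):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # skip rows where either side is empty
            records.append({"input": src, "target": tgt})
    return records


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical intermediate format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


pairs = build_pairs(
    ["How do I learn Python quickly?"],
    ["What is the fastest way to learn Python?"],
)
print(pairs)
```

Keeping the pairs in a plain intermediate file makes it easy to inspect the data before committing to a training run.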

In [None]:
pip install --upgrade transformers sentencepiece

Requirement already up-to-date: transformers in /usr/local/lib/python3.7/dist-packages (4.6.0)
Requirement already up-to-date: sentencepiece in /usr/local/lib/python3.7/dist-packages (0.1.95)


In [None]:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
 
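def get_response(input_text, num_return_sequences, num_beams):
  # Tokenize the input as a batch of one, truncated to at most 60 tokens.
  batch = tokenizer([input_text], truncation=True, padding='longest', max_length=60, return_tensors="pt").to(torch_device)
  # Beam search: num_return_sequences must not exceed num_beams.
  # Note: temperature only takes effect when sampling (do_sample=True), so it does not change the beam-search output here.
  translated = model.generate(**batch, max_length=60, num_beams=num_beams, num_return_sequences=num_return_sequences, temperature=1.5)
  # Decode the generated token ids back to text, dropping special tokens such as padding.
  tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
  return tgt_text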
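
`get_response` handles one sentence at a time. If several sentences were passed to the tokenizer in one batch, `generate` would return a single flat list of `batch_size * num_return_sequences` sequences, with the candidates for each input stored consecutively. The sketch below regroups such a flat list per input; `group_by_input` is a hypothetical helper, not part of transformers.

```python
def group_by_input(flat_outputs, num_return_sequences):
    """Split a flat list of decoded candidates into one list per input sentence.

    Assumes the candidates for each input are stored consecutively,
    num_return_sequences at a time.
    """
    if num_return_sequences < 1:
        raise ValueError("num_return_sequences must be at least 1")
    if len(flat_outputs) % num_return_sequences != 0:
        raise ValueError("output length is not a multiple of num_return_sequences")
    return [
        flat_outputs[i:i + num_return_sequences]
        for i in range(0, len(flat_outputs), num_return_sequences)
    ]


# Two inputs with two candidates each, as a decoded batch might look.
decoded = ["a1", "a2", "b1", "b2"]
print(group_by_input(decoded, 2))
```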

## **Example 1:**

In [None]:
context = "The ultimate test of your knowledge is your capacity to convey it to another."
num_return_sequences=10
num_beams=10
get_response(context,num_return_sequences,num_beams)

['The test of your knowledge is your ability to convey it.',
 'The ability to convey your knowledge is the ultimate test of your knowledge.',
 'The ability to convey your knowledge is the most important test of your knowledge.',
 'Your capacity to convey your knowledge is the ultimate test of it.',
 'The test of your knowledge is your ability to communicate it.',
 'Your capacity to convey your knowledge is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge to another is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge is the most important test of your knowledge.',
 'The test of your knowledge is how well you can convey it.',
 'Your capacity to convey your knowledge is the ultimate test.']
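
Beam search candidates often differ by only a word or two, and a candidate can occasionally repeat the input. A small post-processing step might look like the following; `filter_paraphrases` and its normalization rule (lowercasing and collapsing whitespace) are illustrative assumptions, not something the model or library provides.

```python
def filter_paraphrases(source, candidates):
    """Drop candidates that repeat the source or an earlier candidate.

    Comparison ignores case and runs of whitespace; the original
    candidate strings are returned unchanged, in their original order.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    seen = {normalize(source)}
    kept = []
    for candidate in candidates:
        key = normalize(candidate)
        if key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept


print(filter_paraphrases(
    "Hello world.",
    ["hello  world.", "Hi world.", "Hi World.", "Greetings."],
))
```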

## **Example 2:** The model can also paraphrase questions (it was not trained on the Quora dataset)

In [None]:
context = "Which course should I take to get started in data science?"
num_return_sequences=10
num_beams=10
get_response(context,num_return_sequences,num_beams)

['Which data science course should I take?',
 'Which data science course should I take first?',
 'Should I take a data science course?',
 'Which data science class should I take?',
 'Which data science course should I attend?',
 'I want to get started in data science.',
 'Which data science course should I enroll in?',
 'Which data science course is right for me?',
 'Which data science course is best for me?',
 'Which course should I take to get started?']

## PEGASUS for paraphrasing documents coming soon... 🦄🤗