<a href="https://colab.research.google.com/github/sainitishmitta04/python-projects/blob/main/pegasus_transformer_paraphrasing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Rewriting a sentence into formal or casual form: Paraphrasing

### **PEGASUS Transformer**

### PEGASUS is pre-trained on a task that closely resembles summarization: important sentences are removed (masked) from an input document, and the model generates them together as one output sequence from the remaining sentences, much like writing a summary.
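
The masking idea described above can be illustrated with a toy sketch. This is not the actual PEGASUS pre-training code: the function name `gap_sentence_example` is hypothetical, the regex sentence splitter is a simplification, and the word-overlap score is only a crude stand-in for the ROUGE-based sentence selection used in the PEGASUS paper.

```python
import re

def gap_sentence_example(document, num_masked=1, mask_token="<mask_1>"):
    """Toy gap-sentence masking: returns (masked_input, target).

    Each sentence is scored by how many of its words also occur in the
    rest of the document (an assumed, simplified importance score).
    The top-scoring sentences are replaced by mask_token in the input
    and concatenated to form the target sequence.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

    def words(sentence):
        return set(re.findall(r"\w+", sentence.lower()))

    scores = []
    for i, sentence in enumerate(sentences):
        rest = set()
        for j, other in enumerate(sentences):
            if j != i:
                rest |= words(other)
        scores.append(len(words(sentence) & rest))

    # Highest score first; ties are broken by position in the document.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:num_masked])

    masked_input = " ".join(
        mask_token if i in chosen else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in chosen)
    return masked_input, target

doc = ("PEGASUS is a transformer model. "
       "It is pre-trained by masking sentences. "
       "The weather was nice.")
masked_input, target = gap_sentence_example(doc)
print("input :", masked_input)
print("target:", target)
```

During pre-training the model would see the masked input and learn to produce the target, which is why the task resembles abstractive summarization.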

### Here the PEGASUS model is used to generate paraphrases of an input sentence, so that we can compare each paraphrase with the original sentence.

### PEGASUS stands for Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence models.

### The PEGASUS model used here comes from Hugging Face's transformers library.

## **Installing dependencies**

In [None]:
# Ignore unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

In [None]:
!pip install sentence-splitter



In [None]:
!pip install transformers



In [None]:
!pip install SentencePiece



## **Setting Up the PEGASUS Transformer Model**

In [None]:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

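def get_response(input_text, num_return_sequences):
  # Tokenize the sentence; inputs longer than 60 tokens are truncated.
  batch = tokenizer([input_text], truncation=True, padding='longest', max_length=60, return_tensors="pt").to(torch_device)
  # Beam search with 10 beams, so num_return_sequences must be at most 10.
  # Sampling parameters such as temperature only apply with do_sample=True.
  translated = model.generate(**batch, max_length=60, num_beams=10, num_return_sequences=num_return_sequences)
  # Decode the generated token ids back into plain strings.
  tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
  return tgt_text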

## **Testing the Model**

In [None]:
#test input sentence
text = "I was ranked 2nd as a contributor to the scoial scheduler,with 3 PRs and 55 points."

In [None]:
#printing response
get_response(text,5)

['I received 3 PRs and 55 points as a contributor to the scheduler.',
 'I was ranked 2nd as a contributor to the scheduler, with 3 PRs and 55 points.',
 'I was a contributor to the scheduler with 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor, with 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor to the scheduler with 3 PRs and 55 points.']

### Because we set the number of responses to 5, the model returned five different paraphrases.

In [None]:
get_response(text,10)

['I received 3 PRs and 55 points as a contributor to the scheduler.',
 'I was ranked 2nd as a contributor to the scheduler, with 3 PRs and 55 points.',
 'I was a contributor to the scheduler with 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor, with 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor to the scheduler with 3 PRs and 55 points.',
 'I received 3 PRs and 55 points for being a contributor to the scheduler.',
 'I was ranked 2nd as a contributor to the scheduler and received 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor and had 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor to the scheduler and had 3 PRs and 55 points.',
 'I was ranked 2nd as a contributor to the scheduler.']
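
To compare the paraphrases with the input sentence, one option is a character-level similarity score. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper name `rank_by_novelty` is hypothetical, and the similarity ratio is only a rough proxy for how much a paraphrase differs from the input, not a measure of meaning.

```python
from difflib import SequenceMatcher

def rank_by_novelty(source, candidates):
    """Return (similarity, candidate) pairs, least similar to source first."""
    scored = [
        (SequenceMatcher(None, source.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0])

# Input sentence and three of the paraphrases returned above.
source = "I was ranked 2nd as a contributor to the scoial scheduler,with 3 PRs and 55 points."
candidates = [
    "I received 3 PRs and 55 points as a contributor to the scheduler.",
    "I was ranked 2nd as a contributor to the scheduler, with 3 PRs and 55 points.",
    "I was ranked 2nd as a contributor to the scheduler.",
]

for score, sentence in rank_by_novelty(source, candidates):
    print(f"{score:.2f}  {sentence}")
```

Candidates near the top of the printed list differ most from the input; candidates with a ratio close to 1 are nearly copies and could be filtered out.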

### Hence, we have rewritten the sentence in several formal and casual variants.

### Similarly, we can input a paragraph instead of a single sentence: split it into a list of sentences and apply the same operation to each sentence to obtain a rewritten version of the paragraph.
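
A minimal sketch of this paragraph workflow follows. The helper `paraphrase_paragraph` and the stand-in `fake_paraphrase` are hypothetical names introduced for illustration, and the regex splitter is an assumed fallback so the sketch runs without extra packages or a model download.

```python
import re

def paraphrase_paragraph(paragraph, paraphrase, split=None):
    """Split a paragraph into sentences, paraphrase each one, and rejoin them.

    paraphrase: callable mapping one sentence to one rewritten sentence.
    split: callable mapping text to a list of sentences (optional).
    """
    if split is None:
        # Assumed fallback: naive split on ., ! or ? followed by whitespace.
        def split(text):
            return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rewritten = [paraphrase(sentence) for sentence in split(paragraph)]
    return " ".join(rewritten)

# Stand-in paraphraser so the sketch runs without the PEGASUS model.
def fake_paraphrase(sentence):
    return sentence.upper()

print(paraphrase_paragraph("First sentence. Second one!", fake_paraphrase))

# With the objects defined in this notebook, the call would look roughly like:
#   from sentence_splitter import SentenceSplitter
#   splitter = SentenceSplitter(language="en")
#   paraphrase_paragraph(paragraph, lambda s: get_response(s, 1)[0], splitter.split)
```

Passing the splitter and the paraphraser as arguments keeps the helper independent of the model, so either piece can be swapped out.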