# **How to Paraphrase Text in Python with Parrot**


<a href="https://github.com/PrithivirajDamodaran/Parrot"><img width="150" src="https://github.com/PrithivirajDamodaran/Parrot/raw/main/images/Logo.png"></a>

---

### Paraphrasing
A paraphrase is a restatement of the meaning of a text or passage using other words. The term itself is derived via Latin paraphrasis from Greek παράφρασις, meaning "additional manner of expression". The act of paraphrasing is also called "paraphrasis".


### Parrot
Parrot is a paraphrase-based utterance augmentation framework, purpose-built to accelerate the training of NLU models. A paraphrase framework is more than just a paraphrasing model.

### Why Parrot?
Hugging Face lists 12 paraphrase models, RapidAPI lists 7 freemium and commercial paraphrasers such as QuillBot, Rasa has discussed an experimental paraphraser for augmenting text data, Sentence-transformers offers a paraphrase mining utility, and NLPAug offers word-level augmentation with PPDB (a multi-million-entry paraphrase database). While these attempts at paraphrasing are great, there are still some gaps, and paraphrasing is NOT yet a mainstream option for text augmentation when building NLU models. Parrot is a humble attempt to fill some of these gaps.

What is a good paraphrase? Almost all conditioned text generation models are validated on 2 factors: (1) whether the generated text conveys the same meaning as the original context (Adequacy), and (2) whether the text is fluent, grammatically correct English (Fluency). For instance, Neural Machine Translation outputs are tested for Adequacy and Fluency. But a good paraphrase should be adequate and fluent while being as different as possible in its surface lexical form. With respect to this definition, the 3 key metrics that measure the quality of paraphrases are:

- Adequacy (Is the meaning preserved adequately?)
- Fluency (Is the paraphrase fluent English?)
- Diversity (Lexical / Phrasal / Syntactical) (How much has the paraphrase changed the original sentence?)

Parrot offers knobs to control Adequacy, Fluency and Diversity as per your needs.
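
As a rough sketch of how those knobs might be used: the Parrot README documents `adequacy_threshold`, `fluency_threshold`, `do_diverse` and `max_return_phrases` as keyword arguments of `parrot.augment`. The helper name `paraphrase_with_knobs` and its default values are illustrative assumptions, not part of the library, and `parrot` is assumed to be an already initialized `Parrot` instance.

```python
def paraphrase_with_knobs(parrot, phrase, adequacy=0.90, fluency=0.90,
                          diverse=False, max_phrases=10):
    """Call parrot.augment with explicit quality knobs (hypothetical helper)."""
    results = parrot.augment(
        input_phrase=phrase,
        do_diverse=diverse,              # ask for more diverse paraphrases
        max_return_phrases=max_phrases,  # cap on the number of candidates
        adequacy_threshold=adequacy,     # higher = meaning must be closer
        fluency_threshold=fluency,       # higher = stricter fluency filter
    )
    # Assumption: augment may return None when no candidate passes the
    # filters, so normalize to an empty list for the caller.
    return results or []
```

Raising the two thresholds should yield fewer but safer paraphrases, while `diverse=True` trades some of that safety for variety.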

What makes a paraphraser a good augmentor? To train an NLU model we don't just need a lot of utterances; we need utterances annotated with intents and slots/entities. A typical flow would be:

1. Given an input utterance and its input annotations, a good augmentor produces N output paraphrases while preserving the intent and slots.
2. The output paraphrases are then converted into annotated data using the input annotations from step 1.
3. The annotated data created from the output paraphrases then forms the training dataset for your NLU model.

But in general, being generative models, paraphrasers do not guarantee that the slots/entities are preserved. So the ability to generate high-quality paraphrases in a constrained fashion, without trading off the intents and slots for lexical dissimilarity, is what makes a paraphraser a good augmentor.
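
The flow above can be sketched in plain Python. This is only an illustration under simple assumptions: slots arrive as a hypothetical `{slot_name: value}` dict, a paraphrase counts as preserving a slot if the slot value appears in it verbatim (case-insensitive), and both the function name `reannotate` and the sample utterances are made up for this example.

```python
def reannotate(paraphrases, intent, slots):
    """Keep paraphrases that still contain every slot value and annotate them.

    paraphrases: list of strings
    intent:      intent label of the input utterance
    slots:       dict mapping slot name -> slot value (hypothetical format)
    """
    annotated = []
    for text in paraphrases:
        lowered = text.lower()
        spans = {}
        for name, value in slots.items():
            start = lowered.find(value.lower())
            if start == -1:
                break  # slot value lost, so this paraphrase is unusable
            spans[name] = (start, start + len(value))
        else:
            # Loop finished without break: every slot value was found.
            annotated.append({"text": text, "intent": intent, "slots": spans})
    return annotated


examples = reannotate(
    ["book me a flight to paris", "i want to fly somewhere"],
    intent="book_flight",
    slots={"destination": "Paris"},
)
print(examples)
```

Only the first paraphrase survives, because the second one dropped the destination. A real pipeline would likely need fuzzier matching than a substring check.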

## Installing Parrot

In [1]:
! pip install git+https://github.com/PrithivirajDamodaran/Parrot.git

Collecting git+https://github.com/PrithivirajDamodaran/Parrot.git
  Cloning https://github.com/PrithivirajDamodaran/Parrot.git to /tmp/pip-req-build-wc44knk3
  Running command git clone -q https://github.com/PrithivirajDamodaran/Parrot.git /tmp/pip-req-build-wc44knk3
Collecting transformers
  Downloading https://files.pythonhosted.org/packages/b0/9e/5b80becd952d5f7250eaf8fc64b957077b12ccfe73e9c03d37146ab29712/transformers-4.6.0-py3-none-any.whl (2.3MB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)
Collecting python-Levenshtein
  Downloading https://files.pythonhosted.org/packages/2a/dc/97f2b63ef0fa1fd78dcb7195aca577804f6b2b51e712516cc0e902a9a201/python-Levenshtein-0.12.2.tar.gz

## Importing the packages

In [2]:
# Import libraries
from parrot import Parrot
import torch
import warnings
warnings.filterwarnings("ignore")

## Random state for reproducibility

In [3]:
# For reproducibility
def random_state(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

random_state(123)

## Initializing the downloaded model

In [4]:
# Init models (initialize ONLY once if you integrate this into your own code)
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=False)
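
One way to honor the "init only once" advice when Parrot is embedded in a larger program is to hide construction behind a cached factory. The sketch below uses the standard library's `functools.lru_cache`; the function name `get_parrot` is a hypothetical helper, not something the library provides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=False):
    """Build the Parrot model on first call and reuse it afterwards."""
    # Imported lazily so that importing this module stays cheap.
    from parrot import Parrot
    return Parrot(model_tag=model_tag, use_gpu=use_gpu)
```

Repeated calls made with the same arguments return the same object, so the model weights are loaded a single time per process.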

## Choosing a phrase to paraphrase

In [5]:
phrases = ["How do we learn data science in efficient way?"]

## Generating paraphrases

In [6]:
for phrase in phrases:
  print("-"*100)
  print("Input_phrase: ", phrase)
  print("-"*100)
  para_phrases = parrot.augment(input_phrase=phrase)
  for para_phrase in para_phrases:
    print(para_phrase)

----------------------------------------------------------------------------------------------------
Input_phrase:  How do we learn data science in efficient way?
----------------------------------------------------------------------------------------------------
('tell me the best way to learn data science and data models in a quick time?', 46)
('tell me the easiest way to learn data science?', 39)
('how can i learn data science?', 34)
('how can i get better at data science in very short time?', 34)
('what should i do to learn data science?', 32)
('how can we learn data science?', 32)
('how do i learn data science in a practical way?', 23)
('how do i learn data science efficiently?', 20)
('how should we learn data science in an efficient way?', 17)
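
Each result printed above is a `(paraphrase, score)` tuple. The score appears to come from Parrot's diversity ranker (the README lists `diversity_ranker="levenshtein"` as the default), so a larger number should mean a paraphrase that differs more from the input; treat that reading as an assumption. Under it, a small helper (hypothetical name `most_diverse`) can pick the candidates you want, shown here on three of the tuples from the output above:

```python
def most_diverse(para_phrases, min_score=0, top_k=None):
    """Return paraphrase texts with score >= min_score, highest score first."""
    kept = [(text, score) for text, score in para_phrases if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        kept = kept[:top_k]
    return [text for text, _ in kept]


sample = [
    ("how do i learn data science efficiently?", 20),
    ("tell me the easiest way to learn data science?", 39),
    ("how can i learn data science?", 34),
]
print(most_diverse(sample, min_score=30))
```

With `min_score=30`, only the two higher-scoring paraphrases are returned, most diverse first.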


In [7]:
phrases = ["Best way to write a project report?"]

for phrase in phrases:
  print("-"*100)
  print("Input_phrase: ", phrase)
  print("-"*100)
  para_phrases = parrot.augment(input_phrase=phrase)
  for para_phrase in para_phrases:
    print(para_phrase)

----------------------------------------------------------------------------------------------------
Input_phrase:  Best way to write a project report?
----------------------------------------------------------------------------------------------------
('how do i write a project report?', 20)
('tell me the best way to write a report?', 19)
('what is best way to write a good project report?', 14)
('tell me the best way to write project reports?', 14)
('tell me the best way to write a project report?', 11)
('show the best ways to write a project report?', 11)
('what is best way to write a project report?', 9)


## Multiple phrases at a time

In [9]:
phrases = ["I am plaining to buy plants", 
           "I am not able to run the command",
           "There is no output of the cell"]

for phrase in phrases:
  print("-"*100)
  print("Input_phrase: ", phrase)
  print("-"*100)
  para_phrases = parrot.augment(input_phrase=phrase)
  for para_phrase in para_phrases:
    print(para_phrase)

----------------------------------------------------------------------------------------------------
Input_phrase:  I am plaining to buy plants
----------------------------------------------------------------------------------------------------
("i'm trying to buy some flowers", 29)
("i'm resolute to buy plants", 22)
('i want to buy plants', 21)
("i'm going to buy plants", 19)
("i'm begging to buy plants", 19)
('i am plainly intending to buy plants', 18)
----------------------------------------------------------------------------------------------------
Input_phrase:  I am not able to run the command
----------------------------------------------------------------------------------------------------
("i can't launch the command", 26)
("i can't run the command", 24)
("i'm not able to run this command", 16)
("i'm not able to run the command", 14)
('i am not able to run the command', 12)
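
When the paraphrases are needed as data instead of printed text, the same loop can collect them into a dictionary. This is a minimal sketch: `augment_all` is a hypothetical helper name, `parrot` is assumed to be an initialized `Parrot` instance, and the `None` handling reflects an assumption that `augment` may return nothing when no candidate passes its filters.

```python
def augment_all(parrot, phrases, **augment_kwargs):
    """Map each input phrase to its list of (paraphrase, score) tuples."""
    results = {}
    for phrase in phrases:
        para_phrases = parrot.augment(input_phrase=phrase, **augment_kwargs)
        # Assumption: augment may return None when nothing passes the
        # filters; store an empty list so callers can always iterate.
        results[phrase] = para_phrases or []
    return results
```

With the list defined earlier, `augmented = augment_all(parrot, phrases)` would give one entry per input phrase, and any extra keyword arguments are passed straight through to `parrot.augment`.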

Piyush Pathak


🔵 **Connect with me:**

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&style=social&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/piyushpathak03/)
[![Website](https://img.shields.io/badge/Facebook-1877F2?style=for-the-badge&style=social&logo=facebook&logoColor=white)](https://anirudhrapathak3.wixsite.com/piyush)
[![Blogging](https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&style=social&logo=instagram&logoColor=white)](https://medium.com/@piyushpathak03)