<a href="https://www.kaggle.com/fanbyprinciple/paraphrasing-using-hugging-face?scriptVersionId=88961426" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [15]:
!pip install -q sentencepiece

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [16]:
from transformers import *

# using Pegasus

In [17]:
model_pegasus = PegasusForConditionalGeneration.from_pretrained("tuner007/pegasus_paraphrase")
tokenizer_pegasus = PegasusTokenizerFast.from_pretrained("tuner007/pegasus_paraphrase")

In [18]:
def get_paraphrased_sentences(model, tokenizer, sentence, num_return_sequences=5, num_beams=5):
  # tokenize the text to be form of a list of token IDs
  inputs = tokenizer([sentence], truncation=True, padding="longest", return_tensors="pt")
  # generate the paraphrased sentences
  outputs = model.generate(
    **inputs,
    num_beams=num_beams,
    num_return_sequences=num_return_sequences,
  )
  # decode the generated sentences using the tokenizer to get them back to text
  return tokenizer.batch_decode(outputs, skip_special_tokens=True)

In [19]:
sentence = "Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences."
get_paraphrased_sentences(model, tokenizer, sentence, num_beams=10, num_return_sequences=10)

['Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes',
 'Learning is the process of acquiring new knowledge, understanding, behaviors, skills, values, attitudes',
 'Learning is the process of acquiring new understanding, knowledge, behavior, skills, values, attitudes',
 'Learning is the process of learning new understanding, knowledge, behaviors, skills, values, attitudes and',
 'Learning is the process of gaining new understanding, knowledge, behaviors, skills, values, attitudes',
 'Learning is the process of acquiring new understanding, knowledge, behaviours, skills, values,',
 'Learning is the process of acquiring new understanding, knowledge, habits, skills, values, attitudes',
 'Learning is the process of the acquisition of new understanding, knowledge, behaviors, skills, values,',
 'Learning is the process of acquiring new understandings, knowledge, behaviors, skills, values,',
 'Learning is the process of acquiring new underst

# Using PAWS

In [20]:
tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")

In [21]:
get_paraphrased_sentences(model, tokenizer, "One of the best ways to learn is to teach what you've already learned")

["One of the best ways to learn is to teach what you've already learned.",
 'One of the best ways to learn is to teach what you have already learned.',
 'One of the best ways to learn is to teach what you already know.',
 'One of the best ways to learn is to teach what you already learned.',
 "One of the best ways to learn is to teach what you've already learned."]

Out of the two pegasus seems promising.

In [22]:
input_text = """
What started as oxford speech on whether Britain owns reparations to its former colonies has culminated into this seething yet persuasive compendium of arguments for the topic.

The book refutes arguments of The Raj apologists that despite its despotic nature it bestowed India with seeds of modern liberal democratic ideals and built infrastructure that led to what India is today. British did this under the white man’s burden to give back, so they claim.

British never cared for the interest of the Indian people. They resorted to tactical policies of divide & rule, discriminatory recruitment, xenophobic enforcement of laws and procedures, to prolong their stay in the country and ensure continued loot of its resources.

Whether it was trade and agriculture policies, infrastructure development including railways and irrigation, criminal justice system or tax regime, at the heart of every policy was the imperial mindset aimed at enriching the coffers of the company or crown government later on.

Unlike the previous despots in the country such as the Mughals or the Delhi Sultans, the British never assimilated in the Indian milieu and never had the intent to do so. This alienation was also reflected in their policies which never benefitted the Indians.

While cricket and English language may be cherished by many Indians, they too were a byproduct of the British presence rather than a conscious effort of their percolation into the Indian society. So were the other legacies such as railways infrastructure or colonial laws.

Finally, it is not important to arrive at the amount of reparation which the British owe to Indian people. More important is atonement and sincere apology that British owe to Indians and other former colonies.
"""

In [24]:
def get_paraphrased_passage( input_text, model=model_pegasus, tokenizer=tokenizer_pegasus, num_return_sequences=5, num_beams=5):
    
    sentences = input_text.split('.')
    
    all_sentences = []
    
    for sentence in sentences:
        # tokenize the text to be form of a list of token IDs
        inputs = tokenizer([sentence], truncation=True, padding="longest", return_tensors="pt")
        # generate the paraphrased sentences
        outputs = model.generate(
            **inputs,
            num_beams=num_beams,
            num_return_sequences=num_return_sequences,
          )
        # decode the generated sentences using the tokenizer to get them back to text
        result = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        result = result[0].upper() + result[1:]
        all_sentences.append(result)
    
    return "".join(all_sentences)

In [26]:
input_text

'\nWhat started as oxford speech on whether Britain owns reparations to its former colonies has culminated into this seething yet persuasive compendium of arguments for the topic.\n\nThe book refutes arguments of The Raj apologists that despite its despotic nature it bestowed India with seeds of modern liberal democratic ideals and built infrastructure that led to what India is today. British did this under the white man’s burden to give back, so they claim.\n\nBritish never cared for the interest of the Indian people. They resorted to tactical policies of divide & rule, discriminatory recruitment, xenophobic enforcement of laws and procedures, to prolong their stay in the country and ensure continued loot of its resources.\n\nWhether it was trade and agriculture policies, infrastructure development including railways and irrigation, criminal justice system or tax regime, at the heart of every policy was the imperial mindset aimed at enriching the coffers of the company or crown govern

In [25]:
get_paraphrased_passage(input_text)

"This seething yet persuasive compendium of arguments for the topic was the culmination of what started as an Oxford speech..The Raj apologists argue that India was given seeds of modern liberal democratic ideals and built infrastructure that led to what India is today..British did this under the burden of the white man..The interest of the Indian people was never cared for by the British..They use tactical policies to prolong their stay in the country and ensure continued loot of the country's resources..Whether it was trade and agriculture policies, infrastructure development including railways and irrigation, criminal justice system or tax regime, at the heart of every policy was the imperial mindset aimed at enriching the coffers of the company or crown government later on..The British never had the intent to integrate into the Indian culture, unlike the Mughals or the Delhi Sultans..Their policies never benefited the Indians..While cricket and English language may be cherished by 