source: https://kantschants.com/paraphrasing-with-transformer-t5-bart-pegasus

#### BART

In [3]:
# imports
from transformers import BartTokenizer, BartForConditionalGeneration

# Load the pre-trained BART base model and tokenizer
# (a general-purpose checkpoint, not fine-tuned for paraphrasing)
model_name = 'facebook/bart-base'
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Set up input sentences
sentences = [
    "She was a storm, not the kind you run from, but the kind you chase.",
    "In the end, we only regret the chances we didn't take.",
    "She wasn't looking for a knight, she was looking for a sword.",
    "I dreamt I am running on sand in the night.",
    "Long long ago, there lived a king and a queen. For a long time, they had no children.",
    "I am typing the best article on paraphrasing with Transformers."
]

# Paraphrase the sentences
for sentence in sentences:
    # Tokenize the input sentence
    input_ids = tokenizer.encode(sentence, return_tensors='pt')

    # Generate paraphrased sentence
    paraphrase_ids = model.generate(input_ids, num_beams=5, max_length=100, early_stopping=True)

    # Decode and print the paraphrased sentence
    paraphrase = tokenizer.decode(paraphrase_ids[0], skip_special_tokens=True)
    print(f"Original:   {sentence}")
    print(f"Paraphrase: {paraphrase}")
    print()


Original:   She was a storm, not the kind you run from, but the kind you chase.
Paraphrase: She was a storm, not the kind you run from, but the kind that you chase.

Original:   In the end, we only regret the chances we didn't take.
Paraphrase: In the end, we only regret the chances we didn't take.

Original:   She wasn't looking for a knight, she was looking for a sword.
Paraphrase: She wasn't looking at a knight, she was looking for a sword.

Original:   I dreamt I am running on sand in the night.
Paraphrase: I dreamt I am running on sand in the night.

Original:   Long long ago, there lived a king and a queen. For a long time, they had no children.
Paraphrase: Long long ago, there lived a king and a queen. For a long time, they had no children.

Original:   I am typing the best article on paraphrasing with Transformers.
Paraphrase: I am typing the best article on paraphrasing with Transformers.
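
Most of the BART outputs above match their inputs word for word. One way to quantify that, as a minimal sketch using only the standard library, is to compare the two word sequences with `difflib.SequenceMatcher`. The helper name `word_similarity` and the whitespace tokenization are assumptions made for illustration; they are not part of the original notebook.

```python
from difflib import SequenceMatcher


def word_similarity(original: str, paraphrase: str) -> float:
    """Return a 0..1 similarity ratio between two sentences, compared word by word."""
    a = original.lower().split()
    b = paraphrase.lower().split()
    return SequenceMatcher(None, a, b).ratio()


# Two original/paraphrase pairs copied from the BART output above
pairs = [
    ("In the end, we only regret the chances we didn't take.",
     "In the end, we only regret the chances we didn't take."),
    ("She wasn't looking for a knight, she was looking for a sword.",
     "She wasn't looking at a knight, she was looking for a sword."),
]

for original, paraphrase in pairs:
    score = word_similarity(original, paraphrase)
    label = "unchanged" if score == 1.0 else "changed"
    print(f"{score:.2f}  {label}")
```

A score of exactly 1.0 means the model returned the input unchanged; lower scores indicate more rewording, though they say nothing about whether the meaning was preserved.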



#### T5 (Text-to-Text Transfer Transformer)

In [2]:
!pip install SentencePiece

Collecting SentencePiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 6.2 MB/s eta 0:00:00
Installing collected packages: SentencePiece
Successfully installed SentencePiece-0.1.99


In [1]:
# imports
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the pre-trained T5 Base model and tokenizer
# (a general-purpose checkpoint, not fine-tuned for paraphrasing)
tokenizer = T5Tokenizer.from_pretrained("t5-base", model_max_length=1024)
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Set up input sentences
sentences = [
    "She was a storm, not the kind you run from, but the kind you chase.",
    "In the end, we only regret the chances we didn't take.",
    "She wasn't looking for a knight, she was looking for a sword.",
    "I dreamt I am running on sand in the night.",
    "Long long ago, there lived a king and a queen. For a long time, they had no children.",
    "I am typing the best article on paraphrasing with Transformers."
]

# Paraphrase the sentences
for sentence in sentences:
    # Tokenize the input sentence
    input_ids = tokenizer.encode(sentence, return_tensors='pt')

    # Generate paraphrased sentence
    paraphrase_ids = model.generate(input_ids, num_beams=5, max_length=100, early_stopping=True)

    # Decode and print the paraphrased sentence
    paraphrase = tokenizer.decode(paraphrase_ids[0], skip_special_tokens=True)
    print(f"Original:   {sentence}")
    print(f"Paraphrase: {paraphrase}")
    print()


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Original:   She was a storm, not the kind you run from, but the kind you chase.
Paraphrase: She was a storm, not the kind you run from, but the kind you chase.

Original:   In the end, we only regret the chances we didn't take.
Paraphrase: We only regret the chances we didn't take.

Original:   She wasn't looking for a knight, she was looking for a sword.
Paraphrase: She wasn't looking for a knight, she was looking for a sword.

Original:   I dreamt I am running on sand in the night.
Paraphrase: I dreamt I am running on sand in the night.

Original:   Long long ago, there lived a king and a queen. For a long time, they had no children.
Paraphrase: Long long ago, there lived a king and a queen. Long long ago, they had no children.

Original:   I am typing the best article on paraphrasing with Transformers.
Paraphrase: Today I am typing the best article on paraphrasing with Transformers.



#### Pegasus

In [2]:
# imports
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the Pegasus model fine-tuned for paraphrasing, and its tokenizer
tokenizer = PegasusTokenizer.from_pretrained("tuner007/pegasus_paraphrase")
model = PegasusForConditionalGeneration.from_pretrained("tuner007/pegasus_paraphrase")

# input sentences
sentences = [
    "She was a storm, not the kind you run from, but the kind you chase.",
    "She wasn't looking for a knight, she was looking for a sword.",
    "In the end, we only regret the chances we didn't take.",
    "I dreamt I am running on sand in the night",
    "Long long ago, there lived a king and a queen. For a long time, they had no children.",
    "I am typing the best article on paraphrasing with Transformers."
]

# Paraphrase the sentences
for sentence in sentences:
    # Tokenize the input sentence
    input_ids = tokenizer.encode(sentence, return_tensors='pt')

    # Generate paraphrased sentence
    paraphrase_ids = model.generate(input_ids, num_beams=5, max_length=100, early_stopping=True)

    # Decode and print the paraphrased sentence
    paraphrase = tokenizer.decode(paraphrase_ids[0], skip_special_tokens=True)
    print(f"Original: {sentence}")
    print(f"Paraphrase: {paraphrase}")
    print()


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at tuner007/pegasus_paraphrase and are newly initialized: ['model.encoder.embed_positions.weight', 'model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Original: She was a storm, not the kind you run from, but the kind you chase.
Paraphrase: She was a storm, not the kind you run from, but the kind you chase.

Original: She wasn't looking for a knight, she was looking for a sword.
Paraphrase: She was looking for a sword, not a knight.

Original: In the end, we only regret the chances we didn't take.
Paraphrase: We regret the chances we didn't take.

Original: I dreamt I am running on sand in the night
Paraphrase: I ran on the sand in the night.

Original: Long long ago, there lived a king and a queen. For a long time, they had no children.
Paraphrase: They had no children for a long time.

Original: I am typing the best article on paraphrasing with Transformers.
Paraphrase: I am writing the best article on the subject.



#### Paraphrasing a Paragraph

In [1]:
# imports
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Load the Pegasus Paraphrase model and tokenizer
model_name = "tuner007/pegasus_paraphrase"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Paraphrase a longer text one sentence at a time, then rejoin the results
def paraphrase_paragraph(text):
    # Naive sentence split on periods; abbreviations and decimals are split too
    sentences = text.split(".")
    paraphrases = []

    for sentence in sentences:
        # Remove surrounding whitespace
        sentence = sentence.strip()

        # Skip empty fragments (e.g. the one after the final period)
        if not sentence:
            continue

        # Tokenize the sentence, truncating overly long inputs
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=512)

        # Generate the paraphrase with beam search and keep the top beam
        paraphrase_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            num_beams=4,
            max_length=100,
            early_stopping=True
        )[0]
        paraphrases.append(tokenizer.decode(paraphrase_ids, skip_special_tokens=True))

    # Join the paraphrased sentences back into a single paragraph
    return " ".join(paraphrases)

# Example usage
text = "As Sir Henry and I sat at breakfast, the sunlight flooded in through the high mullioned windows, throwing watery patches of color from the coats of arms which covered them. The dark panelling glowed like bronze in the golden rays, and it was hard to realize that this was indeed the chamber which had struck such a gloom into our souls upon the evening before. But the evening before, Sir Henry's nerves were still handled the stimulant of suspense, and he came to breakfast, his cheeks flushed in the exhilaration of the early chase."
paraphrase = paraphrase_paragraph(text)
print(paraphrase)


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at tuner007/pegasus_paraphrase and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


As Sir Henry and I sat at breakfast, the sunlight flooded in through the high windows, causing watery patches of color from the coats of arms. The dark panelling glowed like bronze in the golden rays, and it was hard to see that it was the chamber which had struck such a gloom into our souls the evening before. The evening before, Sir Henry's nerves were still handled and he came to breakfast, his cheeks flushed from the excitement of the early chase.
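
The function above splits with `text.split(".")`, which discards the terminal punctuation and does not treat `?` or `!` as sentence boundaries. One possible alternative, sketched here with the standard library only, is to split on whitespace that follows sentence-ending punctuation. The helper name `split_sentences` is hypothetical, and the regex is still naive: it will also split after abbreviations such as "Dr.".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text on whitespace that follows '.', '?' or '!', keeping the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input itself is empty)
    return [part for part in parts if part]


# Illustrative sample text, not taken from the notebook
sample = "The dark panelling glowed like bronze. Was this the same chamber? It was!"
for sentence in split_sentences(sample):
    print(sentence)
```

Each returned sentence could then be passed to the tokenizer in place of the fragments produced by `text.split(".")`.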
