# Paraphrase Generation

In this notebook, we walk through different ways to build a custom paraphrase generator. We start with a simple technique, replacing words of a few POS tags with synonyms, and then move on to conditional generation with different models: BART, a fine-tuned T5, Flan-T5, ChatGPT, and Falcon-7B.

We also use cosine similarity over sentence-transformers embeddings to determine how similar each paraphrase is to the original text.
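As a rough illustration of what the similarity score measures, here is a minimal sketch that computes cosine similarity by hand. The three-dimensional vectors are made up for illustration and stand in for real sentence embeddings; the helper name `cosine_similarity_pair` is also an assumption, not part of any library.

```python
import math

def cosine_similarity_pair(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy "embeddings" (hypothetical values, not produced by any model).
original = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # a scaled copy of `original`
orthogonal = [3.0, 0.0, -1.0]      # dot product with `original` is zero

print(round(cosine_similarity_pair(original, same_direction), 6))
print(round(cosine_similarity_pair(original, orthogonal), 6))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while unrelated (orthogonal) vectors score close to 0. In the cells that follow, the score of interest is the off-diagonal entry of the 2x2 matrix printed by `cosine_similarity`.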

In [9]:
!pip install --upgrade torch


In [None]:
!pip install -U pip setuptools wheel

In [None]:
!pip install -U 'spacy[cuda11x]'

In [1]:
import nltk
import spacy
import torch
from sentence_transformers import SentenceTransformer
from transformers import BartForConditionalGeneration, BartTokenizer

In [2]:
from sentence_transformers import SentenceTransformer

m2 = SentenceTransformer("sentence-transformers/LaBSE")

In [3]:
context = """A cover letter is a formal document that accompanies your resume when you apply for a job. It serves as
an introduction and provides additional context for your application. Here’s a breakdown of its various
aspects:
Purpose
The primary purpose of a cover letter is to introduce yourself to the hiring manager and to provide context
for your resume. It allows you to elaborate on your qualifications, skills, and experiences in a way that
your resume may not fully capture. It’s also an opportunity to express your enthusiasm for the role and the
company, and to explain why you would be a good fit.
Content
A typical cover letter includes the following sections:
1. Header: Includes your contact information, the date, and the employer’s contact information.
2. Salutation: A greeting to the hiring manager, preferably personalized with their name.
3. Introduction: Briefly introduces who you are and the position you’re applying for.
4. Body: This is the core of your cover letter where you discuss your qualifications, experiences, and
skills that make you suitable for the job. You can also mention how you can contribute to the company.
5. Conclusion: Summarizes your points and reiterates your enthusiasm for the role. You can also include
a call to action, like asking for an interview.
6. Signature: A polite closing (“Sincerely,” “Best regards,” etc.) followed by your name.
Significance in the Job Application Process
The cover letter is often the first document that a hiring manager will read, so it sets the tone for your
entire application. It provides you with a chance to stand out among other applicants and to make a
strong first impression. Some employers specifically require a cover letter, and failing to include one could
result in your application being disregarded.
In summary, a cover letter is an essential component of a job application that serves to introduce you,
elaborate on your qualifications, and make a compelling case for why you should be considered for the
position."""

## Paraphrase Generation via Synonyms

In this approach, we generate the paraphrase by replacing nouns and verbs with their synonyms, to demonstrate a simple yet adequate solution for paraphrase generation.
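One detail worth keeping in mind with synonym replacement is the difference between replacing a substring and replacing a whole word. The sketch below is a simplified stand-in, assuming a hypothetical hand-written `SYNONYMS` table in place of a WordNet lookup and a hypothetical helper `replace_whole_words`; it is not the code used in the cells that follow.

```python
import re

# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {"apply": "use", "provides": "supplies"}

def replace_whole_words(text, synonyms):
    """Swap words found in `synonyms`, matching whole words only."""
    def swap(match):
        word = match.group(0)
        # Words without an entry are returned unchanged.
        return synonyms.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

sentence = "It provides context when you apply, and for the position you're applying for."

# Plain str.replace also rewrites the start of "applying".
naive = sentence.replace("apply", "use")
# Whole-word matching leaves "applying" alone.
careful = replace_whole_words(sentence, SYNONYMS)

print(naive)
print(careful)
```

The naive version turns "applying" into "useing", while the whole-word version only touches exact matches. The trade-off is that inflected forms are then left unchanged unless they are looked up separately.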

In [3]:
from nltk.corpus import wordnet

In [4]:
nlp = spacy.load("en_core_web_sm")

In [48]:
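context2 = context
doc = nlp(context2)

# Section labels that must stay unchanged.
protected = ['Header', 'Salutation', 'Introduction', 'Body', 'Conclusion', 'Signature']

for token in doc:
    if token.pos_ == 'VERB':
        syns = wordnet.synsets(token.text)
        if not syns:
            continue  # no WordNet entry for this token
        # Take the first lemma of the first synset as the replacement.
        syn_val = syns[0].lemmas()[0].name()

        if syn_val.lower() != token.text.lower() and token.text not in protected:
            # str.replace rewrites every occurrence, including inside longer words.
            context2 = context2.replace(token.text, syn_val)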

In [49]:
print(context2)

A cover letter is a formal document that attach_to your resume when you use for a job. It serve as
an introduction and supply additional context for your application. Here’s a breakdown of its various
aspects:
Purpose
The primary purpose of a cover letter is to introduce yourself to the hire manager and to supply context
for your resume. It let you to elaborate on your qualifications, skills, and experiences in a way that
your resume may not fully capture. It’s also an opportunity to express your enthusiasm for the role and the
company, and to explain why you would be a good fit.
Content
A typical cover letter include the following sections:
1. Header: include your contact information, the date, and the employer’s contact information.
2. Salutation: A greeting to the hire manager, preferably personalize with their name.
3. Introduction: Briefly introduces who you are and the position you’re useing for.
4. Body: This is the core of your cover letter where you discus your qualifications,

In [51]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(context2)

print(cosine_similarity([embed1, embed2]))

[[1.        0.9966116]
 [0.9966116 0.9999999]]


In [52]:
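context2 = context
doc = nlp(context2)

# Section labels that must stay unchanged.
protected = ['Header', 'Salutation', 'Introduction', 'Body', 'Conclusion', 'Signature']

for token in doc:
    if token.pos_ == 'NOUN':
        syns = wordnet.synsets(token.text)
        if not syns:
            continue  # no WordNet entry for this token
        # Take the first lemma of the first synset as the replacement.
        syn_val = syns[0].lemmas()[0].name()

        if syn_val.lower() != token.text.lower() and token.text not in protected:
            # str.replace rewrites every occurrence, including inside longer words.
            context2 = context2.replace(token.text, syn_val)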

In [53]:
print(context2)

A screen letter is a formal document that accompanies your sketch when you apply for a occupation. It serves as
an introduction and provides additional context for your application. Here’s a dislocation of its various
aspect:
Purpose
The primary purpose of a screen letter is to introduce yourself to the hire director and to provide context
for your sketch. It allows you to elaborate on your qualification, skill, and experience in a manner that
your sketch may not fully capture. It’s also an opportunity to express your enthusiasm for the function and the
company, and to explain why you would be a good fit.
Content
A typical screen letter includes the following section:
1. Header: Includes your contact information, the date, and the employer’s contact information.
2. Salutation: A greeting to the hire director, preferably personalized with their name.
3. Introduction: Briefly introduce who you are and the position you’re applying for.
4. Body: This is the core of your screen letter where

In [54]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(context2)

print(cosine_similarity([embed1, embed2]))

[[1.        0.9773418]
 [0.9773418 1.0000002]]


## Paraphrase Generation via BART Model

Here we generate text with the BART model one segment at a time, splitting the text on periods, because the context length of the BART model is small.
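Splitting on every period also cuts at the dot after list numbers such as "1." and "2.", which leaves fragments that contain only a number. The following is a minimal sketch of one alternative, assuming a hypothetical helper `split_sentences` and a shortened sample text; it is not the splitting used in the cells below.

```python
import re

def split_sentences(text):
    """Split after sentence-ending periods, but not after a list number like '1.'."""
    # (?<=\.)    : the gap must come right after a period
    # (?<!\d\.)  : ...unless that period directly follows a digit
    parts = re.split(r"(?<!\d\.)(?<=\.)\s+", text)
    return [p.strip() for p in parts if p.strip()]

# Shortened sample in the style of the notebook's text (made up for illustration).
sample = (
    "A cover letter accompanies your resume. It includes:\n"
    "1. Header: your contact information.\n"
    "2. Salutation: a greeting."
)

naive = sample.split(".")
print(naive)
print(split_sentences(sample))
```

The regex version keeps each numbered item together with its marker. It is only a heuristic: a sentence that genuinely ends in a digit would not be split, so a proper sentence segmenter may be a better choice for other texts.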

In [18]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase').to(device)

In [4]:
context3 = context

context3 = context3.split(".")

In [5]:
context3

['A cover letter is a formal document that accompanies your resume when you apply for a job',
 ' It serves as\nan introduction and provides additional context for your application',
 ' Here’s a breakdown of its various\naspects:\nPurpose\nThe primary purpose of a cover letter is to introduce yourself to the hiring manager and to provide context\nfor your resume',
 ' It allows you to elaborate on your qualifications, skills, and experiences in a way that\nyour resume may not fully capture',
 ' It’s also an opportunity to express your enthusiasm for the role and the\ncompany, and to explain why you would be a good fit',
 '\nContent\nA typical cover letter includes the following sections:\n1',
 ' Header: Includes your contact information, the date, and the employer’s contact information',
 '\n2',
 ' Salutation: A greeting to the hiring manager, preferably personalized with their name',
 '\n3',
 ' Introduction: Briefly introduces who you are and the position you’re applying for',
 '\n4',
 

In [39]:
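import gc

output_pass = []

for text in context3:
    # Swap line breaks for the </s> token before encoding.
    text = text.replace('\n', '</s>')
    # Note: len(text) counts characters, not tokens, so it is only a loose cap.
    batch = tokenizer(text, max_length=len(text), return_tensors='pt').to(device)
    # No length arguments are passed, so generate() uses the model's default limits.
    generated_ids = model.generate(batch['input_ids'])
    generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

    generated_sentence = generated_sentence[0].replace('</s>', '\n')
    output_pass.append(generated_sentence)

    # Free memory between segments.
    gc.collect()
    torch.cuda.empty_cache()

# Rejoin the segments with the periods removed by split(".").
output_pass = ".".join(output_pass)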

In [40]:
output_pass = output_pass.replace('..', '.\n')
print(output_pass)

A cover letter is a formal document that accompanies your resume when you apply for a job.
It serves as an introduction and provides additional context for your application.
The primary purpose of a cover letter is to introduce yourself to the hiring manager and provide context.It allows you to elaborate on your qualifications, skills, and experiences in a way that your.It’s also an opportunity to express your enthusiasm for the role and the company and.A typical cover letter includes the following sections:. Header: Contains your contact information, the date and the employer's contact information.
2.Salutation: A greeting to the hiring manager, preferably personalized with their name.
3.Briefly introduce who you are and the position you are applying for.
4.This is the core of your cover letter where you discuss your qualifications, experiences, and skills.How can I contribute to the company?.5.
Summarizes your points and reiterates your enthusiasm for the role.
You can also include a

In [41]:
del model
del tokenizer

In [42]:
torch.cuda.empty_cache()

In [45]:
len(output_pass.split(' '))

242

In [55]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(output_pass)

print(cosine_similarity([embed1, embed2]))

[[1.        0.9484808]
 [0.9484808 1.0000002]]


## Paraphrase Generation with T5-Base

In [56]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")


In [76]:
output_pass2 = []

for text in context3:
    
    input_ids = tokenizer(
        text,
        return_tensors="pt", padding="longest",
        max_length=len(text)
    ).input_ids
    
    outputs = model.generate(
        input_ids, repetition_penalty=10.0,
        num_return_sequences=1, no_repeat_ngram_size=2,
        num_beams=5, num_beam_groups=5, diversity_penalty=3.0
    )

    res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    
    output_pass2.append(res[0])
    
    gc.collect()
    torch.cuda.empty_cache()

In [77]:
output_pass2

['A cover letter is a formal document that you write along with your resume when you apply for',
 'By serving as both an introduction and a means of providing additional context for your application, it',
 'The main objective of a cover letter is to introduce yourself and your skills to the hiring manager',
 'The ability to describe your qualifications, skills, and experiences in a way that your resume may',
 'Additionally, it’s a chance to showcase your enthusiasm for the position and the company,',
 'Content 1 The following is the standard material for a cover letter: 1 Cover Letters 1',
 "Included in this Header are your contact details, date, and employer's contact information",
 'The second part of the series is dedicated to a story from A. 2nd grade,',
 "Salutation: A personal greeting, preferably personalized with the hiring manager's name (e",
 "3D printing is a popular choice for any digital content, but it's not available",
 'Can you provide a brief description of yourself an

In [69]:
output_pass2 = ".".join(output_pass2)

In [82]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(output_pass2)

print(cosine_similarity([embed1, embed2]))

[[1.        0.9032941]
 [0.9032941 1.       ]]


## Paraphrase Generation with Flan-T5

In [80]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")


In [88]:
output_pass3 = []

for text in context3:
    # Prompt kept verbatim, typo included, because the outputs below echo it.
    input_ids = tokenizer(
        f"Paraphrase this text such that the chnages are minimal: {text}",
        return_tensors="pt", padding="longest",
        max_length=len(text)
    ).input_ids

    outputs = model.generate(input_ids)
    output_pass3.append(tokenizer.decode(outputs[0], skip_special_tokens=True))

In [89]:
output_pass3

['A cover letter is a formal document that accompanies your resume when you apply',
 'It serves as an introduction and provides additional context for your application',
 'What is a cover letter?',
 'It allows you to elaborate on your qualifications, skills, and experiences in a way that your',
 'It’s also an opportunity to express your enthusiasm for the role and the company, and to',
 'A typical cover letter includes the following sections:',
 'Header: Includes your contact information, the date, and the employer’s contact information',
 'The chnages are minimal 2',
 'Salutation: A greeting to the hiring manager, preferably personalized with their name',
 'The chnages are minimal 3',
 'Introduction: Briefly introduces who you are and the position you’re applying for',
 'The chnages are minimal 4',
 'The body of your cover letter is where you discuss your qualifications, experiences, and skills that make',
 'You can also mention how you can contribute to the company.',
 'The chnages a

In [90]:
output_pass3 = ".".join(output_pass3)

In [91]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(output_pass3)

print(cosine_similarity([embed1, embed2]))

[[1.         0.94099236]
 [0.94099236 1.0000001 ]]


## Paraphrase Generation with ChatGPT

In [92]:
!pip install openai



In [93]:
import openai

openai.api_key = ""
openai.api_type = ""
openai.api_base = "" 
openai.api_version = "2023-03-15-preview"

In [95]:
message = [
        {"role": "system", "content": "I want you to act as a custom paraphrase generator. "
                                      "You will paraphrase the given piece of text with minimal changes to the language of text, just making it more cohesive."},
        {"role": "user", "content": "Now I want you to write a paraphrase of this text about {} in less than 400 words."},
    ]


message[1]['content'] = message[1]['content'].format(context)

In [97]:
completion = openai.ChatCompletion.create(engine='gpt-3-5', messages=message,
                                                      timeout=240, max_tokens=400, n=1, stop=None, temperature=0.2)

In [99]:
output_pass4 = completion['choices'][0]['message']['content']

In [100]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(output_pass4)

print(cosine_similarity([embed1, embed2]))

[[1.         0.90482885]
 [0.90482885 0.99999964]]


## Paraphrase Generation with Falcon-7B

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = "tiiuae/falcon-7b-instruct"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model)

In [7]:
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)






In [8]:
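import gc

output_pass5 = []

for text in context3:
    temp = []
    # The segment is passed as-is, with no paraphrasing instruction,
    # so the model continues the text rather than rewording it.
    sequences = pipeline(
        text,
        max_length=len(text),  # character count used as the token limit
        eos_token_id=tokenizer.eos_token_id,
    )
    for seq in sequences:
        temp.append(seq['generated_text'])

    # Free memory between segments.
    gc.collect()
    torch.cuda.empty_cache()

    temp = " ".join(temp)
    output_pass5.append(temp)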

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

In [9]:
output_pass5

['A cover letter is a formal document that accompanies your resume when you apply for a job. It is an opportunity to introduce yourself to the employer and explain why you are a good fit for the job. A cover letter should be no longer than one page and should be tailored to the job you are applying for.\nThe cover letter should be no longer than one page and should be tailored to the job you are applying for. It should be no',
 ' It serves as\nan introduction and provides additional context for your application.\nUser ',
 ' Here’s a breakdown of its various\naspects:\nPurpose\nThe primary purpose of a cover letter is to introduce yourself to the hiring manager and to provide context\nfor your resume. It’s a way to give the hiring manager a snapshot of your skills and experience.\nFormat\nA cover letter should be no longer than one page and should be formatted in a way that is easy to read.\nIt should be typed and double-spaced.\nContent\nThe content of a cover letter should be tailored

In [10]:
output_pass5 = ".".join(output_pass5)

In [5]:
from sklearn.metrics.pairwise import cosine_similarity

embed1 = m2.encode(context)
embed2 = m2.encode(output_pass5)

print(cosine_similarity([embed1, embed2]))

[[1.0000001  0.87230396]
 [0.87230396 0.9999999 ]]
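
The similarity scores printed throughout this notebook can be collected and ranked with a few lines of Python. The values below are the off-diagonal entries copied from the cell outputs above and rounded to four places; the labels are shorthand introduced here for readability.

```python
# Off-diagonal cosine similarities copied from the outputs above (rounded).
scores = {
    "Synonyms (verbs)": 0.9966,
    "Synonyms (nouns)": 0.9773,
    "BART": 0.9485,
    "T5-Base": 0.9033,
    "Flan-T5": 0.9410,
    "ChatGPT": 0.9048,
    "Falcon-7B": 0.8723,
}

# Sort from most to least similar to the original text.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<18}{score:.4f}")
```

A high score here mainly indicates that the output stays close to the original text in embedding space. The verb-synonym output scores highest even though it contains errors such as "useing", so the score on its own may not reflect paraphrase quality.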
