Paraphrase generation can be approached with traditional NLP techniques, without relying on deep learning models. Here’s a breakdown of a pipeline you can implement:

1. Synonym Replacement (Word-level Paraphrasing):

One common technique is to replace words in the input text with their synonyms while maintaining grammatical structure. This can be done using a lexical database such as WordNet.

Steps:
1. Tokenize the sentence into words.
2. For each word, look up synonyms in WordNet.
3. Replace the word with a synonym that fits the context.


In [7]:
%%capture
import random

import nltk
from nltk.corpus import wordnet

# Download WordNet and the Punkt tokenizer models
nltk.download('wordnet')
nltk.download('punkt')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [8]:
def get_synonyms(word):
    """Collect WordNet lemma names for every sense of `word`, excluding the word itself."""
    synonyms = set()
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            name = lemma.name().replace('_', ' ')  # Multiword lemmas use underscores
            if name.lower() != word.lower():
                synonyms.add(name)
    return synonyms

def synonym_replacement(sentence):
    words = nltk.word_tokenize(sentence)
    paraphrased_sentence = []
    for word in words:
        synonyms = get_synonyms(word)
        if synonyms:
            # Pick any synonym at random; part of speech and word sense are ignored
            paraphrased_sentence.append(random.choice(sorted(synonyms)))
        else:
            paraphrased_sentence.append(word)
    return ' '.join(paraphrased_sentence)

sentence = "The quick brown fox jumps over the lazy dog."
print(synonym_replacement(sentence))


The immediate browned trick spring complete the slothful dog-iron .


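Funny ;) The function picks a random lemma from every WordNet sense of each word, ignoring part of speech and word sense, so "fox" turns into the verb "trick" and "dog" turns into "dog-iron".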
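
Much of the oddness in the output above comes from choosing among all senses of a word regardless of part of speech. Here is a minimal sketch of one mitigation: only accept candidates listed for the same part of speech. The `TOY_THESAURUS` table, the tag names, and `pos_aware_replacement` are hypothetical and hand-written for illustration; a real pipeline would need a POS tagger and a proper lexical database behind them.

```python
import random

# Hypothetical hand-written thesaurus keyed by (word, part of speech).
# A real pipeline would derive these candidates from a tagger plus WordNet.
TOY_THESAURUS = {
    ("quick", "ADJ"): ["fast", "swift"],
    ("jumps", "VERB"): ["leaps", "hops"],
    ("lazy", "ADJ"): ["idle", "sluggish"],
}

def pos_aware_replacement(tagged_words, rng=None):
    """Replace each (word, pos) pair only with candidates listed for that same POS."""
    rng = rng or random.Random()
    result = []
    for word, pos in tagged_words:
        candidates = TOY_THESAURUS.get((word.lower(), pos), [])
        # Words without a same-POS candidate are kept unchanged
        result.append(rng.choice(candidates) if candidates else word)
    return " ".join(result)

# Tags are written by hand here; they are an assumption, not tagger output
tagged = [("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
          ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"),
          ("dog", "NOUN")]
print(pos_aware_replacement(tagged, random.Random(0)))
```

Nouns such as "fox" and "dog" stay untouched here simply because the toy table has no noun entries. Filtering by part of speech does not solve word-sense ambiguity, it only removes the most obvious mismatches.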

2. Rule-based Sentence Restructuring:

Another approach is to apply syntactic transformations, such as changing the voice of the sentence.

Active to Passive:
- Transform active sentences into passive voice. The reverse direction is possible too, but is not covered here.


In [22]:
%%capture
import spacy

# Load the spaCy model for advanced parsing
nlp = spacy.load("en_core_web_sm")


In [23]:

def active_to_passive(sentence):
    # Parse the sentence using spaCy to identify grammatical components
    doc = nlp(sentence)

    # Variables to store subject, verb, and object
    subject = None
    verb = None
    obj = None

    # Iterate through tokens and assign subject, verb, and object based on dependency parsing
    for token in doc:
        if token.dep_ == "nsubj":  # Nominal subject
            subject = token
        elif token.dep_ == "dobj":  # Direct object
            obj = token
        elif token.pos_ == "VERB":  # Verb
            verb = token

    # If we can't find a subject, verb, and object, return the original sentence
    if not subject or not verb or not obj:
        return sentence

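    # Construct the passive voice sentence. The participle rule only covers regular
    # verbs, and using the head tokens alone drops determiners such as "the".
    participle = verb.lemma_ + ("d" if verb.lemma_.endswith("e") else "ed")
    passive_sentence = f"{obj} was {participle} by {subject}"

    return passive_sentence

# Example usage
sentence = "The fox chased the rabbit."
print(active_to_passive(sentence))


rabbit was chased by fox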
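
Building the participle with a suffix rule on the lemma is fragile: verbs ending in a silent "e", verbs ending in consonant + "y", and irregular verbs all need special handling. A minimal sketch of a slightly more careful helper follows. The `IRREGULAR_PARTICIPLES` table and the `past_participle` function are hypothetical and far from complete; for instance, consonant doubling ("stop" to "stopped") is not handled.

```python
# Hypothetical, deliberately tiny table of irregular past participles
IRREGULAR_PARTICIPLES = {
    "eat": "eaten",
    "write": "written",
    "take": "taken",
    "see": "seen",
}

def past_participle(lemma):
    """Return a best-effort past participle for an English verb lemma."""
    lemma = lemma.lower()
    if lemma in IRREGULAR_PARTICIPLES:
        return IRREGULAR_PARTICIPLES[lemma]
    if lemma.endswith("e"):
        return lemma + "d"            # chase -> chased
    if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ied"     # carry -> carried
    return lemma + "ed"               # jump -> jumped

for verb in ["chase", "carry", "play", "eat", "jump"]:
    print(verb, "->", past_participle(verb))
```

A lookup table with a rule-based fallback keeps the approach within traditional NLP, at the cost of maintaining the table by hand.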


3. Shuffling Phrases (Phrase-level Paraphrasing):

Use dependency parsing or chunking to identify meaningful phrases in a sentence (noun phrases, verb phrases, etc.) and then rearrange them to create a new sentence.

Example using spaCy's noun chunks, which are built on the dependency parse:

In [24]:
import spacy
nlp = spacy.load("en_core_web_sm")

def phrase_shuffling(sentence):
    doc = nlp(sentence)
    phrases = [chunk.text for chunk in doc.noun_chunks]
    # Reverse the noun phrases for variation. Tokens outside the noun chunks
    # (verbs, prepositions, punctuation) are dropped.
    return ' '.join(phrases[::-1])

sentence = "The quick brown fox jumps over the lazy dog."
print(phrase_shuffling(sentence))


the lazy dog The quick brown fox
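
Joining only the noun chunks loses the verb, as the output above shows. A minimal sketch of a rearrangement that keeps every token is to swap two phrase spans by their token offsets. The `swap_first_and_last_span` function and the hard-coded offsets are hypothetical; in practice the `(start, end)` pairs could come from spaCy's `chunk.start` and `chunk.end`.

```python
def swap_first_and_last_span(tokens, spans):
    """Swap the first and last spans, keeping every other token where it was.

    tokens: list of token strings.
    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    if len(spans) < 2:
        return list(tokens)
    (a_start, a_end), (b_start, b_end) = spans[0], spans[-1]
    return (tokens[:a_start] + tokens[b_start:b_end] + tokens[a_end:b_start]
            + tokens[a_start:a_end] + tokens[b_end:])

# Token offsets written by hand for this sentence (an assumption, not parser output)
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
noun_spans = [(0, 4), (6, 9)]
print(" ".join(swap_first_and_last_span(tokens, noun_spans)))
```

Swapping the subject and object phrases changes who does what, so this is not a meaning-preserving paraphrase on its own. It only illustrates how to move phrases without discarding the rest of the sentence.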


4. Back-Translation (Semi-traditional NLP):

Although not purely a traditional NLP technique, back-translation is a simple way to generate paraphrases. It involves translating a sentence into another language and then back into the original language using a translation API or library.



In [30]:
%%capture
!pip install deep-translator
from deep_translator import GoogleTranslator

In [31]:
def back_translation(sentence, src_lang='en', intermediate_lang='fr'):
    try:
        # Translate from source to intermediate language
        intermediate_translation = GoogleTranslator(source=src_lang, target=intermediate_lang).translate(sentence)

        # Translate back from intermediate to source language
        back_translated = GoogleTranslator(source=intermediate_lang, target=src_lang).translate(intermediate_translation)

        return back_translated
    except Exception as e:
        print(f"Error during translation: {e}")
        return sentence  # Return the original sentence if there's an error

sentence = "The quick brown fox jumps over the lazy dog."
print(back_translation(sentence))

The quick brown fox jumps over the lazy dog.
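
The round trip through French returned the input unchanged, so a single pivot language may yield no paraphrase at all. A minimal sketch of one workaround is to try several pivot languages and keep only the results that differ from the input. The `back_translation_variants` function, its `translate` callable, and the `fake_translate` stand-in are hypothetical; the stand-in fakes translation so that the example runs without network access.

```python
def back_translation_variants(sentence, translate, pivots=("fr", "de", "es"), src_lang="en"):
    """Round-trip through each pivot and keep results that differ from the input.

    translate: hypothetical callable (text, source, target) -> translated text.
    """
    variants = []
    for pivot in pivots:
        intermediate = translate(sentence, src_lang, pivot)
        round_trip = translate(intermediate, pivot, src_lang)
        is_new = round_trip.strip().lower() != sentence.strip().lower()
        if is_new and round_trip not in variants:
            variants.append(round_trip)
    return variants

# Stand-in translator for demonstration only; it is not a real translation service.
def fake_translate(text, source, target):
    if target != "en":
        return f"[{target}] {text}"        # Pretend to translate into the pivot
    original = text.split("] ", 1)[1]      # Pretend to translate back
    # Only the fake "German" round trip changes a word
    return original.replace("quick", "fast") if source == "de" else original

sentence = "The quick brown fox jumps over the lazy dog."
print(back_translation_variants(sentence, fake_translate))
```

To use the real service, `translate` could wrap the same call as in the cell above, for example `lambda t, s, d: GoogleTranslator(source=s, target=d).translate(t)`.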


In [35]:
# Combined Paraphrasing Function
def paraphrase(sentence):
    print(f"Original: {sentence}")

    # Step 1: Synonym Replacement (disabled)
    # synonym_paraphrase = synonym_replacement(sentence)
    # print(f"After Synonym Replacement: {synonym_paraphrase}")

    # Step 2: Phrase Shuffling
    shuffled_paraphrase = phrase_shuffling(sentence)
    # print(f"After Phrase Shuffling: {shuffled_paraphrase}")

    # Step 3: Active to Passive (if applicable)
    passive_paraphrase = active_to_passive(shuffled_paraphrase)
    # print(f"After Active to Passive: {passive_paraphrase}")

    # Step 4: Back-Translation for final variation
    final_paraphrase = back_translation(passive_paraphrase)
    # print(f"After Back-Translation: {final_paraphrase}")

    return final_paraphrase

# Example usage
sentence = "The quick brown fox jumps over the lazy dog."
paraphrased_sentence = paraphrase(sentence)
print(f"Final Paraphrase: {paraphrased_sentence}")

Original: The quick brown fox jumps over the lazy dog.
Final Paraphrase: the lazy dog the quick brown fox


Let's paraphrase the provided text.

In [37]:
input_text = """A cover letter is a formal document that accompanies your resume when you apply for a job."""

In [38]:
paraphrased_sentence = paraphrase(input_text)
print(f"Final Paraphrase: {paraphrased_sentence}")

Original: A cover letter is a formal document that accompanies your resume when you apply for a job.
Final Paraphrase: a job your CV an official document a cover letter


It's not great: the pipeline keeps only the noun phrases, so the verbs and the sentence structure are lost. We will have to fine-tune a transformer.
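
One rough way to put a number on the lost content is to measure how many of the original words survive in the paraphrase. This is a minimal sketch; `word_retention` is a hypothetical helper and a crude proxy, not an established paraphrase metric.

```python
import re

def word_retention(original, paraphrase):
    """Fraction of distinct words in the original that also appear in the paraphrase."""
    def words(text):
        return set(re.findall(r"[a-z']+", text.lower()))
    original_words = words(original)
    if not original_words:
        return 0.0
    return len(original_words & words(paraphrase)) / len(original_words)

original = "The quick brown fox jumps over the lazy dog."
paraphrase = "the lazy dog the quick brown fox"
print(word_retention(original, paraphrase))  # → 0.75
```

The two missing words are "jumps" and "over", that is, exactly the verb and preposition that phrase shuffling discards. A high score would not prove a good paraphrase either, since copying the input scores 1.0.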