<a href="https://colab.research.google.com/github/sadrireza/Neural-Networks/blob/main/Synonym%20Suggestion%3A%20Roberta%20-NLTK.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Synonym Suggestion: RoBERTa + NLTK

#### Section 1: Install Necessary Libraries


In [None]:
!pip install transformers
!pip install nltk



#### Section 2: Import Libraries and Download NLTK Data


In [None]:
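import nltk
from transformers import RobertaTokenizer, RobertaForMaskedLM
import torch
from google.colab import files  # Colab-only file upload helper
from nltk.corpus import wordnet  # imported for synonym lookups; not used by the cells below
import textwrap

nltk.download('wordnet')    # WordNet lexical database
nltk.download('punkt_tab')  # tokenizer data used by nltk.word_tokenize in recent NLTK releases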

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

#### Section 3: Upload Text File


In [None]:
uploaded = files.upload()  # opens a file picker; returns a dict {filename: file bytes}

Saving Sample1.txt to Sample1.txt


#### Section 4: Read the Uploaded Text


In [None]:
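# files.upload() also saves each file to the working directory, so it can be reopened by name
filename = next(iter(uploaded))
with open(filename, 'r', encoding='utf-8') as file:
    text = file.read()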
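Because `files.upload()` returns a dictionary that maps each uploaded filename to its raw bytes, the text can also be decoded straight from that dictionary without reopening the file. A minimal sketch, assuming the file is UTF-8 encoded; `read_uploaded_text` is a hypothetical helper, not part of the Colab API:

```python
def read_uploaded_text(uploaded, encoding="utf-8"):
    """Decode the first file in a {filename: bytes} dict, such as the one
    returned by google.colab.files.upload(). Hypothetical helper."""
    if not uploaded:
        raise ValueError("no file was uploaded")
    filename = next(iter(uploaded))
    return filename, uploaded[filename].decode(encoding)

# Usage with a stand-in dictionary, so no Colab session is needed
name, content = read_uploaded_text({"Sample1.txt": b"Human beings are members of a whole,"})
print(name, "->", content)
```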

#### Section 5: Tokenize Text and Mask Tokens


In [None]:
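import random

def mask_random_words(text, p=0.2):
    """Replace each token with RoBERTa's <mask> token with probability p.

    Returns the masked text and the list of original tokens that were masked.
    Every token is eligible, so punctuation can be masked too.
    """
    nltk.download('punkt')  # older NLTK releases use 'punkt'; newer ones use 'punkt_tab'
    words = nltk.word_tokenize(text)
    masked_indices = [i for i in range(len(words)) if random.random() < p]
    masked_words = [words[i] for i in masked_indices]
    for i in masked_indices:
        words[i] = "<mask>"
    # Re-joining with spaces leaves a gap before punctuation ("whole , In")
    masked_text = ' '.join(words)
    return masked_text, masked_words

# random.seed(0)  # uncomment for a reproducible choice of masked words
masked_text, masked_words = mask_random_words(text)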
print("Masked Text:", masked_text)
print("Masked Words:", masked_words)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Masked Text: Human beings are <mask> of a whole , In <mask> of one essence and <mask> <mask> If one member is afflicted with pain , Other <mask> uneasy <mask> remain . If you have no <mask> for human <mask> , The name of human you can not retain .
Masked Words: ['members', 'creation', 'soul', '.', 'members', 'will', 'sympathy', 'pain']
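The output above shows that punctuation (`'.'`) can be masked and that each run masks different words. A minimal sketch of a variant that masks only alphabetic tokens and uses its own seeded `random.Random`, so a given seed always masks the same words. It assumes the text has already been tokenized (for example with `nltk.word_tokenize`); `mask_word_tokens` is a hypothetical helper, not part of NLTK:

```python
import random

def mask_word_tokens(tokens, p=0.2, seed=None, mask_token="<mask>"):
    """Mask alphabetic tokens with probability p; punctuation is never masked.

    Returns (masked_tokens, masked_words). Hypothetical variant of
    mask_random_words that takes an already tokenized list.
    """
    rng = random.Random(seed)  # private generator: same seed -> same masks
    masked_tokens, masked_words = [], []
    for token in tokens:
        if token.isalpha() and rng.random() < p:
            masked_tokens.append(mask_token)
            masked_words.append(token)
        else:
            masked_tokens.append(token)
    return masked_tokens, masked_words

tokens = ["Human", "beings", "are", "members", "of", "a", "whole", ",", "In", "creation"]
masked, words = mask_word_tokens(tokens, p=0.5, seed=42)
print(" ".join(masked))
print(words)
```

Using a private generator instead of the module-level `random` functions keeps the masking reproducible without affecting any other code that draws random numbers.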


#### Section 6: Load RoBERTa and Predict the Masked Words


In [None]:
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForMaskedLM.from_pretrained('roberta-base')

# roberta-base accepts at most 512 tokens, so very long files must be split first
inputs = tokenizer(masked_text, return_tensors='pt')

# A single forward pass scores every <mask> position at once
with torch.no_grad():
    outputs = model(**inputs)

predictions = outputs.logits  # shape: (batch, sequence_length, vocab_size)

# Positions of the <mask> tokens, in left-to-right order
input_ids = inputs['input_ids'][0]
mask_positions = (input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].tolist()

# Take the highest-scoring vocabulary entry at each masked position
predicted_tokens = []
for masked_index in mask_positions:
    predicted_token_id = predictions[0, masked_index].argmax(dim=-1).item()
    predicted_tokens.append(tokenizer.decode(predicted_token_id))

print("Predicted Tokens:", predicted_tokens)

Predicted Tokens: [' members', ' possession', ' body', ' .', ' members', ' to', ' respect', ' beings']
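The argmax prediction often just reproduces the masked word (`' members'` for `members`, `' .'` for `.`), so it fills the gap rather than suggesting a synonym. A minimal sketch of one way to get an actual alternative, assuming you already have ranked candidates for a single mask position, for example from `torch.topk(predictions[0, masked_index], k=10)` with each index decoded by `tokenizer.decode`. The optional `synonyms` set could come from WordNet, e.g. `{l.name() for s in wordnet.synsets(word) for l in s.lemmas()}`. `pick_alternative` is a hypothetical helper, and the candidate scores in the usage lines are made-up illustrative values, not real RoBERTa output:

```python
def pick_alternative(candidates, original, synonyms=None):
    """Return the best-scoring candidate that differs from the original word.

    candidates: (token_string, score) pairs for one <mask> position.
        Leading spaces (RoBERTa's word-boundary marker) are stripped.
    synonyms: optional set of allowed words, e.g. WordNet lemma names
        (underscores in multiword lemmas are turned into spaces).
    Returns None when no candidate qualifies. Hypothetical helper.
    """
    allowed = None
    if synonyms is not None:
        allowed = {s.replace("_", " ").lower() for s in synonyms}
    for token, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        word = token.strip()
        if not word.isalpha() or word.lower() == original.lower():
            continue  # skip punctuation and the original word itself
        if allowed is not None and word.lower() not in allowed:
            continue  # not in the supplied synonym set
        return word
    return None

# Made-up candidate scores for illustration only
candidates = [(" members", 9.1), (" parts", 8.7), (" pieces", 7.9), (" .", 3.0)]
print(pick_alternative(candidates, "members"))  # parts
print(pick_alternative(candidates, "members", synonyms={"piece", "pieces"}))  # pieces
```

Restricting candidates to a WordNet synonym set trades coverage for precision: the language model keeps the choice fluent in context, while WordNet keeps it close in meaning to the original word.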


#### Section 7: Replace Masked Words with the Predictions and Display the Result

In [None]:
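output_text = masked_text
# Fill the masks left to right; decoded tokens keep RoBERTa's leading space,
# which is why the modified text below contains doubled spaces
for token in predicted_tokens:
    output_text = output_text.replace("<mask>", token, 1)

def print_wrapped(text, width):
    """Print text wrapped to at most `width` characters per line."""
    for line in textwrap.wrap(text, width=width):
        print(line)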

print("Original Text:")
print_wrapped(text, 50)
print("\nModified Text:")
print_wrapped(output_text, 50)

Original Text:
Human beings are members of a whole, In creation
of one essence and soul. If one member is
afflicted with pain, Other members uneasy will
remain. If you have no sympathy for human pain,
The name of human you cannot retain.

Modified Text:
Human beings are  members of a whole , In
possession of one essence and  body  . If one
member is afflicted with pain , Other  members
uneasy  to remain . If you have no  respect for
human  beings , The name of human you can not
retain .
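The modified text above contains doubled spaces ("are  members") because each decoded token keeps its leading space, and spaces before punctuation ("whole , In") left over from `' '.join`. A minimal sketch of a tidier reassembly, assuming the masked text and predicted tokens have the same form as in Sections 5 and 6; `fill_masks` is a hypothetical helper:

```python
import re

def fill_masks(masked_text, predicted_tokens, mask_token="<mask>"):
    """Replace each mask, left to right, with the stripped predicted token,
    then remove the space that ' '.join put before punctuation.
    Hypothetical helper."""
    output = masked_text
    for token in predicted_tokens:
        output = output.replace(mask_token, token.strip(), 1)
    output = re.sub(r"\s+([.,;:!?])", r"\1", output)  # "whole , In" -> "whole, In"
    return re.sub(r" {2,}", " ", output)  # collapse any leftover runs of spaces

print(fill_masks("Human beings are <mask> of a whole , In <mask> of one essence",
                 [" members", " creation"]))
# Human beings are members of a whole, In creation of one essence
```

This only repairs spacing; word-level splits made by the tokenizer, such as "cannot" becoming "can not", are left as they are.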
