<a href="https://colab.research.google.com/github/sadrireza/Neural-Networks/blob/main/Transformer_for_Text_Correction_T5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transformer-Based Model for Text Correction - T5

Step 1: Install the Required Libraries

In [None]:
# Install the 'transformers' library, which provides pre-trained models for NLP tasks.
!pip install transformers
# Install the 'torch' library (PyTorch), a deep learning framework used by transformers.
!pip install torch
# Install the 'nltk' library (Natural Language Toolkit) for text processing and analysis.
!pip install nltk
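Before loading the model, it can help to confirm that the three libraries are actually importable in the current runtime. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of the installed packages.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

# The packages installed in Step 1.
required = ["transformers", "torch", "nltk"]
not_found = missing_packages(required)
if not_found:
    print("Still missing:", ", ".join(not_found))
else:
    print("All required packages are available.")
```

If anything is reported as missing, re-running the install cell (or restarting the runtime) is usually enough.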



Step 2: Load the Model and Define the Correction Function

In [None]:
# Import necessary components from the 'transformers' library
from transformers import T5ForConditionalGeneration, T5Tokenizer
# Import the 'nltk' library for natural language processing tasks
import nltk
# Download the 'punkt' dataset for sentence tokenization
nltk.download('punkt')
# Download the 'punkt_tab' data, which newer NLTK releases use for sentence tokenization
nltk.download('punkt_tab')
# Import the 'sent_tokenize' function for splitting text into sentences
from nltk.tokenize import sent_tokenize

# Define the name of the pre-trained model to be used
model_name = 'prithivida/grammar_error_correcter_v1'
# Initialize the tokenizer for the specified model
tokenizer = T5Tokenizer.from_pretrained(model_name)
# Load the pre-trained model for grammar correction
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Define a function to correct grammar in a single piece of text
def correct_grammar(text):
    # Prepend "gec: " to the input text, indicating a grammar error correction task
    input_text = "gec: " + text
    # Encode the input text using the tokenizer
    input_ids = tokenizer.encode(input_text, return_tensors='pt', max_length=512, truncation=True)

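    # Generate corrected text; set the output limit explicitly, because the
    # default generation length can cut long sentences short
    outputs = model.generate(input_ids, max_length=512)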
    # Decode the model's output to obtain the corrected text
    corrected_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Return the corrected text
    return corrected_text

# Define a function to correct grammar in a larger text by processing it sentence-by-sentence
def correct_text_by_sentences(text):
    # Split the input text into sentences
    sentences = sent_tokenize(text)
    # Apply grammar correction to each sentence
    corrected_sentences = [correct_grammar(sentence) for sentence in sentences]
    # Join the corrected sentences back into a single string
    return " ".join(corrected_sentences)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
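Joining the corrected sentences with a single space discards any paragraph breaks in the input. The sketch below shows one possible way to keep blank-line paragraph breaks while still correcting sentence by sentence. It is deliberately model-free so it runs on its own: `correct_paragraphs`, `simple_sentence_split`, and `stub_correct` are hypothetical names introduced here, the regex splitter is only a rough stand-in for `sent_tokenize`, and the stub would be replaced by `correct_grammar` in practice.

```python
import re

def simple_sentence_split(text):
    """Naive stand-in for nltk's sent_tokenize: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def correct_paragraphs(text, correct_fn, split_fn=simple_sentence_split):
    """Correct text sentence by sentence while keeping blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    corrected = []
    for paragraph in paragraphs:
        sentences = split_fn(paragraph)
        corrected.append(" ".join(correct_fn(sentence) for sentence in sentences))
    return "\n\n".join(corrected)

# Stub corrector used only to demonstrate the flow (an assumption, not the model).
def stub_correct(sentence):
    return sentence.replace("We seen", "We saw")

demo = "We seen ducks. They was loud.\n\nIt were cold."
result = correct_paragraphs(demo, stub_correct)
print(result)
```

In the notebook itself, the call would look like `correct_paragraphs(complex_text, correct_grammar, split_fn=sent_tokenize)`.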


Step 3: Upload the Input Text File

In [None]:
# 'files' is Colab-specific and provides upload/download helpers.
from google.colab import files

# Open a file picker; the result maps each uploaded filename to its contents.
uploaded = files.upload()

# Take the first (here, the only) uploaded filename.
input_filename = next(iter(uploaded))

# Read the whole file; the 'with' block closes it automatically.
with open(input_filename, 'r', encoding='utf-8') as file:
    complex_text = file.read()

# Show the text that will be corrected.
print("Input text:")
print(complex_text)

Saving input_text.txt to input_text.txt
Input text:
Despite of the heavy rain, we went to the park. We seen a lots of ducks swiming in the pond. My friend, she bringed a loaf of bread to feed them. The ducks was very excited, they quacking loudly and chasing after the bread crumbs. We had a really fun time, even though it were cold and wet.
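The dictionary returned by `files.upload()` maps each filename to the file's raw bytes, so the text can also be decoded directly from it, without reopening the file from disk. The sketch below illustrates that idea with a plain dictionary standing in for the upload result. The helper name `first_uploaded_text` is hypothetical, and UTF-8 is an assumed encoding for the input file.

```python
def first_uploaded_text(uploaded, encoding="utf-8"):
    """Return (filename, decoded text) for the first entry of an upload mapping."""
    if not uploaded:
        # Without this check, next(iter(uploaded)) would raise StopIteration.
        raise ValueError("No file was uploaded.")
    name = next(iter(uploaded))
    return name, uploaded[name].decode(encoding)

# A plain dict stands in for the object returned by files.upload().
fake_upload = {"input_text.txt": "We seen a lots of ducks.".encode("utf-8")}
name, text = first_uploaded_text(fake_upload)
print(name)
print(text)
```

The explicit check also gives a clearer error message when the upload dialog is cancelled and the mapping comes back empty.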


Step 4: Correct the Text and Save It to a File


In [None]:
# Correct the text sentence by sentence
corrected_text = correct_text_by_sentences(complex_text)

# Save the corrected text to an output file
output_filename = 'corrected_text.txt'
with open(output_filename, 'w', encoding='utf-8') as file:
    file.write(corrected_text)

# Report where the file was saved and show the result
print("Corrected text written to", output_filename)
print("Corrected text:")
print(corrected_text)



Corrected text written to corrected_text.txt
Corrected text:
Despite the heavy rain, we went to the park. We saw lots of ducks swimming in the pond. My friend, she brought a loaf of bread to feed them. The ducks were very excited, they quack loudly and chasing after the bread We had a really fun time, even though it was cold and wet.
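To see exactly what the model changed, the input and output can be compared word by word. The sketch below uses the standard library's `difflib` on one sentence pair taken from the texts above. The helper name `word_changes` is hypothetical, and splitting on whitespace is a simplification that keeps punctuation attached to words.

```python
import difflib

def word_changes(original, corrected):
    """List (before, after) word spans that differ between two texts."""
    before, after = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only replaced, deleted, or inserted spans.
        if tag != "equal":
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes

original = "We seen a lots of ducks swiming in the pond."
corrected = "We saw lots of ducks swimming in the pond."
changes = word_changes(original, corrected)
for old, new in changes:
    print(f"{old!r} -> {new!r}")
```

Running the same helper on `complex_text` and `corrected_text` would also surface places where text was dropped rather than corrected, such as a sentence that ends earlier than its original.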


Step 5: Download the Corrected Text File

In [None]:
files.download(output_filename)
