<a href="https://colab.research.google.com/github/k-dinakaran/automation-of-wordpress-post-publication-using-AI-tools/blob/main/Develop_a_Model_for_Summarizing_Long_Form_Content_into_Highlights.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Import necessary libraries
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the pre-trained T5 model and tokenizer from Hugging Face
model_name = "t5-small"  # 't5-base' and 't5-large' are larger checkpoints that usually summarize better but need more memory and time
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

def summarize_content(text, max_length=150, min_length=30):
    """
    Summarizes long-form content into key highlights using the T5 model.

    Parameters:
        text (str): The long-form content to summarize.
        max_length (int): The maximum length of the generated summary, in tokens.
        min_length (int): The minimum length of the generated summary, in tokens.

    Returns:
        formatted_summary (str): The summarized key highlights with each point on a new line.
    """

    # Preprocess the input text for the model (prefix with "summarize: ")
    input_text = "summarize: " + text
    inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)

    # Generate summary
    summary_ids = model.generate(
        inputs,
        max_length=max_length,
        min_length=min_length,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )

    # Decode the summary
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    # Put each sentence on its own line by inserting a newline after every period
    formatted_summary = summary.replace('.', '.\n')

    return formatted_summary

# Ask whether the long-form content will be typed in or read from a file
choice = input("Do you want to enter the content manually or from a file? (Type 'manual' or 'file'): ").strip().lower()

if choice == "manual":
    # Get the long-form content input from the user manually
    long_form_content = input("\nPlease enter the long-form content you want to summarize:\n")
elif choice == "file":
    # Get the file path and read the long-form content
    file_path = input("\nPlease enter the file path containing the long-form content:\n")
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            long_form_content = file.read()
    except FileNotFoundError:
        print("File not found! Please check the file path and try again.")
        raise SystemExit(1)  # stop here without relying on the interactive-only exit() helper
else:
    print("Invalid option. Please restart and type 'manual' or 'file'.")
    raise SystemExit(1)  # stop here without relying on the interactive-only exit() helper

# Generate summary
summary_output = summarize_content(long_form_content)

# Print the formatted summary
print("\nSummary:\n", summary_output)
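
`summarize_content` passes `max_length=512, truncation=True` to the tokenizer, so anything past the first 512 tokens of the input is dropped before the model sees it. One common workaround for genuinely long documents is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below is an assumption, not part of the original notebook: `chunk_text`, `summarize_long`, and the 350-word budget are hypothetical names and values, and the word count is only a rough stand-in for the token count.

```python
import re
from typing import Callable, List


def chunk_text(text: str, max_words: int = 350) -> List[str]:
    """Split text into chunks of whole sentences, each at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip the empty piece produced by empty input
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 350) -> str:
    """Summarize each chunk with the given callable and join the partial summaries."""
    return "\n".join(summarize(chunk).strip() for chunk in chunk_text(text, max_words))
```

In this notebook the call would look like `summarize_long(long_form_content, summarize_content)`. A 350-word chunk can still exceed 512 tokens for some texts, in which case the tokenizer's truncation applies to that chunk as before.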




You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Do you want to enter the content manually or from a file? (Type 'manual' or 'file'): manual

Please enter the long-form content you want to summarize:
Remote work has become increasingly popular over the past few years. It offers flexibility, allowing employees to work from anywhere, and can increase productivity by eliminating commute times. However, it also comes with challenges, such as managing work-life balance and maintaining communication with colleagues. Companies have started adopting hybrid models to combine the benefits of remote work and in-office collaboration. The future of work is likely to involve more flexible arrangements as technology continues to improve.

Summary:
 remote work has become increasingly popular over the past few years.
 it offers flexibility, allowing employees to work from anywhere.
 but it also comes with challenges such as managing work-life balance and maintaining communication with colleagues.
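
The output above shows two side effects of the current formatting step: every line starts with a stray space (one from `print`'s separator, the others from the space that followed each period), and the sentences in this run start in lowercase. A possible post-processing step is sketched below, assuming one highlight per sentence is the goal. `format_highlights` and the `- ` bullet prefix are hypothetical and not part of the notebook.

```python
import re
from typing import List


def format_highlights(summary: str, bullet: str = "- ") -> str:
    """Turn a summary into one bulleted highlight per sentence."""
    # Split after ., ! or ? only when followed by whitespace,
    # so decimals such as 3.5 stay intact.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    lines: List[str] = []
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:
            # Capitalize only the first character; leave the rest unchanged.
            lines.append(bullet + sentence[0].upper() + sentence[1:])
    return "\n".join(lines)
```

Because the split pattern accepts any whitespace after the period, this also works on the string that `summarize_content` already returns, for example `print(format_highlights(summary_output))`.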

