
**Articles Used**

In [None]:
#article 1
#https://www.imf.org/external/pubs/ft/fandd/2021/03/global-cyber-threat-to-financial-systems-maurer.htm

#article 2
#https://news.vt.edu/articles/2024/08/it-cybersecurity-protections-enhanced-2-factor.html

#article 3
#https://www.ifac.org/knowledge-gateway/discussion/cybersecurity-critical-all-organizations-large-and-small

#article 4
#https://news.vt.edu/articles/2024/10/cci-cyberarts-2024-exhibit.html

#article 5
#https://www.propublica.org/article/cybersecurity-expert-finds-another-flaw-in-georgia-voter-portal

**Install libraries**

In [None]:
!pip install requests beautifulsoup4 transformers newspaper3k nltk
#avg runtime 14 seconds

Collecting newspaper3k
  Downloading newspaper3k-0.2.8-py3-none-any.whl.metadata (11 kB)
Collecting cssselect>=0.9.2 (from newspaper3k)
  Downloading cssselect-1.2.0-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting feedparser>=5.2.1 (from newspaper3k)
  Downloading feedparser-6.0.11-py3-none-any.whl.metadata (2.4 kB)
Collecting tldextract>=2.0.1 (from newspaper3k)
  Downloading tldextract-5.1.2-py3-none-any.whl.metadata (11 kB)
Collecting feedfinder2>=0.0.4 (from newspaper3k)
  Downloading feedfinder2-0.0.4.tar.gz (3.3 kB)
  Preparing metadata (setup.py) ... done
Collecting jieba3k>=0.35.1 (from newspaper3k)
  Downloading jieba3k-0.35.1.zip (7.4 MB)
  Preparing metadata (setup.py) ... done
Collecting tinysegmenter==0.3 (from newspaper3k)
  Downloading tinysegmenter-0.3.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Co

**Beautiful Soup Method**

In [None]:
# Import necessary libraries
# requests: To download web content from the specified URL
# BeautifulSoup: For parsing and extracting information from HTML content
# transformers: To use a pre-trained model (BART) for text summarization
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Extract and clean article text from a given URL
def extract_article_text(url):
    try:
        # Send a GET request (with a timeout so a stalled server cannot hang the cell)
        # and raise an error for any bad response codes (e.g., 404)
        response = requests.get(url, timeout=30)
        response.raise_for_status()

        # Parse the HTML content of the article using BeautifulSoup
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract the text content from all paragraph tags in the HTML document
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean up the extracted text by removing any extra spaces
        article_text = ' '.join(article_text.split())  # Normalize spaces
        return article_text.strip()  # Return the cleaned article text
    except Exception as e:
        # Handle errors that occur during the text extraction process
        return f"Failed to extract article text: {str(e)}"

# Summarize the text using a pre-trained transformer model
def summarize_text(text):
    # Initialize the pre-trained summarization model (BART Large CNN model)
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Split the input text into chunks of up to 800 characters to stay within the model's input limit
    # (1024 tokens for BART); slicing by character count can cut a word or sentence in half
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the resulting summaries into one
    summaries = []
    for chunk in text_chunks:
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join the chunk summaries, then split on periods into rough sentences
    # (note: this also splits abbreviations such as "Oct." or "p.m.")
    final_summary = ' '.join(summaries)
    sentences = final_summary.split('.')
    sentences = [s.strip() for s in sentences if s.strip()]  # Drop empty or whitespace-only fragments

    # Return a concise summary, limited to the first 5 sentences
    final_summary = '.\n'.join(sentences[:5])
    # The join only puts periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary
    return final_summary

# Main execution flow
if __name__ == "__main__":
    # Prompt the user to enter the article URL
    url = input("Enter Article URL: ")

    # Extract the article text from the specified URL
    article_text = extract_article_text(url)

    # If the text extraction was successful, proceed to summarization
    if not article_text.startswith("Failed"):
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)  # Display the final summary
    else:
        # Print the error message if extraction failed
        print(article_text)
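
The fixed 800-character slices in `summarize_text` can end in the middle of a word or sentence, which gives the model a fragment to summarize. A minimal sketch of one alternative is below: it groups whole sentences into chunks that stay under the same character budget. The function name `chunk_by_sentences` and the simple regex sentence split are assumptions made for illustration, not part of the original notebook.

```python
import re

def chunk_by_sentences(text, max_chars=800):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Rough sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is hard-split as a fallback.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

To try it, the list comprehension that builds `text_chunks` could be swapped for `text_chunks = chunk_by_sentences(text)`; the rest of the loop would stay the same.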


**Newspaper3k Method**

In [None]:
# Import necessary libraries
# nltk: For natural language processing tasks like sentence tokenization
# Article: From the newspaper library, to easily handle web articles
import nltk
from newspaper import Article

# Download the 'punkt' resource from nltk, used for sentence tokenization in NLP tasks
nltk.download('punkt')

# Function to extract article information from a given URL
def extract_article_info(url):
    try:
        # Create an Article object with the provided URL
        article = Article(url)

        # Download the article's HTML content
        article.download()

        # Parse the downloaded content to extract the article's text, title, authors, etc.
        article.parse()

        # Perform NLP tasks such as keyword extraction and summarization
        article.nlp()

        # Display key information about the article
        print(f'Title: {article.title}')  # Print the title of the article
        print(f'Authors: {article.authors}')  # Print the list of authors
        print(f'Publication Date: {article.publish_date}')  # Print the publication date
        print(f'Summary: {article.summary}')  # Print the summarized text of the article

    except Exception as e:
        # Handle any errors that occur during article extraction and display the error message
        print(f"An error occurred: {str(e)}")

# Main block of code to execute the program
if __name__ == "__main__":
    # Prompt the user to input the URL of the article they wish to extract
    url = input("Enter Article URL: ")

    # Call the function to extract and display the article's information
    extract_article_info(url)


**Other attempts**

In [None]:
#1st article soup
#this code adds one extra line that filters out sentences mentioning 'IMF Press Center', which fixes the first sentence of the summary for article 1
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Function to extract and clean article text from a URL
def extract_article_text(url):
    try:
        response = requests.get(url, timeout=30)  # Timeout so a stalled server cannot hang the cell
        response.raise_for_status()  # Raise an error for bad responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract text from paragraphs and join them
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean the text by removing extra spaces
        article_text = ' '.join(article_text.split())
        return article_text.strip()  # Return the cleaned text
    except Exception as e:
        return f"Failed to extract article text: {str(e)}"

# Function to summarize text using a pre-trained transformer model
def summarize_text(text):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Break the text into chunks of a maximum of 800 characters for summarization
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the results
    summaries = []
    for chunk in text_chunks:
        # Summarize the chunk
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join summaries into one final summary
    final_summary = ' '.join(summaries)

    # Refine the summary to get a coherent output with complete sentences
    sentences = final_summary.split('.')

    # Filter out any sentences that mention 'IMF Press Center', strip extra spaces, and drop empty fragments
    sentences = [s.strip() for s in sentences if 'IMF Press Center' not in s and s.strip()]  # This is the extra line

    # Limit to 5 sentences for the final summary
    final_summary = '.\n'.join(sentences[:5])  # Use \n for new line after each sentence

    # The join only adds periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary

    return final_summary

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")
    article_text = extract_article_text(url)  # Extract the article text from the URL
    if not article_text.startswith("Failed"):  # If extraction is successful
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)
    else:
        print(article_text)  # Print error message

Enter Article URL: https://www.imf.org/external/pubs/ft/fandd/2021/03/global-cyber-threat-to-financial-systems-maurer.htm


Your max_length is set to 80, but your input_length is only 52. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=26)


Summary:
Cyber threats to the financial system are growing, and the global community must cooperate to protect it.
In February 2016, hackers targeted the central bank of Bangladesh and exploited vulnerabilities in SWIFT.
The world’s governments and companies continue to struggle to contain the threat.
It remains unclear who is responsible for protecting the system.
The potential economic costs of such events can be immense and the damage to public trust and confidence significant.
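
The `max_length` warning in the output above most likely comes from the last chunk: every slice is 800 characters long except the final one, which holds whatever is left over and can be shorter than the requested summary. A small sketch of one way to avoid that is to fold a short tail into the preceding chunk. The helper name `merge_short_tail` and the 200-character threshold are assumptions chosen for illustration.

```python
def merge_short_tail(chunks, min_chars=200):
    """Fold a very short final chunk into the one before it."""
    if len(chunks) >= 2 and len(chunks[-1]) < min_chars:
        # The slices are contiguous, so plain concatenation restores the original text.
        return chunks[:-2] + [chunks[-2] + chunks[-1]]
    return list(chunks)

# Demonstration with placeholder text sliced the same way as in summarize_text
text = "x" * 1650
chunks = [text[i:i + 800] for i in range(0, len(text), 800)]
print([len(c) for c in chunks])                    # [800, 800, 50]
print([len(c) for c in merge_short_tail(chunks)])  # [800, 850]
```

The merged chunk can be up to 999 characters long with these defaults, which for typical English text should still be well under BART's 1024-token input limit, though that is not guaranteed for every input.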


In [None]:
#1st article news
# Import the necessary libraries
import nltk
from newspaper import Article

# Download the 'punkt' resource if not already downloaded
nltk.download('punkt')

# Function to extract article information
def extract_article_info(url):
    try:
        article = Article(url)  # Create an Article object with the URL
        article.download()      # Download the article
        article.parse()         # Parse the article
        article.nlp()           # Perform NLP on the article

        # Print article information
        print(f'Title: {article.title}')
        print(f'Authors: {article.authors}')
        print(f'Publication Date: {article.publish_date}')
        print(f'Summary: {article.summary}')
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")  # Prompt for the article URL
    extract_article_info(url)  # Extract and display the article information

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Enter Article URL: https://www.imf.org/external/pubs/ft/fandd/2021/03/global-cyber-threat-to-financial-systems-maurer.htm
Title: The Global Cyber Threat to Financial Systems – IMF F&D
Authors: []
Publication Date: None
Summary: First, the global financial system is going through an unprecedented digital transformation, which is being accelerated by the COVID-19 pandemic.
Second, malicious actors are taking advantage of this digital transformation and pose a growing threat to the global financial system, financial stability, and confidence in the integrity of the system.
Although they do advance financial inclusion, digital financial services also offer a target-rich environment for hackers.
Better protecting the global financial system is primarily an organizational challenge.
This responsibility gap and continued uncertainty about roles and mandates to protect the global financial system fuel risks.


In [None]:
#2nd article
#beautiful soup

import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Function to extract and clean article text from a URL
def extract_article_text(url):
    try:
        response = requests.get(url, timeout=30)  # Timeout so a stalled server cannot hang the cell
        response.raise_for_status()  # Raise an error for bad responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract text from paragraphs and join them
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean the text by removing extra spaces
        article_text = ' '.join(article_text.split())
        return article_text.strip()  # Return the cleaned text
    except Exception as e:
        return f"Failed to extract article text: {str(e)}"

# Function to summarize text using a pre-trained transformer model
def summarize_text(text):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Break the text into chunks of a maximum of 800 characters for summarization
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the results
    summaries = []
    for chunk in text_chunks:
        # Summarize the chunk
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join summaries into one final summary
    final_summary = ' '.join(summaries)

    # Refine the summary to get a coherent output with complete sentences
    sentences = final_summary.split('.')

    # Strip extra spaces from each sentence and drop empty fragments
    sentences = [s.strip() for s in sentences if s.strip()]

    # Limit to 5 sentences for the final summary
    final_summary = '.\n'.join(sentences[:5])  # Use \n for new line after each sentence

    # The join only adds periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary

    return final_summary

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")
    article_text = extract_article_text(url)  # Extract the article text from the URL
    if not article_text.startswith("Failed"):  # If extraction is successful
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)
    else:
        print(article_text)  # Print error message

Enter Article URL: https://news.vt.edu/articles/2024/08/it-cybersecurity-protections-enhanced-2-factor.html


Your max_length is set to 80, but your input_length is only 11. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=5)


Summary:
A wave of phishing emails targeting Virginia Tech employees attempted to diverting direct deposits, including pay, away from their legitimate destination.
Fortunately, newly deployed cybersecurity protections within the Division of Information Technology detected the unusual login activity and put a stop to the hack.
Hackers are getting better at what they do, and they are studying our business processes to find vulnerabilities.
Each member of the university community has a role to play in staying safe online.
"We must continue to find ways to shore up cyber defenses, to include a more informed and security-aware community," he says.


In [None]:
#2nd newspaper

# Import the necessary libraries
import nltk
from newspaper import Article

# Download the 'punkt' resource if not already downloaded
nltk.download('punkt')

# Function to extract article information
def extract_article_info(url):
    try:
        article = Article(url)  # Create an Article object with the URL
        article.download()      # Download the article
        article.parse()         # Parse the article
        article.nlp()           # Perform NLP on the article

        # Print article information
        print(f'Title: {article.title}')
        print(f'Authors: {article.authors}')
        print(f'Publication Date: {article.publish_date}')
        print(f'Summary: {article.summary}')
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")  # Prompt for the article URL
    extract_article_info(url)  # Extract and display the article information

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Enter Article URL: https://news.vt.edu/articles/2024/08/it-cybersecurity-protections-enhanced-2-factor.html
An error occurred: Article `download()` failed with 403 Client Error: Forbidden for url: https://news.vt.edu/articles/2024/08/it-cybersecurity-protections-enhanced-2-factor.html on URL https://news.vt.edu/articles/2024/08/it-cybersecurity-protections-enhanced-2-factor.html
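
The Beautiful Soup cell fetched this same URL without trouble, so the 403 above appears to be the server rejecting newspaper3k's own HTTP request rather than all scripted access. One possible workaround, sketched below under that assumption, is to download the HTML separately with an explicit browser-like `User-Agent` and hand it to newspaper3k through `download(input_html=...)`. The helper names and the `BROWSER_UA` string are hypothetical, the fetch uses only the standard library, and it is untested against this site; `article.nlp()` still assumes the `punkt` resource has been downloaded as in the cells above.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value could be used.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

def build_request(url, user_agent=BROWSER_UA):
    """Create a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=30):
    """Download the page and decode it using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

def extract_article_from_html(url):
    """Let newspaper3k parse HTML that was fetched separately."""
    # Imported here so the helpers above work even without newspaper3k installed.
    from newspaper import Article
    article = Article(url)
    article.download(input_html=fetch_html(url))  # Skips newspaper's own HTTP request
    article.parse()
    article.nlp()
    return article
```

Calling `extract_article_from_html(url)` returns the `Article` object, so `article.title` and `article.summary` can be printed the same way as in `extract_article_info`.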


In [None]:
#3rd article
#beautiful soup

import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Function to extract and clean article text from a URL
def extract_article_text(url):
    try:
        response = requests.get(url, timeout=30)  # Timeout so a stalled server cannot hang the cell
        response.raise_for_status()  # Raise an error for bad responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract text from paragraphs and join them
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean the text by removing extra spaces
        article_text = ' '.join(article_text.split())
        return article_text.strip()  # Return the cleaned text
    except Exception as e:
        return f"Failed to extract article text: {str(e)}"

# Function to summarize text using a pre-trained transformer model
def summarize_text(text):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Break the text into chunks of a maximum of 800 characters for summarization
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the results
    summaries = []
    for chunk in text_chunks:
        # Summarize the chunk
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join summaries into one final summary
    final_summary = ' '.join(summaries)

    # Refine the summary to get a coherent output with complete sentences
    sentences = final_summary.split('.')

    # Strip extra spaces from each sentence and drop empty fragments
    sentences = [s.strip() for s in sentences if s.strip()]

    # Limit to 5 sentences for the final summary
    final_summary = '.\n'.join(sentences[:5])  # Use \n for new line after each sentence

    # The join only adds periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary

    return final_summary

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")
    article_text = extract_article_text(url)  # Extract the article text from the URL
    if not article_text.startswith("Failed"):  # If extraction is successful
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)
    else:
        print(article_text)  # Print error message

Enter Article URL: https://www.ifac.org/knowledge-gateway/discussion/cybersecurity-critical-all-organizations-large-and-small


Your max_length is set to 80, but your input_length is only 36. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=18)


Summary:
Cybercrime is becoming big business and cyber risk a focus of organizations and governments globally.
Monetary and reputational risks are high if organizations don’t have an appropriate cybersecurity plan.
Cyber-attacks have been steadily climbing for four consecutive years.
The manufacturing sector experienced the greatest proportion of cyber-attacks in 2022.
Recent cases have involved thefts of sensitive information.


In [None]:
#3rd newspaper

# Import the necessary libraries
import nltk
from newspaper import Article

# Download the 'punkt' resource if not already downloaded
nltk.download('punkt')

# Function to extract article information
def extract_article_info(url):
    try:
        article = Article(url)  # Create an Article object with the URL
        article.download()      # Download the article
        article.parse()         # Parse the article
        article.nlp()           # Perform NLP on the article

        # Print article information
        print(f'Title: {article.title}')
        print(f'Authors: {article.authors}')
        print(f'Publication Date: {article.publish_date}')
        print(f'Summary: {article.summary}')
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")  # Prompt for the article URL
    extract_article_info(url)  # Extract and display the article information

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Enter Article URL: https://www.ifac.org/knowledge-gateway/discussion/cybersecurity-critical-all-organizations-large-and-small
Title: Cybersecurity Is Critical for all Organizations – Large and Small
Authors: []
Publication Date: None
Summary: Cybersecurity is making sure your organization's data is safe from attacks from both internal and external bad actors.
Once infected, the organization’s data continues to be inaccessible as the encrypts the data using the attackers encryption key.
Cybersecurity GovernanceA cybersecurity governance and risk management program should be established which is appropriate for the size of the organization.
Cybersecurity risk needs to be considered as a significant business risk by the owners and directors.
Reporting of any possible breach of security, unauthorized access, or disclosure of the organizations data.


In [None]:
#4th article
#beautiful soup

import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Function to extract and clean article text from a URL
def extract_article_text(url):
    try:
        response = requests.get(url, timeout=30)  # Timeout so a stalled server cannot hang the cell
        response.raise_for_status()  # Raise an error for bad responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract text from paragraphs and join them
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean the text by removing extra spaces
        article_text = ' '.join(article_text.split())
        return article_text.strip()  # Return the cleaned text
    except Exception as e:
        return f"Failed to extract article text: {str(e)}"

# Function to summarize text using a pre-trained transformer model
def summarize_text(text):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Break the text into chunks of a maximum of 800 characters for summarization
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the results
    summaries = []
    for chunk in text_chunks:
        # Summarize the chunk
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join summaries into one final summary
    final_summary = ' '.join(summaries)

    # Refine the summary to get a coherent output with complete sentences
    sentences = final_summary.split('.')

    # Strip extra spaces from each sentence and drop empty fragments
    sentences = [s.strip() for s in sentences if s.strip()]

    # Limit to 5 sentences for the final summary
    final_summary = '.\n'.join(sentences[:5])  # Use \n for new line after each sentence

    # The join only adds periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary

    return final_summary

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")
    article_text = extract_article_text(url)  # Extract the article text from the URL
    if not article_text.startswith("Failed"):  # If extraction is successful
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)
    else:
        print(article_text)  # Print error message

Enter Article URL: https://news.vt.edu/articles/2024/10/cci-cyberarts-2024-exhibit.html


Your max_length is set to 80, but your input_length is only 16. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=8)


Summary:
CyberArts 2024 opens at the Torpedo Factory Art Center in Alexandria.
The opening reception will be held on Oct.
18 from 6-8 p.
m.
Registration is required.
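
The summary above is broken into fragments because `final_summary.split('.')` treats every period as a sentence end, including the ones in "Oct." and "p.m.". A minimal sketch of a more careful split follows: it breaks only where sentence punctuation is followed by whitespace and then a capital letter or opening quote. The function name `split_sentences` is an assumption, and the sample text is reconstructed from the fragments printed above rather than captured from the model.

```python
import re

def split_sentences(text):
    """Split after ., ! or ? only when the next word starts with a capital letter or quote."""
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z"“])', text.strip())
    return [p for p in parts if p]

# Sample reconstructed from the fragmented output above
sample = ("CyberArts 2024 opens at the Torpedo Factory Art Center in Alexandria. "
          "The opening reception will be held on Oct. 18 from 6-8 p.m. "
          "Registration is required.")

# Keep at most five sentences, as the notebook does
for sentence in split_sentences(sample)[:5]:
    print(sentence)
```

This heuristic still splits after abbreviations that precede a capitalized word, such as "Dr. Smith". Since the notebook already downloads `punkt`, `nltk.sent_tokenize` is another option worth trying for harder cases. Each sentence keeps its own period here, so the step that appends a final period would no longer be needed.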


In [None]:
#4th newspaper

# Import the necessary libraries
import nltk
from newspaper import Article

# Download the 'punkt' resource if not already downloaded
nltk.download('punkt')

# Function to extract article information
def extract_article_info(url):
    try:
        article = Article(url)  # Create an Article object with the URL
        article.download()      # Download the article
        article.parse()         # Parse the article
        article.nlp()           # Perform NLP on the article

        # Print article information
        print(f'Title: {article.title}')
        print(f'Authors: {article.authors}')
        print(f'Publication Date: {article.publish_date}')
        print(f'Summary: {article.summary}')
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")  # Prompt for the article URL
    extract_article_info(url)  # Extract and display the article information

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Enter Article URL: https://news.vt.edu/articles/2024/10/cci-cyberarts-2024-exhibit.html
An error occurred: Article `download()` failed with 403 Client Error: Forbidden for url: https://news.vt.edu/articles/2024/10/cci-cyberarts-2024-exhibit.html on URL https://news.vt.edu/articles/2024/10/cci-cyberarts-2024-exhibit.html


In [None]:
#5th article
#beautiful soup

import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Function to extract and clean article text from a URL
def extract_article_text(url):
    try:
        response = requests.get(url, timeout=30)  # Timeout so a stalled server cannot hang the cell
        response.raise_for_status()  # Raise an error for bad responses
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract text from paragraphs and join them
        paragraphs = soup.find_all('p')
        article_text = ' '.join([para.get_text() for para in paragraphs])

        # Clean the text by removing extra spaces
        article_text = ' '.join(article_text.split())
        return article_text.strip()  # Return the cleaned text
    except Exception as e:
        return f"Failed to extract article text: {str(e)}"

# Function to summarize text using a pre-trained transformer model
def summarize_text(text):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Break the text into chunks of a maximum of 800 characters for summarization
    max_chunk_size = 800
    text_chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

    # Summarize each chunk and combine the results
    summaries = []
    for chunk in text_chunks:
        # Summarize the chunk
        summary = summarizer(chunk, max_length=80, min_length=30, do_sample=False)
        summaries.append(summary[0]['summary_text'])

    # Join summaries into one final summary
    final_summary = ' '.join(summaries)

    # Refine the summary to get a coherent output with complete sentences
    sentences = final_summary.split('.')

    # Strip extra spaces from each sentence and drop empty fragments
    sentences = [s.strip() for s in sentences if s.strip()]

    # Limit to 5 sentences for the final summary
    final_summary = '.\n'.join(sentences[:5])  # Use \n for new line after each sentence

    # The join only adds periods between sentences, so append the final one if it is missing
    final_summary = final_summary + '.' if final_summary and not final_summary.endswith('.') else final_summary

    return final_summary

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")
    article_text = extract_article_text(url)  # Extract the article text from the URL
    if not article_text.startswith("Failed"):  # If extraction is successful
        summary = summarize_text(article_text)  # Summarize the extracted text
        print("Summary:")
        print(summary)
    else:
        print(article_text)  # Print error message

Enter Article URL: https://www.propublica.org/article/cybersecurity-expert-finds-another-flaw-in-georgia-voter-portal
Summary:
Until Monday, a new online portal run by the Georgia Secretary of State’s Office contained what experts describe as a serious security vulnerability.
The flaw was brought to the attention of ProPublica and Atlanta News First over the weekend.
The issue was “as bad as any voter cancellation bug could be,” a cybersecurity researcher says.
The Georgia Secretary of State’s Office said it had no records of Parker's attempts to reach out.
The Secretary of State’s Office told the news organizations that it quickly fixed the portal.


In [None]:
#5th newspaper

# Import the necessary libraries
import nltk
from newspaper import Article

# Download the 'punkt' resource if not already downloaded
nltk.download('punkt')

# Function to extract article information
def extract_article_info(url):
    try:
        article = Article(url)  # Create an Article object with the URL
        article.download()      # Download the article
        article.parse()         # Parse the article
        article.nlp()           # Perform NLP on the article

        # Print article information
        print(f'Title: {article.title}')
        print(f'Authors: {article.authors}')
        print(f'Publication Date: {article.publish_date}')
        print(f'Summary: {article.summary}')
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Main execution
if __name__ == "__main__":
    url = input("Enter Article URL: ")  # Prompt for the article URL
    extract_article_info(url)  # Extract and display the article information

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Enter Article URL: https://www.propublica.org/article/cybersecurity-expert-finds-another-flaw-in-georgia-voter-portal
Title: “A Terrible Vulnerability”: Cybersecurity Researcher Discovers Yet Another Flaw in Georgia’s Voter Cancellation Portal
Authors: ['Doug Bock Clark', 'Doug Bock Clark Is A Reporter In Propublica S South Unit. He Investigates Threats To Democracy', 'Abuses Of Power Throughout The Region.']
Publication Date: None
Summary: Parker, who uses they/them pronouns, said that after discovering it, they attempted to contact the Georgia Secretary of State’s Office.
The Secretary of State’s Office told the news organizations that it quickly fixed the portal.
This one would allow any user of the portal to bypass the screen that requires a driver’s license number and submit the cancellation request without it.
A window popped up stating that “Your cancellation request has been successfully submitted” and that county election workers would process the request within a week.
(Parke