<a href="https://colab.research.google.com/github/narayanadhavala/ML-Projects/blob/main/News_Article_Summarizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# News Article Summarizer

In [7]:
!pip install -qU langchain-openai langchain-community
!pip install -q newspaper3k python-dotenv lxml[html_clean]

## Import the OpenAI API Key
* The OpenAI API key is stored in Google Colab's Secrets store. Alternatively, store the key in a `.env` file and load it with the `python-dotenv` package.

In [8]:
from google.colab import userdata

# Read the key saved under the name 'OpenAI_API' in Colab's Secrets store
OPENAI_API_KEY = userdata.get('OpenAI_API')
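
For runs outside Colab, the `.env` route mentioned above can be sketched as follows. This is a minimal sketch rather than the notebook's own method: the helper name `load_openai_key` and the environment variable name `OPENAI_API_KEY` are assumptions, and the function falls back to a plain environment lookup when neither `google.colab` nor `python-dotenv` can be imported.

```python
import os


def load_openai_key(secret_name="OpenAI_API", env_name="OPENAI_API_KEY"):
    """Return the API key from Colab Secrets if available, else from the environment.

    Both names are assumptions: `secret_name` matches the Colab secret used in
    this notebook, `env_name` is a hypothetical variable name for the .env file.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        return userdata.get(secret_name)
    except ImportError:
        pass

    try:
        # python-dotenv copies KEY=value pairs from a local .env file into
        # os.environ; variables that are already set are left unchanged.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to whatever is already in the environment

    key = os.getenv(env_name)
    if not key:
        raise RuntimeError(f"{env_name} is not set; add it to .env or the environment")
    return key
```

With a `.env` file containing a line such as `OPENAI_API_KEY=...`, calling `load_openai_key()` would return that value; keep the `.env` file out of version control.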

In [9]:
import requests
from newspaper import Article


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://link.springer.com/chapter/10.1007/978-3-031-98668-0_7"

session = requests.Session()

try:
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        article = Article(article_url)
        # Parse the HTML already fetched above, so the custom headers apply
        # and the page is not downloaded a second time
        article.download(input_html=response.text)
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")

    else:
        print(f"Failed to fetch article at {article_url} (status code {response.status_code})")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Title: Verifying PETSc Vector Components Using CIVL
Text: Modern computational science applications rely on numerical libraries for portable and efficient implementations of numerical algorithms and data structures. One widely used example is the Portable, Extensible Toolkit for Scientific Computation (PETSc) [1,2,3]. PETSc provides a suite of parallel linear and nonlinear solvers, ordinary differential equation (ODE) integrators, and numerical optimization solvers, along with supporting mathematical objects including vectors, matrices, structured grids, and graphs, with associated abstractions and data structures for managing parallel domain decomposition and communication.

PETSc and its development team have received numerous accolades, including the 2015 SIAM/ACM Prize in Computational Science and Engineering and inclusion as one of the Top Ten Advances in Computational Science Accomplishments of the U.S. Department of Energy in 2008. An incomplete list of application domains using

In [10]:
from langchain_core.messages import HumanMessage

# reuse the title and text extracted in the scraping step
article_title = article.title
article_text = article.text

# prepare template for prompt
template = """You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {article_title}

{article_text}
==================

Write a summary of the previous article.
"""

prompt = template.format(article_title=article_title, article_text=article_text)

messages = [HumanMessage(content=prompt)]
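
The prompt above puts the role instruction and the article into a single human message. One alternative, sketched here under assumptions, is to move the instruction into a system message. LangChain chat models accept `(role, content)` tuples as well as message objects, so the sketch needs no imports; `build_messages` is a hypothetical helper, not part of the notebook.

```python
def build_messages(article_title, article_text):
    """Return a [system, human] message list for the summarization request."""
    # Role instruction goes into the system message.
    system = "You are a very good assistant that summarizes online articles."
    # Article and task go into the human message, same layout as the template.
    human = (
        "Here's the article you want to summarize.\n\n"
        "==================\n"
        f"Title: {article_title}\n\n"
        f"{article_text}\n"
        "==================\n\n"
        "Write a summary of the previous article.\n"
    )
    return [("system", system), ("human", human)]


# Placeholder inputs; in the notebook these would be article.title and article.text
messages_split = build_messages("Example title", "Example body text.")
for role, content in messages_split:
    print(f"[{role}] {content.splitlines()[0]}")
```

Passing `build_messages(article.title, article.text)` to the chat model's `invoke` method should work in place of the single-message list, though the wording of the summary may differ.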

In [12]:
from langchain_openai import ChatOpenAI

# load the model
chat = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=OPENAI_API_KEY)

In [14]:
summary = chat.invoke(messages)
print(summary)

content="The article discusses the verification of vector components in the Portable, Extensible Toolkit for Scientific Computation (PETSc) using the CIVL model checker. PETSc is a widely used numerical library that provides various solvers and mathematical objects, essential for numerous computational science applications across diverse fields such as acoustics, aerodynamics, and cancer treatment. Given its critical role, ensuring the correctness of PETSc is paramount, leading to extensive testing through GitLab pipelines and a comprehensive test suite.\n\nHowever, traditional testing may not catch subtle defects, prompting a collaboration with PETSc developers to verify its functional correctness through symbolic execution and model checking, starting with its vector module. The article introduces the CIVL model checker, which enhances C and Fortran programs with verification capabilities, allowing for the exploration of all possible program executions. CIVL's ability to handle C, MP
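
`print(summary)` shows the fields of the returned message object, which is why the output above starts with `content=` and contains literal `\n` sequences. The summary text itself is in the message's `.content` attribute. Below is a minimal sketch of a display helper; `format_summary` and `width` are hypothetical names, and a `SimpleNamespace` stands in for the model response so the block runs without an API call.

```python
import textwrap
from types import SimpleNamespace


def format_summary(message, width=80):
    """Wrap each line of message.content to `width` columns, keeping blank lines."""
    wrapped = []
    for line in message.content.split("\n"):
        # textwrap.fill returns "" for an empty line, so paragraph breaks survive
        wrapped.append(textwrap.fill(line, width=width))
    return "\n".join(wrapped)


# Stand-in for the object returned by chat.invoke(messages)
fake_summary = SimpleNamespace(
    content="First paragraph of the summary.\n\nSecond paragraph."
)
print(format_summary(fake_summary, width=40))
```

In the notebook, `print(format_summary(summary))` or simply `print(summary.content)` would show the text with real line breaks.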

### To get a bulleted list, modify the prompt and run the model again.

In [17]:
# prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format.
"""

# format prompt
prompt_2 = template.format(article_title=article_title, article_text=article_text)

# generate summary
summary = chat.invoke([HumanMessage(content=prompt_2)])
print(summary)

content="- **Title**: Verifying PETSc Vector Components Using CIVL\n- **Overview of PETSc**:\n  - A widely used numerical library for scientific computation.\n  - Provides parallel solvers, ODE integrators, and optimization solvers.\n  - Supports mathematical objects like vectors, matrices, and graphs.\n  - Recognized with awards such as the 2015 SIAM/ACM Prize and inclusion in Top Ten Advances by the U.S. Department of Energy.\n  - Used in diverse fields: acoustics, aerodynamics, cancer treatment, economics, etc.\n  - User manual and primary paper have been cited thousands of times.\n\n- **Importance of Testing**:\n  - Extensive testing is crucial due to the potential impact of defects.\n  - PETSc employs GitLab pipelines and a test suite with over 13,000 tests.\n  - Subtle defects, like data races, may still go undetected.\n\n- **Verification Efforts**:\n  - Collaboration with PETSc developers to verify functional correctness.\n  - Utilizes symbolic execution and model checking, focu
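
The two prompts differ only in the role line and the closing instruction, so both cells could share one function. The sketch below is a refactor under stated assumptions, not the notebook's code: `summarize`, `STYLES`, `TEMPLATE`, and `EchoChat` are hypothetical names, and `EchoChat` is a stand-in so the block runs without an API key. It relies only on the model object exposing `invoke` and on the result exposing `.content`, as `ChatOpenAI` does.

```python
from types import SimpleNamespace

# (role line, closing instruction) for each output style, taken from the two prompts
STYLES = {
    "paragraph": (
        "You are a very good assistant that summarizes online articles.",
        "Write a summary of the previous article.",
    ),
    "bullets": (
        "You are an advanced AI assistant that summarizes online articles into bulleted lists.",
        "Now, provide a summarized version of the article in a bulleted list format.",
    ),
}

TEMPLATE = """{role}

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

{instruction}
"""


def summarize(chat, article_title, article_text, style="paragraph"):
    """Build the prompt for `style`, call the model, and return the summary text."""
    role, instruction = STYLES[style]
    prompt = TEMPLATE.format(
        role=role,
        instruction=instruction,
        article_title=article_title,
        article_text=article_text,
    )
    # A plain string is treated as a single human message by LangChain chat models
    return chat.invoke(prompt).content


class EchoChat:
    """Hypothetical stand-in for a chat model: echoes the prompt's last line."""

    def invoke(self, prompt):
        return SimpleNamespace(content=prompt.strip().splitlines()[-1])


print(summarize(EchoChat(), "Example title", "Example body.", style="bullets"))
```

In the notebook, `summarize(chat, article_title, article_text, style="bullets")` would replace the second template cell, and adding another output style becomes a one-entry change to `STYLES`.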