# Building a News Articles Summarizer

Steps:
1. Install libraries: `requests`, `newspaper3k`, and `langchain`;
2. Scrape articles: fetch the content of the target news articles from their respective URLs using the `requests` library;
3. Extract titles and text: parse the scraped HTML and extract the titles and text of the articles using `newspaper3k`;
4. Preprocess the text: clean and preprocess the extracted text to make it suitable as ChatGPT input;
5. Generate summaries: utilize ChatGPT to summarize the extracted texts;
6. Output the results: present the summaries along with the original titles, allowing users to grasp the main points of each article quickly.

### 1. Installing and Importing Libraries

In [1]:
# Installing the required libraries
# !pip install langchain==0.0.208 deeplake openai tiktoken
# !pip install -q newspaper3k python-dotenv

In [2]:
import requests
from newspaper import Article
from langchain.schema import HumanMessage
from langchain.chat_models import ChatOpenAI

import os
# keys is assumed to be a local keys.py module that defines OPENAI_API_KEY
from keys import OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

### 2. Scrape Article

In [3]:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}
article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

session = requests.Session()
try:
    # Fetch the article HTML with requests, sending the custom User-Agent header
    response = session.get(article_url, headers=headers, timeout=10)

    # Extract the title and text of the article using the newspaper library
    if response.status_code == 200:
        article = Article(article_url)
        # Reuse the HTML fetched above instead of downloading the page a second time
        article.download(input_html=response.text)
        article.parse()
        
        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
        
    else:
        print(f"Failed to fetch article at {article_url}")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Title: Meta claims its new AI supercomputer will set records
Text: Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it's geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@gadgetry@techhub.social)

Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (NLP) and computer vision models.

RSC is set to be fully built in mid-2022. Meta says that it will be the fastest in the world once complete and the aim is for it to be capable of training models with trillions of parameters.

“We hope RSC will help us build entire
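
To reuse the fetch step for more than one URL, it can be wrapped in a small function. The following is a minimal sketch, not part of the original notebook: `fetch_html` and `DEFAULT_HEADERS` are hypothetical names, and `session` is assumed to be any object with a `requests`-style `get()` method, such as `requests.Session()`.

```python
# Placeholder header value for illustration only; the notebook uses its own User-Agent string.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; summarizer-sketch)"}


def fetch_html(url, session, headers=None, timeout=10):
    """Return the page HTML as text, or None when the request fails.

    `session` is anything with a requests-style get() method,
    for example requests.Session().
    """
    try:
        response = session.get(url, headers=headers or DEFAULT_HEADERS, timeout=timeout)
    except OSError as exc:
        # requests.RequestException derives from IOError (an alias of OSError),
        # so this also covers connection errors and timeouts raised by requests.
        print(f"Error occurred while fetching article at {url}: {exc}")
        return None
    if response.status_code != 200:
        print(f"Failed to fetch article at {url} (HTTP {response.status_code})")
        return None
    return response.text
```

With the notebook's variables, a call might look like `fetch_html(article_url, requests.Session(), headers=headers)`; a `None` result signals that the article should be skipped.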

### 3. Extract titles and text

In [4]:
# Get the article data from the scraping part
article_title = article.title
article_text = article.text

### 4. Preprocess the text

In [5]:
# Prepare template for prompt
template = """You are a very good assistant that summarizes online articles.
Here's the article you want to summarize.
==================
Title: {article_title}

{article_text}
==================
Write a summary of the previous article.
"""

# Fill the template with the title and text extracted in step 3
prompt = template.format(article_title=article_title, article_text=article_text)
messages = [HumanMessage(content=prompt)]
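
The cell above only formats the prompt; the cleaning mentioned in step 4 is not shown. One possible version is sketched below, under stated assumptions: `preprocess_text` is a hypothetical helper, and the `max_chars` default is an arbitrary character budget chosen for illustration, not a limit taken from the model or the notebook.

```python
import re


def preprocess_text(text, max_chars=12000):
    """Normalize whitespace and cap the length of extracted article text."""
    # Collapse runs of spaces and tabs inside each line and trim line ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Reduce three or more consecutive newlines to a single paragraph break.
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    if len(cleaned) <= max_chars:
        return cleaned
    # Prefer cutting at the last paragraph break before the limit, if there is one.
    cut = cleaned.rfind("\n\n", 0, max_chars)
    return cleaned[:cut if cut > 0 else max_chars].rstrip()
```

The cleaned string could then be passed as `article_text` when formatting the template. A token-based budget would be more precise than a character count, but it depends on the tokenizer of the chosen model.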

### 5. Generate and Print Summary

In [6]:
# Load the model
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  # model_name = "gpt-4"

In [7]:
# Generate the summary: call the chat model with a list holding a single HumanMessage that contains the formatted prompt
summary = chat(messages)
print(summary.content)

Meta, formerly known as Facebook, has unveiled its AI Research SuperCluster (RSC), an AI supercomputer that it claims will be the fastest in the world once completed in mid-2022. The RSC is already being used for training large natural language processing and computer vision models. Meta aims for the RSC to be capable of training models with trillions of parameters and hopes it will pave the way for AI-driven applications in the metaverse. The RSC is expected to be 20 times faster than Meta's current clusters and will enable faster training of large-scale NLP workflows. Additionally, the RSC's design prioritizes security and privacy controls, allowing Meta to use real-world examples from its production systems for research purposes.
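
Step 6 of the outline presents each summary next to its original title. A minimal sketch of that step for several articles follows. `summarize_articles` and `format_results` are hypothetical helpers, and `summarize_fn` is assumed to be any callable that takes a title and a text and returns a summary string, so the model call stays outside the sketch.

```python
def summarize_articles(articles, summarize_fn):
    """Return a list of (title, summary) pairs.

    articles: iterable of (title, text) pairs.
    summarize_fn: callable taking (title, text) and returning the summary string.
    """
    results = []
    for title, text in articles:
        try:
            summary = summarize_fn(title, text)
        except Exception as exc:
            # Keep going when one article fails, and record why.
            summary = f"[summary unavailable: {exc}]"
        results.append((title, summary))
    return results


def format_results(results):
    """Render each summary under its original title as plain text."""
    blocks = [f"Title: {title}\nSummary: {summary}" for title, summary in results]
    return "\n\n".join(blocks)
```

With the notebook's objects, `summarize_fn` could be a small function that formats `template` with the title and text and returns `chat([HumanMessage(content=prompt)]).content`.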


In [8]:
# If we wanted a bulleted list, we could modify the prompt

# Prepare another prompt template
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format.
"""

# Format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# Generate summary
summary = chat([HumanMessage(content=prompt)])
print(summary.content)

- Meta (formerly Facebook) has unveiled an AI supercomputer called the AI Research SuperCluster (RSC) that it claims will be the world's fastest.
- The RSC is currently being used for training large natural language processing (NLP) and computer vision models.
- Once fully built in mid-2022, Meta aims for the RSC to be capable of training models with trillions of parameters and to be the fastest in the world.
- Meta hopes that the RSC will enable the development of new AI systems for real-time voice translations and collaboration in the metaverse.
- The RSC is expected to be 20x faster than Meta's current clusters, 9x faster at running the NVIDIA Collective Communication Library (NCCL), and 3x faster at training large-scale NLP workflows.
- Meta designed the RSC with security and privacy controls to use real-world examples from its production systems for research, such as identifying harmful content on its platforms.
- Meta believes that the RSC tackles performance, reliability, securi

In [11]:
# If we wanted the summary in French, we could ask for it in the prompt

# Prepare the prompt template
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format, in French.
"""

# Format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# Generate summary
summary = chat([HumanMessage(content=prompt)])
print(summary.content)

- Meta a dévoilé un superordinateur d'intelligence artificielle (IA) appelé AI Research SuperCluster (RSC) qui prétend être le plus rapide au monde.
- Le RSC est encore en construction, mais les chercheurs de Meta l'utilisent déjà pour former de grands modèles de traitement du langage naturel (NLP) et de vision par ordinateur.
- Une fois terminé, le RSC devrait être le plus rapide au monde et capable de former des modèles avec des milliers de milliards de paramètres.
- Meta espère que le RSC permettra de développer de nouveaux systèmes d'IA pour des applications telles que la traduction vocale en temps réel ou les jeux en réalité augmentée.
- Le RSC devrait être 20 fois plus rapide que les clusters actuels de Meta et 9 fois plus rapide pour exécuter la bibliothèque de communication collective NVIDIA (NCCL).
- Meta affirme que le RSC permettra de former des modèles avec des dizaines de milliards de paramètres en trois semaines au lieu de neuf semaines.
- Le RSC a été conçu avec des cont
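
The three templates above differ only in the requested format and language, so they could be produced by one function. The sketch below is an illustration, not the notebook's code: `build_prompt` and its `bulleted` and `language` parameters are hypothetical, and the wording is adapted from the templates above.

```python
def build_prompt(article_title, article_text, bulleted=False, language=None):
    """Build a summarization prompt; output style and language are optional."""
    style = "a bulleted list" if bulleted else "a short paragraph"
    lang = f", in {language}" if language else ""
    return (
        f"You are an advanced AI assistant that summarizes online articles{lang}.\n\n"
        "Here's the article you need to summarize.\n\n"
        "==================\n"
        f"Title: {article_title}\n\n"
        f"{article_text}\n"
        "==================\n\n"
        f"Now, provide a summarized version of the article as {style}{lang}.\n"
    )
```

For example, `build_prompt(article.title, article.text, bulleted=True, language="French")` yields a prompt similar to the last one above, which can be wrapped in a `HumanMessage` and passed to `chat` as before.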