In [2]:
'''
This notebook sets up the environment, retrieves a news article, and summarizes it.

1. Install required libraries: Make sure the necessary libraries, namely requests, newspaper3k, and LangChain, are installed.
2. Scrape articles: Use the requests library to fetch the content of the target news articles from their respective URLs.
3. Extract titles and text: Use the newspaper library to parse the scraped HTML and extract the titles and text of the articles.
4. Preprocess the text: Clean and preprocess the extracted text so that it is suitable as input to the LLM.

The rest of the lesson explores ways to improve the application further.

5. Use the few-shot learning technique: Build a prompt template that gives the language model a few examples, guiding it to generate summaries in the desired format, a bulleted list.
6. Generate summaries: With the modified prompt, use the model to generate concise summaries of the extracted article text in the desired format.
7. Use output parsers: Apply an output parser to interpret the output of the language model and ensure it matches the desired structure and format.
8. Output the results: Finally, present the bulleted summaries along with the original titles, so that users can quickly grasp the main points of each article in a structured manner.
'''
!pip install -q newspaper3k python-dotenv


In [3]:
from dotenv import load_dotenv, dotenv_values
import os

# Load environment variables from the .env file
env_path = "C:/Users/SACHENDRA/Documents/Activeloop/.env"
config = dotenv_values(env_path)  # the values as a dict, without touching os.environ
load_dotenv(env_path)  # export the values into os.environ

True
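
A `True` result only says that the file was read and at least one variable was set, not that a particular key is present. The check below is a minimal sketch of failing early when a variable is missing. `require_env` and `DEMO_VARIABLE` are hypothetical names introduced for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check the path passed to load_dotenv().")
    return value


# Hypothetical variable, set here only so the sketch runs on its own
os.environ["DEMO_VARIABLE"] = "demo-value"
print(require_env("DEMO_VARIABLE"))
```

In this notebook the natural candidate would be `require_env("OPENAI_API_KEY")`, assuming the .env file defines that key, since it is the variable the LangChain OpenAI wrappers read by default.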

In [4]:
'''
We picked the URL of a news article to generate a summary.
The following code fetches the article from its URL using the requests library with a custom User-Agent header.
It then extracts the title and text of the article using the newspaper library.
'''

import requests
from newspaper import Article

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

session = requests.Session()

try:
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        # Parse the HTML that requests already fetched instead of downloading it again
        article = Article(article_url)
        article.download(input_html=response.text)
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url} (status {response.status_code})")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Title: Meta claims its new AI supercomputer will set records
Text: Ryan Daws is a senior editor at TechForge Media, with a seasoned background spanning over a decade in tech journalism. His expertise lies in identifying the latest technological trends, dissecting complex topics, and weaving compelling narratives around the most cutting-edge developments. His articles and interviews with leading industry figures have gained him recognition as a key influencer by organisations such as Onalytica. Publications under his stewardship have since gained recognition from leading analyst houses like Forrester for their performance. Find him on X (@gadget_ry) or Mastodon (@gadgetry@techhub.social)

Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (
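
Step 4 of the outline (cleaning the extracted text) has no code of its own in this notebook. The sketch below is one possible preprocessing pass, under the assumption that collapsing whitespace and capping the length is enough for this use case. `clean_article_text` and its `max_chars` default are hypothetical, not part of newspaper or LangChain.

```python
import re


def clean_article_text(text: str, max_chars: int = 12000) -> str:
    """Collapse whitespace, drop empty lines, and cap the length of the text."""
    # Normalize runs of spaces and tabs inside each line
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    # Keep non-empty lines as paragraphs separated by one blank line
    cleaned = "\n\n".join(line for line in lines if line)

    if len(cleaned) <= max_chars:
        return cleaned

    # Cut at the last space before the limit so no word is split in half
    truncated = cleaned[:max_chars]
    cut = truncated.rfind(" ")
    if cut > 0:
        truncated = truncated[:cut]
    return truncated.rstrip()


print(clean_article_text("  Meta   has unveiled \n\n\n an AI\t supercomputer.  "))
```

With the scraped article, `article_text = clean_article_text(article.text)` would replace the raw text before it is placed in the prompt.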

In [5]:
'''
Few Shot Prompting
'''
from langchain.schema import (
    HumanMessage
)

# we get the article data from the scraping part
article_title = article.title
article_text = article.text

# prepare template for prompt
template = """
As an advanced AI, you've been tasked to summarize online articles into bulleted points. Here are a few examples of how you've done this in the past:

Example 1:
Original Article: 'The Effects of Climate Change'
Summary:
- Climate change is causing a rise in global temperatures.
- This leads to melting ice caps and rising sea levels.
- This results in more frequent and severe weather conditions.

Example 2:
Original Article: 'The Evolution of Artificial Intelligence'
Summary:
- Artificial Intelligence (AI) has developed significantly over the past decade.
- AI is now used in multiple fields such as healthcare, finance, and transportation.
- The future of AI is promising but requires careful regulation.

Now, here's the article you need to summarize:

==================
Title: {article_title}

{article_text}
==================

Please provide a summarized version of the article in a bulleted list format.
"""

# Format the prompt with the scraped title and text
prompt = template.format(article_title=article_title, article_text=article_text)

messages = [HumanMessage(content=prompt)]
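
The template above hard-codes its two examples. As a sketch of the same few-shot idea, the examples can be kept as data and the prompt assembled from them, which makes it easier to add or swap examples. `SummaryExample` and `build_few_shot_prompt` are hypothetical helpers. The example content mirrors the template above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SummaryExample:
    title: str
    bullets: List[str]


EXAMPLES = [
    SummaryExample("The Effects of Climate Change", [
        "Climate change is causing a rise in global temperatures.",
        "This leads to melting ice caps and rising sea levels.",
        "This results in more frequent and severe weather conditions.",
    ]),
    SummaryExample("The Evolution of Artificial Intelligence", [
        "Artificial Intelligence (AI) has developed significantly over the past decade.",
        "AI is now used in multiple fields such as healthcare, finance, and transportation.",
        "The future of AI is promising but requires careful regulation.",
    ]),
]


def build_few_shot_prompt(examples: List[SummaryExample], article_title: str, article_text: str) -> str:
    """Assemble the instruction, the numbered examples, and the target article."""
    parts = [
        "As an advanced AI, you've been tasked to summarize online articles into bulleted points. "
        "Here are a few examples of how you've done this in the past:",
        "",
    ]
    for number, example in enumerate(examples, start=1):
        parts.append(f"Example {number}:")
        parts.append(f"Original Article: '{example.title}'")
        parts.append("Summary:")
        parts.extend(f"- {bullet}" for bullet in example.bullets)
        parts.append("")
    parts += [
        "Now, here's the article you need to summarize:",
        "",
        "==================",
        f"Title: {article_title}",
        "",
        article_text,
        "==================",
        "",
        "Please provide a summarized version of the article in a bulleted list format.",
    ]
    return "\n".join(parts)


print(build_few_shot_prompt(EXAMPLES, "Demo title", "Demo article body."))
```

The returned string can be wrapped in `HumanMessage(content=...)` exactly like `prompt` above.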

In [6]:
from langchain.chat_models import ChatOpenAI

# load the model
chat = ChatOpenAI(model_name="gpt-4", temperature=0.0)

# generate summary
summary = chat(messages)
print(summary.content)

- Meta (formerly Facebook) has announced an AI supercomputer, the AI Research SuperCluster (RSC), which it claims will be the world's fastest.
- The RSC is not yet fully built, but is already being used by Meta's researchers for training large natural language processing and computer vision models.
- The supercomputer is expected to be fully operational by mid-2022 and will be capable of training models with trillions of parameters.
- Meta hopes the RSC will help build new AI systems for real-time voice translations and other applications, paving the way for the next major computing platform, the metaverse.
- Once in production, RSC is expected to be 20x faster than Meta's current V100-based clusters, 9x faster at running the NVIDIA Collective Communication Library, and 3x faster at training large-scale NLP workflows.
- A model with tens of billions of parameters can finish training in three weeks with RSC, compared to nine weeks previously.
- The RSC was designed with security and pri

In [12]:
'''
OUTPUT Parsers
'''
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List


# create output parser class
class ArticleSummary(BaseModel):
    title: str = Field(description="Title of the article")
    summary: List[str] = Field(description="Bulleted list summary of the article")

    # Validate that the generated summary has at least three bullet points
    @validator('summary', allow_reuse=True)
    def has_three_or_more_lines(cls, list_of_lines):
        if len(list_of_lines) < 3:
            raise ValueError("Generated summary has fewer than three bullet points!")
        return list_of_lines

# set up output parser
parser = PydanticOutputParser(pydantic_object=ArticleSummary)
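
To see what the validator does before any model is involved, the class can be instantiated directly. This is a small sketch written against the v1-style `validator` decorator used above. `try_build` is a hypothetical helper, and the class is repeated so the snippet runs on its own.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, validator


class ArticleSummary(BaseModel):
    title: str = Field(description="Title of the article")
    summary: List[str] = Field(description="Bulleted list summary of the article")

    @validator("summary", allow_reuse=True)
    def has_three_or_more_lines(cls, list_of_lines):
        if len(list_of_lines) < 3:
            raise ValueError("Generated summary has fewer than three bullet points!")
        return list_of_lines


def try_build(title: str, bullets: List[str]):
    """Return the validated model, or None after printing the validation error."""
    try:
        return ArticleSummary(title=title, summary=bullets)
    except ValidationError as err:
        print(f"Rejected: {err}")
        return None


accepted = try_build("Demo", ["one", "two", "three"])  # passes the validator
rejected = try_build("Demo", ["one", "two"])           # too short, so it is rejected
```

The same check runs inside `parser.parse(...)`, so a reply with too few bullets is expected to surface there as a parsing error rather than pass through silently.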

In [8]:
from langchain.prompts import PromptTemplate


# create prompt template
# notice that we are specifying the "partial_variables" parameter
template = """
You are a very good assistant that summarizes online articles.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["article_title", "article_text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Format the prompt using the article title and text obtained from scraping
formatted_prompt = prompt.format_prompt(article_title=article_title, article_text=article_text)
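
The idea behind `partial_variables` is that one placeholder is filled once, up front, while the others are supplied per article. The sketch below illustrates that idea with the standard library only. It is not LangChain's implementation, and `render`, `TEMPLATE`, and `FORMAT_INSTRUCTIONS` are hypothetical stand-ins.

```python
from functools import partial

TEMPLATE = (
    "Title: {article_title}\n\n"
    "{article_text}\n\n"
    "{format_instructions}"
)


def render(template: str, **variables: str) -> str:
    """Fill every placeholder in the template."""
    return template.format(**variables)


# Hypothetical stand-in for the text returned by parser.get_format_instructions()
FORMAT_INSTRUCTIONS = "Reply with a JSON object with the keys 'title' and 'summary'."

# Pre-bind the value that stays the same for every article
render_article_prompt = partial(render, TEMPLATE, format_instructions=FORMAT_INSTRUCTIONS)

# Only the per-article values are left to supply
prompt_text = render_article_prompt(article_title="Example title", article_text="Example body text.")
print(prompt_text)
```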

In [14]:
from langchain.llms import OpenAI

# instantiate model class
model = OpenAI(model_name="gpt-3.5-turbo", temperature=0.0)

# Use the model to generate a summary
output = model(formatted_prompt.to_string())


# Extract the JSON string from the output
json_start = output.find('{')
json_end = output.rfind('}') + 1
output_json = output[json_start:json_end]

# Parse the output into the Pydantic model
parsed_output = parser.parse(output_json)
print(parsed_output)

# Alternative: cut the output at the end of the summary list instead
# parsed_output = parser.parse(output.split("\"]}")[0] + "\"]}")

title='Meta claims its new AI supercomputer will set records' summary=["Meta (formerly Facebook) has unveiled an AI supercomputer called the AI Research SuperCluster (RSC) that it claims will be the world's fastest.", 'RSC is being used for training large natural language processing (NLP) and computer vision models and is set to be fully built in mid-2022.', 'Meta aims for RSC to be capable of training models with trillions of parameters and to pave the way for AI-driven applications in the metaverse.', "RSC is expected to be 20x faster than Meta's current clusters, 9x faster at running the NVIDIA Collective Communication Library (NCCL), and 3x faster at training large-scale NLP workflows.", 'Meta can use RSC to advance research for tasks like identifying harmful content on its platforms using real data from them.']
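
Slicing from the first `{` to the last `}` works for the reply above, but it would break if the model added another brace after the JSON. The printed result is also the raw model repr rather than the bulleted list described in step 8. The sketch below covers both points: `extract_first_json_object` uses `json.JSONDecoder.raw_decode` to stop at the end of the first valid object, and `format_summary` renders a title with its bullets. Both helpers and the sample reply are hypothetical.

```python
import json
from typing import List


def extract_first_json_object(raw: str) -> dict:
    """Return the first JSON object embedded in raw, ignoring surrounding text."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one
            start = raw.find("{", start + 1)
    raise ValueError("No JSON object found in the model output")


def format_summary(title: str, bullets: List[str]) -> str:
    """Render the title followed by one bullet per line."""
    lines = [f"Title: {title}"]
    lines.extend(f"- {bullet.strip()}" for bullet in bullets)
    return "\n".join(lines)


# Made-up model reply, used only for this demonstration
raw_reply = (
    'Sure! {"title": "Demo article", "summary": '
    '["First point.", "Second point.", "Third point."]} '
    "Let me know if {anything} is unclear."
)

data = extract_first_json_object(raw_reply)
print(format_summary(data["title"], data["summary"]))
```

In the notebook, `parser.parse(json.dumps(extract_first_json_object(output)))` would hand the isolated object to the Pydantic parser, and `format_summary(parsed_output.title, parsed_output.summary)` would produce the display text.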
