# Build a News Articles Summarizer - Solution

Welcome to the solution notebook! This contains completed implementations for building an AI-powered news article summarizer using LangChain.

## Overview

This notebook demonstrates:
- Article extraction and parsing
- Prompt template creation
- Multi-backend LLM integration (OpenAI/Ollama)
- Language and format customization

## Solutions Included

Each exercise is completed with a working reference implementation and notes on the practices it illustrates.

---

## 1. Install Dependencies

In [1]:
#!pip install -q langchain langchain_openai python-dotenv newspaper3k lxml_html_clean langchain_community langchain-core

### API Configuration

This notebook supports both the OpenAI API and a local Ollama server. Choose your backend:

**Option 1: OpenAI**
- Set the `OPENAI_API_KEY` environment variable
- Hosted service: generally the more reliable option, but requires an internet connection
- Pay per use (check current pricing)

**Option 2: Ollama (Local)**
- Set the `USE_OLLAMA=1` environment variable
- Run `ollama serve` in a terminal
- Free to use; requires a local GPU or enough RAM to run the model
- Install the model used in this notebook with `ollama pull mistral`

**Recommendation:** Start with OpenAI for simplicity, then try Ollama for local development.
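
Before running the cells that follow, it can help to confirm which backend the environment selects. The snippet below is a minimal sketch, not part of the original solution: `check_backend` is a hypothetical helper, and the accepted truthy values (`1`, `true`, `yes`) are assumed to match the ones the notebook uses when it reads `USE_OLLAMA`. It only inspects environment variables; it does not check that `ollama serve` is reachable or that the API key is valid.

```python
import os
from typing import Mapping

# Values treated as "on" for USE_OLLAMA (assumed to mirror the notebook's own check)
TRUTHY = ("1", "true", "yes")


def check_backend(env: Mapping[str, str]) -> tuple[str, list[str]]:
    """Hypothetical helper: return the selected backend and any configuration problems."""
    use_ollama = env.get("USE_OLLAMA", "").lower() in TRUTHY
    problems: list[str] = []
    if use_ollama:
        backend = "ollama"
        # Whether the local Ollama server is running is not verified here.
    else:
        backend = "openai"
        if not env.get("OPENAI_API_KEY"):
            problems.append("OPENAI_API_KEY is not set")
    return backend, problems


backend, problems = check_backend(os.environ)
print(f"Backend: {backend}")
for problem in problems:
    print(f"Warning: {problem}")
```

Passing the environment in as a mapping keeps the check easy to try with a plain dictionary, for example `check_backend({"USE_OLLAMA": "1"})`.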

In [2]:
import requests
from newspaper import Article

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

session = requests.Session()

try:
    # Fetch the page once, using browser-like headers
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        # Reuse the HTML already fetched instead of downloading the page a second time
        article = Article(article_url)
        article.download(input_html=response.text)
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url} (status {response.status_code})")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Title: Meta claims its new AI supercomputer will set records
Text: Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (NLP) and computer vision models.

RSC is set to be fully built in mid-2022. Meta says that it will be the fastest in the world once complete and the aim is for it to be capable of training models with trillions of parameters.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” wrote Meta in a blog post.

“Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing

## 2. Build a Prompt Template

**Solution:** The template uses f-string formatting to inject the article title and text into the prompt.

**Best practices shown:**
- Clear role definition for the AI
- Visual separators for context
- Explicit task instructions
- Using f-strings for dynamic content injection
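
The f-string approach can be tried without fetching an article. The following is a minimal sketch: `build_prompt` and the sample values are hypothetical and exist only for illustration. It also shows why an f-string result should not be passed through `.format()` afterwards: the values are already substituted, so any braces that came from the article text would be parsed as new placeholders.

```python
def build_prompt(title: str, text: str) -> str:
    """Hypothetical helper: interpolate title and text with an f-string."""
    return f"""You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {title}

{text}
==================

Write a summary of the previous article.
"""


# Sample values for illustration; the text deliberately contains braces.
sample_prompt = build_prompt("Example headline", "Settings used: {batch: 32}")
print(sample_prompt)

# Calling .format() on the finished prompt re-parses the braces from the text.
try:
    sample_prompt.format()
except (KeyError, IndexError, ValueError) as exc:
    print(f"format() failed: {type(exc).__name__}")
```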

In [3]:
from langchain_core.messages import HumanMessage

# prepare template for prompt
template = f"""You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {article.title}

{article.text}
==================

Write a summary of the previous article.
"""

# The f-string above has already interpolated the title and text,
# so no further .format() call is needed
prompt = template

messages = [HumanMessage(content=prompt)]
messages



## 3. Initialize Language Model

**Solution:** The code initializes the appropriate LLM based on the `USE_OLLAMA` environment variable.

**Key implementation details:**
- Conditional logic to select backend
- Environment variable detection
- Model initialization with `temperature=0` for more deterministic output
- User feedback on selected backend

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

# Determine which backend to use
USE_OLLAMA = os.environ.get("USE_OLLAMA", "").lower() in ("1", "true", "yes")

# Load the appropriate model based on USE_OLLAMA environment variable
if USE_OLLAMA:
    from langchain_community.chat_models import ChatOllama
    llm = ChatOllama(model="mistral", temperature=0)
    print("⚙️ Using Ollama backend")
else:
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    print("⚙️ Using OpenAI backend")

⚙️ Using OpenAI backend


## 4. Generate a Summary

**Solution:** Call the model's `.invoke()` method with the list of messages.

**How it works:**
- Pass the list of messages to the LLM
- The model processes the prompt and returns a response
- Access content via `.content` attribute
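
The call shape described above can be exercised offline. The sketch below uses stand-ins: `StubMessage` and `StubChatModel` are hypothetical classes written for this example and are not part of LangChain. They only mimic the two things the notebook relies on, an `invoke()` method that takes a list of messages and a response object with a `.content` attribute.

```python
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Hypothetical stand-in for a chat message: holds a `content` string."""
    content: str


class StubChatModel:
    """Hypothetical stand-in for a chat model; it does not generate text."""

    def invoke(self, messages: list[StubMessage]) -> StubMessage:
        # A real model would return a generated reply; this stub only
        # reports what it received, wrapped in the same response shape.
        total_chars = sum(len(m.content) for m in messages)
        return StubMessage(
            content=f"Received {len(messages)} message(s), {total_chars} characters."
        )


stub_llm = StubChatModel()
reply = stub_llm.invoke([StubMessage(content="Summarize this article.")])
print(reply.content)
```

A stub like this can help when testing prompt-building code without spending API calls.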

In [5]:
summary = llm.invoke(messages)
summary.content

"Meta has introduced the AI Research SuperCluster (RSC), which it claims will be the fastest AI supercomputer in the world upon its completion in mid-2022. Currently, researchers are using RSC to train large natural language processing and computer vision models. The supercomputer is designed to handle models with trillions of parameters and is expected to be 20 times faster than Meta's existing clusters, significantly reducing training times for large models.\n\nMeta envisions RSC enabling advanced AI systems, such as real-time voice translation for collaborative projects and applications in the metaverse. The infrastructure prioritizes performance, reliability, security, and privacy, allowing Meta to utilize real-world data for training, including efforts to identify harmful content on its platforms. Overall, RSC represents a significant advancement in AI research capabilities for Meta."

## 5. Advanced: Customized Output Format

**Solution:** Build a prompt that asks for a French summary formatted as a bulleted list.

**This solution demonstrates:**
- Advanced prompt construction with `.format()` method
- Multi-language output (French)
- Specific formatting requirements (bullets)
- Reusable template pattern
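
One way to take the reusable-template idea further is to make the language and output format parameters as well. This is a minimal sketch under that assumption: `SUMMARY_TEMPLATE`, `build_summary_prompt`, and the `language` and `output_format` placeholders are hypothetical names introduced here, and the wording is adapted slightly so both placeholders read naturally.

```python
SUMMARY_TEMPLATE = """You are an advanced AI assistant that summarizes online articles as {output_format} in {language}.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article as {output_format}, in {language}.
"""


def build_summary_prompt(
    article_title: str,
    article_text: str,
    language: str = "French",
    output_format: str = "a bulleted list",
) -> str:
    """Hypothetical helper: fill the template for a given language and format."""
    # Substituted values are not re-parsed by .format(), so braces inside
    # the article text are inserted unchanged.
    return SUMMARY_TEMPLATE.format(
        article_title=article_title,
        article_text=article_text,
        language=language,
        output_format=output_format,
    )


spanish_prompt = build_summary_prompt(
    "Example headline",
    "Example body with {braces}.",
    language="Spanish",
    output_format="three short paragraphs",
)
print(spanish_prompt)
```

Because the template is a plain string rather than an f-string, it can be defined once and filled many times with different articles.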

In [6]:
# prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format, in French.
"""

# format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# generate summary
summary = llm.invoke([HumanMessage(content=prompt)])
print(summary.content)

- **Titre** : Meta annonce que son nouvel superordinateur IA établira des records.
- **Présentation** : Meta (anciennement Facebook) a dévoilé un superordinateur IA, l'AI Research SuperCluster (RSC), qu'il prétend être le plus rapide au monde.
- **Statut** : Le RSC n'est pas encore entièrement terminé, mais les chercheurs de Meta l'utilisent déjà pour former des modèles de traitement du langage naturel (NLP) et de vision par ordinateur.
- **Achèvement** : Le RSC devrait être complètement construit d'ici mi-2022.
- **Capacités** : Une fois terminé, il sera capable de former des modèles avec des trillions de paramètres.
- **Applications envisagées** : Meta espère que le RSC permettra de créer de nouveaux systèmes IA, comme des traductions vocales en temps réel pour des groupes parlant différentes langues.
- **Impact futur** : Le travail avec le RSC vise à développer des technologies pour le métavers, où les applications et produits pilotés par l'IA joueront un rôle clé.
- **Performances*