# Build a News Articles Summarizer - Solution

Welcome to the solution notebook! It contains completed implementations for building an AI-powered news article summarizer with LangChain.

## Overview

This notebook demonstrates:
- Article extraction and parsing
- Prompt template creation
- Multi-backend LLM integration (OpenAI, Gemini, or Ollama)
- Language and format customization

## Solutions Included

Each exercise below has a complete, working implementation, followed by notes on the key design choices.

### API Configuration

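This notebook supports the OpenAI API, the Google Gemini API, and a local Ollama server. Set the variables below in your shell or in a `.env` file (the notebook loads it with `load_dotenv()`), then choose your backend:

**Option 1: OpenAI (default)**
- Set the `OPENAI_API_KEY` environment variable
- More reliable, requires an internet connection
- Pay per use (check current pricing)

**Option 2: Ollama (Local)**
- Set the `USE_OLLAMA=1` environment variable
- Run `ollama serve` in a terminal
- Free, but requires a local GPU or enough RAM
- Install the model with `ollama pull mistral`

**Option 3: Gemini**
- Set the `USE_GEMINI=1` and `GOOGLE_API_KEY` environment variables
- Uses `gemini-2.5-flash` and requires an internet connection

If both `USE_OLLAMA` and `USE_GEMINI` are set, Ollama takes precedence.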

**Recommendation:** Start with OpenAI for simplicity, then try Ollama for local development.
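Before selecting the Ollama backend, it can help to confirm that `ollama serve` is running and that `mistral` has been pulled. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its `GET /api/tags` endpoint, which lists locally available models. `ollama_models` and `has_model` are hypothetical helpers, and only the standard library is used.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: Ollama's default local address
OLLAMA_URL = "http://localhost:11434"


def ollama_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[List[str]]:
    """Return names of locally pulled models, or None if the server can't be reached."""
    try:
        # GET /api/tags returns {"models": [{"name": "mistral:latest", ...}, ...]}
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; invalid JSON raises ValueError
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(names: List[str], model: str) -> bool:
    """Match a bare name such as 'mistral' against tagged names like 'mistral:latest'."""
    return any(n == model or n.startswith(model + ":") for n in names)


names = ollama_models()
if names is None:
    print("Ollama is not reachable: run `ollama serve` first.")
elif not has_model(names, "mistral"):
    print("Ollama is running, but 'mistral' is missing: run `ollama pull mistral`.")
else:
    print("Ollama is ready with mistral.")
```

The tag-aware match matters because Ollama usually reports pulled models with a tag suffix such as `:latest`.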

## 1. Extract and Parse the Article

In [1]:
import requests
from newspaper import Article

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

session = requests.Session()

try:
    response = session.get(article_url, headers=headers, timeout=10)

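    if response.status_code == 200:
        # Parse the HTML already fetched with the custom headers,
        # instead of letting newspaper download the page a second time
        article = Article(article_url)
        article.download(input_html=response.text)
        article.parse()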

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url}")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Title: Meta claims its new AI supercomputer will set records
Text: Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (NLP) and computer vision models.

RSC is set to be fully built in mid-2022. Meta says that it will be the fastest in the world once complete and the aim is for it to be capable of training models with trillions of parameters.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” wrote Meta in a blog post.

“Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing
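Long articles can exceed a model's context window or raise costs. Here is a minimal sketch that trims `article.text` at a paragraph boundary before it goes into the prompt, assuming a character count is an acceptable rough proxy for tokens. `MAX_CHARS` and `truncate_paragraphs` are hypothetical names, and the 12,000-character budget is an arbitrary placeholder.

```python
from typing import List

MAX_CHARS = 12_000  # hypothetical budget; tune it for the model you use


def truncate_paragraphs(text: str, max_chars: int = MAX_CHARS) -> str:
    """Keep whole paragraphs (separated by blank lines) until the budget is reached."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    kept: List[str] = []
    used = 0
    for p in paragraphs:
        # +2 accounts for the "\n\n" separator that join() re-inserts
        extra = len(p) + (2 if kept else 0)
        if used + extra > max_chars:
            break
        kept.append(p)
        used += extra
    if not kept and paragraphs:
        # The first paragraph alone is over budget: fall back to a hard cut
        return paragraphs[0][:max_chars]
    return "\n\n".join(kept)


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(truncate_paragraphs(sample, max_chars=40))
```

In the notebook, this would be applied as `truncate_paragraphs(article.text)` when filling the prompt template.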

## 2. Build a Prompt Template

**Solution:** The template is a plain string with named placeholders (`{article_title}`, `{article_text}`) that `.format()` fills with the article title and text.

**Best practices shown:**
- Clear role definition for the AI
- Visual separators (`==================`) delimiting the article
- Explicit task instructions
- Named placeholders filled by `.format()` rather than an f-string, so braces inside the article text are inserted verbatim

In [2]:
from langchain_core.messages import HumanMessage

# Prompt template with named placeholders (a plain string, not an f-string)
template = """You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {article_title}

{article_text}
==================

Write a summary of the previous article.
"""

# Fill the placeholders with the parsed article's title and body
prompt = template.format(article_title=article.title, article_text=article.text)

messages = [HumanMessage(content=prompt)]
messages
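Why use a plain template plus `.format()` rather than an f-string? An f-string substitutes the article immediately, and calling `.format()` on the result makes Python treat any braces in the article text as placeholders. The sketch below reproduces that failure with a made-up sentence containing `{gpus}`. `TEMPLATE` mirrors the prompt text above, and `build_prompt` is a hypothetical helper.

```python
TEMPLATE = """You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {article_title}

{article_text}
==================

Write a summary of the previous article.
"""


def build_prompt(title: str, text: str) -> str:
    # str.format parses placeholders in TEMPLATE only; braces inside the
    # substituted title/text are copied through verbatim
    return TEMPLATE.format(article_title=title, article_text=text)


text_with_braces = "The post quoted a config: {gpus}."
prompt = build_prompt("Demo", text_with_braces)
print("{gpus}" in prompt)  # True

# The pitfall: interpolating first (as an f-string does) and then calling
# .format() makes Python parse the article's own braces as placeholders
prefilled = f"Title: Demo\n\n{text_with_braces}"
pitfall_error = None
try:
    prefilled.format(article_title="Demo", article_text=text_with_braces)
except KeyError as err:
    pitfall_error = err
    print("KeyError:", err)  # KeyError: 'gpus'
```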



## 3. Initialize Language Model

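**Solution:** The code initializes the LLM backend selected by the `USE_OLLAMA` and `USE_GEMINI` environment variables, falling back to OpenAI when neither is set.

**Key implementation details:**
- Conditional logic to select the backend (Ollama first, then Gemini, then OpenAI)
- Environment variables read from the shell or a `.env` file via `load_dotenv()`
- Model initialization with `temperature=0` for more deterministic output
- A printed message confirming the selected backend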

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

# Determine which backend to use (flags can be set in the shell or in .env)
USE_OLLAMA = os.environ.get("USE_OLLAMA", "").lower() in ("1", "true", "yes")
USE_GEMINI = os.environ.get("USE_GEMINI", "").lower() in ("1", "true", "yes")

if USE_OLLAMA:
    # Use Ollama
    from langchain_community.chat_models import ChatOllama
    # Make sure the Ollama server is running
    # and that the 'mistral' model has been pulled.
    llm = ChatOllama(model="mistral", temperature=0)
    print("⚙️ Using the Ollama backend (mistral)")
elif USE_GEMINI:
    # Use Gemini
    # Make sure the GOOGLE_API_KEY environment variable is set in .env
    from langchain_google_genai import ChatGoogleGenerativeAI
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)
    print("⚙️ Using the Gemini backend (gemini-2.5-flash)")
else:
    # Use OpenAI by default
    # Make sure the OPENAI_API_KEY environment variable is set in .env
    from langchain_openai.chat_models import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    print("⚙️ Using the OpenAI backend (gpt-4o-mini)")

⚙️ Using the Gemini backend (gemini-2.5-flash)
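The selection logic in the cell above can be factored into a pure function. This makes the precedence explicit (Ollama, then Gemini, then OpenAI) and lets you check a configuration without importing any LangChain package. This is a sketch: `select_backend`, `missing_key`, and `REQUIRED_KEYS` are hypothetical names, and the key names mirror the comments in the cell (`OPENAI_API_KEY`, `GOOGLE_API_KEY`).

```python
import os
from typing import Mapping, Optional, Tuple

TRUTHY = {"1", "true", "yes"}

# Hypothetical mapping of backend -> API key it needs (Ollama runs locally, no key)
REQUIRED_KEYS = {"openai": "OPENAI_API_KEY", "gemini": "GOOGLE_API_KEY", "ollama": None}


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Same parsing as the cell above: 1/true/yes (any case) count as enabled."""
    return env.get(name, "").lower() in TRUTHY


def select_backend(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (backend, model) using the notebook's precedence."""
    if env_flag(env, "USE_OLLAMA"):
        return "ollama", "mistral"
    if env_flag(env, "USE_GEMINI"):
        return "gemini", "gemini-2.5-flash"
    return "openai", "gpt-4o-mini"


def missing_key(env: Mapping[str, str], backend: str) -> Optional[str]:
    """Return the name of the required API key if it is unset, else None."""
    key = REQUIRED_KEYS[backend]
    return key if key and not env.get(key) else None


backend, model = select_backend(os.environ)
print(backend, model, "missing:", missing_key(os.environ, backend))
```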


## 4. Generate a Summary

**Solution:** Call the model's `.invoke()` method with the list of messages.

**How it works:**
- Pass the list of messages to the LLM
- The model processes the prompt and returns a response message
- Read the generated text from the response's `.content` attribute

In [5]:
summary = llm.invoke(messages)
summary.content

"Meta has announced its new AI supercomputer, the AI Research SuperCluster (RSC), which it claims will be the world's fastest upon its full completion in mid-2022. Researchers are already using the RSC to train large natural language processing (NLP) and computer vision models, with the goal of training models with trillions of parameters.\n\nMeta expects RSC to be significantly faster than its current systems, boasting 20x faster production, 9x faster NVIDIA Collective Communication Library (NCCL) performance, and 3x faster large-scale NLP training. For instance, a model that previously took nine weeks to train can now finish in three weeks.\n\nThe company intends to use RSC to develop new AI systems for applications like real-time voice translations and to build technologies for the metaverse. A key feature of RSC is its enhanced security and privacy controls, which will allow Meta to use real-world data from its production systems to train models, enabling advancements in tasks such

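Remote backends can fail transiently (rate limits, timeouts). The following is a minimal hand-rolled sketch of retrying with exponential backoff around any object that has an `.invoke()` method. `invoke_with_retry` and `FlakyLLM` are hypothetical helpers, and `FlakyLLM` stands in for a real model so that the logic can run offline. LangChain chat models are Runnables and also provide a built-in `.with_retry()` method, which is likely the better choice in real code. The version here just makes the backoff visible.

```python
import time
from typing import Any, Callable, Sequence


def invoke_with_retry(llm: Any, messages: Sequence[Any], attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call llm.invoke(messages), waiting base_delay * 2**n between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return llm.invoke(messages)
        except Exception:
            # In practice, narrow this to transient errors (rate limits, timeouts)
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class FlakyLLM:
    """Hypothetical stand-in that fails `failures` times before answering."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def invoke(self, messages):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated transient failure")
        return "summary text"


delays = []
fake = FlakyLLM(failures=2)
result = invoke_with_retry(fake, ["prompt"], sleep=delays.append)
print(result, fake.calls, delays)  # summary text 3 [1.0, 2.0]
```

With the notebook's model, this would be called as `invoke_with_retry(llm, messages)`.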
## 5. Advanced: Customized Output Format

**Solution:** Creating a French bullet-point summary with detailed prompt engineering.

**This solution demonstrates:**
- Advanced prompt construction with `.format()` method
- Multi-language output (French)
- Specific formatting requirements (bullets)
- Reusable template pattern
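The reusable template pattern can be pushed one step further by making the output language and format parameters as well, so the same template serves any combination. Below is a sketch; `SUMMARY_TEMPLATE` and `make_summary_prompt` are hypothetical names, and the wording is adapted from the French bullet-list prompt in this section.

```python
SUMMARY_TEMPLATE = """You are an advanced AI assistant that summarizes online articles into {output_format} in {language}.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article as {output_format}, in {language}.
"""


def make_summary_prompt(title: str, text: str, language: str = "French",
                        output_format: str = "a bulleted list") -> str:
    # All four placeholders are filled in a single .format() call
    return SUMMARY_TEMPLATE.format(article_title=title, article_text=text,
                                   language=language, output_format=output_format)


print(make_summary_prompt("Demo", "Body.", language="Spanish",
                          output_format="three short sentences").splitlines()[0])
```

The result is passed to the model exactly like `prompt` in the cell below, for example `llm.invoke([HumanMessage(content=make_summary_prompt(article.title, article.text, language="German"))])`.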

In [6]:
# prepare template for prompt
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format, in French.
"""

# format prompt
prompt = template.format(article_title=article.title, article_text=article.text)

# generate summary
summary = llm.invoke([HumanMessage(content=prompt)])
print(summary.content)

Voici un résumé de l'article en français :

*   Meta (anciennement Facebook) a dévoilé un superordinateur d'IA nommé AI Research SuperCluster (RSC).
*   Meta affirme que le RSC sera le plus rapide du monde une fois entièrement construit, ce qui est prévu pour la mi-2022.
*   Il est déjà utilisé pour l'entraînement de grands modèles de traitement du langage naturel (NLP) et de vision par ordinateur.
*   L'objectif est qu'il puisse entraîner des modèles avec des milliers de milliards de paramètres.
*   Le RSC vise à développer de nouveaux systèmes d'IA pour des applications telles que la traduction vocale en temps réel et les technologies du métavers.
*   Meta s'attend à ce qu'il soit 20 fois plus rapide que ses clusters actuels basés sur V100 et 3 fois plus rapide pour les flux de travail NLP à grande échelle.
*   Grâce au RSC, l'entraînement d'un modèle de plusieurs dizaines de milliards de paramètres prendra trois semaines, contre neuf auparavant.
*   Conçu avec des contrôles de sécur
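The model's reply mixes a preamble ("Voici un résumé...") with the bullets themselves. If downstream code needs the items as a list, a small parser can pull them out. This is a sketch that assumes bullets start with `*`, `-`, or `•` followed by whitespace, as in the output above. Numbered lists are not handled, and `extract_bullets` is a hypothetical helper.

```python
import re
from typing import List

# A bullet marker, at least one space, then the item text (trailing spaces dropped)
BULLET_RE = re.compile(r"^\s*[*\-•]\s+(.*\S)\s*$")


def extract_bullets(text: str) -> List[str]:
    """Return bullet items, skipping preamble lines and blank lines."""
    items = []
    for line in text.splitlines():
        m = BULLET_RE.match(line)
        if m:
            items.append(m.group(1))
    return items


sample = "Voici un résumé :\n\n*   Premier point.\n*   Deuxième point.\n"
print(extract_bullets(sample))  # ['Premier point.', 'Deuxième point.']
```

Because the marker must be followed by whitespace, a bold heading such as `**Résumé**` is not mistaken for a bullet.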