# News Summarization and Translation

This notebook builds a small pipeline that detects the language of a news article with `langdetect`, translates it with a Helsinki-NLP MarianMT model, and summarizes it with `google/pegasus-xsum`.

In [2]:

from transformers import MarianMTModel, MarianTokenizer, PegasusForConditionalGeneration, PegasusTokenizer
from langdetect import detect

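# Define translation function
def translate_text(text, source_lang, target_lang):
    """Translate text with the Helsinki-NLP MarianMT model for the given language pair."""
    # Not every language pair has a published opus-mt checkpoint; loading fails if it is missing.
    model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    # Note: the tokenizer and model are reloaded on every call.
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Input beyond 512 tokens is silently dropped, so long articles are only partly translated.
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    translated = model.generate(**inputs)
    return tokenizer.decode(translated[0], skip_special_tokens=True)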

# Define summarization function
def summarize_text(text):
    """Summarize text with google/pegasus-xsum, which was fine-tuned on English news (XSum)."""
    model_name = "google/pegasus-xsum"
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    # pegasus-xsum is configured for at most 512 input positions, so truncate there rather than at 1024.
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    summary = model.generate(**inputs)
    return tokenizer.decode(summary[0], skip_special_tokens=True)

# Pipeline for news processing
def process_news_article(article, target_language="en"):
    source_language = detect(article)

    # Translate if necessary
    if source_language != target_language:
        print(f"Translating from {source_language} to {target_language}...")
        article = translate_text(article, source_language, target_language)

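    # Summarize. The summarizer receives the translated text, so the result is
    # only reliable when target_language is English.
    print("Summarizing article...")
    summary = summarize_text(article)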
    return article, summary
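
Because `translate_text` truncates its input at 512 tokens, a long article is only partly translated. One possible workaround is to split the article into paragraph-sized chunks and translate each one separately. The sketch below is a minimal illustration, not part of the original notebook: `split_into_chunks` and `translate_long_text` are hypothetical helper names, and the whitespace word count is only a rough stand-in for the model's real token count.

```python
def split_into_chunks(text, max_tokens=400, count_tokens=None):
    """Group paragraphs into chunks whose estimated size stays under max_tokens."""
    if count_tokens is None:
        # Assumption: whitespace-separated words approximate model tokens.
        # Pass a tokenizer-based counter for a tighter estimate.
        count_tokens = lambda s: len(s.split())

    chunks, current, current_size = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        size = count_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget.
        # A single paragraph larger than max_tokens still becomes its own chunk.
        if current and current_size + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_long_text(text, translate_chunk, max_tokens=400):
    """Translate each chunk with the supplied callable and rejoin the results."""
    pieces = split_into_chunks(text, max_tokens)
    return "\n\n".join(translate_chunk(piece) for piece in pieces)
```

With the functions defined above, a call such as `translate_long_text(article, lambda chunk: translate_text(chunk, "en", "fr"))` would translate the article chunk by chunk. The word budget of 400 is a guess chosen to leave headroom below the 512-token limit.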


## Example Usage
Paste your news article into `news_article` below and set `target_lang` to the language code of the translation, for example `fr` for French.
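
The language codes matter because `translate_text` builds the model name directly from them. `langdetect` can return region-qualified codes such as `zh-cn`, which do not match the `opus-mt` naming pattern used above. The sketch below shows one way to normalize the codes before building the name. `opus_mt_model_name` is a hypothetical helper, and reducing a code to its base language is an assumption that will not suit every language pair.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a Helsinki-NLP opus-mt model name from two language codes."""

    def base(code):
        # Assumption: the base language is enough, e.g. "zh-cn" or "zh_CN" -> "zh".
        return code.strip().lower().replace("_", "-").split("-")[0]

    source, target = base(source_lang), base(target_lang)
    if not source or not target:
        raise ValueError("Both language codes must be non-empty.")
    if source == target:
        raise ValueError("Source and target languages are the same; nothing to translate.")
    # Whether a checkpoint with this name is actually published still has to be checked.
    return f"Helsinki-NLP/opus-mt-{source}-{target}"
```

For example, `opus_mt_model_name("zh-cn", "en")` yields `Helsinki-NLP/opus-mt-zh-en`.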

In [3]:

# Example usage
news_article = """Robots are stepping out. Once relegated to factories and warehouses, next-generation robots are popping up in public spaces—from retail stores to museums—cleaning, cooking and even conversing with humans.

Improvements in “brainpower,” most notably the adoption of the technology behind ChatGPT, and a surge of investment are helping drive their public debut and 2025 could be a turning point in what robots can do. 

Operators say they expect to deploy more public-facing robots. The robotics and drone sector in 2024 had attracted about $12.8 billion in venture-capital dollars by mid-December, up from $11.6 billion in all of 2023, according to analytics firm PitchBook. 

While operators are excited about new GenAI-powered capabilities, they are mindful that this next generation of robots won’t excel at every human interaction without some stumbles.

Make that many stumbles. 

“Some things which are very easy for people are very hard for robots,” said David Pinn, chief executive of Brain Corp, which provides software for automated floor-cleaning and inventory management robots used at retailers like Sam’s Club.  

Even something as simple as picking up an arbitrary object and moving it “is a really hard problem in the world of robotics,” he said.

Traditionally, robots rely on code that tells them how to execute functions or react to specific scenarios. Variability of what they could do was more or less limited to the specific actions they were trained on. 

At health system Houston Methodist, Chief Innovation Officer Roberta Schwartz discovered that robots designed to carry out a number of tasks, from checking fire extinguishers to carrying towels, often bumped into objects and got easily confused by elevators.

Robots that will operate in human spaces will need better dexterity and the ability to circumvent obstacles—both areas that generative AI, the technology behind many of today’s chatbots, could help with. 

“You can train the robot through massive data sets to be able to achieve this kind of dexterity, that until now has only been achievable by our own labor,” said Brain Corp’s Pinn.

Generative AI could give robots the ability to plan and replan their tasks if they encounter an obstacle, understand what certain objects are even if they’ve never seen them before, and, critically, take commands in human language, said Marc Segura, president of the robotics division at ABB, a Zurich-based automation provider.

Conversation is a big factor as robots move further into human spaces. Will Jackson, founder and CEO of robotics company Engineered Arts, believes that sectors like hospitality and entertainment are ripe for the introduction of robots that not only talk like humans but look like them as well."""
target_lang = "fr"

translated_text, summary = process_news_article(news_article, target_language=target_lang)
print("\nTranslated Text:", translated_text)
print("\nSummary:", summary)


Translating from en to fr...


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development



Summarizing article...



Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




Translated Text: Une fois relégués dans les usines et les entrepôts, les robots de la prochaine génération peuvent être utilisés dans les espaces publics – des magasins de détail aux musées – le nettoyage, la cuisson et même la conversation avec les humains. Améliorations dans le domaine du cerveau, notamment l'adoption de la technologie derrière ChatGPT, et une poussée d'investissement sont en train de conduire leurs débuts publics et 2025 pourrait être un tournant dans ce que les robots peuvent faire. Les opérateurs disent qu'ils attendent de déployer plus de robots publics. Le secteur de la robotique et des drones en 2024 avait attiré environ 12,8 milliards de dollars en capital-risque à la mi-décembre, en hausse de 11,6 milliards de dollars en 2023, selon la firme d'analyse PitchBook.

Summary: Une fois relégués dans les usines et les entrepts, les robots de la prochaine génération peuvent tre utilisés les espaces publics - des magasins de détail aux musées - le net
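
The summary above is cut off and drops some accented characters. A likely cause is that `google/pegasus-xsum` was fine-tuned on English news and is here given French text. One possible reordering is to summarize in English first and translate only the summary. The sketch below is a hedged illustration rather than the notebook's method: `summarize_then_translate` is a hypothetical name, and the three callables are passed in so the ordering can be tried without loading any models.

```python
def summarize_then_translate(article, target_language, detect, translate,
                             summarize, summary_language="en"):
    """Summarize in the summarizer's language, then translate the summary."""
    source_language = detect(article)

    # Bring the article into the summarizer's language if it is not there already.
    if source_language == summary_language:
        text = article
    else:
        text = translate(article, source_language, summary_language)

    summary = summarize(text)

    # Translate only the short summary into the requested language.
    if target_language != summary_language:
        summary = translate(summary, summary_language, target_language)
    return summary
```

With the notebook's functions this could be called as `summarize_then_translate(news_article, "fr", detect, translate_text, summarize_text)`. The full article is then never translated, so this variant returns only the summary.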
