# News Aggregator

## Features
- **Top Articles Retrieval**: Scrapes top articles from predefined news sources.
- **Article Summarization**: Summarizes articles to provide concise representations.
- **Sentiment Detection**: Classifies each article's sentiment as positive, negative, or neutral.


In [1]:
import newspaper
import json

newspaper_list = [ 'https://time.com/',
        'https://www.theguardian.com/europe',
        'https://edition.cnn.com/' ]

### Top Articles Retrieval
The script scrapes the top articles from the news sources listed in `newspaper_list`. The `get_top_articles` function uses the `newspaper` library to build a source object from a newspaper URL, then downloads and parses the first `num_articles` articles it finds (5 by default) and returns them paired with their URLs. For each article, the script records the title, publication date, text content, and URL, which produces a single list of recent articles across all sources.

In [9]:
def get_top_articles(newspaper_url, num_articles=5):
    # Build a source object that discovers the article URLs on the site
    paper = newspaper.build(newspaper_url, number_threads=3)

    # Keep only the first num_articles articles and remember their URLs
    top_articles = paper.articles[:num_articles]
    article_urls = [article.url for article in top_articles]

    # Download and parse each selected article
    for article in top_articles:
        article.download()
        article.parse()

    # Pair each URL with its parsed article
    return zip(article_urls, top_articles)

articles_info = []

for news_url in newspaper_list:
    top_articles = get_top_articles(news_url, num_articles=5)
  
    # Collect the metadata of each article
    for url, article in top_articles:
        single_info = {
            "newspaper": str(news_url),
            "title": article.title,
            "date": article.publish_date.strftime('%Y-%m-%d') if article.publish_date else None,
            "text" : article.text,
            "url": str(url)
        }
        articles_info.append(single_info)

# Serialize the list to a JSON string
articles_json = json.dumps(articles_info)

# Print the first record to check that it is correctly structured
print(articles_info[0])

{'newspaper': 'https://www.theguardian.com/europe', 'title': 'More tax cuts, fewer green policies: key takeaways from the Tories’ manifesto', 'date': '2024-06-11', 'text': '1. Tax cuts, again\n\nThe overriding theme of the manifesto and the launch – and, indeed, of the whole Tory campaign – has been an attempt to contrast promised tax cuts with the idea, however contested, of Labour tax rises.\n\nThe manifesto thus produced the mini-surprise of a pledge to abolish national insurance contributions (NICs) for self-employed people, plus the briefed-out plan to cut 2p more from NICs for employed workers, and the slightly vaguer promise to eventually phase out NICs “when it is affordable to do so”.\n\nWhile there were other pre-billed cuts, for example to stamp duty and a higher threshold for child benefits, there was nothing on inheritance tax, as sought by some Tory MPs.\n\n2. No rabbit, no hat\n\nWhile one Tory spinner made the reasonable point that the purpose of a manifesto is not simp
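
The record-building step above can be isolated and checked without network access. The following is a minimal sketch, assuming a helper named `article_to_record` (hypothetical, not part of the original script) and a stand-in object with made-up values in place of a parsed article:

```python
import json
from datetime import datetime
from types import SimpleNamespace

def article_to_record(newspaper_url, article):
    """Build a JSON-serializable dict from an object exposing
    title, publish_date, text and url attributes."""
    publish_date = getattr(article, "publish_date", None)
    return {
        "newspaper": str(newspaper_url),
        "title": article.title,
        # publish_date can be missing, in which case the date is None
        "date": publish_date.strftime("%Y-%m-%d") if publish_date else None,
        "text": article.text,
        "url": str(article.url),
    }

# Stand-in for a parsed article (hypothetical values, not scraped data)
sample = SimpleNamespace(
    title="Example headline",
    publish_date=datetime(2024, 6, 11),
    text="Body text.",
    url="https://example.com/story",
)

record = article_to_record("https://example.com/", sample)
print(json.dumps(record, indent=2))
```

Keeping this step in its own function makes it possible to test the output structure separately from the scraping itself.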

### Article Summarization
The `summarize` function takes an article URL and uses the `newspaper` library to download and parse the article and then run its natural language processing (NLP) step, which produces an extractive summary built from sentences of the article. The summary condenses the article's main points so that readers can grasp the essentials without reading the full text.

In [11]:
def summarize(article_url):
    article = newspaper.article(article_url)
    article.download()
    article.parse()
    article.nlp()
    return article.summary

# Example usage

summarize(articles_info[0]['url'])

'Tax cuts, again The overriding theme of the manifesto and the launch – and, indeed, of the whole Tory campaign – has been an attempt to contrast promised tax cuts with the idea, however contested, of Labour tax rises.\nWhile there were other pre-billed cuts, for example to stamp duty and a higher threshold for child benefits, there was nothing on inheritance tax, as sought by some Tory MPs.\nOthers fallen bills would be resurrected, such as the long-delayed plan to ban no-fault evictions from residential properties, and reform of the leasehold system.\nBut not everything makes it One long-held Tory idea that has not been revived is the ban on so-called conversion therapies.\nWhile saying they are “abhorrent”, the manifesto adds: “But legislation around conversion practices is a very complex issue, with existing criminal law already offering robust protections.”'
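
To give an intuition for extractive summarization, here is a simplified, self-contained sketch. It is not the algorithm used by `newspaper`: the function `extractive_summary`, the small `STOPWORDS` set, and the scoring rule (average word frequency per sentence) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real tools use larger lists)
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "were", "it", "that", "this", "with", "as", "by", "at", "be",
}

def _content_words(text):
    """Lowercase words of the text, without stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences of text, in their original order."""
    # Naive sentence split: ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole text
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

demo_text = (
    "Tax cuts dominate the manifesto. The weather was mild. "
    "Tax cuts for workers are promised. Nobody mentioned sport."
)
print(extractive_summary(demo_text))
```

Because the summary is assembled from existing sentences, it can read as slightly disjointed, which matches the style of the output shown above.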

### Sentiment Detection
The script includes a `detect_sentiment` function that classifies the sentiment of an article's text. It uses the `transformers` library to load a sentiment analysis model fine-tuned on English text (`distilbert-base-uncased-finetuned-sst-2-english`). The model returns a label together with a confidence score, and the function maps this result to `'positive'`, `'negative'`, or `'neutral'`. The example call passes only the first 511 characters of the article text, which keeps the input within the model's 512-token limit.

In [12]:
from transformers import pipeline

def detect_sentiment(text):
    # Load sentiment analysis model with explicit model and revision
    sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", revision="af0f99b")

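    # Perform sentiment analysis on the text
    result = sentiment_model(text)[0]

    # 'score' is the confidence in the predicted label (POSITIVE or NEGATIVE),
    # so convert it to the probability of the positive class
    positive_prob = result['score'] if result['label'] == 'POSITIVE' else 1 - result['score']

    # Map the positive-class probability to a sentiment
    if positive_prob >= 0.6:
        return 'positive'
    elif positive_prob <= 0.4:
        return 'negative'
    else:
        return 'neutral'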
    
detect_sentiment(articles_info[0]['text'][:511])

'positive'
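
The call above looks only at the beginning of the article. One possible way to cover the whole text is to classify it chunk by chunk and take a majority vote. The sketch below assumes two hypothetical helpers, `split_into_chunks` and `majority_sentiment`, which are not part of the original script; it also assumes that limiting chunks by character count is an acceptable stand-in for the model's token limit.

```python
from collections import Counter

def split_into_chunks(text, max_chars=511):
    """Split text on whitespace into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for word in text.split():
        # Hard-truncate oversized words so every chunk respects the limit
        word = word[:max_chars]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

def majority_sentiment(labels):
    """Return the most common label, or 'neutral' for no labels or a tie."""
    if not labels:
        return "neutral"
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]

print(split_into_chunks("aa bb cc dd", max_chars=5))
print(majority_sentiment(["positive", "negative", "positive"]))
```

With these helpers, the per-article label could be computed as `majority_sentiment([detect_sentiment(chunk) for chunk in split_into_chunks(text)])`, at the cost of one model call per chunk.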