# News Aggregator

## Features
- **Top Articles Retrieval**: Scrapes top articles from predefined news sources.
- **Article Summarization**: Summarizes articles to provide concise representations.
- **sentiment Detection**: Determines the sentiment of articles based on sentiment analysis.


In [1]:
import newspaper
import json

newspaper_list = [ 'https://time.com/',
        'https://www.theguardian.com/europe',
        'https://edition.cnn.com/' ]

### Top Articles Retrieval
The script scrapes the top articles from predefined news sources listed in the `newspaper_list`. It utilizes the `newspaper` library to build newspaper objects from these URLs. The `get_top_articles` function takes a newspaper URL and retrieves a specified number of top articles (default is 5) from it. It then extracts relevant information from each article, including its title, publication date, text content, and URL. This feature enables the aggregation of recent articles from multiple news sources.

In [3]:
def get_top_articles(newspaper_url, num_articles=5):
    paper = newspaper.build(newspaper_url,number_threads=3)
    top_articles = []

    article_urls = [article.url for article in paper.articles[:num_articles]]

    for article in paper.articles[:num_articles]:
        article.download()
        article.parse()
        top_articles.append(article)
        article.url
    
    return zip(article_urls, top_articles)

articles_info = []

for news_url in newspaper_list:
    top_articles = get_top_articles(news_url, num_articles=5)
  
    # Display article titles
    for url, article in top_articles:
        single_info = {
            "newspaper": str(news_url),
            "title": article.title,
            "date": article.publish_date.strftime('%Y-%m-%d') if article.publish_date else None,
            "text" : article.text,
            "url": str(url)
        }
        articles_info.append(single_info)

# Convert the list to a JSON object
articles_json = json.dumps(articles_info)

# Print the JSON object
print(articles_json)

  if feed.doc:




### Article Summarization
For article summarization, the script employs the `summarize` function. This function takes an article URL as input and utilizes the `newspaper` library to download, parse, and perform natural language processing (NLP) on the article. By leveraging NLP, it generates a summary of the article's content. This summary provides a condensed representation of the article's main points, making it easier for users to grasp the essential information without having to read the entire article.

In [6]:
def summarize(article_url):
    article = newspaper.article(article_url)
    article.download()
    article.parse()
    article.nlp()
    return article.summary

#Example usage

summarize(articles_info[0]['url'])

'If you were on Twitter or TikTok over the weekend, you might have seen people talking about Project 2025.\nLed by the right-wing think tank the Heritage Foundation, Project 2025 is a presidential transition operation—basically a government-in-waiting if former President Donald Trump returns to office on Jan. 20, 2025.\nProject 2025 said on its website that the handbook is “the next conservative President’s last opportunity to save our republic.” “It is not enough for conservatives to win elections,” Project 2025 said on its website.\nThat is the goal of the 2025 Presidential Transition Project.” “With the right conservative policy recommendations and properly vetted and trained personnel to implement them, we will take back our government,” the project continued.\nMany critics have labeled Project 2025 as “authoritarian.” The project relies on what legal scholars call the unitary executive theory, which dismisses the idea that there are three separate branches of government for checks

### Sentiment Detection
The script includes a `detect_sentiment` function that determines the political bias of articles based on sentiment analysis. It utilizes the `transformers` library to load a sentiment analysis model pretrained on English text. The sentiment analysis model assigns a sentiment score to the text, indicating the overall sentiment (positive, negative, or neutral). Based on this score, the function categorizes the article's political bias as 'left-leaning,' 'right-leaning,' or 'neutral.'

In [10]:
from transformers import pipeline

def detect_sentiment(text):
    # Load sentiment analysis model with explicit model and revision
    sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", revision="af0f99b")

    # Perform sentiment analysis on the text
    sentiment = sentiment_model(text)

    # Based on sentiment score, determine sentiment
    sentiment_score = sentiment[0]['score']
    if sentiment_score >= 0.6:
        return 'positive'
    elif sentiment_score <= 0.4:
        return 'negative'
    else:
        return 'neutral'
    
detect_sentiment(articles_info[0]['text'][:511])

'left-leaning'