# NewsAPI Financial News Analysis

This notebook demonstrates how to integrate with NewsAPI to retrieve and analyze financial news articles for investment research using LangChain.

## Overview

- **Service**: NewsAPI (https://newsapi.org/)
- **Purpose**: Retrieve and analyze financial news for investment research
- **Rate Limits**: 100 requests/day on the free Developer plan at the time of writing (confirm on https://newsapi.org/pricing)
- **Documentation**: https://newsapi.org/docs
- **Integration**: LangChain for data processing and analysis

## Prerequisites

1. Register for a free NewsAPI account
2. Obtain your API key
3. Install required dependencies
4. Set up environment variables


## Install Required Packages


In [7]:
%pip install requests pandas langchain langchain-text-splitters python-dotenv

Note: you may need to restart the kernel to use updated packages.





## Import Required Libraries

First, let's import all necessary libraries for NewsAPI integration and LangChain processing.


In [8]:
import os
import requests
import json
import pandas as pd
from datetime import datetime, timedelta
from typing import List, Dict, Any
import warnings
warnings.filterwarnings('ignore')

# LangChain imports (Document and the text splitter live in their own packages)
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

print("Libraries imported successfully")
print(f"Analysis timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Libraries imported successfully
Analysis timestamp: 2025-09-20 22:57:16


## Configuration and Setup

Configure NewsAPI credentials and connection parameters.


In [9]:
# NewsAPI Configuration
NEWSAPI_KEY = os.getenv('NEWSAPI_KEY', 'your_newsapi_key_here')
NEWSAPI_BASE_URL = "https://newsapi.org/v2"

# Test ticker for demonstration
TEST_TICKER = "AAPL"  # Apple Inc.

# Date range for news search (last 7 days)
end_date = datetime.now()
start_date = end_date - timedelta(days=7)

print("NewsAPI Configuration:")
print(f"Base URL: {NEWSAPI_BASE_URL}")
print(f"API Key configured: {'Yes' if NEWSAPI_KEY != 'your_newsapi_key_here' else 'No - Please set NEWSAPI_KEY environment variable'}")
print(f"Test ticker: {TEST_TICKER}")
print(f"Date range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")

NewsAPI Configuration:
Base URL: https://newsapi.org/v2
API Key configured: Yes
Test ticker: AAPL
Date range: 2025-09-13 to 2025-09-20
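
The configuration cell falls back to a placeholder string when `NEWSAPI_KEY` is unset, so a missing key only surfaces later as a failed request. As a minimal sketch, a fail-fast check could look like the following. `load_newsapi_key` is a hypothetical helper introduced here for illustration; it is not part of NewsAPI or LangChain.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER_KEY = "your_newsapi_key_here"  # same placeholder the notebook uses


def load_newsapi_key(env: Optional[Mapping[str, str]] = None,
                     var_name: str = "NEWSAPI_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper: `env` defaults to os.environ and exists mainly so the
    function can be exercised without touching the real environment.
    """
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(
            f"{var_name} is not set; export it or add it to a .env file "
            "before running the notebook."
        )
    return key
```

If the key lives in a `.env` file, calling `load_dotenv()` from python-dotenv (installed earlier) before this helper copies it into the process environment.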


## NewsAPI Client Class

Define a NewsAPI client class that handles request errors and converts articles into LangChain documents.


In [10]:
class NewsAPIClient:
    """Professional NewsAPI client for financial news retrieval."""
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = NEWSAPI_BASE_URL
        
    def get_news_for_ticker(self, ticker: str, from_date: str, to_date: str, 
                           page_size: int = 20) -> List[Dict[str, Any]]:
        """
        Retrieve news articles for a specific stock ticker.
        
        Args:
            ticker: Stock ticker symbol (e.g., 'AAPL')
            from_date: Start date in YYYY-MM-DD format
            to_date: End date in YYYY-MM-DD format
            page_size: Number of articles to retrieve (max 100)
            
        Returns:
            List of news articles with metadata
        """
        url = f"{self.base_url}/everything"
        
        params = {
            'q': f'{ticker} OR "{ticker}" stock shares earnings revenue profit',
            'from': from_date,
            'to': to_date,
            'sortBy': 'relevancy',
            'language': 'en',
            'pageSize': min(page_size, 100),
            'apiKey': self.api_key
        }
        
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            
            data = response.json()
            
            if data['status'] == 'ok':
                articles = data.get('articles', [])
                print(f"Retrieved {len(articles)} articles for {ticker}")
                return articles
            else:
                print(f"API Error: {data.get('message', 'Unknown error')}")
                return []
                
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            return []
        except json.JSONDecodeError as e:
            print(f"Failed to decode response: {e}")
            return []
    
    def format_articles_for_analysis(self, articles: List[Dict[str, Any]]) -> List[Document]:
        """
        Format news articles for LangChain processing.
        
        Args:
            articles: List of news articles from NewsAPI
            
        Returns:
            List of LangChain Document objects
        """
        documents = []
        
        for article in articles:
            # Combine title, description, and content for comprehensive analysis
            content_parts = []
            
            if article.get('title'):
                content_parts.append(f"Title: {article['title']}")
            
            if article.get('description'):
                content_parts.append(f"Description: {article['description']}")
            
            if article.get('content'):
                # Strip the "[+N chars]" truncation marker NewsAPI appends to shortened content
                content = article['content']
                if '[+' in content:
                    content = content.split('[+')[0].strip()
                content_parts.append(f"Content: {content}")
            
            full_content = "\n\n".join(content_parts)
            
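            # Create metadata for the document; NewsAPI may return null fields,
            # so fall back with `or` rather than relying on .get() defaults
            metadata = {
                'source': (article.get('source') or {}).get('name') or 'Unknown',
                'author': article.get('author') or 'Unknown',
                'published_at': article.get('publishedAt') or '',
                'url': article.get('url') or '',
                # NewsAPI articles have no 'ticker' field, so this stays 'Unknown'
                # unless the caller adds one to each article dict beforehand
                'ticker': article.get('ticker', 'Unknown')
            }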
            
            documents.append(Document(page_content=full_content, metadata=metadata))
        
        return documents

# Initialize the NewsAPI client
news_client = NewsAPIClient(NEWSAPI_KEY)
print("NewsAPI client initialized successfully")

NewsAPI client initialized successfully
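
The client's `q` parameter joins the ticker and the finance terms with spaces. NewsAPI's `q` parameter also accepts quoted phrases, the `AND` / `OR` / `NOT` keywords, and parentheses, so a stricter query can be expressed explicitly. The sketch below shows one way to build such a string; `build_ticker_query` is a hypothetical helper, and whether it returns more relevant articles for a given ticker is an assumption that would need checking against live results.

```python
from typing import Sequence


def build_ticker_query(ticker: str, context_terms: Sequence[str]) -> str:
    """Build a NewsAPI `q` string: exact ticker AND at least one context term."""
    quoted_ticker = f'"{ticker.strip().upper()}"'
    # Drop empty terms so the OR group never contains blanks.
    terms = [t.strip() for t in context_terms if t.strip()]
    if not terms:
        return quoted_ticker
    return f"{quoted_ticker} AND ({' OR '.join(terms)})"


query = build_ticker_query("AAPL", ["stock", "shares", "earnings", "revenue", "profit"])
print(query)
# "AAPL" AND (stock OR shares OR earnings OR revenue OR profit)
```

The resulting string could be passed as the `'q'` value in `get_news_for_ticker` in place of the space-separated query.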


## Retrieve Financial News Data

Now let's retrieve news articles for our test ticker and process them using LangChain.


In [11]:
# Retrieve news articles for the test ticker
print(f"Retrieving news articles for {TEST_TICKER}...")

articles = news_client.get_news_for_ticker(
    ticker=TEST_TICKER,
    from_date=start_date.strftime('%Y-%m-%d'),
    to_date=end_date.strftime('%Y-%m-%d'),
    page_size=10  # Limit for demonstration
)

if articles:
    print(f"Successfully retrieved {len(articles)} articles")
    
    # Display basic information about retrieved articles
    print("Article overview:")
    
    for i, article in enumerate(articles[:3], 1):  # Show first 3
        print(f"{i}. {article.get('title', 'No Title')}")
        print(f"   Source: {article.get('source', {}).get('name', 'Unknown')}")
        print(f"   Published: {article.get('publishedAt', 'Unknown')}")
        print()
    
    if len(articles) > 3:
        print(f"... and {len(articles) - 3} more articles")
        
else:
    print("No articles retrieved. Please check your API key and network connection.")

Retrieving news articles for AAPL...
Retrieved 10 articles for AAPL
Successfully retrieved 10 articles
Article overview:
1. This Little-Known AI Stock Is Up 70% in 2025 and Analysts Think It Can Rally Further From Here
   Source: Barchart.com
   Published: 2025-09-15T15:46:32Z

2. Top Stock Movers Now: Apple, FedEx, Lennar, and More
   Source: Investopedia
   Published: 2025-09-19T17:02:52Z

3. How Micron Stock Surges 2x To $300
   Source: Forbes
   Published: 2025-09-18T09:00:27Z

... and 7 more articles
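
Two of the three titles shown above do not appear to be primarily about Apple, which suggests the query returns off-topic articles. One possible post-filter, sketched here under the assumption that a mention in the title or description is a reasonable relevance signal, keeps only articles that name the ticker or the company. `filter_relevant_articles` is a hypothetical helper, not part of the notebook's client.

```python
from typing import Any, Dict, List, Sequence


def filter_relevant_articles(articles: List[Dict[str, Any]],
                             names: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep articles whose title or description mentions any of `names`."""
    needles = [n.lower() for n in names if n]
    kept = []
    for article in articles:
        # NewsAPI may return null for title/description, so fall back to "".
        haystack = " ".join(
            (article.get(field) or "") for field in ("title", "description")
        ).lower()
        if any(needle in haystack for needle in needles):
            kept.append(article)
    return kept


# Titles taken from the output above; descriptions are left empty here.
sample = [
    {"title": "Top Stock Movers Now: Apple, FedEx, Lennar, and More", "description": None},
    {"title": "How Micron Stock Surges 2x To $300", "description": None},
]
print([a["title"] for a in filter_relevant_articles(sample, ["AAPL", "Apple"])])
# ['Top Stock Movers Now: Apple, FedEx, Lennar, and More']
```

This uses plain substring matching, so a name like "Apple" would also match "Pineapple"; it is a coarse filter rather than entity recognition.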


## LangChain Processing and Analysis

Convert the retrieved articles to LangChain documents, then score each one with a simple keyword-based sentiment heuristic (no LLM call is made in this cell).


In [12]:
if articles:
    # Convert articles to LangChain documents
    print("Processing articles with LangChain...")
    
    documents = news_client.format_articles_for_analysis(articles)
    print(f"Converted {len(documents)} articles to LangChain documents")
    
    # Create a simple sentiment analysis function using keyword matching
    def analyze_sentiment(text: str) -> Dict[str, Any]:
        """
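        Simple sentiment analysis using keyword matching.
        Keywords are matched as substrings, so short ones such as 'up'
        also match inside longer words such as 'support'.
        In a production environment, you would use a proper LLM here.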
        """
        text_lower = text.lower()
        
        # Define sentiment keywords
        positive_keywords = [
            'positive', 'growth', 'increase', 'profit', 'earnings beat', 'strong',
            'bullish', 'gain', 'rise', 'up', 'success', 'better', 'good', 'excellent',
            'outperform', 'upgrade', 'buy', 'revenue growth', 'expansion'
        ]
        
        negative_keywords = [
            'negative', 'decline', 'decrease', 'loss', 'earnings miss', 'weak',
            'bearish', 'fall', 'drop', 'down', 'failure', 'worse', 'bad', 'poor',
            'underperform', 'downgrade', 'sell', 'revenue decline', 'contraction'
        ]
        
        neutral_keywords = [
            'stable', 'steady', 'unchanged', 'flat', 'hold', 'maintain',
            'neutral', 'mixed', 'uncertain', 'wait', 'monitor'
        ]
        
        # Count sentiment indicators
        positive_score = sum(1 for keyword in positive_keywords if keyword in text_lower)
        negative_score = sum(1 for keyword in negative_keywords if keyword in text_lower)
        neutral_score = sum(1 for keyword in neutral_keywords if keyword in text_lower)
        
        # Determine overall sentiment
        total_score = positive_score + negative_score + neutral_score
        
        if total_score == 0:
            sentiment = 'neutral'
            confidence = 0.0
        else:
            if positive_score > negative_score and positive_score > neutral_score:
                sentiment = 'positive'
                confidence = positive_score / total_score
            elif negative_score > positive_score and negative_score > neutral_score:
                sentiment = 'negative'
                confidence = negative_score / total_score
            else:
                sentiment = 'neutral'
                confidence = max(neutral_score, positive_score, negative_score) / total_score
        
        return {
            'sentiment': sentiment,
            'confidence': round(confidence, 2),
            'positive_signals': positive_score,
            'negative_signals': negative_score,
            'neutral_signals': neutral_score,
            'total_signals': total_score
        }
    
    # Analyze each document
    print(f"Sentiment Analysis Results for {TEST_TICKER}:")
    
    sentiment_results = []
    
    for i, doc in enumerate(documents[:3], 1):  # Analyze first 3 documents
        # Extract title from content
        content_lines = doc.page_content.split('\n\n')
        title = content_lines[0].replace('Title: ', '') if content_lines else 'No Title'
        
        print(f"Article {i}: {title[:80]}...")
        print(f"  Source: {doc.metadata.get('source', 'Unknown')}")
        
        # Perform sentiment analysis
        sentiment_result = analyze_sentiment(doc.page_content)
        sentiment_results.append(sentiment_result)
        
        print(f"  Sentiment: {sentiment_result['sentiment'].upper()} ({sentiment_result['confidence']:.2f})")
        print(f"  Signals: +{sentiment_result['positive_signals']} -{sentiment_result['negative_signals']} ={sentiment_result['neutral_signals']}")
    
    print(f"\nOverall Sentiment Summary for {TEST_TICKER}:")
    
    if sentiment_results:
        # Calculate overall sentiment
        total_positive = sum(r['positive_signals'] for r in sentiment_results)
        total_negative = sum(r['negative_signals'] for r in sentiment_results)
        total_neutral = sum(r['neutral_signals'] for r in sentiment_results)
        
        sentiment_counts = {
            'positive': sum(1 for r in sentiment_results if r['sentiment'] == 'positive'),
            'negative': sum(1 for r in sentiment_results if r['sentiment'] == 'negative'),
            'neutral': sum(1 for r in sentiment_results if r['sentiment'] == 'neutral')
        }
        
        avg_confidence = sum(r['confidence'] for r in sentiment_results) / len(sentiment_results)
        
        print(f"Articles analyzed: {len(sentiment_results)}")
        print(f"Sentiment distribution: {sentiment_counts['positive']}P {sentiment_counts['negative']}N {sentiment_counts['neutral']}Neu")
        print(f"Average confidence: {avg_confidence:.2f}")
        print(f"Total signals: +{total_positive} -{total_negative} ={total_neutral}")
        
        # Overall market sentiment
        if total_positive > total_negative:
            overall_sentiment = "POSITIVE"
        elif total_negative > total_positive:
            overall_sentiment = "NEGATIVE"
        else:
            overall_sentiment = "NEUTRAL"
            
        print(f"Overall market sentiment for {TEST_TICKER}: {overall_sentiment}")
    
else:
    print("No articles available for analysis.")

Processing articles with LangChain...
Converted 10 articles to LangChain documents
Sentiment Analysis Results for AAPL:
Article 1: This Little-Known AI Stock Is Up 70% in 2025 and Analysts Think It Can Rally Fur...
  Source: Barchart.com
  Sentiment: POSITIVE (1.00)
  Signals: +3 -0 =0
Article 2: Top Stock Movers Now: Apple, FedEx, Lennar, and More...
  Source: Investopedia
  Sentiment: NEUTRAL (0.00)
  Signals: +0 -0 =0
Article 3: How Micron Stock Surges 2x To $300...
  Source: Forbes
  Sentiment: NEUTRAL (0.00)
  Signals: +0 -0 =0

Overall Sentiment Summary for AAPL:
Articles analyzed: 3
Sentiment distribution: 1P 0N 2Neu
Average confidence: 0.33
Total signals: +3 -0 =0
Overall market sentiment for AAPL: POSITIVE
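
The `keyword in text_lower` test in `analyze_sentiment` is a substring check, so `'up'` also fires on "support" or "update" and `'gain'` on "against". A possible refinement, shown as a sketch rather than a drop-in replacement, counts keywords only when they appear as whole words or phrases. `count_keyword_hits` is a hypothetical helper name.

```python
import re
from typing import Iterable


def count_keyword_hits(text: str, keywords: Iterable[str]) -> int:
    """Count how many keywords occur in `text` as whole words or phrases."""
    text_lower = text.lower()
    hits = 0
    for keyword in keywords:
        # \b anchors keep 'up' from matching inside 'support' or 'update'.
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, text_lower):
            hits += 1
    return hits


print(count_keyword_hits("Apple will update its support pages", ["up", "gain"]))  # 0
print(count_keyword_hits("The stock is up after an earnings beat", ["up", "earnings beat"]))  # 2
```

The trade-off is that whole-word matching no longer catches inflected forms such as "gains" or "rises", so the keyword lists would need those variants added explicitly.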


## Data Export and Visualization

Export the scored articles to CSV and print summary statistics for further analysis.
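
The export cell labels the overall signal balance with a 1.5x rule: bullish when positive signals exceed 1.5 times the negative ones, bearish in the mirrored case, neutral otherwise. The sketch below isolates that rule so its edge cases are easy to see. The `min_signals` parameter is an assumption added for illustration and is not in the notebook; with its default of 0 the function matches the notebook's behavior.

```python
def classify_signal_balance(positive: int, negative: int,
                            ratio: float = 1.5, min_signals: int = 0) -> str:
    """Label the balance of keyword signals as BULLISH, BEARISH, or NEUTRAL."""
    # Hypothetical guard: treat very small samples as inconclusive.
    if positive + negative < min_signals:
        return "NEUTRAL"
    if positive > negative * ratio:
        return "BULLISH"
    if negative > positive * ratio:
        return "BEARISH"
    return "NEUTRAL"


print(classify_signal_balance(3, 0))                 # BULLISH
print(classify_signal_balance(3, 2))                 # NEUTRAL
print(classify_signal_balance(0, 0))                 # NEUTRAL
print(classify_signal_balance(1, 0, min_signals=5))  # NEUTRAL
```

Without the guard, a single positive keyword and no negative ones is enough to produce a bullish label, which is worth keeping in mind when only a few articles are scored.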


In [13]:
if articles and 'sentiment_results' in locals():
    # Create data directory if it doesn't exist (os is already imported above)
    data_dir = "data"
    os.makedirs(data_dir, exist_ok=True)
    
    # Create a DataFrame for easier analysis and export
    news_data = []
    
    # zip stops at the shorter list, so only the analyzed articles are paired
    for article, sentiment in zip(articles, sentiment_results):
        news_data.append({
            'ticker': TEST_TICKER,
            'title': article.get('title', 'No Title'),
            'source': article.get('source', {}).get('name', 'Unknown'),
            'published_at': article.get('publishedAt', 'Unknown'),
            'url': article.get('url', 'No URL'),
            'sentiment': sentiment['sentiment'],
            'confidence': sentiment['confidence'],
            'positive_signals': sentiment['positive_signals'],
            'negative_signals': sentiment['negative_signals'],
            'neutral_signals': sentiment['neutral_signals'],
            'total_signals': sentiment['total_signals']
        })
    
    df = pd.DataFrame(news_data)
    
    print(f"News Analysis Summary for {TEST_TICKER}")
    print(df[['title', 'source', 'sentiment', 'confidence']].to_string(index=False))
    
    # Create summary statistics
    print("\nSentiment Distribution:")
    sentiment_dist = df['sentiment'].value_counts()
    for sentiment, count in sentiment_dist.items():
        percentage = (count / len(df)) * 100
        print(f"{sentiment.capitalize()}: {count} articles ({percentage:.1f}%)")
    
    print("\nSignal Analysis:")
    total_positive_signals = df['positive_signals'].sum()
    total_negative_signals = df['negative_signals'].sum()
    total_neutral_signals = df['neutral_signals'].sum()
    
    print(f"Total signals: +{total_positive_signals} -{total_negative_signals} ={total_neutral_signals}")
    
    # Calculate signal ratio
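    if total_negative_signals > 0:
        signal_ratio = total_positive_signals / total_negative_signals
        print(f"Positive/Negative ratio: {signal_ratio:.2f}")
    elif total_positive_signals > 0:
        print("Positive/Negative ratio: Infinite (no negative signals)")
    else:
        # With no signals at all the ratio is undefined, not infinite
        print("Positive/Negative ratio: N/A (no signals detected)")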
    
    # Save to CSV for further analysis
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    output_filename = os.path.join(data_dir, f"newsapi_analysis_{TEST_TICKER}_{timestamp}.csv")
    df.to_csv(output_filename, index=False)
    print(f"\nData exported to: {output_filename}")
    
    print(f"\nInvestment Research Summary for {TEST_TICKER}:")
    print(f"Analysis period: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
    print(f"Articles analyzed: {len(df)}")
    print(f"Average sentiment confidence: {df['confidence'].mean():.2f}")
    
    # Generate investment insight
    if total_positive_signals > total_negative_signals * 1.5:
        insight = "BULLISH - Strong positive sentiment detected"
    elif total_negative_signals > total_positive_signals * 1.5:
        insight = "BEARISH - Strong negative sentiment detected"
    else:
        insight = "NEUTRAL - Mixed or balanced sentiment"
    
    print(f"Market sentiment indicator: {insight}")
    
    # Key sources contributing to sentiment
    source_sentiment = df.groupby('source')['sentiment'].apply(list).to_dict()
    print("Key news sources:")
    for source, sentiments in source_sentiment.items():
        positive_count = sentiments.count('positive')
        negative_count = sentiments.count('negative')
        neutral_count = sentiments.count('neutral')
        print(f"  {source}: {positive_count}P {negative_count}N {neutral_count}Neu")

else:
    print("No data available for export and visualization.")

News Analysis Summary for AAPL
                                                                                         title       source sentiment  confidence
This Little-Known AI Stock Is Up 70% in 2025 and Analysts Think It Can Rally Further From Here Barchart.com  positive         1.0
                                          Top Stock Movers Now: Apple, FedEx, Lennar, and More Investopedia   neutral         0.0
                                                            How Micron Stock Surges 2x To $300       Forbes   neutral         0.0

Sentiment Distribution:
Neutral: 2 articles (66.7%)
Positive: 1 articles (33.3%)

Signal Analysis:
Total signals: +3 -0 =0
Positive/Negative ratio: Infinite (no negative signals)

Data exported to: data\newsapi_analysis_AAPL_20250920_225716.csv

Investment Research Summary for AAPL:
Analysis period: 2025-09-13 to 2025-09-20
Articles analyzed: 3
Average sentiment confidence: 0.33
Market sentiment indicator: BULLISH - Strong positive sentiment de