# Financial News Sentiment Analysis Tool

## Overview

This Python tool analyzes the sentiment of financial news articles using Google's Gemini API. It processes news articles from JSON files, extracts sentiment scores across multiple dimensions, and outputs the results to CSV files for further analysis.

## Features

- Multi-dimensional sentiment analysis using Gemini AI
- Rate limiting to comply with API constraints
- Batch processing, with progress saved after each batch
- Progress tracking so that interrupted runs can resume
- Date filtering to focus on specific time periods
- Comprehensive sentiment metrics including:
  - Basic polarity detection
  - Fine-grained sentiment analysis
  - Aspect-based sentiment analysis for financial dimensions
  - Topic-sentiment analysis
  - Emotion detection
  - Intent analysis
  - Subjectivity/objectivity detection
  - Contextual sentiment analysis

## Requirements

- Python 3.8+
- Google Gemini API key
- Required Python packages:
  - pandas
  - google-genai (the code imports it as `from google import genai`)
  - tqdm
  - Other standard libraries (json, time, os, logging, glob, re, etc.)

## Installation

1. Install required packages:
   ```bash
   pip install pandas tqdm google-genai
   ```

2. Set up your Gemini API key:
   - Put your key in the `API_KEY` constant near the top of the code
   - Prefer loading it from an environment variable (for example `GEMINI_API_KEY`) so the key never ends up in version control

3. Prepare your directory structure:
   ```
   project_root/
   ├── data/
   │   └── news_data/           # Where news JSON files are stored
   ├── log/
   │   └── sentiment/
   │       └── news/            # Where logs will be saved
   ├── progress/
   │   └── news/                # Where progress tracking files will be saved
   └── sentiment_results/
       └── news/                # Where results will be saved
   ```
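
The layout above can be created in one step. The following is a minimal sketch, assuming it is run from the project root; `prepare_project`, `load_api_key`, and the `GEMINI_API_KEY` variable name are illustrative choices, not part of the tool itself.

```python
import os
from pathlib import Path

# Directories from the layout above, relative to the project root (hypothetical helper list)
PROJECT_DIRS = [
    "data/news_data",
    "log/sentiment/news",
    "progress/news",
    "sentiment_results/news",
]

def prepare_project(root="."):
    """Create the expected directory tree; existing directories are left untouched."""
    created = []
    for rel in PROJECT_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created

def load_api_key(var_name="GEMINI_API_KEY"):
    """Read the API key from an environment variable (assumed name) instead of the source."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Gemini API key")
    return key
```

Calling `prepare_project()` once is enough; `load_api_key()` fails fast with a clear message rather than letting the first API request fail.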

## Input Data Format

The tool expects JSON files in the `data/news_data/` directory named `[TICKER]_news.json`; the ticker is taken from the filename. Each JSON file should contain an array of news articles with at least the following fields (articles with an unparseable date or empty content are skipped):

```json
[
  {
    "title": "Article title",
    "content": "Full article content...",
    "date": "YYYY-MM-DD",
    "link": "URL to the original article"
  }
]
```
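
Before a long run, it can help to check that a file matches this format. The sketch below is a hypothetical validator (`validate_news_items` and `validate_news_file` are not part of the tool); it checks only the four fields listed above and flags empty content, which the tool would skip anyway.

```python
import json

REQUIRED_FIELDS = ("title", "content", "date", "link")

def validate_news_items(items):
    """Return a list of (index, problem) tuples; an empty list means the data looks usable."""
    if not isinstance(items, list):
        return [(None, "top-level value must be a JSON array")]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append((i, "article is not a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            problems.append((i, "missing fields: " + ", ".join(missing)))
        elif not isinstance(item["content"], str) or not item["content"].strip():
            problems.append((i, "content is empty; the tool would skip this article"))
    return problems

def validate_news_file(path):
    """Load one [TICKER]_news.json file and validate its articles."""
    with open(path, "r", encoding="utf-8") as f:
        return validate_news_items(json.load(f))
```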

## Usage

Run the script directly:

```bash
python sentiment_analysis.py
```

The script will prompt you for:
- Batch size (how many articles to process at once)
- Whether to force reprocessing of already processed articles
- Number of rows to process per file (for testing or limiting processing)

To integrate the tool into another script:

```python
from sentiment_analysis import process_multiple_news_files

# Process all news files with custom settings
process_multiple_news_files(
    num_rows=100,             # Process only 100 rows per file (None for all)
    force_reprocess=False,    # Skip already processed items
    batch_size=20             # Process 20 items at a time
)

# Process specific files
files = ["data/news_data/AAPL_news.json", "data/news_data/MSFT_news.json"]
process_multiple_news_files(
    files=files,
    num_rows=None,
    force_reprocess=True,
    batch_size=10
)
```

## Output Format

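The tool generates one CSV file per ticker in the `sentiment_results/news/` directory, named `news_sentiment_[TICKER].csv`. Each row is one article. Any score the model does not return is written as 0.0, so a 0.0 can mean "not reported" as well as "neutral". Each CSV file contains:

- Basic article information (ticker, date, title, link)
- Sentiment scores, each requested on a scale from -1 to 1, including:
  - Overall polarity
  - Emotion, intent, and subjectivity/objectivity scores
  - Average scores for fine-grained, aspect-based, topic, and contextual sentiment
  - Gemini's overall, emotional, and contextual scores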
  - Individual aspect scores (financial performance, company operations, etc.)
  - Individual topic scores
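
A common next step is aggregating article scores per day. The following is a minimal pandas sketch, assuming a CSV produced by the tool; `daily_sentiment` is a hypothetical helper, and averaging `polarity_detection` is only one possible choice of column.

```python
import pandas as pd

def daily_sentiment(df, score_col="polarity_detection"):
    """Average one score column per ticker and day, with the number of articles behind it."""
    daily = df.groupby(["ticker", "date"], as_index=False).agg(
        mean_score=(score_col, "mean"),
        n_articles=(score_col, "size"),
    )
    return daily.sort_values(["ticker", "date"]).reset_index(drop=True)

# Usage, assuming the tool has already written a file for AAPL:
# df = pd.read_csv("sentiment_results/news/news_sentiment_AAPL.csv")
# print(daily_sentiment(df).head())
```

Because missing scores are stored as 0.0, a day with few articles can be pulled toward zero; keeping `n_articles` next to the mean makes that visible.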

## Configuration Options

Key constants that can be modified in the code:

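- `MODEL`: Gemini model name (default: `gemini-2.0-flash`)
- `MAX_REQUESTS_PER_MINUTE`: Maximum requests in any 60-second window (default: 1900)
- `MAX_REQUESTS_PER_DAY`: Maximum requests per run (default: 1000000); the counter is kept in memory and resets when the script restarts, not at midnight
- `REQUEST_DELAY`: Delay slept before every request (default: 0.5 seconds)
- `MAX_TEXT_LENGTH`: Maximum characters of title plus content sent to the API; longer text is truncated (default: 60000)
- `MIN_DATE` and `MAX_DATE`: Inclusive date range for filtering news (default: 2023-01-01 to 2024-12-31)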
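
The date filter normalizes each article's `date` with `pd.to_datetime` and compares the resulting `YYYY-MM-DD` strings against `MIN_DATE` and `MAX_DATE`. The sketch below reproduces that check in isolation; `in_date_range` is a hypothetical name. String comparison is safe here because zero-padded ISO dates sort in calendar order.

```python
import pandas as pd

MIN_DATE = "2023-01-01"
MAX_DATE = "2024-12-31"

def in_date_range(raw_date, min_date=MIN_DATE, max_date=MAX_DATE):
    """Normalize raw_date to YYYY-MM-DD and test it against the inclusive range."""
    try:
        day = pd.to_datetime(raw_date).strftime("%Y-%m-%d")
    except Exception:
        # Unparseable or empty dates (pandas returns NaT for "") are dropped
        return False
    # Zero-padded ISO dates compare correctly as plain strings
    return min_date <= day <= max_date
```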

## Technical Details

### Main Components

#### RateLimiter Class
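Enforces three limits before each API call: at most `MAX_REQUESTS_PER_MINUTE` requests in a sliding 60-second window (waiting if the window is full), a fixed `REQUEST_DELAY` sleep, and a cap of `MAX_REQUESTS_PER_DAY` requests per run. Once the cap is reached, further requests are skipped rather than delayed.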
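
The sliding-window part of this logic can be sketched on its own. The class below is a hypothetical, test-friendly variant: the clock and sleep functions are injectable, it uses `time.monotonic` (unaffected by wall-clock changes) instead of the `time.time` used in the code, and it omits the per-run cap and `REQUEST_DELAY`.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch: allow at most max_per_window calls in any window_s-second span.
    clock/sleep are injectable so the logic can be tested without real waiting."""

    def __init__(self, max_per_window, window_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # start times of calls still inside the window

    def acquire(self):
        """Wait until a call is allowed, record it, and return the seconds waited."""
        now = self.clock()
        # Forget calls that have aged out of the window
        while self.stamps and now - self.stamps[0] >= self.window_s:
            self.stamps.popleft()
        waited = 0.0
        if len(self.stamps) >= self.max_per_window:
            # Sleep until the oldest call leaves the window, then forget it
            waited = self.window_s - (now - self.stamps[0])
            self.sleep(waited)
            self.stamps.popleft()
        self.stamps.append(self.clock())
        return waited
```

In use, `limiter = SlidingWindowLimiter(MAX_REQUESTS_PER_MINUTE)` would be created once and `limiter.acquire()` called before each request.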

#### Key Functions

- `analyze_sentiment_with_gemini(text)`: Sends text to Gemini API and retrieves sentiment analysis
- `extract_json_from_response(response_text)`: Extracts the outermost `{...}` block from the API response, removes trailing commas, and parses it as JSON
- `flatten_sentiment_json(json_obj)`: Converts nested JSON to flat dictionary for CSV output
- `analyze_sentiment_batch(batch_items, filtered_news, retry_limit)`: Processes a batch of articles
- `process_news_file(file_path, ...)`: Processes all articles in a single news file
- `process_multiple_news_files(files, ...)`: Processes multiple news files
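
`flatten_sentiment_json` maps the model's free-form aspect labels onto seven fixed column names by normalizing each label and searching for a predefined name inside it. The sketch below isolates that matching step (`match_category` is a hypothetical name; the topic path in the code does the same but does not replace `:`). It shows that sub-aspect labels and unusual spacing are dropped, and if several labels map to one category, the code keeps the last score it sees.

```python
PREDEFINED = [
    "financial_performance", "company_operations", "growth_and_strategy",
    "market_industry", "management_leadership", "stock_valuation", "risks_challenges",
]

def match_category(label):
    """Return the predefined category contained in the normalized label, or None."""
    normalized = label.lower().replace("/", "_").replace(" ", "_").replace(":", "_")
    for name in PREDEFINED:
        if name in normalized:
            return name
    return None  # label is ignored, so its column keeps the 0.0 default
```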

### Sentiment Analysis Prompt

The tool uses a detailed prompt that instructs the Gemini API to analyze sentiment across multiple dimensions:

1. Polarity Detection: Overall positive/negative sentiment
2. Fine-Grained Sentiment Analysis: Individual scores for different sections
3. Aspect-Based Sentiment Analysis: Scores for financial aspects (performance, operations, etc.)
4. Topic-Sentiment Analysis: Scores for key topics
5. Emotion Detection: Dominant emotions in the article
6. Intent Analysis: Purpose of the article
7. Subjectivity/Objectivity Detection: How factual vs. opinionated the article is
8. Contextual Sentiment Analysis: Sentiment across different contexts
9. Deep Learning-Based Analysis: Gemini's own sentiment assessment
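
These nine methods correspond to the nine top-level keys of the JSON structure the prompt requests. Since absent scores silently become 0.0 in the CSV, a check like the following hypothetical `missing_sections` helper could flag incomplete responses before they are stored.

```python
EXPECTED_SECTIONS = {
    "polarity_detection", "fine_grained_sentiment", "aspect_based_sentiment",
    "topic_sentiment_analysis", "emotion_detection", "intent_analysis",
    "subjectivity_objectivity", "contextual_sentiment", "gemini_analysis",
}

def missing_sections(parsed):
    """Return the expected top-level keys absent from a parsed response, sorted."""
    if not isinstance(parsed, dict):
        return sorted(EXPECTED_SECTIONS)
    return sorted(EXPECTED_SECTIONS.difference(parsed))
```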

### Error Handling and Retry Logic

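Errors are handled as follows:
- A failed API call or an unparseable response is retried up to `retry_limit` times (default: 3), with a 3-second pause after each failed attempt
- JSON parsing errors are logged and treated as a failed attempt instead of stopping the run
- Articles that still fail after all retries are logged and left out of the output CSV
- The progress file is updated after each batch, so an interrupted run can resume where it stopped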
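
The retry loop in `analyze_sentiment_batch` follows a fixed-delay pattern. The sketch below expresses it as a reusable function; `call_with_retries` is a hypothetical name, and unlike the loop in the code it skips the pause after the final attempt. The `sleep` argument is injectable so the behavior can be checked without waiting.

```python
import logging
import time

logger = logging.getLogger(__name__)

def call_with_retries(fn, retry_limit=3, wait_s=3.0, sleep=time.sleep):
    """Call fn() up to retry_limit times; an exception or a falsy result counts as failure.
    Returns the first truthy result, or None if every attempt fails."""
    for attempt in range(1, retry_limit + 1):
        try:
            result = fn()
            if result:
                return result
            logger.warning("Attempt %d/%d returned no result", attempt, retry_limit)
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, retry_limit, exc)
        if attempt < retry_limit:
            sleep(wait_s)  # fixed pause between attempts
    return None
```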

## Limitations

- Dependent on Gemini API availability and quotas
- Processing large volumes of news can be time-consuming
- API costs can accumulate with large datasets
- Sentiment analysis quality depends on the Gemini model's capabilities
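
Processing time can be estimated up front. Because articles are analyzed one at a time and `RateLimiter.check_and_wait()` sleeps `REQUEST_DELAY` before every request, the default 0.5-second delay alone caps throughput at 120 requests per minute, well below the 1900-per-minute limit. The sketch below does that arithmetic; both function names and the `api_latency_s` value are assumptions for illustration, and the 3-second pause after failed attempts is not modeled.

```python
def sequential_rpm_ceiling(request_delay_s=0.5, api_latency_s=0.0):
    """Most requests per minute a one-at-a-time loop can issue when every call
    sleeps request_delay_s and then waits api_latency_s for the response."""
    return 60.0 / (request_delay_s + api_latency_s)

def estimate_runtime_minutes(n_articles, request_delay_s=0.5, api_latency_s=0.0, attempts_per_article=1):
    """Rough lower bound on wall-clock minutes to process n_articles sequentially."""
    seconds_per_call = request_delay_s + api_latency_s
    return n_articles * attempts_per_article * seconds_per_call / 60.0

print(sequential_rpm_ceiling())                            # 120.0
print(estimate_runtime_minutes(12000))                     # 100.0
print(estimate_runtime_minutes(12000, api_latency_s=1.5))  # 400.0
```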

## Troubleshooting

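- **API Key Issues**: Ensure your API key is valid and has sufficient quota; the script sends a short test request at startup and stops if it fails
- **Rate Limiting**: If the API reports quota or rate-limit errors, lower `MAX_REQUESTS_PER_MINUTE` or raise `REQUEST_DELAY` to match your plan
- **Memory Issues**: Each news file is loaded into memory in full, so split very large JSON files into smaller ones
- **JSON Parsing Errors**: Such responses are retried automatically; if failures persist, log the raw `response_text` in `analyze_sentiment_batch` to see what the model returned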

import pandas as pd
import json
import time
import os
import logging
import glob
import re
from datetime import datetime
from collections import deque
from tqdm import tqdm
from google import genai

# Set up logging
log_dir = "log/sentiment/news"
os.makedirs(log_dir, exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(f'{log_dir}/gemini_news_sentiment_{datetime.now().strftime("%Y%m%d_%H%M%S")}.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Gemini API configuration
# Read the key from an environment variable so it is never committed to source control
API_KEY = os.environ.get("GEMINI_API_KEY", "YOUR_API_KEY_HERE")
MODEL = "gemini-2.0-flash"

# Set up Gemini client
client = genai.Client(api_key=API_KEY)

# Constants for API limits and processing
MAX_REQUESTS_PER_MINUTE = 1900  # Requests allowed in any 60-second window
MAX_REQUESTS_PER_DAY = 1000000  # Requests allowed per run (in-memory counter, not reset daily)
REQUEST_DELAY = 0.5  # Seconds slept before every request
SAVE_FREQUENCY = 5  # Currently unused: progress is saved after every batch
MAX_TEXT_LENGTH = 60000  # Maximum length for text inputs
MIN_DATE = "2023-01-01"  # Start date for filtering news
MAX_DATE = "2024-12-31"  # End date for filtering news

# Create directories for outputs and progress tracking
os.makedirs("sentiment_results/news", exist_ok=True)
os.makedirs("progress/news", exist_ok=True)

# Class to track and enforce rate limits
class RateLimiter:
    def __init__(self, max_per_minute, max_per_day):
        self.max_per_minute = max_per_minute
        self.max_per_day = max_per_day
        self.minute_requests = deque()
        self.daily_requests = 0
        self.start_time = time.time()
    
    def check_and_wait(self):
        """Check if we can make a request, wait if needed, and track the request"""
        current_time = time.time()
        
        # Check daily limit
        if self.daily_requests >= self.max_per_day:
            logger.warning(f"Reached maximum daily request limit of {self.max_per_day}")
            return False
        
        # Clean up minute_requests older than 60 seconds
        while self.minute_requests and current_time - self.minute_requests[0] > 60:
            self.minute_requests.popleft()
        
        # Check if we're at the per-minute limit
        if len(self.minute_requests) >= self.max_per_minute:
            wait_time = 60 - (current_time - self.minute_requests[0])
            if wait_time > 0:
                logger.info(f"Rate limit approaching: Waiting {wait_time:.2f} seconds before next request")
                time.sleep(wait_time)
        
        # Always wait the minimum delay between requests
        time.sleep(REQUEST_DELAY)
        
        # Record this request
        self.minute_requests.append(time.time())
        self.daily_requests += 1
        
        return True

# Create rate limiter instance
rate_limiter = RateLimiter(MAX_REQUESTS_PER_MINUTE, MAX_REQUESTS_PER_DAY)

# Helper function to check if a row has already been processed
def get_processed_indices(progress_file):
    """Load the set of indices that have already been processed"""
    if os.path.exists(progress_file):
        with open(progress_file, 'r') as f:
            return set(json.load(f))
    return set()

# Helper function to save processed indices
def save_processed_indices(progress_file, processed_indices):
    """Save the set of indices that have been processed"""
    with open(progress_file, 'w') as f:
        json.dump(list(processed_indices), f)

# Helper function to clean and standardize dates
def clean_date(date_str):
    """Extract YYYY-MM-DD from date string"""
    try:
        # Parse the date string to datetime
        dt = pd.to_datetime(date_str)
        # Return only the date part as string
        return dt.strftime('%Y-%m-%d')
    except Exception:  # unparseable or missing dates (NaT cannot be formatted)
        logger.warning(f"Could not parse date: {date_str}")
        return None

# Function to analyze sentiment via Gemini API with rate limiting
def analyze_sentiment_with_gemini(text):
    """
    Analyze sentiment of a single text using Gemini API
    """
    # Prompt asking Gemini for the multi-method sentiment breakdown as JSON
    prompt = f"""
Analyze the following financial news article for sentiment using multiple methods. Provide sentiment scores between -1 (strongly negative) and 1 (strongly positive) for each method.

Financial News Article: {text}

Instructions:

1.  Focus on the direct relevance of the news to the company mentioned in the article.
2.  Consider the article's impact on the company's stock price or financial performance.
3.  Pay attention to any forward-looking statements or projections.
4.  Identify and analyze any specific events or announcements mentioned in the article.

Methods:

1.  Polarity Detection (Basic Sentiment): Provide a single score representing the overall positive, negative, or neutral sentiment of the news article in relation to the company.
2.  Fine-Grained Sentiment Analysis: Provide individual scores for different sections or perspectives within the article, if applicable, and an average score.
3.  Aspect-Based Sentiment Analysis (ABSA): Identify the key aspects or attributes that are discussed in the news article that relates to the company, and provide sentiment scores for each identified aspect, and an average score. If possible, prioritize the following aspects:
    * Financial Performance: Earnings/Revenue, Profitability/Margins, Financial Health
    * Company Operations: Product/Service Developments, Sales/Market Share, Operational Efficiency
    * Growth and Strategy: Expansion/Acquisitions, Strategic Initiatives, Future Outlook/Guidance
    * Market/Industry: Industry Trends, Competitive Landscape, Regulatory Changes
    * Management/Leadership: Executive Changes, Management Decisions, Corporate Governance
    * Stock/Valuation: Stock Price Movements, Analyst Ratings/Targets, Valuation Metrics
    * Risks/Challenges: Legal Issues/Litigation, Financial Risks, Reputational Risks
4.  Topic-Sentiment Analysis: Identify the key topics or themes that are discussed in the news article that relates to the company, and provide sentiment scores for each identified topic, and an average score. If possible, prioritize the following topics:
    * Financial Performance
    * Company Operations
    * Growth and Strategy
    * Market/Industry
    * Management/Leadership
    * Stock/Valuation
    * Risks/Challenges
5.  Emotion Detection: Provide a score representing the dominant emotion(s) expressed in the news article. If multiple emotions are present, provide a weighted average.
6.  Intent Analysis: Provide a score representing the overall intent or purpose of the news article. If there are multiple intents, provide an average score.
7.  Subjectivity/Objectivity Detection: Provide a score representing the overall level of subjectivity or objectivity in the news article.
8.  Contextual Sentiment Analysis: Provide individual scores for different contextual elements or speakers within the news article, and an average score.
9.  Deep Learning-Based Approach (Gemini's Analysis): Provide overall, emotional, and contextual sentiment scores using Gemini's advanced analysis.

Format your response as a JSON object with the following structure:

{{
  "polarity_detection": score,
  "fine_grained_sentiment": {{
    "individual_scores": [{{"section/perspective": score}}],
    "average_score": score
  }},
  "aspect_based_sentiment": {{
    "aspect_scores": [{{"aspect": score}}],
    "average_score": score
  }},
  "topic_sentiment_analysis": {{
    "topic_scores": [{{"topic": score}}],
    "average_score": score
  }},
  "emotion_detection": score,
  "intent_analysis": score,
  "subjectivity_objectivity": score,
  "contextual_sentiment": {{
    "context_scores": [{{"context_element": score}}],
    "average_score": score
  }},
  "gemini_analysis": {{
    "overall_sentiment": score,
    "emotional_sentiment": score,
    "contextual_sentiment": score
  }}
}}
"""

    try:
        # Check rate limits
        if not rate_limiter.check_and_wait():
            logger.warning("Rate limit reached. Skipping request.")
            return None

        # Use Gemini API
        try:
            response = client.models.generate_content(
                model=MODEL,
                contents=prompt
            )
            
            # Guard against responses without usable text (e.g. no candidate returned)
            if not getattr(response, 'text', None):
                logger.error("Gemini response contained no text")
                return None
            
            return response.text
        except Exception as e:
            logger.error(f"Error in Gemini API call: {e}")
            return None
    
    except Exception as e:
        logger.error(f"Error preparing sentiment analysis: {e}")
        return None

def extract_json_from_response(response_text):
    """
    Extract the JSON object from the API response text using a more robust method
    """
    try:
        # Look for JSON pattern using regex - this is more reliable
        match = re.search(r'({.*})', response_text, re.DOTALL)
        if match:
            json_str = match.group(1)
            # Clean up potential issues before parsing
            json_str = re.sub(r',\s*}', '}', json_str)  # Remove trailing commas
            json_str = re.sub(r',\s*]', ']', json_str)  # Remove trailing commas in arrays
            
            return json.loads(json_str)
        else:
            logger.error("No JSON pattern found in response")
            return None
    except json.JSONDecodeError as e:
        logger.error(f"Error parsing JSON: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error extracting JSON: {e}")
        return None

def flatten_sentiment_json(json_obj):
    """
    Flatten the nested JSON structure into a dictionary with consistent key-value pairs
    Only keeps predefined scores and categories for consistent CSV structure
    """
    flat_dict = {}
    
    if not json_obj:
        return flat_dict
    
    # Add default values for key metrics
    # Basic scores
    flat_dict["polarity_detection"] = 0.0
    flat_dict["emotion_detection"] = 0.0
    flat_dict["intent_analysis"] = 0.0
    flat_dict["subjectivity_objectivity"] = 0.0
    
    # Average scores
    flat_dict["fine_grained_sentiment_avg"] = 0.0
    flat_dict["aspect_based_sentiment_avg"] = 0.0
    flat_dict["topic_sentiment_analysis_avg"] = 0.0
    flat_dict["contextual_sentiment_avg"] = 0.0
    
    # Gemini analysis scores
    flat_dict["gemini_overall_sentiment"] = 0.0
    flat_dict["gemini_emotional_sentiment"] = 0.0
    flat_dict["gemini_contextual_sentiment"] = 0.0
    
    # Predefined aspects (initialize with default values)
    predefined_aspects = [
        "financial_performance",
        "company_operations",
        "growth_and_strategy",
        "market_industry",
        "management_leadership",
        "stock_valuation",
        "risks_challenges"
    ]
    
    for aspect in predefined_aspects:
        flat_dict[f"aspect_{aspect}"] = 0.0
    
    # Predefined topics (same as aspects for this case)
    for topic in predefined_aspects:
        flat_dict[f"topic_{topic}"] = 0.0
    
    # Extract top-level simple scores
    for key in ['polarity_detection', 'emotion_detection', 'intent_analysis', 'subjectivity_objectivity']:
        if key in json_obj:
            try:
                value = json_obj[key]
                if isinstance(value, list) and len(value) > 0:
                    value = value[0]
                flat_dict[key] = float(value)
            except Exception:
                pass  # Keep default if error
    
    # Extract Gemini Analysis scores
    if 'gemini_analysis' in json_obj:
        for sub_key, value in json_obj['gemini_analysis'].items():
            try:
                if isinstance(value, list) and len(value) > 0:
                    value = value[0]
                flat_dict[f"gemini_{sub_key}"] = float(value)
            except Exception:
                pass
    
    # Extract average scores only (no individual sections)
    if 'fine_grained_sentiment' in json_obj and 'average_score' in json_obj['fine_grained_sentiment']:
        try:
            value = json_obj['fine_grained_sentiment']['average_score']
            if isinstance(value, list) and len(value) > 0:
                value = value[0]
            flat_dict["fine_grained_sentiment_avg"] = float(value)
        except Exception:
            pass
    
    # Extract other average scores
    for key in ['aspect_based_sentiment', 'topic_sentiment_analysis', 'contextual_sentiment']:
        if key in json_obj and 'average_score' in json_obj[key]:
            try:
                value = json_obj[key]['average_score']
                if isinstance(value, list) and len(value) > 0:
                    value = value[0]
                flat_dict[f"{key}_avg"] = float(value)
            except Exception:
                pass
    
    # Process predefined aspects only
    if 'aspect_based_sentiment' in json_obj and 'aspect_scores' in json_obj['aspect_based_sentiment']:
        aspect_scores = json_obj['aspect_based_sentiment']['aspect_scores']
        if isinstance(aspect_scores, list):
            for score_dict in aspect_scores:
                for aspect, score in score_dict.items():
                    # Normalize aspect name for matching
                    normalized_aspect = aspect.lower().replace('/', '_').replace(' ', '_').replace(':', '_')
                    
                    # Find matching predefined aspect
                    matched_aspect = None
                    for predefined in predefined_aspects:
                        if predefined in normalized_aspect:
                            matched_aspect = predefined
                            break
                    
                    # Store only if it matches a predefined aspect
                    if matched_aspect:
                        try:
                            if isinstance(score, list) and len(score) > 0:
                                score = score[0]
                            flat_dict[f"aspect_{matched_aspect}"] = float(score)
                        except Exception:
                            pass
    
    # Process predefined topics only (similar to aspects)
    if 'topic_sentiment_analysis' in json_obj and 'topic_scores' in json_obj['topic_sentiment_analysis']:
        topic_scores = json_obj['topic_sentiment_analysis']['topic_scores']
        if isinstance(topic_scores, list):
            for score_dict in topic_scores:
                for topic, score in score_dict.items():
                    # Normalize topic name for matching
                    normalized_topic = topic.lower().replace('/', '_').replace(' ', '_')
                    
                    # Find matching predefined topic
                    matched_topic = None
                    for predefined in predefined_aspects:  # Using same list as aspects
                        if predefined in normalized_topic:
                            matched_topic = predefined
                            break
                    
                    # Store only if it matches a predefined topic
                    if matched_topic:
                        try:
                            if isinstance(score, list) and len(score) > 0:
                                score = score[0]
                            flat_dict[f"topic_{matched_topic}"] = float(score)
                        except Exception:
                            pass
    
    return flat_dict

# Analyze a batch of articles, retrying each one up to retry_limit times
def analyze_sentiment_batch(batch_items, filtered_news, retry_limit=3):
    """
    Process a batch of news items
    
    Parameters:
    -----------
    batch_items : list
        List of indices to process
    filtered_news : list
        List of news items
    retry_limit : int
        Number of retries for API calls
    
    Returns:
    --------
    dict
        Dictionary mapping indices to sentiment results
    """
    batch_results = {}
    
    for idx in batch_items:
        news_item = filtered_news[idx]
        
        # Combine title and content for analysis
        title = news_item.get('title', '')
        content = news_item.get('content', '')
        
        if not content or not isinstance(content, str):
            logger.warning(f"Empty content for news item {idx}, skipping")
            continue
        
        # Combine title and content for more comprehensive analysis
        combined_text = f"Title: {title}\n\nContent: {content}"
        
        # Limit text length if too long
        if len(combined_text) > MAX_TEXT_LENGTH:
            combined_text = combined_text[:MAX_TEXT_LENGTH]
        
        # Retry mechanism
        success = False
        for attempt in range(retry_limit):
            try:
                # Call Gemini API
                response_text = analyze_sentiment_with_gemini(combined_text)
                
                if not response_text:
                    logger.error(f"No response from API for news item {idx}, attempt {attempt+1}/{retry_limit}")
                    time.sleep(3)  # Wait before retry
                    continue
                
                # Extract JSON from response
                sentiment_json = extract_json_from_response(response_text)
                
                if not sentiment_json:
                    logger.error(f"Failed to extract JSON from response for news item {idx}, attempt {attempt+1}/{retry_limit}")
                    time.sleep(3)  # Wait before retry
                    continue
                
                # Flatten JSON
                flat_sentiment = flatten_sentiment_json(sentiment_json)
                
                # Store results
                batch_results[idx] = flat_sentiment
                
                success = True
                break
                
            except Exception as e:
                logger.error(f"Error on attempt {attempt+1}/{retry_limit} for news item {idx}: {e}")
                time.sleep(3)  # Wait before retry
        
        if not success:
            logger.error(f"Failed to analyze sentiment for news item {idx} after {retry_limit} attempts")
    
    return batch_results

def process_news_file(file_path, output_dir="sentiment_results/news", num_rows=None, retry_limit=3, force_reprocess=True, batch_size=10):
    """
    Process a single news JSON file and run sentiment analysis
    
    Parameters:
    -----------
    file_path : str
        Path to the JSON file
    output_dir : str
        Directory to save results
    num_rows : int or None
        Number of rows to process (None for all)
    retry_limit : int
        Number of retries for API calls
    force_reprocess : bool
        If True, reprocess all rows even if they've been processed before
    batch_size : int
        Number of news items to process in a batch
    """
    try:
        # Create output directory if it doesn't exist
        os.makedirs(output_dir, exist_ok=True)
        
        # Extract ticker from filename
        ticker = os.path.basename(file_path).replace('_news.json', '')
        
        # Load the JSON file
        logger.info(f"Loading news file: {file_path}")
        with open(file_path, 'r') as f:
            news_data = json.load(f)
            
        logger.info(f"Loaded {len(news_data)} news items for {ticker}")
        
        # Filter by date
        filtered_news = []
        for item in news_data:
            clean_date_str = clean_date(item.get('date', ''))
            if clean_date_str:
                if MIN_DATE <= clean_date_str <= MAX_DATE:
                    # Add cleaned date and ticker to the item
                    item['clean_date'] = clean_date_str
                    item['ticker'] = ticker
                    filtered_news.append(item)
        
        logger.info(f"Filtered to {len(filtered_news)} news items between {MIN_DATE} and {MAX_DATE}")
        
        # Limit rows if specified
        if num_rows and num_rows < len(filtered_news):
            filtered_news = filtered_news[:num_rows]
            logger.info(f"Limited to first {num_rows} news items for testing")
            
        # Set up progress tracking
        progress_file = f"progress/news/news_{ticker}_progress.json"
        os.makedirs(os.path.dirname(progress_file), exist_ok=True)
        processed_indices = get_processed_indices(progress_file)
        
        # Create a list of indices to process
        all_indices = list(range(len(filtered_news)))
        if not force_reprocess:
            # Only process unprocessed items
            indices_to_process = [idx for idx in all_indices if idx not in processed_indices]
            logger.info(f"Found {len(indices_to_process)} unprocessed news items")
        else:
            indices_to_process = all_indices
            logger.info(f"Force reprocessing all {len(indices_to_process)} news items")
        
        # Results are written to the CSV after every batch so an interrupted run
        # keeps what it has finished. When resuming, rows from earlier runs are kept.
        output_file = os.path.join(output_dir, f"news_sentiment_{ticker}.csv")
        frames = []
        if not force_reprocess and os.path.exists(output_file):
            frames.append(pd.read_csv(output_file))
        result_df = frames[0] if frames else None
        
        # Process in batches
        total_batches = (len(indices_to_process) + batch_size - 1) // batch_size
        logger.info(f"Processing {len(indices_to_process)} news items in {total_batches} batches of size {batch_size}")
        
        for batch_idx in tqdm(range(0, len(indices_to_process), batch_size), desc=f"Processing {ticker} news batches"):
            # Get indices for this batch
            batch_indices = indices_to_process[batch_idx:batch_idx + batch_size]
            
            # Process the batch
            batch_results = analyze_sentiment_batch(batch_indices, filtered_news, retry_limit)
            
            # Build one row per successfully analyzed article
            rows = []
            for idx, sentiment in batch_results.items():
                news_item = filtered_news[idx]
                row = {
                    'ticker': news_item['ticker'],
                    'date': news_item['clean_date'],
                    'title': news_item.get('title', ''),
                    'link': news_item.get('link', '')
                }
                row.update(sentiment)  # Add sentiment scores
                rows.append(row)
            
            # Rewrite the CSV with all rows so far (concat aligns columns by name)
            if rows:
                frames.append(pd.DataFrame(rows))
                result_df = pd.concat(frames, ignore_index=True)
                result_df.to_csv(output_file, index=False)
            
            # Mark only successful items as processed so failures are retried on resume
            processed_indices.update(batch_results.keys())
            save_processed_indices(progress_file, processed_indices)
            
            # Log progress
            items_processed = min(batch_idx + batch_size, len(indices_to_process))
            logger.info(f"Processed {items_processed}/{len(indices_to_process)} items ({items_processed/len(indices_to_process)*100:.1f}%)")
        
        if result_df is not None:
            logger.info(f"Saved results to {output_file}")
        else:
            logger.warning(f"No results generated for {ticker}")
        
        return result_df
        
    except Exception as e:
        logger.error(f"Error processing file {file_path}: {e}")
        import traceback
        logger.error(f"Traceback: {traceback.format_exc()}")
        return None

def find_all_news_files():
    """
    Find all news JSON files in the data directory
    """
    files = glob.glob("data/news_data/*_news.json")
    return files

def process_multiple_news_files(files=None, num_rows=None, force_reprocess=True, batch_size=10):
    """
    Process multiple news files, running sentiment analysis on each
    
    Parameters:
    -----------
    files : list or None
        List of file paths to process. If None, processes all news files
    num_rows : int or None
        Number of rows to process per file. If None, processes all rows
    force_reprocess : bool
        If True, reprocess all rows even if they've been processed before
    batch_size : int
        Number of news items to process in a batch
    """
    if files is None:
        # Find all news files
        files = find_all_news_files()
    
    if not files:
        logger.error("No news files found")
        return
    
    total_files = len(files)
    logger.info(f"Found {total_files} news files to process")
    
    for i, file_path in enumerate(files):
        logger.info(f"Processing file {i+1}/{total_files}: {file_path}")
        process_news_file(file_path, num_rows=num_rows, force_reprocess=force_reprocess, batch_size=batch_size)
    
    logger.info("Completed processing all files")

def test_gemini_api():
    """Test the Gemini API with a small sample text"""
    logger.info("Testing Gemini API with a small sample...")
    sample_text = "The company reported strong revenue growth and exceeded analyst expectations for the quarter. However, challenges in the supply chain have impacted margins."
    
    try:
        # Simple prompt for testing
        prompt = "Analyze this text for sentiment and respond with a single word: positive, negative, or neutral: " + sample_text
        
        response = client.models.generate_content(
            model=MODEL,
            contents=prompt
        )
        
        logger.info(f"API Test Response: {response.text}")
        logger.info("Gemini API test successful!")
        return True
    except Exception as e:
        logger.error(f"Gemini API test failed: {e}")
        return False

def main():
    # First test if the API is working
    if not test_gemini_api():
        logger.error("Gemini API test failed. Please check your API key and connection.")
        return
    
    # Process all files
    print("\n=== FINANCIAL NEWS SENTIMENT ANALYSIS ===")
    
    # Get batch size (blank input uses the default)
    batch_size_input = input("Enter batch size (default 10): ").strip()
    batch_size = int(batch_size_input) if batch_size_input else 10
    if batch_size < 1:
        print("Batch size must be at least 1; using 10.")
        batch_size = 10
    
    # Force reprocessing?
    force_reprocess_input = input("Force reprocessing of all rows, even if previously processed? (y/n): ")
    force_reprocess = force_reprocess_input.strip().lower() == 'y'
    
    # Number of rows? Blank input is treated like 'all'
    rows_input = input("How many rows per file? (Enter a number or 'all' for all rows): ").strip().lower()
    rows_per_file = None if rows_input in ('', 'all') else int(rows_input)
    
    # Process files
    process_multiple_news_files(num_rows=rows_per_file, force_reprocess=force_reprocess, batch_size=batch_size)

if __name__ == "__main__":
    main()