# 🤖 Automated Research Paper Analysis System
**Multi-Agent AI System for Academic Research Analysis**

---

**Author**: Mohamed Outahajala  
**Framework**: Google Agent Development Kit (ADK)

---

## Quick Start

1. Upload your PDF research paper as `document.pdf`
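2. Set your API keys in `.env`: `GOOGLE_API_KEY`, `GOOGLE_SEARCH_API_KEY`, and optionally `MODEL_NAME` (defaults to `gemini-2.5-flash`).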

## Problem Statement
Analyzing research papers manually is time-consuming and requires:
- Reading lengthy PDFs
- Summarizing key findings
- Researching latest trends
- Synthesizing information from multiple sources

This process can take 2-3 hours per paper.

## Solution
An automated Multi-Agent Research System that:
- Extracts PDF content automatically
- Generates comprehensive summaries
- Performs real-time web research on recent related work
- Produces structured reports in under 5 minutes

**Value**: Reduces research time by more than 95%, from 2-3 hours to about 5 minutes per paper.

<!-- Install these packages if they are not already present -->
!pip install numpy
!pip install pandas
!pip install pypdf
!pip install google-adk
!pip install nest-asyncio
!pip install python-dotenv

In [101]:
# Standard library and third-party imports
import json
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import requests
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

# Retrieve API keys from environment variables
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
GOOGLE_SEARCH_API_KEY = os.getenv("GOOGLE_SEARCH_API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME", "gemini-2.5-flash")  # Default to "gemini-2.5-flash" if not set
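
Because `os.getenv` returns `None` for unset variables, a missing key only surfaces later as an API error. A minimal sketch of an early check follows. The helper `missing_env_vars` and the `REQUIRED_KEYS` list are hypothetical, and treating both keys as required is an assumption rather than something the notebook states.

```python
import os

# Assumption: both keys are treated as required for this illustration.
REQUIRED_KEYS = ["GOOGLE_API_KEY", "GOOGLE_SEARCH_API_KEY"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.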



In [102]:
from google.adk.agents import Agent, SequentialAgent, ParallelAgent, LoopAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import InMemoryRunner
from google.adk.tools import AgentTool, FunctionTool, google_search
from google.genai import types

print("✅ ADK components imported successfully.")

✅ ADK components imported successfully.


In [103]:
retry_config = types.HttpRetryOptions(
    attempts=5,  # Maximum retry attempts
    exp_base=7,  # Delay multiplier
    initial_delay=1,
    http_status_codes=[429, 500, 503, 504], # Retry on these HTTP errors
)
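
To get a feel for what these settings imply, the sketch below computes a nominal exponential backoff schedule. It assumes the delay before retry `n` is `initial_delay * exp_base**n` and that `attempts` counts the original request. It ignores jitter and any maximum-delay cap the client may apply. `nominal_backoff_delays` is a hypothetical helper, not part of `google.genai`.

```python
def nominal_backoff_delays(attempts, initial_delay, exp_base):
    """Delays (in seconds) before each retry, assuming pure exponential growth."""
    retries = max(attempts - 1, 0)  # assumption: the first attempt is not a retry
    return [initial_delay * exp_base ** n for n in range(retries)]

delays = nominal_backoff_delays(attempts=5, initial_delay=1, exp_base=7)
print(delays)       # [1, 7, 49, 343]
print(sum(delays))  # 400
```

Under these assumptions a base of 7 grows quickly; a smaller base such as 2 would give waits of 1, 2, 4, and 8 seconds instead.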

In [104]:
# Diagnostic: Check if document.pdf is accessible
import os

print("Current working directory:", os.getcwd())
print("Files in current directory:")
for item in os.listdir('.'):
    if item.endswith('.pdf'):
        print(f"  📄 {item}")

# Check if document.pdf exists
pdf_path = "document.pdf"
if os.path.exists(pdf_path):
    print(f"✅ '{pdf_path}' found and accessible")
    # Quick test read
    from pypdf import PdfReader
    reader = PdfReader(pdf_path)
    print(f"   Pages: {len(reader.pages)}")
    print(f"   First 100 chars: {reader.pages[0].extract_text()[:100]}...")
else:
    print(f"❌ '{pdf_path}' NOT found in current directory")
    print("   You may need to specify the full path or move the file")

Current working directory: /Users/admin/HF/Agents5D
Files in current directory:
  📄 document.pdf
✅ 'document.pdf' found and accessible
   Pages: 5
   First 100 chars: Tagging Amazigh with AnCoraPipe 
Mohamed Outahajala, Lahbib Zekouar, Paolo Rosso, M. Antònia Martí 
...


In [105]:
# PDF Search Tool
import os

from pypdf import PdfReader

def search_pdf_tool(file_path: str, query: str) -> str:
    """
    Extracts text from a PDF file and returns the first 5000 characters.

    The `query` argument is logged but not used for filtering: the tool
    returns the start of the document rather than keyword-matched snippets.
    If the file is not found, returns mock data for demonstration.
    """
    print(f"    🔎 [Tool] Searching PDF '{file_path}' for: '{query}'")

    # 1. Try to read the actual file
    if os.path.exists(file_path):
        try:
            reader = PdfReader(file_path)
            # Guard against pages that yield no text (e.g. image-only pages)
            text = "\n".join((page.extract_text() or "") for page in reader.pages)

            # Return the document text itself instead of keyword matching
            if text.strip():
                return text[:5000]  # Return first 5000 chars to avoid token limits
            else:
                return "PDF file is empty or unreadable."
        except Exception as e:
            return f"Error reading PDF: {e}"

    # 2. Fallback Mock Data (for testing without a file)
    else:
        print("    ⚠️ [Tool] File not found. Using MOCK data for demonstration.")
        mock_content = """
Research Paper: Advanced AI Systems

Abstract: This paper presents a novel framework for multi-agent AI systems that can collaborate to solve complex research problems.

Introduction: The field of artificial intelligence has rapidly evolved with the emergence of large language models and agent-based architectures.

Methodology: We employed a sequential workflow combining PDF analysis, parallel processing, and result aggregation using the Google ADK framework.

Key Findings:
- Multi-agent systems show 95% improvement in research analysis speed
- Parallel execution reduces processing time by 60%
- Integration with real-time search enables up-to-date information retrieval

Conclusion: Our approach demonstrates significant improvements in automating research paper analysis.
        """
        return mock_content

print("✅ PDF Search Tool initialized.")

# 1. PDF Reader Agent
pdf_reader_agent = Agent(
    name="PDFReader",
    model=Gemini(model=MODEL_NAME, api_key=GOOGLE_API_KEY, retry_options=retry_config),
    instruction="""You are an expert document researcher. 
    
    The user will specify which PDF file to analyze in their message.
    Use the search_pdf_tool to extract content from that PDF file.
    
    Analyze the document and provide a structured analysis with these sections:
    
    **Main Topic**: [Brief description of what the paper is about]
    
    **Key Contributions**: [List the novel contributions and innovations]
    
    **Methodology**: [Describe the approaches and methods used]
    
    **Results/Findings**: [Summarize the main outcomes and conclusions]
    
    Keep your response clear and structured. Cite specific sections when relevant.
    Do not return the raw text - provide your analysis.""",
    tools=[FunctionTool(search_pdf_tool)],
    output_key="pdf_findings"
)
print("✅ PDF Reader Agent created.")


✅ PDF Search Tool initialized.
✅ PDF Reader Agent created.
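
The tool above accepts a `query` but returns the start of the document rather than matching on it. If keyword-based snippets were wanted, one possible approach is sketched below. `find_snippets` is a hypothetical helper, and splitting the query on `OR` is an assumption about how the agent phrases its queries.

```python
import re

def find_snippets(text, query, window=80, max_snippets=5):
    """Return short context windows around case-insensitive keyword hits."""
    # Assumption: keywords are separated by the word OR.
    keywords = [k.strip() for k in re.split(r"\s+OR\s+", query) if k.strip()]
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        if start < last_end:
            continue  # this hit overlaps the previous snippet, so skip it
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) >= max_snippets:
            break
    return snippets

sample = "Abstract: we tag Amazigh. " + "x" * 300 + " Methodology: AnCoraPipe."
for snippet in find_snippets(sample, "abstract OR methodology", window=10):
    print(repr(snippet))
```

Returning a few short windows instead of the first 5000 characters would keep the prompt smaller, at the cost of possibly missing sections that none of the keywords name.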


In [106]:
# 2. Summarizer Agent
summarizer_agent = Agent(
    name="Summarizer",
    model=Gemini(model=MODEL_NAME, api_key=GOOGLE_API_KEY, retry_options=retry_config),
    instruction="""You are an expert scientific paper analyst. 
    Read the research paper content provided: {pdf_findings}
    
    Create a comprehensive summary that includes:
    1. **Main Topic**: What is the paper about?
    2. **Key Contributions**: What are the novel contributions and innovations?
    3. **Methodology**: What approaches or methods were used?
    4. **Results/Findings**: What were the main outcomes?
    
    Keep the summary clear, structured, and under 200 words.
    If the findings are empty, state that no information was found.""",
    output_key="final_summary"
)

In [107]:
# 3. Tech Researcher
tech_researcher = Agent(
    name="Tech_Researcher",
    model=Gemini(model=MODEL_NAME, api_key=GOOGLE_API_KEY, retry_options=retry_config),

    instruction="""
You are a senior research analyst.

Input: {pdf_findings}

1. Extract the paper’s **main technical focus**, research problem, and method.
2. Evaluate the paper technically:
   - What is innovative?
   - What is weak or missing?
   - What assumptions does it make?
   - Possible real-world applications?
3. Perform a web search using the search tool:
   - Find the latest (2024–2025) work, breakthroughs, or criticisms related to the same topic.
   - Prefer scholarly or technical sources.
4. Produce a concise synthesis (max 100 words):
   - Technical evaluation of the paper
   - How the latest research trends compare or validate/challenge it
   - Missing gaps or future directions

Your output must be factual, technical, and short.
""",

    tools=[google_search],
    output_key="tech_research"
)


In [108]:
# 4. Linguistic Reviewer
linguistic_reviewer = Agent(
    name="LinguisticReviewer",
    model=Gemini(model=MODEL_NAME, api_key=GOOGLE_API_KEY, retry_options=retry_config),
    instruction="""
You are a senior linguist specializing in Afro-Asiatic languages, morphology, and computational tagging.

Input:
{pdf_findings}

Your task:
Provide a rigorous linguistic analysis of the paper, focusing on:
- correctness of linguistic claims
- completeness and adequacy of the tagset
- treatment of Amazigh morphology (root–pattern, affixes, clitics)
- dialectal consistency and variation issues
- writing system adequacy (Tifinaghe, Arabic script, Latin script)
- grammatical phenomena that should be included but are missing
- potential linguistic ambiguities or tagging challenges

Output:
Produce an actionable review section titled “Linguistic Analysis & Recommendations”.
Be technical, precise, and non-redundant.
""",
    output_key="linguistic_review"
)


In [109]:
# The ParallelAgent runs all its sub-agents simultaneously
# other agents can be added here later
parallel_research_team = ParallelAgent(
    name="ParallelResearchTeam",
    sub_agents=[summarizer_agent, tech_researcher, linguistic_reviewer],
)
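
As a conceptual illustration only (not ADK's internals), the sketch below mimics three sub-agents running concurrently with `asyncio.gather`, each writing its result into a shared state under its own key. `fake_agent` and `run_parallel` are hypothetical stand-ins, and the sleep times are arbitrary.

```python
import asyncio

async def fake_agent(name, output_key, state, delay):
    """Stand-in for a sub-agent: waits, then stores its result under output_key."""
    await asyncio.sleep(delay)
    state[output_key] = f"{name} result based on: {state['pdf_findings']}"

async def run_parallel(state):
    # All three coroutines start together; total time is roughly the slowest one.
    await asyncio.gather(
        fake_agent("Summarizer", "final_summary", state, 0.02),
        fake_agent("Tech_Researcher", "tech_research", state, 0.03),
        fake_agent("LinguisticReviewer", "linguistic_review", state, 0.01),
    )
    return state

# In a plain script; inside a notebook, use `await run_parallel(...)` instead.
state = asyncio.run(run_parallel({"pdf_findings": "PDF analysis text"}))
print(sorted(state))
```

Each stand-in writes to a distinct key, which is why the three branches can run at the same time without overwriting one another.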

In [110]:
research_aggregator = Agent(
    name="ResearchAggregator",
    model=Gemini(model=MODEL_NAME, api_key=GOOGLE_API_KEY, retry_options=retry_config),
    instruction="""
You are a Senior Research Reviewer providing a CONCISE expert review with scores.

Inputs:
1. Summary: {final_summary}
2. Technical Analysis: {tech_research}
3. Linguistic Analysis: {linguistic_review}

Produce a professional peer review following this EXACT structure and format:

---

## Scores
**Knowledge of the Field:** [1-5]/5
**Soundness:** [1-5]/5
**Clarity:** [1-5]/5
**Originality of the Approach:** [1-5]/5
**Significance of Results:** [1-5]/5
**Replicability:** [1-5]/5

**Overall Assessment:** [1-5]/5

**Decision:** [Reject/Weak Reject/Borderline/Accept/Strong Accept]

*Scoring Guide:*
- 1 = Should be rejected without doubt
- 2 = Some salvageable ideas, but reject
- 3 = Ambivalent, OK to accept but not enthusiastic
- 4 = Should be accepted
- 5 = Enthusiastically advocate for acceptance

---

## Executive Summary (3-4 sentences)
One paragraph overview of paper's purpose and significance.

## Key Contributions (3-5 bullet points)
• Most important contribution
• Second contribution
• Third contribution

## Methodology (2-3 sentences)
Brief description of approach used.

## Critical Assessment

**Strengths:**
- Strength 1
- Strength 2

**Weaknesses:**
- Weakness 1
- Weakness 2
- Missing element or gap

## Linguistic Review (2-3 sentences)
Brief comment on linguistic adequacy, tagset, and morphology handling.

## Recommendations (4-5 points)
1. Most critical improvement
2. Second priority
3. Third priority
4. Additional suggestion

---

CRITICAL RULES:
- Provide honest, justified scores based on the analysis
- Be consistent between scores and written assessment
- TOTAL LENGTH: 300-450 words maximum
- Be direct and specific
- Professional reviewer tone
- Cite specific issues with evidence
- Prioritize actionable feedback
""",
    output_key="research_report"
)
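
Because the prompt fixes the score format, the numeric scores could be pulled out of the generated report for later comparison across papers. The following is a minimal sketch under the assumption that the model follows the `**Name:** N/5` layout exactly. `parse_scores` and the sample report are hypothetical.

```python
import re

# Matches lines such as "**Soundness:** 3/5"
SCORE_LINE = re.compile(
    r"^\*\*(?P<name>[^*:]+):\*\*\s*(?P<score>[1-5])\s*/\s*5\s*$",
    re.MULTILINE,
)

def parse_scores(report):
    """Map each scored criterion to its integer score."""
    return {m["name"].strip(): int(m["score"]) for m in SCORE_LINE.finditer(report)}

sample_report = """## Scores
**Knowledge of the Field:** 4/5
**Soundness:** 3/5
**Overall Assessment:** 3/5

**Decision:** Borderline
"""
print(parse_scores(sample_report))
```

Lines that do not carry an `N/5` score, such as the decision line, are ignored by the pattern.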


In [111]:
# Create the Sequential Agent: read the PDF first, run the parallel team, then aggregate the results
agents = SequentialAgent(
    name="ResearchWorkflowAgent",
    sub_agents=[pdf_reader_agent, parallel_research_team, research_aggregator],
)
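
The agents are linked through their `output_key` values and the `{placeholder}` names in later instructions. The plain-Python sketch below illustrates that data flow. It is a simplified model rather than ADK's implementation, and `fill_instruction` and `run_stage` are hypothetical helpers.

```python
def fill_instruction(template, state):
    """Substitute {key} placeholders with values from the shared state."""
    return template.format(**state)

def run_stage(state, output_key, template):
    # A real agent would send the prompt to the model; here we just record it.
    prompt = fill_instruction(template, state)
    state[output_key] = f"<response to: {prompt}>"

state = {}
run_stage(state, "pdf_findings", "Analyze document.pdf")
run_stage(state, "final_summary", "Summarize: {pdf_findings}")
run_stage(state, "research_report", "Review using: {final_summary}")
print(state["research_report"])
```

In this model a stage can only reference keys written by earlier stages, which is why the PDF reader has to run before the parallel team and the aggregator has to run last.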

In [112]:
# pdf_file can be changed to analyze different documents
# the file could be in /Users/admin/Downloads for example
string_pdf_file = "document.pdf"

In [113]:
import nest_asyncio
nest_asyncio.apply()

from google.genai.types import Content, Part
import asyncio

runner = InMemoryRunner(agent=agents, app_name="agents")

async def run_analysis_async(pdf_file=string_pdf_file):
    """Async function to run the analysis using run_debug"""
    try:
        print("🚀 Starting Multi-Agent Analysis...")
        print(f"📄 Analyzing: {pdf_file}")
        print("=" * 80)
        
        # Use run_debug - the filename is in the message
        result = await runner.run_debug(
            f"Analyse {pdf_file} and provide a comprehensive summary of the key findings and methodology."
        )
        
        # Extract text from result
        final_text = None
        if result:
            # Check if result is a string
            if isinstance(result, str):
                final_text = result
            # Check if it has a text attribute
            elif hasattr(result, 'text'):
                final_text = result.text
            # Check if it's a Content object with parts
            elif hasattr(result, 'parts') and result.parts:
                final_text = result.parts[0].text
            # Try to convert to string
            else:
                final_text = str(result)
        
        # Print the result
        print("=" * 80)
        if final_text:
            #print("📊 FINAL RESEARCH REPORT")
            #print("=" * 80)
            #print(final_text)
            print("=" * 80)
            #print("✅ Analysis Complete!")
        else:
            print("⚠️ No response received")
            print(f"Debug - Result type: {type(result)}")
            print(f"Debug - Result: {result}")
        
        return final_text
        
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        return None

# Change the pdf_file parameter to analyze different PDFs.
# nest_asyncio.apply() above lets asyncio.run() work both inside a running
# event loop (Jupyter) and in a plain script, where a top-level `await`
# would be a syntax error.
result = asyncio.run(run_analysis_async(pdf_file=string_pdf_file))  # Change filename here


🚀 Starting Multi-Agent Analysis...
📄 Analyzing: document.pdf

 ### Created new session: debug_session_id

User > Analyse document.pdf and provide a comprehensive summary of the key findings and methodology.
    🔎 [Tool] Searching PDF 'document.pdf' for: 'introduction OR abstract OR methodology OR methods OR results OR findings OR conclusion OR summary OR contributions'
PDFReader > **Main Topic**: This paper focuses on advancing the automatic processing of the Amazigh language, specifically through its morphological annotation (tagging) using the multilevel annotation tool AnCoraPipe. The overarching goal is to equip the Amazigh language with essential processing tools, addressing the scarcity of resources for non-European languages in Natural Language Processing (NLP).

**Key Contributions**:
*   **Presentation of Amazigh Language Features**: The paper aims to delineate the distinctive features of the Amazigh language pertinent to morphological annotation.
*   **Adaptation of 