# 🤖 AI-Powered Research Assistant

**A multi-agent system that automates finding, ingesting, and querying a research paper:**

- 🔎 **Scout Agent**: Searches arXiv and selects the top-ranked paper for a topic
- ⚙️ **Analyst Agent**: Downloads the PDF, splits it into chunks, and builds a searchable knowledge base
- 🧠 **AI-Powered Knowledge Base**: Embeds the chunks with Google Gemini and stores them in ChromaDB for retrieval
- 💬 **Interactive Q&A**: Ask any questions about the paper content
- 📋 **Auto-Summary**: Generates comprehensive paper analysis

**Technologies**: LangChain + Google Gemini + ChromaDB + arXiv API + SerpAPI

In [1]:
# 🔧 Setup API Keys
import os

os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
os.environ["SERPAPI_API_KEY"] = "YOUR_SERPAPI_API_KEY"
print("✅ API Keys configured!")

✅ API Keys configured!
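
Hard-coding keys in a notebook makes them easy to leak when the file is shared. Here is a minimal sketch of an alternative, assuming the keys may already be set in the environment. `ensure_api_key` is a hypothetical helper (not part of LangChain or this notebook) that reuses an existing value and only prompts when the variable is missing.

```python
import os
from getpass import getpass
from typing import Callable

def ensure_api_key(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting once if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Nothing set yet: ask the user without echoing the secret to the screen.
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} is required")
        os.environ[name] = value
    return value
```

Calling `ensure_api_key("GOOGLE_API_KEY")` and `ensure_api_key("SERPAPI_API_KEY")` in place of the two assignments would keep the secrets out of the saved notebook.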


In [2]:
# 📦 Import Required Libraries
import arxiv  # used by ArXivScoutAgent (pip install arxiv)
import requests
import re
import json
from typing import List, Dict, Optional
from langchain_community.utilities import SerpAPIWrapper
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA
from langchain.schema import HumanMessage, SystemMessage

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


In [3]:
# 🧠 Initialize Core AI Components
class AIResearchAssistant:
    def __init__(self):
        self.embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
        self.llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.1)
        self.search_tool = SerpAPIWrapper(params={"engine": "google_scholar", "hl": "en"})
        self.vectorstore = None
        self.current_paper_info = None
        
        print("🤖 AI Research Assistant initialized!")
        print("   - Google Gemini LLM: Ready")
        print("   - Embeddings Model: Ready")
        print("   - Google Scholar Search: Ready")

# Initialize the assistant
assistant = AIResearchAssistant()

🤖 AI Research Assistant initialized!
   - Google Gemini LLM: Ready
   - Embeddings Model: Ready
   - Google Scholar Search: Ready


In [9]:
# 🔎 ArXiv Scout Agent: Reliable Paper Discovery
class ArXivScoutAgent:
    def __init__(self, assistant):
        self.assistant = assistant
        self.llm = assistant.llm
    
    def find_best_paper(self, topic: str) -> Dict:
        """ArXiv scout agent finds papers with guaranteed PDF access"""
        print(f"🔎 ArXiv Scout: Searching for papers on '{topic}'...")
        
        try:
            # Search ArXiv directly
            search = arxiv.Search(
                query=topic,
                max_results=5,
                sort_by=arxiv.SortCriterion.Relevance
            )
            
            papers = []
            print("   📚 Found papers:")
            
            # Search.results() is deprecated in arxiv 2.x; iterate through a Client instead
            for i, result in enumerate(arxiv.Client().results(search)):
                paper = {
                    "title": result.title.strip(),
                    "authors": ", ".join([author.name for author in result.authors[:3]]),
                    "summary": result.summary[:200].strip() + "...",
                    "pdf_url": result.pdf_url,
                    "published": str(result.published.date()),
                    "categories": ", ".join(result.categories)
                }
                papers.append(paper)
                print(f"      {i+1}. {paper['title'][:60]}...")
            
            if not papers:
                print("❌ No papers found on ArXiv for this topic")
                return self.get_fallback_paper(topic)
            
            # Select the most relevant paper (first one)
            selected_paper = papers[0]
            
            paper_info = {
                "title": selected_paper["title"],
                "authors": selected_paper["authors"],
                "url": selected_paper["pdf_url"],
                "reason": f"Most relevant ArXiv paper. Published: {selected_paper['published']}"
            }
            
            print(f"✅ Selected: {paper_info['title'][:60]}...")
            print(f"   Authors: {paper_info['authors']}")
            print(f"   PDF URL: {paper_info['url']}")
            
            return paper_info
            
        except Exception as e:
            print(f"❌ ArXiv search error: {e}")
            return self.get_fallback_paper(topic)
    
    def get_fallback_paper(self, topic: str) -> Dict:
        """Fallback to known working papers"""
        print("🔄 Using fallback paper selection...")
        
        fallback_papers = {
            "machine learning": {
                "title": "Attention Is All You Need",
                "authors": "Vaswani et al.",
                "url": "https://arxiv.org/pdf/1706.03762.pdf",
                "reason": "Foundational transformer paper - highly relevant to ML"
            },
            "artificial intelligence": {
                "title": "BERT: Pre-training of Deep Bidirectional Transformers",
                "authors": "Devlin et al.",
                "url": "https://arxiv.org/pdf/1810.04805.pdf",
                "reason": "Influential AI language model paper"
            },
            "finance": {
                "title": "Deep Learning for Finance: Deep Portfolios",
                "authors": "Heaton et al.",
                "url": "https://arxiv.org/pdf/1605.07230.pdf",
                "reason": "AI applications in finance"
            }
        }
        
        # Find best match
        topic_lower = topic.lower()
        for key, paper in fallback_papers.items():
            if key in topic_lower or any(word in topic_lower for word in key.split()):
                print(f"✅ Fallback: {paper['title']}")
                return paper
        
        # Default fallback
        default_paper = fallback_papers["machine learning"]
        print(f"✅ Default fallback: {default_paper['title']}")
        return default_paper

# Replace the original scout with ArXiv scout
scout = ArXivScoutAgent(assistant)
print("🔎 ArXiv Scout Agent: Ready to find papers with guaranteed PDF access!")


🔎 ArXiv Scout Agent: Ready to find papers with guaranteed PDF access!
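
The fallback lookup in `get_fallback_paper` returns the first key that matches, so the result depends on dictionary insertion order. The sketch below reproduces that matching rule in isolation, using titles only, to show which entry a given topic lands on. `match_fallback` and `FALLBACK_TITLES` are hypothetical names used for illustration.

```python
from typing import Dict, Optional

# Keys and titles copied from the fallback table, in the same order.
FALLBACK_TITLES: Dict[str, str] = {
    "machine learning": "Attention Is All You Need",
    "artificial intelligence": "BERT: Pre-training of Deep Bidirectional Transformers",
    "finance": "Deep Learning for Finance: Deep Portfolios",
}

def match_fallback(topic: str, table: Dict[str, str]) -> Optional[str]:
    """Return the first key whose full text or any single word occurs in the topic."""
    topic_lower = topic.lower()
    for key in table:  # dicts preserve insertion order, so earlier keys win ties
        if key in topic_lower or any(word in topic_lower for word in key.split()):
            return key
    return None

if __name__ == "__main__":
    for topic in ("Artificial Intelligence in finance", "Deep learning for vision", "Quantum computing"):
        print(topic, "->", match_fallback(topic, FALLBACK_TITLES))
```

Under this rule the notebook's example topic, "Artificial Intelligence in finance", resolves to the BERT entry rather than the finance one, because the "artificial intelligence" key is checked first.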


In [10]:
# ⚙️ Analyst Agent: Intelligent Document Processing
class AnalystAgent:
    def __init__(self, assistant):
        self.assistant = assistant
        self.embeddings = assistant.embeddings
    
    def process_paper(self, paper_info: Dict) -> bool:
        """Analyst agent downloads and processes the paper into a knowledge base"""
        print(f"⚙️ Analyst Agent: Processing paper - {paper_info['title']}")
        
        try:
            # Download the paper
            print("   📥 Downloading PDF...")
            response = requests.get(paper_info['url'], timeout=30)
            response.raise_for_status()  # fail early on HTTP errors instead of saving an error page as a PDF
            filename = "research_paper.pdf"
            
            with open(filename, "wb") as f:
                f.write(response.content)
            
            # Load and extract text
            print("   📄 Extracting text from PDF...")
            loader = PyPDFLoader(filename)
            documents = loader.load()
            
            # Split into chunks for better processing
            print("   ✂️ Splitting document into chunks...")
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=200,
                separators=["\n\n", "\n", ".", " ", ""]
            )
            splits = text_splitter.split_documents(documents)
            
            # Create AI-powered knowledge base
            print("   🧠 Creating AI knowledge base with embeddings...")
            self.assistant.vectorstore = Chroma.from_documents(
                documents=splits,
                embedding=self.embeddings,
                persist_directory="./research_knowledge_base",
                collection_name="research_paper"
            )
            
            # Store paper info
            self.assistant.current_paper_info = paper_info
            
            # Cleanup
            os.remove(filename)
            
            print(f"✅ Analyst Agent: Successfully processed {len(splits)} chunks")
            print("   🎯 Knowledge base ready for Q&A!")
            return True
            
        except Exception as e:
            print(f"❌ Analyst Agent Error: {e}")
            return False

# Initialize Analyst Agent
analyst = AnalystAgent(assistant)
print("⚙️ Analyst Agent: Ready to process papers!")

⚙️ Analyst Agent: Ready to process papers!
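
To make the `chunk_size=1000` and `chunk_overlap=200` settings concrete, here is a simplified fixed-window splitter. It only illustrates how overlap works. `RecursiveCharacterTextSplitter` additionally tries to break on the listed separators, so its chunk boundaries will differ. The name `sliding_chunks` is hypothetical.

```python
from typing import List

def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows; each window repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

if __name__ == "__main__":
    pieces = sliding_chunks("x" * 2500)
    print([len(p) for p in pieces])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.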


In [11]:
# 💬 Interactive Q&A System
class QASystem:
    def __init__(self, assistant):
        self.assistant = assistant
        self.llm = assistant.llm
    
    def ask_question(self, question: str) -> str:
        """Ask any question about the processed paper"""
        if not self.assistant.vectorstore:
            return "❌ No paper has been processed yet. Please run the research pipeline first."
        
        print(f"❓ Question: {question}")
        
        # Create retrieval-augmented generation chain
        qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.assistant.vectorstore.as_retriever(
                search_kwargs={"k": 4}  # Retrieve top 4 relevant chunks
            ),
            return_source_documents=True
        )
        
        # Calling the chain object directly is deprecated; use invoke()
        result = qa_chain.invoke({"query": question})
        
        print("✅ Answer generated from paper content!")
        return result["result"]
    
    def generate_summary(self) -> Dict[str, str]:
        """Generate comprehensive paper summary"""
        if not self.assistant.vectorstore:
            return {"error": "No paper processed yet"}
        
        print("📋 Generating comprehensive paper summary...")
        
        summary_questions = {
            "Research Objective": "What is the main research question and objective of this paper?",
            "Methodology": "What methodology and approach was used in this research?",
            "Key Findings": "What are the key findings and main results of this study?",
            "Conclusions": "What are the main conclusions and implications of this research?",
            "Future Work": "What limitations are mentioned and what future work is suggested?",
            "Applications": "What are the potential real-world applications of this research?"
        }
        
        summary = {}
        for section, question in summary_questions.items():
            print(f"   📝 Analyzing: {section}...")
            answer = self.ask_question(question)
            summary[section] = answer
        
        print("✅ Comprehensive summary generated!")
        return summary

# Initialize Q&A System
qa_system = QASystem(assistant)
print("💬 Q&A System: Ready for interactive questions!")

💬 Q&A System: Ready for interactive questions!
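
The `RetrievalQA` chain does two things per question: it fetches the `k` most similar chunks, and the "stuff" chain type places them all into a single prompt. The sketch below imitates both steps with plain Python lists as stand-in embeddings. Cosine similarity and the prompt wording are assumptions made for illustration. The metric Chroma uses depends on how the collection is configured, and LangChain's actual prompt template differs. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]],
          k: int = 4) -> List[str]:
    """Return the texts of the k chunks most similar to the query, best first."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def stuff_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate every retrieved chunk into one context block, 'stuff'-style."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    store = [
        ("AI traders", [1.0, 0.0]),
        ("prospect theory", [0.0, 1.0]),
        ("market efficiency", [0.9, 0.1]),
    ]
    hits = top_k([1.0, 0.0], store, k=2)
    print(stuff_prompt("How does AI affect markets?", hits))
```

Because every retrieved chunk goes into one prompt, raising `k` increases both context coverage and prompt length.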


In [12]:
# 🚀 Main Research Pipeline
def run_research_pipeline(topic: str):
    """Complete automated research pipeline"""
    print(f"🚀 Starting AI Research Pipeline for: '{topic}'")
    print("=" * 60)
    
    # Step 1: Scout finds the best paper
    paper_info = scout.find_best_paper(topic)
    if not paper_info:
        print("❌ Pipeline failed: Could not find suitable paper")
        return False
    
    print("\n" + "=" * 60)
    
    # Step 2: Analyst processes the paper
    success = analyst.process_paper(paper_info)
    if not success:
        print("❌ Pipeline failed: Could not process paper")
        return False
    
    print("\n" + "=" * 60)
    print("🎉 Research Pipeline Complete!")
    print(f"📚 Paper: {paper_info['title']}")
    print(f"👥 Authors: {paper_info['authors']}")
    print("💬 Ready for Q&A and summary generation!")
    
    return True

print("🚀 Research Pipeline: Ready to run!")

🚀 Research Pipeline: Ready to run!


## 🎯 Run the Complete Research Pipeline

**This will automatically:**
1. Find the most relevant research paper
2. Download and process it into a knowledge base
3. Make it ready for Q&A and summary generation

In [13]:
# 🎯 Run the Pipeline - Change the topic to whatever you want to research!
research_topic = "Artificial Intelligence in finance"

# Run the complete pipeline
success = run_research_pipeline(research_topic)

if success:
    print("\n🎉 SUCCESS! You can now:")
    print("   💬 Ask questions using: qa_system.ask_question('your question')")
    print("   📋 Generate summary using: qa_system.generate_summary()")
else:
    print("\n❌ Pipeline failed. Please check the errors above.")

🚀 Starting AI Research Pipeline for: 'Artificial Intelligence in finance'
🔎 ArXiv Scout: Searching for papers on 'Artificial Intelligence in finance'...
   📚 Found papers:
      1. Impact of Artificial Intelligence on Economic Theory...
      2. Does an artificial intelligence perform market manipulation ...
      3. A Survey on Blockchain-based Supply Chain Finance with Progr...
      4. Artificial Intelligence in the Service of Entrepreneurial Fi...
      5. AI in Finance: Challenges, Techniques and Opportunities...
✅ Selected: Impact of Artificial Intelligence on Economic Theory...
   Authors: Tshilidzi Marwala
   PDF URL: http://arxiv.org/pdf/1509.01213v1

⚙️ Analyst Agent: Processing paper - Impact of Artificial Intelligence on Economic Theory
   📥 Downloading PDF...
   📄 Extracting text from PDF...
   ✂️ Splitting document into chunks...
   🧠 Creating AI knowledge base with embeddings...
✅ Analyst Agent: Successfully processed 11 chunks
   🎯 Knowledge base ready for Q&A!

🎉 Research Pipeline Complete!
📚 Paper: Impact of Artificial Intelligence on Economic Theory
👥 Authors: Tshilidzi Marwala
💬 Ready for Q&A and summary generation!

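🎉 SUCCESS! You can now:
   💬 Ask questions using: qa_system.ask_question('your question')
   📋 Generate summary using: qa_system.generate_summary()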
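
The download step in the Analyst Agent trusts that the response body is a PDF. A small guard, sketched here as an assumption rather than something the notebook does, is to check for the `%PDF-` signature that PDF files begin with before handing the file to `PyPDFLoader`. `looks_like_pdf` is a hypothetical helper, and this strict check would reject a file with stray bytes before the header.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF file signature."""
    return data.startswith(b"%PDF-")

if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.5\n"))        # a PDF header
    print(looks_like_pdf(b"<!DOCTYPE html>"))   # e.g. an HTML error page
```

In `process_paper`, the check could run on `response.content` before the file is written, raising an error when it fails.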

## 💬 Interactive Q&A Session

**Ask any questions about the research paper!**

In [14]:
# 💬 Ask Questions About the Paper
question = "What is the main contribution of this research?"
answer = qa_system.ask_question(question)
print(f"\n🤖 Answer:\n{answer}")

❓ Question: What is the main contribution of this research?
✅ Answer generated from paper content!

🤖 Answer:



In [15]:
# 💬 Ask Another Question
question = "What datasets were used in this study?"
answer = qa_system.ask_question(question)
print(f"\n🤖 Answer:\n{answer}")

❓ Question: What datasets were used in this study?
✅ Answer generated from paper content!

🤖 Answer:
Based on the context provided, there is no mention of any specific datasets that were used. The text is a general discussion about the applications and impact of artificial intelligence, referencing several books and papers, but it does not describe a particular study or its methodology.


In [16]:
# 💬 Ask Your Own Question
your_question = input("❓ Enter your question about the paper: ")
answer = qa_system.ask_question(your_question)
print(f"\n🤖 Answer:\n{answer}")

❓ Question: what significance did ai particularly improved in finance \
✅ Answer generated from paper content!

🤖 Answer:
Based on the provided text, artificial intelligence has had a significant impact on finance in the following ways:

*   **Modeling Financial Instruments:** AI has been applied to model financial instruments such as stock markets, derivatives, and options.
*   **Improving Market Efficiency:** AI challenges and influences the Efficient Market Hypothesis. While human traders are imperfect and have incomplete information, making markets less efficient, the introduction of AI-empowered computer traders helps overcome these limitations. The text states that the more AI traders there are, the more efficient the markets become.
*   **Challenging Economic Theories:** AI's role as a decision-maker questions the applicability of theories like Prospect Theory, which is based on human decision-making processes of weighing potential gains against losses. The theory's relevance is

## 📋 Generate Comprehensive Summary

**Get a detailed analysis of the entire paper**

In [17]:
# 📋 Generate Complete Paper Summary
summary = qa_system.generate_summary()

print("\n" + "=" * 200)
print(f"📋 COMPREHENSIVE SUMMARY: {assistant.current_paper_info['title']}")
print("=" * 200)

for section, content in summary.items():
    print(f"\n🔹 {section.upper()}:")
    print("-" * 40)
    print(content)
    print()

📋 Generating comprehensive paper summary...
   📝 Analyzing: Research Objective...
❓ Question: What is the main research question and objective of this paper?
✅ Answer generated from paper content!
   📝 Analyzing: Methodology...
❓ Question: What methodology and approach was used in this research?
✅ Answer generated from paper content!
   📝 Analyzing: Key Findings...
❓ Question: What are the key findings and main results of this study?
✅ Answer generated from paper content!
   📝 Analyzing: Conclusions...
❓ Question: What are the main conclusions and implications of this research?
✅ Answer generated from paper content!
   📝 Analyzing: Future Work...
❓ Question: What limitations are mentioned and what future work is suggested?
✅ Answer generated from paper content!
   📝 Analyzing: Applications...
❓ Question: What are the potential real-world applications of this research?
✅ Answer generated from paper content!
✅ Comprehensive summary generated!

📋 COMPREHENSIVE SUMMARY: Impact of Artificia
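
Since `generate_summary()` returns a plain dictionary mapping section names to answers, it can be rendered to Markdown for saving or sharing. A minimal sketch follows. `summary_to_markdown` is a hypothetical helper, and the sample dictionary holds placeholder text, not output from the paper.

```python
from typing import Dict

def summary_to_markdown(title: str, summary: Dict[str, str]) -> str:
    """Render a section -> answer mapping as a Markdown document."""
    lines = [f"# Summary: {title}", ""]
    for section, content in summary.items():
        lines.append(f"## {section}")
        lines.append("")
        lines.append(content.strip())
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    demo = {"Research Objective": "placeholder answer", "Methodology": "placeholder answer"}
    print(summary_to_markdown("Example Paper", demo))
```

The returned string could be written to a file, for example with `pathlib.Path("summary.md").write_text(...)`.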

## 🎉 Congratulations!

You now have a fully functional AI Research Assistant that:

✅ **Automatically finds** the most relevant research paper  
✅ **Intelligently processes** it into a searchable knowledge base  
✅ **Enables interactive Q&A** with the paper content  
✅ **Generates comprehensive summaries** automatically  

### 🔄 To Research Another Topic:
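Change the `research_topic` variable and run the pipeline again. Every run writes to the same `research_paper` collection in `./research_knowledge_base`, so delete that directory (or use a different `collection_name`) first to avoid retrieving chunks from the previous paper.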

### 💡 Pro Tips:
- Ask specific questions for detailed answers
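- Answers are generated from the top 4 retrieved chunks of the paper, so questions about material outside those chunks may come back as "not mentioned"
- The vector store is persisted locally in `./research_knowledge_base`, but chunk text and questions are still sent to the Google Gemini API for embedding and answering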
- You can ask follow-up questions anytime
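
Because the Analyst Agent always writes to the `research_paper` collection in the same persist directory, chunks from successive runs are likely to accumulate there. One option is to give each paper its own collection name, and the sketch below derives one from the paper title. The constraints it targets (3 to 63 characters, alphanumeric at both ends, only letters, digits, `_` and `-` in between) are an assumption about Chroma's collection naming rules and should be checked against the installed version. `collection_name_for` is a hypothetical helper.

```python
import hashlib
import re

def collection_name_for(title: str, max_len: int = 63) -> str:
    """Build a stable collection name from a paper title (slug plus short hash)."""
    # Lowercase and collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # A short hash keeps distinct titles distinct even after the slug is truncated.
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
    room = max_len - len(digest) - 1  # space left for the slug plus the joining hyphen
    slug = slug[:room].strip("-") or "paper"
    return f"{slug}-{digest}"

if __name__ == "__main__":
    print(collection_name_for("Impact of Artificial Intelligence on Economic Theory"))
```

The result could be passed as `collection_name` to `Chroma.from_documents` in `process_paper`, with the same name reused when reopening the store.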

In [None]:
try:
    import arxiv
    print("✅ arxiv library installed")
except ImportError:
    print("❌ Run: pip install arxiv")
