# AI Agent for Market Research & Competitive Analysis

This notebook implements an AI agent designed to automate market research. The agent can take a high-level query about a company or technology, search the web for relevant articles, analyze the content, and generate a concise business brief.

**Project Goal:** To build a portfolio project demonstrating the practical application of LLMs, Retrieval-Augmented Generation (RAG), and AI agent tool use for a real-world business problem.

**Core Components:**
1.  **Agent Framework:** LangChain
2.  **LLM (for generation):** Google Gemini (`gemini-1.5-flash-latest`, usable with a free-tier API key)
3.  **Embedding Model:** `all-MiniLM-L6-v2` (Open-source from Hugging Face)
4.  **Vector Store:** FAISS (In-memory, local, and free)
5.  **Tools:**
    * Custom Google Search Tool
    * Web Scraper Tool
    * Yahoo Finance Tool

## Step 1: Install Dependencies

First, we install the required Python libraries from the `requirements.txt` file in the project's GitHub repository, so the Colab runtime has every package the notebook imports.

In [None]:
# IMPORTANT: Make sure your requirements.txt is in your GitHub repo!
# Replace 'your-github-username' with your actual GitHub username.
!pip install -q -r https://raw.githubusercontent.com/your-github-username/ai-agent-moat/main/requirements.txt
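
Because `-q` suppresses most of pip's output, it can help to confirm that every module the notebook imports actually resolves. The standard-library sketch below does that with `importlib.util.find_spec`; the module list is our reading of the notebook's import statements (import names, not pip package names), not the contents of your `requirements.txt`.

```python
import importlib.util

# Import names used in this notebook (assumption: your requirements.txt provides them)
REQUIRED_MODULES = [
    "requests", "bs4", "yfinance", "faiss", "sentence_transformers",
    "langchain", "langchain_core", "langchain_community", "langchain_text_splitters",
    "langchain_huggingface", "langchain_google_genai",
]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("All required modules found." if not missing else f"Missing: {missing}")
```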

## Step 2: Securely Set Up API Keys

We need two Google credentials: an API key and a search engine ID. We'll store them in Colab's built-in **Secrets Manager**, which keeps them out of the notebook's code, so they aren't exposed when you share or publish the notebook.

**Instructions:**
1.  Click the **key icon (🔑)** in the left sidebar of Colab.
2.  Click **"Add a new secret"**.
3.  Create a secret with the name `GOOGLE_API_KEY` and paste your Google AI Studio API key as the value.
4.  Create another secret named `GOOGLE_CSE_ID` with your Custom Search Engine ID.

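You can create an API key in [Google AI Studio](https://aistudio.google.com/app/apikey) and a search engine in the [Programmable Search Engine control panel](https://programmablesearchengine.google.com/controlpanel/all); copy the engine's **Search engine ID** into `GOOGLE_CSE_ID`. The notebook uses the same key for Gemini and for Google's Custom Search JSON API, so the Custom Search API must also be enabled in the Google Cloud project that owns the key.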

In [None]:
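import os
from google.colab import userdata

# Read the keys from Colab's Secrets Manager (raises an error if a secret is missing)
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
GOOGLE_CSE_ID = userdata.get('GOOGLE_CSE_ID')

# Expose them as environment variables: ChatGoogleGenerativeAI reads GOOGLE_API_KEY,
# and GoogleSearchAPIWrapper can fall back to GOOGLE_API_KEY / GOOGLE_CSE_ID
os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY
os.environ['GOOGLE_CSE_ID'] = GOOGLE_CSE_ID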
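
The cell above only runs inside Colab, because `google.colab` is not installed anywhere else. If you also want to run the notebook locally, a hypothetical `load_secret` helper could try Colab's Secrets Manager first and fall back to ordinary environment variables. This is a sketch under that assumption, not part of the original project.

```python
import os

def load_secret(name: str) -> str:
    """Hypothetical helper: read a secret from Colab if available, else from os.environ."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        value = os.environ.get(name)
    else:
        value = userdata.get(name)  # Colab raises its own error if the secret is missing
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set.")
    return value
```

Locally you would `export GOOGLE_API_KEY=...` in your shell and then call `GOOGLE_API_KEY = load_secret('GOOGLE_API_KEY')` in place of `userdata.get`.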

## Step 3: Define the Agent's Tools

An agent's power comes from its tools. We will create three tools:
1.  **Web Search Tool:** To find relevant articles and sources.
2.  **Web Scraper Tool:** To extract the actual content from the URLs found by the search tool.
3.  **Yahoo Finance Tool:** To fetch key financial metrics for public companies.
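
From the LLM's point of view, a tool is a name plus a text description: the model reads those descriptions to decide which tool to call, and the surrounding code then dispatches the call by name. LangChain's `@tool` decorator takes the description from the function's docstring by default, which is why tool docstrings should describe behavior precisely. The toy sketch below imitates that idea with hypothetical `simple_tool`, `describe_tools`, and `call_tool` functions; it is not how LangChain implements tools.

```python
import inspect
from typing import Callable, Dict, Tuple

# Toy registry mapping tool name -> (description, function).
# Hypothetical: imitates the idea behind @tool, not LangChain's implementation.
TOOL_REGISTRY: Dict[str, Tuple[str, Callable[[str], str]]] = {}

def simple_tool(func: Callable[[str], str]) -> Callable[[str], str]:
    """Register func under its own name, using its docstring as the description."""
    TOOL_REGISTRY[func.__name__] = (inspect.getdoc(func) or "", func)
    return func

@simple_tool
def shout(text: str) -> str:
    """Returns the input in upper case (a stand-in for a real tool)."""
    return text.upper()

def describe_tools() -> str:
    """The text an LLM would see when choosing which tool to call."""
    return "\n".join(f"{name}: {desc}" for name, (desc, _) in TOOL_REGISTRY.items())

def call_tool(name: str, argument: str) -> str:
    """Dispatch by name, as an agent loop does once the LLM has picked a tool."""
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool {name!r}"
    return TOOL_REGISTRY[name][1](argument)

print(describe_tools())
print(call_tool("shout", "nvda"))
```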

In [None]:
import requests
from bs4 import BeautifulSoup
import yfinance as yf
from langchain_core.tools import tool
from langchain_community.utilities import GoogleSearchAPIWrapper

# Tool 1: Google Search Tool
# The wrapper is also usable on its own; RagAgent calls its .results() method directly.
search = GoogleSearchAPIWrapper(google_cse_id=GOOGLE_CSE_ID, google_api_key=GOOGLE_API_KEY)

@tool
def web_search(query: str) -> str:
    """Performs a Google web search and returns the top results as text."""
    return search.run(query)

# Tool 2: Web Scraper Tool
# Some sites reject requests that lack a browser-like User-Agent header
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; market-research-notebook)"}

@tool
def scrape_website(url: str) -> str:
    """Scrapes the paragraph text of a given website URL."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')
            # Keep only text inside <p> tags, which skips most navigation and script content
            text = ' '.join(p.get_text(" ", strip=True) for p in soup.find_all('p'))
            return text[:4000]  # Return the first 4000 characters to avoid token limits
        return f"Error: Received status code {response.status_code}"
    except requests.RequestException as e:
        return f"Error: Could not access the URL. {e}"

# Tool 3: Yahoo Finance Tool
@tool
def get_stock_info(ticker: str) -> str:
    """Fetches key financial information for a given stock ticker using Yahoo Finance."""
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        # Extract a few key metrics; missing fields become 'N/A'
        market_cap = info.get('marketCap', 'N/A')
        trailing_pe = info.get('trailingPE', 'N/A')
        forward_pe = info.get('forwardPE', 'N/A')
        long_business_summary = info.get('longBusinessSummary', 'N/A')
        return f"Market Cap: {market_cap}\nTrailing P/E: {trailing_pe}\nForward P/E: {forward_pe}\nBusiness Summary: {long_business_summary}"
    except Exception as e:
        return f"Error fetching stock info for {ticker}: {e}"
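
`scrape_website` keeps only the text inside `<p>` tags. To see what that parsing step does without BeautifulSoup, here is a rough standard-library sketch built on `html.parser.HTMLParser`. The `ParagraphExtractor` class and `extract_paragraph_text` function are hypothetical names, and this parser is far less forgiving of broken markup than BeautifulSoup; it only handles the common case of an unclosed `<p>` being ended by the next `<p>` or by the end of the document.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements, roughly like soup.find_all('p')."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []  # finished paragraph strings
        self._current = []    # text fragments of the paragraph being read

    def _flush(self):
        # Store the paragraph being read (if any) and start a new buffer
        if self.in_paragraph:
            self.paragraphs.append("".join(self._current).strip())
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly closes an unclosed previous one
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self._current.append(data)  # includes text of nested tags such as <b>

    def close(self):
        super().close()
        self._flush()  # keep text from a trailing unclosed <p>
        self.in_paragraph = False

def extract_paragraph_text(html: str, max_chars: int = 4000) -> str:
    """Join all paragraph texts with spaces and truncate, as scrape_website does."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.paragraphs)[:max_chars]

sample = "<h1>Title</h1><p>First <b>para</b>.</p><div>skipped</div><p>Second</p>"
print(extract_paragraph_text(sample))
```

Inside the tool you would pass `response.text` to `extract_paragraph_text` in place of the BeautifulSoup lines.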

## Step 4: Set Up the RAG Pipeline (In-Memory)

This is the core of the project. The `RagAgent` class below encapsulates the Retrieval-Augmented Generation logic.

**How it works:**
1.  The agent turns the user's request into a keyword-focused search query and calls `GoogleSearchAPIWrapper.results()` to get structured results that include each page's URL.
2.  It uses the `scrape_website` tool to fetch the paragraph text of the top three URLs.
3.  The scraped pages are joined into a single corpus and split into overlapping chunks (1,000 characters with a 100-character overlap).
4.  The open-source embedding model (`all-MiniLM-L6-v2`) converts each chunk into a vector.
5.  These vectors are indexed in a temporary, in-memory **FAISS** vector store.
6.  The vector store backs a **retrieval chain**: for each question, the most similar chunks are retrieved and passed to Gemini as context, so the agent can "chat" with the documents it just found.
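
Steps 3 to 6 can be illustrated end to end without any model. The sketch below swaps in deliberately simple, hypothetical stand-ins: fixed-size character chunks with overlap (LangChain's `RecursiveCharacterTextSplitter` instead prefers to break on paragraph, line, and word boundaries), word-count vectors instead of `all-MiniLM-L6-v2` embeddings, and brute-force cosine similarity instead of a FAISS index (LangChain's FAISS store defaults to L2 distance). The final join mimics how a "stuff" chain places every retrieved chunk into the prompt's `{context}` slot.

```python
import math
import re
from collections import Counter
from typing import List

def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Cut text into fixed-size character chunks; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # The last start position still yields a chunk that reaches the end of the text
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for real dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: List[str], k: int = 4) -> List[str]:
    """Score every chunk against the query and keep the k best (exhaustive search)."""
    query_vector = embed(query)
    return sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)[:k]

def stuff_context(chunks: List[str], separator: str = "\n\n") -> str:
    """'Stuff' all retrieved chunks into one string for the prompt's {context} slot."""
    return separator.join(chunks)

# Tiny demo on a synthetic corpus
corpus = "data center demand " * 60 + "gaming console sales " * 60
chunks = split_with_overlap(corpus, chunk_size=200, chunk_overlap=20)
context = stuff_context(top_k("data center", chunks, k=2))
print(f"{len(chunks)} chunks; context starts with: {context[:40]!r}")
```

The default `k=4` matches what `vector_store.as_retriever()` returns when no `search_kwargs` are given.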

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

class RagAgent:
    def __init__(self, llm, embeddings_model):
        self.llm = llm
        self.embeddings_model = embeddings_model
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        self.retriever = None
        self.retrieval_chain = None
        # Use the search wrapper directly (not the web_search tool): its .results()
        # method returns structured hits that include each page's URL
        self.search_wrapper = GoogleSearchAPIWrapper(google_cse_id=GOOGLE_CSE_ID, google_api_key=GOOGLE_API_KEY)

    def _create_rag_pipeline(self, text_corpus):
        """Creates a RAG pipeline from a given text corpus."""
        print("\nStep 1: Splitting documents...")
        docs = self.text_splitter.split_text(text_corpus)

        print("Step 2: Creating FAISS vector store...")
        vector_store = FAISS.from_texts(texts=docs, embedding=self.embeddings_model)
        self.retriever = vector_store.as_retriever()  # returns the 4 most similar chunks by default

        print("Step 3: Creating retrieval chain...")
        system_prompt = (
            "You are an expert market research analyst. Use the following retrieved context "
            "to answer the user's question. If you don't know the answer, say you don't know. "
            "Provide a detailed, well-structured answer based only on the context provided."
            "\n\n{context}"
        )
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt),
            ("human", "{input}"),
        ])

        # The "stuff" chain inserts all retrieved chunks into {context} in a single prompt
        question_answer_chain = create_stuff_documents_chain(self.llm, prompt)
        self.retrieval_chain = create_retrieval_chain(self.retriever, question_answer_chain)
        print("RAG pipeline is ready.")

    def run(self, query):
        print(f"Executing query: {query}")

        # Build a simpler, keyword-focused search query: take the text after the last
        # "for " up to the next period (e.g. "NVIDIA (NVDA)").
        # A more advanced agent could ask the LLM to write the search query.
        search_query = f"market analysis and recent news for {query.split('for ')[-1].split('.')[0]}"
        print(f"Generated search query: {search_query}")

        # .results() returns a list of dicts ('title', 'link', 'snippet');
        # .run() would return a single text blob with no URLs
        search_results = self.search_wrapper.results(search_query, num_results=5)

        # With no hits, the wrapper returns a placeholder dict without a 'link' key,
        # so check the extracted URLs rather than the raw result list
        urls = [result['link'] for result in search_results if 'link' in result]
        if not urls:
            return "The web search returned no results. Please try a different query."
        print(f"Found {len(urls)} URLs.")

        # Scrape content from the top 3 URLs, skipping pages that fail
        scraped_content = []
        for url in urls[:3]:
            print(f"Scraping {url}...")
            content = scrape_website.run(url)
            if content and not content.startswith("Error"):
                scraped_content.append(content)

        if not scraped_content:
            return "Could not retrieve any content from the web. Please try another query."

        full_text = "\n\n---\n\n".join(scraped_content)
        self._create_rag_pipeline(full_text)

        print("\nSynthesizing final answer...")
        response = self.retrieval_chain.invoke({"input": query})

        return response['answer']

# Initialize models (ChatGoogleGenerativeAI reads GOOGLE_API_KEY from the environment)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create the agent
market_research_agent = RagAgent(llm=llm, embeddings_model=embeddings)
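
Two small pieces of `RagAgent.run` are easy to test in isolation: the search-query heuristic and the URL extraction. The sketch below restates them as hypothetical standalone functions (`build_search_query` adds a `.strip()`; `extract_urls` also drops duplicate links). It assumes the result shape of `langchain_community`'s `GoogleSearchAPIWrapper.results()` as we understand it: a list of dicts with `title` and `link` (plus `snippet` when present), or a single `{"Result": ...}` placeholder when nothing is found. The examples also show where the heuristic breaks: a request without "for " falls through to its whole first sentence.

```python
from typing import Dict, List

def build_search_query(user_query: str) -> str:
    """Mirror RagAgent.run's heuristic: text after the last 'for ', up to the next '.'."""
    entity = user_query.split("for ")[-1].split(".")[0].strip()
    return f"market analysis and recent news for {entity}"

def extract_urls(search_results: List[Dict[str, str]], limit: int = 3) -> List[str]:
    """Return up to `limit` unique links, skipping entries that have no 'link' key."""
    urls: List[str] = []
    for result in search_results:
        link = result.get("link")
        if link and link not in urls:
            urls.append(link)
    return urls[:limit]

print(build_search_query(
    "Generate a market analysis for NVIDIA (NVDA). Identify key growth drivers and summarize recent news."
))
# No "for " in the request, so the whole first sentence becomes the "entity"
print(build_search_query("Summarize Tesla's outlook."))
# Assumed shape of a no-results response
print(extract_urls([{"Result": "No good Google Search Result was found"}]))
```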

## Step 5: Run the Agent and Generate a Business Brief

Now it's time to test the agent. Let's give it a multi-part query (a market analysis of NVIDIA that also asks for growth drivers and recent news) and see what kind of brief it generates.

In [None]:
user_query = "Generate a market analysis for NVIDIA (NVDA). Identify key growth drivers and summarize recent news."

final_brief = market_research_agent.run(user_query)

print("\n--- FINAL BUSINESS BRIEF ---\n")
print(final_brief)

## Bonus: Using the Yahoo Finance Tool

We can also use our tools individually. Let's quickly get some financial data for another company.

In [None]:
amd_info = get_stock_info.run("AMD")
print("--- AMD Financial Info ---")
print(amd_info)
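
`get_stock_info` prints the raw `marketCap` integer from `Ticker.info`, which is hard to read at a glance. A small formatting layer could render it as, say, `250.00B` and round the P/E ratios. The helpers below (`format_large_number`, `format_stock_info`) are hypothetical, and the dictionary in the test is made-up illustrative input, not real market data.

```python
from typing import Any, Dict

def format_large_number(value: Any) -> str:
    """Render large numbers with T/B/M suffixes; pass non-numbers (e.g. 'N/A') through."""
    if not isinstance(value, (int, float)):
        return str(value)
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if abs(value) >= threshold:
            return f"{value / threshold:.2f}{suffix}"
    return f"{value:,}"

def format_ratio(value: Any) -> str:
    """Round numeric ratios to two decimals; pass anything else through."""
    return f"{value:.2f}" if isinstance(value, (int, float)) else str(value)

def format_stock_info(info: Dict[str, Any]) -> str:
    """Build the same summary as get_stock_info from a Ticker.info-style dict."""
    return (
        f"Market Cap: {format_large_number(info.get('marketCap', 'N/A'))}\n"
        f"Trailing P/E: {format_ratio(info.get('trailingPE', 'N/A'))}\n"
        f"Forward P/E: {format_ratio(info.get('forwardPE', 'N/A'))}\n"
        f"Business Summary: {info.get('longBusinessSummary', 'N/A')}"
    )
```

To use it, `get_stock_info` would simply `return format_stock_info(stock.info)` inside its `try` block.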