# AI Agent for Market Research & Competitive Analysis

This notebook is an interactive demo of an AI research agent. Enter a company name and stock ticker to generate an analysis from four views: a market investor outlook, a value investor analysis, a devil's advocate view, and a final consensus summary.

## Step 1: API Key Setup

Before running this agent, you need two free API keys from Google. Follow these instructions carefully.

### 1. Get Your Google AI Studio API Key (`GOOGLE_API_KEY`)

1.  Go to **[Google AI Studio](https://aistudio.google.com/app/apikey)**.
2.  Click **"Create API key in new project"**.
3.  Copy the generated API key.

### 2. Get Your Google Programmable Search Engine ID (`GOOGLE_CSE_ID`)

This is a two-part process to get a Search Engine ID and enable the API.

**Part A: Create the Search Engine**
1.  Go to the **[Programmable Search Engine control panel](https://programmablesearchengine.google.com/controlpanel/all)**.
2.  Click **"Add"** to create a new search engine.
3.  Name your search engine (e.g., "AI Agent Search").
4.  Crucially, select the option to **"Search the entire web"**.
5.  After it's created, go to the "Basics" tab and find the **"Search engine ID"**. Copy this ID.

**Part B: Enable the Custom Search API**
1.  Go to the **[Google Cloud Console API Library](https://console.cloud.google.com/apis/library/customsearch.googleapis.com)**.
2.  Ensure the project selected in the top navigation bar is the same one you created for your Google AI Studio key.
3.  Click the **"Enable"** button. If it's already enabled, you're all set.

### 3. Add Keys to Colab Secrets Manager

1.  In this notebook, click the **key icon (🔑)** in the left sidebar.
2.  Create a new secret named `GOOGLE_API_KEY` and paste your Google AI Studio key.
3.  Create another new secret named `GOOGLE_CSE_ID` and paste your Search Engine ID.

## Step 2: Install Dependencies & Setup Environment

This cell installs the required libraries, clones the project repository from GitHub to make the custom modules available, and sets up the necessary API keys for the agent to function.
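The dependency check in this step works by turning each line of `requirements.txt` into an importable module name and testing whether that module can be found. The sketch below isolates that mapping so it can be tried on its own. `requirement_to_import_name` is a hypothetical helper, not a function from the repository, and the version pins in the sample lines are made up for illustration.

```python
import re

# Packages whose import name differs from their PyPI name (same map as this step's cell).
IMPORT_NAME_MAP = {
    "faiss-cpu": "faiss",
    "PyMuPDF": "fitz",
    "google-api-python-client": "googleapiclient",
    "beautifulsoup4": "bs4",
    "sentence-transformers": "sentence_transformers",
}


def requirement_to_import_name(line):
    """Map one requirements.txt line to a module name, or None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Cut at the first version operator, extras bracket, environment marker, or space.
    package = re.split(r"[<>=~!\[;\s]", line, maxsplit=1)[0]
    return IMPORT_NAME_MAP.get(package, package.replace("-", "_"))


# Illustrative lines only; the real file is fetched from GitHub.
for example in ["faiss-cpu>=1.7", "pandas", "beautifulsoup4==4.12.3", "# a comment"]:
    print(repr(example), "->", requirement_to_import_name(example))
```

Splitting on any single operator character, rather than only on two-character operators such as `>=`, also covers lines like `pkg>1.0` or `pkg[extra]`.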

In [1]:
import importlib.util  # find_spec lives in the importlib.util submodule
import os
import sys
import requests
import re
import subprocess
from google.colab import userdata

def check_and_install_dependencies():
    requirements_url = 'https://raw.githubusercontent.com/eriktaylor/ai-agent-moat/main/requirements.txt'
    try:
        response = requests.get(requirements_url)
        response.raise_for_status()
        requirements = response.text.splitlines()

        import_name_map = {
            'faiss-cpu': 'faiss',
            'PyMuPDF': 'fitz',
            'google-api-python-client': 'googleapiclient',
            'beautifulsoup4': 'bs4',
            'sentence-transformers': 'sentence_transformers'
        }

        missing_packages = False
        for req in requirements:
            if req.strip() and not req.strip().startswith('#'):
                # Cut at the first version operator, extras bracket, marker, or space
                package_name = re.split(r'[<>=~!\[;\s]', req.strip(), maxsplit=1)[0]
                import_name = import_name_map.get(package_name, package_name.replace('-', '_'))
                if not importlib.util.find_spec(import_name):
                    missing_packages = True
                    break

        if missing_packages:
            print("Installing dependencies from requirements.txt...")
            !pip install -q -r {requirements_url}
            print("Installation complete.")
        else:
            print("All dependencies are already installed.")

    except requests.RequestException as e:
        print(f"Error fetching requirements.txt: {e}")
        print("Proceeding with default installation...")
        !pip install -q -r {requirements_url}

check_and_install_dependencies()

repo_path = 'ai-agent-moat'
if os.path.exists(repo_path):
    print("Repository already exists. Pulling latest changes...")
    try:
        result = subprocess.run(['git', '-C', repo_path, 'pull'], capture_output=True, text=True, check=True)
        print(result.stdout)
        if 'Already up to date.' not in result.stdout:
            print("\nIMPORTANT: New updates were pulled from GitHub. Please restart the runtime to ensure all changes are loaded correctly (Runtime > Restart session).")
    except subprocess.CalledProcessError as e:
        print(f"Error pulling repository: {e.stderr}")
else:
    print("Cloning repository...")
    !git clone https://github.com/eriktaylor/ai-agent-moat.git

if repo_path not in sys.path:
    sys.path.append(repo_path)
    print(f"Added {repo_path} to system path.")

os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')
os.environ['GOOGLE_CSE_ID'] = userdata.get('GOOGLE_CSE_ID')

All dependencies are already installed.
Repository already exists. Pulling latest changes...
Already up to date.

Added ai-agent-moat to system path.
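Once this cell has copied the secrets into `os.environ`, it can be worth confirming that both values are present, since a blank key otherwise only surfaces later as an authentication error. This is a minimal sketch under that assumption; `find_missing_keys` is a hypothetical helper and not part of the project.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the required key names that are absent or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = find_missing_keys(os.environ)
if missing:
    print(f"Missing secrets: {', '.join(missing)}")
else:
    print("Both API keys are set.")
```

Passing the mapping in as an argument keeps the check easy to try with a plain dictionary before relying on it in the notebook.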


## Step 3: Candidate Generation

In [2]:
# --- Step 3: Interactive Quantitative Screening & Candidate Generation ---
import pandas as pd
import yfinance as yf
from tqdm.notebook import tqdm
import os
import shutil
from google.colab import drive
import ipywidgets as widgets
from IPython.display import display, clear_output

# --- Part 1: Helper Functions ---

def get_sp500_tickers():
    """Fetches S&P 500 holdings from the SSGA website."""
    print("Fetching S&P 500 tickers from SSGA...")
    try:
        url = 'https://www.ssga.com/us/en/intermediary/etfs/library-content/products/fund-data/etfs/us/holdings-daily-us-en-spy.xlsx'
        df = pd.read_excel(url, engine='openpyxl', skiprows=4).dropna(subset=['Ticker'])
        tickers = [str(ticker).replace(' ', '-') for ticker in df['Ticker'].tolist() if isinstance(ticker, str)]
        print(f"✅ Successfully fetched {len(tickers)} tickers.")
        return tickers
    except Exception as e:
        print(f"❌ Error fetching S&P 500 tickers: {e}")
        return []

def fetch_financial_data(tickers):
    """Fetches key financial data for a list of stock tickers."""
    all_stock_data = []
    print(f"Fetching financial data for {len(tickers)} stocks... (This may take 10-15 minutes)")
    pbar = tqdm(total=len(tickers), desc="Fetching data")
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            info = stock.info
            if 'marketCap' in info and info['marketCap'] is not None:
                all_stock_data.append(info)
        except Exception:
            continue
        finally:
            pbar.update(1)
    pbar.close()
    print(f"\n✅ Successfully fetched data for {len(all_stock_data)} stocks.")
    return pd.DataFrame(all_stock_data)

def screen_and_score_stocks(df, weights):
    """Screens and scores a DataFrame of stocks based on user-defined weights."""
    # --- Data Cleaning and Metric Calculation ---
    if 'symbol' not in df.columns:
        df.reset_index(inplace=True)
    df.set_index('symbol', inplace=True, drop=False)
    df['value_pe'] = df['trailingPE'].apply(lambda x: x if x > 0 else None)
    df['growth_rev'] = df['revenueGrowth'].fillna(0) * 100
    df['momentum_52w'] = df['currentPrice'] / df['fiftyTwoWeekHigh']

    # --- Scoring via Percentile Ranks ---
    df['value_score'] = df['value_pe'].rank(ascending=False, pct=True)
    df['growth_score'] = df['growth_rev'].rank(pct=True)
    df['momentum_score'] = df['momentum_52w'].rank(pct=True)

    # --- Composite Score ---
    df['composite_score'] = (df['value_score'] * weights['value'] +
                             df['growth_score'] * weights['growth'] +
                             df['momentum_score'] * weights['momentum'])

    # --- Final Ranking ---
    ranked_df = df[[
        'longName', 'value_pe', 'growth_rev', 'momentum_52w', 'composite_score'
    ]].copy().dropna()
    ranked_df.sort_values(by='composite_score', ascending=False, inplace=True)
    return ranked_df

# --- Part 2: Main Application Logic and Widgets ---

# Define file paths
local_cache_path = 'sp500_financial_data.csv'
drive_cache_path = '/content/drive/My Drive/sp500_financial_data.csv'

# Create a global variable to hold our main dataframe
financial_df = None

# Create UI Widgets
style = {'description_width': 'initial'}
use_drive_cb = widgets.Checkbox(value=True, description='Use Google Drive for Cache', style=style)
force_refresh_cb = widgets.Checkbox(value=False, description='Force Refresh (ignore all cache)')
load_button = widgets.Button(description="▶️ Load/Refresh", button_style='success')
log_output = widgets.Output()

value_slider = widgets.FloatSlider(value=0.4, min=0, max=1.0, step=0.05, description='Value Weight:', style=style)
growth_slider = widgets.FloatSlider(value=0.3, min=0, max=1.0, step=0.05, description='Growth Weight:', style=style)
momentum_slider = widgets.FloatSlider(value=0.3, min=0, max=1.0, step=0.05, description='Momentum Weight:', style=style)

# Function for the "Load Data" button
def load_data_and_run(b):
    global financial_df
    with log_output:
        clear_output(wait=True)
        # 1. Mount Google Drive whenever it is used, so both cache reads and writes work
        if use_drive_cb.value:
            drive.mount('/content/drive', force_remount=True)
            if not force_refresh_cb.value and os.path.exists(drive_cache_path):
                print("Loading cache from Google Drive...")
                shutil.copyfile(drive_cache_path, local_cache_path)
                print("✅ Copied cache from Google Drive to local session.")

        # 2. Decide whether to fetch fresh data
        if force_refresh_cb.value or not os.path.exists(local_cache_path):
            tickers = get_sp500_tickers()
            if tickers:
                df = fetch_financial_data(tickers)
                df.to_csv(local_cache_path, index=False)
                print(f"\n💾 Saved fresh data to local cache: '{local_cache_path}'")
                if use_drive_cb.value:
                    shutil.copyfile(local_cache_path, drive_cache_path)
                    print(f"💾 Saved fresh data to Google Drive: '{drive_cache_path}'")
        else:
            print("📂 Using existing cache file.")

        # 3. Load the dataframe into the global variable for interactive use
        if not os.path.exists(local_cache_path):
            print("❌ No data available: the ticker fetch failed and no cache exists.")
            return
        financial_df = pd.read_csv(local_cache_path)
        print("Data is loaded and ready. Move any slider to run the screener.")

load_button.on_click(load_data_and_run)

# Function to handle the interactive screening triggered by sliders
def interactive_screening(value_w, growth_w, momentum_w):
    global top_20_tickers
    if financial_df is not None:
        user_weights = {'value': value_w, 'growth': growth_w, 'momentum': momentum_w}
        ranked_df = screen_and_score_stocks(financial_df.copy(), user_weights)

        top_20_tickers = ranked_df.head(20).index.tolist()

        # Display the results table
        display(ranked_df.head(20))
    else:
        print("Click 'Load/Refresh' first, then adjust a slider to begin screening.")

# Link the interactive function to the sliders' output
interactive_ui = widgets.interactive_output(interactive_screening, {
    'value_w': value_slider,
    'growth_w': growth_slider,
    'momentum_w': momentum_slider
})

# Display the final UI layout
print("--- Quantitative Screener for Candidate Generation ---")
print("First, click 'Load/Refresh'. Then, adjust sliders to see candidates.")
display(
    widgets.VBox([
        widgets.HBox([load_button, use_drive_cb, force_refresh_cb]),
        log_output,
        widgets.HBox([value_slider, growth_slider, momentum_slider]),
        interactive_ui
    ])
)

--- Quantitative Screener for Candidate Generation ---
First, click 'Load/Refresh'. Then, adjust sliders to see candidates.


VBox(children=(HBox(children=(Button(button_style='success', description='▶️ Load/Refresh', style=ButtonStyle(…
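To make the scoring in `screen_and_score_stocks` concrete, here is a dependency-free sketch of the same idea: each metric is converted to a percentile rank, and the ranks are combined with the slider weights. It stands in for the pandas `rank(pct=True)` calls and assumes no ties and no missing values. The tickers and figures are made up, and `percentile_ranks` and `composite_scores` are hypothetical names.

```python
def percentile_ranks(values, ascending=True):
    """Rank values as fractions in (0, 1].

    With ascending=True the largest value gets 1.0; with ascending=False
    the smallest value gets 1.0. Assumes no ties and no missing values.
    """
    order = sorted(values, reverse=not ascending)
    return [(order.index(v) + 1) / len(values) for v in values]


def composite_scores(stocks, weights):
    """Weighted sum of value, growth, and momentum percentile ranks per ticker."""
    tickers = list(stocks)
    # A lower P/E should score higher, so the value rank is computed in descending order.
    value = percentile_ranks([stocks[t]["pe"] for t in tickers], ascending=False)
    growth = percentile_ranks([stocks[t]["growth"] for t in tickers])
    momentum = percentile_ranks([stocks[t]["momentum"] for t in tickers])
    return {
        t: value[i] * weights["value"]
        + growth[i] * weights["growth"]
        + momentum[i] * weights["momentum"]
        for i, t in enumerate(tickers)
    }


# Made-up sample data: trailing P/E, revenue growth in percent, price / 52-week high.
stocks = {
    "AAA": {"pe": 10, "growth": 5, "momentum": 0.90},
    "BBB": {"pe": 20, "growth": 15, "momentum": 0.80},
    "CCC": {"pe": 30, "growth": 10, "momentum": 0.95},
}
weights = {"value": 0.4, "growth": 0.3, "momentum": 0.3}

scores = composite_scores(stocks, weights)
for ticker in sorted(scores, key=scores.get, reverse=True):
    print(f"{ticker}: {scores[ticker]:.3f}")
```

Because the score is a weighted sum of ranks, the slider weights do not need to sum to 1 for the ordering to be meaningful; scaling all three weights by the same factor leaves the ranking unchanged.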

## Step 4: Import and Initialize the Agent

Now we import our custom-built agent and tools. The heavy lifting and complex logic are handled in the background by our `research_agent_colab.py` and `tools_colab.py` files.

In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings
from research_agent_colab import ResearchAgent
from tools_colab import get_stock_info

# Initialize models
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.2)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Instantiate the agent
research_agent = ResearchAgent(llm=llm, embeddings_model=embeddings)

print("AI Research Agent is initialized and ready.")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


AI Research Agent is initialized and ready.




## Step 5: Run the Scout Agent to Narrow Down the Results

In [5]:
# --- Step 5: Scout Agent - Qualitative Triage ---
import json
import pandas as pd
from tqdm.notebook import tqdm
import ipywidgets as widgets
from IPython.display import display, clear_output
import warnings

# Suppress common FutureWarnings from torch/huggingface to clean up the output
warnings.filterwarnings("ignore", category=FutureWarning, module="torch.nn.modules.module")

# UI Widgets
scout_button = widgets.Button(description="🔍 Run Scout Agent on Top 20", button_style='primary')
scout_output = widgets.Output()

def run_scout_process(b):
    with scout_output:
        clear_output(wait=True)

        if 'top_20_tickers' not in globals() or not top_20_tickers:
            print("❌ Error: No tickers found. Please run the quantitative screener first.")
            return

        print(f"🕵️ Running Scout Agent on {len(top_20_tickers)} candidates...")
        scout_results = []
        pbar = tqdm(total=len(top_20_tickers), desc="Scouting candidates")

        for ticker in top_20_tickers:
            company_name = financial_df[financial_df['symbol'] == ticker]['longName'].iloc[0]
            result = research_agent.generate_scout_analysis(company_name, ticker)

            # Fall back to a default record if the response is missing, malformed, or not valid JSON
            try:
                # Clean the response and load as JSON
                json_str = result['answer'].strip().replace("```json", "").replace("```", "")
                json_data = json.loads(json_str)
                json_data['ticker'] = ticker
                json_data['longName'] = company_name
                scout_results.append(json_data)
            except (json.JSONDecodeError, TypeError, KeyError):
                # If JSON fails or keys are missing, append a default error record
                scout_results.append({
                    'ticker': ticker,
                    'longName': company_name,
                    'compelling_score': 0,
                    'news_summary': 'Error parsing LLM response or search failed.',
                    'positive_catalyst': False,
                    'negative_catalyst': False
                })

            pbar.update(1)
        pbar.close()

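        # Process and display results
        display_cols = ['ticker', 'longName', 'compelling_score', 'news_summary', 'positive_catalyst', 'negative_catalyst']
        # reindex guarantees every display column exists, even if the LLM omitted a key
        results_df = pd.DataFrame(scout_results).reindex(columns=display_cols)
        # Coerce scores to numbers so sorting works when the LLM returns strings or nothing
        results_df['compelling_score'] = pd.to_numeric(results_df['compelling_score'], errors='coerce').fillna(0)
        results_df.sort_values(by='compelling_score', ascending=False, inplace=True)

        print("\n--- ✅ Top Scouted Candidates ---")
        display(results_df)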

        # Store final candidates
        global final_candidates
        final_candidates = results_df.head(5)['ticker'].tolist()
        print(f"\n🏆 Top 5 candidates selected for deep-dive: {', '.join(final_candidates)}")

scout_button.on_click(run_scout_process)

# Display the UI
print("\n--- Scout Agent Controls ---")
print("This step runs a lightweight search (1 query per stock) to rank candidates by news relevance.")
display(scout_button, scout_output)


--- Scout Agent Controls ---
This step runs a lightweight search (1 query per stock) to rank candidates by news relevance.


Button(button_style='primary', description='🔍 Run Scout Agent on Top 20', style=ButtonStyle())

Output()
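The scout step depends on the LLM replying with a JSON object, often wrapped in a Markdown code fence. The sketch below shows one way to isolate that parsing so a bad reply yields `None` instead of an exception. `parse_llm_json` is a hypothetical helper, and the sample replies are invented; only the key names `compelling_score` and `positive_catalyst` come from the cell above.

```python
import json

# Built indirectly so the three-backtick marker never appears literally in this example.
FENCE = "`" * 3


def parse_llm_json(text):
    """Parse a JSON object from an LLM reply, tolerating a Markdown code fence.

    Returns the parsed dict, or None if the reply is not a JSON object.
    """
    if not isinstance(text, str):
        return None
    cleaned = text.strip().replace(FENCE + "json", "").replace(FENCE, "").strip()
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


# Invented sample replies for illustration.
fenced_reply = FENCE + 'json\n{"compelling_score": 7, "positive_catalyst": true}\n' + FENCE
print(parse_llm_json(fenced_reply))
print(parse_llm_json("Sorry, I could not find any news."))
```

Returning `None` for anything that is not a JSON object lets the caller substitute a default record in one place.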

## Step 6: Run the Final Analysis

In [9]:
# --- Step 6: Final Deep-Dive Analysis on the Top Candidates ---
import ipywidgets as widgets
from IPython.display import display, clear_output
from tqdm.notebook import tqdm
from display_utils_colab import display_analysis

# This cell uses the `final_candidates` list generated by the Scout Agent

# --- UI Widgets for the Final Analysis Step ---
deep_dive_button = widgets.Button(description="🚀 Run Deep-Dive Analysis on Top 5", button_style='info')
deep_dive_output = widgets.Output()

def run_deep_dive_process(b):
    with deep_dive_output:
        clear_output(wait=True)

        if 'final_candidates' not in globals() or not final_candidates:
            print("❌ Error: No final candidates found. Please run the Scout Agent in the cell above first.")
            return

        print(f"🚀 Kicking off deep-dive analysis for {len(final_candidates)} top candidates...")

        accordion_children = []
        company_titles = []

        for ticker in tqdm(final_candidates, desc="Deep-Diving Candidates"):
            company_name = financial_df[financial_df['symbol'] == ticker]['longName'].iloc[0]
            company_titles.append(f"{company_name} ({ticker})")

            company_output = widgets.Output()

            with company_output:
                print(f"Running full analysis for {company_name}...")
                print("Note: This is faster as it uses cached search results.")

                # --- 1. MARKET INVESTOR OUTLOOK ---
                market_outlook_result = research_agent.generate_market_outlook(company_name, ticker)
                display_analysis("Market Investor Outlook", company_name, market_outlook_result)

                # --- 2. VALUE INVESTOR ANALYSIS ---
                value_analysis_result = research_agent.generate_value_analysis(company_name, ticker)
                display_analysis("Value Investor Analysis", company_name, value_analysis_result)

                # --- 3. DEVIL'S ADVOCATE VIEW ---
                devils_advocate_result = research_agent.generate_devils_advocate_view(company_name, ticker)
                display_analysis("Devil's Advocate View", company_name, devils_advocate_result)

                # --- 4. FINAL CONSENSUS SUMMARY ---
                final_summary = research_agent.generate_final_summary(
                    market_outlook_result.get('answer', ''),
                    value_analysis_result.get('answer', ''),
                    devils_advocate_result.get('answer', '')
                )
                display_analysis("FINAL CONSENSUS SUMMARY", company_name, final_summary, is_summary=True)

            accordion_children.append(company_output)

        # Create and display the final accordion
        final_accordion = widgets.Accordion(children=accordion_children)
        for i, title in enumerate(company_titles):
            final_accordion.set_title(i, title)

        print("\n--- ✅ Final Analysis Complete ---")
        display(final_accordion)

deep_dive_button.on_click(run_deep_dive_process)

# Display the UI for this step
print("\n--- Deep-Dive Analysis Controls ---")
print("Click the button below to run the full, multi-perspective analysis on your top 5 scouted candidates.")
display(deep_dive_button, deep_dive_output)


--- Deep-Dive Analysis Controls ---
Click the button below to run the full, multi-perspective analysis on your top 5 scouted candidates.


Button(button_style='info', description='🚀 Run Deep-Dive Analysis on Top 5', style=ButtonStyle())

Output()

## Bonus: Run a Custom Analysis

Enter a company name and its corresponding stock ticker below to begin the analysis.

In [11]:
from IPython.display import display, HTML, clear_output
from display_utils_colab import display_analysis

company_name = input("Enter the company name (e.g., NVIDIA): ")
stock_ticker = input("Enter the stock ticker (e.g., NVDA): ")

clear_output(wait=True) # Clears the input prompts for a cleaner display

# To get fresh data and not use the cache, you can uncomment the next line:
# research_agent.clear_cache()

# --- 1. KEY FINANCIAL DATA ---
print(f"--- 1. KEY FINANCIAL DATA for {stock_ticker.upper()} ---")
financial_data_raw = get_stock_info.run(stock_ticker) if stock_ticker else "No ticker provided."
display(HTML(f"<div style='border: 1px solid #444; border-radius: 8px; padding: 20px; white-space: pre-wrap; font-family: monospace; line-height: 1.6; background-color: #2c2c2e; color: #f0f0f0;'>{financial_data_raw}</div>"))

# --- 2. AI-GENERATED MARKET INVESTOR OUTLOOK ---
market_outlook_result = research_agent.generate_market_outlook(company_name, stock_ticker)
display_analysis("2. AI-GENERATED MARKET INVESTOR OUTLOOK", company_name, market_outlook_result)

# --- 3. AI-GENERATED VALUE INVESTOR ANALYSIS ---
value_analysis_result = research_agent.generate_value_analysis(company_name, stock_ticker)
display_analysis("3. AI-GENERATED VALUE INVESTOR ANALYSIS", company_name, value_analysis_result)

# --- 4. AI-GENERATED DEVIL'S ADVOCATE VIEW ---
devils_advocate_result = research_agent.generate_devils_advocate_view(company_name, stock_ticker)
display_analysis("4. AI-GENERATED DEVIL'S ADVOCATE VIEW", company_name, devils_advocate_result)

# --- 5. FINAL CONSENSUS SUMMARY ---
final_summary = research_agent.generate_final_summary(
    market_outlook_result.get('answer', ''),
    value_analysis_result.get('answer', ''),
    devils_advocate_result.get('answer', '')
)
display_analysis("5. FINAL CONSENSUS SUMMARY", company_name, final_summary, is_summary=True)

--- 1. KEY FINANCIAL DATA for NVDA ---



Generating Market Investor Outlook...
--- Getting Financial Data for NVDA ---
Successfully collected financial data.
--- Tier 1: Official News & Analysis ---
Collected 3 headlines and snippets.
--- Tier 2: Critical News & Sentiment ---


HttpError: <HttpError 429 when requesting https://customsearch.googleapis.com/customsearch/v1?q=%22NVIDIA%22+issues+OR+concerns+OR+investigation+OR+recall+OR+safety+OR+%22short+interest%22&cx=665d5058e7e3a44b1&num=3&key=[REDACTED]&alt=json returned "Quota exceeded for quota metric 'Queries' and limit 'Queries per day' of service 'customsearch.googleapis.com' for consumer 'project_number:405584559381'.". Details: "[{'message': "Quota exceeded for quota metric 'Queries' and limit 'Queries per day' of service 'customsearch.googleapis.com' for consumer 'project_number:405584559381'.", 'domain': 'global', 'reason': 'rateLimitExceeded'}]">
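The 429 response in this output means the project has used up its daily Custom Search query quota; the free tier of the Custom Search JSON API allows 100 queries per day. Retrying will not help until the quota resets, so one option is to track usage before starting a long run. The following is a minimal sketch of that idea. `DailyQueryBudget` is a hypothetical class, not part of the agent, and it leaves out resetting the counter when the quota rolls over.

```python
class DailyQueryBudget:
    """Counts search queries against a fixed daily allowance."""

    def __init__(self, daily_limit=100):
        self.daily_limit = daily_limit
        self.used = 0

    @property
    def remaining(self):
        """Number of queries still available under the limit."""
        return self.daily_limit - self.used

    def try_consume(self, n=1):
        """Reserve n queries; return False and reserve nothing if they do not fit."""
        if n < 0 or self.used + n > self.daily_limit:
            return False
        self.used += n
        return True


budget = DailyQueryBudget(daily_limit=100)

# The scout step states it issues one query per stock, so 20 candidates cost 20 queries.
if budget.try_consume(20):
    print(f"Scout step fits; {budget.remaining} queries left today.")
else:
    print("Not enough quota left for the scout step.")
```

Checking the budget before each step makes it possible to stop early with a clear message instead of failing partway through an analysis.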