# Financial Prompt Optimization Interface
## Explaining the "Financial Prompt Optimization Interface" Project

**Project Goal:** To create a system that takes a natural language financial analysis request and transforms it into a highly optimized, structured, and regulatory-compliant prompt, suitable for a specialized financial LLM. This system will enforce specific output formats and integrate with LangSmith for traceability and debugging.



### Phase 1: Establishing the Financial Intelligence Core: Environment, Rules, and Dynamic Entity Mapping

**What it is:**
This is the foundational stage of our project. It's where we set up our development environment, install necessary tools, define our core financial domain knowledge (like compliance rules and company mappings), and establish the initial "data backbone" that our AI system will rely on.

**Why it's needed:**
*   **Foundation:** Just like building a house, you need a strong foundation. This phase ensures all our tools are ready and our system understands the specific language and rules of finance.
*   **Domain Specificity:** Financial analysis isn't generic. It requires knowledge of regulations, company names, and specific financial tasks. This phase injects that crucial domain-specific intelligence.
*   **Dynamic Data:** Financial markets change! We need a way to look up information (like stock tickers) dynamically, not just rely on a static list.

**How it works (Code Meaning):**

1.  **`1.1. Environment Setup & Dependencies`**
    *   `!pip install ...`: Installs the Python libraries the project needs: `transformers` (commonly used with LLMs, though the mock LLM in this notebook does not call it), `langchain`, `langchain-core`, and `langchain-community` (for building the AI pipeline), `langsmith` (for observability), `openai` (only needed if a real OpenAI LLM is swapped in), `requests` (for API calls to FMP), and `pandas` (for rendering output as tables).
    *   `import os, re, json, datetime, requests, pandas`: These lines bring in modules for interacting with the operating system (environment variables), regular expressions (pattern matching), JSON data handling, date/time operations, web requests, and data tables.
    *   `os.environ[...] = "..."`: Sets the environment variables the project depends on: `LANGCHAIN_TRACING_V2` (turns tracing on), `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT` (the LangSmith project name), and `FMP_API_KEY`. The keys let the system communicate with external services (LangSmith for tracing, FMP for financial data).
    *   `from langsmith import traceable ...`: These import specific tools from LangSmith and LangChain for building our AI chain. `@traceable` is particularly important; it tells LangSmith to record the execution of the function it decorates.

2.  **`1.2. Data Discovery: Define Domain-Specific Knowledge Base & Rules`**
    *   **`COMPLIANCE_RULES`:** This is a Python dictionary that acts as our "rulebook."
        *   **Keys:** Represent the different types of financial analysis (e.g., `"stock_analysis"`, `"quarterly_report_summary"`).
        *   **Values:** Are themselves dictionaries containing:
            *   `"rule_name"`: The specific regulatory rule (e.g., "SEC Rule 13f"). This is crucial for compliance.
            *   `"description"`: A brief explanation of the rule.
            *   `"tasks_template"`: A list of specific sub-tasks the LLM *must* perform for this type of analysis. This ensures the output is comprehensive and consistent. We added a "(d) Concise narrative summary" here to tell the LLM to generate human-like text *within* the structured output.
            *   `"output_format_spec"`: The exact JSON structure the LLM is expected to return. This is vital for machine readability and strict compliance.
    *   **`ENTITY_MAPPINGS`:** This dictionary serves as a local "cache" or initial lookup table for common company names and their stock ticker symbols (e.g., `"apple": "AAPL"`). It's pre-populated for speed.
    *   **`INTENT_MAPPINGS`:** This is a dictionary that maps common keywords or phrases from user queries to our predefined financial intents.
        *   **Keys:** Phrases users might type (e.g., "financial report", "bond analysis", "long-term outlook").
        *   **Values:** Our internal, standardized intent labels (e.g., "quarterly_report_summary", "fixed_income_analysis"). This was significantly expanded using insights from your synthetic dataset to cover diverse phrasing.
    *   **`get_current_quarter_year()`:** A small function that works out the current calendar quarter and year from the system date (e.g., "Q3-2024"). It does not account for company-specific fiscal calendars. The optimizer uses it as the default timeframe.
    *   **`get_ticker_from_name(company_name: str)`:** This is our **dynamic lookup mechanism**.
        *   It first checks if the `company_name` is already in our local `ENTITY_MAPPINGS` cache (fast lookup).
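        *   If not found, it calls the **Financial Modeling Prep (FMP)** search API using your `FMP_API_KEY`. The request asks for a single result (`limit=1`) and is restricted to NASDAQ listings (`exchange=NASDAQ`), so companies listed only on other exchanges will not be found this way. If the key is not set, the lookup is skipped and the function returns `None`.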
        *   If found via FMP, it adds the new mapping to our `ENTITY_MAPPINGS` cache for future faster access.
        *   `@traceable(run_type="tool")`: LangSmith will record every time this function is called, showing the input (company name) and output (ticker or None).
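
The cache-then-API flow described above can be sketched without network access. This is a minimal illustration, not the notebook's implementation: `lookup_ticker`, `fake_search`, and the small `cache` dict are hypothetical names, and the injected `search` callable stands in for the FMP request.

```python
from typing import Callable, Optional

def lookup_ticker(
    name: str,
    cache: dict[str, str],
    search: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a ticker from the cache, falling back to `search` on a miss."""
    key = name.lower().strip()
    if key in cache:              # fast path: no remote call
        return cache[key]
    ticker = search(key)          # slow path: stand-in for the FMP request
    if ticker:
        cache[key] = ticker       # remember the result for next time
    return ticker

calls: list[str] = []

def fake_search(key: str) -> Optional[str]:
    """Hypothetical remote lookup that records each call it receives."""
    calls.append(key)
    return {"zoom": "ZM"}.get(key)

cache = {"apple": "AAPL"}
first = lookup_ticker("Apple", cache, fake_search)    # served from the cache
second = lookup_ticker(" Zoom ", cache, fake_search)  # remote call, then cached
third = lookup_ticker("zoom", cache, fake_search)     # served from the cache
missing = lookup_ticker("Nonexistent Corp", cache, fake_search)
```

Passing the remote lookup in as a parameter keeps the caching logic testable offline; only successful lookups are written back, so a failed search is retried on the next call.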


In [4]:


# 1.1. Environment Setup & Dependencies
print("--- Phase 1: Environment Setup & Data Definition ---")
print("1.1. Installing dependencies...")

# Install necessary libraries (run only once or if not installed)
!pip install transformers langchain langsmith "langchain-core" "langchain-community" openai requests pandas --upgrade

# Import necessary libraries
import os
import re
import json
from datetime import datetime
import requests # For making HTTP requests to FMP API
import pandas as pd # For visualization in Phase 3

# LangSmith setup
from langsmith import traceable
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda


--- Phase 1: Environment Setup & Data Definition ---
1.1. Installing dependencies...
Collecting langsmith
  Downloading langsmith-0.4.5-py3-none-any.whl.metadata (15 kB)
Collecting openai
  Downloading openai-1.95.0-py3-none-any.whl.metadata (29 kB)
Downloading langsmith-0.4.5-py3-none-any.whl (367 kB)
Downloading openai-1.95.0-py3-none-any.whl (755 kB)
Installing collected packages: openai, langsmith
  Attempting uninstall: openai
    Found existing installation: openai 1.93.2
    Uninstalling openai-1.93.2:
      Successfully uninstalled openai-1.93.2
  Attempting uninstall: langsmith
    Found existing installation: langsmith 0.4.4
    Uninstalling langsmith-0.4.4:
      Successfully un

In [5]:

# --- IMPORTANT: Set your API Keys ---
# Replace with your actual keys. LangSmith is required for tracing.
# OpenAI (or another LLM provider like Anthropic, HuggingFace) is optional for a real LLM simulation.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "YOUR_LANGSMITH_API_KEY" # <--- REPLACE THIS
os.environ["LANGCHAIN_PROJECT"] = "FinancialPromptOptimizer" # Name for your LangSmith project

# Optional: If you want to use a real LLM (e.g., OpenAI's GPT-3.5-turbo) for simulation
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY" # <--- REPLACE THIS (if using OpenAI)

# --- FMP API Key for dynamic entity mapping ---
os.environ["FMP_API_KEY"] = "YOUR_FMP_API_KEY" # <--- REPLACE THIS with your FMP API Key!

print("\nEnvironment setup complete. Remember to replace YOUR_API_KEY placeholders if you haven't!")

# 1.2. Data Discovery: Define Domain-Specific Knowledge Base & Rules
print("\n1.2. Defining Domain-Specific Knowledge Base and Rules...")

COMPLIANCE_RULES = {
    "stock_analysis": {
        "rule_name": "SEC Rule 13f",
        "description": "Requires institutional investment managers to report their security holdings.",
        "tasks_template": [
            "(a) 5-year volatility trend",
            "(b) ESG risk exposure",
            "(c) Institutional ownership changes",
            "(d) Concise narrative summary of key findings."
        ],
        "output_format_spec": "JSON with {metrics: [], risk_assessment: {}, compliance_check: bool, narrative: string}"
    },
    "quarterly_report_summary": {
        "rule_name": "GAAP Principles (General Accounting)",
        "description": "Adherence to generally accepted accounting principles for financial statements.",
        "tasks_template": [
            "(a) Revenue and Net Income trends (QoQ, YoY)",
            "(b) Key balance sheet items (Assets, Liabilities, Equity)",
            "(c) Cash flow analysis (Operating, Investing, Financing)",
            "(d) Concise narrative summary of key findings and outlook."
        ],
        "output_format_spec": "JSON with {summary: {}, financials: {}, narrative: string, compliance_check: bool}"
    },
    "fixed_income_analysis": {
        "rule_name": "FINRA Rule 2210 (Communications with the Public)",
        "description": "Ensures communications about bonds are fair and balanced, avoiding exaggerated claims.",
        "tasks_template": [
            "(a) Bond issuer's credit rating (e.g., S&P, Moody's)",
            "(b) Current yield and yield to maturity (YTM)",
            "(c) Bond covenants and call provisions",
            "(d) Overall assessment narrative."
        ],
        "output_format_spec": "JSON with {bond_details: {}, risk_factors: {}, compliance_notes: string, overall_assessment: string}"
    },
    "sector_analysis": {
        "rule_name": "SEC Regulation Fair Disclosure",
        "description": "Prohibits selective disclosure of material information",
        "tasks_template": [
            "(a) Sector performance vs. broader market",
            "(b) Key growth drivers in the sector",
            "(c) Regulatory risk factors",
            "(d) Market commentary and future outlook."
        ],
        "output_format_spec": "JSON with {sector_metrics: [], top_performers: [], risk_factors: {}, commentary: string}"
    }
}

ENTITY_MAPPINGS = {
    # Tech
    "tesla": "TSLA", "apple": "AAPL", "microsoft": "MSFT", "google": "GOOGL",
    "alphabet": "GOOGL", "amazon": "AMZN", "nvidia": "NVDA", "meta": "META",
    "adobe": "ADBE", "salesforce": "CRM", "intel": "INTC", "amd": "AMD",
    "cisco": "CSCO", "oracle": "ORCL", "netflix": "NFLX", "disney": "DIS", # Keyed by company name; DIS came from the CSV
    
    # Healthcare
    "johnson & johnson": "JNJ", "pfizer": "PFE", "merck": "MRK", "abbvie": "ABBV",
    "eli lilly": "LLY", "novo nordisk": "NVO", "thermo fisher": "TMO",
    
    # Financials
    "jpmorgan": "JPM", "goldman sachs": "GS", "visa": "V", "mastercard": "MA",
    "bank of america": "BAC", "wells fargo": "WFC", "morgan stanley": "MS",
    "charles schwab": "SCHW",
    
    # Consumer Goods
    "walmart": "WMT", "procter & gamble": "PG", "coca cola": "KO", "pepsi": "PEP",
    "mcdonald's": "MCD", "starbucks": "SBUX", "costco": "COST",
    
    # Industrials/Energy
    "exxon mobil": "XOM", "chevron": "CVX", "boeing": "BA", "lockheed martin": "LMT",
    "raytheon": "RTX", "nextera energy": "NEE", "ford": "F", # Added Ford from CSV
}

# --- UPDATED INTENT_MAPPINGS based on your synthetic dataset analysis ---
INTENT_MAPPINGS = {
    # --- Stock Analysis ---
    "stock analysis": "stock_analysis", "analyze stock": "stock_analysis",
    "company performance": "stock_analysis", "equity analysis": "stock_analysis",
    "investment review": "stock_analysis", "stock insights": "stock_analysis",
    "share performance": "stock_analysis", "how is doing": "stock_analysis",
    "tell me about": "stock_analysis", "technical analysis": "stock_analysis",
    "fundamental analysis": "stock_analysis", "price target": "stock_analysis",
    "valuation analysis": "stock_analysis", "stock forecast": "stock_analysis",
    "long-term outlook": "stock_analysis", "stock trends": "stock_analysis",
    "price action": "stock_analysis", "valuation metrics": "stock_analysis",
    "investment potential": "stock_analysis", "shares": "stock_analysis",
    "forecast": "stock_analysis", "price targets": "stock_analysis",
    "performing": "stock_analysis", "stock valuation": "stock_analysis",
    "comparative analysis": "stock_analysis", # Can be broad, but often applied to stocks
    "fundamentals": "stock_analysis",

    # --- Quarterly Report Summary ---
    "financial report": "quarterly_report_summary", "q report": "quarterly_report_summary",
    "quarterly earnings": "quarterly_report_summary", "earnings summary": "quarterly_report_summary",
    "fiscal results": "quarterly_report_summary", "latest financials": "quarterly_report_summary",
    "income statement": "quarterly_report_summary", "balance sheet": "quarterly_report_summary",
    "cash flow statement": "quarterly_report_summary", "earnings call": "quarterly_report_summary",
    "10-q report": "quarterly_report_summary", "10-k report": "quarterly_report_summary",
    "financial results": "quarterly_report_summary", "profit and loss": "quarterly_report_summary",
    "financial highlights": "quarterly_report_summary", "beat estimates": "quarterly_report_summary",
    "earnings call summary": "quarterly_report_summary", "revenue and profit": "quarterly_report_summary",
    "earnings per share": "quarterly_report_summary", "recent report": "quarterly_report_summary",
    "balance sheet summary": "quarterly_report_summary", "operating margins": "quarterly_report_summary",

    # --- Fixed Income Analysis ---
    "bond analysis": "fixed_income_analysis", "analyze bond": "fixed_income_analysis",
    "fixed income": "fixed_income_analysis", "bond details": "fixed_income_analysis",
    "credit rating": "fixed_income_analysis", "bond yield": "fixed_income_analysis",
    "debt security": "fixed_income_analysis", "corporate bond": "fixed_income_analysis",
    "treasury yield": "fixed_income_analysis", "credit risk": "fixed_income_analysis",
    "yield curve": "fixed_income_analysis", "bond duration": "fixed_income_analysis",
    "current rates": "fixed_income_analysis", "municipal bonds": "fixed_income_analysis",
    "default probabilities": "fixed_income_analysis", "duration and convexity": "fixed_income_analysis",
    "credit risk assessment": "fixed_income_analysis", "yield analysis": "fixed_income_analysis",
    "spreads": "fixed_income_analysis", "floating rate notes": "fixed_income_analysis",
    "debt instrument analysis": "fixed_income_analysis", "zero-coupon bonds": "fixed_income_analysis",
    "investment grade debt": "fixed_income_analysis", "commercial paper": "fixed_income_analysis",

    # --- Sector Analysis ---
    "sector performance": "sector_analysis", "industry comparison": "sector_analysis",
    "sector analysis": "sector_analysis", "industry outlook": "sector_analysis",
    "market segment": "sector_analysis", "sector trends": "sector_analysis",
    "regulatory impact": "sector_analysis", "profitability metrics": "sector_analysis",
    "competitive landscape": "sector_analysis", "supply chain dynamics": "sector_analysis",
    "emerging trends": "sector_analysis", "growth projections": "sector_analysis",
    "top performers": "sector_analysis", "industry analysis": "sector_analysis",
    "real estate sector": "sector_analysis", "communication services sector": "sector_analysis",
    "healthcare sector": "sector_analysis", "financial sector": "sector_analysis",
    "industrials sector": "sector_analysis", "energy sector": "sector_analysis",
    "materials sector": "sector_analysis", "utilities sector": "sector_analysis"
}

def get_current_quarter_year():
    """Returns the current quarter and year in QX-YYYY format."""
    now = datetime.now()
    quarter = (now.month - 1) // 3 + 1
    return f"Q{quarter}-{now.year}"

@traceable(run_type="tool") # Mark for LangSmith tracing
def get_ticker_from_name(company_name: str) -> str | None:
    """
    Attempts to find a stock ticker for a given company name,
    first checking a local cache, then using the FMP API.
    """
    normalized_name = company_name.lower().strip()

    # 1. Check local ENTITY_MAPPINGS cache
    if normalized_name in ENTITY_MAPPINGS:
        print(f"[Ticker Lookup] Cache hit for '{company_name}': {ENTITY_MAPPINGS[normalized_name]}")
        return ENTITY_MAPPINGS[normalized_name]
    
    # 2. If not in cache, try FMP API
    fmp_api_key = os.getenv("FMP_API_KEY")
    if not fmp_api_key:
        print("[Ticker Lookup] FMP_API_KEY is not set. Skipping API lookup.")
        return None

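    url = "https://financialmodelingprep.com/api/v3/search"
    # Pass query values via `params` so requests URL-encodes names with spaces or '&'.
    params = {"query": normalized_name, "limit": 1, "exchange": "NASDAQ", "apikey": fmp_api_key}

    try:
        response = requests.get(url, params=params, timeout=5)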
        response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)
        data = response.json()

        if data and len(data) > 0:
            ticker = data[0].get('symbol')
            if ticker:
                ENTITY_MAPPINGS[normalized_name] = ticker # Add to cache
                print(f"[Ticker Lookup] FMP API found '{company_name}' -> {ticker}. Added to cache.")
                return ticker
            else:
                print(f"[Ticker Lookup] FMP API found data but no symbol for '{company_name}'.")
        else:
            print(f"[Ticker Lookup] No results from FMP API for '{company_name}'.")
    except requests.exceptions.Timeout:
        print(f"[Ticker Lookup] FMP API request timed out for '{company_name}'.")
    except requests.exceptions.RequestException as e:
        print(f"[Ticker Lookup] Error calling FMP API for '{company_name}': {e}")
    
    return None

print("Domain knowledge base and rules defined.")
print("Dynamic ticker lookup (FMP API) function enabled.")


Environment setup complete. Remember to replace YOUR_API_KEY placeholders if you haven't!

1.2. Defining Domain-Specific Knowledge Base and Rules...
Domain knowledge base and rules defined.
Dynamic ticker lookup (FMP API) function enabled.
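
The quarter arithmetic behind `get_current_quarter_year()` is easier to check when the date is passed in rather than read from the clock. The sketch below assumes calendar quarters; `quarter_label` and `previous_quarter_label` are hypothetical helper names, not functions from the notebook.

```python
from datetime import date

def quarter_label(day: date) -> str:
    """Format the calendar quarter containing `day` as QX-YYYY."""
    quarter = (day.month - 1) // 3 + 1   # months 1-3 -> 1, 4-6 -> 2, ...
    return f"Q{quarter}-{day.year}"

def previous_quarter_label(day: date) -> str:
    """Format the calendar quarter before the one containing `day`."""
    quarter = (day.month - 1) // 3 + 1
    if quarter == 1:                      # Q1 rolls back to Q4 of the prior year
        return f"Q4-{day.year - 1}"
    return f"Q{quarter - 1}-{day.year}"

print(quarter_label(date(2024, 8, 15)))           # Q3-2024
print(previous_quarter_label(date(2024, 8, 15)))  # Q2-2024
print(previous_quarter_label(date(2025, 2, 1)))   # Q4-2024
```

Taking the date as an argument makes the year rollover in the first quarter straightforward to test.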



### Phase 2: Crafting the Financial Intelligence: Prompt Optimizer Logic & LLM Simulation Pipeline

**What it is:**
This is the "brain" of our system. It contains the core logic that transforms a user's messy, natural language request into a precise, structured, and compliant prompt that our LLM can understand and act upon. It also sets up the sequential flow (pipeline) using LangChain.

**Why it's needed:**
*   **Bridge the Gap:** LLMs are powerful but often need specific instructions. This phase translates vague human intent into crystal-clear AI commands.
*   **Enforce Compliance & Structure:** It injects the regulatory rules and desired output formats directly into the prompt, ensuring the LLM's response meets strict financial requirements.
*   **Orchestration:** LangChain allows us to build a series of steps (optimizer -> LLM) that execute in order, making the workflow transparent and manageable.

**How it works (Code Meaning):**

1.  **`financial_prompt_optimizer(original_prompt: str)`:** This is our custom "optimizer" function.
    *   `@traceable(run_type="tool")`: Again, LangSmith tracks this entire function's execution.
    *   **Input & Normalization:** Takes the `original_prompt` (e.g., "Analyze Apple stock") and converts it to `lower_prompt` for case-insensitive matching.
    *   **1. Entity Extraction:**
        *   It first tries to find company names from `ENTITY_MAPPINGS` within the `lower_prompt`.
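        *   If that finds nothing, it uses a regular expression (`\b[A-Z]{2,5}\b`) to look for a ticker written directly in the prompt (e.g., "TSLA"). A match is accepted only if it is already one of the values in `ENTITY_MAPPINGS`.
        *   **If there is still no ticker, it falls back to a heuristic: each capitalized word in the prompt that is not on a short stop-word list is treated as a potential company name and passed to `get_ticker_from_name()` (from Phase 1), which can look it up dynamically via FMP.** This lets the optimizer handle companies outside the initial cache, but the heuristic is coarse: a capitalized verb at the start of a sentence (e.g., "Analyze") is also sent for lookup, and multi-word names are tried one word at a time.
    *   **Timeframe Extraction:** Uses a regular expression to find date/time mentions (`Q1-2023`, `2024`, `YTD`, `last quarter`, `current quarter`, `this year`, `last year`). Relative terms are resolved to an actual quarter or year from the system date, and when no timeframe is mentioned it defaults to the current quarter.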
    *   **2. Intent Recognition & Rule Application:**
        *   It iterates through the `INTENT_MAPPINGS` dictionary in insertion order. As soon as a `keyword` (e.g., "quarterly earnings") is found as a substring of the `lower_prompt`, the corresponding `identified_intent` (e.g., "quarterly_report_summary") is set and the search stops. **Order therefore matters: a broad keyword listed early (e.g., "forecast") wins over a more specific phrase listed later.**
        *   **Fallback:** If no explicit intent keyword is found *but a ticker was extracted*, it defaults to `stock_analysis` (a common user pattern).
        *   **Error Handling:** If no intent can be determined, it returns an error message that asks the user to be more specific. The message is returned as a plain string in place of the optimized prompt, so the next step in the chain receives it as its input.
        *   Once the `identified_intent` is determined, it fetches the corresponding `rules` (rule name, tasks, output format) from the `COMPLIANCE_RULES` dictionary.
    *   **3. Construct the Optimized Prompt:**
        *   This is the core "prompt engineering" step. It uses an f-string to build a highly specific and structured prompt for the LLM.
        *   It includes a persona ("You are a highly specialized financial AI assistant").
        *   It injects the `rule_name`, extracted `ticker` (or a placeholder), extracted `timeframe`, the specific `tasks` (from `tasks_template`), and the precise `output_format_spec` into the prompt.
        *   This ensures the LLM receives clear instructions on *what* to do, *for whom*, *when*, and *in what exact format*.
    *   Returns the `optimized_prompt` string.

2.  **`mock_financial_llm(optimized_prompt: str)`:** This function simulates the behavior of a real financial LLM.
    *   `@traceable(run_type="llm")`: LangSmith will track this as an "LLM call."
    *   **Purpose:** Since a real financial LLM is complex and costly to run repeatedly during development, this mock function acts as a stand-in.
    *   **Behavior:** It inspects the `optimized_prompt` to identify the `rule_name` (e.g., "SEC Rule 13f"). Based on this, it returns a **hardcoded JSON string** that mimics the *expected structure and type of data* a real LLM would produce for that rule, including simulated narratives.
    *   This is crucial for testing that your `financial_prompt_optimizer` is generating the correct prompts and that your downstream parsing/visualization logic can handle the expected output formats.

3.  **`financial_analysis_chain` (LangChain Pipeline):**
    *   This is where we define the execution flow:
        *   `RunnableLambda(financial_prompt_optimizer)`: The user's input first goes to our `financial_prompt_optimizer`.
        *   `|`: This is LangChain's "pipe" operator, meaning the output of the left side becomes the input of the right side.
        *   `RunnableLambda(mock_financial_llm)`: The optimized prompt from the optimizer is then passed to our `mock_financial_llm`.
    *   This chain defines the entire logical sequence for processing a user's request. We explicitly chose the `mock_financial_llm` here to avoid API costs and focus on testing the optimizer.
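
Because intent recognition stops at the first keyword found, the outcome depends on dictionary order. The sketch below contrasts that behavior with one possible alternative: trying longer keywords first and requiring whole-word matches. `SAMPLE_INTENTS`, `first_match_intent`, and `longest_match_intent` are hypothetical names used only for illustration, and keyword length is only a rough proxy for specificity.

```python
import re
from typing import Optional

# Tiny, hypothetical subset of an intent table (not the full INTENT_MAPPINGS).
SAMPLE_INTENTS = {
    "forecast": "stock_analysis",
    "shares": "stock_analysis",
    "quarterly earnings": "quarterly_report_summary",
    "bond yield": "fixed_income_analysis",
}

def first_match_intent(prompt: str, intents: dict[str, str]) -> Optional[str]:
    """Insertion-order substring scan, stopping at the first hit."""
    lower = prompt.lower()
    for keyword, intent in intents.items():
        if keyword in lower:
            return intent
    return None

def longest_match_intent(prompt: str, intents: dict[str, str]) -> Optional[str]:
    """Try longer keywords first and require whole-word matches."""
    lower = prompt.lower()
    for keyword in sorted(intents, key=len, reverse=True):
        # Lookarounds reject matches embedded in a longer word.
        pattern = rf"(?<!\w){re.escape(keyword)}(?!\w)"
        if re.search(pattern, lower):
            return intents[keyword]
    return None

prompt = "Summarize Apple quarterly earnings and forecast"
print(first_match_intent(prompt, SAMPLE_INTENTS))    # stock_analysis
print(longest_match_intent(prompt, SAMPLE_INTENTS))  # quarterly_report_summary
```

The same whole-word pattern could also guard company-name matching, where a plain substring test lets a short name fire inside an unrelated word.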


In [7]:
print("\n--- Phase 2: Designing & Building the Optimizer and LLM Simulation ---")

# 2.1. Model Selection (Our "Optimizer Model" and LLM Simulation)
# Our "model" for prompt optimization is a rule-based Python function.
# We will use the mock LLM for testing.

# 2.2. Prompt Engineering & Fine-Tuning (for the Optimizer)
@traceable(run_type="tool") # Mark this function for LangSmith tracing
def financial_prompt_optimizer(original_prompt: str) -> str:
    """
    Analyzes an original user prompt and transforms it into an optimized,
    compliance-aware, and structured prompt for a financial LLM.
    """
    print(f"\n[Optimizer] Original Input: '{original_prompt}'")

    # Initialize extracted entities
    ticker = None
    timeframe = None
    
    lower_prompt = original_prompt.lower()

    # 1. Entity Extraction
    # Try to extract ticker directly from ENTITY_MAPPINGS (our cache)
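    for company_name, stock_ticker in ENTITY_MAPPINGS.items():
        # Match whole names only, so "ford" does not fire on "afford" or "meta" on "metals".
        if re.search(rf"(?<!\w){re.escape(company_name)}(?!\w)", lower_prompt):
            ticker = stock_ticker
            break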
    
    # Also look for direct ticker patterns (e.g., "TSLA", "$AAPL") if not found by company name
    if not ticker:
        known_tickers = set(ENTITY_MAPPINGS.values())
        # Check every all-caps token, not only the first, so "ESG review of TSLA" still resolves.
        # Tokens that are not known tickers are ignored in this simplified flow.
        for potential_ticker in re.findall(r'\b[A-Z]{2,5}\b', original_prompt):
            if potential_ticker in known_tickers:
                ticker = potential_ticker
                break

    # If ticker still not found after checking cache or direct ticker match,
    # try dynamic lookup by iterating words in the prompt that might be company names.
    if not ticker:
        # Common words that are never company names, even when capitalized.
        stop_words = {
            "the", "a", "an", "is", "of", "for", "in", "and", "or", "how", "what",
            "that", "which", "this", "these", "those", "my", "your", "our", "its",
            "their", "his", "her", "they", "we", "you", "it",
        }
        for word in original_prompt.split():
            # Simple heuristic: treat capitalized words (potential proper nouns)
            # as candidate company names and try a dynamic lookup for each.
            if word.istitle() and len(word) > 1 and word.lower() not in stop_words:
                found_ticker = get_ticker_from_name(word.strip(".,!?"))
                if found_ticker:
                    ticker = found_ticker
                    break

    # Extract Timeframe (e.g., Q1-2023, 2024, YTD, last quarter, current quarter, this year, last year)
    timeframe_match = re.search(r'(Q[1-4]-\d{4}|\b\d{4}\b|\bYTD\b|\blast quarter\b|\bcurrent quarter\b|\bthis year\b|\blast year\b)', original_prompt, re.IGNORECASE)
    if timeframe_match:
        extracted_timeframe = timeframe_match.group().lower()
        now = datetime.now()
        if "last quarter" in extracted_timeframe:
            current_quarter_num = (now.month - 1) // 3 + 1
            if current_quarter_num == 1:
                # Q1 rolls back to Q4 of the previous year
                prev_quarter_num, prev_quarter_year = 4, now.year - 1
            else:
                prev_quarter_num, prev_quarter_year = current_quarter_num - 1, now.year
            timeframe = f"Q{prev_quarter_num}-{prev_quarter_year}"
        elif "current quarter" in extracted_timeframe:
            timeframe = get_current_quarter_year()
        elif "last year" in extracted_timeframe:
            timeframe = str(now.year - 1)
        elif "this year" in extracted_timeframe:
            timeframe = str(now.year)
        else:
            timeframe = extracted_timeframe.upper() # Ensure QX-YYYY or YYYY is uppercase
    else:
        timeframe = get_current_quarter_year() # Default to current quarter if not specified

    print(f"[Optimizer] Extracted: Ticker={ticker if ticker else 'N/A'}, Timeframe={timeframe}")

    # 2. Intent Recognition & Rule Application
    identified_intent = None
    # Dicts preserve insertion order, so the first keyword in INTENT_MAPPINGS that
    # appears in the prompt wins. Keywords are grouped by intent type rather than
    # sorted by specificity, so a broad keyword listed early can shadow a more
    # specific phrase listed later.
    for keyword, intent_type in INTENT_MAPPINGS.items():
        if keyword in lower_prompt:
            identified_intent = intent_type
            break
    
    # Default intent if a ticker is present but no specific intent found
    if not identified_intent and ticker:
        identified_intent = "stock_analysis"
    
    # Handle cases where no clear intent can be determined
    if not identified_intent:
        error_msg = f"Error: Could not determine clear intent for '{original_prompt}'. Please specify the financial task (e.g., 'stock analysis', 'quarterly report summary', 'bond analysis', 'sector analysis')."
        print(f"[Optimizer] {error_msg}")
        return error_msg # Return error message instead of an optimized prompt

    rules = COMPLIANCE_RULES.get(identified_intent)
    if not rules:
        error_msg = f"Error: No compliance rules defined for inferred intent '{identified_intent}'. Please refine intent mapping or rules."
        print(f"[Optimizer] {error_msg}")
        return error_msg # Return error message

    rule_name = rules["rule_name"]
    tasks = "\n".join(rules["tasks_template"])
    output_format_spec = rules["output_format_spec"]

    # 3. Construct the Optimized Prompt
    optimized_prompt = f"""
You are a highly specialized financial AI assistant. Perform the requested analysis following all specified rules and output formats.

Perform {rule_name}-compliant analysis of {ticker if ticker else 'the specified entity'} for {timeframe}:
{tasks}

Output: {output_format_spec}
"""
    optimized_prompt = optimized_prompt.strip() # Clean up leading/trailing whitespace

    print(f"[Optimizer] Optimized Prompt Generated:\n---\n{optimized_prompt}\n---")
    return optimized_prompt

print("Prompt optimization engine defined.")

# 2.3. Prototype Development: Simulate LLM Interaction with LangChain and LangSmith

# Option A: Mock LLM (This is NOW ACTIVE)
@traceable(run_type="llm") # Mark this function for LangSmith tracing
def mock_financial_llm(optimized_prompt: str) -> str:
    """
    A mock LLM that simulates a financial analysis response based on the optimized prompt.
    It returns a generic JSON structure as specified.
    """
    print(f"\n[Mock LLM] Receiving Input:\n---\n{optimized_prompt}\n---")

    mock_response_data = {
        "status": "mock_analysis_completed",
        "notes": "This is a simulated LLM response based on the optimized prompt.",
        "compliance_checked": True
    }

    # Craft a more specific mock response based on the identified rule
    if "SEC Rule 13f" in optimized_prompt:
        mock_response_data.update({
            "metrics": [
                {"volatility_5yr": "2.8%"},
                {"beta": "1.35"},
                {"institutional_ownership_change_QoQ": "+2.1%"}
            ],
            "risk_assessment": {
                "ESG_exposure": "Moderate (Carbon Emissions, Supply Chain)",
                "regulatory_risk": "Low"
            },
            "compliance_check": True,
            "narrative": "Mock analysis following SEC Rule 13f guidelines for stock analysis. This is a simulated narrative demonstrating human-like text."
        })
    elif "GAAP Principles" in optimized_prompt:
        mock_response_data.update({
            "summary": {
                "revenue_QoQ": "+7.2%",
                "net_income_YoY": "+12.5%",
                "EPS": "1.52"
            },
            "financials": {
                "assets": "120M USD",
                "liabilities": "60M USD",
                "equity": "60M USD"
            },
            "cash_flow": {
                "operating": "20M USD",
                "investing": "-5M USD"
            },
            "narrative": "Mock quarterly report summary adhering to GAAP principles, highlighting key financial trends and a simulated outlook, designed for readability.",
            "compliance_check": True
        })
    elif "FINRA Rule 2210" in optimized_prompt:
        mock_response_data.update({
            "bond_details": {
                "issuer": "Corp Bond A",
                "credit_rating_sp": "AA+",
                "current_yield": "4.2%",
                "ytm": "4.55%"
            },
            "risk_factors": {
                "call_provision": "Yes (after 5 years)",
                "covenants_summary": "Standard financial covenants apply"
            },
            "compliance_notes": "Mock bond communication analysis per FINRA Rule 2210, ensuring balanced presentation.",
            "overall_assessment": "This is a simulated overall assessment of the bond's characteristics, providing a human-like summary."
        })
    elif "SEC Regulation Fair Disclosure" in optimized_prompt:
        mock_response_data.update({
            "sector_metrics": [
                {"performance_vs_market": "+3.5%"},
                {"q_q_growth": "+5.1%"},
                {"leading_companies": ["Company A", "Company B"]}
            ],
            "top_performers": ["Specific Company 1", "Specific Company 2"],
            "risk_factors": {
                "regulatory_changes": "Moderate",
                "supply_chain_disruptions": "Low"
            },
            "compliance_checked": True,
            "commentary": "Mock sector analysis adhering to SEC Reg FD, providing key insights and a simulated market commentary on emerging trends and outlook."
        })
    
    mock_json_output = json.dumps(mock_response_data, indent=2)
    print(f"[Mock LLM] Responding with (truncated):\n---\n{mock_json_output[:200]}...\n---") # Truncate for display
    return mock_json_output

print("Mock LLM defined for testing and active in the chain.")

# Option B: Real LLM (e.g., OpenAI's GPT). Left commented out; the mock LLM above is the active one.
# from langchain_openai import ChatOpenAI
# real_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# print("Real LLM (OpenAI) initialized (but bypassed in the chain).")


# Construct the LangChain Pipeline
print("\nConfiguring LangChain pipeline...")

financial_analysis_chain = (
    RunnableLambda(financial_prompt_optimizer) | # Step 1: Optimize the prompt
    RunnableLambda(mock_financial_llm)           # <--- Step 2: MOCK LLM (ACTIVE)
)

print("LangChain pipeline configured.")


--- Phase 2: Designing & Building the Optimizer and LLM Simulation ---
Prompt optimization engine defined.
Mock LLM defined for testing and active in the chain.

Configuring LangChain pipeline...
LangChain pipeline configured.
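
The `|` operator in the pipeline above chains the two `RunnableLambda` steps so that the optimizer's output becomes the mock LLM's input. As a rough plain-Python analogue (not LangChain's actual implementation; `Step`, `fake_optimizer`, and `fake_llm` are hypothetical stand-ins for illustration), the composition behaves like this:

```python
import json


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Chain: run self first, then feed its result to `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


def fake_optimizer(user_input: str) -> str:
    # Hypothetical stand-in for financial_prompt_optimizer.
    return f"Perform SEC Rule 13f-compliant analysis of: {user_input}"


def fake_llm(optimized_prompt: str) -> str:
    # Hypothetical stand-in for mock_financial_llm: returns a JSON string.
    return json.dumps({"prompt_seen": optimized_prompt, "compliance_checked": True})


toy_chain = Step(fake_optimizer) | Step(fake_llm)
result = json.loads(toy_chain.invoke("Analyze Tesla stock"))
print(result["prompt_seen"])
```

Swapping the mock for a real model later only means replacing the second step; the first step and the calling code stay the same.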




### Phase 3: Interactive Testing and Evaluation

**What it is:**
This is the interactive part of the Jupyter Notebook. It lets you enter natural language financial requests and see the system's output in real time. Each request is also traced in LangSmith.

**Why it's needed:**
*   **Rapid Iteration:** Allows developers to quickly test different user queries and see how the optimizer and LLM respond.
*   **Debugging:** Immediate feedback helps identify issues in prompt optimization, entity extraction, or intent recognition.
*   **Observability in LangSmith:** Every query run in this phase automatically generates a detailed "trace" in your LangSmith dashboard, which serves as both an audit trail and a debugging tool.
*   **Human-Friendly Output:** This phase also includes code to take the structured JSON and present it in a more readable format (tables, narratives).

**How it works (Code Meaning):**

1.  **`while True: ... input(...)`:** This creates an infinite loop that constantly prompts you to `Enter your request: `. You type your query, press Enter, and the system processes it. Typing `exit` breaks the loop.
2.  **`financial_analysis_chain.invoke(user_input)`:** This is the magic line! It takes your typed `user_input` and starts the LangChain pipeline defined in Phase 2. Because `LANGCHAIN_TRACING_V2` is enabled, LangSmith automatically starts a new trace for this invocation.
3.  **JSON Parsing & Display:**
    *   `json.loads(final_output)`: Attempts to convert the LLM's (mock) JSON output string into a Python dictionary.
    *   `json.dumps(parsed_output, indent=2)`: Pretty-prints the JSON dictionary to the console, making it easy to read.
    *   `except json.JSONDecodeError`: Catches cases where the output isn't valid JSON (e.g., if the optimizer returned an error message like "Error: Could not determine clear intent..."). In such cases, it just prints the raw output.
4.  **Enhanced Basic Visualization Logic:**
    *   `if parsed_output:`: This block activates only if the LLM output was valid JSON.
    *   `if "metrics" in parsed_output ... elif "summary" in parsed_output ...`: These `if/elif` statements check for specific keys within the returned JSON to identify the type of analysis (stock, quarterly, fixed income, sector).
    *   `import pandas as pd`: Pandas DataFrames are used to format numerical or tabular data (like metrics, financials, bond details) into clean, readable tables directly in the Jupyter output.
    *   `print(df.to_string(index=False))`: Prints the DataFrame without the usual row index, for a cleaner look.
    *   `if "narrative" in parsed_output ... elif ...`: This `if/elif` chain looks for the narrative-style fields (`narrative`, `analysis_details`, `overall_assessment`, `commentary`) requested in the `COMPLIANCE_RULES` `output_format_spec` and prints the first one it finds. This fulfills the "human-like" output requirement.
5.  **`except Exception as e:`:** A general error handler to catch any unexpected issues during the process, providing feedback.
6.  **LangSmith Call to Action:** At the end, it reminds you to visit `smith.langchain.com` to see the detailed traces, which is crucial for understanding how each query was processed step-by-step.
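
The parse-then-dispatch flow in steps 3 and 4 can be sketched as three small helpers. This is a minimal illustration rather than the notebook's actual code: `parse_llm_output`, `flatten_metrics`, and `detect_analysis_type` are hypothetical names, and the key checks simply mirror the `if/elif` order described above.

```python
import json
from typing import Optional


def parse_llm_output(raw: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if `raw` is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def flatten_metrics(metrics: list) -> dict:
    """Merge a list of single-key dicts into one dict (one table row)."""
    flat = {}
    for item in metrics:
        flat.update(item)
    return flat


def detect_analysis_type(parsed: dict) -> str:
    """Map the keys present in the output to an analysis type."""
    if isinstance(parsed.get("metrics"), list):
        return "stock"
    if "summary" in parsed and "financials" in parsed:
        return "quarterly"
    if "bond_details" in parsed and "risk_factors" in parsed:
        return "fixed_income"
    if isinstance(parsed.get("sector_metrics"), list):
        return "sector"
    return "unknown"


raw = '{"metrics": [{"beta": "1.35"}, {"volatility_5yr": "2.8%"}]}'
parsed = parse_llm_output(raw)
print(detect_analysis_type(parsed))
print(flatten_metrics(parsed["metrics"]))
print(parse_llm_output("Error: Could not determine clear intent..."))
```

The flattened dict can then be wrapped in a one-row DataFrame, as the notebook does with `pd.DataFrame([flat_metrics])`.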


In [9]:

print("\n--- Phase 3: Interactive Testing and Evaluation ---")
print("Enter your financial analysis requests below. Type 'exit' to quit.")
print("Each request will generate a trace in LangSmith (smith.langchain.com).")

# pandas is imported in Phase 1; repeated here so `pd` is defined even if that cell was skipped
import pandas as pd

while True:
    user_input = input("\nEnter your request: ")
    if user_input.lower() == 'exit':
        break

    try:
        print("\n[System] Processing request...")
        final_output = financial_analysis_chain.invoke(user_input)
        
        print("\n--- Final LLM (Mock) Output ---")
        
        parsed_output = None
        try:
            parsed_output = json.loads(final_output)
            print(json.dumps(parsed_output, indent=2))
        except json.JSONDecodeError:
            print(final_output)
            print("\n[Visualization] Skipping visualization: Output is not valid JSON.")
            continue
            
        # --- Basic Visualization Logic (Enhanced) ---
        print("\n--- Generating Basic Visualization ---")
        
        if parsed_output:
            # Example 1: Stock Analysis Visualization
            if "metrics" in parsed_output and isinstance(parsed_output["metrics"], list):
                print("\n--- Stock Metrics ---")
                flat_metrics = {}
                for item in parsed_output["metrics"]:
                    flat_metrics.update(item)
                df_metrics = pd.DataFrame([flat_metrics])
                print(df_metrics.to_string(index=False))
                
                print("\n--- Risk Assessment ---")
                if "risk_assessment" in parsed_output:
                    for k, v in parsed_output["risk_assessment"].items():
                        print(f"- {k.replace('_', ' ').title()}: {v}")
                print(f"Compliance Check: {parsed_output.get('compliance_check', 'N/A')}")

            # Example 2: Quarterly Report Summary Visualization
            elif "summary" in parsed_output and "financials" in parsed_output:
                print("\n--- Quarterly Report Summary ---")
                print(f"Revenue QoQ: {parsed_output['summary'].get('revenue_QoQ', 'N/A')}")
                print(f"Net Income YoY: {parsed_output['summary'].get('net_income_YoY', 'N/A')}")
                print(f"EPS: {parsed_output['summary'].get('EPS', 'N/A')}")

                print("\n--- Key Financials ---")
                financial_df = pd.DataFrame([parsed_output["financials"]])
                print(financial_df.to_string(index=False))

                print("\n--- Cash Flow Analysis ---")
                cash_flow_df = pd.DataFrame([parsed_output["cash_flow"]])
                print(cash_flow_df.to_string(index=False))
                
                print(f"\nCompliance Check: {parsed_output.get('compliance_check', 'N/A')}")
            
            # Example 3: Fixed Income Analysis Visualization
            elif "bond_details" in parsed_output and "risk_factors" in parsed_output:
                print("\n--- Bond Details ---")
                bond_details_df = pd.DataFrame([parsed_output["bond_details"]])
                print(bond_details_df.to_string(index=False))

                print("\n--- Risk Factors ---")
                if "risk_factors" in parsed_output:
                    for k, v in parsed_output["risk_factors"].items():
                        print(f"- {k.replace('_', ' ').title()}: {v}")
                print(f"Compliance Notes: {parsed_output.get('compliance_notes', 'N/A')}")

            # Example 4: Sector Analysis Visualization
            elif "sector_metrics" in parsed_output and isinstance(parsed_output["sector_metrics"], list):
                print("\n--- Sector Metrics ---")
                flat_sector_metrics = {}
                for item in parsed_output["sector_metrics"]:
                    flat_sector_metrics.update(item)
                df_sector_metrics = pd.DataFrame([flat_sector_metrics])
                print(df_sector_metrics.to_string(index=False))

                print("\n--- Top Performers ---")
                if "top_performers" in parsed_output and parsed_output["top_performers"]:
                    print(", ".join(parsed_output["top_performers"]))

                print("\n--- Risk Factors ---")
                if "risk_factors" in parsed_output:
                    for k, v in parsed_output["risk_factors"].items():
                        print(f"- {k.replace('_', ' ').title()}: {v}")
                print(f"Compliance Check: {parsed_output.get('compliance_checked', 'N/A')}")

            # --- Print the first narrative/commentary field that is present ---
            if "narrative" in parsed_output and parsed_output["narrative"]:
                print("\n--- Analysis Narrative ---")
                print(parsed_output["narrative"])
            elif "analysis_details" in parsed_output and parsed_output["analysis_details"]:
                print("\n--- Analysis Details ---")
                print(parsed_output["analysis_details"])
            elif "overall_assessment" in parsed_output and parsed_output["overall_assessment"]:
                print("\n--- Overall Assessment ---")
                print(parsed_output["overall_assessment"])
            elif "commentary" in parsed_output and parsed_output["commentary"]:
                print("\n--- Market Commentary ---")
                print(parsed_output["commentary"])
            else:
                # Reached only when none of the narrative fields is present
                print("[Visualization] No narrative or commentary field found in this output.")
        else:
            print("[Visualization] No parsed output available for visualization.")

    except Exception as e:
        print(f"\n[System Error] An unexpected error occurred during processing or visualization: {e}")
        print("Please review the prompt optimization logic, your LLM configuration, or the new visualization code.")

print("\n--- Testing Session Ended ---")
print("Now, go to LangSmith (smith.langchain.com) to review the traces under the 'FinancialPromptOptimizer' project.")
print("Look for the 'tool' (our optimizer) and 'llm' (our mock/real LLM) runs within each trace.")

# Example test prompts you can use:
# 1. Analyze Tesla stock
# 2. Give me a Q4-2023 financial report summary for Apple
# 3. Analyze MSFT company performance 2022
# 4. Perform stock analysis on Google for last quarter
# 5. Analyze the bond issued by JP Morgan
# 6. Summarize the quarterly earnings of Amazon
# 7. What about fixed income analysis for a corporate bond?
# 8. Analyze IBM
# 9. Just give me some random info (to test error handling)
# 10. Analyze the tech sector performance
# 11. How is Netflix performing this year? (Tests new timeframe)
# 12. Show me the 10-K report for Apple (Tests new report type)
# 13. What are the recent financials for Adobe? (Tests new phrasing for quarterly)
# 14. What's the credit risk on the KO bond? (Tests new bond phrasing)


--- Phase 3: Interactive Testing and Evaluation ---
Enter your financial analysis requests below. Type 'exit' to quit.
Each request will generate a trace in LangSmith (smith.langchain.com).



Enter your request:  Growth projections for healthcare sector



[System] Processing request...

[Optimizer] Original Input: 'Growth projections for healthcare sector'
[Ticker Lookup] FMP API found 'Growth' -> GCACW. Added to cache.
[Optimizer] Extracted: Ticker=GCACW, Timeframe=Q3-2025
[Optimizer] Optimized Prompt Generated:
---
You are a highly specialized financial AI assistant. Perform the requested analysis following all specified rules and output formats.

Perform SEC Regulation Fair Disclosure-compliant analysis of GCACW for Q3-2025:
(a) Sector performance vs. broader market
(b) Key growth drivers in the sector
(c) Regulatory risk factors
(d) Market commentary and future outlook.

Output: JSON with {sector_metrics: [], top_performers: [], risk_factors: {}, commentary: string}
---

[Mock LLM] Receiving Input:
---
You are a highly specialized financial AI assistant. Perform the requested analysis following all specified rules and output formats.

Perform SEC Regulation Fair Disclosure-compliant analysis of GCACW for Q3-2025:
(a) Sector perform


Enter your request:  exit



--- Testing Session Ended ---
Now, go to LangSmith (smith.langchain.com) to review the traces under the 'FinancialPromptOptimizer' project.
Look for the 'tool' (our optimizer) and 'llm' (our mock/real LLM) runs within each trace.


#### Example query: "Growth projections for healthcare sector"

For this request, the mock LLM returns its canned SEC Regulation Fair Disclosure (Reg FD) sector response: simulated sector metrics (+3.5% vs. the broader market, +5.1% quarter-over-quarter growth), placeholder company lists (Company A, Company B, Specific Company 1, Specific Company 2), risk factors ("Moderate" regulatory changes, "Low" supply chain disruptions), and a market commentary on emerging trends and outlook. Note that the log shows the ticker lookup matching the word "Growth" to `GCACW`, so the optimized prompt targets that ticker for Q3-2025 rather than the healthcare sector itself, and the captured output is cut off before the JSON response appears. This is the kind of entity-extraction issue the interactive testing phase is meant to surface.
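
The trace above shows the ticker lookup resolving the ordinary word "Growth" to `GCACW`. One possible guard, sketched below under assumptions, is to detect sector keywords and skip common non-entity words before any ticker API is called. `SECTOR_KEYWORDS`, `NON_ENTITY_WORDS`, and `extract_candidates` are hypothetical names that do not appear in the notebook, and the word lists are illustrative only.

```python
import re

# Illustrative lists only; a real deployment would need broader coverage.
SECTOR_KEYWORDS = {"healthcare", "tech", "technology", "energy", "financials"}
NON_ENTITY_WORDS = {
    "growth", "projections", "analyze", "analysis", "for", "the", "sector",
    "performance", "report", "summary", "stock", "bond", "of", "on", "a",
}


def extract_candidates(user_input: str) -> dict:
    """Split a request into a detected sector and possible company terms."""
    words = re.findall(r"[A-Za-z]+", user_input)
    # First word that names a known sector, if any.
    sector = next((w.lower() for w in words if w.lower() in SECTOR_KEYWORDS), None)
    # Everything that is neither a filler word nor a sector keyword.
    candidates = [
        w for w in words
        if w.lower() not in NON_ENTITY_WORDS and w.lower() not in SECTOR_KEYWORDS
    ]
    return {"sector": sector, "ticker_candidates": candidates}


print(extract_candidates("Growth projections for healthcare sector"))
print(extract_candidates("Analyze Tesla stock"))
```

Only the remaining candidates would be passed to the ticker lookup. When a sector is detected and no candidate is left, the optimizer could insert the sector name into the prompt instead of a ticker.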


### Phase 4: Deploy & Monitor (Conceptual)

**What it is:**
This phase shifts from hands-on coding to a conceptual discussion about taking this prototype to a real-world production environment. It emphasizes the importance of deployment strategies, user interfaces, and continuous monitoring.

**Why it's needed:**
*   **Real-World Application:** A Jupyter Notebook is for development. To provide a service to customers, the code needs to run reliably on a server.
*   **Scalability & Maintainability:** Production systems require robust packaging, deployment, and operational oversight.
*   **Continuous Improvement:** AI systems, especially those dealing with dynamic data and user language, need constant monitoring and updates.

**How it works (Code Meaning / Explanation):**

1.  **`4.1. Deployment Strategy (Conceptual):`**
    *   Discusses packaging the code (Python module), containerization (Docker for consistency), and deploying as microservices (AWS Lambda, Google Cloud Run) with API endpoints. This explains how the "brain" you built would become an accessible service.
2.  **`4.2. Application Integration (Conceptual Web Interface):`**
    *   Outlines how a user-friendly web front-end (e.g., built with Streamlit, a Python framework for simple web apps) would connect to your deployed backend. The pseudo-code shows how a user might type in a request and see the structured JSON output rendered in the page.
3.  **`4.3. Monitoring & Maintenance (Crucial Role of LangSmith):`**
    *   This section highlights **LangSmith's critical role in a production setting.**
        *   **Automatic Logging & Audit Trail:** Every transaction is logged, providing a complete history for compliance and troubleshooting.
        *   **Debugging & Performance:** You can track latency, success rates, and pinpoint exact issues by reviewing the inputs and outputs of each step in the pipeline.
        *   **Drift Detection:** Monitoring user queries helps identify new terms or analysis types that might require updates to your `ENTITY_MAPPINGS` or `INTENT_MAPPINGS`.
        *   **Feedback Loop:** Emphasizes collecting user feedback to further refine the AI's performance and prompt optimization logic.
        *   **Periodic Updates:** Reinforces that financial systems require constant review due to changing regulations and market conditions.
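
As a rough illustration of the drift-detection idea, the sketch below scans a log of past requests and outputs, counts how many failed intent recognition, and surfaces the most frequent words in those failed queries. The log format, the `summarize_drift` helper, and the sample records are assumptions for illustration; the `Error:` prefix follows the optimizer error message quoted in Phase 3. In practice the records would more likely come from exported LangSmith traces.

```python
from collections import Counter


def summarize_drift(records: list, top_n: int = 3) -> dict:
    """Summarize unrecognized queries from (query, output) pairs."""
    # A query counts as failed when the optimizer returned an error string.
    failed = [query for query, output in records if output.startswith("Error:")]
    word_counts = Counter(
        word.lower() for query in failed for word in query.split()
    )
    return {
        "total": len(records),
        "failed": len(failed),
        "failure_rate": len(failed) / len(records) if records else 0.0,
        "common_failed_terms": word_counts.most_common(top_n),
    }


# Hypothetical sample log; not real traffic.
sample_log = [
    ("Analyze Tesla stock", '{"metrics": []}'),
    ("Crypto staking yields", "Error: Could not determine clear intent..."),
    ("Crypto outlook", "Error: Could not determine clear intent..."),
    ("Analyze IBM", '{"metrics": []}'),
]

report = summarize_drift(sample_log)
print(report["failure_rate"])
print(report["common_failed_terms"])
```

A rising failure rate, or a term that keeps recurring among failed queries, is a signal that the mappings may need a new entry.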


In [11]:
print("\n--- PHASE 4: Deployment & Monitoring Considerations ---")

print("\n4.1. Deployment Strategy (Conceptual):")
print("Once the prompt optimization logic is robust and tested in Jupyter, "
      "for production deployment, you would typically:")
print("  - Package the `financial_prompt_optimizer` function and its dependencies "
      "into a Python module/library.")
print("  - Containerize the application (e.g., using Docker) to ensure consistency across environments.")
print("  - Deploy it as a microservice on a cloud platform (e.g., AWS Lambda, Google Cloud Run, Azure Container Apps) "
      "with an API endpoint.")
print("  - Or, integrate it directly into a larger financial application backend.")

print("\n4.2. Application Integration (Conceptual Web Interface):")
print("You could build a simple web front-end using frameworks like Streamlit or Gradio for a user-friendly interface.")
print("  - The web interface would send user queries to your deployed prompt optimization service.")
print("  - The service would return the optimized prompt's output (from the LLM simulation), which the web UI displays.")
print("  - Example (pseudo-code for Streamlit):")
print("""
# In a separate app.py file:
# import json
# import streamlit as st
# from your_module import financial_analysis_chain # Assuming you packaged your chain

# st.title("Financial Prompt Optimizer Demo")
# user_input = st.text_area("Enter your financial analysis request:")

# if st.button("Analyze"):
#     if user_input:
#         with st.spinner("Optimizing and Analyzing..."):
#             result = financial_analysis_chain.invoke(user_input)
#         st.subheader("Optimized LLM Output:")
#         try:
#             st.json(json.loads(result))
#         except json.JSONDecodeError:
#             st.code(result)
#     else:
#         st.warning("Please enter a request.")
""")

print("\n4.3. Monitoring & Maintenance (Crucial Role of LangSmith):")
print("  - **LangSmith for Monitoring:** This is where LangSmith truly shines in a production environment.")
print("    - All runs (prompt optimization, LLM calls) are automatically logged.")
print("    - You can track latency, success rates, and errors for individual components.")
print("    - Provides a complete audit trail for compliance, showing exactly how each user request was processed.")
print("    - Allows for quick debugging if the LLM output is not as expected, by inspecting the optimized prompt.")
print("  - **Drift Detection:** Monitor changes in user query patterns. If new terms or analysis types emerge, "
      "you might need to update your `ENTITY_MAPPINGS` or `INTENT_MAPPINGS`.")
print("  - **Feedback Loop:** Collect user feedback on the quality and compliance of the LLM's outputs. "
      "This feedback is vital for refining the prompt optimization logic.")
print("  - **Periodic Updates:** Financial regulations and market conditions change. Schedule regular reviews "
      "and updates to your `COMPLIANCE_RULES` and related data to ensure continued accuracy and compliance.")

print("\n--- Project Complete (within Jupyter Scope) ---")
print("You have successfully built a foundational 'Financial Prompt Optimization Interface' with observability!")


--- PHASE 4: Deployment & Monitoring Considerations ---

4.1. Deployment Strategy (Conceptual):
Once the prompt optimization logic is robust and tested in Jupyter, for production deployment, you would typically:
  - Package the `financial_prompt_optimizer` function and its dependencies into a Python module/library.
  - Containerize the application (e.g., using Docker) to ensure consistency across environments.
  - Deploy it as a microservice on a cloud platform (e.g., AWS Lambda, Google Cloud Run, Azure Container Apps) with an API endpoint.
  - Or, integrate it directly into a larger financial application backend.

4.2. Application Integration (Conceptual Web Interface):
You could build a simple web front-end using frameworks like Streamlit or Gradio for a user-friendly interface.
  - The web interface would send user queries to your deployed prompt optimization service.
  - The service would return the optimized prompt's output (from the LLM simulation), which the web UI displays.
  

## Thank you