# Repo Link
https://github.com/mr-kelsey/fa25-aai520-group6

# Objective
Build an autonomous Investment Research Agent that:
1. Plans its research steps for a given stock symbol.
2. Uses tools dynamically (APIs, datasets, retrieval).
3. Self-reflects to assess the quality of its output.
4. Learns across runs (e.g., keeps brief memories or notes to improve future analyses).
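
The fourth objective (keeping brief notes across runs) could be approached in many ways. The following is a minimal sketch only, assuming notes are stored in a small JSON file keyed by stock symbol. The helper names `load_notes` and `add_note`, the file name `agent_notes.json`, and the example notes are all hypothetical and are not part of the repository.

```python
import json
import tempfile
from pathlib import Path


def load_notes(path: Path) -> dict[str, list[str]]:
    """Return saved notes keyed by ticker symbol, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def add_note(path: Path, symbol: str, note: str, keep_last: int = 5) -> list[str]:
    """Append a note for `symbol`, keep only the newest `keep_last`, and save."""
    notes = load_notes(path)
    recent = (notes.get(symbol, []) + [note])[-keep_last:]
    notes[symbol] = recent
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return recent


# Hypothetical usage: the notes live on disk, so a later run can read them back.
notes_path = Path(tempfile.mkdtemp()) / "agent_notes.json"
add_note(notes_path, "NVDA", "News tool returned duplicate URLs; deduplicate first.")
add_note(notes_path, "NVDA", "Evaluator failed run 1: risk score was missing.")
nvda_notes = load_notes(notes_path)["NVDA"]
```

A later run could prepend `nvda_notes` to the task prompt so the agent sees what went wrong previously.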

# Code

In [None]:
"""Import required libraries and load environment variables"""
from dotenv import dotenv_values
from smolagents import ToolCallingAgent, TransformersModel, InferenceClientModel
from warnings import filterwarnings

from tools import chaining, performance, risk, sentiment, impact, commander, evaluator

# GluonTS uses an outdated pandas DataFrame access method that triggers a warning.
# We silence warnings here for cleaner output (note that this suppresses all warnings).
filterwarnings("ignore")
env = dotenv_values(".env")

In [None]:
"""Set up initial agent"""
base_model = TransformersModel(model_id="Qwen/Qwen3-Coder-30B-A3B-Instruct", device_map="auto", max_new_tokens=4096)

## Prompt Chaining:

In [None]:
chaining_agent = ToolCallingAgent(
    model=base_model,
    tools=chaining.tools,
    )

chaining_agent.visualize()

### Ingest News

In [None]:
NVDA_links = chaining_agent.run("Retrieve a list of URLs for the financial instrument `NVDA`.  Make sure you return just the list of URLs without any further processing.")

### Preprocess

In [None]:
NVDA_articles = chaining_agent.run("Use the list of URLs to retrieve the full article at each URL.  Simply retrieve all articles and nothing more.",
                                   additional_args={"urls": NVDA_links})

### Classify

### Extract

### Summarize

## Routing:

In [None]:
"""Configure agents"""
performance_agent = ToolCallingAgent(
    name="Performance",
    description="Analyze stock performance, historical growth, and future potential.",
    model=InferenceClientModel(),
    tools=performance.tools,
    )

risk_agent = ToolCallingAgent(
    name="Risk",
    description="Measure risk, volatility, and downside probability.",
    model=InferenceClientModel(),
    tools=risk.tools,
    )

sentiment_agent = ToolCallingAgent(
    name="Sentiment",
    description="Gauge market mood through news and media sentiment.",
    model=InferenceClientModel(),
    tools=sentiment.tools,
    )

impact_agent = ToolCallingAgent(
    name="Impact",
    description="Evaluate sustainability, innovation, and ethical governance.",
    model=InferenceClientModel(),
    tools=impact.tools,
    )

evaluation_agent = ToolCallingAgent(
    name="Evaluator",
    description="Audit, validate, and optimize Commander's final decision.",
    model=InferenceClientModel(),  # TODO: the evaluator probably needs a more capable model
    tools=evaluator.tools,
    )

commander_agent = ToolCallingAgent(
    name="Commander",
    description="Synthesize all agent outputs and deliver final classification.",
    model=InferenceClientModel(),
    tools=commander.tools,
    managed_agents=[performance_agent, risk_agent, sentiment_agent, impact_agent, evaluation_agent]
    )

commander_agent.visualize()

In [None]:
task1 = "NVIDIA stock (NVDA)"
task2 = "Bitwise Bitcoin ETF (BITB)"

commander_agent.run(f"""Your task is to classify each of the following financial instruments into “BUY”, “HOLD”, or “AVOID”:
1. {task1}
2. {task2}

You have a very capable team below you that you should leverage to accomplish your task.  Here are the profiles of each of your team members:
`Performance`
	Role: Evaluates core metrics — returns, growth rate, EPS, and expense ratios if analyzing ETFs.
	Logic: Calculates performance over 1, 3, and 5 years; compares it to sector averages.
	Output: A performance score between 0 and 1 with 1 being the most performant.
    
`Risk`
	Role: Measures risk — beta value, standard deviation, and recent drawdowns.
	Logic: 
	Output: A risk score between -1 and 1 where 1 means “High Risk”, -1 means “High Reward”, and 0 represents a “Stable Performer”.
    
`Sentiment`
	Role: Reads financial news — determines market sentiment.
	Logic: Uses NLP to classify tone of each article into -1 if negative, 0 if neutral, and 1 if positive.
	Output: A sentiment score consisting of the mean value of the sentiment classifications for all articles analyzed.
    
`Impact`
	Role: Determines impact — calculates ESG or AI exposure scoring.
	Logic: 
	Output: An impact score between 0 and 1 where the higher the score, the greater the impact.

`Evaluator`
	Role: Reviews worker logs — ensures final recommendation is logically derived from information retrieved.
	Logic: 
	Output: A binary classification: "PASS" if the logic is sound, "FAIL" if the logic is unsound.
    
An example workflow would be:
1) Task each of your workers with doing their job:
	i) Performance should calculate and return a performance score.
    ii) Risk should calculate and return a risk score.
    iii) Sentiment should calculate and return a sentiment score.
    iv) Impact should calculate and return an impact score.
2) Use your team members' scores to determine a recommendation.
3) Send your logs and recommendation to Evaluator for final sign-off.
4) Decide what to do next based on what Evaluator returns:
	i) If Evaluator returns "PASS":
		5) Output your final recommendation: “BUY”, “HOLD”, or “AVOID”
	ii) If Evaluator returns "FAIL":
		5) Use Evaluator's `Additional context` to see where things went wrong.
		6) Adjust accordingly and rerun as many of these steps as needed to pass the evaluation.
""")

## Evaluator–Optimizer:
### Generate analysis
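
As one concrete piece of the analysis, the `Sentiment` profile in the Commander prompt defines its score as the mean of per-article labels (-1 negative, 0 neutral, 1 positive). The sketch below illustrates only that aggregation step. The function name `sentiment_score` and the example labels are assumptions for illustration and do not come from the repository's `sentiment` tools.

```python
from statistics import mean

# Labels allowed by the Sentiment profile: -1 negative, 0 neutral, 1 positive.
VALID_LABELS = {-1, 0, 1}


def sentiment_score(labels: list[int]) -> float:
    """Return the mean of per-article sentiment labels."""
    if not labels:
        raise ValueError("at least one article label is required")
    invalid = [label for label in labels if label not in VALID_LABELS]
    if invalid:
        raise ValueError(f"labels must be -1, 0, or 1; got {invalid}")
    return float(mean(labels))


# Hypothetical labels for four articles: two positive, one neutral, one negative.
example_labels = [1, 1, 0, -1]
score = sentiment_score(example_labels)
```

Because the labels are bounded, the resulting score always falls between -1 and 1, with values near 0 indicating mixed or neutral coverage.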

### Evaluate quality

### Refine using feedback
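
The PASS/FAIL workflow described in the Commander prompt (generate a recommendation, send it to Evaluator, and retry using its feedback on "FAIL") can be sketched as a bounded loop. This is an illustrative sketch, not the notebook's implementation: `refine_until_pass`, `max_rounds`, and the two stub functions are hypothetical stand-ins for the Commander and Evaluator agents.

```python
from typing import Callable


def refine_until_pass(
    generate: Callable[[str], str],
    evaluate: Callable[[str], tuple[str, str]],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Generate a recommendation, evaluate it, and retry with feedback on FAIL.

    Returns the last recommendation and whether it passed evaluation.
    """
    feedback = ""
    recommendation = ""
    for _ in range(max_rounds):
        recommendation = generate(feedback)
        verdict, feedback = evaluate(recommendation)
        if verdict == "PASS":
            return recommendation, True
    return recommendation, False


# Stub standing in for the Commander: changes its answer once it sees risk feedback.
def stub_generate(feedback: str) -> str:
    return "HOLD" if "risk" in feedback else "BUY"


# Stub standing in for the Evaluator: rejects "BUY" and explains why.
def stub_evaluate(recommendation: str) -> tuple[str, str]:
    if recommendation == "BUY":
        return "FAIL", "The risk score was not considered."
    return "PASS", ""


final, passed = refine_until_pass(stub_generate, stub_evaluate)
```

Capping the number of rounds keeps the loop from running indefinitely when the evaluator never returns "PASS"; the caller can inspect the boolean to decide how to handle that case.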

# Summary

* Agent Design and Workflows
* Agent Functions and Capabilities
* Evaluation and Iteration