# Chapter 8: Portfolio Optimization - The Complete System

This is the capstone -- a complete AI-powered portfolio optimization system. Six specialized tools handle everything from data fetching to discrete share allocation. The agent orchestrates these tools to answer complex investment questions.

The system uses **real market data** from `yfinance` and **mean-variance optimization** from `PyPortfolioOpt`.

| Pattern | How We Use It |
|---------|---------------|
| **Tool Calling** | Fetch prices, run optimizations |
| **ReAct** | Reason through optimization choices |
| **CodeAct** | Calculate custom metrics, generate reports |
| **Memory** | Remember context for follow-ups |

In [1]:
!pip install smolagents yfinance pyportfolioopt pandas numpy openai -q

In [2]:
import numpy as np
import pandas as pd
from smolagents import CodeAgent, tool
from smolagents import OpenAIServerModel
from smolagents.monitoring import LogLevel

In [14]:
import getpass
API_KEY = getpass.getpass("Enter your OpenAI API key: ")

Enter your OpenAI API key: ··········


In [15]:
model = OpenAIServerModel("gpt-4o-mini", api_key=API_KEY)
print("Model initialized!")

Model initialized!


## Understanding PyPortfolioOpt

**PyPortfolioOpt** implements Modern Portfolio Theory (MPT) -- the mathematical framework for constructing portfolios that maximize return for a given level of risk.

### Key Concepts

| Concept | Description |
|---------|-------------|
| **Efficient Frontier** | Set of portfolios offering the highest return for each risk level |
| **Max Sharpe** | Portfolio with the highest Sharpe ratio, i.e. the most excess return (above the risk-free rate) per unit of volatility |
| **Min Volatility** | Lowest-risk portfolio possible from the given assets |
| **Expected Returns** | Predicted future returns (estimated from historical data) |
| **Covariance Matrix** | How assets move together (correlation structure) |
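The concepts in the table reduce to three formulas for a weight vector $w$: expected return $\mu_p = w^\top \mu$, volatility $\sigma_p = \sqrt{w^\top \Sigma w}$, and Sharpe ratio $(\mu_p - r_f)/\sigma_p$. The minimal sketch below evaluates them with NumPy; the helper name `portfolio_stats` and the two-asset inputs are made up for illustration and are not market data.

```python
import numpy as np

def portfolio_stats(weights, mu, cov, risk_free_rate=0.02):
    """Return (expected return, volatility, Sharpe ratio) for one weight vector."""
    w = np.asarray(weights, dtype=float)
    exp_ret = float(w @ mu)               # weighted average of asset returns
    vol = float(np.sqrt(w @ cov @ w))     # square root of w^T Sigma w
    sharpe = (exp_ret - risk_free_rate) / vol
    return exp_ret, vol, sharpe

# Made-up annual inputs for two assets (not market data)
mu = np.array([0.10, 0.20])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

ret, vol, sharpe = portfolio_stats([0.5, 0.5], mu, cov)
print(round(ret, 4), round(vol, 4), round(sharpe, 4))  # → 0.15 0.1936 0.6713
```

The 50/50 mix has lower volatility (19.4%) than the average of the two assets' own volatilities (20% and 30%), because the covariance term is small relative to the variances.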

### The Optimization Flow

```
Historical Prices → Expected Returns → Covariance Matrix → Optimizer → Optimal Weights
```
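The first two arrows of the flow can be computed directly from a price matrix. The sketch below follows, as far as we understand them, PyPortfolioOpt's defaults for `expected_returns.mean_historical_return` (compounded annual growth rate) and `risk_models.sample_cov` (sample covariance of daily returns scaled by 252). The function name `estimate_inputs` and the synthetic prices are illustrative assumptions, and the sketch assumes no missing prices.

```python
import numpy as np

def estimate_inputs(prices, frequency=252):
    """Annualized expected returns (CAGR) and sample covariance from a price matrix.

    prices: 2-D array, one row per trading day (oldest first), one column per asset.
    """
    prices = np.asarray(prices, dtype=float)
    daily = prices[1:] / prices[:-1] - 1                         # simple daily returns
    n_days = daily.shape[0]
    mu = np.prod(1 + daily, axis=0) ** (frequency / n_days) - 1  # compounded, annualized
    cov = np.cov(daily, rowvar=False) * frequency                # sample covariance, annualized
    return mu, cov

# Synthetic prices for three assets over roughly two trading years
rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, size=(504, 3)), axis=0)
mu, cov = estimate_inputs(prices)
print("expected returns:", mu.round(3))
print("covariance:\n", cov.round(4))
```

Because the daily growth factors telescope, the compounded estimate depends only on the first and last price of each asset, while the covariance uses every day in between.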

## The Tools

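Six tools form our portfolio optimization pipeline. Each tool is **self-contained**, with all necessary imports inside the function body. This keeps the tools portable: `smolagents` expects a tool's code to stand on its own when the tool is serialized, for example when it is saved or shared on the Hugging Face Hub or sent to a remote execution sandbox.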

| Tool | Purpose |
|------|--------|
| `get_historical_prices` | Fetch price data and statistics |
| `optimize_max_sharpe` | Find best risk-adjusted portfolio |
| `optimize_min_volatility` | Find lowest-risk portfolio |
| `optimize_target_return` | Find portfolio for a specific return goal |
| `calculate_allocation` | Convert weights to actual shares to buy |
| `compare_strategies` | Side-by-side strategy comparison |
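Every tool begins by splitting the `tickers` string the same way. That one-liner keeps empty strings and duplicates (for example from `'AAPL, MSFT,'`), which could then be passed on to `yf.download`. A hypothetical `parse_tickers` helper that also enforces the two-ticker minimum is sketched below; since the tools in this chapter are self-contained, you would paste its body into each tool rather than import it.

```python
def parse_tickers(tickers: str, min_count: int = 2) -> list[str]:
    """Hypothetical helper: normalize a comma-separated ticker string and validate it."""
    symbols = []
    for raw in tickers.split(","):
        symbol = raw.strip().upper()
        if symbol and symbol not in symbols:   # drop blanks and duplicates, keep order
            symbols.append(symbol)
    if len(symbols) < min_count:
        raise ValueError(f"Need at least {min_count} tickers, got {len(symbols)}")
    return symbols

print(parse_tickers("aapl, msft ,AAPL,"))  # → ['AAPL', 'MSFT']
```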

In [16]:
@tool
def get_historical_prices(tickers: str, years: int = 2) -> str:
    """
    Fetch historical adjusted close prices for a list of stock tickers.

    Args:
        tickers: Comma-separated stock symbols (e.g., 'AAPL,MSFT,GOOGL')
        years: Number of years of historical data (default: 2)

    Returns:
        JSON string with price statistics and recent prices
    """
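    import yfinance as yf
    import json
    import numpy as np
    import pandas as pd
    from datetime import datetime, timedelta

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]
        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*years)

        # auto_adjust=True makes "Close" the split- and dividend-adjusted price and
        # avoids yfinance's FutureWarning about the changed auto_adjust default
        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        # Depending on the yfinance version, a single ticker comes back either as a
        # Series or as a one-column DataFrame; only a Series needs converting
        if isinstance(prices, pd.Series):
            prices = prices.to_frame(name=ticker_list[0])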

        stats = {}
        for ticker in ticker_list:
            col = prices[ticker].dropna()
            returns = col.pct_change().dropna()
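            stats[ticker] = {
                "latest_price": round(float(col.iloc[-1]), 2),
                # Arithmetic annualization (mean daily return x 252); the optimizer tools
                # use PyPortfolioOpt's compounded (CAGR) estimate, so the numbers differ
                "annual_return": round(float(returns.mean() * 252 * 100), 2),
                "annual_volatility": round(float(returns.std() * np.sqrt(252) * 100), 2),
                "data_points": int(len(col))
            }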

        return json.dumps({
            "tickers": ticker_list,
            "period": f"{years} years",
            "statistics": stats,
            "data_ready": True
        }, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 1 created: get_historical_prices")

Tool 1 created: get_historical_prices
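`get_historical_prices` annualizes by scaling the mean daily return by 252 and the daily standard deviation by $\sqrt{252}$. The optimizer tools instead rely on `mean_historical_return`, which compounds returns by default, so the two "annual return" figures for the same stock will generally differ, and the gap grows with volatility. The sketch below contrasts the two on made-up returns; the helper name `annualize` is hypothetical.

```python
import numpy as np

def annualize(daily_returns, periods=252):
    """Compare arithmetic vs. compounded annualization of daily simple returns."""
    r = np.asarray(daily_returns, dtype=float)
    arithmetic = r.mean() * periods                       # what get_historical_prices reports
    geometric = np.prod(1 + r) ** (periods / len(r)) - 1  # CAGR-style, compounded
    volatility = r.std(ddof=1) * np.sqrt(periods)         # ddof=1 matches pandas' .std()
    return arithmetic, geometric, volatility

# A year of made-up returns that alternate +2% / -2%
r = np.tile([0.02, -0.02], 126)
arith, geo, vol = annualize(r)
print(f"arithmetic {arith:.2%}, compounded {geo:.2%}, volatility {vol:.2%}")
```

Alternating +2% and -2% days average to zero, yet compounding them loses about 5% over the year, which is why the arithmetic figure overstates the compounded one for volatile stocks.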


In [17]:
@tool
def optimize_max_sharpe(tickers: str, risk_free_rate: float = 0.02) -> str:
    """
    Find the portfolio with the maximum Sharpe ratio (best risk-adjusted return).

    Args:
        tickers: Comma-separated stock symbols (e.g., 'AAPL,MSFT,GOOGL,AMZN')
        risk_free_rate: Annual risk-free rate as decimal (default: 0.02 = 2%)

    Returns:
        JSON with optimal weights and expected performance
    """
    import yfinance as yf
    import json
    from datetime import datetime, timedelta
    from pypfopt import EfficientFrontier, expected_returns, risk_models

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]

        if len(ticker_list) == 1:
            return json.dumps({"error": "Need at least 2 tickers for portfolio optimization"})

        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*2)

        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        mu = expected_returns.mean_historical_return(prices)
        S = risk_models.sample_cov(prices)

        ef = EfficientFrontier(mu, S)
        weights = ef.max_sharpe(risk_free_rate=risk_free_rate)
        cleaned_weights = ef.clean_weights()

        expected_return, volatility, sharpe = ef.portfolio_performance(risk_free_rate=risk_free_rate)

        return json.dumps({
            "optimization": "Maximum Sharpe Ratio",
            "risk_free_rate": risk_free_rate,
            "weights": {k: round(v*100, 2) for k, v in cleaned_weights.items()},
            "performance": {
                "expected_annual_return": round(expected_return*100, 2),
                "annual_volatility": round(volatility*100, 2),
                "sharpe_ratio": round(sharpe, 3)
            }
        }, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 2 created: optimize_max_sharpe")

Tool 2 created: optimize_max_sharpe
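`ef.max_sharpe` solves a convex optimization problem over all assets at once. For two assets with long-only weights, the same objective can be explored by brute force over the single free weight. The sketch below (hypothetical helper `max_sharpe_two_assets`, made-up inputs) does that and compares the result with the unconstrained tangency-portfolio formula $w \propto \Sigma^{-1}(\mu - r_f)$. It illustrates the objective only; it is not how PyPortfolioOpt computes the answer.

```python
import numpy as np

def max_sharpe_two_assets(mu, cov, risk_free_rate=0.02, steps=10_001):
    """Grid-search the long-only weight w1 in [0, 1] (w2 = 1 - w1) that maximizes Sharpe."""
    best_w, best_sharpe = None, -np.inf
    for w1 in np.linspace(0.0, 1.0, steps):
        w = np.array([w1, 1.0 - w1])
        sharpe = (w @ mu - risk_free_rate) / np.sqrt(w @ cov @ w)
        if sharpe > best_sharpe:
            best_w, best_sharpe = w, sharpe
    return best_w, best_sharpe

mu = np.array([0.10, 0.20])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w_grid, s_grid = max_sharpe_two_assets(mu, cov)

# Unconstrained tangency portfolio for comparison: w proportional to Sigma^{-1} (mu - r_f)
z = np.linalg.solve(cov, mu - 0.02)
w_tangent = z / z.sum()
print(w_grid.round(3), w_tangent.round(3), round(float(s_grid), 3))
```

Here both components of $\Sigma^{-1}(\mu - r_f)$ are positive, so the two answers agree. When the unconstrained formula would short an asset, the long-only constraint pushes the optimum onto the boundary and assets can receive exactly zero weight, which is how a max-Sharpe portfolio can end up concentrated in a few names.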


In [18]:
@tool
def optimize_min_volatility(tickers: str) -> str:
    """
    Find the portfolio with minimum volatility (lowest risk).
    Best for conservative investors who prioritize capital preservation.

    Args:
        tickers: Comma-separated stock symbols

    Returns:
        JSON with optimal weights and expected performance
    """
    import yfinance as yf
    import json
    from datetime import datetime, timedelta
    from pypfopt import EfficientFrontier, expected_returns, risk_models

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]

        if len(ticker_list) == 1:
            return json.dumps({"error": "Need at least 2 tickers for portfolio optimization"})

        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*2)

        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        mu = expected_returns.mean_historical_return(prices)
        S = risk_models.sample_cov(prices)

        ef = EfficientFrontier(mu, S)
        weights = ef.min_volatility()
        cleaned_weights = ef.clean_weights()

        expected_return, volatility, sharpe = ef.portfolio_performance()

        return json.dumps({
            "optimization": "Minimum Volatility",
            "weights": {k: round(v*100, 2) for k, v in cleaned_weights.items()},
            "performance": {
                "expected_annual_return": round(expected_return*100, 2),
                "annual_volatility": round(volatility*100, 2),
                "sharpe_ratio": round(sharpe, 3)
            }
        }, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 3 created: optimize_min_volatility")

Tool 3 created: optimize_min_volatility
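For two assets, minimum volatility has a closed form. Minimizing $\sigma_p^2 = w_1^2\sigma_1^2 + (1-w_1)^2\sigma_2^2 + 2w_1(1-w_1)\sigma_{12}$ over $w_1$ gives $w_1^* = (\sigma_2^2 - \sigma_{12})/(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$. The sketch below (hypothetical helper, made-up covariance) computes it and clips it to $[0, 1]$, since `EfficientFrontier` uses long-only weight bounds by default. It assumes the two assets are not perfectly correlated, so the denominator is non-zero.

```python
import numpy as np

def min_variance_two_assets(cov):
    """Long-only minimum-variance weights for two assets (closed form, then clipped)."""
    var1, var2, cov12 = cov[0, 0], cov[1, 1], cov[0, 1]
    # Setting the derivative of the portfolio variance with respect to w1 to zero
    w1 = (var2 - cov12) / (var1 + var2 - 2 * cov12)
    # The variance is convex in w1, so clipping gives the long-only optimum
    w1 = min(max(w1, 0.0), 1.0)
    return np.array([w1, 1.0 - w1])

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w = min_variance_two_assets(cov)
print(w.round(4), np.sqrt(w @ cov @ w).round(4))
```

With these inputs the mix has a volatility of about 17.8%, below either asset's own 20% and 30%, which is the diversification effect the optimizer exploits.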


In [19]:
@tool
def optimize_target_return(tickers: str, target_return: float) -> str:
    """
    Find the minimum volatility portfolio that achieves a specific target return.

    Args:
        tickers: Comma-separated stock symbols
        target_return: Desired annual return as decimal (e.g., 0.15 = 15%)

    Returns:
        JSON with optimal weights and expected performance
    """
    import yfinance as yf
    import json
    from datetime import datetime, timedelta
    from pypfopt import EfficientFrontier, expected_returns, risk_models

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]

        if len(ticker_list) == 1:
            return json.dumps({"error": "Need at least 2 tickers for portfolio optimization"})

        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*2)

        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        mu = expected_returns.mean_historical_return(prices)
        S = risk_models.sample_cov(prices)

        ef = EfficientFrontier(mu, S)
        weights = ef.efficient_return(target_return=target_return)
        cleaned_weights = ef.clean_weights()

        expected_return, volatility, sharpe = ef.portfolio_performance()

        return json.dumps({
            "optimization": f"Target Return ({target_return*100:.1f}%)",
            "weights": {k: round(v*100, 2) for k, v in cleaned_weights.items()},
            "performance": {
                "expected_annual_return": round(expected_return*100, 2),
                "annual_volatility": round(volatility*100, 2),
                "sharpe_ratio": round(sharpe, 3)
            }
        }, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 4 created: optimize_target_return")

Tool 4 created: optimize_target_return
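`efficient_return` minimizes volatility subject to hitting the target return. With only two assets and long-only weights, the return constraint alone fixes the mix, $w_1 = (\text{target} - \mu_2)/(\mu_1 - \mu_2)$, and the target must lie between the two expected returns. That bound is why an overly ambitious `target_return` makes the tool return an error: as we understand it, PyPortfolioOpt raises a `ValueError` when the target exceeds the highest attainable return, and the tool's `try/except` turns that into `{"error": ...}`. The helper below is a hypothetical sketch with made-up inputs.

```python
import numpy as np

def target_return_two_assets(mu, cov, target):
    """Long-only two-asset weights that hit `target` exactly, plus the resulting volatility."""
    lo, hi = float(min(mu)), float(max(mu))
    if not lo <= target <= hi:
        raise ValueError(f"target {target:.2%} outside attainable range [{lo:.2%}, {hi:.2%}]")
    if mu[0] == mu[1]:
        raise ValueError("equal expected returns: the target does not determine the weights")
    w1 = (target - mu[1]) / (mu[0] - mu[1])   # solves w1*mu1 + (1 - w1)*mu2 = target
    w = np.array([w1, 1.0 - w1])
    return w, float(np.sqrt(w @ cov @ w))

mu = np.array([0.10, 0.20])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
print(target_return_two_assets(mu, cov, 0.15))
try:
    target_return_two_assets(mu, cov, 0.25)
except ValueError as err:
    print("error:", err)
```

With three or more assets many mixes reach the same target, and choosing the least volatile among them is the real work `efficient_return` does.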


In [20]:
@tool
def calculate_allocation(tickers: str, total_investment: float,
                         optimization: str = "max_sharpe") -> str:
    """
    Calculate the exact number of shares to buy for a given investment amount.

    Args:
        tickers: Comma-separated stock symbols
        total_investment: Total dollar amount to invest
        optimization: Strategy - 'max_sharpe' or 'min_volatility'

    Returns:
        JSON with shares to buy for each stock and leftover cash
    """
    import yfinance as yf
    import json
    from datetime import datetime, timedelta
    from pypfopt import EfficientFrontier, expected_returns, risk_models
    from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]

        if len(ticker_list) == 1:
            return json.dumps({"error": "Need at least 2 tickers for portfolio optimization"})

        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*2)

        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        mu = expected_returns.mean_historical_return(prices)
        S = risk_models.sample_cov(prices)

        ef = EfficientFrontier(mu, S)
        if optimization == "min_volatility":
            ef.min_volatility()
        else:
            ef.max_sharpe()

        cleaned_weights = ef.clean_weights()
        latest_prices = get_latest_prices(prices)

        da = DiscreteAllocation(cleaned_weights, latest_prices,
                                total_portfolio_value=total_investment)
        allocation, leftover = da.greedy_portfolio()

        result = {
            "optimization": optimization,
            "total_investment": total_investment,
            "allocation": {}
        }

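        for ticker, shares in allocation.items():
            price = round(float(latest_prices[ticker]), 2)
            shares = int(shares)  # plain int so json.dumps never sees a NumPy integer
            result["allocation"][ticker] = {
                "shares": shares,
                "price_per_share": price,
                "position_value": round(shares * price, 2)  # cost of this position
            }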

        result["leftover_cash"] = round(leftover, 2)
        result["invested_amount"] = round(total_investment - leftover, 2)

        return json.dumps(result, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 5 created: calculate_allocation")

Tool 5 created: calculate_allocation
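`DiscreteAllocation.greedy_portfolio` turns fractional weights into whole shares. The sketch below follows the same two-round idea as we understand it (buy the shares each weight fully covers, then spend leftover cash on the most underweight asset that is still affordable), but `greedy_allocation` is a simplified, hypothetical reimplementation, not the library's code.

```python
import math

def greedy_allocation(weights, prices, total):
    """Simplified two-round greedy share allocation (sketch, not PyPortfolioOpt's exact code)."""
    # Round 1: buy as many whole shares as each target weight fully covers
    shares = {t: math.floor(w * total / prices[t]) for t, w in weights.items()}
    cash = total - sum(n * prices[t] for t, n in shares.items())
    # Round 2: spend leftover cash one share at a time on the most underweight
    # ticker that is still affordable
    while True:
        affordable = [t for t in weights if weights[t] > 0 and prices[t] <= cash]
        if not affordable:
            break
        t = max(affordable, key=lambda s: weights[s] - shares[s] * prices[s] / total)
        shares[t] += 1
        cash -= prices[t]
    return {t: n for t, n in shares.items() if n > 0}, cash

alloc, leftover = greedy_allocation({"A": 0.5, "B": 0.5}, {"A": 300.0, "B": 10.0}, 1000.0)
print(alloc, leftover)  # → {'A': 1, 'B': 70} 0.0
```

The example shows the cost of whole shares: asset A targets 50% but ends at 30%, because a second $300 share no longer fits, so the leftover cash flows to the cheaper asset.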


In [21]:
@tool
def compare_strategies(tickers: str) -> str:
    """
    Compare Max Sharpe vs Min Volatility strategies for the same assets.

    Args:
        tickers: Comma-separated stock symbols

    Returns:
        JSON comparing both strategies side-by-side
    """
    import yfinance as yf
    import json
    from datetime import datetime, timedelta
    from pypfopt import EfficientFrontier, expected_returns, risk_models

    try:
        ticker_list = [t.strip().upper() for t in tickers.split(',')]

        if len(ticker_list) == 1:
            return json.dumps({"error": "Need at least 2 tickers for comparison"})

        end_date = datetime.now()
        start_date = end_date - timedelta(days=365*2)

        prices = yf.download(ticker_list, start=start_date, end=end_date,
                             progress=False, auto_adjust=True)["Close"]

        mu = expected_returns.mean_historical_return(prices)
        S = risk_models.sample_cov(prices)

        # Max Sharpe
        ef1 = EfficientFrontier(mu, S)
        ef1.max_sharpe()
        w1 = ef1.clean_weights()
        r1, v1, s1 = ef1.portfolio_performance()

        # Min Volatility
        ef2 = EfficientFrontier(mu, S)
        ef2.min_volatility()
        w2 = ef2.clean_weights()
        r2, v2, s2 = ef2.portfolio_performance()

        return json.dumps({
            "max_sharpe": {
                "weights": {k: round(v*100, 2) for k, v in w1.items()},
                "expected_return": round(r1*100, 2),
                "volatility": round(v1*100, 2),
                "sharpe_ratio": round(s1, 3)
            },
            "min_volatility": {
                "weights": {k: round(v*100, 2) for k, v in w2.items()},
                "expected_return": round(r2*100, 2),
                "volatility": round(v2*100, 2),
                "sharpe_ratio": round(s2, 3)
            }
        }, indent=2)
    except Exception as e:
        return json.dumps({"error": str(e)})

print("Tool 6 created: compare_strategies")

Tool 6 created: compare_strategies


## Creating the Agent

The complete portfolio optimizer uses all six tools. With `max_steps=10`, it can handle multi-step queries that require calling several tools in sequence. Because every tool returns a JSON string, authorizing `json` (alongside `numpy` and `pandas`) in `additional_authorized_imports` lets the agent's generated code import it and parse tool outputs.
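Since the tools return strings rather than dictionaries, code the agent writes has to decode them before doing arithmetic on the results. The snippet below is a hypothetical illustration of that step: `nonzero_holdings` is not part of the chapter, and the sample string is hand-written in the shape `optimize_max_sharpe` returns, with made-up numbers.

```python
import json

def nonzero_holdings(raw: str) -> list[tuple[str, float]]:
    """Decode a tool's JSON string and return (ticker, weight %) pairs, largest first."""
    data = json.loads(raw)          # tools return str, so decode before doing arithmetic
    if "error" in data:             # every tool reports failures as {"error": ...}
        raise RuntimeError(data["error"])
    pairs = [(t, w) for t, w in data["weights"].items() if w > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hand-written sample in the shape optimize_max_sharpe returns (numbers are made up)
sample = '{"optimization": "Maximum Sharpe Ratio", "weights": {"AAPL": 0.0, "GOOGL": 60.0, "NVDA": 40.0}}'
print(nonzero_holdings(sample))  # → [('GOOGL', 60.0), ('NVDA', 40.0)]
```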

In [22]:
portfolio_optimizer = CodeAgent(
    tools=[
        get_historical_prices,
        optimize_max_sharpe,
        optimize_min_volatility,
        optimize_target_return,
        calculate_allocation,
        compare_strategies
    ],
    model=model,
    verbosity_level=LogLevel.INFO,
    max_steps=10,
    additional_authorized_imports=["numpy", "pandas", "json"]
)

print("Portfolio optimizer ready with 6 tools!")

Portfolio optimizer ready with 6 tools!


## Output Formatting Helper

Agent outputs are often raw JSON or dictionaries -- machine-readable but not human-friendly. This helper uses a second LLM call to format raw results into clean, readable summaries.

```
User Query → Agent → Raw Output → LLM Formatter → Clean Summary
```

Stage 1 (the agent) does the heavy lifting with tools and reasoning. Stage 2 (the formatter) is a cheap, fast call that just restructures the output for readability.
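Where a tool's JSON already contains every number needed, the restructuring step can also be done without a model call, which rules out invented or missing figures. A hypothetical `render_result` for the single-strategy tools (`optimize_max_sharpe`, `optimize_min_volatility`, `optimize_target_return`) might look like this; it is an alternative sketch with a made-up sample, not part of the chapter's pipeline.

```python
import json

def render_result(raw: str) -> str:
    """Hypothetical non-LLM formatter for the single-strategy optimizer tools' JSON."""
    data = json.loads(raw)
    if "error" in data:
        return f"**Error:** {data['error']}"
    lines = [f"**Strategy:** {data.get('optimization', 'unknown')}", "",
             "| Ticker | Weight (%) |", "|--------|-----------:|"]
    # Largest positions first; zero-weight tickers are omitted
    for ticker, weight in sorted(data.get("weights", {}).items(), key=lambda kv: -kv[1]):
        if weight > 0:
            lines.append(f"| {ticker} | {weight:.2f} |")
    lines.append("")  # blank line so the bullet list renders separately from the table
    labels = {"expected_annual_return": "Expected annual return (%)",
              "annual_volatility": "Annual volatility (%)",
              "sharpe_ratio": "Sharpe ratio"}
    performance = data.get("performance", {})
    for key, label in labels.items():
        if key in performance:
            lines.append(f"- {label}: {performance[key]}")
    return "\n".join(lines)

sample = ('{"optimization": "Maximum Sharpe Ratio", '
          '"weights": {"AAPL": 0.0, "GOOGL": 60.0, "NVDA": 40.0}, '
          '"performance": {"expected_annual_return": 30.0, "annual_volatility": 25.0, "sharpe_ratio": 1.12}}')
print(render_result(sample))
```

`compare_strategies` and `calculate_allocation` nest their results differently, so they would need their own renderers; the LLM formatter trades that per-tool work for flexibility.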

In [23]:
from openai import OpenAI

def format_response(raw_result, context=""):
    """Format raw agent output into a clean, readable summary."""
    client = OpenAI(api_key=API_KEY)

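    prompt = f"""Format this portfolio analysis result into a clean, readable summary.

Context: {context}
Raw Result: {raw_result}

Provide a well-structured summary with key metrics highlighted.
Use only figures that appear in the raw result; if a metric is missing, omit it
rather than inventing a value or leaving a placeholder."""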

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3
    )
    return response.choices[0].message.content

print("format_response() helper created!")

format_response() helper created!


## Orchestration in Action

Watch the agent coordinate multiple tools to answer complex investment questions. Each query may trigger a chain of tool calls, code execution, and reasoning -- the agent decides the approach on its own.

In [24]:
raw_result = portfolio_optimizer.run("""
I want to build a tech-focused portfolio with these stocks:
AAPL, MSFT, GOOGL, NVDA, META

Find the optimal allocation using the Max Sharpe strategy.
Show me the expected performance metrics.
""")

formatted = format_response(raw_result, "Tech portfolio optimization using Max Sharpe")
print(formatted)


### Tech Portfolio Optimization Summary

**Optimization Method:** Max Sharpe Ratio

**Portfolio Weights:**
- **Alphabet Inc. (GOOGL):** 73.08%
- **NVIDIA Corporation (NVDA):** 26.92%

**Expected Performance Metrics:**
- **Expected Annual Return:** 52.44%
- **Annual Volatility:** 29.96%
- **Sharpe Ratio:** 1.684

This optimized tech portfolio emphasizes a significant allocation towards GOOGL, aiming for a high expected return while maintaining a balanced risk profile, as indicated by the Sharpe ratio.


In [25]:
raw_result = portfolio_optimizer.run("""
I have $50,000 to invest in AAPL, MSFT, GOOGL, NVDA, META.

1. Compare Max Sharpe vs Min Volatility strategies
2. Show me exactly how many shares to buy for each strategy
3. Which would you recommend for someone 5 years from retirement?
""")

formatted = format_response(raw_result, "Strategy comparison with $50K investment")
print(formatted)


### Portfolio Analysis Summary

**Investment Context:**  
- **Initial Investment:** $50,000  
- **Time Horizon:** 5 years until retirement  

**Recommended Strategy:**  
- **Strategy Name:** Minimum Volatility Strategy  
- **Rationale:** This strategy is recommended due to its lower risk profile, making it suitable for individuals nearing retirement.

**Key Metrics:**  
- **Risk Level:** Low  
- **Expected Return:** [Insert expected return percentage if available]  
- **Volatility:** [Insert volatility percentage if available]  

**Conclusion:**  
The Minimum Volatility Strategy is the optimal choice for investors approaching retirement, as it prioritizes capital preservation while still aiming for reasonable returns.


In [26]:
raw_result = portfolio_optimizer.run("""
What if I removed NVDA and added more to the safer stocks?
""", reset=False)

formatted = format_response(raw_result, "Portfolio adjustment - removing NVDA")
print(formatted)


### Portfolio Analysis Summary

**Context:** Portfolio Adjustment - Removal of NVIDIA (NVDA)

**Recommendation:**  
Adopt the Min Volatility Strategy

**Rationale:**  
- The removal of NVIDIA (NVDA) aligns with a more conservative investment approach.
- The Min Volatility strategy aims to reduce overall portfolio risk while maintaining potential for returns.

**Key Metrics:**
- **Risk Reduction:** Lower volatility in portfolio performance.
- **Return Stability:** Enhanced potential for consistent returns over time.

This adjustment is designed to create a more resilient portfolio, minimizing exposure to market fluctuations while still pursuing growth opportunities.


## Exercise

Use the portfolio optimizer to analyze a portfolio of your choice. Try:

- **Different stock combinations** -- mix sectors (tech + healthcare + energy)
- **A specific dollar amount** -- use `calculate_allocation` to get exact share counts
- **Comparing strategies** for different investor profiles (aggressive vs. conservative)
- **Follow-up questions** with `reset=False` to test memory

In [None]:
raw_result = portfolio_optimizer.run("""
    YOUR PROMPT HERE
""")