
---
### LLM Agents with Reinforcement Learning
---


##### Core Imports

In [32]:
# --- Core Libraries ---
import os
import time
import random
import json
from typing import List, Dict, Optional, Callable, Tuple

# --- Data & Visualization ---
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8-darkgrid")  # Optional: Apply plotting style

# --- Display Utilities (Jupyter) ---
from IPython.display import display, Markdown

# --- HTTP & API Interaction ---
import requests

# --- Utilities ---
from pprint import pprint  # Optional: Pretty-print JSON responses

# --- Constants for LLM ---
OLLAMA_MODEL = "llama3"  # Adjust as needed
OLLAMA_API_URL = "http://localhost:11434/api/generate"


##### Query LLM Function

In [33]:
def query_llm(
    prompt: str,
    model: str,
    api_url: str,
    stream: bool = False,
    display_markdown: bool = True,
    header: Optional[str] = None,
) -> Optional[str]:
    """
    Query a locally or remotely hosted LLM endpoint and optionally display formatted output in a Jupyter Notebook.

    Args:
        prompt (str): The input prompt to send to the LLM.
        model (str): The name of the model to use.
        api_url (str): The API endpoint URL.
        stream (bool): Whether to stream the response.
        display_markdown (bool): Whether to display output as Markdown in Jupyter.
        header (Optional[str]): Optional header for Markdown display.

    Returns:
        Optional[str]: The response text if display_markdown is False (or an
        "[LLM ERROR] ..." message string if the request fails); None when the
        output is displayed as Markdown.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": stream
    }

    try:
        # Pass stream through so requests does not buffer the whole body when streaming
        response = requests.post(api_url, json=payload, timeout=60, stream=stream)
        response.raise_for_status()

        if stream:
            result = ""
            for line in response.iter_lines():
                if line:
                    try:
                        result += json.loads(line.decode())["response"]
                    except (ValueError, KeyError, TypeError):
                        # Skip lines that are not valid JSON or carry no "response" field
                        continue
        else:
            result = response.json().get("response", "").strip()

        if display_markdown:
            markdown_text = f"### {header}\n\n{result}" if header else result
            display(Markdown(markdown_text))
            return None

        return result

    except requests.exceptions.RequestException as e:
        error_msg = f"[LLM ERROR] {e}"
        if display_markdown:
            display(Markdown(f"**LLM Error:** `{e}`"))
            return None
        print(error_msg)
        return error_msg
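
The streaming branch of `query_llm` relies on the endpoint returning one JSON object per line, each carrying a fragment of text under a `"response"` key. The sketch below isolates that joining step so it can be tried without a running server. The helper name `join_stream_chunks` and the sample byte lines are illustrative assumptions, not part of the notebook or of any API.

```python
import json
from typing import Iterable


def join_stream_chunks(lines: Iterable[bytes]) -> str:
    """Concatenate the "response" fields of newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # blank keep-alive lines carry no content
        try:
            chunk = json.loads(raw.decode("utf-8"))
        except ValueError:
            continue  # undecodable or malformed line: skip it, as query_llm does
        if isinstance(chunk, dict):
            parts.append(chunk.get("response", ""))
    return "".join(parts)


# Hypothetical sample lines standing in for response.iter_lines()
sample_lines = [
    b'{"response": "Hello", "done": false}',
    b"",
    b"not json",
    b'{"response": ", world", "done": false}',
    b'{"response": "", "done": true}',
]
joined = join_stream_chunks(sample_lines)
print(joined)
```

Collecting fragments in a list and joining once avoids repeated string concatenation, which matters only for long responses.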


In [34]:
# Test the LLM connection (clean display with Markdown)
test_prompt = "What is the prisoner's dilemma and why is it important in game theory?"
query_llm(
    prompt=test_prompt,
    model=OLLAMA_MODEL,
    api_url=OLLAMA_API_URL,
    header="Prisoner's Dilemma in Game Theory"
)


### Prisoner's Dilemma in Game Theory

The Prisoner's Dilemma is a classic game-theoretic paradox that illustrates the conflict between individual and group rationality. It was first introduced by Merrill Flood and Melvin Dresher in 1950, and later popularized by Albert Tucker.

Here's the scenario:

Two suspects, A and B, are arrested and interrogated separately by the police about a crime they committed together. Each has two options: to confess (C) or to remain silent (S). The payoffs for each player are as follows:

|  | **B: C (Confess)** | **B: S (Silent)** |
| --- | --- | --- |
| **A: C (Confess)** | 2, 2 | 3, 0 |
| **A: S (Silent)** | 0, 3 | 1, 1 |

The payoffs are as follows:

* If both confess (C, C), they each get a punishment of 2 years in prison.
* If one confesses and the other remains silent, the confessor gets a reduced sentence of 1 year, while the silent prisoner gets a harsh sentence of 3 years.
* If both remain silent (S, S), they each get a moderate sentence of 1 year.

The dilemma arises because each player's rational choice is to defect (confess) regardless of what the other does. This is because the best outcome for each player is to minimize their own punishment by confessing, even if the other player remains silent.

However, this creates a problem: both players would rather cooperate and remain silent, but neither can trust the other to do so. As a result, both end up defecting and getting a worse outcome (2 years in prison) than if they had cooperated and remained silent (1 year each).

The Prisoner's Dilemma is important in game theory for several reasons:

1. **It highlights the tension between individual and group rationality**: The paradox shows that what is rational for an individual may not be rational for a group.
2. **It demonstrates the difficulty of cooperation**: The Prisoner's Dilemma illustrates how cooperation can break down when individuals prioritize their own self-interest over the greater good.
3. **It has implications for real-world scenarios**: The dilemma has been applied to various fields, such as economics, politics, and biology, to understand issues like international relations, trade agreements, and evolutionary strategies.

In summary, the Prisoner's Dilemma is a thought-provoking game that demonstrates how individual rationality can lead to suboptimal outcomes when cooperation is essential. It has far-reaching implications for understanding human behavior, decision-making, and the challenges of achieving collective welfare.
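
The reasoning in the response (confessing is each player's best reply whatever the other does, yet mutual silence leaves both better off) can be checked mechanically. The sketch below assumes the textbook sentence values, in years of prison where lower is better: 2 each if both confess, 1 each if both stay silent, and 0 versus 3 when only one confesses. These numbers are an illustrative assumption and are not taken from the model output above.

```python
# Assumed sentences in years: YEARS[(a_move, b_move)] = (years for A, years for B).
# "C" = confess, "S" = stay silent. Lower is better for each player.
YEARS = {
    ("C", "C"): (2, 2),
    ("C", "S"): (0, 3),
    ("S", "C"): (3, 0),
    ("S", "S"): (1, 1),
}
MOVES = ("C", "S")


def best_reply_for_a(b_move: str) -> str:
    """A's move that minimizes A's own sentence, given B's move."""
    return min(MOVES, key=lambda a: YEARS[(a, b_move)][0])


def best_reply_for_b(a_move: str) -> str:
    """B's move that minimizes B's own sentence, given A's move."""
    return min(MOVES, key=lambda b: YEARS[(a_move, b)][1])


def nash_equilibria() -> list:
    """Move pairs where each move is a best reply to the other."""
    return [
        (a, b)
        for a in MOVES
        for b in MOVES
        if best_reply_for_a(b) == a and best_reply_for_b(a) == b
    ]


equilibria = nash_equilibria()
print("A's best reply to C / S:", best_reply_for_a("C"), best_reply_for_a("S"))
print("Nash equilibria:", equilibria)
print("Total years at (C, C):", sum(YEARS[("C", "C")]))
print("Total years at (S, S):", sum(YEARS[("S", "S")]))
```

Under these assumed values, confessing is the best reply to either move, so mutual confession is the only equilibrium even though mutual silence costs fewer total years.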

##### Agent with Deep Reinforcement Learning
---

In [43]:
import textwrap
from IPython.display import Markdown, display
import requests
import json

class GenerativeRLAgent:
    def __init__(
        self,
        persona: str,
        objectives: str,
        constraints: str,
        strategies: str,
        model: str = "llama3",
        api_url: str = "http://localhost:11434/api/generate"
    ):
        self.persona = self._validate_input(persona, "Persona", 600)
        self.objectives = self._validate_input(objectives, "Objectives", 600)
        self.constraints = self._validate_input(constraints, "Constraints", 600)
        self.strategies = self._validate_input(strategies, "Strategies", 600)
        self.model = model
        self.api_url = api_url

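    def _validate_input(self, value: str, field: str, max_len: int) -> str:
        # Strip first so whitespace-only strings are rejected instead of returned empty.
        value = value.strip() if isinstance(value, str) else value
        if not isinstance(value, str) or not (0 < len(value) <= max_len):
            raise ValueError(f"{field} must be a non-empty string with max {max_len} characters.")
        return value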

    def _call_llm(self, prompt: str) -> str:
        payload = {
            "model": self.model,
            "prompt": prompt,
            "stream": False
        }

        try:
            response = requests.post(self.api_url, json=payload, timeout=60)
            response.raise_for_status()
            return response.json().get("response", "").strip() or "[No response]"
        except requests.exceptions.RequestException as e:
            return f"**LLM ERROR:** `{e}`"

    def process_input(self, input_text: str) -> str:
        input_text = self._validate_input(input_text, "Input Text", 300)

        prompt = textwrap.dedent(f"""
        You are an intelligent reinforcement learning agent operating under the following profile:

        Persona: {self.persona}
        Objectives: {self.objectives}
        Constraints: {self.constraints}
        Strategies: {self.strategies}

        The user has provided the following input:
        "{input_text}"

        Perform the following steps, and return your response using Markdown formatting:

        ### STEP 1 - Thinking
        Briefly explain what you understand from the input.

        ### STEP 2 - Unbiased Thinking
        Independently reflect on the situation without applying your persona.

        ### STEP 3 - Agent Response
        Now apply your persona, objectives, constraints, and strategies to give your optimal action.
        Respond with a clear, actionable message between **10 and 300 characters**.
        Then, summarize that response in a single sentence of **10 to 300 characters**, prefixed with:
        "**Agent Summary:**"
        """)

        result = self._call_llm(prompt)
        display(Markdown(result))

        # Extract and validate summary
        summary = ""
        for line in result.splitlines():
            if line.strip().startswith("**Agent Summary:**"):
                summary = line.split("**Agent Summary:**", 1)[-1].strip()
                break

        if not (10 <= len(summary) <= 300):
            summary = "**Agent Summary:** [ERROR] Summary not within 10–300 characters."

        display(Markdown(f"---\n\n{summary}"))
        return summary
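
The summary extraction at the end of `process_input` can also be exercised on its own, without calling the model. The sketch below is a standalone variation rather than the notebook's method: the function name `extract_summary`, the constant `SUMMARY_PREFIX`, and the sample response text are assumptions for illustration, and it returns `None` on failure instead of the error string used above.

```python
from typing import Optional

SUMMARY_PREFIX = "**Agent Summary:**"


def extract_summary(
    markdown_text: str, min_len: int = 10, max_len: int = 300
) -> Optional[str]:
    """Return the text after the first summary prefix, or None if missing or out of range."""
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(SUMMARY_PREFIX):
            summary = stripped[len(SUMMARY_PREFIX):].strip()
            # Only the first matching line is considered, mirroring the loop's break.
            return summary if min_len <= len(summary) <= max_len else None
    return None


# Hypothetical model response used only to exercise the parser
sample_response = (
    "### STEP 3 - Agent Response\n"
    "Hold positions and raise cash.\n\n"
    "**Agent Summary:** Increase cash allocation and wait for clarity.\n"
)
print(extract_summary(sample_response))
print(extract_summary("No summary line in this text."))
```

Returning `None` lets a caller distinguish a missing summary from a real one without inspecting the string contents.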


[TESTING]

In [48]:
investor_agent = GenerativeRLAgent(
    persona="Cautious AI investor focused on long-term portfolio preservation.",
    objectives="Minimize drawdowns while capturing upside in low-volatility assets.",
    constraints="Avoid leveraged instruments and high-beta tech stocks.",
    strategies="Rebalance into stable sectors during uncertainty; increase cash exposure.",
    model="llama3",
    api_url="http://localhost:11434/api/generate"
)


investor_agent.process_input("Markets are swinging wildly after the Fed interest rate decision.")


### STEP 1 - Thinking
The input suggests market volatility following the Federal Reserve's interest rate decision. This could imply uncertainty or surprise from the Fed's action.

### STEP 2 - Unbiased Thinking
Market fluctuations are natural after significant economic events like a Fed interest rate decision. It's essential to assess the potential impact on asset classes and reevaluate the portfolio in this environment.

### STEP 3 - Agent Response
**Rebalance into stable sectors and increase cash exposure by 5% to minimize potential drawdowns while waiting for market clarity.**

**Agent Summary:** Increase cash allocation by 5% to mitigate potential losses and wait for market stabilization.

---

Increase cash allocation by 5% to mitigate potential losses and wait for market stabilization.

'Increase cash allocation by 5% to mitigate potential losses and wait for market stabilization.'