# Game-Theoretic Approach to NLP

TBD: describe

A Reinforcement Learning from Human Feedback (RLHF) setup with a human in the loop and two agents:

- Agent A: Text generator (e.g., a simple model or a GPT-based one).
- Agent B: Evaluator that scores Agent A’s output using an adjustable matrix of weights.
- Human: Oversees the process and can manually adjust the evaluation matrix used by B.

This kind of framework could be used for dialogue training, text summarization, creative writing, or even value alignment experiments.

import random

# === Agent A: text generator (naive implementation) ===
class AgentA:
    def __init__(self):
        self.temperature = 1.0  # will be tuned based on feedback

    def generate(self, prompt):
        # In a real case, this could be a language model
        return f"{prompt} with extra words {random.randint(0, int(10 * self.temperature))}"

    def update(self, reward):
        # Simple feedback: raise "risk" (temperature) when reward > 0.5, lower it otherwise
        self.temperature += 0.1 * (reward - 0.5)  # center the reward around 0.5
        self.temperature = max(0.1, min(self.temperature, 2.0))


# === Agent B: evaluator using a matrix ===
class AgentB:
    def __init__(self, matrix=None):
        self.matrix = matrix or {
            "length_weight": 1.0,
            "keyword_weight": 1.0,
            "positivity_weight": 1.0,
        }

    def evaluate(self, text):
        score = 0
        score += self.matrix["length_weight"] * len(text.split())
        score += self.matrix["keyword_weight"] * ("extra" in text)
        score += self.matrix["positivity_weight"] * ("good" in text)
        return max(0.0, min(score / 20.0, 1.0))  # clamp to [0, 1]; negative weights could push the score below 0

    def update_matrix(self, new_matrix):
        self.matrix = new_matrix


# === Human interface ===
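def human_adjust_matrix(matrix):
    print("\nCurrent matrix:", matrix)
    name = input("Change which weight (length/keyword/positivity)? Leave empty to skip: ").strip().lower()
    # The prompt offers short names, but the matrix keys end in "_weight"
    key = name if name.endswith("_weight") else f"{name}_weight"
    if key in matrix:
        try:
            matrix[key] = float(input(f"New value for {key}: "))
        except ValueError:
            print("Not a number; matrix left unchanged.")
    return matrix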


# === Training loop ===
def training_loop():
    agent_a = AgentA()
    agent_b = AgentB()
    
    for step in range(10):
        print(f"\n--- Step {step} ---")
        prompt = "Tell me a story"
        text = agent_a.generate(prompt)
        reward = agent_b.evaluate(text)

        print(f"Generated: {text}")
        print(f"Reward: {reward:.2f}")

        agent_a.update(reward)

        user_input = input("Do you want to adjust evaluator? (y/n): ")
        if user_input.lower().startswith('y'):
            new_matrix = human_adjust_matrix(agent_b.matrix)
            agent_b.update_matrix(new_matrix)

training_loop()



--- Step 0 ---
Generated: Tell me a story with extra words 2
Reward: 0.45

--- Step 1 ---
Generated: Tell me a story with extra words 9
Reward: 0.45

--- Step 2 ---
Generated: Tell me a story with extra words 1
Reward: 0.45

--- Step 3 ---
Generated: Tell me a story with extra words 5
Reward: 0.45

--- Step 4 ---
Generated: Tell me a story with extra words 6
Reward: 0.45

--- Step 5 ---
Generated: Tell me a story with extra words 1
Reward: 0.45

--- Step 6 ---
Generated: Tell me a story with extra words 7
Reward: 0.45

--- Step 7 ---
Generated: Tell me a story with extra words 8
Reward: 0.45

--- Step 8 ---
Generated: Tell me a story with extra words 2
Reward: 0.45

--- Step 9 ---
Generated: Tell me a story with extra words 5
Reward: 0.45
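
The reward is 0.45 at every step because the evaluator only looks at word count and two keyword checks, and every generated text has eight words including "extra": (8 + 1 + 0) / 20 = 0.45. The sampled number never affects the score, so the temperature update receives no useful signal. A minimal sketch of one way to couple the two, assuming a hypothetical `generate_variable` function (not part of the code above) in which temperature controls how many filler words are appended:

```python
import random

def generate_variable(prompt, temperature, rng=random):
    """Hypothetical variant of AgentA.generate: temperature controls length."""
    # The number of filler words grows with temperature, so the evaluator's
    # length term now depends on the generator's parameter.
    n_extra = rng.randint(0, int(10 * temperature))
    return " ".join([prompt, "with"] + ["extra"] * n_extra)

def evaluate(text, matrix):
    """Same scoring rule as AgentB.evaluate, written as a plain function."""
    score = matrix["length_weight"] * len(text.split())
    score += matrix["keyword_weight"] * ("extra" in text)
    score += matrix["positivity_weight"] * ("good" in text)
    return min(score / 20.0, 1.0)

matrix = {"length_weight": 1.0, "keyword_weight": 1.0, "positivity_weight": 1.0}
rng = random.Random(0)  # fixed seed so the run is repeatable
for temperature in (0.5, 1.0, 2.0):
    text = generate_variable("Tell me a story", temperature, rng)
    print(temperature, len(text.split()), evaluate(text, matrix))
```

With this change the reward varies between runs, so `AgentA.update` has something to respond to. Whether length is a sensible thing to reward is a separate question for the evaluation matrix.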


## Where to start:
- How to code an agent?
- Vectorization: 1,2 test data
- Evaluation: binary cross-entropy 
- 

- Focus more on context in dialogue
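
For the evaluation item in the list above, binary cross-entropy compares a predicted probability `p` with a binary label `y` as `-(y*log(p) + (1-y)*log(1-p))`, averaged over examples. A minimal sketch in plain Python, assuming that evaluator scores in [0, 1] are treated as probabilities and that human judgments supply the 0/1 labels (both are assumptions; the notes do not say where labels come from):

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical data: 1 = human accepted the text, 0 = human rejected it
human_labels = [1, 0, 1, 1]
evaluator_scores = [0.9, 0.2, 0.45, 0.7]
print(f"BCE: {binary_cross_entropy(human_labels, evaluator_scores):.4f}")
```

A lower value means the evaluator's scores agree more closely with the human labels.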

## General Idea:
Two agents, A and B:

- B prompts A (1)
- A answers B
- B evaluates A
- B prompts A (2)
- A answers B
- B evaluates the turns for prompt (1) and prompt (2)

A needs to keep a matrix of context on its side.

Open question: where does game theory come in?

Human in the loop (H): H tracks the interaction between A and B and adjusts the matrix that B uses.
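
The turn order above can be sketched as a small loop. This is a hedged illustration, not the intended design: `ContextAgentA`, `TrajectoryAgentB`, and the scoring rule are hypothetical names and choices introduced here, the "matrix of context" is approximated by a plain list of past turns, and the human step is represented by editing B's weights directly.

```python
class ContextAgentA:
    """Hypothetical answerer that keeps its own context (list of past turns)."""

    def __init__(self):
        self.context = []  # stand-in for the "matrix of context" on A's side

    def answer(self, prompt):
        # Placeholder generation: refer back to the previous prompt, if any
        if self.context:
            previous_prompt = self.context[-1][0]
            reply = f"Answer to '{prompt}' (following up on '{previous_prompt}')"
        else:
            reply = f"Answer to '{prompt}'"
        self.context.append((prompt, reply))
        return reply


class TrajectoryAgentB:
    """Hypothetical evaluator that scores single turns and whole trajectories."""

    def __init__(self, matrix=None):
        self.matrix = matrix or {"length_weight": 1.0, "context_weight": 1.0}

    def evaluate_turn(self, reply):
        return min(self.matrix["length_weight"] * len(reply.split()) / 20.0, 1.0)

    def evaluate_trajectory(self, trajectory):
        turn_scores = [self.evaluate_turn(reply) for _, reply in trajectory]
        base = sum(turn_scores) / len(turn_scores)
        # Crude context check: count later replies that mention the previous prompt
        linked = sum(
            1
            for (prev_prompt, _), (_, reply) in zip(trajectory, trajectory[1:])
            if prev_prompt in reply
        )
        pairs = max(len(trajectory) - 1, 1)
        bonus = self.matrix["context_weight"] * linked / pairs
        return min((base + bonus) / 2.0, 1.0)


agent_a, agent_b = ContextAgentA(), TrajectoryAgentB()
trajectory = []
for prompt in ("Tell me a story", "Make it shorter"):  # B prompts A (1), (2)
    reply = agent_a.answer(prompt)                     # A answers B
    trajectory.append((prompt, reply))
    print(f"turn score: {agent_b.evaluate_turn(reply):.2f}")  # B evaluates A
print(f"trajectory score: {agent_b.evaluate_trajectory(trajectory):.2f}")

# Human in the loop: H changes B's matrix, which changes the trajectory score
agent_b.matrix["context_weight"] = 0.0
print(f"after H's adjustment: {agent_b.evaluate_trajectory(trajectory):.2f}")
```

Scoring the trajectory separately from single turns is what lets B reward A for using earlier context, which a per-turn score cannot see.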



## Methodology

- Trajectory analysis (trajectory: sequence of prompts and evaluations)
- Direction alignment 
- Policy gradient algorithm
- Rejection sampling
- 
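
Two of the listed methods can be illustrated on a toy problem. The sketch below rests on several assumptions and is not the planned method: `reward_fn` is a hypothetical stand-in for Agent B's score, the policy is a softmax over three made-up reply lengths, rejection sampling is shown in its best-of-n form, and the policy gradient is the basic REINFORCE update with a running-average baseline.

```python
import math
import random

ACTIONS = [4, 10, 16]  # hypothetical choices: number of words in the reply

def reward_fn(n_words):
    """Stand-in for Agent B: longer replies score higher, capped at 1.0."""
    return min(n_words / 20.0, 1.0)

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_of_n(rng, probs, n=4):
    """Rejection sampling (best-of-n): draw n actions, keep the highest-reward one."""
    draws = rng.choices(range(len(ACTIONS)), weights=probs, k=n)
    return max(draws, key=lambda i: reward_fn(ACTIONS[i]))

def reinforce(steps=2000, lr=0.5, seed=0):
    """REINFORCE: logits += lr * (reward - baseline) * grad log pi(action)."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(ACTIONS)), weights=probs, k=1)[0]
        reward = reward_fn(ACTIONS[action])
        advantage = reward - baseline
        # d log softmax / d logit_j = (1 if j == action else 0) - probs[j]
        for j in range(len(logits)):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
        baseline += 0.1 * (reward - baseline)  # running average of rewards
    return softmax(logits)

probs = reinforce()
print("learned action probabilities:", [round(p, 3) for p in probs])
print("best-of-4 pick:", ACTIONS[best_of_n(random.Random(1), probs)], "words")
```

Best-of-n needs no training but costs n generations per reply, while REINFORCE shifts the policy itself so that a single sample is more likely to score well.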