# Some general ideas related to game theory

## Poker Problem 

Kuhn Poker: 1,2,3 -> 

Two players
$$a_{}$$

## Adversarial Generation of Short Prompts for Improving Feedback Quality

**Motivation**

User feedback (reviews, evaluations, surveys) is often affected by cognitive biases, emotional tone, and low informational density. Respondents tend to provide vague, emotionally charged, or socially biased answers, which reduces the usefulness of feedback for decision-making and analysis. This problem is especially pronounced in educational evaluations, product reviews, and employee assessments.


How can short textual prompts be adaptively generated to maximize the information content and reduce bias in customer feedback, under adversarial evaluation?

Feedback quality metrics:
- Information content $I(r)$: number of distinct product aspects mentioned, entropy of the aspect distribution, or coverage of a predefined aspect set.
- Bias metric $B(r)$: sentiment extremeness (absolute sentiment score), demographic skew (if groups exist), or lexical subjectivity score.
- Utility $U(r)$: human-rated usefulness, collected later via ratings on a 1 to 5 scale. It can be ignored at the beginning.
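
One of the candidates for $I(r)$ listed above is the entropy of the aspect distribution. A minimal sketch of that option, assuming the aspect mentions have already been extracted into a flat list of labels (the function name `aspect_entropy` and the example labels are hypothetical, not part of the notes):

```python
import math
from collections import Counter

def aspect_entropy(mentions):
    """Shannon entropy (in bits) of the empirical distribution of aspect labels."""
    if not mentions:
        return 0.0
    counts = Counter(mentions)
    total = len(mentions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical extracted aspect mentions for two responses.
even = aspect_entropy(["price", "delivery", "quality", "support"])
skewed = aspect_entropy(["price", "price", "price", "delivery"])
print(even, skewed)
```

Under this definition a response that spreads its mentions evenly over many aspects scores higher than one that repeats a single aspect, which is not captured by a plain count of distinct aspects.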

GAME:
Generator (G) produces a short prompt p, Evaluator (E) evaluates the resulting feedback r.
G: finite set of prompt types (templates); E: fixed scoring function (at first)

Generator payoff: $u_{G}(p)=\alpha I(r)-\beta B(r)$
Evaluator payoff: $u_{E}(p)=-u_{G}(p)$, so for now this is a zero-sum game

The game is between a Prompt Generator and an Evaluator of feedback quality, where the generator tries to elicit informative, low-bias feedback, and the evaluator penalizes bias and low information.

BASELINE PIPELINE:
- Create a small prompt set - pure strategies
- Collect responses (may simulate using LLM)
- NLP analysis: extract aspects (rule-based or simple classifier), compute sentiment score.

payoff matrix

Now: eliminate dominated prompts, compute mixed strategies, identify equilibrium distributions.
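
The elimination step can be sketched as follows. This assumes a payoff matrix whose rows are prompts and whose columns are a few evaluator weight settings $(\alpha, \beta)$; the numbers, the weight grid, and the function name `undominated_rows` are made up for illustration.

```python
def undominated_rows(matrix):
    """Indices of rows that are not strictly dominated by any other row.

    Row j strictly dominates row i if it is larger in every column.
    """
    keep = []
    for i, row_i in enumerate(matrix):
        dominated = any(
            j != i and all(a > b for a, b in zip(row_j, row_i))
            for j, row_j in enumerate(matrix)
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical (I_hat, B_hat) estimates for three prompts.
estimates = [(0.70, 0.30), (0.80, 0.20), (0.60, 0.05)]
# Hypothetical evaluator weight settings (alpha, beta).
weights = [(1.0, 0.2), (1.0, 1.0), (1.0, 2.0)]

# Generator payoff alpha * I - beta * B for every prompt and weight setting.
payoffs = [[a * i_hat - b * b_hat for a, b in weights] for i_hat, b_hat in estimates]
undominated = undominated_rows(payoffs)
print(undominated)
```

Only the surviving rows need to be considered when computing mixed strategies.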

FURTHER:
adversarial learning
- Parameterize the prompt: instead of fixed templates, represent a prompt as a vector $\theta$ = (length, specificity, number of constraints, presence of examples).
- Train the generator with RL (action = generate prompt parameters, reward = $u_{G}$, environment = evaluator + NLP pipeline).
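
As a starting point for the RL idea, here is a sketch of the simplest possible setting: a non-contextual epsilon-greedy bandit over a few discrete parameter vectors $\theta$. Everything concrete in it is an assumption: the arm values, the mean rewards in `TRUE_MEAN`, and `simulated_reward`, which stands in for the real evaluator + NLP pipeline.

```python
import random

# Hypothetical discrete prompt parameterizations:
# theta = (length_in_words, specificity, number_of_constraints, has_example)
ARMS = [
    (8, 0.2, 0, False),
    (15, 0.6, 1, False),
    (20, 0.9, 2, True),
]

# Made-up mean rewards standing in for u_G; a real run would query the
# evaluator and the NLP pipeline instead.
TRUE_MEAN = {ARMS[0]: 0.2, ARMS[1]: 0.5, ARMS[2]: 0.8}

def simulated_reward(theta, rng):
    return TRUE_MEAN[theta] + rng.uniform(-0.1, 0.1)

def epsilon_greedy(arms, reward_fn, steps=300, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arms)
    values = [0.0] * len(arms)  # running mean reward per arm

    def update(k):
        reward = reward_fn(arms[k], rng)
        counts[k] += 1
        values[k] += (reward - values[k]) / counts[k]

    for k in range(len(arms)):  # pull every arm once to initialize
        update(k)
    for _ in range(steps):
        if rng.random() < epsilon:  # explore
            k = rng.randrange(len(arms))
        else:  # exploit the current best estimate
            k = max(range(len(arms)), key=values.__getitem__)
        update(k)
    return values, counts

values, counts = epsilon_greedy(ARMS, simulated_reward)
best_theta = ARMS[max(range(len(ARMS)), key=values.__getitem__)]
print(best_theta, counts)
```

A contextual bandit or a policy-gradient generator would replace the per-arm running means with a model of the reward as a function of $\theta$.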

comparison of prompts before and after optimization, statistical improvement in I and reduction in B, robustness analysis across datasets.

## Adversarial Generation of Short Prompts for Improving Feedback Quality

customer feedback in textual form

$$R=\{\text{all possible feedback texts}\}$$

$$P=\{\text{all possible short prompts}\}$$

Given a prompt $p\in P$, the user (environment) generates a feedback text:

$$r \sim P(\cdot \mid p), \quad r\in R$$

This distribution is unknown; feedback is stochastic and can only be observed through samples.

In [1]:
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
SECRET_KEY = os.getenv("OPEN_API_KEY")

PROMPTS = [
    "What did you like about the product?",
    "What problems did you face while using the product?",
    "Please describe your experience with the product.",
    "What should we improve in the product?",
    "How satisfied are you with the product and why?"
]

client = OpenAI(api_key=SECRET_KEY)

def generate_feedback(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a customer giving honest feedback."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.9,  # maybe try varying the temperature
        max_tokens=120
    )
    return response.choices[0].message.content



Let $A=\{ a_{1}, ...,a_{k} \}$ be a predefined set of product aspects.

Define an aspect extraction function: $\phi :R\rightarrow 2^{A}$

Information content: $I(r)=|\phi(r)|$
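
A minimal sketch of $\phi$ and $I(r)=|\phi(r)|$ using plain keyword matching. The lexicon `ASPECT_KEYWORDS` and the function names are hypothetical; a real extractor would probably need lemmatization or a classifier.

```python
import re

# Hypothetical aspect lexicon: each aspect a_k maps to trigger keywords.
ASPECT_KEYWORDS = {
    "price": {"price", "cost", "expensive", "cheap"},
    "delivery": {"delivery", "shipping", "arrived"},
    "quality": {"quality", "broken", "durable"},
}

def phi(text):
    """Aspect extraction: the subset of aspects whose keywords occur in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {aspect for aspect, keywords in ASPECT_KEYWORDS.items() if tokens & keywords}

def information_content(text):
    """I(r) = |phi(r)|, the number of distinct aspects mentioned."""
    return len(phi(text))

example = "The price was fair, but shipping took two weeks."
print(phi(example), information_content(example))
```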

Lexical diversity:

$$I(r) = \frac{\text{unique tokens}}{\text{total tokens}}$$

DEFINE BETTER FUNCTION LATER!

Bias metric:

$$B(r)=|s(r)|$$

where $s(r)\in [-1,1]$ is the sentiment polarity score.

In [2]:
import re
from textblob import TextBlob


def information_metric(text):
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) == 0:
        return 0.0
    return len(set(tokens)) / len(tokens)

def bias_metric(text):
    sentiment = TextBlob(text).sentiment.polarity
    return abs(sentiment)

Empirical expectation (Monte Carlo)

Since feedback is stochastic, we work with expectations.

$$E[B \mid p]=E_{r \sim P(\cdot \mid p)}[B(r)]$$
$$E[I \mid p]=E_{r \sim P(\cdot \mid p)}[I(r)]$$


In [3]:
import numpy as np

def estimate_expectations(prompt, n_samples=10):
    infos, biases = [], []

    for i in range(n_samples):
        r = generate_feedback(prompt)
        infos.append(information_metric(r))
        biases.append(bias_metric(r))

    return np.mean(infos), np.mean(biases)
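
A Monte Carlo mean over a handful of samples is itself noisy, so it may help to report a standard error next to each estimate. A small sketch, using made-up metric values in place of real sampled responses (the helper name `mean_and_standard_error` is hypothetical):

```python
import math
import statistics

def mean_and_standard_error(samples):
    """Sample mean and its standard error (sample std / sqrt(n))."""
    n = len(samples)
    mean = statistics.fmean(samples)
    if n < 2:
        return mean, float("nan")  # the standard error is undefined for one sample
    return mean, statistics.stdev(samples) / math.sqrt(n)

# Made-up values standing in for I(r) over ten sampled responses.
scores = [0.62, 0.70, 0.66, 0.71, 0.59, 0.68, 0.64, 0.73, 0.61, 0.66]
mean, sem = mean_and_standard_error(scores)
print(f"{mean:.3f} +/- {sem:.3f}")
```

If two prompts differ by less than a couple of standard errors, their ranking may simply reflect sampling noise, and `n_samples` should be increased before drawing conclusions.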

**Payoff function (parametrized)**

Player G (Generator) chooses prompt p

Player E (Evaluator) enforces quality constraints

Strategy spaces

Generator strategies:

$p\in P$

Evaluator strategies:

$$\lambda = (\alpha, \beta)\in \mathbb{R}_{+}^{2}$$

Evaluator controls the penalty weights.

Generator payoff:

$$u_G(p,\lambda)=\alpha E[I \mid p]-\beta E[B \mid p]$$

Evaluator payoff:

$$u_E(p,\lambda)=-u_G(p,\lambda)$$

**Static game with finite strategies:**

Finite prompts: $P=\{p_1,\dots,p_n\}$

Define payoff: $U_i=\alpha E[I \mid p_i]-\beta E[B \mid p_i]$

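$p_{i}$ is dominated by $p_{j}$ if $U_j \ge U_i$ for every evaluator choice of $(\alpha, \beta)$, with strict inequality for at least one choice. Since $\alpha, \beta \ge 0$, this is equivalent to $E[I \mid p_j] \ge E[I \mid p_i]$ and $E[B \mid p_j] \le E[B \mid p_i]$, with at least one of the two strict. Dominated prompts can be eliminated.

We seek a minimax equilibrium $\max_{\pi} \min_{\lambda} u_{G}(\pi, \lambda)$, where $\pi$ is a mixed strategy over $P$, $\lambda = (\alpha, \beta)$, and $u_{G}(\pi, \lambda)=\sum_i \pi_i U_i$.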
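
A sketch of how the maximin strategy could be computed, under the extra assumption that the weights are normalized to $\alpha+\beta=1$. The payoff is linear in $\lambda$, so the evaluator's minimum is attained at $(1,0)$ or $(0,1)$ and the game reduces to an $n \times 2$ matrix with rows $(E[I \mid p_i],\ -E[B \mid p_i])$. For such games an optimal row strategy with at most two prompts in its support always exists, so it is enough to enumerate pure rows and equalizing pairs. The function name and all numbers are hypothetical.

```python
from itertools import combinations

def maximin_n_by_2(matrix):
    """Maximin value and mixed strategy for the row player of an n x 2 zero-sum game."""
    n = len(matrix)
    best_value, best_mix = float("-inf"), None

    def consider(mix):
        nonlocal best_value, best_mix
        # Worst case over the two columns for this mixture.
        value = min(sum(mix[i] * matrix[i][c] for i in range(n)) for c in (0, 1))
        if value > best_value:
            best_value, best_mix = value, mix

    for i in range(n):  # pure strategies
        consider([1.0 if k == i else 0.0 for k in range(n)])
    for i, j in combinations(range(n), 2):
        # Weight t on row i that makes both column payoffs equal.
        denom = (matrix[i][0] - matrix[i][1]) - (matrix[j][0] - matrix[j][1])
        if denom == 0:
            continue
        t = (matrix[j][1] - matrix[j][0]) / denom
        if 0.0 < t < 1.0:
            mix = [0.0] * n
            mix[i], mix[j] = t, 1.0 - t
            consider(mix)
    return best_value, best_mix

# Hypothetical (E[I|p], E[B|p]) estimates; columns are lambda=(1,0) and lambda=(0,1).
estimates = [(0.70, 0.30), (0.80, 0.20), (0.60, 0.05)]
game = [[i_hat, -b_hat] for i_hat, b_hat in estimates]
game_value, game_mix = maximin_n_by_2(game)

# Sanity check on a game with a genuinely mixed solution (matching pennies).
pennies_value, pennies_mix = maximin_n_by_2([[1, -1], [-1, 1]])
print(game_value, game_mix, pennies_value, pennies_mix)
```

With $I \ge 0$ and $B \ge 0$ the bias column is never larger than the information column, so in this reduced game the evaluator's worst case is always $\lambda=(0,1)$ and the generator simply picks the prompt with the lowest expected bias. That degeneracy is one motivation for adding a cost term or other constraints on the evaluator.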


THINK ABOUT LATER: dynamic learning formulation.
Let prompts be generated by parameters: $p=g(\theta),\theta \in \mathbb{R}^{d}$

Constraints on parameters: $\alpha+\beta=1$, $u_{E}=-u_{G}-c(\beta)$ with $c(\beta)=\gamma \beta^{2}$
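
Under these constraints the evaluator's best response to a fixed prompt $p$ can be worked out in closed form. This is a sketch that assumes $\gamma > 0$ and writes $\bar I = E[I \mid p]$, $\bar B = E[B \mid p]$ (shorthand introduced here). Substituting $\alpha = 1-\beta$:

$$u_E(\beta) = -(1-\beta)\,\bar I + \beta\,\bar B - \gamma \beta^{2}$$

$$\frac{d u_E}{d\beta} = \bar I + \bar B - 2\gamma\beta = 0 \quad\Rightarrow\quad \beta = \frac{\bar I + \bar B}{2\gamma}$$

Since $\frac{d^{2} u_E}{d\beta^{2}} = -2\gamma < 0$, the function is concave, and with $\beta \in [0,1]$ the best response is

$$\beta^{*} = \min\left(1,\ \frac{\bar I + \bar B}{2\gamma}\right)$$

So the cost term only produces an interior choice of $\beta$ when $\gamma > (\bar I + \bar B)/2$; for smaller $\gamma$ the evaluator still puts all weight on the bias penalty.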

In [4]:
def payoff(prompt, alpha=1.0, beta=1.0, n_samples=10):
    I_hat, B_hat = estimate_expectations(prompt, n_samples)
    return alpha * I_hat - beta * B_hat

Playing the game (best response)

The generator chooses the best prompt given evaluator parameters.

THINK ABOUT LATER: Transformer as Prompt Generator

Prompt as a function of parameters: instead of choosing a prompt directly, define $p_{\theta}=G_{\theta}(z)$

In [None]:
def best_prompt(prompts, alpha, beta):
    scores = {}
    for p in prompts:
        scores[p] = payoff(p, alpha, beta)
    return max(scores, key=scores.get), scores

best, all_scores = best_prompt(PROMPTS, alpha=1.0, beta=0.7)

print("Best prompt:", best)
for p, s in all_scores.items():
    print(f"{p[:40]}... -> {s:.4f}")

# BETAS = [0.2, 0.5, 1.0, 2.0]

# for beta in BETAS:
#     best, _ = best_prompt(PROMPTS, alpha=1.0, beta=beta)
#     print(f"beta={beta:.1f} -> best prompt: {best}")

Best prompt: What problems did you face while using the product?
What did you like about the product?... -> 0.5290
What problems did you face while using t... -> 0.7067
Please describe your experience with the... -> 0.6287
What should we improve in the product?... -> 0.6033
How satisfied are you with the product a... -> 0.5855



TODO
- Maybe a real application scenario (online shop)
- Named entity recognition (NLP topic) for $I(r)$; think about $B(r)$ (fuzzy logic)
- Reinforcement learning (contextual bandit) for players to choose parameters
- Dynamic learning formulation: let prompts be generated by parameters $p=g(\theta),\theta \in \mathbb{R}^{d}$ (transformers)
- Qualitative metrics of the approach

## Improved Game

- Improve the calculation of the functions $I(r)$ and $B(r)$



$I(r)$ calculation: $A$ is the set of aspects, $R$ is the response space.

$$\text{aspects}: R \rightarrow 2^{A}$$

$I_{a}(r) = \frac{|\text{aspects}(r)|}{|A|}$

In [8]:
from openai import OpenAI
from dotenv import load_dotenv
import os
import re
from textblob import TextBlob
import spacy 
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_sm")

load_dotenv()
SECRET_KEY = os.getenv("OPEN_API_KEY")

PROMPTS = [
    "What did you like about the product?",
    "What problems did you face while using the product?",
    "Please describe your experience with the product.",
    "What should we improve in the product?",
    "How satisfied are you with the product and why?"
]

client = OpenAI(api_key=SECRET_KEY)

def generate_feedback(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a customer giving honest feedback."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.9,  # maybe try varying the temperature
        max_tokens=120
    )
    return response.choices[0].message.content


# utility functions definition
ASPECTS = {
    "price": ["price", "cost", "expensive", "cheap"],
    "delivery": ["delivery", "shipping", "arrived"],
    "quality": ["quality", "broken", "durable"],
    "support": ["support", "service", "help"],
    "usability": ["easy", "difficult", "interface", "ui"],
    "performance": ["fast", "slow", "lag"]
}
SUGGESTION_PATTERNS = [
    "should", "could", "recommend", "improve", "add", "fix" # "it would be better"
]
EMOTION_WORDS = {"amazing", "terrible", "horrible", "fantastic", "worst"}

def information_metric_aspect(text):
    text = text.lower()
    doc = nlp(text)
    lemmas = {token.lemma_ for token in doc if token.is_alpha} # lemmatization

    found = set()
    
    for aspect, keywords in ASPECTS.items():
        keyword_lemmas = {nlp(k)[0].lemma_ for k in keywords}
        if lemmas.intersection(keyword_lemmas):
            found.add(aspect)
    
    # Returns the raw count |aspects(r)|; divide by len(ASPECTS) to get the normalized I_a(r).
    return len(found), found


def information_metric_actionability(text):
    text = text.lower()
    doc = nlp(text)
    lemmas = {t.lemma_ for t in doc if t.is_alpha}

    found = []
    suggestion_lemmas = {nlp(s)[0].lemma_ for s in SUGGESTION_PATTERNS}
    for l in suggestion_lemmas:
        if l in lemmas:
            found.append(l)
    
    return len(found)/len(suggestion_lemmas), found

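# entity density: named entities found by spaCy divided by the number of tokens
def information_metric_ner(text):
    doc = nlp(text)
    if len(doc) == 0:  # avoid division by zero on empty text
        return 0.0, []
    return len(doc.ents) / len(doc), [e.text for e in doc.ents]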

# weighted power mean (exponent n, weights w) of the three information metrics
def information_metric_combined(text):
    n = 2
    w = [0.4, 0.3, 0.3]
    aspect_metric, _ = information_metric_aspect(text)
    actionability_metric, _ = information_metric_actionability(text)
    ner_metric, _ = information_metric_ner(text)

    return ((aspect_metric**n)*w[0] + (actionability_metric**n)*w[1] + (ner_metric**n)*w[2])**(1/n)


# share of EMOTION_WORDS lemmas present in the text (a lexicon lookup, not NER, despite the name)
def bias_metric_ner(text):
    doc = nlp(text.lower())
    lemmas = {t.lemma_ for t in doc if t.is_alpha}

    found = []
    emotion_lemmas = {nlp(s)[0].lemma_ for s in EMOTION_WORDS}
    for l in emotion_lemmas:
        if l in lemmas:
            found.append(l)
    
    return len(found) / len(emotion_lemmas), found

nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()
def bias_sentiment_intencity(text):
    score = sia.polarity_scores(text)["compound"]
    return abs(score)


# weighted power mean (exponent n, weights w) of emotion-word share and sentiment intensity
def bias_metric_combined(text):
    n = 2
    w = [0.5, 0.5]
    ner, _ = bias_metric_ner(text)
    polarity = bias_sentiment_intencity(text)

    return ((ner**n)*w[0] + (polarity**n)*w[1])**(1/n)


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\vmelnyk2\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [9]:
test_response = """"Hey there!
I bought the EchoHome Smart Lamp about three weeks ago, mostly because I was tired of harsh overhead lighting while reading at night. I wasn’t sure what to expect — sometimes “smart” gadgets feel more complicated than they’re worth — but this one? Totally different story.
Setting it up took maybe five minutes with the app, and since then it’s become my favorite part of my evening routine. I love that I can adjust the color temperature from bright white for work to a soft amber for winding down. The sunset fade-out feature actually helps me fall asleep — no joke!
My cat is also weirdly obsessed with it (she sits under it like it’s her personal sun), so that’s an unexpected bonus. 😸
If I had one tiny wish: I’d love a physical remote control option for when my phone isn’t nearby. But honestly, it’s such a small thing compared to how much I enjoy using it.
Thanks for making a product that feels both thoughtful and genuinely useful. It’s the little things that make a home cozy, right?
Cheers, Alex"""

print(information_metric_aspect(test_response))
print(information_metric_actionability(test_response))
print(information_metric_ner(test_response))
print(information_metric_combined(test_response))

print(bias_metric_ner(test_response))
print(bias_sentiment_intencity(test_response))
print(bias_metric_combined(test_response))


(1, {'support'})
(0.0, [])
(0.022222222222222223, ['about three weeks ago', 'night', 'five minutes', 'evening', 'one'])
0.6325726425859312
(0.0, [])
0.9958
0.7041369327055641
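
The two combined values printed above can be checked by hand. Both combined metrics are weighted power means with exponent $n=2$, that is $M = \left(\sum_k w_k m_k^{n}\right)^{1/n}$ (the symbols $M$, $m_k$, $w_k$ are shorthand introduced here for the component metrics and their weights).

For the information metric, with components (aspect, actionability, entity density) $= (1,\ 0,\ 0.0222)$ and weights $(0.4,\ 0.3,\ 0.3)$:

$$\sqrt{0.4 \cdot 1^{2} + 0.3 \cdot 0^{2} + 0.3 \cdot 0.0222^{2}} \approx \sqrt{0.40015} \approx 0.6326$$

For the bias metric, with components (emotion words, sentiment intensity) $= (0,\ 0.9958)$ and weights $(0.5,\ 0.5)$:

$$\sqrt{0.5 \cdot 0^{2} + 0.5 \cdot 0.9958^{2}} = 0.9958 \cdot \sqrt{0.5} \approx 0.7041$$

Note that the aspect term enters as the raw count $1$ rather than the normalized $I_a(r) = 1/6$, so it accounts for almost all of the combined information score in this example.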
