In [124]:
import os
from cmdstanpy import CmdStanModel
import numpy as np
import pandas as pd

# Homework 1: Bayesian Cognitive and Rational Speech Act models

This homework assignment is to be completed in groups. It is due on November 27, 2025 (midnight). Please upload *all files you created or modified* to the homework folder of your group in studIP.

Group number:

Names:

*General note: You may use AI tools for coding. Please refer to the uploaded manual `AI_Tools_Guidelines` for recommended ways to use AI that support your learning. In particular, do not be satisfied when an AI tool hands you a working version of your code; put in the effort to understand exactly how the problem is solved. A further note of caution: what works for widely used programming languages like Python does not necessarily work for Stan. Check your code carefully and do NOT blindly trust AI.*

## Introduction
During the past weeks, you have learned how Bayesian inference works and how it can be used in Bayesian cognitive models. You also learned about a specific type of Bayesian model that can be used to model pragmatic language understanding and production: the Rational Speech Act (RSA) model. The goal of this homework assignment is for you to learn how to implement Bayesian models in Stan and, specifically, how to implement RSA models in Stan. A special focus will be on the different use cases and evaluation methods of RSA models.

## 1) Stan modeling (8 points)

1.1) In the file `simple_model.stan`, you will find a simple Stan model. Describe its implementation, relating it to the knowledge you gained about the conventions for coding models in Stan. (4 points)

1.2) You will notice that the model does not compile. Fix the problems and explain what you did. (4 points)

## 2) Bayesian cognitive models (10 points)
Think of a use case for `simple_model.stan` in the scope of Bayesian cognitive modeling. Describe the model while answering the following questions:

2.1) What cognitive capacity can be explained by this model? (2 points) 

2.2) What is the purpose and function of this model? (3 points)

2.3) At which level of analysis does it model this cognitive capacity and why? (3 points)

Overall coherence gives another 2 points.

## 3) RSA modeling (82 points)
The purpose of the following model is to explain the use of overinformative referring expressions in pragmatic communication. In referential communication, the speaker’s task is to produce a referring expression that allows a listener to identify the target in the context. Consider the context below, where the target is the small blue pin. A referring expression including a size adjective (*the small pin*) is, strictly speaking, sufficient to uniquely establish reference to the target, yet speakers often “overmodify” with color, producing referring expressions like *the small blue pin*. This overmodification phenomenon is what the model is intended to capture.

<img src="img/size-sufficient.png" width="400"/>

### 3.1) Vanilla RSA (20 points)
In the file `vanilla_rsa.stan`, you will find an RSA model of the production of referring expressions, based on the vanilla RSA model of Frank & Goodman (2012) that we discussed in class.
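
For orientation, the following is a minimal plain-Python sketch of a vanilla RSA speaker. It assumes the common textbook formulation with a uniform prior over states, $S_1(u \mid s) \propto \exp(\alpha \cdot (\log L_0(s \mid u) - w \cdot \text{cost}(u)))$; the helper names are illustrative, and the implementation in `vanilla_rsa.stan` may differ in its details.

```python
import math

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def literal_listener(meaning_row):
    """L0(s | u) under a uniform state prior: normalize the row of the meaning matrix."""
    return normalize(meaning_row)

def pragmatic_speaker(meaning_matrix, cost, state, alpha=1.0, cost_weight=1.0):
    """S1(u | s), proportional to exp(alpha * (log L0(s | u) - cost_weight * cost(u)))."""
    scores = []
    for row, c in zip(meaning_matrix, cost):
        l0 = literal_listener(row)[state]
        # an utterance that is false of the state receives probability zero
        scores.append(0.0 if l0 == 0 else math.exp(alpha * (math.log(l0) - cost_weight * c)))
    return normalize(scores)

states = ["big_blue", "big_red", "small_blue"]
utterances = ["big", "small", "blue", "red"]
# meaning_matrix[u][s] is 1 if utterance u is literally true of state s
meaning_matrix = [[int(u in s) for s in states] for u in utterances]
cost = [0.0] * len(utterances)

target = states.index("small_blue")
speaker_probs = pragmatic_speaker(meaning_matrix, cost, target)
for u, p in zip(utterances, speaker_probs):
    print(f"{u}: {p:.3f}")
```

A sketch like this can serve as a sanity check for the values that the Stan model reports, provided both use the same formulation.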

3.1.1) Provide informative comments in the file `vanilla_rsa.stan`. (4 points)

3.1.2) You will notice that the parameters and model blocks are empty. Why is that? Go through the following code and inspect the model's behavior. Look at the Stan variables that are included in the fitted model. (3 points)

In [131]:
# compile model
stan_file = os.path.join('stan', 'vanilla_rsa.stan')
rsa_model = CmdStanModel(stan_file=stan_file)

In [127]:
# define input data
states = ["big_blue", "big_red", "small_blue"]
utterances = [
    "big", "small", "blue", "red"
]
n_states = len(states)
n_utterances = len(utterances)

# build meaning_matrix[u, s]
meaning_matrix = np.zeros((n_utterances, n_states), dtype=int)
for u, utterance in enumerate(utterances):
    for s, state in enumerate(states):
        # literal meaning maps to true iff the utterance string appears in the state string
        # Stan cannot handle booleans, so we need to work with integers here
        meaning_matrix[u, s] = int(utterance in state)

# parameters - change them here
alpha = 1.0
cost_weight = 1.0

# cost function
cost_dict = {
    "big": 0.0,
    "small": 0.0,
    "blue": 0.0,
    "red": 0.0,
}
cost = np.array([cost_dict[utterance] for utterance in utterances])

# prepare Stan data as dictionary
stan_data = {
    "S": n_states,
    "U": n_utterances,
    "meaning_matrix": meaning_matrix.tolist(), # Stan cannot handle numpy arrays       
    "cost": cost.tolist(),                     # or dictionaries
    "alpha": alpha,
    "cost_weight": cost_weight
}

In [None]:
# run a single chain for one iteration, without warmup or adaptation
fit = rsa_model.sample(
    data=stan_data,
    show_console=True,
    chains=1,
    iter_warmup=0,
    adapt_engaged=False,
    iter_sampling=1,
)

3.1.3) Are the outputs in line with what you would expect given your knowledge about pragmatic communication and overinformative referring expressions?
Add complex utterances to the model (i.e., utterances consisting of a size and a color adjective) and inspect the output again. The meaning of a complex two-word utterance is defined by intuitive intersective semantics: $$\mathcal{L}(u_{\text{complex}}, o)=\mathcal{L}(u_{\text{size}},o)\times\mathcal{L}(u_{\text{color}},o)$$ (6 points)
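
The intersective semantics can be sketched in plain Python as follows. The function `literal_meaning` and the convention of joining the size and color word with an underscore are assumptions made for illustration; they are not prescribed by the assignment.

```python
def literal_meaning(utterance, state):
    """Return 1 if every word of the utterance is a feature of the state, else 0."""
    features = state.split("_")
    result = 1
    for word in utterance.split("_"):
        # intersective semantics: multiply the meanings of the individual words
        result *= int(word in features)
    return result

states = ["big_blue", "big_red", "small_blue"]
simple_utterances = ["big", "small", "blue", "red"]
# hypothetical naming scheme: complex utterances join size and color with "_"
complex_utterances = [
    f"{size}_{color}" for size in ("big", "small") for color in ("blue", "red")
]
utterances = simple_utterances + complex_utterances

# meaning_matrix[u][s] uses the same layout as the Boolean matrix above
meaning_matrix = [[literal_meaning(u, s) for s in states] for u in utterances]
for u, row in zip(utterances, meaning_matrix):
    print(u, row)
```

Note that `small_red` is true of no object in this context, so its row contains only zeros. A model that normalizes over states has to handle or exclude such utterances.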

3.1.4) Play around with the rationality and cost weight parameters. How do they affect the model output? (4 points)
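
To build an intuition before running the Stan model, the roles of the two parameters can be sketched with a plain softmax. The sketch assumes the speaker utility $\log L_0(s \mid u) - w \cdot \text{cost}(u)$; the listener probabilities and costs below are made-up illustrative values, not model output.

```python
import math

def speaker_distribution(listener_probs, costs, alpha, cost_weight):
    """Softmax over utterances with utility log L0(s | u) - cost_weight * cost(u)."""
    scores = [
        math.exp(alpha * (math.log(p) - cost_weight * c))
        for p, c in zip(listener_probs, costs)
    ]
    total = sum(scores)
    return [score / total for score in scores]

# illustrative values: the first utterance identifies the target with
# certainty, the second only with probability 0.5
listener_probs = [1.0, 0.5]
# assumption: the more informative utterance is also the costlier one
costs = [1.0, 0.0]

for alpha in (1.0, 5.0):
    for cost_weight in (0.0, 1.0):
        probs = speaker_distribution(listener_probs, costs, alpha, cost_weight)
        rounded = [round(p, 3) for p in probs]
        print(f"alpha={alpha}, cost_weight={cost_weight}: {rounded}")
```

In this toy setting, a larger `alpha` concentrates probability on the utterance with the highest utility, while a larger `cost_weight` shifts probability away from costly utterances.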

3.1.5) Adapt the utterance cost in a way that achieves a preference for overinformative referring expressions. (2 points)

3.1.6) Adapt the utterance cost in a way that seems most natural to you. (1 point)

### 3.2) Relaxed semantics (20 points)
It seems that our intuitions do not align well with the model. Let's use continuous rather than boolean semantics to see whether this can solve our problem. In the following, you need to adapt the RSA model and input data in a way that implements continuous semantics. The only change will be that the lexicon, or meaning matrix, should return real values instead of true or false: $$\mathcal{L}(u,o)\in [0,1] \subset \mathbb{R}$$
This approach captures the intuition that an object is not unambiguously big or blue, but rather that objects can count as big or blue to varying degrees.

3.2.1) Build a meaning matrix that captures the relaxed semantics with two new parameters `size_semantics` $x_\text{size}$ and `color_semantics` $x_\text{color}$. When an object $o$ is in the extension of a size adjective under the Boolean semantics defined above, take $\mathcal{L}(u,o)=x_\text{size}$, else $\mathcal{L}(u,o)=1-x_\text{size}$. The semantics are defined analogously for color. (6 points)

3.2.2) Run the model with `alpha = 30`, `size_semantics = 0.8` and `color_semantics = 0.99`. Inspect the model outputs. (4 points)

3.2.3) Visualize the results of varying values for `size_semantics` and `color_semantics`, pit them against each other and interpret them. (6 points)

3.2.4) Van Gompel et al. (2019) found that speakers use overinformative referring expressions in about 80% of the trials that look like the one above, where size is sufficient to mention. What about contexts where color is sufficient to mention? Construct a context where color is sufficient to mention and interpret the output. (4 points)
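
The relaxed semantics described in 3.2.1 can be sketched in plain Python as follows. The function `relaxed_meaning`, the word sets, and the underscore convention for two-word utterances are assumptions for illustration; complex utterances are combined by multiplication, as in the intersective semantics of 3.1.3.

```python
SIZE_WORDS = {"big", "small"}
COLOR_WORDS = {"blue", "red"}

def relaxed_meaning(utterance, state, size_semantics, color_semantics):
    """Continuous meaning in [0, 1] of a one- or two-word utterance for a state."""
    features = state.split("_")
    value = 1.0
    for word in utterance.split("_"):
        if word in SIZE_WORDS:
            x = size_semantics
        elif word in COLOR_WORDS:
            x = color_semantics
        else:
            raise ValueError(f"unknown word: {word}")
        # x if the word is true under Boolean semantics, otherwise 1 - x
        value *= x if word in features else 1.0 - x
    return value

states = ["big_blue", "big_red", "small_blue"]
utterances = ["big", "small", "blue", "red", "small_blue"]
meaning_matrix = [
    [relaxed_meaning(u, s, size_semantics=0.8, color_semantics=0.99) for s in states]
    for u in utterances
]
for u, row in zip(utterances, meaning_matrix):
    print(u, [round(v, 4) for v in row])
```

Because the matrix now holds real values rather than integers, the data declaration in the Stan file would presumably need a real-valued type as well.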

### 3.3) Model evaluation by comparison to experiment data (42 points)
3.3.1) Create a new file `sem_rsa.stan`. Adapt the vanilla RSA model in a way that allows you to infer all free parameters instead of specifying them beforehand. Condition the model on the observed production data (`data/data_exp1.csv`) and integrate over the free parameters. Preprocess the observed data in a way that you see fit for the modeling purpose. Assume uniform priors for each parameter. Use the generated quantities block in your Stan model to generate the posterior predictive distribution (see the [Stan documentation](https://mc-stan.org/docs/stan-users-guide/posterior-prediction.html) on this). Choose an appropriate number of iterations for warmup and for sampling from the posterior. (16 points)

3.3.2) Diagnose the model convergence and take action if necessary. (4 points)

3.3.3) Interpret a summary of the fitted model. (6 points)

3.3.4) Correlate the model's posterior predictive distribution for overinformative utterance probabilities with the empirical data to assess and interpret model fit to the data. (8 points)

3.3.5) Bonus: Introduce separate cost parameters for size and color. (6 bonus points)

3.3.6) Interpret and discuss your findings. (8 points)
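
For the correlation in 3.3.4, one possible approach is a Pearson correlation between the posterior predictive means and the empirical proportions per condition. The sketch below uses placeholder numbers only; they are not taken from `data/data_exp1.csv`, and the grouping into three conditions is an assumption for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# placeholder numbers for illustration only, NOT taken from the experiment:
# one proportion of overinformative utterances per condition
empirical = [0.10, 0.40, 0.80]
# hypothetical posterior predictive means for the same conditions
predicted = [0.15, 0.35, 0.75]

r = pearson_r(predicted, empirical)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

With real data, `predicted` would be computed from the draws of the generated quantities, and `empirical` from the preprocessed production data.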

## X) Reflection (no points, but mandatory)

Reflect on your group work. What went well? What did not go well?

Please note down the group members' team roles anonymously and reflect on how you filled these roles.