# Model-fitting

**Plan:**

1. Fit RSA models to data of Experiment 1:
    1. For each language, fit RSA+distance and RSA+person
    2. Given best fit of RSA+distance and RSA+person, do model-comparison (Bayes factors?) to see which model performs best
    3. Given the better-fitting model (distance or person), check how well the simple version of that model (i.e. without RSA reasoning) fits the data: do model-comparison between the simple and the RSA version.
2. Pre-register model-fitting for Experiment 2, with parameter settings that were fit to the data of Experiment 1. Repeat steps 1.2 and 1.3 above.

## Maximum likelihood estimation

**Goal:** Try to find parameter settings that maximise the likelihood of the data.
In order to do that, we first need to define a _likelihood function_.

**Likelihood function, single parameter:**

$$ \mathcal{L}(\theta | x_{1}, ..., x_{n}) = \prod_{i=1}^{n} f(x_{i} | \theta) $$


where $\theta$ is the parameter setting. 
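
As a minimal illustration of the single-parameter case, the sketch below assumes (purely for the sake of example) that each observation is a Bernoulli outcome with success probability $\theta$. The function name `bernoulli_likelihood` and the toy data are hypothetical and not part of the actual model.

```python
import numpy as np

def bernoulli_likelihood(theta, observations):
    """Product of f(x_i | theta) over all observations, with f the Bernoulli pmf."""
    observations = np.asarray(observations)
    # f(x | theta) is theta when x == 1 and (1 - theta) when x == 0
    per_observation = np.where(observations == 1, theta, 1.0 - theta)
    return float(np.prod(per_observation))

# Hypothetical toy data: 1 = word produced, 0 = word not produced
toy_observations = [1, 1, 0, 1]

likelihood_at_half = bernoulli_likelihood(0.5, toy_observations)
likelihood_at_three_quarters = bernoulli_likelihood(0.75, toy_observations)
```

With three successes out of four observations, $\theta = 0.75$ gives a higher likelihood than $\theta = 0.5$, which is what maximum likelihood estimation exploits.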

In our case, however, we have two parameters that we want to fit simultaneously. Let's call them $\theta_{1}$ and $\theta_{2}$.

**<span class="mark">Q: Is this the likelihood function with two parameters?</span>**

$$ \mathcal{L}(\theta_{1}, \theta_{2} | x_{1}, ..., x_{n}) = \prod_{i=1}^{n} f(x_{i} | \theta_{1}, \theta_{2}) $$
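
A product of many probabilities quickly underflows numerically, so in practice it is common to work with the log-likelihood, in which the product becomes a sum. Assuming the observations are independent (as the product form above already does), this would read:

$$ \log \mathcal{L}(\theta_{1}, \theta_{2} | x_{1}, ..., x_{n}) = \sum_{i=1}^{n} \log f(x_{i} | \theta_{1}, \theta_{2}) $$

Because the logarithm is monotonic, the parameter setting that maximises the log-likelihood also maximises the likelihood.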


### Loading in the data:

In [3]:
import pandas as pd

data_exp_1_two_system = pd.read_csv('data/with_counts/TwoSystem.csv', index_col=0)  

data_exp_1_two_system

Unnamed: 0,Object_Position,Listener_Position,Language,Total,este,aquel,Estep,Aquelp
1,1,1,English,51,46,5,0.901961,0.098039
2,1,1,Italian,51,51,0,1.0,0.0
3,1,2,English,51,46,5,0.901961,0.098039
4,1,2,Italian,51,51,0,1.0,0.0
5,1,3,English,51,42,9,0.823529,0.176471
6,1,3,Italian,51,50,1,0.980392,0.019608
7,1,4,English,51,47,4,0.921569,0.078431
8,1,4,Italian,51,50,1,0.980392,0.019608
9,2,1,English,51,25,26,0.490196,0.509804
10,2,1,Italian,51,36,15,0.705882,0.294118


In [4]:
data_exp_1_three_system = pd.read_csv('data/with_counts/ThreeSystem.csv', index_col=0)  

data_exp_1_three_system

Unnamed: 0,Object_Position,Listener_Position,Language,Total,este,ese,aquel,Estep,Esep,Aquelp
1,1,1,Portuguese,50,47,3,0,0.94,0.06,0.0
2,1,1,Spanish,50,50,0,0,1.0,0.0,0.0
3,1,2,Portuguese,50,48,1,1,0.96,0.02,0.02
4,1,2,Spanish,50,49,1,0,0.98,0.02,0.0
5,1,3,Portuguese,50,47,3,0,0.94,0.06,0.0
6,1,3,Spanish,50,49,1,0,0.98,0.02,0.0
7,1,4,Portuguese,50,48,2,0,0.96,0.04,0.0
8,1,4,Spanish,50,49,0,1,0.98,0.0,0.02
9,2,1,Portuguese,50,11,23,16,0.22,0.46,0.32
10,2,1,Spanish,50,9,41,0,0.18,0.82,0.0


## Grid approximation

Let's start simple by doing a grid approximation, where we divide the continuous 2D space of possible parameter settings up into a discrete space. For example, let's start with all possible combinations of $\theta_{1}$ and $\theta_{2}$ from 0.1 to 0.6, in steps of 0.05.
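
One way such a grid could be constructed is sketched below. The variable names `tau_values` and `tau_grid` are illustrative, and the stop value slightly above 0.6 is an assumption made so that 0.6 itself ends up in the grid (`np.arange` excludes its stop value).

```python
import itertools
import numpy as np

# Values from 0.1 up to and including 0.6 in steps of 0.05.
# Rounding removes floating-point noise such as 0.15000000000000002.
tau_values = np.round(np.arange(0.1, 0.61, 0.05), 2)

# Every (theta_1, theta_2) combination on the grid
tau_grid = list(itertools.product(tau_values, tau_values))

print(len(tau_values), "values per parameter,", len(tau_grid), "combinations in total")
```

Rounding matters if the grid values are later compared for equality against values stored in a file.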

Let's first load in the model predictions of the RSA model for this discrete grid space:

In [5]:
model_predictions = pd.read_csv('model_predictions/HigherSearchD_MW_RSA_tau_start_0.1_tau_stop_0.61_tau_step_0.05.csv')  

model_predictions

Unnamed: 0,Model,Word,Probability,Referent,Speaker_pos,Listener_pos,Listener_att,WordNo,SpeakerTau,ListenerTau
0,distance,este,1.00,0,0,0,0,2,0.1,0.1
1,distance,aquel,0.00,0,0,0,0,2,0.1,0.1
2,distance,este,1.00,0,0,0,0,3,0.1,0.1
3,distance,ese,0.00,0,0,0,0,3,0.1,0.1
4,distance,aquel,0.00,0,0,0,0,3,0.1,0.1
...,...,...,...,...,...,...,...,...,...,...
29035,pdhybrid,este,0.17,3,0,3,0,2,0.6,0.6
29036,pdhybrid,aquel,0.83,3,0,3,0,2,0.6,0.6
29037,pdhybrid,este,0.05,3,0,3,0,3,0.6,0.6
29038,pdhybrid,ese,0.40,3,0,3,0,3,0.6,0.6


For each combination of $\theta_{1}$ (=SpeakerTau) and $\theta_{2}$ (=ListenerTau), this model prediction dataframe provides the probability of producing each of the possible words (_este_ and _aquel_ in the case of the two-system model, _este_, _ese_ and _aquel_ in the case of the three-system model).

For example, to get the probability of producing _este_ under the **distance** model, with $\theta_{1} = 0.1$ and $\theta_{2} = 0.1$, given a 2-word system, when the Referent = 0 and the Listener's position = 0, we'd do the following:


In [6]:
# Combine all conditions in a single query; chaining boolean masks triggers a pandas reindexing warning.
relevant_row = model_predictions.query('Model == "distance" and Word == "este" and Referent == 0 and Listener_pos == 0 and WordNo == 2 and SpeakerTau == 0.1 and ListenerTau == 0.1')

relevant_row

Unnamed: 0,Model,Word,Probability,Referent,Speaker_pos,Listener_pos,Listener_att,WordNo,SpeakerTau,ListenerTau
0,distance,este,1.0,0,0,0,0,2,0.1,0.1


And to now get only the probability value, we'd do:

In [7]:
prob_of_este = relevant_row["Probability"]

prob_of_este

0    1.0
Name: Probability, dtype: float64
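
Note that the result above is still a pandas Series of length 1 rather than a plain number. A small sketch of how the scalar could be extracted, using a hypothetical one-row dataframe `toy_row` that stands in for the filtered model predictions:

```python
import pandas as pd

# Hypothetical one-row frame mimicking the filtered model predictions
toy_row = pd.DataFrame({"Word": ["este"], "Probability": [1.0]})

prob_series = toy_row["Probability"]   # still a Series of length 1
prob_value = prob_series.iloc[0]       # the first (and only) entry as a scalar

print(prob_value)
```

Working with the scalar avoids surprises when the value is later written into a NumPy array or passed to a `scipy.stats` function.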

Now, say we want to calculate the likelihood of the combination SpeakerTau = 0.1 and ListenerTau = 0.1, given the data of a particular language. For each situation in the experiment (i.e. each combination of Referent and Listener position), and for each word (_este_ and _aquel_ in the case of a 2-word language, _este_, _ese_ and _aquel_ in the case of a 3-word language), we have two values:

1. The probability of that word being produced according to the model predictions
2. The proportion of that word actually being used in that situation by the participants in the experiment

<span class="mark">**Q: How do we calculate $f(x_{i} | \theta_{1}, \theta_{2})$?**</span>

Do we simply treat the model's predicted probability as the success parameter of a binomial distribution, and take the pmf of the observed count under that distribution? (Counts are discrete, so the pmf applies rather than a pdf.)
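
To make this question concrete, here is a small sketch comparing the binomial pmf with the multinomial pmf for a 2-word system. The counts and the predicted probability are illustrative values only, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import binom, multinomial

total = 51
este_count, aquel_count = 25, 26   # illustrative observed counts, summing to total
p_este = 0.94                      # illustrative model-predicted probability of "este"

# Binomial: probability of observing este_count uses of "este" out of total trials
binom_pmf = binom.pmf(este_count, n=total, p=p_este)

# Multinomial with two categories: probability of the full count vector
multinom_pmf = multinomial.pmf([este_count, aquel_count], n=total, p=[p_este, 1 - p_este])

print(binom_pmf, multinom_pmf)
```

With only two words the two distributions coincide, because the count of the second word is fully determined by the count of the first. With three words, a single multinomial over all three counts covers the same idea without treating the counts as independent.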


<span class="mark">**Q: Can we treat each word choice independently? As a binomial process where we either choose the word (= 1) or not (= 0)? Regardless of whether we have a 2-word system or a 3-word system?**</span>


<span class="mark">**Q: Are our observations $x_{1}, ..., x_{n}$ independent?**</span>

That is: In the experiment, does each participant just get to see each situation once, and choose a word to produce? Such that the count of _este_ uses in a given situation is not dependent on the count of _aquel_ uses in that same situation?

To get the relevant counts from the data, for English for example, we'd have to do the following:

In [8]:
# Combine all conditions in a single query; chaining boolean masks triggers a pandas reindexing warning.
relevant_counts_row = data_exp_1_two_system.query('Object_Position == 1 and Listener_Position == 1 and Language == "English"')

relevant_counts_row

Unnamed: 0,Object_Position,Listener_Position,Language,Total,este,aquel,Estep,Aquelp
1,1,1,English,51,46,5,0.901961,0.098039


In [9]:
este_count = relevant_counts_row["este"]

este_count

1    46
Name: este, dtype: int64

In [10]:
import numpy as np
from scipy.stats import binom, multinomial

def calc_multinom_pmf(pd_model_predictions, pd_data, model, language, object_pos, listener_pos, speaker_tau, listener_tau):
    """Return (pmf, logpmf) of the observed word counts in one situation, given the model's predicted word probabilities."""
    if language == "English" or language == "Italian":
        WordNo = 2
        words = ["este", "aquel"]
    elif language == "Portuguese" or language == "Spanish":
        WordNo = 3
        words = ["este", "ese", "aquel"]
    else:
        raise ValueError("Unknown language: " + str(language))
    probs_per_word = np.zeros((WordNo))
    counts_per_word = np.zeros((WordNo))
    for i in range(len(words)):
        word = words[i]
        print('')
        print("word is:")
        print(word)
        # Filter the dataframe that was passed in (not the global one), using a single combined mask.
        # np.isclose guards against floating-point noise in the stored tau values.
        prediction_mask = ((pd_model_predictions["Model"] == model) & (pd_model_predictions["Word"] == word)
                           & (pd_model_predictions["Referent"] == object_pos) & (pd_model_predictions["Listener_pos"] == listener_pos)
                           & (pd_model_predictions["WordNo"] == WordNo)
                           & np.isclose(pd_model_predictions["SpeakerTau"], speaker_tau)
                           & np.isclose(pd_model_predictions["ListenerTau"], listener_tau))
        model_prediction_row = pd_model_predictions[prediction_mask]
        print("model_prediction_row is:")
        print(model_prediction_row)
        model_prediction_prob = model_prediction_row["Probability"]
        print("model_prediction_prob is:")
        print(model_prediction_prob)
        probs_per_word[i] = model_prediction_prob.iloc[0]  # extract the scalar from the one-row Series
        # Below is object_pos+1 and listener_pos+1, because in the model_predictions dataframe it starts counting 
        # from 0, but in the experimental data dataframe it starts counting from 1.
        data_mask = ((pd_data["Object_Position"] == object_pos+1) & (pd_data["Listener_Position"] == listener_pos+1)
                     & (pd_data["Language"] == language))
        data_count_row = pd_data[data_mask]
        print('')
        print("data_count_row is:")
        print(data_count_row)
        data_count = data_count_row[word]
        print("data_count is:")
        print(data_count)
        counts_per_word[i] = data_count.iloc[0]
        total = data_count_row["Total"]
        print("total is:")
        print(total)
    n_trials = int(total.iloc[0])  # the total is the same for every word in a given situation
    multinom_pmf = multinomial.pmf(counts_per_word, n=n_trials, p=probs_per_word)
    multinom_logpmf = multinomial.logpmf(counts_per_word, n=n_trials, p=probs_per_word)
    return multinom_pmf, multinom_logpmf


multinom_pmf, multinom_logpmf = calc_multinom_pmf(model_predictions, data_exp_1_two_system, "distance", "English", 1, 0, 0.4, 0.4)
print('')
print('')
print("multinom_pmf is:")
print(multinom_pmf)
print('')
print("multinom_logpmf is:")
print(multinom_logpmf)



word is:
este
model_prediction_row is:
          Model  Word  Probability  Referent  Speaker_pos  Listener_pos  \
17285  distance  este         0.94         1            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17285             0       2         0.4          0.4  
model_prediction_prob is:
17285    0.94
Name: Probability, dtype: float64

data_count_row is:
   Object_Position  Listener_Position Language  Total  este  aquel     Estep  \
9                2                  1  English     51    25     26  0.490196   

     Aquelp  
9  0.509804  
data_count is:
9    25
Name: este, dtype: int64
total is:
9    51
Name: Total, dtype: int64

word is:
aquel
model_prediction_row is:
          Model   Word  Probability  Referent  Speaker_pos  Listener_pos  \
17286  distance  aquel         0.06         1            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17286             0       2         0.4          0.4  
model_prediction_pro



In [11]:
model = "distance"  # can be set to either "distance" or "person"
language = "English"  # can be set to "English", "Italian", "Portuguese" or "Spanish"
object_positions = [0, 1, 2, 3]  # array of all possible object (= referent) positions
listener_positions = [0, 1, 2, 3]  # array of all possible listener positions
speaker_tau = 0.4
listener_tau = 0.4

def product_logpmf_over_situations(object_positions, listener_positions, pd_model_predictions, pd_data, model, language, speaker_tau, listener_tau):
    logproduct = np.log(1.0)  # The first probability should be multiplied with 1.0, which is equivalent to 0.0 in log-space
    for object_pos in object_positions:
        print('')
        print('')
        print("object_pos is:")
        print(object_pos)
        for listener_pos in listener_positions:
            print('')
            print("listener_pos is:")
            print(listener_pos)
            multinom_pmf, multinom_logpmf = calc_multinom_pmf(pd_model_predictions, pd_data, model, language, object_pos, listener_pos, speaker_tau, listener_tau)
            print("multinom_pmf is:")
            print(multinom_pmf)
            print("multinom_logpmf is:")
            print(multinom_logpmf)
            print("np.exp(multinom_logpmf) is:")
            print(np.exp(multinom_logpmf))
            logproduct += multinom_logpmf  # multiplication in probability space is equivalent to addition in log-space
    return logproduct
            
    
logproduct = product_logpmf_over_situations(object_positions, listener_positions, model_predictions, data_exp_1_two_system, model, language, speaker_tau, listener_tau)
print('')
print('')
print("logproduct is:")
print(logproduct)
print("np.exp(logproduct) is:")
print(np.exp(logproduct))



object_pos is:
0

listener_pos is:
0

word is:
este
model_prediction_row is:
          Model  Word  Probability  Referent  Speaker_pos  Listener_pos  \
17280  distance  este          1.0         0            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17280             0       2         0.4          0.4  
model_prediction_prob is:
17280    1.0
Name: Probability, dtype: float64

data_count_row is:
   Object_Position  Listener_Position Language  Total  este  aquel     Estep  \
1                1                  1  English     51    46      5  0.901961   

     Aquelp  
1  0.098039  
data_count is:
1    46
Name: este, dtype: int64
total is:
1    51
Name: Total, dtype: int64

word is:
aquel
model_prediction_row is:
          Model   Word  Probability  Referent  Speaker_pos  Listener_pos  \
17281  distance  aquel          0.0         0            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17281             0       2         0




data_count_row is:
    Object_Position  Listener_Position Language  Total  este  aquel     Estep  \
13                2                  3  English     51    30     21  0.588235   

      Aquelp  
13  0.411765  
data_count is:
13    30
Name: este, dtype: int64
total is:
13    51
Name: Total, dtype: int64

word is:
aquel
model_prediction_row is:
          Model   Word  Probability  Referent  Speaker_pos  Listener_pos  \
17326  distance  aquel         0.06         1            0             2   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17326             0       2         0.4          0.4  
model_prediction_prob is:
17326    0.06
Name: Probability, dtype: float64

data_count_row is:
    Object_Position  Listener_Position Language  Total  este  aquel     Estep  \
13                2                  3  English     51    30     21  0.588235   

      Aquelp  
13  0.411765  
data_count is:
13    21
Name: aquel, dtype: int64
total is:
13    51
Name: Total, dtype: int64
multino




          Model  Word  Probability  Referent  Speaker_pos  Listener_pos  \
17295  distance  este          0.0         3            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17295             0       2         0.4          0.4  
model_prediction_prob is:
17295    0.0
Name: Probability, dtype: float64

data_count_row is:
    Object_Position  Listener_Position Language  Total  este  aquel     Estep  \
25                4                  1  English     51    10     41  0.196078   

      Aquelp  
25  0.803922  
data_count is:
25    10
Name: este, dtype: int64
total is:
25    51
Name: Total, dtype: int64

word is:
aquel
model_prediction_row is:
          Model   Word  Probability  Referent  Speaker_pos  Listener_pos  \
17296  distance  aquel          1.0         3            0             0   

       Listener_att  WordNo  SpeakerTau  ListenerTau  
17296             0       2         0.4          0.4  
model_prediction_prob is:
17296    1.0
Name: Probabili



<span class="mark">**Q: What to do if multinomial pmf = 0.0?**</span>
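
One possible way to handle this, sketched below, is to clip the model's predicted probabilities away from exactly 0 and renormalise them before computing the log-pmf. This is only one option among several, not necessarily the right choice for this analysis: the helper `smoothed_multinom_logpmf` and the value of `epsilon` are assumptions made for illustration, and the choice of `epsilon` will influence the resulting fit.

```python
import numpy as np
from scipy.stats import multinomial

def smoothed_multinom_logpmf(counts, probs, epsilon=1e-3):
    """Multinomial log-pmf after clipping probabilities to at least epsilon and renormalising."""
    probs = np.clip(np.asarray(probs, dtype=float), epsilon, None)
    probs = probs / probs.sum()  # make the clipped probabilities sum to 1 again
    return multinomial.logpmf(counts, n=int(np.sum(counts)), p=probs)

# Illustrative case: the model assigns probability 0 to a word that was observed 5 times
counts = [46, 5]
predicted_probs = [1.0, 0.0]

raw_logpmf = multinomial.logpmf(counts, n=51, p=predicted_probs)
smoothed_logpmf = smoothed_multinom_logpmf(counts, predicted_probs)

print(raw_logpmf, smoothed_logpmf)
```

Without smoothing, a single situation in which the model assigns probability 0 to an observed word rules out that parameter setting entirely, however well it fits the other situations.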

## Monte Carlo

## Model Comparison