# Main goal: 
Fit a model for multisensory localization as a weighted average of visual and auditory information in a first round of experiments, then test it in a second round of experiment. The underlying assumption is that the mulitmodel localization can be modelled as some weighted average of the purely auditory/visual inputs via:

$$ L^* = w_v L^*_v +  w_a L^*_a  $$

where 
$w_v, w_a$  are the weights for the auditory/visual input, 
$L^*_w, L^*_a$ are the location estimates based on the purely visual/auditory input and 
$L^*$ is the location estimate based on the combined input ('dependant variable'? something that we want to find out here - even though the weights are the final goal)

## First round of experiments: purely visual & purely auditory
__Goals__:
- estimate weights $w_a, w_v$ for the auditory and visual information in the model of the multisensory setup
- find location estimates $L_a^*, L_v^*$ based on the purely auditory/visual inputs 

__Method__:
This will be achieved via  fitting a cumulative normal distribution to fit the datapoints we see - using the method of *Maximum Likelihood Estimation* (MLE). Recall that a normal distribution is defined by it's variance $\sigma^2$ and it's mean $\mu$. We thus want to estimate 

$$ \widehat{\sigma}^2, \widehat{\mu} = argmax_{\mu, \sigma} \prod_{t=1}^{T} p_t^{r_t} (1-p_t)^{1-r_t}
= argmax_{\mu, \sigma} \prod_{t=1}^{T} (\frac{1}{2}(1+erf(\frac{x-\mu}{\sqrt(2\sigma^2)})) )^{r_t} (1-(1+erf(\frac{x-\mu}{\sqrt(2\sigma^2)})))^{1-r_t}
$$

*TODO: annoying work of explaining all the variables and why a Bernoulli probability is the right choice*

Battaglia et al. suggested the following modification to the "classical MLE" model: transform it into a Bayesian model by adding priors $p(\sigma)$ and $p(\mu)$. The reason is the (evolutionary?) bias towards visual input as a more reliable source that the MLE model doesn't capture. In fact, some authors claim that this visual dominance is even so strong that the visual sensory input will completely dominate the auditory one (*visual capture theroy*). We thus refine our model into 

$$ \widehat{\sigma}^2, \widehat{\mu} = argmax_{\mu, \sigma} \prod_{t=1}^{T} (\frac{1}{2}(1-erf(\frac{x-\mu}{\sqrt(2\sigma^2)})) )^{r_t} (1-(1-erf(\frac{x-\mu}{\sqrt(2\sigma^2)})))^{1-r_t} \cdot p_{\sigma^2}(\sigma^2) \cdot p_{\mu}(\mu)$$

The prior for $\mu$ is just a uniform distribution. More interestingly, the prior for $\sigma^2$ is an *inverse gamma distribution* that we model in a way that it favors small variances (corresponding to __reliable__ sensory input) 
[*TODO: explain why and how this is, write out mathzzz*].

needed python functionalities:
    - argmax / maximum likelihood
    - inverse gamma distribution
    - uniform distribution
    
possibly useful:
    - stats.bernoulli

In [3]:
from scipy import stats, optimize
from math import erf, sqrt, exp

In [4]:
# r should be a list of {0,1} with the outcomes
# phi a cumulative normal distribution 

def phi(x, mu, sigma_sq):
    #'Cumulative distribution function for the standard normal distribution'
    # -> are we sure the x don't refer to the visual angles instead? YES I THINK SO
    return (1.0 + erf(x - mu / sqrt(2.0*sigma_sq))) / 2.0

In [5]:
def round1_likelihood(r, mu, sigma_sq):
# likelihood(R|mu, sigma) = 
    likelihood = 1
    # model the product
    for t in range(len(r)):
        likelihood = likelihood * stats.bernoulli.pmf(r[1][t], phi(r[0][t], mu, sigma_sq), loc=0) 
        # TODO: *p(mu)*p(sigma_sq) - i.e. go from MLE to Bayesian approach
    return likelihood

In [6]:
# a mockup example for now. I THINK THERE IS AN ERROR IN THEIR MODEL DESCRIPTION! 
# you need to take the positional information into account, too! 
# so r consists of two pieces of information: degree and angle
degrees = [-4.5, -3. -1.5, 0 , 1.5, 3, 4.5]
answers = [0, 0, 0, 0, 1, 1, 1]
r = (degrees, answers)

In [7]:
def round1_argmin(likelihood):
    result = optimize.minimize(lambda x: 1-likelihood(r, *x), (0,2))['x']
    mu = result[0]
    sigma2 = result[1]
    return mu, sigma2

In [11]:
mu, sigma2 = round1_argmin(round1_likelihood)
print(mu, sigma2)
# YAY!

0.0 2.0


In [12]:
# get the weights from the sigmas we calculated
def get_weight_from_variance(variance, other_variance):
    return (1/variance)/((1/variance + 1/other_variance))

## Second round of experiments:
Compare the weights obtained in the second experiment (and thus the model for multisensory integration) and compare it with empirical results. The empirical weights are again found via a maximum likelihood estimation similar to the monosensory trials, but with a modified probability $p_t$:
\begin{align}
\widehat{w}_a, \widehat{w}_v &= argmax_{w_a, w_v} \prod_{t=1}^{T} p_t^{r_t} (1-p_t)^{1-r_t} \\
&= argmax_{w_a, w_v} \prod_{t=1}^{T} (\frac{1}{1 + exp[-(L_c - L_s)/\tau]})^{r_t} (1-(\frac{1}{1 + exp[-(L_c - L_s)/\tau])}^{1-r_t}) \\
&= argmax_{w_a, w_v} \prod_{t=1}^{T} (\frac{1}{1 + exp[-(w_vL_v^c + w_aL_a^c - (w_vL_v^s + w_aL_a^s))/\tau]})^{r_t} (1-(\frac{1}{1 + exp[-(w_vL_v^c + w_aL_a^c - (w_vL_v^s + w_aL_a^s)))/\tau])}^{1-r_t})
\end{align}

*NOTE: I am still ab bit confused about their explanation with 'location estimates' here. I think we probably just have to use the actual locations because we don't really have any location estimates other than the mean of the two distributions fitted in the first round which should just be very close to zero. Is this just a mistake in the paper?
Related question: the visual and audio location is the same in the comparison stimulus then, right? Then L_c collapses  (since the weights sum to 1) Also it makes no sense to optimize for both of the weights since they are constrained to sum to one.*
With these assumptions the equation would simplify to 

$$
argmax_{w_a, w_v} \prod_{t=1}^{T} (\frac{1}{1 + exp[-(L_c - (w_vL_v^s + (1-w_v)L_a^s))/\tau]})^{r_t} (1-(\frac{1}{1 + exp[-(L_c - (w_vL_v^s + (1-w_v)L_a^s)))/\tau])}^{1-r_t}).
$$

In [13]:
#def round2_likelihood(r, w_a, w_v, tau):
# rewrite w_a = 1 - w_v
def round2_likelihood(r, w_v, tau):
    # remember r is a tuple of (location, answer)
    L = r[0]
    answers = r[1]
    # constants
    l_s_a = 1.5
    l_s_v = -1.5
    likelihood = 1
    # model the product
    for t in range(len(L)):
        #likelihood = (likelihood * 1 / (1 + exp(- (L[t] - w_v*l_s_v + w_a*l_s_a))/tau)**answers[t] 
        #                        * ( 1 - 1 / (1 + exp(- (L[t] - w_v*l_s_v + w_a*l_s_a))/tau))**(1 - answers[t]))
        # rewrite w_a = 1 - w_v
        likelihood = likelihood * (1 / (1 + exp(- (L[t] - (w_v*l_s_v + (1-w_v)*l_s_a))/tau))**answers[t] 
                                * ( 1 - (1 / (1 + exp(- (L[t] - w_v*l_s_v + (1-w_v)*l_s_a))/tau)))**(1 - answers[t]))
    return likelihood

In [59]:
def round2_argmin(likelihood):
    result = optimize.minimize(lambda x: 1-likelihood(r, *x), 
                               [0.2, 10], method = 'SLSQP', bounds = [(0.5, 1),(None, None)], tol =0.0000000000000001)['x']
    w_a = result[0]
    tau = result[1]
    return w_a, tau

In [62]:
# new mockup which is skewed towards vision
degrees = [-4.5, -3. -1.5, 0 , 1.5, 3, 4.5]
answers = [0, 0, 0, 0, 1, 1, 1]
r = (degrees, answers)

In [63]:
round2_argmin(round2_likelihood)
# STRANGE: the value seems to not go under 0.5 (which it won't in the experiment anyways but mathematically that 
# should be possible for sure!)

(0.50000000000117861, -20.085381579078952)