# Different strategies for dealing with noise



The idea here is that natural communication systems attest several different strategies for dealing with noise:

* **Reduplication**: simply repeat the signal several times. This is compatible with a compositional system. It is not costly in terms of learnability, because the only additional thing that needs to be learned is a single rule that applies to all signals. However, it is relatively costly in terms of utterance length, so it would not do well under a pressure for minimal effort.
* **Diversify signal**: make the individual segments that each signal consists of as distinct as possible. For example, in the language shown below, all four signals can still be distinguished from each other whenever a single character is obscured by noise. However, this strategy is not compatible with compositionality, because it relies on making each of the segments as distinct from each other as possible. That means these languages are necessarily holistic, and therefore harder to learn (so they would do less well under a pressure for learnability).
    - 02 --> 'aaaa'
    - 03 --> 'bbbb'
    - 12 --> 'abba'
    - 13 --> 'baab'
* **Repair**: This strategy could be seen as a form of redundancy across turns, instead of within a signal. However, repair is only initiated when necessary, so it should fare slightly better than the reduplication strategy under a pressure for minimal effort (where effort is measured as the combined utterance length of both interlocutors across a set number of interactions).

The predictions of Vinicius & Seán (2016 EvoLang abstract titled "Language adapts to signal disruption in interaction") are that although the reduplication strategy and the repair strategy should do equally well under a pressure for learnability, adding the possibility of repair will 'lift the pressure for redundancy', such that receivers can request that speakers repeat a signal only after a problem occurs.
---> However, we would add that in the absence of a pressure for minimal effort, the repair strategy has no advantage over the reduplication strategy.
    
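Whether a language survives this kind of noise can be checked mechanically: with any one character obscured, every signal must still match exactly one signal in the language. A minimal Python sketch, assuming noise replaces exactly one character with a placeholder `?` (the function names and the `?` symbol are hypothetical, not taken from an existing model):

```python
def noisy_variants(form, noise="?"):
    """All versions of `form` with exactly one character replaced by `noise`."""
    return [form[:i] + noise + form[i + 1:] for i in range(len(form))]


def matches(observed, form, noise="?"):
    """True if `observed` could be a noisy copy of `form` (`noise` acts as a wildcard)."""
    return len(observed) == len(form) and all(
        o == noise or o == f for o, f in zip(observed, form)
    )


def survives_single_erasure(language, noise="?"):
    """True if every one-character-obscured signal still identifies a unique signal."""
    forms = list(language.values())
    for form in forms:
        for observed in noisy_variants(form, noise):
            candidates = [f for f in forms if matches(observed, f, noise)]
            if len(candidates) != 1:
                return False
    return True


diversified = {"02": "aaaa", "03": "bbbb", "12": "abba", "13": "baab"}
compositional = {"02": "aa", "03": "ab", "12": "ba", "13": "bb"}
print(survives_single_erasure(diversified))    # True
print(survives_single_erasure(compositional))  # False: 'a?' fits both 'aa' and 'ab'
```

The reduplicated compositional language {02: aaaa, 03: abab, 12: baba, 13: bbbb} passes the same check, because doubling a compositional signal guarantees that any two signals differ in at least two positions; the plain compositional language fails, which is why it would have to rely on repair instead.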


## Predictions under different selection pressures

We make our predictions under the following assumptions:

- There is a pressure for expressivity/mutual understanding (or rather, a pressure to get one's signal/message across, which seems a better way to describe the pressure that frogs and songbirds are under)
- Noise regularly disrupts part of the signal (Vinicius & Seán used a probability of 0.5 in their experiment)
- Repair is possible

Under these assumptions, we predict that the following strategies will become dominant, depending on the presence or absence of a pressure for learnability and of a pressure for minimal effort:

|                | - minimal effort                                              | + minimal effort          |
|----------------|---------------------------------------------------------------|---------------------------|
| - learnability | Any of the three strategies above will do                                         | Repair + Compositional OR Holistic         |
| + learnability | Reduplication + Compositional OR Repair + Compositional | Repair + Compositional |

<span class="mark">Note</span> that the prediction for the {-learnability, +minimal effort} condition above only holds if we do not distinguish between open and closed repair requests. If we do, as in the model we submitted to EvoLang, we would expect the Repair + compositional strategy to fare best in this condition, even without a pressure for learnability.

## How to represent languages?

### Possibility 1: Different form lengths

If we continue with Kirby et al.'s (2015) way of representing meanings and forms (a minimal way of creating languages that can be classified as compositional, holistic or degenerate), where meanings consist of $f=2$ features that can each take $v=2$ values, we can allow for each of the strategies specified above ('reduplication' and 'diversify signal') simply by allowing multiple string lengths $l$, while keeping the alphabet size $|\Sigma|$ at 2.

For example, where Kirby et al. (2015) allowed only a single string length and specified $f = v = l = |\Sigma| = 2$, we could minimally allow two possible string lengths: $l = f$ (the minimum string length required to uniquely specify each meaning feature) and $l = 2f$, to enable reduplication of the signal.

That would yield the following types of languages:


**Reduplication + compositional:**

02 --> aaaa

03 --> abab

12 --> baba

13 --> bbbb


**Diversify signal + holistic:**

02 --> aaaa

03 --> bbbb

12 --> abba

13 --> baab


**Repair + compositional:**

02 --> aa

03 --> ab

12 --> ba

13 --> bb

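The two compositional languages above differ only in whether a single reduplication rule is applied. A minimal Python sketch of that idea, assuming meanings are written as strings of feature values (as in `02`) and that each feature value maps to a segment via a hypothetical lookup table `SEGMENTS`:

```python
from itertools import product

# Hypothetical per-feature segment tables: feature 1 takes values 0/1, feature 2 takes 2/3.
SEGMENTS = [{"0": "a", "1": "b"}, {"2": "a", "3": "b"}]


def compositional_language(segments, reduplicate=False):
    """Map each meaning to the concatenation of its feature segments.

    If `reduplicate` is True, the whole signal is repeated once: the single extra
    rule that the reduplication strategy adds on top of a compositional system.
    """
    language = {}
    for values in product(*(sorted(table) for table in segments)):
        meaning = "".join(values)
        form = "".join(table[v] for table, v in zip(segments, values))
        language[meaning] = form * 2 if reduplicate else form
    return language


print(compositional_language(SEGMENTS))
# {'02': 'aa', '03': 'ab', '12': 'ba', '13': 'bb'}
print(compositional_language(SEGMENTS, reduplicate=True))
# {'02': 'aaaa', '03': 'abab', '12': 'baba', '13': 'bbbb'}
```

By contrast, no choice of segment tables reproduces the 'diversify signal' language, because its middle two characters depend on the combination of both feature values rather than on either feature alone.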

To still make it possible for iterated learning chains to transition from a language that uses forms of length 2 to a language that uses forms of length 4 and vice versa, we then also need to allow for languages that use a mixture of form lengths (e.g. three forms of length 2 and one form of length 4). This yields the following number of possible languages:

$$ (2^2+2^4)^4 = 160000$$

which means that, compared to the Kirby et al. (2015) model (which had $(2^2)^4 = 256$ possible languages), the hypothesis space expands by a factor of 625. That is not ideal: if we assume that simulation run times increase linearly with the size of the hypothesis space, a simulation that took 1 hour to run in our previous model would now take 625 hours, i.e. almost 4 weeks.

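The arithmetic above can be checked by enumerating the forms directly. A short Python sketch (variable names are illustrative; the 1-hour baseline is the example figure from the text, and linear scaling is the same assumption made above):

```python
from itertools import product

ALPHABET = "ab"     # |Σ| = 2
N_MEANINGS = 4      # f = 2 features with v = 2 values each gives 2 ** 2 meanings
LENGTHS = (2, 4)    # l = f, or l = 2f to allow reduplication

# Every string over the alphabet with one of the allowed lengths.
forms = ["".join(chars) for length in LENGTHS for chars in product(ALPHABET, repeat=length)]

# A language assigns one form to each meaning independently.
n_languages = len(forms) ** N_MEANINGS
n_kirby = (len(ALPHABET) ** 2) ** N_MEANINGS   # single length l = 2
factor = n_languages // n_kirby

# Assuming run time scales linearly with the hypothesis space, a 1-hour run takes `factor` hours.
weeks = factor / (24 * 7)

print(len(forms), n_languages, n_kirby, factor, round(weeks, 2))
# 20 160000 256 625 3.72
```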
However, this linear relationship between simulation run time and the size of the hypothesis space only holds if, during learning, we actually loop through every hypothesis and update its posterior probability based on the data. There are a couple of ways in which this process can be optimised:

1. **memoisation**: This would require enumerating all possible data points (i.e. <meaning, form> pairs, including all possible noisy forms), calculating the likelihood of each of them under every possible hypothesis **once**, and caching the result. Whenever any learner then encounters the same <meaning, form> pair, the corresponding likelihood vector is simply retrieved from memory and multiplied with the learner's current posterior. This should be doable, given that the total number of meanings is 4 and the total number of forms (including all possible noisy variants, assuming that noise is restricted to a single character) is 56: the 20 clean forms, plus 4 noisy variants of the length-2 forms and 32 of the length-4 forms. That makes 4\*56 = 224 possible <meaning, form> pairs. For each of those 224 possible data points, we would then calculate its likelihood under all 160,000 hypotheses and cache these values in a 224\*160,000 matrix (i.e. 35,840,000 entries).
2. Intergenerational learning could be sped up by representing data as simple counts of <meaning, form> pairs and updating the posterior probability distribution for the full data set in one step: each hypothesis's prior is multiplied by the likelihood of each <meaning, form> pair raised to the power of the number of times that pair occurs in the data set. This should speed things up a little in intergenerational learning, but won't make a difference in intragenerational learning, because there we assume that the hearer updates their posterior after each interaction.
3. Do not do exact inference over the hypothesis space at all, but instead use Gibbs sampling (as used by Burkett & Griffiths, 2010, and Kirby et al., 2015) --> This would require a bit more time to figure out, and is hopefully not necessary once optimisations 1 and 2 above have been implemented.

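Optimisations 1 and 2 can be combined: cache one likelihood vector per <meaning, form> pair, then update from counts by adding `count * log-likelihood` (the log-space equivalent of raising the likelihood to the power of the count). A minimal Python sketch; the `?` noise symbol, the `eps` error rate and the consistent/inconsistent likelihood are hypothetical placeholders, not the model's actual likelihood, and the demo uses a reduced space of length-2 languages so that it runs quickly:

```python
import math
from collections import Counter
from itertools import product

NOISE = "?"                       # hypothetical placeholder for an obscured character
ALPHABET = "ab"
MEANINGS = ("02", "03", "12", "13")
LENGTHS = (2, 4)


def all_observable_forms(alphabet=ALPHABET, lengths=LENGTHS, noise=NOISE):
    """Clean forms plus every variant with exactly one character obscured."""
    observable = set()
    for length in lengths:
        for chars in product(alphabet, repeat=length):
            form = "".join(chars)
            observable.add(form)
            for i in range(length):
                observable.add(form[:i] + noise + form[i + 1:])
    return sorted(observable)


def matches(observed, form, noise=NOISE):
    """True if `observed` could be a noisy copy of `form`."""
    return len(observed) == len(form) and all(
        o == noise or o == f for o, f in zip(observed, form)
    )


def build_cache(hypotheses, eps=0.05):
    """Compute log P(form | meaning, h) for every <meaning, form> pair, once.

    Placeholder likelihood (an assumption): 1 - eps if the observed form fits
    h's form for that meaning, eps otherwise.
    """
    cache = {}
    for m_index, meaning in enumerate(MEANINGS):
        for observed in all_observable_forms():
            cache[(meaning, observed)] = [
                math.log(1 - eps) if matches(observed, h[m_index]) else math.log(eps)
                for h in hypotheses
            ]
    return cache


def update(log_posterior, data, cache):
    """One-step update from a Counter of <meaning, form> pairs.

    Multiplying by likelihood ** count is done as adding count * log-likelihood.
    """
    new = list(log_posterior)
    for pair, count in data.items():
        for i, log_lik in enumerate(cache[pair]):
            new[i] += count * log_lik
    top = max(new)  # normalise in log space to avoid underflow
    log_z = top + math.log(sum(math.exp(x - top) for x in new))
    return [x - log_z for x in new]


# Demo on a reduced space (length-2 languages only, 256 hypotheses), uniform prior.
length2_forms = ["".join(c) for c in product(ALPHABET, repeat=2)]
hypotheses = list(product(length2_forms, repeat=len(MEANINGS)))
cache = build_cache(hypotheses)
prior = [-math.log(len(hypotheses))] * len(hypotheses)
data = Counter({("02", "aa"): 3, ("03", "ab"): 2, ("12", "?a"): 1,
                ("12", "b?"): 1, ("13", "bb"): 2})
posterior = update(prior, data, cache)
best = hypotheses[max(range(len(hypotheses)), key=posterior.__getitem__)]
print(len(all_observable_forms()), len(cache), best)
# 56 224 ('aa', 'ab', 'ba', 'bb')
```

For intragenerational learning, the same `update` would simply be called with a `Counter` holding the single pair observed in each interaction; for the full 160,000-hypothesis space the cache vectors would be correspondingly longer, which is where an array library would become worthwhile.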
### Possibility 2: Allow reduplication as grammatical rule, and increase alphabet size

If, instead of allowing for multiple form lengths, we increase the size of the alphabet $|\Sigma|$ from 2 to 4, the diversify signal strategy becomes possible. More concretely, instead of the alphabet $[a, b]$ there would be the alphabet $[a, b, c, d]$.
That would allow for the following example languages, where the bit at the end of the signal specifies whether the signal should be repeated (1) or not (0).


**Reduplication + compositional:**

02 --> aa1

03 --> ab1

12 --> ba1

13 --> bb1


**Diversify signal + holistic:**

02 --> aa0

03 --> bb0

12 --> cc0

13 --> dd0


**Repair + compositional:**

02 --> aa0

03 --> ab0

12 --> ba0

13 --> bb0


Choosing this option would mean that instead of there being $(2^2 + 2^4) = 20$ possible forms, there would be $4^2 = 16$ possible forms, and therefore $(4^2)^4 = 65536$ possible languages. In addition, however, languages would need an extra bit that specifies whether signals are reduplicated or not (assuming there are only two options: reduplication ON versus reduplication OFF). That means there would be a total of $(4^2)^4 \cdot 2 = 131072$ possible languages. So compared to possibility 1, possibility 2 only reduces the size of the hypothesis space by a factor of about 1.22 (160000/131072), which is perhaps not worth it, given that the language representations themselves would become a bit more complex, because they now contain a separate bit specifying whether signals should be reduplicated or not. In addition, this way of representing languages would make the 'reduplication' and 'diversify signal' strategies harder to distinguish, as can be seen in the example languages above.

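The possibility-2 representation can be sketched as a table of length-2 forms plus one language-wide reduplication bit, following the count above, which treats the bit as a property of the whole language rather than of each signal. A minimal Python sketch; `utter` and the `(forms, reduplicate)` pair are hypothetical names:

```python
ALPHABET = "abcd"           # |Σ| = 4
MEANINGS = ("02", "03", "12", "13")


def utter(language, meaning):
    """Produce the signal for `meaning` under a possibility-2 language.

    A language is a hypothetical (forms, reduplicate) pair: one length-2 form per
    meaning plus a single bit that switches reduplication on or off for all signals.
    """
    forms, reduplicate = language
    form = forms[meaning]
    return form * 2 if reduplicate else form


reduplication_compositional = ({"02": "aa", "03": "ab", "12": "ba", "13": "bb"}, True)
diversified_holistic = ({"02": "aa", "03": "bb", "12": "cc", "13": "dd"}, False)
print([utter(reduplication_compositional, m) for m in MEANINGS])
# ['aaaa', 'abab', 'baba', 'bbbb']

# Hypothesis space: (|Σ| ** 2) ** |meanings| form assignments, times 2 bit settings.
n_forms = len(ALPHABET) ** 2
n_languages = n_forms ** len(MEANINGS) * 2
print(n_forms, n_languages, round(160000 / n_languages, 2))
# 16 131072 1.22
```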
If we find that neither possibility 1 nor possibility 2 makes it feasible to run simulations within a reasonable time frame, we could consider tackling the different strategies for dealing with noise separately: one model in which we allow for adding reduplication to signals vs. repair, and another model in which we allow for diversification of signal segments.

# References

Burkett, D., & Griffiths, T. L. (2010). Iterated learning of multiple languages from multiple teachers. The Evolution of Language: Proceedings of the 8th International Conference (EVOLANG8), Utrecht, Netherlands, 14-17 April 2010, 58–65.

Kirby, S., Tamariz, M., Cornish, H., & Smith, K. (2015). Compression and communication in the cultural evolution of linguistic structure. Cognition, 141, 87–102. https://doi.org/10.1016/j.cognition.2015.03.016