# Simulating Language, Lab 9, Gene-culture co-evolution

We're going to use the same code as the last lab to do something similar to Smith & Kirby (2008) and discover what types of prior and learning strategy combinations are evolutionarily stable. You may be surprised to find that we really don't need much more than the code we already have to do this!

## Code from Lab 8

Here's the code from Lab 8, with no changes.

In [1]:
import random
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf')

from math import log, log1p, exp
from scipy.special import logsumexp

from numpy import mean # This is a handy function that calculates the average of a list

### Parameters for language

In [2]:
variables = 2           # The number of different variables in the language
variants = 2            # The number of different variants each variable can take

### Log probability functions

In [4]:
def log_subtract(x,y):
    return x + log1p(-exp(y - x))

def normalize_logprobs(logprobs):
    logtotal = logsumexp(logprobs) #calculates the summed log probabilities
    normedlogs = []
    for logp in logprobs:
        normedlogs.append(logp - logtotal) #normalise - subtracting in the log domain
                                        #equivalent to dividing in the normal domain
    return normedlogs
 
def log_roulette_wheel(normedlogs):
    r = log(random.random()) #generate a random number in [0,1), then convert to log
    accumulator = normedlogs[0]
    for i in range(len(normedlogs)):
        if r < accumulator:
            return i
        accumulator = logsumexp([accumulator, normedlogs[i + 1]])

def wta(probs):
    maxprob = max(probs) # Find the maximum probability (works if these are logs or not)
    candidates = []
    for i in range(len(probs)):
        if probs[i] == maxprob:
            candidates.append(i) # Make a list of all the indices with that maximum probability
    return random.choice(candidates)

### Production of data

In [5]:
def produce(language, log_error_probability):
    variable = random.randrange(len(language)) # Pick a variable to produce
    correct_variant = language[variable]
    if log(random.random()) > log_error_probability:
        return variable, correct_variant # Return the variable, variant pair
    else:
        possible_error_variants = list(range(variants))
        possible_error_variants.remove(correct_variant)
        error_variant = random.choice(possible_error_variants)
        return variable, error_variant

### Function to check if language is regular

In [6]:
def regular(language):
    first_variant = language[0]
    for variant in language:
        if variant != first_variant:
            return False # The language can only be regular if every variant is the same as the first
    return True

### Prior

In [7]:
def logprior(language, log_bias):
    if regular(language):
        number_of_regular_languages = variants
        return log_bias - log(number_of_regular_languages) #subtracting logs = dividing
    else:
        number_of_irregular_languages = variants ** variables - variants # the double star here means raise to the power
                                                                         # e.g. 4 ** 2 is four squared
        return log_subtract(0, log_bias) - log(number_of_irregular_languages)
        # log(1) is 0, so log_subtract(0, log_bias) is equivalent to (1 - bias) in the
        # non-log domain
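
As a quick sanity check on the prior, here is a minimal sketch that redoes the same arithmetic outside the log domain. It assumes the parameter values above (2 variables, 2 variants) and an example bias of 0.6. The names `p_regular` and `p_irregular` are just for illustration and are not part of the Lab 8 code.

```python
from math import log, exp, isclose

variables = 2
variants = 2
bias = 0.6  # example value: total prior probability shared by the regular languages

n_regular = variants                            # one regular language per variant
n_irregular = variants ** variables - variants  # every other language is irregular

p_regular = bias / n_regular            # prior for each regular language
p_irregular = (1 - bias) / n_irregular  # prior for each irregular language

# The same value for a regular language, computed in the log domain as logprior does
log_p_regular = log(bias) - log(n_regular)

total = n_regular * p_regular + n_irregular * p_irregular
print(p_regular, p_irregular, total)
print(isclose(exp(log_p_regular), p_regular))
```

The priors over all four languages should add up to 1, which is a handy thing to check whenever you change `variables` or `variants`.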

### Likelihood

In [8]:
def loglikelihood(data, language, log_error_probability):
    loglikelihoods = []
    logp_correct = log_subtract(0, log_error_probability) #probability of producing correct form
    logp_incorrect = log_error_probability - log(variants - 1) #logprob of each incorrect variant
    for utterance in data:
        variable = utterance[0]
        variant = utterance[1]
        if variant == language[variable]:
            loglikelihoods.append(logp_correct)
        else:
            loglikelihoods.append(logp_incorrect)
    return sum(loglikelihoods) #summing log likelihoods = multiplying likelihoods
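
To see that summing log likelihoods really is the same as multiplying likelihoods, here is a small worked example. It is only a sketch: the data are made up for illustration (five utterances, one of which does not match the language), and the error probability of 0.05 is just an example value.

```python
from math import log, exp, isclose

variants = 2
error_probability = 0.05
language = [0, 0]
# Made-up data: five (variable, variant) pairs, one of which mismatches the language
data = [(0, 0), (1, 0), (0, 0), (1, 1), (0, 0)]

p_correct = 1 - error_probability
p_incorrect = error_probability / (variants - 1)

# Add up log probabilities, one utterance at a time
log_likelihood = 0.0
for variable, variant in data:
    if variant == language[variable]:
        log_likelihood += log(p_correct)
    else:
        log_likelihood += log(p_incorrect)

# Four matching utterances and one mismatch, multiplied directly
direct_product = p_correct ** 4 * p_incorrect
print(exp(log_likelihood), direct_product)
```

The two printed numbers should agree up to floating point rounding.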

### Learning

In [10]:
def all_languages(variables, variants):
    if variables == 0:
        return [[]] # The list of all languages with zero variables is just one language, and that's empty
    else:
        result = [] # If we are looking for a list of languages with more than zero variables, 
                    # then we'll need to build a list
        smaller_langs = all_languages(variables - 1, variants) # Let's first find all the languages with one 
                                                               # fewer variables
        for language in smaller_langs: # For each of these smaller languages, we're going to have to create a more
                                       # complex language by adding each of the possible variants
            for variant in range(variants):
                result.append(language + [variant])
        return result

def learn(data, log_bias, log_error_probability, learning_type):
    list_of_all_languages = all_languages(variables, variants) # uses the parameters we set above
    list_of_posteriors = []
    for language in list_of_all_languages:
        this_language_posterior = loglikelihood(data, language, log_error_probability) + logprior(language, log_bias)
        list_of_posteriors.append(this_language_posterior)
    if learning_type == 'map':
        map_language_index = wta(list_of_posteriors) # For MAP learning, we pick the best language
        map_language = list_of_all_languages[map_language_index]
        return map_language
    if learning_type == 'sample':
        normalized_posteriors = normalize_logprobs(list_of_posteriors)
        sampled_language_index = log_roulette_wheel(normalized_posteriors) # For sampling, we use the roulette wheel
        sampled_language = list_of_all_languages[sampled_language_index]
        return sampled_language

### Iterated learning

In [11]:
def iterate(generations, bottleneck, log_bias, log_error_probability, learning_type):
    language = random.choice(all_languages(variables, variants))
    if regular(language):
        accumulator = [1]
    else:
        accumulator = [0]
    language_accumulator = [language]
    for generation in range(generations):
        data = []
        for i in range(bottleneck):
            data.append(produce(language, log_error_probability))
        language = learn(data, log_bias, log_error_probability, learning_type)
        if regular(language):
            accumulator.append(1)
        else:
            accumulator.append(0)
        language_accumulator.append(language)
    return accumulator, language_accumulator

## New code

Imagine we have a population of individuals who share a cognitive bias and a learning strategy (i.e., sampling or MAP) that they are born with. In other words, it is encoded in their genes. These individuals transmit their linguistic behaviour culturally through iterated learning, eventually leading to a particular distribution over languages emerging. We can find that distribution for a particular combination of prior bias and learning strategy by running a long iterated learning chain, just like we were doing in the last lab.

Now, imagine that there is some genetic mutation in this population and we have an individual who has a different prior and/or learning strategy. We can ask the question: will this mutation have an evolutionary advantage? In other words, will it spread through the population, or will it die out?

To answer this question, we first need to think about what it means to have a survival advantage. One obvious answer is that you might have a survival advantage if you are able to learn the language of the population well. Presumably, if you learn the language of the population poorly you won't be able to communicate as well and will be at a disadvantage.

The function `learning_success` allows us to estimate how well a particular type of learner will do when attempting to learn any one of a set of languages we input. The function takes the usual parameters you might expect: the bottleneck, the bias, the error probability, and the type of learner (`sample` or `map`). However, it also takes a list of different languages, and a number of test trials. Each test trial involves:

1. picking at random one of the languages in the list,
2. producing a number of utterances from that language (using the `bottleneck` parameter),
3. learning a new language from that list of utterances,
4. checking whether the new language is identical to the one we originally picked (in which case we count this as a learning success).

At the end it gives us the proportion of trials which were successful.

In [41]:
def learning_success(bottleneck, log_bias, log_error_probability, learning_type, languages, trials):
    success = 0
    for trial in range(trials): # named differently from the inner loop variable below
        input_language = random.choice(languages)
        data = []
        for i in range(bottleneck):
            data.append(produce(input_language, log_error_probability))
        output_language = learn(data, log_bias, log_error_probability, learning_type)
        if output_language == input_language:
            success = success + 1
    return success / trials

We can use this function in combination with the `iterate` function to see how well a particular type of learner will learn languages that emerge from cultural evolution. For example, try the following:

```
languages = iterate(100000, 5, log(0.6), log(0.05), 'map')[1]
print(learning_success(5, log(0.6), log(0.05), 'map', languages, 100000))
```

This will run an iterated learning simulation for 100,000 generations with a MAP learner and a bias of 0.6. Then it will test how well the same kind of learner learns the languages that emerge from that simulation. To get an accurate result, it runs the learning test for 100,000 trials. These two numbers (the generations and the test trials) don't need to be the same, but should ideally be quite large so that we can get accurate estimates. You can try running them with lower numbers a bunch of times and see how variable the results are to get a rough and ready idea of how accurate the samples are.
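
One way to get that rough and ready idea is to repeat a cheap estimate several times and look at how much the results spread out. The sketch below is only an illustration: `spread_of_estimates` and `toy_estimate` are hypothetical names, and `toy_estimate` is a stand-in for `learning_success` that flips a biased coin with an arbitrary success probability of 0.7, so the numbers it prints are not results from the lab.

```python
import random
from statistics import mean, stdev

def spread_of_estimates(estimate, repeats):
    """Call estimate() `repeats` times (at least 2); return (mean, standard deviation)."""
    results = [estimate() for _ in range(repeats)]
    return mean(results), stdev(results)

def toy_estimate(trials, p=0.7):
    """Stand-in for learning_success: the proportion of `trials` coin flips that succeed.
    The success probability p is an arbitrary placeholder, not a result from the lab."""
    successes = sum(1 for _ in range(trials) if random.random() < p)
    return successes / trials

# Compare how variable the estimate is with few trials and with many trials
for trials in [100, 10000]:
    average, spread = spread_of_estimates(lambda: toy_estimate(trials), 20)
    print(trials, round(average, 3), round(spread, 4))
```

In the notebook you could pass something like `lambda: learning_success(5, log(0.6), log(0.05), 'map', languages, 1000)` as the `estimate` argument instead of the toy function. A smaller standard deviation means you can trust a single estimate more.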

OK, but how does this help us tell what kind of biases and learning strategies will evolve? As I discussed above, we want to see if a mutation will have an advantage (and therefore is likely to spread through a population) or not. So, really, we want to know how well a learner who *isn't* the same as the ones that created the languages will do at learning them. Try this:

```
print(learning_success(5, log(0.6), log(0.05), 'sample', languages, 100000))
```

The original list of languages was created by a population of MAP learners. Now we're testing what the expected success of a learner with a sampling strategy would be if exposed to one of these languages. If this number is higher than the number we got above, then the mutation could spread through the population. If it is lower, we can expect the mutation to die out. You may find that these numbers are quite similar (which is why we need large numbers of learning trials and generations to get an accurate estimate). This suggests that in some cases the selection pressure on the evolution of these genes might not be enormous, but even small differences in fitness can lead to big changes over time.
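
Because the two numbers can be so close, it may help to decide in advance how big a difference you are willing to treat as real. Here is a minimal sketch of one way to do the comparison. The function names and the `tolerance` parameter are my own assumptions rather than anything from Smith & Kirby (2008), and the success rates in the example calls are made up purely to show how the functions are used.

```python
def selection_advantage(mutant_success, resident_success):
    """Difference in estimated learning success (mutant minus resident)."""
    return mutant_success - resident_success

def classify(mutant_success, resident_success, tolerance):
    """Label the mutation, treating differences within `tolerance` as noise."""
    advantage = selection_advantage(mutant_success, resident_success)
    if advantage > tolerance:
        return 'spreads'
    if advantage < -tolerance:
        return 'dies out'
    return 'neutral'

# Made-up success rates purely to show the calls; they are not simulation results
print(classify(0.80, 0.75, 0.01))
print(classify(0.75, 0.75, 0.01))
print(classify(0.70, 0.75, 0.01))
```

A sensible `tolerance` depends on how noisy your estimates are, so it should shrink as you increase the number of generations and test trials.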

## Question

There's only one question for this lab, because I want you to think about how best you can explore it with the tools I've given you here! 

You could answer this question just by typing in a bunch of commands like the examples above, or you could try to come up with a way of looping through different combinations. If you want, you could try to come up with a measure quantifying how big an advantage (or disadvantage) a mutation has in a particular population. If you want to be really fancy, you could then visualise these results in a graph somehow (hint: you can use `plt.imshow` to visualise a 2-dimensional list of numbers).

1. Which mutations will spread in different populations of learners, which mutations will die out, and which are selectively neutral (i.e. are neither better nor worse)?
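
If you are not sure how to start on the looping, here is one possible scaffold. It is a sketch rather than a finished answer: `advantage_grid`, `languages_for` and `success_of` are hypothetical names, and the two toy functions below return placeholder values so that the block runs on its own. You would need to replace them with functions that call `iterate` and `learning_success`.

```python
def advantage_grid(types, languages_for, success_of):
    """Build a 2-dimensional list where grid[r][m] is how much better (or worse)
    a mutant of type m learns the languages of resident type r than r itself does."""
    grid = []
    for resident in types:
        languages = languages_for(resident)         # one long chain per resident type
        baseline = success_of(resident, languages)  # residents learning their own languages
        row = []
        for mutant in types:
            row.append(success_of(mutant, languages) - baseline)
        grid.append(row)
    return grid

# Each type is a (bias, learning strategy) pair; these two are just examples
types = [(0.6, 'map'), (0.6, 'sample')]

def toy_languages_for(resident):
    return [[0, 0], [1, 1]]  # placeholder list, not the output of a real chain

def toy_success_of(learner, languages):
    return 0.5               # placeholder constant, not a real estimate

grid = advantage_grid(types, toy_languages_for, toy_success_of)
print(grid)
```

In the notebook, `languages_for` could return `iterate(...)[1]` for the resident's bias and strategy, and `success_of` could call `learning_success` with the learner's bias and strategy. The resulting grid can then be passed to `plt.imshow(grid)`. Bear in mind that with real estimates the diagonal (a type competing against itself) will not be exactly zero, which gives you a feel for the noise.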