Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your collaborators below:

In [1]:
COLLABORATORS = ""

---

In [2]:
import numpy as np

For this problem, you will write code to implement the language-learning game explained in lecture and described by Gold in his 1967 paper. The game is intended to be a formal model of the way in which people learn natural languages. In the game, there is a learner who is trying to learn the language and a teacher who is supplying the learner with examples of valid sentences from the language. For our convenience, we will represent each sentence with a single number and each language as a set of valid sentences (numbers). In this version of the game, we will assume that there are only 10 possible languages: $L_1$, $L_2$, ... , $L_{10}$, and that each language $L_i$ contains the sentences 1 through $i$, so $L_1 = \{1\}$, $L_2 = \{1,2\}$, $L_3 = \{1,2,3\}$, ..., and $L_{10} = \{1,2,3,4,5,6,7,8,9,10\}$.

The game proceeds as follows: the teacher first selects a target language that the learner must learn. The teacher then generates examples of valid sentences from the target language and tells them to the learner one at a time (we will assume that the teacher chooses each example randomly from the set of valid sentences). Each time the learner sees a new sentence, they get one guess for what the target language is, based on all the observed sentences seen so far. In our version of the game, the game will be over when the learner correctly guesses the target language. The learner’s strategy should be to always guess the first language that is compatible with the sentences seen so far. So if the sequence of observed sentences were $\{3,1,1,2,1,3,2\}$, the learner should guess $L_3$, and if the observed sentences were $\{2,9\}$, the learner should guess $L_9$.
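
The learner's strategy on the two example sequences above can be sketched as follows. This is only an illustration: the name `first_compatible` is hypothetical (not part of the problem set), and it assumes the languages are listed in the order $L_1$, $L_2$, ..., $L_{10}$.

```python
# Languages L_1 .. L_10, where L_i contains the sentences 1 through i.
languages = [set(range(1, i + 1)) for i in range(1, 11)]

def first_compatible(observations, languages):
    """Return the first language containing every observed sentence.

    Hypothetical helper for illustration; returns None if no language fits.
    """
    seen = set(observations)
    for language in languages:
        if seen <= language:  # every observed sentence is valid in this language
            return language
    return None

guess_a = first_compatible([3, 1, 1, 2, 1, 3, 2], languages)
guess_b = first_compatible([2, 9], languages)
print(guess_a)
print(guess_b)
```

Because each $L_i$ contains all of the smaller languages, the first compatible language here is always the one whose index equals the largest sentence observed so far.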

---

## Part A (1 point)

<div class="alert alert-success">Complete the function definition for `select_target_language` below to randomly select the target language.</div>

Note: in the last problem set, we used the `np.random.rand` function to generate random numbers between 0 and 1. There is another random function that will come in handy for this problem, called `np.random.choice`, which will randomly choose an element from a list or array:

In [3]:
np.random.choice?
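
As a quick illustration of `np.random.choice` on its own, here is a minimal sketch using a made-up list `values` (the names below are for illustration only). It shows a single draw, several draws with replacement, and draws without replacement.

```python
import numpy as np

values = [10, 20, 30]  # made-up data for illustration

# a single random element from the list
single = np.random.choice(values)

# five draws with replacement (repeats are possible)
sample = np.random.choice(values, size=5)

# three draws without replacement (each element appears exactly once)
no_repeats = np.random.choice(values, size=3, replace=False)

print(single, sample, no_repeats)
```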

In [4]:
def select_target_language(possible_languages):
    """Randomly choose a target language from a list of possible
    languages.
    
    Hint: your solution can be done in 1 line of code (including
    the return statement).

    Parameters
    ----------
    possible_languages : list of sets
        The list of possible languages to choose from

    Returns
    -------
    set : the target language

    """
    return np.random.choice(possible_languages)

Check that your function selects languages randomly from the set of possible languages (i.e., it returns something different if you run it more than once):

In [5]:
for i in range(5):
    print(select_target_language([{1}, {1, 2, 3}, {1, 2}, {1, 2, 3, 4, 5}, {1, 2, 3, 4}]))

{1, 2, 3}
{1}
{1, 2, 3, 4}
{1, 2, 3, 4}
{1, 2, 3, 4, 5}


In [6]:
# add your own test cases here!


In [7]:
"""Check implementation of select_target_language."""

def gen_random_languages(n):
    languages = [tuple(sorted(set(np.random.randint(0, 100, i)))) for i in range(1, n + 1)]
    unique = [set(x) for x in set(languages)]
    return unique

for i in range(1, 21):
    # create a random set of target languages
    languages = gen_random_languages(i)

    # generate a few target languages 
    selected = set()
    for j in range(20):
        target = select_target_language(languages)
        assert target in languages, "target language '{}' is not in the given set of languages".format(target)
        selected.add(tuple(sorted(target)))

    if i > 1:
        assert len(selected) > 1, "select_target_language does not produce a random answer"

print("Success!")

Success!


---

## Part B (1 point)

<div class="alert alert-success">Complete the function definition below for `generate_example`, which randomly generates an example sentence from the target language.</div>

Note: as with the previous part of this problem, the function `np.random.choice` may come in handy! Note, however, that it will not work with sets by default, so you may need to convert the set to a list first.

In [8]:
def generate_example(valid_sentences):
    """Randomly choose an example "sentence" from the given set of valid
    sentences.
    
    Hint: your solution can be done in 1 line of code (including the
    return statement).

    Parameters
    ----------
    valid_sentences : set of integers
        The valid sentences to choose from

    Returns
    -------
    integer : an integer representing the example sentence

    """
    return np.random.choice(list(valid_sentences))

Test that your function doesn't always return the same output:

In [9]:
for i in range(10):
    print(generate_example({1, 2, 3, 4, 5}))

3
2
1
4
4
2
3
1
3
2


Also make sure that it can handle a language containing *any* valid sentences, even if they are not 1 through 10:

In [10]:
for i in range(10):
    print(generate_example({1, 4, 8, 2, 5, 3}))

3
4
1
4
8
1
8
8
3
3


In [11]:
# add your own test cases here!


In [12]:
"""Check the implementation of generate_example"""

for i in range(1, 21):
    # create a random language (a set of sentences)
    language = set(np.random.randint(0, 100, i))

    # generate a few examples
    selected = set()
    for j in range(20):
        example = generate_example(language)
        assert example in language, "example '{}' is not in the given language".format(example)
        selected.add(example)

    if i > 1:
        assert len(selected) > 1, "generate_example does not give a random answer"

print("Success!")

Success!


---

## Part C (1 point)

Now that we have some functions to generate a target language and examples, we can try playing a (modified) version of Gold's game. For this, you can use the function `gold_game` provided below. The function takes one argument, `guesser`, which is itself a function: it receives the observations seen so far and the list of possible languages, and returns a guess for the target language.

In [13]:
def gold_game(guesser):
    """Plays a version of Gold's game using the `guesser` function to generate guesses.
    
    Parameters
    ----------
    guesser : function
        The function that generates guesses. Takes as arguments
        the list of observations and list of possible languages.
        
    """
    # generate the list of possible languages
    possible_languages = [set(range(1, n + 1)) for n in range(1, 11)]
    
    # randomly choose the target language
    valid_sentences = select_target_language(possible_languages)
    
    observations = []
    # don't actually loop forever, to prevent infinite loops
    for i in range(100):
        # print the status
        print("ROUND {}".format(len(observations) + 1))
        
        # observe a new sentence
        observations.append(generate_example(valid_sentences))
        print("Observations: {}".format(observations))
        
        # generate a guess for what the language is
        guess = guesser(observations, possible_languages)
        print("You guessed: {}".format(guess))
        
        # if the guess was correct, then stop, otherwise, keep going
        if guess == valid_sentences:
            print("--> Correct!")
            break
        else:
            print("Sorry, wrong language.\n")
            
    if guess != valid_sentences:
        raise RuntimeError("The guesser never guessed the right answer! Terminating after 100 tries.")

We can then create a guessing function that prompts you for input:

In [14]:
def human_guesser(observations, possible_languages):
    """Prompt the user to guess which language (of those in `possible_languages`)
    generated the sentences in `observations`.

    Parameters
    ----------
    observations : list of integers
        "Grammatical" sentences that have been observed
    possible_languages : list of sets
        Possible languages that could have resulted in the sentences
        that were observed.

    Returns
    -------
    set of integers : a guess for the true language

    """
    # tell the user what languages there are
    print("Languages:")
    for i, language in enumerate(possible_languages):
        print("    {} : {}".format(i, language))
    print()

    # prompt them for a guess
    guess = None
    while guess is None:
        guess = input("Type the index of the language you want to guess: ")
        try:
            guess = int(guess)
        except ValueError:
            guess = None
        else:
            if guess < 0 or guess > (len(possible_languages) - 1):
                guess = None
            
    return possible_languages[guess]

Then, to play the game yourself, you can run `gold_game` with the `human_guesser` function. Uncomment and run the following cell (remember to comment it back out before turning in your problem set!):

In [15]:
#gold_game(human_guesser)

<div class="alert alert-success">Now, we're interested in writing a program that can play the game without any human input. Complete the function `guess_language` below to guess the target language based on all the previously observed example sentences.</div>

Hint: you may want to take a look at [how to check if one set is a subset of another](https://docs.python.org/3.4/library/stdtypes.html?highlight=set#set.issubset).
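
For reference, here is a small sketch of the subset check on made-up sets (the variable names are for illustration only). The method `issubset` and the operator `<=` are equivalent for sets.

```python
observed = {1, 3}  # made-up observations for illustration

# True: every element of `observed` is also in {1, 2, 3}
in_larger = observed.issubset({1, 2, 3})

# False: 3 is missing from {1, 2}
in_smaller = observed <= {1, 2}

print(in_larger, in_smaller)
```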

In [16]:
def guess_language(observations, possible_languages):
    """Guess which language (of those in `possible_languages`) generated
    the sentences in `observations`. 
    
    Note that your function should guess the *simplest* (i.e., shortest)
    language consistent with the sentences in observations. In the event 
    of a tie, you may return any of the tied languages.

    Hint: your solution can be done in 4 lines of code, including the
    return statement.

    Parameters
    ----------
    observations : list of integers
        "Grammatical" sentences that have been observed
    possible_languages : list of sets
        Possible languages that could have resulted in the sentences
        that were observed.

    Returns
    -------
    set of integers : your guess for the language that generated
    the sentences in `observations`
    """
    for p in sorted(possible_languages, key=len):
        if set(observations).issubset(p):
            return p

Try playing Gold's Game using your function:

In [17]:
gold_game(guess_language)
# You want to guess the smallest possible language that includes all of the observed sentences.
# If you observe sentence 4, you want to guess the smallest language that includes sentence 4.
# If you then observe sentence 1 as well, you want to guess the smallest possible
# language that includes both sentence 1 and sentence 4.



ROUND 1
Observations: [2]
You guessed: {1, 2}
Sorry, wrong language.

ROUND 2
Observations: [2, 6]
You guessed: {1, 2, 3, 4, 5, 6}
--> Correct!


Here are also a few toy examples for you to try:

In [18]:
guess_language([2], [{1, 2, 3}, {1}, {1, 2}])

{1, 2}

In [19]:
guess_language([1, 3], [{2, 3, 4}, {1, 3, 4}, {1, 2, 4}, {1, 2, 3}])

{1, 3, 4}

In [20]:
# add your own test cases here!


In [21]:
"""Check the implementation of guess_language"""

def gen_random_languages(n):
    languages = [tuple(sorted(set(np.random.randint(0, 100, i)))) for i in range(1, n + 1)]
    unique = [set(x) for x in set(languages)]
    return unique

for i in range(1, 21):
    languages = gen_random_languages(i)
    target = select_target_language(languages)

    observations = []
    for j in range(1, i + 1):
        observations.append(generate_example(target))
        guess = guess_language(observations, languages)
        
        for obs in observations:
            assert obs in guess, "Observation '{}' is not in the guessed language {}".format(obs, guess)
            
        for l in languages:
            if all([obs in l for obs in observations]) and guess != l:
                assert len(l) >= len(guess), "Guessed language should be the simplest language consistent with the observations"
                
print("Success!")

Success!


---

## Part D (1 point)

Gold’s version of the language-learning game is different from ours: in his version, the learner is never told whether they correctly guessed the target language, and the game never ends. The learner is said to have won the game if there comes a point in time where they always guess the correct language from that point onward. Additionally, the teacher must eventually show the learner an example of every valid sentence in the target language. 

<div class="alert alert-success">If we played the game this way, would the language learner that you implemented always be able to win the game (for *any* set of finite languages, not just the ones we considered here)? A one sentence answer for this part is sufficient. (**0.5 points**)</div>

Yes: for any set of finite languages, the learner eventually settles on the correct language and keeps guessing it from that point onward.

<div class="alert alert-success">Why or why not? Please provide an explanation to the above. (**0.5 points**)</div>

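Because the target language is finite and the teacher must eventually show every valid sentence in it, there comes a point at which the learner has seen all of the target's sentences. From then on, the set of observations is exactly the target language, so the target is the smallest language consistent with the observations: any other consistent language must contain the target plus at least one more sentence, which makes it larger. The learner therefore guesses the target on every later round.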
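
To make the convergence concrete, here is a minimal sketch that replays a fixed sequence of examples without ever telling the learner whether a guess was right. The languages, the target, the presentation order, and the helper name `smallest_consistent` are all made up for illustration; the helper is just one possible way to write the "smallest consistent language" strategy.

```python
def smallest_consistent(observations, languages):
    """Return the smallest language containing every observation (or None)."""
    seen = set(observations)
    candidates = [language for language in languages if seen <= language]
    return min(candidates, key=len) if candidates else None

# made-up finite languages and target, for illustration only
languages = [{1, 2}, {1, 2, 3}, {2, 5}, {1, 2, 3, 4, 5}]
target = {1, 2, 3}

# the teacher shows every sentence of the target at least once
presentation = [2, 1, 2, 3, 1, 3, 2]

# the learner's guess after each new example
guesses = [smallest_consistent(presentation[:t], languages)
           for t in range(1, len(presentation) + 1)]

for t, guess in enumerate(guesses, start=1):
    print("Round {}: {}".format(t, guess))
```

In this run the learner guesses `{1, 2}` until sentence 3 appears in round 4, and guesses the target on every round after that.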

## Part E (1 point)

Assume that instead of 10 languages, we had an infinitude of languages $L_1$, $L_2$, $L_3$, ..., $L_\infty$ with each language $L_i$ containing all the sentences from 1 to $i$. 

<div class="alert alert-success">If we played this game with Gold’s version of the rules described above, would the language learner that you implemented always be able to win the game? A one sentence answer is sufficient. (**0.5 points**)</div>

No, the learner would not always be able to win the game.

<div class="alert alert-success">Why or why not? Please provide an explanation. (**0.5 points**)</div>

The learner always guesses the smallest language consistent with the observations, which in this nested family is $L_n$, where $n$ is the largest sentence seen so far. If the target is the infinite language $L_\infty$, then after any finite number of examples the learner's guess is still some finite $L_n$. Because the teacher must eventually show every sentence, a sentence larger than $n$ will appear later and force the guess to change, so the guesses never settle on $L_\infty$. (If the target happens to be a finite $L_i$, the learner still wins, for the same reason as in Part D.)
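
A finite simulation cannot demonstrate an infinite game, but the following sketch illustrates the pattern when the target is $L_\infty$. It assumes, for illustration only, that the teacher presents the sentences in increasing order, and it represents each guess $L_n$ by its index $n$ (the name `guess_index` is hypothetical).

```python
def guess_index(observations):
    """Index n of the smallest consistent language L_n in the nested family.

    Since L_1 is contained in L_2, which is contained in L_3, and so on,
    the smallest language containing all observations is L_max(observations).
    """
    return max(observations)

# assumed presentation: the teacher shows 1, 2, 3, ... (first 50 sentences)
presentation = list(range(1, 51))

guesses = [guess_index(presentation[:t]) for t in range(1, len(presentation) + 1)]

# count how many times the learner had to change its guess
changes = sum(1 for a, b in zip(guesses, guesses[1:]) if a != b)
print("Final guess: L_{}, changed {} times".format(guesses[-1], changes))
```

However long the presentation is made, the learner's guess changes every time a new largest sentence appears, so it is always a finite $L_n$ that a later example will rule out.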

---

Before turning this problem in, remember to do the following steps:

1. **Restart the kernel** (Kernel$\rightarrow$Restart)
2. **Run all cells** (Cell$\rightarrow$Run All)
3. **Save** (File$\rightarrow$Save and Checkpoint)

<div class="alert alert-danger">After you have completed these three steps, ensure that the following cell has printed "No errors". If it has <b>not</b> printed "No errors", then your code has a bug in it and has thrown an error! Make sure you fix this error before turning in your problem set.</div>

In [22]:
print("No errors!")

No errors!
