In [1]:
from datascience import *
import numpy as np

## np.random.choice

``` np.random.choice(a, size=) ```

``` np.random.choice(a, size=, replace=, p=) ```

`np.random.choice` draws a random sample from an array, optionally with replacement and with specified proportions. We pass an array `a` of elements to choose from, a sample `size`, and `replace` to say whether an element can be drawn more than once. The array `p` contains the proportions for the corresponding elements in the `a` array.

Note: `a` takes an array or a whole number; `size` takes a number; `replace` takes `True` or `False`; `p` takes an array of proportions that has the same length as `a` and sums to 1.

## Examples:

In [6]:
np.random.choice(make_array(0, 1, 2), size=3)

array([2, 0, 0])

In the above example, we are choosing three numbers at random from an array containing 0, 1, and 2. Since no value is specified for `replace`, it defaults to `True`, and since no proportions `p` are given, all elements are equally likely. Thus 0, 1, and 2 each have an equal chance of being chosen on every draw.
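To see the default equal proportions in action, here is a minimal sketch that draws a large sample and tallies how often each value appears. It uses `np.array` in place of `make_array` so that it runs without the `datascience` library; the seed and the sample size of 30000 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so the run is repeatable

# Draw many values with the defaults: replace=True and no p given.
draws = np.random.choice(np.array([0, 1, 2]), size=30000)

# Fraction of the draws equal to each value; each should be close to 1/3.
values, counts = np.unique(draws, return_counts=True)
fractions = counts / len(draws)
print(values, fractions)
```

With only three draws, as in the cell above, the counts can look very uneven; the equal chances only become visible over many draws.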

In [7]:
np.random.choice(make_array(0, 1, 2), size=3, replace=False)

array([0, 2, 1])

In the above example, we are again choosing three numbers at random from an array containing 0, 1, and 2, except this time we are sampling without replacement. Also note that each number has an equal chance of being chosen on the first draw: namely 1/3, or about 0.33.

First, the number 0 is chosen. Then since ```replace=False```, we have to choose from 1 and 2. Then 2 is chosen and we only have one number left to choose, and so 1 is chosen.

In [8]:
np.random.choice(make_array(0, 1, 2), size=3, replace=False)

array([0, 1, 2])

In [9]:
# There is another way to do the same thing as above:
np.random.choice(3, size=3, replace=False)

array([1, 0, 2])

Notice how this time a number is passed in instead of an array. This automatically creates an array of the integers from ```0``` to ```n - 1```, which is what will be sampled from. In both cases the sampling is the same: all three numbers are drawn without replacement. The order of the result differs only because this is a random sample!
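As a small sketch of that equivalence, the code below samples once from the number `3` and once from the explicit array `np.arange(3)`. The variable names are only for illustration. The order of each result is random, but after sorting both contain the same values.

```python
import numpy as np

# Passing the number 3 samples from the integers 0, 1, 2.
from_int = np.random.choice(3, size=3, replace=False)

# Passing the explicit array np.arange(3) samples from the same values.
from_array = np.random.choice(np.arange(3), size=3, replace=False)

# Without replacement and with size=3, every value appears exactly once.
print(sorted(from_int.tolist()))    # [0, 1, 2]
print(sorted(from_array.tolist()))  # [0, 1, 2]
```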

#### We aren't confined to sampling numbers; we can sample names or words:

In [10]:
np.random.choice(make_array(
    'Milan', 'Michelle', 'Margherita', 'Mia', 'Mona'), size=5, replace=False)

array(['Mona', 'Michelle', 'Mia', 'Margherita', 'Milan'], dtype='<U10')

Since we have ```replace=False``` and ```size=5```, we have to choose all the possible outcomes!

In [11]:
np.random.choice(make_array('Milan', 'Michelle', 'Margherita', 'Mia', 'Mona'), size=5, replace=True)

array(['Margherita', 'Margherita', 'Milan', 'Milan', 'Mia'], dtype='<U10')

Since we have ```replace=True``` and ```size=5```, we choose randomly with replacement, so it is possible to choose the same name more than once. Notice how ```'Margherita'``` and ```'Milan'``` are each chosen twice.

## Examples with proportions:

In [19]:
np.random.choice(make_array(0, 1, 2), size=3, replace=True, p=make_array(0.1, 0.4, 0.5))

array([2, 1, 2])

In the above example, 0 has a 0.1 chance of being selected, 1 has a 0.4 chance, and 2 has a 0.5 chance. Everything else about the random sample is the same. Since 2 has the highest chance of being chosen, it tends to show up more often, as it does in this sample.

In [21]:
np.random.choice(make_array(0, 1, 2), size=10, replace=True, p=make_array(0.1, 0.4, 0.5))

array([1, 2, 2, 2, 1, 2, 2, 1, 1, 1])

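In the above example, 2 and 1 appear 5 times each, and 0 does not appear at all. The sample proportions do not have to match `p` exactly, because each draw is random; `p` only describes how likely each value is on a single draw.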
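With a much larger sample, the observed proportions settle close to the values in `p`. The sketch below assumes a sample size of 50000 and an arbitrary seed, and again uses `np.array` in place of `make_array` so it runs on its own.

```python
import numpy as np

np.random.seed(1)  # arbitrary fixed seed so the run is repeatable

p = np.array([0.1, 0.4, 0.5])
draws = np.random.choice(np.array([0, 1, 2]), size=50000, replace=True, p=p)

# Observed fraction of each value among the draws, to compare against p.
observed = np.array([np.mean(draws == value) for value in (0, 1, 2)])
print(observed)
```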

In [22]:
np.random.choice(make_array(0, 1, 2), size=10, replace=True, p=make_array(1, 0, 0))

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

To drive the point home, this time 0 has a 100% chance of being chosen, and 1 and 2 have 0% chance each. As a result, everything in the chosen sample is 0!

## Things to Notice:

#### Here I am reverting back to equal proportions for the items in the array:

In [24]:
np.random.choice(make_array(0, 1, 2), size=4, replace=False)

ValueError: Cannot take a larger sample than population when 'replace=False'

This error is valid, so let's try to make sense of it. We want to choose 4 times from 3 numbers, and we are __not__ replacing! We can't do that unless we replace.

For instance, say we choose 0 the first time.
Then we have to choose from 1 and 2. (Note that we still need to select three more times.)
Say we select 1. Now only 2 is left to choose from, and we still need to make two selections.
Now we choose the last remaining element, which is 2. But we need to choose one more time, and we don't have anything to choose from!

Thus, we get the error!
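If the sample size comes from somewhere else in a program, one way to handle this case is to catch the `ValueError`, or to compare the sizes before sampling. This is only a sketch; the names `population`, `requested`, and `can_sample` are made up for illustration.

```python
import numpy as np

population = np.array([0, 1, 2])
requested = 4  # more draws than there are elements

try:
    sample = np.random.choice(population, size=requested, replace=False)
except ValueError as error:
    # NumPy refuses to draw 4 distinct items from only 3.
    sample = None
    print("Sampling failed:", error)

# Comparing the sizes up front avoids triggering the error at all.
can_sample = requested <= len(population)
print(can_sample)
```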

In [25]:
np.random.choice(make_array(0, 1, 2, 3), size=4, replace=False)

array([2, 3, 0, 1])

This works because the number of elements in the array is equal to size, so we can sample without replacement here!

## Usefulness of np.random.choice:

```np.random.choice``` is useful because it allows us to conduct random samples based on proportions for the items in our array we are choosing from. 

For example, say we have a biased coin (such that getting a head is 80% likely and a tail is 20% likely) and we want to see how many heads we get in 10 flips. To sample based on this, we can do the following:

```np.random.choice(make_array('Heads', 'Tails'), size=10, p=make_array(0.8, 0.2))```

In [27]:
np.random.choice(make_array('Heads', 'Tails'), size=10, p=make_array(0.8, 0.2))

array(['Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails',
       'Heads', 'Heads', 'Heads'], dtype='<U5')

Perfectly enough, we get 8 heads!

And if __everything is equally likely__, simply leave out the `p=` argument. In other words, do what we've been doing.

To randomly sample an unbiased (fair) coin, do the following: 
```np.random.choice(make_array('Heads', 'Tails'), size=10)```

In [28]:
np.random.choice(make_array('Heads', 'Tails'), size=10)

array(['Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads',
       'Tails', 'Heads', 'Tails'], dtype='<U5')

Again, perfectly enough, we get 5 heads!

__(I didn't do any voodoo magic to get exactly 8 and exactly 5 heads for the random samples above! It's all random!)__
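To see how much the number of heads varies from run to run, the sketch below repeats the 10-flip biased-coin experiment many times and records the count of heads each time. The 2000 repetitions and the seed are arbitrary choices, and `np.array` stands in for `make_array`.

```python
import numpy as np

np.random.seed(2)  # arbitrary fixed seed so the run is repeatable

coin = np.array(['Heads', 'Tails'])
proportions = np.array([0.8, 0.2])
repetitions = 2000

# For each repetition, flip the biased coin 10 times and count the heads.
head_counts = np.array([
    np.count_nonzero(np.random.choice(coin, size=10, p=proportions) == 'Heads')
    for _ in range(repetitions)
])

print(head_counts.mean())          # average number of heads per 10 flips
print(np.mean(head_counts == 8))   # fraction of runs with exactly 8 heads
```

The average lands near 8, but only a minority of individual runs give exactly 8 heads (the binomial formula puts that chance at about 30%), so getting 8 in the cell above was a likely outcome rather than a guaranteed one.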