# A 9-hour Python tutorial focusing on data processing

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

## Homework: List and pandas Series

In [7]:
import numpy as np
import pandas as pd

#### Problem 1
When `s` is a string,  
`list(s)` will be a list of letters in `s`.

Write a function `has(letter, s)`  
that takes a length-one string `letter` and a string `s` as the input  
and outputs `True` or `False`,  
depending on whether `s` contains `letter`.

In [1]:
s = 'abcdefg'
print(list(s))

### your answer here


['a', 'b', 'c', 'd', 'e', 'f', 'g']


#### Problem 2
Write a function `is_prime(k)`  
that takes an integer `k` as the input  
and outputs `True` or `False` depending on whether `k` is a prime number or not.

In [2]:
### your answer here


#### Problem 3
Suppose `l` is a list of numbers in `1,...,6`.

Write a function `dice_distribution(l)`  
that takes `l` as the input  
and outputs the distribution 
`[a1,...,a6]`,  
where `ak` is the number of occurrences of `k` in the list `l`.

In [3]:
l = [1,3,2,2,3,5,4,6,4,3,2,5,6,1,3,4,2,6,1,5,3,2,4,5,1,2,6,4,3,4,3,4,3,6,6,5,5,6,5,2,1,4]
### your answer here


#### Problem 4
Now Jephian has a dice and you can roll it several times  
to get a list of numbers in `1,...,6`.

Write a function `most_freq(l)`  
that takes `l` as the input  
and outputs the most frequent element.

There are several ways to achieve this.  
According to experiments, 
the methods, ordered from fastest to slowest, are the following.
1. `np.bincount` + `np.argmax`
2. `np.sum(arr == k) for k in range(1,7)`
3. primitive `for` loop
4. `pd.Series.value_counts` + `pd.Series.idxmax`  

You may run your own experiments  
by putting `%%timeit` in the first line of the cell.
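
As a rough illustration of methods 1 and 2 in the list above, here is a minimal sketch. The function names `most_freq_bincount` and `most_freq_sum` and the small list `sample` are made up for this example and are not part of the homework; your own `most_freq` may look different. Both versions rely on `np.argmax`, which returns the first maximum, so a tie goes to the smaller face.

```python
import timeit

import numpy as np

def most_freq_bincount(l):
    # np.bincount counts occurrences of each non-negative integer;
    # index k of the result holds the number of times k appears.
    counts = np.bincount(np.asarray(l), minlength=7)
    return int(np.argmax(counts))

def most_freq_sum(l):
    arr = np.asarray(l)
    # arr == k is a boolean array; summing it counts the matches.
    counts = [np.sum(arr == k) for k in range(1, 7)]
    # Index 0..5 corresponds to face 1..6, so shift by one.
    return int(np.argmax(counts)) + 1

sample = [1, 3, 2, 2, 3, 5, 2]  # hypothetical input; 2 appears three times
print(most_freq_bincount(sample))
print(most_freq_sum(sample))

# Outside a notebook, timeit.timeit can stand in for %%timeit.
print(timeit.timeit(lambda: most_freq_bincount(sample), number=1000))
```

On such a short list the timing says little; the ranking above is more meaningful on a long list such as the 60000 rolls created in the next cell.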

In [23]:
### Run this cell first to create the dice

import random

random.seed(10)  ### You can get a different dice by changing the seed here
my_secret = random.randint(1,6)
random.seed(None)

def roll_Jephian_dice():
    p_space = list(range(1,7)) + [my_secret] * 5
    return random.choice(p_space)

l = [roll_Jephian_dice() for k in range(60000)]

In [None]:
### put %%timeit if you want to time it

def most_freq(l):
    pass  ### your answer here

most_freq(l)

#### Problem 5
Now Jephian wants to play a game with you:  
1. Jephian will roll a dice 100 times.
2. Each time, you can choose to bet (cost \$1) or to hold (cost nothing).
3. If your bet is the correct number, you win \$3.
4. Steps 1-3 together are called a round.  
Each round Jephian will use a different dice.  
This process will repeat 60 rounds.  

Earn as much money from Jephian as you can!

A few more details:
1. The dice has 6 numbers: 1,...,6.
2. You have access to a list called `record`,  
which records all numbers that have appeared so far in the current round.
3. Each round starts with `record = []`.  
4. You are going to write a function `player_move()` to represent your strategy.  
Return 1,...,6 to bet; return 0 to hold.
5. Then run the cell with `play()` to check the result.

Run the next three cells first to get a feel for the game.

In [29]:
### Run this cell first to set up the game
### For the purpose of the homework, you are not allowed to modify the code here.
### If you are curious, you are of course free to experiment with the code in this cell.

import random

record = []

def roll_Jephian_dice(biased):
    p_space = list(range(1,7)) + [biased] * 2
    return random.choice(p_space)

def play():
    played_list = []
    earned_list = []
    for _ in range(60):
        record.clear()
        played = 0
        earned = 0
        biased = random.randint(1,6)
        for __ in range(100):
            p = player_move()
            if p not in [0,1,2,3,4,5,6]:
                raise ValueError('Your function can only output 0,...,6')
            J = roll_Jephian_dice(biased)
            record.append(J)
            if p == 0:
                pass
            else:
                played += 1
                earned -= 1
                if p == J:
                    earned += 3
        played_list.append(played)
        earned_list.append(earned)
    
    print('Each round')
    print('Average number of games played:',np.average(np.array(played_list)))
    print('Average money earned:', np.average(np.array(earned_list)))
    print()
    print('Total money earned:', np.array(earned_list).sum())

In [61]:
### Define your strategy here in player_move()
### Remember that you have access to record

def player_move():
    return random_guess() ### random guess is not a good idea

### Sample strategies
def always_hold():
    return 0

def random_guess():
    return random.randint(1,6)


In [62]:
play()

Each round
Average number of games played: 100.0
Average money earned: -50.85

Total money earned: -3051
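
The loss of about 50 per round in the output above is what the rules predict for random guessing. The short computation below is only a sketch of that arithmetic; the helper `expected_gain` and the variable names are made up for this illustration. It assumes the setup cell as written, where `p_space` has 8 equally likely entries: the faces 1,...,6 plus two extra copies of the biased face.

```python
from fractions import Fraction

def expected_gain(win_probability, cost=1, prize=3):
    # Each bet costs `cost` and pays `prize` when the guess is right.
    return prize * win_probability - cost

# From the setup cell: 8 equally likely entries in p_space,
# 3 of them are the biased face and 1 each for the other five faces.
p_biased = Fraction(3, 8)
p_other = Fraction(1, 8)

# A uniformly random guess matches the roll with probability 1/6,
# whatever the bias is: (1/6) * (3/8 + 5 * 1/8) = 1/6.
p_random = Fraction(1, 6) * (p_biased + 5 * p_other)

print(expected_gain(p_random))  # -1/2 per bet, about -50 per 100-roll round
print(expected_gain(p_biased))  # 1/8 per bet when betting on the biased face
print(expected_gain(p_other))   # -5/8 per bet when betting on any other face
```

Only a bet on the biased face has a positive expected gain, so a strategy has to use `record` to work out which face that is before it starts betting.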
