# Analyzing Coin Flips

> Data and some of the main ideas originally from [Can you Fake Coin Tosses?](https://faculty.math.illinois.edu/~hildebr/fakerandomness/)

It turns out that humans are rather bad random number generators. If you ask a person to simulate a random sequence (e.g., flipping a coin a number of times), they will almost always introduce patterns that you would not see if the sequence were genuinely random. A cool example of this in action is the [mind reader app](http://mindreaderpro.appspot.com/) put together by Yoav Freund's group at UC San Diego.


In this notebook we'll be using conditional probabilities as a way to understand the difference between real and fake coin tosses.
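
To make "conditional probability" concrete before touching the data, here is a minimal sketch that estimates the probability of the next flip given the flips just before it. The function name `next_flip_given_context` and the toy string are made up for illustration and are not part of the original notebook.

```python
from collections import Counter

def next_flip_given_context(flips, context):
    """Estimate (P(next=0 | context), P(next=1 | context)) from one flip string."""
    k = len(context)
    # Tally the flip that follows every (possibly overlapping) occurrence of context
    counts = Counter(
        flips[i + k]
        for i in range(len(flips) - k)
        if flips[i:i + k] == context
    )
    total = counts['0'] + counts['1']
    if total == 0:
        raise ValueError('context never occurs with a following flip')
    return counts['0'] / total, counts['1'] / total

# Hypothetical toy sequence: a '1' is followed by '0' three times and by '1' once
toy = '0101101001'
p0, p1 = next_flip_given_context(toy, '1')
print(p0, p1)  # 0.75 0.25
```

For a fair coin both numbers should approach 0.5 as the sequence grows, whatever the context; a consistent departure from 0.5 is the kind of pattern the rest of the notebook looks for.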

In [1]:
import pandas as pd

df_fake = pd.read_csv('coin_flips.csv')
df_fake

Unnamed: 0,student,flips
0,math199chp2017fall1,0000011001000011101010011101111010001110110100...
1,math199chp2017fall2,0010101100010111100101011000101100100011011001...
2,math199chp2017fall3,0000101010001010011111001011110010110000000101...
3,math199chp2017fall4,1101001110101001110101100001101101000100111010...
4,math199chp2017fall5,0010100111011011010110000110010011000101100110...
5,math199chp2017fall6,0010110001011110101011001101010001101011110010...
6,math199chp2017fall7,0001011010010111100101001001110110100100001101...
7,math199chp2017fall8,1011010011100101101000011101001110100110100110...
8,math199chp2017fall9,0011101100101001001011011000011010110111011101...
9,math199chp2017fall10,0010100011110101100100111001011001101000110101...


In [2]:
df_fake.iloc[0]['flips']

'0000011001000011101010011101111010001110110100111101000110010111100011111110011101100001011100010110001101000001010110100010100101000101101111101100101111010111010010111110010111001010101001011110010010000101'

In [3]:
from random import randint
def sample_random_flips(x):
    return ''.join(str(randint(0,1)) for _ in range(len(x)))

df_real = df_fake.copy()
df_real['flips'] = df_fake['flips'].map(sample_random_flips)

In [4]:
#!pip install regex
import regex as re

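def count_overlapping(text, search_for):
    # overlapped=True counts matches that share characters, e.g. '00' occurs 3 times in '0000'
    return len(re.findall(search_for, text, overlapped=True))

def make_seq_column(df, seq):
    # Fraction of the length-len(seq) windows in each flip string that equal seq
    df['seq_' + seq] = df['flips'].map(lambda x: count_overlapping(x, seq)/(len(x) - len(seq) + 1))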
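
If installing the third-party `regex` package is not an option, overlapping matches can also be counted with the standard library by wrapping the pattern in a zero-width lookahead. This is only a sketch of an alternative, not what the notebook uses; the name `count_overlapping_stdlib` is hypothetical, and the module is imported as `std_re` so it does not clobber the `re` alias above.

```python
import re as std_re

def count_overlapping_stdlib(text, search_for):
    # A lookahead matches at every start position without consuming characters,
    # so occurrences that share characters are each counted once.
    pattern = '(?=' + std_re.escape(search_for) + ')'
    return len(std_re.findall(pattern, text))

print(count_overlapping_stdlib('0000', '00'))  # 3 overlapping occurrences
print('0000'.count('00'))                      # 2, because str.count skips overlaps
```

The difference matters here because runs such as `000` contain two overlapping copies of `00`, and a non-overlapping count would understate how often each pattern appears.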

In [5]:
import itertools

def populate_length_n_seqs(n):
    for s in itertools.product(*([['0', '1']]*n)):
        make_seq_column(df_real, ''.join(s))
        make_seq_column(df_fake, ''.join(s))

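# n = 0 yields the empty sequence (the 'seq_' column), which is the context used for the first flip
for n in range(10):
    populate_length_n_seqs(n)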

In [6]:
def get_conditional_probs(df, conditioning_seq):
    # Each seq_* column stores count / (len(flips) - len(seq) + 1), so multiplying
    # by that same number of windows recovers the raw counts. The extended
    # sequences have length k + 1, which leaves len(flips) - k windows.
    k = len(conditioning_seq)
    n_flips = df['flips'].map(len)
    followed_by_1 = (df['seq_' + conditioning_seq + '1'] * (n_flips - k)).sum()
    followed_by_0 = (df['seq_' + conditioning_seq + '0'] * (n_flips - k)).sum()
    # Normalize over the two possible next flips so the pair sums to 1
    total = followed_by_0 + followed_by_1
    return followed_by_0 / total, followed_by_1 / total

In [8]:
get_conditional_probs(df_fake, '11')

(0.5898412698412698, 0.4101587301587301)

In other words, after two consecutive 1s the students wrote a 0 about 59% of the time, whereas a fair coin would produce a 0 half of the time.

In [30]:
from sklearn.model_selection import train_test_split

df_fake_train, df_fake_test = train_test_split(df_fake)

In [31]:
import numpy as np

for s in df_fake_test['flips']:
    context = []
    llr = 0
    for flip in s:
        context_to_use = ''.join(context[-2:])
        p_0, p_1 = get_conditional_probs(df_fake_train, context_to_use)
        if flip == '0':
            llr += np.log(p_0) - np.log(0.5)
        else:
            llr += np.log(p_1) - np.log(0.5)
        context.append(flip)
    # Note: an llr greater than 0 means the fake-flip model explains this sequence better than a fair coin, so we have correctly flagged it as fake
    print(llr)

-12.192473874105687
9.325369658669095
7.249061807339866
18.534949458487738
-5.466329182104011
6.466114541623526
-1.3041789654085396
24.695751007622487
18.018866595807413
11.296857155643918
4.875289940936188
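
The score printed above is a log-likelihood ratio: for each flip, the log of its probability under the fake-flip model minus the log of 0.5, summed over the sequence. The sketch below isolates that calculation so it can be run without the data. The helper names and the `p_switch` value of 0.6 are assumptions chosen purely for illustration, not estimates from the students' flips.

```python
import math

def alternation_table(p_switch):
    """Hypothetical model: the next flip differs from the previous one with probability p_switch."""
    table = {'': (0.5, 0.5)}  # no context yet, so treat the first flip as fair
    for ctx in ['0', '1', '00', '01', '10', '11']:
        if ctx[-1] == '0':
            table[ctx] = (1 - p_switch, p_switch)   # (P(next=0), P(next=1))
        else:
            table[ctx] = (p_switch, 1 - p_switch)
    return table

def log_likelihood_ratio(flips, cond_probs, order=2):
    """Sum of log P(flip | up to `order` previous flips) - log(0.5) over the sequence."""
    llr = 0.0
    for i, flip in enumerate(flips):
        context = flips[max(0, i - order):i]
        llr += math.log(cond_probs[context][int(flip)]) - math.log(0.5)
    return llr

table = alternation_table(0.6)  # illustrative value only
llr_alternating = log_likelihood_ratio('01010101', table)
llr_long_run = log_likelihood_ratio('00000000', table)
print(llr_alternating > 0, llr_long_run < 0)  # True True
```

A sequence that alternates more than chance would predict scores above 0 under this toy model, while a long run scores below 0. That is the same sign convention as the loop above: positive values favor the fake-flip model, negative values favor a fair coin.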
