# How random are you?
Just how good are people at generating random numbers?  We're going to try to find out in this project by comparing a series of user-generated "coin toss" events to actual random coin tosses. In particular, we're going to see whether people make truly random sequences or whether there's a bias embedded in there. We'll do this by looking at the distribution of run-lengths (how often do you get N in a row?).  For example, if 1=heads and 0=tails, the sequence `1 0 1 1 0 1 0 1 0 1 1 0` would have, for heads, three runs of length 1, two runs of length 2, and no runs of length 3.
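
To make the run-length idea concrete, here is a minimal sketch that tallies the runs of heads in the example sequence. It uses `itertools.groupby` from the standard library; the helper name `run_length_tally` is made up for illustration and is not part of the project.

```python
from collections import Counter
from itertools import groupby

def run_length_tally(seq, symbol="1"):
    """Tally how many runs of `symbol` there are of each length (hypothetical helper)."""
    # groupby clusters consecutive equal characters; keep only the runs of `symbol`
    lengths = [len(list(group)) for key, group in groupby(seq) if key == symbol]
    return Counter(lengths)

example = "101101010110"  # the example sequence with the spaces removed
tally = run_length_tally(example)
print(tally[1], tally[2], tally[3])  # runs of length 1, 2, and 3
```

A `Counter` returns 0 for lengths that never occur, so asking for a run length that is absent is safe.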

## Getting the user input
For the first part of this project, you'll collect a series of 1's and 0's from a user. There are fancier ways of doing this but, trust me, once you toss in cross-platform aspects, macOS security, and simplicity, we're going to just use the `input` function in Python. The downside is that it just asks you to enter a string and waits until you hit Enter. There's no ability to filter the keys and no ability to make sure exactly N valid keys have been pressed.

So, *write a function that has the user ostensibly press a bunch of 1's or 0's. Then, take that input, remove anything other than 1 or 0. Do this using a list comprehension. You can keep it as a list or convert it back to a string.* Once that basic bit is done, *set it up so that it can take a `min` (default 20) number of 1's and 0's that must be in there and enforce this.*

Remember, a string is something you can iterate over. Here, for example, taking `foo` and filtering out the numbers:


In [1]:
foo='Sharks are4 older1 than trees'
print(foo)
bar=[i for i in foo if not i.isnumeric()]
print(bar)
baz=''.join(i for i in foo if not i.isnumeric())
print(baz)

Sharks are4 older1 than trees
['S', 'h', 'a', 'r', 'k', 's', ' ', 'a', 'r', 'e', ' ', 'o', 'l', 'd', 'e', 'r', ' ', 't', 'h', 'a', 'n', ' ', 't', 'r', 'e', 'e', 's']
Sharks are older than trees


In [2]:
def GetUserSequence(min=20):
    '''Ask the user to type a bunch of 1s and 0s, drop anything that is not a
    1 or 0 (using a list comprehension), and return the result as a string.
    Keeps asking until at least `min` valid 1s and 0s have been entered.'''
    # Note: `min` shadows the built-in of the same name inside this function.
    prompt = f"Please randomly press 1 and 0 on your keyboard at least {min} times: "
    while True:
        sequence = input(prompt)
        # Filter first, so only valid characters count toward the minimum
        filtered_seq = [n for n in sequence if n in "01"]
        if len(filtered_seq) >= min:
            break
        print(f"Only {len(filtered_seq)} valid values; please enter at least {min}")

    print(f"Thank you for entering at least {min} numbers")
    return "".join(filtered_seq)


userseq=GetUserSequence()


Thank you for entering at least 20 numbers
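
Because `input` waits for the keyboard, the filter-and-count logic is awkward to test. One option, sketched here on the assumption that you are happy to split the work in two, is to put the filtering in its own small function and check it on fixed strings. The names `clean_sequence` and `long_enough` are hypothetical.

```python
def clean_sequence(raw):
    """Keep only the 1s and 0s from raw and return them as a string (hypothetical helper)."""
    return "".join([ch for ch in raw if ch in ("0", "1")])

def long_enough(raw, min_len=20):
    """True if raw contains at least min_len valid 1s and 0s (hypothetical helper)."""
    return len(clean_sequence(raw)) >= min_len

print(clean_sequence("10a1 1x0"))
print(long_enough("1" * 19 + "abc"))  # 22 characters typed, but only 19 are valid
```

The point of the second check is that the minimum should be enforced on the filtered characters, not on everything the user typed.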


## OK, so what's actually random?
Here, we're going to write some code that actually makes a random sequence of 0's and 1's of length n (default 1000), returning this as a string.  Later on, we'll use numpy and scipy, but Python now has decent random numbers built in.  Have a look at [`random.choices`](https://docs.python.org/3/library/random.html#random.choices).  Here's a sample of how it works:

In [3]:
import random
print(random.choices(['duck','go'],[10,2],k=10))


['duck', 'duck', 'duck', 'go', 'duck', 'duck', 'duck', 'duck', 'duck', 'duck']
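
The second argument to `random.choices` is a list of relative weights, so with `[10, 2]` each draw comes up `'duck'` with probability 10/12. A quick sketch to check this, using a seeded `random.Random` instance so the run is repeatable (the seed value and the number of draws are arbitrary choices):

```python
import random
from collections import Counter

rng = random.Random(0)  # seeded generator; the seed 0 is an arbitrary choice
draws = rng.choices(['duck', 'go'], weights=[10, 2], k=12000)
counts = Counter(draws)
duck_fraction = counts['duck'] / len(draws)
print(f"fraction of ducks: {duck_fraction:.3f} (expected about {10/12:.3f})")
```

Leaving the weights out, as in `random.choices(['1', '0'], k=n)`, makes every option equally likely.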


**Now, in the cell below**, write a function `GenRandom` that uses `random.choices` to make a list of `n` random 0's and 1's.

In [4]:
import random
def GenRandom(n=1000):
    '''Make a list of n random 0s and 1s and return it joined into a string'''
    return ''.join(random.choices(['1', '0'], k = n))
   


rndseq=GenRandom()

print(rndseq)

0000110000101101010111000011101000001100011000111000101010010101001101001000101001011010000101100000000110110100110011000110011100001101011000010001001000010011100000000110010010010011100000100110001110000011101100110001110111010010001111101101000010100001110011110101001010111111001011101110111110101101100111110011111010010010011001111100100110100011000111011111100110100100010110011101110011100100011010011010110111010001110110101100000100001110001110100010100111101101110100100100110011001101100110010000101100010101000111010000010101010111111111111111100001000011100011100001110101101010001111110000010010010100010010000001001001000101001101111011000110011000101101101011011010100000001011001000100001000000111110001111010000100100100000011110011010111000100011010111001111001001110000000100010110111000000111111110111000111100000011101001111010010010000110010100001111100110111010101101110010100010000110010011011110111010000101000010001101010100010010001001110010111001101111011101100100010101

## The fun part
Now comes the fun part.  We need to see just how often patterns come up. In particular, we're going to look for how often we get runs of length 1, runs of length 2, of length 3, ... length 8.  Python has a nice [`count`](https://docs.python.org/3/library/stdtypes.html?highlight=count#str.count) function that works on strings that we might think to use. But, the trouble is, this counts "non-overlapping occurrences of substrings".  Have a look at this sample.  We should end up with 1 run of length 1 and 1 of length 3, but none of length 2.

In [5]:
foo='01001110'
print(foo.count('1'))
print(foo.count('11'))
print(foo.count('111'))

4
1
1


Well that's not quite right... It found 4 of length=1, 1 of length=2 (*wait - think about why it came up with just 1 of these and not 2*), and 1 of length 3.
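
For reference, `str.count` only counts non-overlapping matches: once it has matched `'11'`, it resumes searching after the end of that match. A minimal sketch of an overlapping count for comparison (the function name `count_overlapping` is made up here) steps forward one character at a time using `str.find`:

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, allowing overlaps (hypothetical helper)."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # resume one character later, not after the whole match
        start = s.find(sub, start + 1)
    return count

foo = '01001110'
print(foo.count('11'), count_overlapping(foo, '11'))
```

Neither number is the run count we are after, though: both treat part of a longer run as a shorter one.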

What if we made it look for 0's beforehand to make sure we're at the start of a run?

In [61]:
print(foo.count('01'))
print(foo.count('011'))
print(foo.count('0111'))

2
1
1


Closer, I suppose, but still not there.  What if we looked for the whole thing: start with a 0, then the run, then end with a 0?

In [15]:
print(foo.count('010'))
print(foo.count('0110'))
print(foo.count('01110'))

1
0
1


Hey, that looks good!  Let's just test it one more time though.

In [62]:
foo='0101010101010101010101010'
print(foo.count('010'))
print(foo.count('0110'))
print(foo.count('01110'))

6
0
0


We were so close weren't we? I mean, really now.  6?  I count 12 in there.  *Why is it coming up with only 6?* Once we figure that out, *what might we do about it?*
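
One possible answer, offered as a sketch and not as the intended solution: each match of `'010'` uses up its closing 0, which is also the opening 0 of the next run, so every other run is skipped. Regular-expression lookarounds check the neighbours without consuming them. The helper name `count_exact_runs` is hypothetical.

```python
import re

def count_exact_runs(s, k):
    """Count runs of exactly k 1s in s (hypothetical helper)."""
    # (?<!1) : not preceded by a 1 (also true at the start of the string)
    # (?!1)  : not followed by a 1 (also true at the end of the string)
    pattern = r"(?<!1)1{%d}(?!1)" % k
    return len(re.findall(pattern, s))

foo = '0101010101010101010101010'
print(count_exact_runs(foo, 1), count_exact_runs(foo, 2))
print(count_exact_runs('1101', 2), count_exact_runs('1101', 1))
```

Because the lookarounds also succeed at the very start and end of the string, a sequence that begins or ends with a 1 needs no special handling here.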

Now, write a function that:

1. Takes in a string and counts the runs of 1's of each length from 1 to 8.  Remember, your string could start or end with a 1, so any solution you come up with has to handle this.
2. Divides those counts by the length of the string itself to, in some ways, normalize this so that short and long inputs are on roughly an even "odds of run-length X" kind of footing.  No, you can't use numpy.  A key point here is to think around obstacles for solutions.
3. Returns those "probabilities" as a list.

Then, run this on both your user-generated string and on the random string and give me a pretty printout of the results.
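
For a sense of what to expect from the random string, here is a rough estimate, assuming a fair coin and ignoring edge effects. Away from the ends of the string, a run of exactly k 1's starting at a given position needs a 0, then k 1's, then a 0, which has probability (1/2)^(k+2). So for a long random string, the number of runs of length k divided by the string length should land near that value.

```python
# Approximate fair-coin expectation for (runs of exactly k 1s) / (string length);
# this ignores the small edge effects at the start and end of the string.
expected = [0.5 ** (k + 2) for k in range(1, 9)]
for k, p in enumerate(expected, start=1):
    print(f"run length {k}: about {p:.5f}")
```

Each extra 1 in the run halves the value, so long runs are rare but not impossible in a truly random sequence.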

In [10]:
def CalcRunLengthProbs(s):
    '''Return a list with the number of runs of 1s of length 1 through 8 in
    the string s, each divided by the length of s.'''
    n = len(s)
    # Pad with a 0 at each end so runs at the start or end of s are still bounded by 0s
    padded = "0" + s + "0"
    # Double every 0 so neighbouring runs never share a boundary 0; otherwise the
    # non-overlapping count would skip every other run in something like '01010'
    padded = padded.replace("0", "00")

    prob_list = []  # one entry per run length
    for run_length in range(1, 9):
        pattern = "0" + "1" * run_length + "0"  # exactly run_length 1s, bounded by 0s
        n_count = padded.count(pattern)  # number of runs of this length
        prob_list.append(n_count / n)  # normalize by the original length

    return prob_list  # probabilities for run lengths 1 through 8




In [9]:
print(f'list of probabilities for randomly generated sequence: {CalcRunLengthProbs(rndseq)}\n user-generated sequence: {CalcRunLengthProbs(userseq)}')




1000
00000000001100000000100110010010011100000000111001000000000011000000110000001110000001001001000010010010000110010000100000010010000100110010000000010011000000000000000011001100100001100001100000011000011100000000110010011000000001000000100001000000001000011100000000000000001100001000010000100001110000000000100001100000011100000000001110011000011000000111001110010000100000011111001100100000000100100000000111000011110010010000100100111111000010011100111001111100100110011000011111000011111001000010000100001100001111100001000011001000000110000001110011111100001100100001000000100110000111001110000111000010000001100100001100100110011100100000011100110010011000000000010000000011100000011100100000010010000111100110011100100001000010000110000110000110011000011000010000000010011000000100100100000011100100000000001001001001001111111111111111000000001000000001110000001110000000011100100110010010000001111110000000000100001000010010000001000010000000000001000010000100000010010000110011110011000