# Lesson 2.0: Types, containers, random numbers

## Recap

Let's just take a moment to remind ourselves how far we've come.  Here are some of the many things that we know how to do already:
- basic mathematical operations
- function definition
- iteration and recursion with loop control structures `for` and `while`
- keeping track of variables outside of loops (i.e., counting and summing)
- calling functions *inside* of loops
- conditionals with `if`, `else`, and `elif`

## Motivation

I mentioned at the beginning of the workshop that one of the uses of computation in the physical sciences was for simulating experiments or apparatus that are prohibitively complicated.  It's important to note that studying a simulation is **not** the same as studying the "real world" -- the behavior of the simulation will depend on our current understanding of the laws that govern the real world, and these may not be correct!  Still, simulations are very useful for hypothesis testing.  To that end, check this out:
- Cue evolution simulation code.  What hypotheses can we test? (Note: If you're not actually in a classroom with me or watching the video right now, you won't be seeing this!  The demo is similar-ish to David Randall Miller's work [here](https://www.youtube.com/watch?v=N3tRFayqVtk).  Very cool!)

### Types and casting

When we use a computer to do a calculation, input and output quantities need to be stored/accessed by the computer.  **How** the computer does this depends on what **type** of quantity we're storing.  "Type" is an important concept in computer science.

We have so far encountered a few different types of quantities that `python` knows about:

#### `int` --> Integers 

Integers can be signed (positive or negative (or 0))

In [2]:
a = 4
b = -2
c = 180753806735751753
d = -200000000000000000000000000000000000000000000000000000
print(a, b, c, d)
type(b)

4 -2 180753806735751753 -200000000000000000000000000000000000000000000000000000


int

Use the cell below to see if you can figure out the maximum size for an integer.  (*I.e.*, what's the biggest `int` that `python` can handle?  Spoiler alert: Don't spend too much time on this!)
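
Here's the spoiler, as a minimal sketch: `python` integers have no fixed maximum size.  They just keep growing until your computer runs out of memory.  (The names `big` and `bigger` are made up for this illustration.)

```python
# python ints are arbitrary-precision: they don't overflow at 32 or 64 bits
big = 2 ** 200            # far bigger than anything that fits in 64 bits
print(big)
print(type(big))          # still an int

bigger = big * big + 1    # arithmetic on huge ints stays exact
print(bigger - big * big) # 1
```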

#### `float` --> Floats 

Floats are numbers with a finite number of digits to the right of the decimal.  They can be expressed in regular decimal form or in scientific notation.  You're almost certainly running this notebook on a 64-bit computer, which means your floats can have up to about 16 digits of precision.  (Variable `f` below gets truncated at 16 digits on my computer.)

In [3]:
e = 2.71828
f = -358735.3582636827067923067236e-10
g = -2.0e10
print(e, f, g)
type(g)

2.71828 -3.587353582636827e-05 -20000000000.0


float
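
To see that limited precision in action, here's a small sketch that uses the standard `sys` module to ask `python` about its floats.  The variable name `tenth_sum` is just for illustration.

```python
import sys

# number of decimal digits a float is guaranteed to preserve
print(sys.float_info.dig)

# the gap between 1.0 and the next float after it
print(sys.float_info.epsilon)

# limited precision means some "obvious" equalities fail
tenth_sum = 0.1 + 0.2
print(tenth_sum)
print(tenth_sum == 0.3)              # False

# so compare floats with a tolerance instead of ==
print(abs(tenth_sum - 0.3) < 1e-12)  # True
```

The takeaway: avoid testing floats with `==` unless you really mean "identical down to the last bit".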

#### `bool` --> Booleans 

Booleans are the outcomes of logical operations.  There are only two boolean values: `True` and `False`.  When we implement comparison operators, the outcomes are booleans.

In [5]:
h = 2 > 1
i = 2 < 1
j = (1 == 1.0000000001)
print(h, i, j)
type(j)

True False False


bool

#### `str` --> Strings

Strings are sequences of alphanumeric and punctuation characters.  We've used strings a little bit in our `print` statements.  Strings are initialized with either double quotes (`"   "`) or single quotes/apostrophes (`'   '`).

In [6]:
k = "Mikey"
m = 'likes'
n = 'bikes'
print(k, m, n)
type(n)

Mikey likes bikes


str

***

There are several other data types built into `python`, some of which we'll encounter later.  

You may wonder why this is necessary.  (Do you?)  There are two reasons why type is important:

**First**, in order to make efficient use of memory, the computer stores/accesses different types of data differently.  For example, there is a maximum size `float` that `python` can handle:

In [None]:
big_float = 2.0e309   # a new name, so that we don't overwrite a = 4 from earlier
print(big_float)

Haha, oops.  `python` gives up and reports `inf` (infinity).  Efficient memory usage is less of an issue these days, as computer memory and speed are vastly greater than they were in the early days of computing, but it is very much still an issue at the extremes (size, speed) of computing.  
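
If you're curious where the limit actually sits, here's a sketch using the standard `sys` module.  The names `largest` and `overflowed` are made up for this example.

```python
import sys

largest = sys.float_info.max    # the largest finite float on this machine
print(largest)

overflowed = largest * 10       # too big to store, so python hands back infinity
print(overflowed)
print(overflowed == float("inf"))
```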

**Second**, the types of operations that can be done with a piece of data depend on what type of data it is!  Most programming languages have built-in rules that prevent you from doing something nonsensical.  For example, the following operations are allowed:

In [8]:
print(a, e, a + e)
print(b, f, b / f)
print(k, n, k + " " + n)
print(h, j, h and j)

4 2.71828 6.71828
-2 -3.587353582636827e-05 55751.40431320216
Mikey bikes Mikey bikes
True False False


The following is not.  Give it a whirl!  Can you decipher the error message?

In [9]:
a + k

TypeError: unsupported operand type(s) for +: 'int' and 'str'

So `python` has some built-in guard rails that keep ~~me~~ us from making dumb mistakes.

***

What about the following:

In [11]:
var1 = 62
var2 = False
print(var1 + var2)

62


What the WHAT!?

`python` is what's called a *dynamically typed* language, meaning that types belong to values rather than to variable names, so a variable can end up holding a different type on the fly.  That's not quite what's happening here, though: in `python`, `bool` is a special kind of `int`, so when we ask `python` to add `var2` to an integer, it treats `False` as 0.  In fact, many programming languages have this equivalence between the booleans and integers: `True` --> 1, `False` --> 0.
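
Here's a quick sketch of that boolean/integer equivalence.  One handy consequence is that adding up booleans counts how many of them are `True`.  (The names `n_even` and `value` are just for illustration.)

```python
# bool is a subclass of int, so True and False behave like 1 and 0
print(isinstance(True, int))   # True
print(True + True)             # 2
print(False * 10)              # 0

# count the even numbers from 0 to 9 by adding up booleans
n_even = 0
for value in range(10):
    n_even = n_even + (value % 2 == 0)
print(n_even)                  # 5
```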

Another consequence of dynamic typing is that if I initialize a variable as a certain type, the type can change based on what I do with the variable.  Let's use the `type()` function to check the type of a variable:

In [12]:
var3 = 2
print(var3)
print(type(var3))
var3 = var3 + 0.1
print(var3)
print(type(var3))

2
<class 'int'>
2.1
<class 'float'>


In most situations, this feature of `python` allows us to be a bit careless/sloppy with quantities.  Yay!  But, there are instances where this sloppiness could cause problems.  Boo!

One last feature having to do with types.  We can actively change the type of a quantity by *casting* it to another type.  We do this with functions named after the various types. There's a lot of interesting stuff that goes on behind the scenes when casting, but we'll ignore this for now.  Check these operations out:

In [13]:
var4 = 432
print(var4, float(var4))
print(var4, str(var4))
print(var4, bool(var4))
print()

var5 = True
print(var5, int(var5))
print(var5, str(var5))
print(var5, float(var5))
print()

var6 = 2.6835
print(var6, int(var6))
print()

var7 = "yowza"
print(var7)
#print(int(var7)) # this will fail... why???

432 432.0
432 432
432 True

True 1
True True
True 1.0

2.6835 2

yowza


Note that casting a `float` to an `int` has a potentially useful behavior!
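
The useful behavior is that `int()` throws away everything after the decimal point (it truncates toward zero) rather than rounding.  Here's a small sketch comparing it with `round()` and `math.floor()`; the names `pos` and `neg` are made up for this example.

```python
import math

pos = 2.6835
neg = -2.6835

print(int(pos), int(neg))                 # 2 -2   (chops toward zero)
print(round(pos), round(neg))             # 3 -3   (nearest integer)
print(math.floor(pos), math.floor(neg))   # 2 -3   (always rounds down)
```

Notice that `int()` and `math.floor()` agree for positive numbers but not for negative ones.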

*** 

## Random number generation

Generating and analyzing large random or pseudo-random datasets is difficult to do by hand.  This is one area where computing has allowed humans to make amazing progress in the statistical sciences.  

For example: In the old days, if I wanted to investigate the large-scale/statistical properties of something like a game of blackjack (you know, for cheating reasons), I had to actually *deal* many games of blackjack.  Ugh!  Gross.  That would take so much time!  With computers, I can generate millions of blackjack hands according to the deck properties and history in seconds with a bit of fairly basic programming.  

Problems like this hinge on a) the computer's ability to do loops and b) the computer's ability to generate random numbers.

There are several `python` libraries that provide random number generation.  We learned in a previous lesson that a *library* is a bit of code that `python` is able to use (almost like a book of extra knowledge), but that `python` doesn't pre-load.  In order to have access to the built-in stuff in the random number generation library, we need to tell `python` to load it before we do any calculations.  

This process is analogous to the way that humans work.  Pretend that you and I are going on a camping trip on which we'll need to defend ourselves from bears.  You'll need to learn the skills before we go.  You *are capable* of doing it, you just don't know *how* to do it yet.  SO!  Before we leave and start camping, I ask you to go to the library and read the book "HOW TO DEFEND ONESELF FROM BEARS" to load the knowledge into your brain/mind.  After you do that, I can call on you to execute many bear-defense strategies: run, jump, climb tree, bear headlock, play dead, *etc*.

Okay enough goofing around.  Here's an example:

In [14]:
import random as rand # this line imports the random module and gives it a shorter handle 
a = rand.random()     # this line calls the random() function that lives within the rand module
print(a)

0.0574009340563999


In [22]:
a = rand.random()     # this line calls the random() function that lives within the rand module
print(a)

0.8007705583212595


In the first line, I imported the `random` module and called it something shorter, `rand` (a nickname of sorts so that I don't have to type as much).  In the second line, I told python to call the `random()` function of `rand`.  This function generates a random `float` between 0 and 1 (0 is a possible value; 1 is not).  Execute the cell above a few times to see how the random number varies.

The random module has LOTS of features in addition to `random()`.  It's quite sophisticated.

Let's put `random()` inside of a loop to generate a whole mess of numbers:

In [23]:
for i in range(20):
    num = rand.random()
    print(num)

0.6829679951948412
0.5251547179876317
0.22259135540121477
0.26491892891931446
0.6893204359954785
0.17712838503091977
0.42528643978199987
0.20260853348866326
0.2492744619212257
0.1303332055697538
0.5766944782774684
0.9333838892007877
0.4289006686508299
0.12887181484182986
0.10484153830781517
0.18839822275264273
0.1728562190386722
0.7133669052603396
0.7491400128371849
0.30530684003560127


Every time we use `rand.random()`, it generates a new number between 0 and 1.  Good to know.

There are MANY ways to use this very simple tool.  One can transform the range of the numbers quite easily.  Here's a loop that prints 20 random numbers between -1 and 1:

In [24]:
for i in range(20):
    rnum = rand.random()
    new_num = 2.0*rnum - 1.0
    print(new_num)

0.2272450328672353
0.1136361833020696
-0.3260629704232547
-0.7254097503514634
0.5902225060920949
0.6264012701916299
-0.8299818312929632
-0.1548348210591941
0.8169871684060466
-0.7086381548741578
-0.26547877990676194
0.06780780776474571
-0.8344039790034921
0.6064708320342023
0.3576320285190706
0.39477566237509487
-0.8490041449040153
0.10380293058088674
0.16971681434493457
0.7384089608162074
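
The same stretch-and-shift trick works for any range.  Here's a minimal sketch, assuming a helper function that I'm calling `scaled_random` (a made-up name); the `random` module's own `uniform(low, high)` does the same job.

```python
import random as rand

def scaled_random(low, high):
    """Return a random float between low and high by stretching and shifting random()."""
    return low + (high - low) * rand.random()

# print a few homemade values next to the built-in equivalent
for i in range(5):
    print(scaled_random(-1.0, 1.0), rand.uniform(-1.0, 1.0))
```

With `low = -1.0` and `high = 1.0`, the formula reduces to `2.0*rnum - 1.0`, the same expression used in the loop above.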


Here's a snippet that writes a bunch of random points in two-dimensional space:

In [25]:
for i in range(20):
    x = rand.random()
    y = rand.random()
    print("(" + str(x) + ", " + str(y) + ")")

(0.07027799619441388, 0.6860808568278013)
(0.251281388566268, 0.8553620870440376)
(0.7807111162587349, 0.4909935985732453)
(0.8714379373299315, 0.6716208536795922)
(0.2484776683795632, 0.13414075739273945)
(0.5640277024189329, 0.9794865521457568)
(0.9579798943227955, 0.1827332911100633)
(0.7372158174327819, 0.0807522914048121)
(0.5784721030649663, 0.42397257107616815)
(0.44481646761737337, 0.512413886198205)
(0.9489629470160428, 0.3954809260432661)
(0.23950148957987083, 0.4738752761451882)
(0.482658661760464, 0.9644341713522387)
(0.29419596322732966, 0.030416455664954878)
(0.022762629500312714, 0.04290095234246649)
(0.8265812654183523, 0.3633226422278817)
(0.38463638022976365, 0.8478528527180352)
(0.38587194781128886, 0.673878337811147)
(0.19075546966159584, 0.5278492805480239)
(0.4431108211291528, 0.013174794770997922)


One can also input random numbers into a function:

In [26]:
def parab(x):
    return x**2 - 4*x + 3

for i in range(20):
    input_x = rand.random()
    parab_value = parab(input_x)
    print(parab_value)

2.279249292075011
1.0690044014923918
0.15099572697209407
1.1137113037169488
1.0933594621666847
2.6647503020641365
2.032444253533744
0.45692058393448765
2.518110079438924
2.9641165897760113
1.7579484449772205
0.08623668137211604
2.258048467633224
1.3609438187707756
0.9768760188766823
2.4257713574525708
2.844668402062223
2.3907235649849956
0.17868701272267362
0.11047196070525711


These tools may all seem pretty weak, but they are the foundations of **powerful techniques**.  Scientists use the same fundamental processes to do things like study the performance of nuclear reactors, model galaxies, and simulate protein folding.  Generating random numbers is quite useful in studying quantum mechanics, too!  You may know that quantum-mechanical processes are non-deterministic; the outcome of a measurement occurs randomly according to the probability distribution of all of the possible outcomes.  Computing has allowed us to learn so much about QM that we wouldn't have otherwise.

Random `floats` (remember what those are) are cool, but there are some systems that exhibit discrete behavior.  For example, if I wanted to write a random number generator to simulate you randomly choosing a card from a deck, I wouldn't want the outcome to be that you chose the 31.467459267th card.  That doesn't make sense.

The random module has tools for generating random integers.  For example, `rand.randint(1, 100)` generates a random integer between 1 and 100, with both endpoints included.  This is useful for many types of real-life decisions:

Hey, `python`, how many kids should I have?

In [37]:
rand.randint(1,100)

69

Ummmmmmm, no.

Hey, `python`, how many tacos should I eat tonight?

In [38]:
rand.randint(1,100)

90

Sage advice.  Now you're talking.
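
One detail worth checking for yourself: `randint(a, b)` can return *both* `a` and `b`.  Here's a sketch that rolls a simulated six-sided die many times and keeps track of the smallest and largest rolls seen.  (The variable names are made up for this example.)

```python
import random as rand

smallest = 6
largest = 1
for i in range(10000):
    roll = rand.randint(1, 6)   # simulate a six-sided die
    if roll < smallest:
        smallest = roll
    if roll > largest:
        largest = roll

# with this many rolls, we should see both endpoints
print(smallest, largest)
```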

There are lots of other things that one can do with random numbers, but it's probably best to become acquainted with the simple code and then learn more once you have a need.  Let's do a warm-up problem to get the juices flowing.

#### Live demo!
Generate 100 random integers between 0 and 40 or 60 and 100 (i.e., omit any numbers between 41 and 59).  You'll probably need an if statement.  Make sure there are 100!


In [39]:
n_acceptable = 0
while n_acceptable < 100:
    r_num = rand.randint(0, 100)
    if r_num <= 40 or r_num >= 60:
        n_acceptable = n_acceptable + 1
        print(r_num)

88
83
8
5
62
3
92
8
2
86
70
69
90
87
3
84
100
96
28
3
92
67
71
19
97
93
99
30
14
82
99
37
9
79
34
13
94
91
92
23
73
71
72
4
20
13
86
79
92
3
17
82
27
69
65
29
65
91
3
26
76
78
12
80
13
67
0
32
29
18
69
19
89
37
76
21
62
65
72
5
10
78
71
87
7
85
70
0
62
80
62
99
84
38
85
25
37
80
65
34


* * *

## Lists

Now that we've talked about **types** we can talk about **containers**.  Here's an analogy to motivate us:

> An egg is a type of food.  There are many different containers that we can put eggs in: an egg carton, a box, your hand, a bookbag.  These containers could also be used for other types of food.  *E.g.* I could put raisins in an egg carton, a box, your hand, *etc*.  The appropriate container depends on how we want to prepare or use the food.

That analogy didn't work as well as I'd hoped, but know that most programming languages have containers for packaging collections of values.  

Probably the most useful of these in `python` is the *list*.  A list is an ordered collection of elements.  A list has a first element, a second element, ..., and a last element.  Each element has a number, which we call the index, indicating its place in the list.

NB: In many other languages, similar objects are called *arrays*.  I often accidentally call `python` lists "arrays".  We'll encounter `python` arrays in a future lesson, and you may or may not see the differences.

The most basic way to initialize a list is the following:

In [40]:
primes_lt20 = [2, 3, 5, 7, 11, 13, 17, 19]

Lists are a big deal in `python`, and the language offers many ways to work with and manipulate them.  First, we can access an individual element of a list by specifying its index number in **square brackets**:

In [43]:
primes_lt20[3]

7

This might look a little weird... after all, we asked for index 3, but 7 is the 4th element of the list (7 is the 4th prime number).

The reason that index 3 returns the fourth element in the list is that python is a "zero-indexed language", meaning that the indices of list elements begin with 0.  

Check this out:

In [44]:
primes_lt20[0]

2

OH *OKAY!*  This is one of those rules that is hard to internalize.  The indices just begin with 0 in python (and many other [but not all!] modern languages).

We can check the length of the list:

In [45]:
len(primes_lt20)

8

There are also a bunch of things that we can do with the index notation:

In [46]:
primes_lt20[0:3]

[2, 3, 5]

The cell above returns a new list that is a subset of the original list from index 0 up to (but not including) index 3.
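
Here are a few more variations on this slice notation, as a sketch.  I'm using a fresh list called `primes` (a made-up name) so that we don't disturb `primes_lt20`.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]

print(primes[2:5])   # [5, 7, 11]       indices 2, 3, 4
print(primes[:3])    # [2, 3, 5]        leave out the start to begin at index 0
print(primes[5:])    # [13, 17, 19]     leave out the stop to run to the end
print(primes[::2])   # [2, 5, 11, 17]   a third number sets the step size
print(primes)        # slicing makes a new list; the original is unchanged
```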

You can also get the last element in the list by using the index -1:

In [47]:
primes_lt20[-1]

19

From here, you can get creative.

In [48]:
primes_lt20[-3]

13

We can also add to (append, insert) or remove from (pop, remove) a list:

In [49]:
print(primes_lt20)
primes_lt20.append(23)
print(primes_lt20)

[2, 3, 5, 7, 11, 13, 17, 19]
[2, 3, 5, 7, 11, 13, 17, 19, 23]


In [50]:
print(primes_lt20)
primes_lt20.insert(3,301)
print(primes_lt20)

[2, 3, 5, 7, 11, 13, 17, 19, 23]
[2, 3, 5, 301, 7, 11, 13, 17, 19, 23]


In [51]:
print(primes_lt20)
primes_lt20.pop(3)
print(primes_lt20)

[2, 3, 5, 301, 7, 11, 13, 17, 19, 23]
[2, 3, 5, 7, 11, 13, 17, 19, 23]


In [52]:
print(primes_lt20)
primes_lt20.pop()
print(primes_lt20)

[2, 3, 5, 7, 11, 13, 17, 19, 23]
[2, 3, 5, 7, 11, 13, 17, 19]


In [53]:
print(primes_lt20)
primes_lt20.remove(11)
print(primes_lt20)

[2, 3, 5, 7, 11, 13, 17, 19]
[2, 3, 5, 7, 13, 17, 19]


That last one is a little funky.  `remove(elem)` removes the first element in a list that is equal to `elem`.
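
To see the "first element" part in action, here's a sketch with a list that contains a repeated value.  It also shows that `pop` hands back the element it removed.  (The list `rolls` is made up for this example.)

```python
rolls = [4, 6, 1, 6, 3]

rolls.remove(6)        # removes only the first 6
print(rolls)           # [4, 1, 6, 3]

last = rolls.pop()     # pop returns the element it removed
print(last, rolls)     # 3 [4, 1, 6]
```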

***

`python` has all kinds of cool built-in functions for dealing with lists.  For example, we can loop over the elements in a list using `for`:

In [54]:
for num in primes_lt20:
    print(num, num == 13)

2 False
3 False
5 False
7 False
13 True
17 False
19 False


We can ask `python` how many elements are in a list:

In [None]:
print(len(primes_lt20))

We can ask `python` whether a value appears in a list:

In [56]:
12 in primes_lt20

False

... or we can ask `python` *where* a value appears in a list:

In [57]:
primes_lt20.index(13)

4
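
These two tools work well together: `index` raises a `ValueError` if the value isn't in the list, so it's often worth checking with `in` first.  A minimal sketch, using a fresh list `primes` and a made-up variable `target`:

```python
primes = [2, 3, 5, 7, 13, 17, 19]

target = 11
if target in primes:
    print(target, "is at index", primes.index(target))
else:
    print(target, "is not in the list")
```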

Here is a structure that I use all the time (and you will after our next lesson).  Let's say that I have a list of random numbers, I may not know how many, and want to loop over these numbers and square them.  Please try to understand each of the lines below. 

In [61]:
# let's first create the list
list1 = []    
for i in range(rand.randint(1, 100)):
    list1.append(rand.random())

print("We have " + str(len(list1)) + " elements in our list.\n")

# then we'll do a for loop
for i in range(len(list1)): 
    num = list1[i]
    print(i, '\t', num, '\t', num**2)

We have 42 elements in our list.

0 	 0.33790915291579027 	 0.11418259562426693
1 	 0.43261161768083645 	 0.1871528117524302
2 	 0.5143229846810862 	 0.2645281325712608
3 	 0.9143695887756409 	 0.8360717448777346
4 	 0.8187195108583851 	 0.6703016374601934
5 	 0.25935667359390235 	 0.06726588413769401
6 	 0.8987272992750669 	 0.8077107584622557
7 	 0.7046964128101716 	 0.49659703422752377
8 	 0.7733861879542079 	 0.5981261957183414
9 	 0.7348588378075896 	 0.5400175115039213
10 	 0.45849321326405623 	 0.21021602660919936
11 	 0.1661815088338643 	 0.027616293878299714
12 	 0.2575457644577369 	 0.0663298207901201
13 	 0.6157489985976395 	 0.3791468292739959
14 	 0.2228935977718317 	 0.0496815559276711
15 	 0.057838330701021734 	 0.0033452724982807533
16 	 0.955341721765989 	 0.9126778053468043
17 	 0.7501107150978673 	 0.5626660849046339
18 	 0.8030766632527737 	 0.644932127061209
19 	 0.9177345326410229 	 0.8422366724018368
20 	 0.6269027022238646 	 0.3930069980555834
21 	 0.06357006854

The block above makes a list containing a random number of random numbers, and then "runs over" the elements in this list and squares each.
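
Here's an alternative sketch of the same idea that uses `enumerate` to get the index and the element at the same time, and stores the squares in a second list instead of only printing them.  The names `values` and `squares` are made up for this example.

```python
import random as rand

# build a list containing a random number of random numbers
values = []
for i in range(rand.randint(1, 10)):
    values.append(rand.random())

# enumerate gives us the index i and the element num together
squares = []
for i, num in enumerate(values):
    squares.append(num**2)
    print(i, '\t', num, '\t', num**2)

print("We squared " + str(len(squares)) + " numbers.")
```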

For my last list trick, I'll show you that `python` knows how to do some simple operations with multiple lists.  In some cases, the outcomes are obvious.  In other situations, python's interpretation of the syntax may be surprising:

In [62]:
aL = [1, 6, 3, 7, 9, 2, 4, 6, 7]
bL = [7, 8, 9, 0]

aL + bL

[1, 6, 3, 7, 9, 2, 4, 6, 7, 7, 8, 9, 0]

In [63]:
aL.sort()
print(aL)

[1, 2, 3, 4, 6, 6, 7, 7, 9]
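
Two of the possible surprises, as a sketch: `sort()` rearranges the list in place and returns `None` (the built-in `sorted()` returns a new list instead), and `+` glues lists together rather than adding them element by element.  All of the names below are made up for this example.

```python
scores = [3, 1, 2]
ordered = sorted(scores)     # makes a new sorted list
print(scores, ordered)       # [3, 1, 2] [1, 2, 3]

result = scores.sort()       # sorts in place and returns None
print(scores, result)        # [1, 2, 3] None

# + concatenates; adding element by element takes a loop
xs = [1, 2, 3]
ys = [10, 20, 30]
sums = []
for i in range(len(xs)):
    sums.append(xs[i] + ys[i])

print(xs + ys)               # [1, 2, 3, 10, 20, 30]
print(sums)                  # [11, 22, 33]
```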


It's also worth noting that `python` is perfectly happy with lists that contain different types of data.  For example, some programming languages would freak out about the following:

In [65]:
mixed_list = [1, 2, 4.5, -6.2, 'a phrase', 99, "hey, another phrase"]
print(mixed_list)
mixed_list.sort()

[1, 2, 4.5, -6.2, 'a phrase', 99, 'hey, another phrase']


TypeError: '<' not supported between instances of 'str' and 'int'

So storing mixed types is fine, but sorting them is not: `sort()` has to compare elements with `<`, and `python` refuses to compare a `str` with an `int`.

* * *

## Dictionaries

Lists are ordered collections of data -- "ordered" means that the order of the elements is an important feature of the structure, and indexing these elements with an integer makes sense.  `python` also has a way (actually several ways/containers) of storing data for which ordering isn't that important (but still exists).  

A **dictionary** is like a list, but the elements are indexed by **key**s, which are often strings (other unchangeable values like numbers work too).  In a `dict`, each **value** is associated with a **key**.  This might sound crazy, but it works quite a bit like a simple 2-column data table.  Dictionaries are initialized in the following way:

In [None]:
students_year = {'Abdoul': 2, 'Aiden': 2, 'Aimee': 2, 'Gannon': 2, 
                 'Gaurav': 2, 'Gia': 2, 'Maddy': 1, 'Alex': 4}

print(type(students_year))

We can access the elements in the dictionary in the same way as for a list, but now the "indices" are the key strings:

In [None]:
students_year['Abdoul']
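
The same square-bracket notation also adds or updates entries.  Here's a minimal sketch using a small dictionary called `years` (a made-up name, so that `students_year` stays untouched) and a made-up missing key `'Zed'`.

```python
years = {'Maddy': 1, 'Alex': 4}

years['Gia'] = 2           # a new key adds an entry
years['Maddy'] = 2         # an existing key overwrites the old value
print(years)

print('Alex' in years)     # True: `in` checks the keys
print(years.get('Zed'))    # None: get() avoids the KeyError that years['Zed'] would raise
print(len(years))          # 3
```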

Note that there are **two** pieces of important information for each element in a dictionary: the key and the value.  Because order is less important for a dictionary, there are several ways of looping over the elements in it.  Check out these implementations:

In [None]:
for k in students_year.keys():
    print(k)

# for v in students_year.values():
#     print(v)
    
# for key, value in students_year.items():
#     print(key, 'is my favorite student in year', value, '!')

Note that even though ordering is not important in a dictionary, there is a well-defined order, *viz.* the order in which the keys were inserted (guaranteed since `python` 3.7).  There are various other fancy ways to iterate over the dictionary.  Some of these impose an index on the dictionary.  Here's an example:

In [None]:
for i, (k, v) in enumerate(students_year.items()):
    print(i, k, v)

* * *
## Live demos!

#### 1. Max finder
Write a program that generates a list of 100 random integers between 0 and 10^6 (inclusive), and then finds the largest number in the list.  Your program should NOT use any built-in methods/functions like `max`.

In [75]:
rand_list = []
for i in range(100):
    r_num = rand.randint(0, 1000000)  # randint includes both endpoints
    rand_list.append(r_num)

largest_num = 0
for num in rand_list:
    if num > largest_num:
        largest_num = num

print(largest_num)

998447


#### 2. A simple card game simulation
Think of a deck of cards as containing cards numbered 1 (ace) through 13 (king) of each suit for a total of 52 cards.  
- First, generate a list that contains the numbers of the cards.
- Write a block of code that randomly selects three cards from the deck without replacing each card.  (Be careful!)
- Check whether the cards drawn contain a pair.
- Do the above steps $N = 5000$ times by enclosing them in a loop.  Count the number of times you drew a pair and call this number $p$.
- Calculate the probability of drawing a pair in three cards by dividing $p/N$.
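
Here's one possible sketch of the steps above, not the only way to do it.  Two assumptions: I'm using the `random` module's `sample` function as a shortcut for drawing without replacement (you could also `pop` cards out of a copy of the deck), and I'm counting a hand as "containing a pair" if at least two of the three cards match, so three of a kind counts too.  The names `deck`, `hand`, `suit`, `rank`, and `trial` are made up for this example.

```python
import random as rand

# build the deck: four copies (one per suit) of each card number 1 through 13
deck = []
for suit in range(4):
    for rank in range(1, 14):
        deck.append(rank)

N = 5000
p = 0
for trial in range(N):
    # sample picks three different positions in the deck, so no card is drawn twice
    hand = rand.sample(deck, 3)
    if hand[0] == hand[1] or hand[0] == hand[2] or hand[1] == hand[2]:
        p = p + 1

print("Estimated probability of a pair:", p / N)
```

Under the same assumption, the exact answer is 1 - (48/51)(44/50), or about 0.17, so the estimate should land close to that.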