# Lecture 1 - Optimization and the Knapsack Problem
## Exercise 1
6/6 points (graded)

As a burglar robs a house, she finds the following items:

* Dirt - Weight: 4, Value: 0
* Computer - Weight: 10, Value: 30
* Fork - Weight: 5, Value: 1
* Problem Set - Weight: 0, Value: -10

This time, she can only carry a weight of 14, and wishes to maximize the value to weight ratio of the things she carries. She employs three different metrics in an attempt to do this, and writes an algorithm in Python to determine which loot to take.

The algorithm works as follows:

1. Evaluate the metric of each item. Each metric returns a numerical value for each item.
2. For each item, from highest metric value to lowest, add the item if there is room in the bag.

Describe the heuristic that each of the following 3 metrics uses, and choose the result of running the algorithm with each metric.

1. Metric 1:
```py:
def metric1(item):
    return item.getValue() / item.getWeight()
```
Which heuristic does Metric 1 employ?

- [ ] Choose the lightest object first.
- [ ] Choose the most valuable object first.
- [x] Choose the item with the best value to weight ratio first.

What will be the result of running the burglar's algorithm with Metric 1?

- [ ] The algorithm runs and returns the optimal solution.
- [ ] The algorithm runs and returns a non-optimal solution.
- [x] The algorithm does not run.

2. Metric 2:
```py:
def metric2(item):
    return -item.getWeight()
```

Which heuristic does Metric 2 employ?

- [x] Choose the lightest object first.
- [ ] Choose the most valuable object first.
- [ ] Choose the item with the best value to weight ratio first.

What will be the result of running the burglar's algorithm with Metric 2?

- [ ] The algorithm runs and returns the optimal solution.
- [x] The algorithm runs and returns a non-optimal solution.
- [ ] The algorithm does not run.

3. Metric 3:
```py:
def metric3(item):
    return item.getValue()
```

Which heuristic does Metric 3 employ?

- [ ] Choose the lightest object first.
- [x] Choose the most valuable object first.
- [ ] Choose the item with the best value to weight ratio first.

What will be the result of running the burglar's algorithm with Metric 3?

- [ ] The algorithm runs and returns the optimal solution.
- [x] The algorithm runs and returns a non-optimal solution.
- [ ] The algorithm does not run.
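
The two-step procedure described above can be sketched in a few lines. This is a minimal sketch rather than the course's own code: the `Item` class, the `greedy` helper, and the `loot` list are assumptions made for illustration; only the three metrics are taken from the exercise.

```python
class Item:
    """Hypothetical stand-in for the course's Item class."""
    def __init__(self, name, weight, value):
        self.name, self.weight, self.value = name, weight, value

    def getValue(self):
        return self.value

    def getWeight(self):
        return self.weight


def metric1(item):
    return item.getValue() / item.getWeight()

def metric2(item):
    return -item.getWeight()

def metric3(item):
    return item.getValue()


def greedy(items, max_weight, metric):
    """Take items from highest to lowest metric value while they still fit."""
    ranked = sorted(items, key=metric, reverse=True)
    taken, total_weight = [], 0
    for item in ranked:
        if total_weight + item.getWeight() <= max_weight:
            taken.append(item.name)
            total_weight += item.getWeight()
    return taken


loot = [Item('Dirt', 4, 0), Item('Computer', 10, 30),
        Item('Fork', 5, 1), Item('Problem Set', 0, -10)]

for metric in (metric1, metric2, metric3):
    try:
        print(metric.__name__, greedy(loot, 14, metric))
    except ZeroDivisionError:
        # The Problem Set weighs 0, so its ratio cannot be computed.
        print(metric.__name__, 'does not run (division by zero)')
```

Under these assumptions, Metric 2 takes the Problem Set, Dirt, and Fork (total value -9), and Metric 3 takes the Computer, Dirt, and Problem Set (total value 20). Taking the Computer alone is worth 30 at a better value-to-weight ratio, so neither result is optimal.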

## Exercise 2
6/6 points (graded)

Please help the burglar out! For each of the following greedy metrics, what should be the burglar's first two choices of items? Here's a table of the items from the slides:

| **item** | **$** | **kg** | **$/kg** |
| :------: | :---: | :----: | :------: |
| clock    |  175  |   10   |    17.5  |
| picture  |   90  |    9   |    10    |
| radio    |   20  |    4   |     5    |
| vase     |   50  |    2   |    25    |
| book     |   10  |    1   |    10    |
| computer |  200  |   20   |    10    |

For this problem, assume that the maximum weight the burglar can carry is 20.

1. Metric: max value

The burglar should first pick:
* `computer`

and should next pick:
* `no more space`

2. Metric: min weight

The burglar should first pick:
* `book`

and should next pick:
* `vase`

3. Metric: max value/weight ratio

The burglar should first pick:
* `vase`

and should next pick:
* `clock`

## Exercise 3
3/3 points (graded)

For these questions, you'll be asked to give the big-O upper bound runtime for some Python functions. In your answer, please omit the "O( )" portion of the answer and simply write a mathematical expression.

Use +, -, / signs to indicate addition, subtraction, and division. Explicitly indicate multiplication with a * (i.e., say "6*n" rather than "6n"). Indicate exponentiation with a caret (^) (i.e., "n^4" for `n**4`). Indicate base-2 logarithms with the word log2 followed by parentheses (i.e., "log2(n)").

What this all means is if an answer is, for example, `O(n**4)`, please simply write "n^4" in the box.

1. What is the big-O upper bound runtime for the function `look_for_things`, where `n` is defined as the number of elements in `myList`?
```py:
NUMBER = 3
def look_for_things(myList):
    """Looks at all elements"""
    for n in myList:
        if n == NUMBER:
            return True
    return False
```
* `n`
 
2. What is the big-O upper bound runtime for the function `look_for_other_things`, where `n` is defined as the number of elements in `myList`?
```py:
NUMBER = 3
def look_for_other_things(myList):
    """Looks at all pairs of elements"""
    for n in myList:
        for m in myList:
            if n - m == NUMBER or m - n == NUMBER:
                return True
    return False
```
* `n^2`
 
3. What is the big-O upper bound runtime for the function `look_for_all_the_things`, where `n` is defined as the number of elements in `myList`?

You do not need to account for the runtime of `get_all_subsets`; the code is provided so you can see how many subsets it generates, as that **will** be a factor in your answer.
```py:
def get_all_subsets(some_list):
    """Returns all subsets of size 0 - len(some_list) for some_list"""
    if len(some_list) == 0:
        # If the list is empty, return the empty list
        return [[]]
    subsets = []
    first_elt = some_list[0]
    rest_list = some_list[1:]
    # Strategy: Get all the subsets of rest_list; for each
    # of those subsets, a full subset list will contain both
    # the original subset as well as a version of the subset
    # that contains first_elt
    for partial_subset in get_all_subsets(rest_list):
        subsets.append(partial_subset)
        next_subset = partial_subset[:] + [first_elt]
        subsets.append(next_subset)
    return subsets

NUMBER = 3
def look_for_all_the_things(myList):
    """Looks at all subsets of this list"""
    # Make subsets
    all_subsets = get_all_subsets(myList)
    for subset in all_subsets:
        if sum(subset) == NUMBER:
            return True
    return False
```
* `2^n`
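
The `2^n` answer rests on the number of subsets a list of length `n` has. The following sketch counts them independently with `itertools.combinations`; the helper name `count_subsets` is an assumption made for illustration and is not part of the exercise.

```python
from itertools import combinations

def count_subsets(some_list):
    """Count subsets by enumerating every size from 0 to len(some_list)."""
    return sum(1
               for size in range(len(some_list) + 1)
               for _ in combinations(some_list, size))

# Compare the enumerated count with 2**n for a few small lists.
for n in range(6):
    print(n, count_subsets(list(range(n))), 2**n)
```

The two columns agree for every `n` tried, so the loop in `look_for_all_the_things` runs up to `2**n` times in the worst case.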

# Lecture 2 - Decision Trees and Dynamic Programming
## Exercise 1
10.0/10.0 points (graded)

Here is the [lecture from 6.00.1x on generators](https://www.youtube.com/watch?v=BLWn90kEYMk). Additionally, you can also take a look at Chapter 8.3 in the textbook.

For the following problem, consider the following way to write a power set generator. The number of possible combinations to put n items into one bag is `2**n`. Here, items is a Python list. If need be, also check out the [docs on bitwise operators](https://wiki.python.org/moin/BitwiseOperators) (<<, >>, &, |, ~, ^).

The generator below yields all combinations of N items:
```py:
def powerSet(items):
    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test the jth bit of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo
```
As above, suppose we have a generator that returns every combination of objects in one bag. We can represent this as a list of 1s and 0s denoting whether each item is in the bag or not.
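
To make the bit representation concrete, the sketch below reruns the `powerSet` generator on a three-item list and prints, for each integer `i`, its bits (least significant first) next to the combination produced. The list `['a', 'b', 'c']` and the helper `bits` are assumptions made for illustration.

```python
def powerSet(items):
    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test the jth bit of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo

def bits(i, N):
    """Bits of i, least significant first, as a list of 0s and 1s."""
    return [(i >> j) % 2 for j in range(N)]

items = ['a', 'b', 'c']
for i, combo in enumerate(powerSet(items)):
    print(i, bits(i, len(items)), combo)
```

For example, `i = 5` has bits `[1, 0, 1]`, which selects `items[0]` and `items[2]`.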

Write a generator that returns every arrangement of items such that each is in one or none of two different bags. Each combination should be given as a tuple of two lists, the first being the items in bag1, and the second being the items in bag2.
```py:
def yieldAllCombos(items):
    """
      Generates all combinations of N items into two bags, whereby each 
      item is in one or zero bags.

      Yields a tuple, (bag1, bag2), where each bag is represented as 
      a list of which item(s) are in each bag.
    """
```
Note this generator should be pretty similar to the powerSet generator above.

We mentioned that the number of possible combinations for N items into one bag is `2**N`. How many possible combinations exist when there are two bags? Think about this for a few minutes, then click the following hint to confirm if your guess is correct. Remember that a given item can only be in bag1, bag2, or neither bag -- it is not possible for an item to be present in both bags!

<details>
<summary>How many possible combinations exist for N items into two bags?</summary>
<br>

* With two bags, there are `3**N` possible combinations available.
* With one bag we determined there were `2**N` possible combinations available by representing the bag as a list of binary bits, 0 or 1. Since there are N bits, and they can be one of two possibilities, there must be `2**N` possibilities.
* With two bags there thus must be `3**N` possible combinations. You can imagine this by representing the two bags as a list of "trinary" bits, 0, 1, or 2 (a 0 if an item is in neither bag; 1 if it is in bag1; 2 if it is in bag2). With the "trinary" bits, there are N bits that can each be one of three possibilities - thus there must be `3**N` possible combinations.
</details>
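
The counting argument in the hint can be checked by labelling every item 0, 1, or 2 with `itertools.product`. This is only a sketch of the "trinary" idea; the function name `two_bag_assignments` and the three-item list are assumptions made for illustration.

```python
from itertools import product

def two_bag_assignments(items):
    """Yield (bag1, bag2) for every way to label each item 0, 1, or 2."""
    for labels in product((0, 1, 2), repeat=len(items)):
        bag1 = [item for item, label in zip(items, labels) if label == 1]
        bag2 = [item for item, label in zip(items, labels) if label == 2]
        yield bag1, bag2

combos = list(two_bag_assignments(['clock', 'vase', 'book']))
print(len(combos), 3**3)
```

Each label tuple gives a distinct arrangement, and no item can land in both bags because it carries exactly one label.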

```py:
import random


class Item(object):
    def __init__(self, n, v, w):
        self.name = n
        self.value = float(v)
        self.weight = float(w)

    def getName(self):
        return self.name

    def getValue(self):
        return self.value

    def getWeight(self):
        return self.weight

    def __str__(self):
        return '<' + self.name + ', ' + str(self.value) + ', '\
                     + str(self.weight) + '>'


def buildItems():
    return [Item(n,v,w) for n,v,w in (('clock', 175, 10),
                                      ('painting', 90, 9),
                                      ('radio', 20, 4),
                                      ('vase', 50, 2),
                                      ('book', 10, 1),
                                      ('computer', 200, 20))]


def buildRandomItems(n):
    return [Item(str(i),10*random.randint(1,10),random.randint(1,10))
            for i in range(n)]


def yieldAllCombos(items):
    """
        Generates all combinations of N items into two bags, whereby each
        item is in one or zero bags.

        Yields a tuple, (b1, b0), where each bag is represented as a list
        of which item(s) are in each bag.
    """
    length = len(items)
    for i in range(3**length):
        b1, b0 = [], []
        for j in range(length):
            rem = i // 3**j % 3
            if rem == 1:
                b1.append(items[j])
            elif not rem:
                b0.append(items[j])
        yield b1, b0


print(f'{"Bag 1":<38} | {"Bag 0":>38}')
for bo, bz in yieldAllCombos(buildItems()):
    o = ' '.join([i.getName() for i in bo]) if bo else '<NOTHING>'
    z = ' '.join([i.getName() for i in bz]) if bz else '<NOTHING>'
    print(f'{o:<39}|{z:>39}')
```

## Exercise 2
4/4 points (graded)

1. Dynamic programming can be used to solve optimization problems where the size of the space of possible solutions is exponentially large.

- [x] True
- [ ] False

2. Dynamic programming can be used to find an approximate solution to an optimization problem, but cannot be used to find a solution that is guaranteed to be optimal.

- [ ] True
- [x] False

3. Recall that sorting a list of integers can take `O(n log n)` using an algorithm like merge sort. Dynamic programming can be used to reduce the order of algorithmic complexity of sorting a list of integers to something below `O(n log n)`, where n is the length of the list to be sorted.

- [ ] True
- [x] False

4. Problem: I need to go up a flight of `N` stairs. I can either go up 1 or 2 steps every time. How many different ways are there for me to traverse these steps? For example:
```
3 steps -> could be 1,1,1 or 1,2 or 2,1
4 steps -> could be 1,1,1,1 or 1,1,2 or 1,2,1 or 2,1,1 or 2,2
5 steps -> could be 1,1,1,1,1 or 1,1,1,2 or 1,1,2,1 or 1,2,1,1 or 2,1,1,1 or 2,2,1 or 1,2,2 or 2,1,2
```
Does this problem have optimal substructure and overlapping subproblems?

- [x] It has optimal substructure and overlapping subproblems
- [ ] It does not have optimal substructure and does not have overlapping subproblems
- [ ] It has optimal substructure and does not have overlapping subproblems
- [ ] It does not have optimal substructure and it has overlapping subproblems
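
Both properties show up directly in a memoized solution to the stairs problem. The sketch below is one possible way to write it, assuming the helper name `count_ways` and `functools.lru_cache` as the memo: the count for a flight of stairs is built from the counts for the two shorter flights (substructure), and those shorter flights are requested repeatedly (overlapping subproblems).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_ways(steps):
    """Number of ways to climb `steps` stairs taking 1 or 2 steps at a time."""
    if steps <= 1:
        return 1  # one way: nothing left to climb, or a single 1-step
    # The last move is either a 1-step or a 2-step.
    return count_ways(steps - 1) + count_ways(steps - 2)

print([count_ways(n) for n in (3, 4, 5)])  # [3, 5, 8]
```

The three values match the number of sequences listed in the example for 3, 4, and 5 steps.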

# Lecture 3 - Graph Problems
## Exercise 1
2/2 points (graded)

We often use graphs to simplify optimization problems, as they are easy to implement on a computer.

The following concepts can be illustrated with a graph. Determine which variables should be represented by edges and vertices in this graph.

1. A school's course catalog

Some classes must occur at least one semester before certain other classes (e.g., Calculus I must be taken before Calculus II), but not all classes have prerequisites.

If we want to represent the catalog as a graph, which variables should be represented as edges and vertices?

- [ ] A) Each edge is a class, while different vertices indicate the semester the class is taken.
- [x] B) Each vertex is a class, while a directional edge indicates that one class must come before another.
- [ ] C) Each vertex is a class, while edges between two vertices indicate that the classes may be taken at the same time.

2. Students in a line

Second graders are lining up to go to their next class, but must be ordered alphabetically before they can leave. The teacher only swaps the positions of two students that are next to each other in line.

If we want to represent this situation as a graph, which variables should be represented as edges and vertices?

- [x] A) Vertices represent permutations of the students in line. Edges connect two permutations if one can be made into the other by swapping two adjacent students.

- [ ] B) Vertices represent students. Edges connect two students if they are next to each other in line.

- [ ] C) Vertices represent permutations of the students, and each edge represents an individual student. An edge connects two vertices if that student is involved in the swap between the two permutations.

## Exercise 2
10.0/10.0 points (graded)

Consider our representation of permutations of students in a line from Exercise 1. (The teacher only swaps the positions of two students that are next to each other in line.) Let's consider a line of three students, Alice, Bob, and Carol (denoted A, B, and C). Using the Graph class created in the lecture, we can create a graph with the design chosen in Exercise 1: vertices represent permutations of the students in line; edges connect two permutations if one can be made into the other by swapping two adjacent students.

We construct our graph by first adding the following nodes:
```py:
nodes = []
nodes.append(Node("ABC")) # nodes[0]
nodes.append(Node("ACB")) # nodes[1]
nodes.append(Node("BAC")) # nodes[2]
nodes.append(Node("BCA")) # nodes[3]
nodes.append(Node("CAB")) # nodes[4]
nodes.append(Node("CBA")) # nodes[5]

g = Graph()
for n in nodes:
    g.addNode(n)
```
Add the appropriate edges to the graph.

<details>
<summary>Hint: How to get started?</summary>
<br>

Write your code in terms of the `nodes` list from the code above. For each node, think about which permutations can be reached from it. A permutation of a set is a rearrangement of the elements in that set. In this problem, you only add an edge between two nodes when one permutation can be turned into the other by swapping two elements that are beside each other. For example, there is an edge between "ABC" and "ACB", but not between "ABC" and "CAB".
</details>

```py:
class Node:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)

class Edge:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []

    def addNode(self, node):
        self.nodes.append(node)

    def addEdge(self, edge):
        self.edges.append(edge)

nodes = []
nodes.append(Node("ABC")) # nodes[0]
nodes.append(Node("ACB")) # nodes[1]
nodes.append(Node("BAC")) # nodes[2]
nodes.append(Node("BCA")) # nodes[3]
nodes.append(Node("CAB")) # nodes[4]
nodes.append(Node("CBA")) # nodes[5]

g = Graph()
for n in nodes:
    g.addNode(n)

g.addEdge(Edge(nodes[0], nodes[1]))
g.addEdge(Edge(nodes[0], nodes[2]))
g.addEdge(Edge(nodes[1], nodes[4]))
g.addEdge(Edge(nodes[2], nodes[3]))
g.addEdge(Edge(nodes[3], nodes[5]))
g.addEdge(Edge(nodes[4], nodes[5]))

for edge in g.edges:
    print(edge.left, edge.right)
```
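
The six edges can also be derived instead of listed by hand. The sketch below works on plain strings rather than the `Node` and `Graph` classes, and the helper `one_adjacent_swap_apart` is an assumption made for illustration; it reports the index pairs that should be connected.

```python
perms = ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]  # same order as nodes[0..5]

def one_adjacent_swap_apart(p, q):
    """True if swapping two neighbouring characters of p gives q."""
    for k in range(len(p) - 1):
        swapped = p[:k] + p[k + 1] + p[k] + p[k + 2:]
        if swapped == q:
            return True
    return False

# Each undirected edge is reported once, as (lower index, higher index).
edge_indices = [(i, j)
                for i in range(len(perms))
                for j in range(i + 1, len(perms))
                if one_adjacent_swap_apart(perms[i], perms[j])]
print(edge_indices)
```

The resulting pairs are the same six index pairs used in the `addEdge` calls above.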

## Exercise 3
4/4 points (graded)

1. For questions 1 and 2, consider our previous problem (permutations of 3 students in a line). 

When represented as a tree, each node will have how many children?
* `2`

2. Given two permutations, what is the maximum number of swaps it will take to reach one from the other?
* `3`

3. For questions 3 and 4, consider the general case of our previous problem (permutations of n students in a line). Give your answer in terms of n.

When represented as a tree, each node will have how many children?
* `n-1`
 
4. Given two permutations, what is the maximum number of swaps it will take to reach one from the other?
* `(n^2 - n) / 2`
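
The general formula can be checked for small `n` with a breadth-first search over all line-ups. This is a sketch under assumptions: students are represented by the integers `0..n-1`, the helper name `max_swaps` is made up for illustration, and the search starts from the sorted line-up only, which is enough because relabelling the students turns any line-up into the sorted one.

```python
from collections import deque

def max_swaps(n):
    """Largest BFS distance from the sorted line-up to any other line-up."""
    start = tuple(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        perm = queue.popleft()
        # Each line-up has n - 1 neighbours: one per adjacent pair.
        for k in range(n - 1):
            nxt = list(perm)
            nxt[k], nxt[k + 1] = nxt[k + 1], nxt[k]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return max(dist.values())

for n in range(2, 6):
    print(n, max_swaps(n), (n**2 - n) // 2)
```

For `n = 3` this gives 3, and for larger `n` the two columns continue to agree.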

## Exercise 4
7/7 points (graded)

Consider our continuing problem of permutations of three students in a line. Use the enumeration we established when adding the nodes to our graph. That is,

```py:
nodes = []
nodes.append(Node("ABC")) # nodes[0]
nodes.append(Node("ACB")) # nodes[1]
nodes.append(Node("BAC")) # nodes[2]
nodes.append(Node("BCA")) # nodes[3]
nodes.append(Node("CAB")) # nodes[4]
nodes.append(Node("CBA")) # nodes[5]
```

so that ABC is Node 0, BCA is Node 3, etc.

Using Depth First Search, and beginning at the listed source nodes, give the first path found to the listed destination nodes. For the purpose of this exercise, assume DFS prioritizes lower numbered nodes. For example, if Node 2 is connected to Nodes 3 and 4, the first path checked will be 23. Additionally, DFS will never return to a node already in its path.

To denote a path, simply list the numbers of the nodes exactly as done in the lecture.

<details>
<summary>Hint: Visual representation</summary>
<br>

You can never go wrong with drawing a picture of the problem. Here is one possible visualization. The possible permutations are denoted in the graph below. From each node, you can choose to go in either direction. In this particular depth-first-search problem, you will choose the lower numbered node over the higher numbered one, even if it will lead to a longer path from the source to the destination.
<p align="center"><img src="https://courses.edx.org/assets/courseware/v1/0cc1a7fc9161594df8ae57529a09849b/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/l9p4.png"></p>
</details>

1. Source: 0, Destination: 4
* `014`

2. Source: 4, Destination: 1
* `41`

3. Source: 1, Destination: 1
* `1`

4. Source: 2, Destination: 4
* `2014`

5. Source: 2, Destination: 3
* `201453`

6. Source: 3, Destination: 1
* `3201`

We saw before that for permutations of 3 people in line, any two nodes are at most three edges, or four nodes, away. But DFS has yielded paths longer than three edges! In this graph, given a random source and a random destination, what is the probability of DFS finding a path of the shortest possible length?
* `2/3`
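
The paths above, and the final probability, can be reproduced with a short depth-first search. This is a sketch rather than the lecture's implementation: the adjacency dictionary `GRAPH` encodes the six edges by node number, the helper names are assumptions, and "random source and destination" is taken to mean all 36 ordered pairs, including pairs where the two are equal.

```python
from collections import deque
from fractions import Fraction

# Adjacency by node number, neighbours listed in increasing order.
GRAPH = {0: [1, 2], 1: [0, 4], 2: [0, 3], 3: [2, 5], 4: [1, 5], 5: [3, 4]}

def dfs(path, goal):
    """Return the first path found, trying lower-numbered neighbours first."""
    if path[-1] == goal:
        return path
    for nxt in GRAPH[path[-1]]:
        if nxt not in path:  # never revisit a node already in the path
            found = dfs(path + [nxt], goal)
            if found:
                return found
    return None

def shortest_length(start, goal):
    """Number of edges on a shortest path, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in GRAPH[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist[goal]

def as_string(path):
    return ''.join(str(node) for node in path)

for source, dest in [(0, 4), (4, 1), (1, 1), (2, 4), (2, 3), (3, 1)]:
    print(source, dest, as_string(dfs([source], dest)))

# Count the (source, destination) pairs where DFS happens to be shortest.
hits = sum(len(dfs([s], d)) - 1 == shortest_length(s, d)
           for s in GRAPH for d in GRAPH)
print(Fraction(hits, len(GRAPH) ** 2))
```

Because the graph is a single cycle, DFS always walks around it in the direction of the lower-numbered neighbour, so it is shortest only when the destination is at most three steps away in that direction.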

## Exercise 5
5/5 points (graded)

**Challenge Problem!** This problem is difficult and may stump you, but we include it because it is very interesting, especially for those who are more mathematically inclined.

Don't worry if you can't get all the math behind it, and don't get discouraged. Remember that you do not lose points for trying a problem multiple times, nor do you lose points if you hit "Show Answer". If this problem has you stumped after you've tried it, feel free to reveal the solution and read our explanations.

In the following examples, assume all graphs are undirected. That is, an edge from A to B is the same as an edge from B to A and counts as exactly one edge.

A **clique** is an unweighted graph where each node connects to all other nodes. We denote the clique with `n` nodes as **KN**. Answer the following questions in terms of `n`.

1. How many edges are in KN?
* `(n^2 - n) / 2`

2. Consider the new version of DFS. This traverses paths until all non-circular paths from the source to the destination have been found, and returns the shortest one.

Let A be the source node, and B be the destination in **KN**. How many paths of length 2 exist from A to B?
* `n - 2`

3. How many paths of length 3 exist from A to B?
* `(n - 2) * (n - 3)`

4. Continuing the logic used above, calculate the number of paths of length `m` from A to B, where `1 <= m <= (n - 1)`, and write this number as a ratio of factorials.

To indicate a factorial, please enter `fact(n)` to mean `n!`; `fact(n+2)` to mean `(n + 2)!`, etc.

### The 'logic above' from the last part of the problem
<details>
<summary>Click to see the solution for the previous problem, if you want some guidance on how to think about this problem part</summary>
<br>

Answer: `(n - 2) * (n - 3)`

Use the same reasoning as used for the previous problem. After knowing our source and destination, we must travel through 2 additional nodes without touching any node twice. For the first node, we have `n - 2` choices, and for the second, we have `n - 3` choices.

Note that this is equivalent to `(n - 2)! / (n - 4)!`
</details>

* `fact(n - 2) / fact(n - m - 1)`

5. Using the fact that `(1 / 0!) + (1 / 1!) + (1 / 2!) + ... + (1 / n!) <= e` for all `n`, where `e` is some constant, determine the asymptotic bound on the number of paths explored by DFS. For simplicity, write `O(n)` as just `n`, `O(n**2)` as `n^2`, etc.
* `fact(n - 2)`
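
The factorial formula and the final bound can be checked by brute force for a small clique. In this sketch the nodes are the integers `0..n-1`, with node 0 standing in for A and node 1 for B; the function name `paths_by_length` and the choice `n = 6` are assumptions made for illustration.

```python
from collections import Counter
from math import e, factorial

def paths_by_length(n):
    """Count simple paths from node 0 to node 1 in K_n, grouped by edge count."""
    counts = Counter()

    def extend(node, visited, length):
        if node == 1:  # reached the destination B
            counts[length] += 1
            return
        for nxt in range(n):  # in a clique every other node is a neighbour
            if nxt not in visited:
                extend(nxt, visited | {nxt}, length + 1)

    extend(0, {0}, 0)
    return counts

n = 6
counts = paths_by_length(n)
for m in range(1, n):
    print(m, counts[m], factorial(n - 2) // factorial(n - m - 1))

total = sum(counts.values())
print(total, e * factorial(n - 2))
```

Summing the formula over all `m` gives `fact(n - 2)` times a partial sum of `1/k!`, which is at most `e * fact(n - 2)`; the last line prints the enumerated total next to that bound.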

## Exercise 6
5/5 points (graded)

In the following examples, assume all graphs are undirected. That is, an edge from A to B is the same as an edge from B to A and counts as exactly one edge.

A **clique** is an unweighted graph where each node connects to all other nodes. We denote the clique with `n` nodes as **KN**. Answer the following questions in terms of `n`.

1. What is the asymptotic worst-case runtime of a Breadth First Search on KN? For simplicity, write `O(n)` as just `n`, `O(n**2)` as `n^2`, etc.
* `n`

2. BFS will always run faster than DFS.
- [ ] True
- [x] False

3. If a BFS and DFS prioritize the same nodes (e.g., both always choose to explore the lower numbered node first), BFS will always run at least as fast as DFS when run on two nodes in KN.
- [x] True
- [ ] False

4. If a BFS and Shortest Path DFS prioritize the same nodes (e.g., both always choose to explore the lower numbered node first), BFS will always run at least as fast as Shortest Path DFS when run on two nodes in any connected unweighted graph.
- [x] True
- [ ] False

5. Regardless of node priority, BFS will always run at least as fast as Shortest Path DFS on two nodes in any connected unweighted graph.
- [x] True
- [ ] False

## Exercise 7
10.0/10.0 points (graded)

Consider once again our permutations of students in a line. Recall the nodes in the graph represent permutations, and that the edges represent swaps of adjacent students. We want to design a weighted graph, weighting edges higher for moves that are harder to make. Which of these could be easily implemented by simply assigning weights to the edges already in the graph?

- [x] A) A large student who is difficult to move around in line.
- [x] B) A sticky spot on the floor which is difficult to move onto and off of.
- [ ] C) A student who resists movement to the back of the line, but accepts movement toward the front.

Write a `WeightedEdge` class that extends `Edge`. Its constructor requires a weight parameter, as well as the parameters from Edge. You should additionally include a `getWeight` method. The string value of a `WeightedEdge` from node A to B with a weight of 3 should be "A->B (3)".

```py:
class WeightedEdge(Edge):
    def __init__(self, src, dest, weight):
        # Your code here
        pass
    def getWeight(self):
        # Your code here
        pass
    def __str__(self):
        # Your code here
        pass
```

```py:
class WeightedEdge(Edge):
    def __init__(self, src, dest, weight):
        self.src = src
        self.dest = dest
        self.weight = weight

    def getWeight(self):
        return self.weight

    def __str__(self):
        return '{}->{} ({})'.format(self.src, self.dest, self.weight)
```

## Lab: Graphs
This is an optional lab component to the lecture. Play with it and explore!

> Info
> In this lab, we will be visualizing distances in a graph.
> 
> [Dijkstra's algorithm](http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) is a general method to find the shortest distances from a node to all other nodes in a graph. We provide a Javascript implementation of the algorithm below. Initially we set the connection probability to 0.67; that is, any possible connection in the graph has a 2/3 chance of appearing. The graph is interactive; you can add new edges by clicking and dragging from one node to another (note that you can only add edges between neighbors). You can also remove connections by clicking on them. If you click on a node, the distance will be color-coded into all the nodes; an "infinite" distance (i.e. nodes that are impossible to reach) will be shown as bright blue, and the source node itself (distance 0) will be bright red.
> 
> Try to play around with the graph. What kind of an edge would make the highest impact on distances? Try to create interesting scenarios by setting the connection probability to 0, and creating a graph from scratch.

# Lecture 4 - Plotting

# Lecture 5 - Stochastic Thinking
## Exercise 1
### Exercise 1-1
6/6 points (graded)

For the following explanations of different types of programmatic models, fill in the blank with the appropriate model the definition describes.

* A ______ model is one whose behavior is entirely predictable. Every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables. Therefore, these models perform the same way for a given set of initial conditions, and it is possible to predict precisely what will happen.
  * `deterministic`

* A ________ model is one in which randomness is present, and variable states are not described by unique values, but rather by probability distributions. The behavior of this model cannot be entirely predicted.
  * `stochastic`

* A _______ model does not account for the element of time. In this type of model, a simulation will give us a snapshot at a single point in time.
  * `static`

* A _______ model does account for the element of time. This type of model often contains state variables that change over time.
  * `dynamic`

* A _______ model does not take into account the function of time. The state variables change only at a countable number of points in time, abruptly from one state to another.
  * `discrete`

* A ______ model does take into account the function of time, typically by modelling a function f(t) and the changes reflected over time intervals. The state variables change in an unbroken way through an infinite number of states.
  * `continuous`

### Exercise 1-2
3/3 points (graded)

* If you are using differential equations to model a simulation, are you more likely to be doing a discrete or continuous model?
- [ ] Discrete
- [x] Continuous

* Let's say you run a stochastic simulation 100 times. How many times do you need to run the simulation again to get the same result?
- [ ] 1 time
- [ ] 99 times
- [ ] 100 times
- [ ] 101 times
- [ ] All of the above will give you the same result.
- [x] None will necessarily give you the same result.

* Which modelling system would be best to model a bank account?
- [ ] Discrete
- [ ] Continuous
- [x] Either discrete or continuous would work, depending on the specifics of the model you wish to use.

## Exercise 2
0.0/5.0 points (graded)

This problem asks you to write a short function that uses the `random` module. The Python docs on the `random` module describe all sorts of cool functions the module provides.

The random module has many useful functions - play around with them in your interpreter to see how much you can do! To test this code yourself, put the line `import random` at the top of your code file, to import all of the functions in the random module. To call random module functions, preface them with `random.`, as in this sample interpreter session:
```
>>> import random
>>> random.randint(1, 5)
4
>>> random.choice(['apple', 'banana', 'cat'])
'cat'
```
How would you randomly generate an even number `x`, `0 <= x < 100`? Fill out the definition for the function `genEven()`. Please generate a uniform distribution over the even numbers between 0 and 100 (not including 100).
```python:
def genEven():
    '''
    Returns a random number x, where 0 <= x < 100
    '''
    # Your code here
```

```python:
import random


def genEven():
    '''
    Returns a random number x, where 0 <= x < 100
    '''
    return random.randrange(0, 100, 2)
```

## Exercise 3
### Exercise 3-1
5/5 points (graded)

Write a deterministic program, deterministicNumber, that returns an even number between 9 and 21.
```python:
def deterministicNumber():
    '''
    Deterministically generates and returns an even number between 9 and 21
    '''
    # Your code here
```

### Exercise 3-2
5/5 points (graded)

Write a uniformly distributed stochastic program, stochasticNumber, that returns an even number between 9 and 21.
```python:
def stochasticNumber():
    '''
    Stochastically generates and returns a uniformly distributed even number between 9 and 21
    '''
    # Your code here
```

```python:
import random


def deterministicNumber():
    '''
    Deterministically generates and returns an even number between 9 and 21
    '''
    random.seed(0)
    return random.choice((10, 12, 14, 16, 18, 20))
```

```python:
import random


def stochasticNumber():
    '''
    Stochastically generates and returns a uniformly distributed even number between 9 and 21
    '''
    return random.randint(5, 10) * 2
```

## Exercise 4
3/3 points (graded)

1. Are the following two distributions equivalent?
```python:
import random
def dist1():
    return random.random() * 2 - 1

def dist2():
    if random.random() > 0.5:
        return random.random()
    else:
        return random.random() - 1 
```
- [x] Yes
- [ ] No

2. Are the following two distributions equivalent?
```python:
import random
def dist3():
    return int(random.random() * 10)

def dist4():
    return random.randrange(0, 10)
```
- [x] Yes
- [ ] No

3. Are the following two distributions equivalent?
```python:
import random
def dist5():
    return int(random.random() * 10)

def dist6():
    return random.randint(0, 10)
```
- [ ] Yes
- [x] No
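
The difference between `dist5` and `dist6` comes down to which integers each can return: `random.randint(0, 10)` includes both endpoints, while `int(random.random() * 10)` never reaches 10. The sketch below samples both and lists the distinct values seen; the seed and the sample size are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def dist5():
    return int(random.random() * 10)

def dist6():
    return random.randint(0, 10)

random.seed(0)  # fixed seed so the run is repeatable
samples5 = Counter(dist5() for _ in range(100_000))
samples6 = Counter(dist6() for _ in range(100_000))

print(sorted(samples5))  # distinct values returned by dist5
print(sorted(samples6))  # distinct values returned by dist6
```

With this many samples, `dist5` shows ten distinct values (0 through 9) and `dist6` shows eleven (0 through 10), so the two cannot be the same distribution.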

# Lecture 5 - Stochastic Thinking
## Exercise 5
10/10 points (graded)

In this problem, we're going to calculate some probabilities of dice rolls. Imagine you have two fair four-sided dice (if you've never seen one, [here's a picture](https://courses.edx.org/assets/courseware/v1/fb089d94485d7a3adc6951358b07f90f/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_d4-translucent-red.jpg). The result, a number between 1 and 4, is displayed at the top of the die on each of the 3 visible sides). 'Fair' here means that there is equal probability of rolling any of the four numbers.

You can answer the following questions in one of two ways - you can calculate the probability directly, or, if you're having trouble, you can simply write out the entire [sample space](https://en.wikipedia.org/wiki/Sample_space) for the problem. A sample space is defined as a listing of all possible outcomes of a problem, and it can be written in many ways - a tree or a grid are popular options. For example, here is a diagram of the [sample space for 3 coin tosses](https://courses.edx.org/assets/courseware/v1/b6e4ea1e4183e95cfbce003ee9675ab1/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_coinTossSampleSpace.png).

Some vocabulary before we begin: an **event** is a subset of the sample space, or, a collection of possible outcomes. A **probability function** assigns an event, *A*, a probability *P(A)* that represents the likelihood of event *A* occurring.

As an example, let's say we flip a coin. Define the event *H* as the event that the coin comes up heads. We can assign the probability *P(H)* = 1/2; the likelihood that event *H* occurs.

The following problems will ask for the probability that a given event occurs.

1. What is the size of the sample space for one roll of a four sided die? `4`
2. What is the size of the sample space for two rolls of a four sided die? `16`
3. Assume we roll 2 four sided dice. What is P({sum of the rolls is even})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/2`
4. Assume we roll 2 four sided dice. What is P({rolling a 2 followed by a 3})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/16`
5. Assume we roll 2 four sided dice. What is P({rolling a 2 and a 3, in any order})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/8`
6. Assume we roll 2 four sided dice. What is P({sum of the rolls is odd})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/2`
7. Assume we roll 2 four sided dice. What is P({first roll equal to second roll})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/4`
8. Assume we roll 2 four sided dice. What is P({first roll larger than second roll})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `3/8`
9. Assume we roll 2 four sided dice. What is P({at least one roll is equal to 4})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `7/16`
10. Assume we roll 2 four sided dice. What is P({neither roll is equal to 4})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `9/16`
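
The "write out the entire sample space" approach described above can be sketched in a few lines. This is a minimal illustration, not part of the course material; the names `SAMPLE_SPACE` and `prob` are made up here.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first roll, second roll) for two fair four-sided dice.
SAMPLE_SPACE = list(product(range(1, 5), repeat=2))

def prob(event):
    """Hypothetical helper: P(event) as a reduced fraction over the sample space."""
    favorable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favorable, len(SAMPLE_SPACE))

print(len(SAMPLE_SPACE))                        # size of the sample space
print(prob(lambda r: (r[0] + r[1]) % 2 == 0))   # sum of the rolls is even
print(prob(lambda r: r == (2, 3)))              # a 2 followed by a 3
print(prob(lambda r: set(r) == {2, 3}))         # a 2 and a 3, in any order
print(prob(lambda r: r[0] > r[1]))              # first roll larger than second
print(prob(lambda r: 4 in r))                   # at least one roll equal to 4
```

Because `Fraction` reduces automatically, the printed values are already in the reduced form the questions ask for.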

# Lecture 5 - Stochastic Thinking
## Exercise 6
13/13 points (graded)

In this problem, we're going to calculate various probabilities.

1. What is the size of the sample space for two rolls of a ten sided die? `100`
2. What is the size of the sample space for three rolls of an eight sided die? `512`
3. What is the size of the sample space for drawing one card from a deck of 52 cards? `52`
4. What is the size of the sample space for drawing one card from each of two decks of 52 cards? That is, drawing one card from one deck of cards, then a second card from a second deck of cards. `2704`
5. Assume we roll 2 ten sided dice. What is P({rolling a 2 followed by a 3})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/100`
6. Assume we roll 2 ten sided dice. What is P({first roll larger than second roll})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `9/20`
7. Assume we roll 3 eight sided dice. What is P({all three rolls are equal})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/64`
8. A [standard deck of cards](http://en.wikipedia.org/wiki/Standard_52-card_deck) contains 52 cards, 13 each of four suits - diamonds, clubs, hearts, and spades. Each suit contains one of 13 cards: A (ace), 2, 3, 4, 5, 6, 7, 8, 9, 10, J (jack), Q (queen), K (king). Given one deck of 52 playing cards, you flip one over. Assuming a fair deck, what is P({ace of hearts})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/52`
9. Given one deck of 52 playing cards, you flip one over. Assuming a fair deck, what is P({drawing a card with suit spades})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/4`
10. Given one deck of 52 playing cards, you flip one over. Assuming a fair deck, what is P({ace of any suit})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/13`
11. Given one deck of 52 playing cards, you flip one over. Assuming a fair deck, what is P({any card except an ace})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `12/13`
12. Given one deck of 52 playing cards, you draw two random cards. (The cards are drawn at the same time, so the selection is considered without replacement.) Assuming a fair deck, what is P({both cards are aces})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/221`
13. Given two decks of 52 playing cards, you flip one over from each deck. Assuming fair decks, what is P({the two cards are the same suit})? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/4`
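
For the two-card questions, counting is easier than listing every outcome. The sketch below is one way to check questions 12 and 13 using `math.comb` (Python 3.8+); the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Question 12: two cards drawn at once, so count unordered pairs.
both_aces = Fraction(comb(4, 2), comb(52, 2))

# Question 13: one card from each of two decks, so count ordered (card, card) pairs.
suits = 4
ranks = 13
same_suit = Fraction(suits * ranks * ranks, (suits * ranks) ** 2)

print(both_aces)
print(same_suit)
```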

# Lecture 5 - Stochastic Thinking
## Exercise 7
5/5 points (graded)

You pick three balls in succession out of a bucket of 3 red balls and 3 green balls. Assume replacement after picking out each ball. What is the probability of each of the following events?

1. Three red balls: A : {R,R,R}. Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/8`
2. The sequence red, green, red: A : {R,G,R}. Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/8`
3. Any sequence with 2 reds and 1 green. Answer in reduced fraction form - eg 1/5 instead of 2/10. `3/8`
4. Any sequence where the number of reds is greater than or equal to the number of greens. Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/2`
5. You have a bucket with 3 red balls and 3 green balls. This time, assume you **don't** replace the ball after taking it out. What is the probability of drawing 3 balls of the same color? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/10`
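
The difference between drawing with and without replacement can be made concrete by enumerating both sample spaces. This is a sketch under the assumption that every ball is equally likely to be picked; `BUCKET` and `prob` are names invented for the example.

```python
from fractions import Fraction
from itertools import permutations, product

BUCKET = ["R", "R", "R", "G", "G", "G"]

def prob(outcomes, event):
    """Hypothetical helper: fraction of equally likely outcomes satisfying event."""
    hits = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(hits, len(outcomes))

# With replacement: each draw independently picks any of the 6 balls.
with_replacement = list(product(BUCKET, repeat=3))
# Without replacement: ordered draws of 3 distinct balls.
without_replacement = list(permutations(BUCKET, 3))

print(prob(with_replacement, lambda o: o == ("R", "G", "R")))
print(prob(with_replacement, lambda o: o.count("R") == 2))
print(prob(with_replacement, lambda o: o.count("R") >= o.count("G")))
print(prob(without_replacement, lambda o: len(set(o)) == 1))
```

`itertools.permutations` treats the six balls as distinct positions, so balls of the same color are still counted separately, which keeps all outcomes equally likely.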

# Lecture 6 - Random Walks
## Exercise 1
3/3 points (graded)

1. Would placing the drunk's starting location not at the origin change the distances returned?
- [ ] Yes
- [x] No

If so, what line would you modify to compensate? Enter the line number (eg 17). If not, just type None. `None`
```python:
def simWalks(numSteps, numTrials, dClass):
    homer = UsualDrunk()
    notOrigin = Location(1, 0)
    distances = []
    for t in range(numTrials):
        f = Field()
        f.addDrunk(homer, notOrigin)
        distances.append(round(walk(f, homer, numSteps), 1))
    return distances
```

2. If you were going to use random.seed in a real-life simulation instead of just when you are debugging a simulation, would you want to seed it with 0?
- [ ] Yes
- [x] No

# Lecture 6 - Random Walks
## Exercise 2
2/2 points (graded)

1. Is the following code deterministic or stochastic?
```python:
import random
mylist = []

for i in range(random.randint(1, 10)):
    random.seed(0)
    if random.randint(1, 10) > 3:
        number = random.randint(1, 10)
        mylist.append(number)
print(mylist)
```
- [ ] Deterministic
- [x] Stochastic

2. Which of the following alterations (Code Sample A or Code Sample B) would result in a deterministic process?
```python:
import random

# Code Sample A
mylist = []

for i in range(random.randint(1, 10)):
    random.seed(0)
    if random.randint(1, 10) > 3:
        number = random.randint(1, 10)
        if number not in mylist:
            mylist.append(number)
print(mylist)

# Code Sample B
mylist = []

random.seed(0)
for i in range(random.randint(1, 10)):
    if random.randint(1, 10) > 3:
        number = random.randint(1, 10)
        mylist.append(number)
    print(mylist)
```
Check one or both.

- [x] Code Sample A
- [x] Code Sample B
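
A deterministic process should produce exactly one distinct result no matter what state the generator starts in. The sketch below wraps the three snippets in functions that return the final list instead of printing it, and runs each from a freshly reseeded generator. The wrapper names and the helper `distinct_results` are assumptions made for this illustration.

```python
import random

def original():
    mylist = []
    for i in range(random.randint(1, 10)):
        random.seed(0)
        if random.randint(1, 10) > 3:
            mylist.append(random.randint(1, 10))
    return mylist

def sample_a():
    mylist = []
    for i in range(random.randint(1, 10)):
        random.seed(0)
        if random.randint(1, 10) > 3:
            number = random.randint(1, 10)
            if number not in mylist:
                mylist.append(number)
    return mylist

def sample_b():
    mylist = []
    random.seed(0)
    for i in range(random.randint(1, 10)):
        if random.randint(1, 10) > 3:
            mylist.append(random.randint(1, 10))
    return mylist

def distinct_results(func, runs=200):
    """Hypothetical helper: run func from a fresh random state each time."""
    seen = set()
    for _ in range(runs):
        random.seed()  # reseed from system entropy, like a new interpreter
        seen.add(tuple(func()))
    return seen

for func in (original, sample_a, sample_b):
    print(func.__name__, len(distinct_results(func)))
```

The reseeding step matters: without it, the `random.seed(0)` call inside one run would leave the generator in a known state for the next run and hide the randomness in the loop length.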

# Lecture 6 - Random Walks
## Exercise 3
3/3 points (graded)

The output of `random.randint(1, 10)` after a specific seed is shown below.
```
>>> import random
>>> random.seed(9001)
>>> random.randint(1, 10)
1
>>> random.randint(1, 10)
3
>>> random.randint(1, 10)
6
>>> random.randint(1, 10)
6
>>> random.randint(1, 10)
7
```
We would like you to solve this problem using just the above output, without using the interpreter. (Note that the output you get when you actually run the above commands will be 1, 5, 5, 2, 10.) What is printed by each of the following programs? Separate new lines with commas, so the above output would be 1, 3, 6, 6, 7.

**Note!** Try it out!
```python:
random.seed(9001)
for i in range(random.randint(1, 10)):
    print(random.randint(1, 10))
```
* `3`

```python:
random.seed(9001)
d = random.randint(1, 10)
for i in range(random.randint(1, 10)):
    print(d)
```
* `1, 1, 1`

```python:
random.seed(9001)
d = random.randint(1, 10)
for i in range(random.randint(1, 10)):
    if random.randint(1, 10) < 7:
        print(d)
    else:
        random.seed(9001)
        d = random.randint(1, 10)
        print(random.randint(1, 10))
```
* `1, 1, 3`
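
Since the real generator does not reproduce the sequence shown in the exercise, the traces can be checked with a stand-in that replays 1, 3, 6, 6, 7 after every call to `seed`. The class `ScriptedRandom` is a hypothetical helper written for this sketch, and the three functions collect values in a list instead of printing them.

```python
class ScriptedRandom:
    """Stand-in for the random module that replays the exercise's sequence."""
    SEQUENCE = [1, 3, 6, 6, 7]

    def __init__(self):
        self.pos = 0

    def seed(self, _value):
        # Any seed restarts the scripted sequence.
        self.pos = 0

    def randint(self, _low, _high):
        value = self.SEQUENCE[self.pos]
        self.pos += 1
        return value

def program1(rng):
    out = []
    rng.seed(9001)
    for i in range(rng.randint(1, 10)):
        out.append(rng.randint(1, 10))
    return out

def program2(rng):
    out = []
    rng.seed(9001)
    d = rng.randint(1, 10)
    for i in range(rng.randint(1, 10)):
        out.append(d)
    return out

def program3(rng):
    out = []
    rng.seed(9001)
    d = rng.randint(1, 10)
    for i in range(rng.randint(1, 10)):
        if rng.randint(1, 10) < 7:
            out.append(d)
        else:
            rng.seed(9001)
            d = rng.randint(1, 10)
            out.append(rng.randint(1, 10))
    return out

for program in (program1, program2, program3):
    print(program.__name__, program(ScriptedRandom()))
```

The key observation is that `range(rng.randint(1, 10))` consumes one value from the sequence before the loop body runs.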

# Lecture 6 - Random Walks
## Exercise 4
1 point possible (graded)

Suppose we wanted to create a class `PolarBearDrunk`, a drunk polar bear who moves randomly along the x and y axes taking large steps when moving South, and small steps when moving North.
```python:
class PolarBearDrunk(Drunk):
    def takeStep(self):
        # code for takeStep()
```

Which of the following would be an appropriate implementation of takeStep()?

1. Option A)
```python:
directionList = [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
myDirection = random.choice(directionList)
if myDirection[0] == 0.0:
    return myDirection + (0.0, -0.5)
return myDirection
```

2. Option B)
```python:
directionList = [(0.0, 0.5), (1.0, -0.5), (-1.0, -0.5), (0.0, -1.5)]
return random.choice(directionList)
```

3. Option C)
```python:
directionList = [(0.0, 0.5), (1.0, 0.0), (-1.0, 0.0), (0.0, -1.5)]
return random.choice(directionList)
```

4. Option D)
```python:
directionList = [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (0.0, -1.0)]
return random.choice(directionList)
```
- [ ] Option A)
- [ ] Option B)
- [x] Option C)
- [ ] Option D)
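
To see why Option C fits, it helps to drop it into the class and look at the average vertical movement. The `Drunk` base class below is a minimal stand-in with an assumed interface, not the lecture's actual class. For comparison, Option A fails because `myDirection + (0.0, -0.5)` concatenates two tuples instead of adding them, and Option D changes how often the bear moves South rather than how far.

```python
import random

class Drunk:
    """Minimal stand-in for the lecture's Drunk base class (assumed interface)."""
    def __init__(self, name=None):
        self.name = name

class PolarBearDrunk(Drunk):
    def takeStep(self):
        # Option C: each direction is equally likely, with a small step
        # North (+0.5), a large step South (-1.5), and unit steps East/West.
        directionList = [(0.0, 0.5), (1.0, 0.0), (-1.0, 0.0), (0.0, -1.5)]
        return random.choice(directionList)

random.seed(0)  # fixed seed only for a repeatable demo
bear = PolarBearDrunk("bear")
steps = [bear.takeStep() for _ in range(10000)]
avg_dy = sum(dy for _, dy in steps) / len(steps)
print(avg_dy)  # negative on average: the walk drifts South
```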

# Lecture 6 - Random Walks
## Lab: Random Walks
This is an optional lab component to the lecture. Play with it and explore!

> Info
> We will be visualizing a random walk in this lab.
> 
> A random walk can be used to model real-life phenomena that are not necessarily random. Particle behavior is one of these applications. Using a random walk, we can simulate the path of one or more molecules in a variable-density medium and gain insight into certain processes like diffusion.
> 
> The particles are initially set to move ΔX and ΔY in the range of [-0.5, 0.5] at each step. By increasing the "density" (of arbitrary units) below the plot you can reduce this range, effectively constraining the particle movement.
> 
> Feel free to play with all of the parameters (although be warned that simulating too many particles may slow down and/or crash your browser). Try and guess the simulated particle behavior under certain edge conditions. For instance, what would you expect to happen if one of the sides is much denser than the other?

# Lecture 7 - Inferential Statistics
## Exercise 1
3/3 points (graded)

1. A fair two-sided coin is flipped 4 times. It comes up heads all four times. What is the probability that it comes up heads on the fifth flip? Answer in reduced fraction form - eg 1/5 instead of 2/10. `1/2`
2. A fair two-sided coin is flipped 1000 times. It comes up heads every time. Which is correct?
- [ ] Regression to the mean tells us that the next 1000 tosses will be almost all tails.
- [x] Regression to the mean tells us that the next few tosses will be not as extreme as the first 1000.

3. Next we toss a huge ball with 1,000 dots on it. Half the dots are red and the other half are blue. We roll the ball and when it stops, we note the color of the dot on the very top of the ball.

True or False? If we roll it four times, and it comes up red once and blue three times, then we have proved that the ball is biased.
- [ ] True
- [x] False

# Lecture 7 - Inferential Statistics
## Exercise 2
5/5 points (graded)

For the questions below, please try to think about the solution in your head before using an IDE or a calculator to compute it. The goal of these questions is to give you some intuition about the topics we've been discussing.

1. Which of the following populations has the largest variance?
- [ ] [0, 1, 2, 3, 4, 5, 6]
- [ ] [3, 3, 3, 3, 3, 3, 3]
- [x] [0, 0, 0, 3, 6, 6, 6]

2. Which of the following populations has the largest variance?
- [ ] [3, 3, 5, 7, 7]
- [x] [1, 5, 5, 5, 9]

3. If a number is removed from a population, the standard deviation of that population will always decrease.
- [ ] True
- [x] False

4. You are taking samples of the ages of two populations, A and B. Population A is all the residents of San Francisco, while Population B is all the residents of Los Angeles.

The sample from Population A has a mean of 35 and a standard deviation of 1. The sample from Population B has a mean of 45 and a standard deviation of 15. Which of the following are certain?

- [ ] You will not find an individual in Population A that is over the age of 37.
- [ ] The average age of Population A is lower than the average age of Population B.
- [x] The average age of the sample of Population A is lower than the average age of the sample of Population B.
- [ ] If the sample size of each population is 10, then you can conclude that your sample accurately represents the population.
- [x] A sample size of 1 million is more appropriate than a sample size of 10 for these populations.

5. The 95% confidence interval for a normal distribution of data with a mean of 5 and a standard deviation of 2 is 5 +/- `3.92`
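
After forming an intuition, the answers to questions 1, 2, and 5 can be checked with the standard library. This is a sketch that assumes population (not sample) variance, which is what the word "populations" in the questions suggests.

```python
from statistics import NormalDist, pvariance

# Question 1: spread around the mean matters, not the range alone.
for population in ([0, 1, 2, 3, 4, 5, 6], [3, 3, 3, 3, 3, 3, 3], [0, 0, 0, 3, 6, 6, 6]):
    print(population, pvariance(population))

# Question 2: same mean, but one list has points farther from it.
for population in ([3, 3, 5, 7, 7], [1, 5, 5, 5, 9]):
    print(population, pvariance(population))

# Question 5: 95% of a normal distribution lies within about
# 1.96 standard deviations of the mean.
z = NormalDist().inv_cdf(0.975)
half_width = z * 2  # standard deviation of 2
print(round(half_width, 2))
```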

# Lecture 7 - Inferential Statistics
## Exercise 3
5.0/5.0 points (graded)

Write a function, `stdDevOfLengths(L)` that takes in a list of strings, `L`, and outputs the standard deviation of the lengths of the strings. Return `float('NaN')` if `L` is empty.

Recall that the standard deviation is computed by this equation:

$$\sqrt{\frac{\sum_{t\text{ in }X} (t - \mu)^2}{N}}$$

where:

- _μ_ is the mean of the elements in X.
- $\sum_{t\text{ in }X} (t - \mu)^2$ means the sum of the quantity $(t - \mu)^2$ for _t_ in _X_.

  That is, for each element (that we name t) in the set X, we compute the quantity $(t - \mu)^2$ . We then sum up all those computed quantities.
- N is the number of elements in X.

Test cases:

1. If `L = ['a', 'z', 'p']`, `stdDevOfLengths(L)` should return `0`.
2. If `L = ['apples', 'oranges', 'kiwis', 'pineapples']`, `stdDevOfLengths(L)` should return `1.8708`.

**Note: If you want to use functions from the math library, be sure to `import math`. If you want to use numpy arrays, you should add the following lines at the beginning of your code for the grader:**
```python:
import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"
```
Then, do `import numpy as np` and use `np.METHOD_NAME` in your code.

```python:
def stdDevOfLengths(L):
    """
    L: a list of strings

    returns: float, the standard deviation of the lengths of the strings,
      or NaN if L is empty.
    """
    if not L:
        return float('NaN')
    n, lengths = len(L), [len(word) for word in L]
    avg = sum(lengths) / n
    return (sum((length - avg)**2 for length in lengths) / n)**0.5
```

# Lecture 7 - Inferential Statistics
## Exercise 4
3/3 points (graded)

The coefficient of variation is the standard deviation divided by the mean. Loosely, it's a measure of how variable the population is in relation to the mean.

1. Figure 1 shows the skyline of Pythonland, and Figure 2 shows the skyline of Montyland.

<img src="https://courses.edx.org/assets/courseware/v1/fd0d7808e53d9bc64e664197e148cbbb/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_pythonland.png">
<img src="https://courses.edx.org/assets/courseware/v1/cebfb0c7fa3e024410e6fc8cb839e70f/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_montyland.png">

Considering the heights of buildings in Pythonland and Montyland, which has a larger coefficient of variation?
- [ ] Pythonland
- [x] Montyland

2. Which of the following populations has the highest coefficient of variation?


- [x] [1, 2, 3]
- [ ] [11, 12, 13]
- [ ] [0.1, 0.1, 0.1]

3. Compute the coefficient of variation of [10, 4, 12, 15, 20, 5] to 3 decimal places. `.503`
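
The definition above translates directly into code. The sketch below assumes the population standard deviation (`statistics.pstdev`), which is the choice that reproduces the recorded answer to question 3; the function name is made up for the example.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (undefined for mean 0)."""
    return pstdev(values) / mean(values)

# Question 2: the same spread counts for less when the mean is larger.
for population in ([1, 2, 3], [11, 12, 13], [0.1, 0.1, 0.1]):
    print(population, round(coefficient_of_variation(population), 3))

# Question 3
print(round(coefficient_of_variation([10, 4, 12, 15, 20, 5]), 3))
```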

# Lecture 7 - Inferential Statistics
## Exercise 5
8/8 points (graded)

In the lecture, you saw a uniform and a normal distribution. There is another type of distribution, called an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). For the following real-life situations, fill in the blank with the appropriate distribution model (normal, uniform, or exponential) that would best simulate the situation.

1. Rolling a fair 6-sided die `uniform`
2. Sum of rolling 2 fair 6-sided dice `normal`
3. Women's shoe sizes `normal`
4. Human intelligence (IQ) scores `normal`
5. Amount of mold on bread, assuming an infinite supply of bread `exponential`
6. The winning lottery numbers `uniform`
7. Skilled person throwing darts at a dart board `normal`
8. Radioactive decay (time between successive atom decays) `exponential`

# Lecture 7 - Inferential Statistics
## Exercise 6
4/4 points (graded)

1. Samples were taken from a distribution, and the histogram of those samples is shown here:

<img src="https://courses.edx.org/assets/courseware/v1/6fc519024c060da7fe8569ba5bc65686/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure1.png" width="500" alt="bell curve centered around 2, 0">

Which of the following distributions were the samples taken from? `Normal Distribution`

2. Which of the following histograms best matches samples taken from a uniform distribution between 0 and 2? `Figure 2`

<img src="https://courses.edx.org/assets/courseware/v1/2f798c2d1354b6e76630a7e96034076f/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure2-0.png" width="500" alt="bell curve centered around 1, 0 and max height at 300">
Figure 1

<img src="https://courses.edx.org/assets/courseware/v1/ce729758c69d772ac916f44952ac6fd5/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure2-1.png" width="500" alt="equal height bars starting from 0 to 2 and each approx 100 height">
Figure 2

<img src="https://courses.edx.org/assets/courseware/v1/46bbc40dce2c3a7bf39ea26149ab94fe/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure2-2.png" width="500" alt="equal height bars starting from 0 to 1 and each approx 200 height">
Figure 3

<img src="https://courses.edx.org/assets/courseware/v1/a056fac4aaeb0633aa5a82e093ba1b86/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure2-3.png" width="500" alt="exponentially decreasing bars starting from 0 with the tail at 2">
Figure 4

3. Each of the following histograms was generated by sampling a different normal distribution. Which histogram best matches the normal distribution with the highest variance of the three? `Figure 3`

<img src="https://courses.edx.org/assets/courseware/v1/f54d3d63dfbf2b16ba84e55e15b1f72c/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure3-0.png" width="500" alt="bell curve centered around 1.5, tails at -1 and 3, max height at 700">
Figure 1

<img src="https://courses.edx.org/assets/courseware/v1/2231d913a055d2c9ce6941162f148614/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure3-1.png" width="500" alt="bell curve centered around 2, tails at -1 and 5, max height at 400">
Figure 2

<img src="https://courses.edx.org/assets/courseware/v1/e59ae33bf767a8dcda5713e48f70700f/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure3-2.png" width="500" alt="bell curve centered around 0, tails at -5 and 5, max height at 200">
Figure 3

4. Mary's Clothes Shoppe is a moderately busy store. Which of the following histograms best matches observations of how much time (in minutes) there is between customer arrivals? That is, which histogram helps best predict how much time until the next customer comes into the Clothes Shoppe?

For each histogram, 1000 observations were made. The x-axis is measured in minutes, and the height of each bar at minute m corresponds to how many times there was an m minute wait until the next customer arrived. The histogram with the highest variance of the three is shown below. `Figure 1`

<img src="https://courses.edx.org/assets/courseware/v1/a78d176df487e44e8af41ab90be5c5ef/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure4-0.png" width="500" alt="exponentially decreasing bars starting from 0 with the tail at 20 max height 450">
Figure 1

<img src="https://courses.edx.org/assets/courseware/v1/85b32f15e567e1248c8aed948c5fca36/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure4-1.png" width="500" alt="bell curve centered around 10, tails at 0 and 20, max height at 350">
Figure 2

<img src="https://courses.edx.org/assets/courseware/v1/0825836119f76b73b96da5571d25a7c3/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_figure4-2.png" width="500" alt="equal height bars starting from 0 to 20 and each approx 100 height">
Figure 3

# Lecture 7 - Inferential Statistics
## Lab: Understanding Probability Distributions
This is an optional lab component to the lecture. Play with it and explore!

> Info
>
> In this lab, we will look at a few sample probability distributions and try to gain an intuitive understanding of their parameters.

### 1) Describe your probability density function (PDF)
You can see the resulting PDF from the graph.

**Distribution type:**
- [ ] Gaussian
- [ ] Exponential
- [ ] Uniform

* Mean: 
* Variance: 

### 2) See the results
You can see samples drawn from the PDF here. We are plotting them in 2D with X and Y values drawn independently from the PDF you described above, because it's more fun that way.

Go ahead and play with the parameters of your PDF; you will see that both plots automatically update. Use this to gain an intuitive understanding of what properties are affected by the parameters.

# Lecture 8 - Monte Carlo Simulations
## Exercise 1
### Exercise 1-1
1/1 point (graded)

Suppose we have an experiment. We toss a coin `m` times. Each time we collect results from a sample of size `n` and compute this sample's mean $μ_i$ and standard deviation $σ_i$. This experiment has an underlying distribution with mean _μ_ and standard deviation _σ_.

Which of the following does the Central Limit Theorem (CLT) guarantee (for large enough `n` and `m`):

- [x] The sample means will be approximately normally distributed.
- [x] The sample means will have a mean close to the mean of the original distribution _μ_.
- [x] The sample means will have a variance close to the variance of the original distribution divided by the sample size, $\frac{σ^2}{n}$.
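
The second and third claims can be checked with a small simulation. The sketch below uses fair coin flips (heads = 1, tails = 0), so the underlying mean is 0.5 and the underlying variance is 0.25. The function name, the seed, and the values of `m` and `n` are arbitrary choices for illustration.

```python
import random
from statistics import mean, pvariance

def sample_means(m, n):
    """Mean of n fair coin flips (1 = heads, 0 = tails), repeated m times."""
    return [sum(random.randint(0, 1) for _ in range(n)) / n for _ in range(m)]

random.seed(0)  # fixed seed only so the run is repeatable
n = 100
means = sample_means(m=2000, n=n)

print(mean(means))                 # compare with the underlying mean, 0.5
print(pvariance(means), 0.25 / n)  # compare with sigma^2 / n
```

Plotting a histogram of `means` would show the roughly bell-shaped curve described by the first claim.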

# Lecture 8 - Monte Carlo Simulations
## Exercise 2
2/2 points (graded)

1. If you wanted to run a simulation that estimates the value of $\sqrt{2}$ in a way similar to the Pi estimation shown in lecture, what geometric shape would you throw needles at?
- [ ] A square, with a smaller square drawn inside it. The smaller square is formed by connecting the larger square's midpoints.
- [ ] A cube with a sphere inscribed inside it.
- [x] A flat line ranging from 0 to root 2 and with a subsection that spans from 0 to 1.

2. What introduced the error for Archimedes' method of calculating Pi?
- [ ] Incorrect conceptual model.
- [ ] Calculation error.
- [x] Not enough samples.

# Lecture 8 - Monte Carlo Simulations
## Exercise 3
2/2 points (graded)

If you remember the Buffon Needle Problem, the ratio of the areas of a circle and a square is used to estimate the value of _π_ by dropping needles onto the shapes, like so:

$$π = \frac{(\text{area of square}) (\text{needles in circle})}{(\text{needles in square})}$$

We can imagine that using different area ratios results in the estimation of different constants.

In the following boxes, you will be asked to enter in mathematical expressions. To enter in addition, multiplication, subtraction, or division, use the operators: +, *, -, /. To enter in exponentiation, use the caret (^) key. To enter in the constant _π_, simply type pi.

1. What constant can you estimate using the following picture? `pi/2`

<img src="https://courses.edx.org/assets/courseware/v1/0d73ed054284394cc825aac0a2430180/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/files_finger_exercises_L16-semicircle.png" alt="rectangle of height 1 and length 2 with the bottom half of a semicircle inscribed in it">

2. Download the code used in the lecture "Finding Pi". If we now want to estimate the constant from the picture above, what should the number '4' in the line: `return 4*(inCircle/float(numNeedles))` be changed to? `2`
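
The reasoning behind both answers can be sketched as a simulation. The code below is not the lecture's code. It assumes the picture shows a half disk of radius 1 inside a 2-by-1 rectangle, so the half disk has area π/2 and the rectangle has area 2. The function name `estimate_half_pi` is invented here.

```python
import math
import random

def estimate_half_pi(numNeedles):
    """Drop needles uniformly on the rectangle and count those in the half disk."""
    inCircle = 0
    for _ in range(numNeedles):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(0.0, 1.0)
        if x * x + y * y <= 1.0:
            inCircle += 1
    # The rectangle has area 2, so the multiplier is 2 instead of 4.
    return 2 * (inCircle / float(numNeedles))

random.seed(0)  # fixed seed only so the run is repeatable
estimate = estimate_half_pi(100000)
print(estimate, math.pi / 2)
```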

# Lecture 8 - Monte Carlo Simulations
## Exercise 4
0.0/5.0 points (graded)

You have a bucket with 3 red balls and 3 green balls. Assume that once you draw a ball out of the bucket, you don't replace it. What is the probability of drawing 3 balls of the same color?

Write a Monte Carlo simulation to solve the above problem. Feel free to write a helper function if you wish.
```python:
def noReplacementSimulation(numTrials):
    '''
    Runs numTrials trials of a Monte Carlo simulation
    of drawing 3 balls out of a bucket containing
    3 red and 3 green balls. Balls are not replaced once
    drawn. Returns a decimal - the fraction of times 3
    balls of the same color were drawn.
    '''
    # Your code here
```

```python:
import random

def noReplacementSimulation(numTrials):
    '''
    Runs numTrials trials of a Monte Carlo simulation
    of drawing 3 balls out of a bucket containing
    3 red and 3 green balls. Balls are not replaced once
    drawn. Returns a decimal - the fraction of times 3
    balls of the same color were drawn.
    '''
    if numTrials <= 0:
        return 0
    counter = 0
    for _ in range(numTrials):
        balls = ['R'] * 3 + ['G'] * 3
        # Draw 3 balls without replacement.
        drawn = random.sample(balls, 3)
        counter += drawn[0] == drawn[1] == drawn[2]
    return counter / numTrials
```

# Lecture 9 - Sampling and Standard Error
## Exercise 1
2/2 points (graded)

1. For this situation, decide whether you should do randomized sampling or stratified sampling: You are traveling across the United States and recording the heights of 1000 people to find out the average height in the US.
- [x] Random sampling
- [ ] Stratified sampling

2. For this situation, decide whether you should do randomized sampling or stratified sampling: You live in a state that has 20,000 people in one big city and 100 people in a rural area. You want to sample households in this state to determine how many electronic devices the average household has across all states.
- [ ] Random sampling
- [x] Stratified sampling

# Lecture 9 - Sampling and Standard Error
## Exercise 2
2/2 points (graded)

1. You are given the following partially completed function and a file julytemps.txt containing the daily maximum and minimum temperatures for each day in Boston for the 31 days of July 2012. In the loop, we need to make sure we ignore all lines that don't contain the relevant data.
```python:
def loadFile():
    inFile = open('julytemps.txt')
    high = []
    low = []
    for line in inFile:
        fields = line.split()
        # FILL THIS IN
            continue
        else:
            high.append(int(fields[1]))
            low.append(int(fields[2]))
    return (low, high)
```
Be sure that you have looked through the raw data file and that you understand which lines do and do not contain relevant data. Which set of conditions would capture all non-data lines (ie, provide a filter that would catch anything that wasn't relevant data)? `fields` is the variable that contains a list of elements in a line.
- [ ] `if len(fields) != 3:`
- [x] `if len(fields) != 3 or 'Boston' == fields[0] or 'Day' == fields[0]:`
- [ ] `if not fields[0].isdigit() or len(fields) < 3:`
- [x] `if len(fields) < 3 or not fields[0].isdigit():`
- [ ] `if '-' == fields[0] or 'Boston' == fields[0] or 'Day' == fields[0] or ' ' == fields[0]:`
- [ ] `if '-' == fields[0] or 'Boston' == fields[0] or 'Day' == fields[0]:`

2. Suppose you defined `diffTemps = list(numpy.array(highTemps) - numpy.array(lowTemps))` to be a list which is the element-by-element difference between `highTemps` and `lowTemps`. Which is a valid plotting statement for a graph with days on the horizontal axis and the temperature difference on the vertical axis?
- [ ] `pylab.plot(highTemps,lowTemps)`
- [ ] `pylab.plot(range(1,32), highTemps)`
- [ ] `pylab.plot(range(1,32), lowTemps)`
- [x] `pylab.plot(range(1,32), diffTemps)`
- [ ] `pylab.plot(diffTemps, range(1,32))`
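
The order of the two tests in the filter matters: checking `len(fields) < 3` first means a blank line never reaches `fields[0]`, which would raise an `IndexError` on an empty list. The sketch below applies that filter to a handful of made-up lines. The function `parseTemps` is a hypothetical variant of `loadFile` that reads from a list, and the sample layout is an assumption, so the real julytemps.txt may differ.

```python
def parseTemps(lines):
    """Hypothetical variant of loadFile that reads from an iterable of lines."""
    high = []
    low = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header, separator, or blank line
        high.append(int(fields[1]))
        low.append(int(fields[2]))
    return (low, high)

# Made-up lines in an assumed layout, for illustration only.
sample_lines = [
    "Boston July Temperatures",
    "-------------------------",
    "",
    "Day High Low",
    "------------",
    "",
    "1 91 70",
    "2 84 69",
]
print(parseTemps(sample_lines))
```

Note that the title line here has exactly three fields, which is why a test on the field count alone is not enough.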

# Lecture 9 - Sampling and Standard Error
## Exercise 3
3/3 points (graded)

1. The following image shows the average low and average high temperature from the data in [julytemps.txt](https://raw.githubusercontent.com/lcsm29/edx-mit-6.00.2x/main/data/julytemps.txt).

<img src="https://courses.edx.org/assets/courseware/v1/15569781abd0a10ad2d7cf79ebd001ee/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/temps.png" alt="vertical bar 1 with top at 95 and bottom at 70 and midpoint at 82 and vertical bar 2 with top at 75 and bottom at 59 and midpoint at 65">

The errorbars represent the 95% confidence interval. The 95% confidence interval for the average high is 83.5 +/- 12.9 and the 95% confidence interval for the average low is 67.2 +/- 7.3. Are these two means statistically significant at the 95% confidence interval?
- [ ] Yes
- [x] No

2. Are these two means statistically significant at the 99.7% confidence interval?
- [ ] Yes
- [x] No

3. Are these two means statistically significant at the 68% confidence interval?
- [x] Yes
- [ ] No
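
One way to reason about all three questions is to rescale the 95% intervals and check whether they still overlap. The sketch below assumes the empirical rule (68%, 95%, and 99.7% correspond to roughly 1, 1.96, and 3 standard errors) and treats non-overlapping intervals as the criterion for a significant difference. The helper names are made up.

```python
def interval(mean, half_width_95, z):
    """Rescale a 95% half-width (1.96 standard errors) to z standard errors."""
    standard_error = half_width_95 / 1.96
    return (mean - z * standard_error, mean + z * standard_error)

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

for label, z in (("68%", 1.0), ("95%", 1.96), ("99.7%", 3.0)):
    high = interval(83.5, 12.9, z)
    low = interval(67.2, 7.3, z)
    print(label, "overlap" if overlaps(high, low) else "no overlap")
```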

# Lecture 9 - Sampling and Standard Error
## Exercise 4
3/3 points (graded)

Ace, Bree, and Chad are each tasked with finding the standard error for three different problems. Each person only has a sample size of 100 data points for their problem.

* Ace: the winning bonus number in the lottery
* Bree: the average women's shoe size
* Chad: the number of mold bacteria on bread over time

1. Which person's sample standard deviation will be the closest to the actual population standard deviation?
- [x] Ace
- [ ] Bree
- [ ] Chad

2. Which person's sample standard deviation will be the farthest from the actual population standard deviation?
- [ ] Ace
- [ ] Bree
- [x] Chad

3. Now suppose Chad used a sample size of 10,000 instead of 100 but the other two people still use a sample size of 100. Mark all that are correct.
- [x] The difference between the sample standard deviation and actual population standard deviation for the mold problem decreases.
- [ ] The difference between the sample standard deviation and actual population standard deviation for the mold problem is now less than the difference between the sample standard deviation and actual population standard deviation for the shoe problem.
- [ ] The difference between the sample standard deviation and actual population standard deviation for the mold problem is now less than the difference between the sample standard deviation and actual population standard deviation for the lottery problem.

# Lecture 9 - Sampling and Standard Error
## Exercise 5
3/3 points (graded)

You are given two data files. Each file contains 1800 data points measuring the heart rate (in beats per minute, every 0.5 seconds) of a subject performing comparable activities for the duration of 15 minutes: [hr1.txt](https://raw.githubusercontent.com/lcsm29/edx-mit-6.00.2x/main/data/hr1.txt) and [hr2.txt](https://raw.githubusercontent.com/lcsm29/edx-mit-6.00.2x/main/data/hr2.txt). The data is plotted in the figures below. (Note that the data is taken from the [MIT-BIH Database](http://ecg.mit.edu/time-series/).)

<img src="https://courses.edx.org/assets/courseware/v1/8b08b2af6d18e44ff1c22515cb551ed5/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/ts1.gif" alt="graph with xaxis label (time sec*2) and yaxis label (heart rate bpm) and noisy jagged graph around y=90 and from x=0 to x=1750graph with xaxis label (time sec*2) and yaxis label (heart rate bpm) and graph with frequency of 100 around y=90 and from x=0 to x=1750">

Using a sample size of 250, decide whether the following methods of drawing samples will yield samples where the examples are independent of each other.

1. Using random.sample
- [x] Examples are independent in hr1 sample.
- [x] Examples are independent in hr2 sample.

2. Getting a random number between 1 and 1800, 250 times.
- [ ] Examples are independent in hr1 sample.
- [ ] Examples are independent in hr2 sample.
- [x] Neither hr1 nor hr2 give independent examples.

3. Starting at the first example and going until the 500th example.
- [ ] Examples are independent in hr1 sample.
- [ ] Examples are independent in hr2 sample.
- [x] Neither hr1 nor hr2 give independent examples.
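
The three ways of drawing a sample described above can be sketched mechanically as follows. This is only an illustration: `data` is a hypothetical stand-in for the 1800 readings in either file, and the fixed seed is there just to make the run repeatable.

```python
import random

random.seed(0)  # fixed seed, only so the sketch is repeatable

# Hypothetical stand-in for the 1800 readings in hr1.txt or hr2.txt.
data = list(range(1800))
sample_size = 250

# Method 1: random.sample picks 250 distinct positions (no repeats).
sample_without_replacement = random.sample(data, sample_size)

# Method 2: 250 separate random draws; the same position can come up twice.
# (Indices run 0..1799 here rather than 1..1800.)
sample_with_replacement = [data[random.randint(0, len(data) - 1)]
                           for _ in range(sample_size)]

# Method 3: a contiguous block, from the first to the 500th example
# as the question is worded.
contiguous_sample = data[:500]

print(len(set(sample_without_replacement)))  # 250, since there are no duplicates
print(len(set(sample_with_replacement)))     # at most 250; repeats are possible
```

The sketch only shows how the methods differ in which positions they pick. In a time series, neighbouring readings tend to resemble each other, which is the concern with a contiguous block.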

# Lecture 10 - Experimental Data Part 1
## Exercise 1
1.0/1.0 point (graded)

Using the formula derived in this segment, compute `k` from the second experimental observation: m = 0.15 kg, x = 0.1015 m.

Use 9.81 m/s^2 as the gravitational constant (g). Enter your answer to at least 1 decimal place of accuracy.

k = `14.4975` 
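
The formula from the lecture segment is not reproduced in these notes. Assuming it is the Hooke's law balance $kx = mg$, so that $k = mg/x$, the arithmetic can be checked with a few lines:

```python
m = 0.15    # mass in kg (second observation)
x = 0.1015  # displacement in m
g = 9.81    # gravitational constant in m/s^2

# Assumed relation: the spring force k*x balances the weight m*g.
k = m * g / x

print(round(k, 4))  # 14.4975
```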

# Lecture 10 - Experimental Data Part 1
## Exercise 2
2/2 points (graded)

Which of the following lines will fit a parabola to the spring data given in the lecture file, [springData.txt](https://raw.githubusercontent.com/lcsm29/edx-mit-6.00.2x/e7a25f95d4290ffb8fd0545c322c1623a72b152b/data/springData.txt)? Check all that work.

- [ ] `a,b = pylab.polyfit(xVals, yVals, 2)`
- [x] `model = pylab.polyfit(xVals, yVals, 2)`
- [x] `a,b,c = pylab.polyfit(xVals, yVals, 2)`

Suppose the coefficients returned by polyfit are in the tuple (c1, c2, c3). Which of the following lines calculate the estimated y values?

- [x] estYVals = c1*xVals**2 + c2*xVals + c3
- [ ] estYVals = c3*xVals**2 + c2*xVals + c1
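
`polyfit` returns coefficients ordered from the highest power down to the constant term, which is why `c1` multiplies `xVals**2`. The dependency-free sketch below mirrors that ordering. The helper `eval_poly` and the coefficient values are made up for illustration; it is not a replacement for `pylab.polyval`.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c  # Horner's rule
    return result


# Made-up coefficients (c1, c2, c3) for c1*x**2 + c2*x + c3.
c1, c2, c3 = 2.0, -3.0, 5.0
x_vals = [0.0, 1.0, 2.0, 3.0]

est_y_vals = [eval_poly((c1, c2, c3), x) for x in x_vals]
expanded = [c1 * x**2 + c2 * x + c3 for x in x_vals]

print(est_y_vals)  # [5.0, 4.0, 7.0, 14.0]
print(expanded)    # same values, written out term by term
```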

# Lecture 10 - Experimental Data Part 1
## Exercise 3
1/1 point (graded)

Recall from the previous video the concept of the coefficient of determination, also known as the $R^2$ value. This is computed by $$1-\frac{(\text{variability of errors})^2}{(\text{variability of data})^2}$$. The variability of the errors is computed by taking the sum of the squares of (observed - predicted) errors. We normalize this variability by dividing it by the variability of the data, which is the sum of the squares of (observation - average_observation) for each observation.

In [this file](https://raw.githubusercontent.com/lcsm29/edx-mit-6.00.2x/e7a25f95d4290ffb8fd0545c322c1623a72b152b/data/lectureCode_L17_code.py), this $R^2$ value is computed by the function rSquare.

In that file, revise fitData and fitData3 to report the coefficient of determination for the fitted line in each case. Did this measure of the "goodness of fit" improve when we eliminated the measurements after the spring reached its elastic limit and Hooke's Law no longer applied?

- [x] Yes
- [ ] No
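
As a rough illustration of the description above, the sketch below computes $R^2$ in its usual form: one minus the sum of squared errors divided by the sum of squared deviations from the mean. The name `r_squared` and the sample numbers are made up for this example; this is not the `rSquare` function from the lecture file.

```python
def r_squared(observed, predicted):
    """1 - (sum of squared errors) / (sum of squared deviations from the mean)."""
    mean_obs = sum(observed) / len(observed)
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_err / ss_tot


# Made-up observations and two made-up sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.2, 3.8]
poor_fit = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean of the data

print(r_squared(observed, good_fit))  # close to 0.98
print(r_squared(observed, poor_fit))  # 0.0
```

A model that only predicts the mean scores 0, and a model that explains most of the variation scores close to 1.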

# Lecture 11 - Experimental Data Part 2
## Exercise 1
5/5 points (graded)
To model data effectively, it is important to understand the underlying model that describes the data. This means that knowing the physical phenomenon or event that is being modeled is extremely important. For each of the following data/phenomena/events, describe what type of model (linear, quadratic, other) you would use to describe the underlying phenomena.

1. Hourly temperature from 7am to 7pm `quadratic`
2. Gravitational force on an object as mass increases `linear`
3. Displacement of a mass on a hanging spring from the ceiling `other`

It is also important to understand physical phenomena and their limitations when modeling data. Which of the following are true?

- [x] Even though hourly temperature fluctuations throughout the day may oscillate for a variety of reasons (wind, cloud cover, etc), the underlying trend is quadratic and using a quadratic model is most appropriate.
- [ ] You can eliminate a small number of non-outlier data points in order to construct a model that has a better fit.
- [x] At some point, some physical phenomena have limitations that do not fit their mathematical models (i.e. springs have an elastic limit).

When modeling, the model that has the biggest R^2 value is always the best model.

- [ ] True
- [x] False

# Lecture 11 - Experimental Data Part 2
## Exercise 2
6/6 points (graded)

Suppose you are given the following data and are asked to fit a curve to this data.
```python:
A = [1,2,3,4,5,6,7,8,9,10]
L = [0.59,18.38, 33.01, 54.14, 72.48, 89.8, 97.07, 112.6, 142.87, 199.84]
```
1. Match each plot with the correct polynomial fit.

* Fit 1 `linear`
<img src="https://courses.edx.org/assets/courseware/v1/93ca5dcd269ecdc1870ada78367adf47/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_1.png">

* Fit 2 `Polynomial of degree 2`
<img src="https://courses.edx.org/assets/courseware/v1/143c31a08c2138921086400d31807cf4/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_2.png">

* Fit 3 `Polynomial of degree 5`
<img src="https://courses.edx.org/assets/courseware/v1/e78a4e1f8a7b626f25f37758e8ceeeb1/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_3.png">

2. Is each fit an example of overfitting?

* Fit 1 `No`
<img src="https://courses.edx.org/assets/courseware/v1/93ca5dcd269ecdc1870ada78367adf47/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_1.png">

* Fit 2 `No`
<img src="https://courses.edx.org/assets/courseware/v1/143c31a08c2138921086400d31807cf4/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_2.png">

* Fit 3 `Yes`
<img src="https://courses.edx.org/assets/courseware/v1/e78a4e1f8a7b626f25f37758e8ceeeb1/asset-v1:MITx+6.00.2x+3T2021+type@asset+block/pfigure_3.png">
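
To see why a higher $R^2$ does not by itself pick the best model, the sketch below fits polynomials of degree 1, 2, and 5 to the `A` and `L` data and reports $R^2$ on the same points. The helpers `poly_fit`, `poly_eval`, and `r_squared` are hypothetical, dependency-free stand-ins written for this example (in practice `pylab.polyfit` would do the fitting). Because each lower-degree polynomial is a special case of the higher-degree one, the $R^2$ on the fitted data cannot go down as the degree increases.

```python
def poly_fit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y for the basis x**0 .. x**degree.
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x**i.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - partial) / a[i][i]
    return coeffs[::-1]  # reorder so the highest power comes first


def poly_eval(coeffs, x):
    """Evaluate a polynomial given highest-power-first coefficients."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_err / ss_tot


A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
L = [0.59, 18.38, 33.01, 54.14, 72.48, 89.8, 97.07, 112.6, 142.87, 199.84]

r2_by_degree = {}
for degree in (1, 2, 5):
    coeffs = poly_fit(A, L, degree)
    predicted = [poly_eval(coeffs, x) for x in A]
    r2_by_degree[degree] = r_squared(L, predicted)
    print(degree, round(r2_by_degree[degree], 4))
```

The degree 5 fit gets the highest score here simply because it has more freedom to follow the ten points, which is the pattern to be wary of when judging overfitting.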


# Lecture 11 - Experimental Data Part 2
## Curve Fitting Lab

### Lab: Curve Fitting
This is an optional lab component to the lecture. Play with it and explore!

> Info
> In this section, we provide you with a few choices of datasets, all found in Google Public Data. The datasets are supplied by The World Bank. Play around with the options, observe the results and remember, the complex model isn't always the correct model!

#### Step 1: Data
- Mobile cellular subscriptions
- GDP per capita (constant 2000 USD)
- Internet users as percentage of population

#### Step 2: Parameters
Specify your curve fitting parameters below.

Regression method
- Linear
- Exponential
- Polynomial

Degree of polynomial fit: (slider 1 to 20)

#### Step 3: Fitting
Your selected dataset and fitted curve are plotted below.


# Lecture 12 - Machine Learning
## Exercise 1
7/7 points (graded)

For the following problems, decide whether it would be better to perform supervised or unsupervised learning given the data.

- 100,000 emails are read and marked as spam or not spam, depending on some metrics measured on their content (keywords, length, etc). We want to determine what a new email will be marked as. `Supervised`
- A junkyard has 500 objects with 2 and 4 wheels. We want to separate the objects into 4 different groups. `Unsupervised`
- A group of 1000 students are asked for a sample of their handwriting. Researchers make pairs of (handwritten text, typed text). Given a new handwriting sample from a new student, we want to determine what the typed version of the handwriting sample would be. `Supervised`
- Given a set of t-shirts, we want to organize them in 3 different piles. `Unsupervised`
- Given a greenhouse full of plants, we want to organize them so that they can be given away to novice, intermediate, and expert plant handlers. `Unsupervised`
- Given a set of colored points on an x-y axis, we want to place a new point on the plot, knowing its color. `Supervised`
- A school documents the age, grade, score on a math standardized test, and score on a writing standardized test for every student. A new student comes to the school and we want to decide what grade they should be placed in. `Supervised`

# Lecture 12 - Machine Learning
## Exercise 2
1/1 point (graded)

For the following question, check the boxes that correspond to the rules that we may be able to learn. Consider the following set of 6 emails, which classify the email as spam or not. Which of the following rules might we learn? Check all that apply.

| Spam or Not Spam? | Spam | Spam | Spam | Not spam | Not spam | Not spam |
| :---------------: | :--: | :--: | :--: | :------: | :------: | :------: |
| Words in Email    |   4  |   4  |  30  |    35    |    50    |    10    |
| Flagged Words     | CASH, BUY, password | cash, free, rolex | cash, free, call | only, cancel, sign | free, check, weight | quote, cheap, website |

- [x] Emails containing both flagged words "cash" and "free" are marked "spam".
- [ ] Emails without the flagged word "free" are marked "not spam".
- [ ] Emails with an even number of words are marked "spam".
- [ ] Emails with less than 31 words are marked "spam".
- [x] Emails containing at least one flagged word in all capital letters are marked "spam".
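
One way to check each candidate rule is to test it against all six emails and see whether any email contradicts it. The sketch below does that with a hypothetical encoding of the table. It assumes the third email's flagged words are "cash", "free", and "call", and that word matching ignores case except for the all-capitals rule.

```python
# Hypothetical encoding of the table above (one dict per email).
# Assumption: the third email's flagged words are cash, free, call.
emails = [
    {"spam": True,  "words": 4,  "flagged": ["CASH", "BUY", "password"]},
    {"spam": True,  "words": 4,  "flagged": ["cash", "free", "rolex"]},
    {"spam": True,  "words": 30, "flagged": ["cash", "free", "call"]},
    {"spam": False, "words": 35, "flagged": ["only", "cancel", "sign"]},
    {"spam": False, "words": 50, "flagged": ["free", "check", "weight"]},
    {"spam": False, "words": 10, "flagged": ["quote", "cheap", "website"]},
]


def has(email, word):
    """Case-insensitive check for a flagged word."""
    return word in [w.lower() for w in email["flagged"]]


# Each rule is (condition, label predicted when the condition holds).
rules = {
    "cash and free -> spam": (lambda e: has(e, "cash") and has(e, "free"), True),
    "no free -> not spam": (lambda e: not has(e, "free"), False),
    "even word count -> spam": (lambda e: e["words"] % 2 == 0, True),
    "fewer than 31 words -> spam": (lambda e: e["words"] < 31, True),
    "all-caps flagged word -> spam": (
        lambda e: any(w.isupper() for w in e["flagged"]), True),
}


def consistent(condition, label):
    """True if every email matching the condition carries the predicted label."""
    return all(e["spam"] == label for e in emails if condition(e))


results = {name: consistent(cond, label) for name, (cond, label) in rules.items()}
for name, ok in results.items():
    print(name, ok)
```

Under these assumptions only the first and last rules survive, matching the checked boxes: the first email breaks "no free -> not spam", and the 50-word and 10-word emails break the two word-count rules.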

# Lecture 12 - Machine Learning
## Exercise 3
4/4 points (graded)

For each of the following situations state whether it would be a good idea to scale the features.

1. One feature set is height (in meters) and the other is weight (in kilograms). `Scale`
2. One feature set is the number of detected earthquakes in a city and the other is the population in that city. `Scale`
3. The percent concentration of a virus in a random sampling of healthy and unhealthy people. `Don't scale`
4. The angle of refraction of light (degree that light bends) observed when entering water vs. glass vs a diamond. `Don't scale`
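
For the height and weight case, a sketch of z-scaling shows what scaling changes. The data and the helper name `z_scale` are made up for illustration. Each feature is shifted to mean 0 and divided by its standard deviation, so neither unit dominates a distance calculation.

```python
def z_scale(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


# Made-up measurements for four people.
heights_m = [1.50, 1.65, 1.80, 1.95]
weights_kg = [50.0, 65.0, 80.0, 95.0]

scaled_heights = z_scale(heights_m)
scaled_weights = z_scale(weights_kg)

# Before scaling, the weight gap (45 kg) swamps the height gap (0.45 m).
raw_gap = (heights_m[3] - heights_m[0], weights_kg[3] - weights_kg[0])
# After scaling, both features span a comparable range.
scaled_gap = (scaled_heights[3] - scaled_heights[0],
              scaled_weights[3] - scaled_weights[0])

print(raw_gap)
print(scaled_gap)
```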

# Lecture 12 - Machine Learning
## Exercise 4
5/5 points (graded)

The company Internet Movies, Inc. has found wide success in their streaming movie business. After many long and successful years of delivering content, they have decided to use machine learning to make their business even more successful. Luckily, they already possess a huge dataset that has grown over years and years of user activity – but they need your help to make sense of it! Answer the following questions.

1. Let’s start with a simple case. Assume user Alice is a particularly good member and she makes sure to rate every movie she ever watches on the website. What machine learning approach would be better for the company to use for determining whether she would be interested in a new specific movie? `Supervised`

2. Bob, on the other hand, is not that much into ratings. He does watch a lot of movies, but never takes the time to rate them. For users like Bob, which of the following data can the company use to determine potential interest in a specific movie? Check all that apply.
- [x] Metadata of movies: actors, director, genre, etc.
- [ ] Length of the movie
- [x] Popularity of the movie amongst other users
- [ ] User login patterns

3. What machine learning approach should the company use for cases like Bob? `Unsupervised`

Now that the company has some idea about how to use the data, it’s time to design a classifier. Our classifier will be very simple: given a movie and a user, it will classify the movie as either "Good" or "Bad" for this user.

4. Assume all the users of the company have a very simple rule in their movie taste: they like it if Tom Cruise has the lead role. Any other data is mostly irrelevant. However, no one in the company knows about this fact. Which of the following clustering models might be able to detect this rule? Check all that apply.
- [ ] Supervised (label: rating), with data: Director, language, genre
- [x] Supervised (label: rating), with data: Movie length, lead role, director
- [x] Unsupervised, with data: Lead role, movie length, rating
- [ ] Unsupervised, with data: Lead role, genre, director
- [ ] Unsupervised, with data: Number of ratings, lead role

5. Looking at the options they’re given, the board members choose to go with a supervised model with lead role as data. You become outraged. "How can you not include movie length? It’s incredibly important! Who watches a 3 hour long movie --" Your fellow data scientist interrupts you. "Yeah, I agree, but look at these initial results. You see, if we remove movie length, ..." What can your colleague (correctly) say to convince you? Check all that apply.
- [ ] "we can reduce inter-cluster dissimilarity."
- [x] "we can reduce intra-cluster dissimilarity."
- [ ] "the model starts to work. It doesn’t work otherwise."
- [x] "we can consume less memory, and the results look almost the same."

# Lecture 12 - Machine Learning
## Exercise 5
3/3 points (graded)

As Professor Guttag said, there are two types of people in this world: those who know programming and those who don’t. To prove this once and for all, you take a random sampling of edX students and put them through a programming test. Assume that the test is entirely fair and that it reflects the exact level of skill each student has. You also ask them to fill out a small questionnaire about their experience with 6.00.2x.

You receive the results for each student as [Exam grade, Hours spent on 6.00.2x]. That is, if Alice has spent 90 hours on 6.00.2x and received a score of 74 on the exam, you will have [74, 90] as a data point.

1. Based on your initial purposes, what should you choose as k?
- [x] 2
- [ ] 3
- [ ] 4

2. Should you apply scaling to this data?
- [x] Yes
- [ ] No

You run your clustering algorithm and get two clusters:

| CLUSTER 1 | Exam grade | Hours spent on 6.00.2x |
| :-------: | :--------: | :--------------------: |
| Mean      | 46.2       | 4.3                    |
| Variance  | 15.0       | 0.91                   |

| CLUSTER 2 | Exam grade | Hours spent on 6.00.2x |
| :-------: | :--------: | :--------------------: |
| Mean      | 84.5       | 60.2                   |
| Variance  | 5.1        | 6.04                   |

Results look good – there are indeed two kinds of people, and curiously one kind seems to absolutely love 6.00.2x. However, when you rerun the code, you get the following clusters:

| CLUSTER 1 | Exam grade | Hours spent on 6.00.2x |
| :-------: | :--------: | :--------------------: |
| Mean      | 12.5       | 2.34                   |
| Variance  | 0.29       | 0.36                   |

| CLUSTER 2 | Exam grade | Hours spent on 6.00.2x |
| :-------: | :--------: | :--------------------: |
| Mean      | 70.23      | 35.4                   |
| Variance  | 8.65       | 10.84                  |

3. You don’t know what to believe (and, indeed, there’s no reason for you to choose one over another). What can you do to fix this and get a stable result?
- [x] Let k = 3
- [ ] Add more students
- [ ] Add "student name length" as a feature
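
The unstable result can be reproduced with a small sketch. The six [exam grade, hours] points below are made up so that they form three natural groups, `kmeans` is a minimal hypothetical implementation with fixed starting centroids (no scaling, for brevity), and the outcome is only claimed for the starting points shown.

```python
def kmeans(points, centroids, max_iters=100):
    """Minimal k-means: assign to nearest centroid, recompute, repeat until stable."""
    clusters = []
    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


# Made-up [exam grade, hours spent] points forming three groups.
low = [(10, 5), (12, 6)]
mid = [(45, 30), (47, 31)]
high = [(80, 55), (82, 56)]
points = low + mid + high

# k = 2 with two different starting points gives two different answers.
run_a = kmeans(points, [(10, 5), (45, 30)])
run_b = kmeans(points, [(45, 30), (80, 55)])

# k = 3, starting with one point from each group, keeps the groups apart.
run_c = kmeans(points, [(10, 5), (45, 30), (80, 55)])

print(run_a)  # low group alone; mid and high merged
print(run_b)  # low and mid merged; high group alone
print(run_c)  # three separate groups
```

With three natural groups and k = 2, the middle group has to merge with one of its neighbours, and which one depends on where the centroids start. That is consistent with seeing different clusters on a rerun.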