In [9]:
# Initialize Otter
import otter
grader = otter.Notebook("hw4.ipynb")

# <center> CS 178: Machine Learning & Data Mining </center>
## <center> Homework 4: Solutions </center>

---
## Instructions


This homework (and many subsequent ones) will involve data analysis and reporting on methods and results using Python code. **You will submit a zip file containing: [hw4.ipynb, problem1_solution.png]** to Gradescope. Please do not rename the notebook. This includes any text you wish to include to describe your results, the complete code snippets of how you attempted each problem, and any figures that were generated (make sure to run all cells before submitting). It is important that you include enough detail that we know how you solved the problem, since otherwise we will be unable to grade it.

Your homework will be given to you as a zipfile containing the data and a Jupyter notebook with problem descriptions and some template code that will help you get started. You **must** use this starter Jupyter notebook to complete your assignment.

If you have any questions/concerns about using Jupyter notebooks, ask us on EdD.

### Summary of Assignment: 100 total points
- Problem 1: Drawing a Decision Tree (15 points)
- Problem 2: Computing the Gini Index (15 points)
- Problem 3: Implementing Decision Trees (50 points)
    - Problem 3.1: `class_prob_vector` (5 points)
    - Problem 3.2: `leaf_condition` (10 points)
    - Problem 3.3: `gini_score` (10 points)
    - Problem 3.4: `find_best_split` (15 points)
    - Problem 3.5: `build_tree` (10 points)
- Problem 4: Experimenting with Sklearn (15 points)
    - Problem 4.1: Training a small DT (5 points)
    - Problem 4.2: Varying depth (5 points)
    - Problem 4.3: Varying min_leaf (5 points)
- Statement of Collaboration (5 points)

Before we get started, let's import some libraries that you will make use of in this assignment. Make sure that you run the code cell below in order to import these libraries.

**Important: In the code block below, we set `seed=1234`. This is to ensure your code has reproducible results and is important for grading. Do not change this. If you are not using the provided Jupyter notebook, make sure to also set the random seed as below.**

**Important: Do not change any of the code we give you below, except for the parts left for you to complete. This is to ensure your code has reproducible results and is important for grading.**

In [2]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

# Fix the random seed for reproducibility
# !! Important !! : do not change this
seed = 1234
np.random.seed(seed)  

<!-- BEGIN QUESTION -->

---
## Problem 1 (15 points): Draw a Decision Tree

In the image below, you are given some data for a binary classification problem with features $X_1$ and $X_2$ and labels $y = 0$ or $y = 1$. In addition, the dashed black lines depict the splits of a particular decision tree trained on this data.


<img src="./problem1_tree.png"  />


- Draw the decision tree corresponding to the splits in the image. For every leaf node, you should also include $p(y = 1 | \text{path})$. 
- Write your answer on paper and include a picture of your answer in this notebook. To include an image in a Jupyter notebook, save the image in the same directory as the .ipynb file and then write `![caption](problem1_solution.png)`. **The image must be named "problem1_solution.png" and be submitted to Gradescope with the notebook.**

![problem1_solution.png](attachment:problem1_solution.png)

<!-- END QUESTION -->

---
## Problem 2 (15 points): Computing the Gini Index

In the table below, you are given some data for a classification problem with 3 features and 3 classes. The feature $X_1$ is a binary feature, and the features $X_2$ and $X_3$ are real-valued features. The three classes are labeled $y = 0, 1, 2$. 

In this problem, you will use the Gini index to decide which feature to split on. In other words, you will be creating the root of a decision tree based on this data. You should do this by **hand** for practice for your exams, then write out code to solve for the outputs.

- What is the Gini index of the data before doing any splitting?
- For each feature, compute the Gini index of splitting on that feature. Use a threshold of $t = 0.4$ for $X_2$ and $t = 2.0$ for $X_3$. (Why don't we need to specify a threshold for $X_1$? You don't have to answer this, but you should think about it.) 
- Based on your answer to the previous question, which feature should we split on for the root of our decision tree?


| X1 | X2   | X3   | y |
|----|------|------|---|
| 0  | 0.1  | 3.4  | 0 |
| 1  | 0.3  | 3.0  | 0 |
| 0  | -0.2 | 2.9  | 0 |
| 0  | 1.3  | 0.1  | 1 |
| 1  | 2.2  | -0.5 | 1 |
| 0  | 4.0  | 0.3  | 1 |
| 1  | 0.5  | 1.2  | 2 |
| 1  | 3.0  | 0.75 | 2 |
| 1  | 2.2  | 0.1  | 2 |
| 0  | 0.25 | 0.1  | 2 |

In [19]:
def calculate_gini(probs: list):
    # probs holds the probability of each class label in the subset
    return 1 - sum([x**2 for x in probs])

In [20]:
gini_score_before_split = calculate_gini([3/10,3/10,4/10])
gini_score_before_split

0.6599999999999999

In [21]:
# Left side: X1 = 0, right side: X1 = 1

P_r = 0.5
P_l = 1 - P_r

G_r = calculate_gini([1/5,1/5,3/5])  # X1 = 1: one y=0, one y=1, three y=2
G_l = calculate_gini([2/5,2/5,1/5])  # X1 = 0: two y=0, two y=1, one y=2

gini_score_for_x1 = P_r*G_r + P_l*G_l

gini_score_for_x1

0.6

In [22]:
# Split X2 at t = 0.4: right side is X2 > 0.4 (6 of 10 points), left side is X2 <= 0.4 (4 points)
P_r = 0.6
P_l = 1 - P_r

G_r = calculate_gini([0,3/6,3/6])
G_l = calculate_gini([3/4,0,1/4])

gini_score_for_x2 = P_r*G_r + P_l*G_l

gini_score_for_x2

0.45

In [23]:
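# Split X3 at t = 2.0: right side is X3 > 2.0 (3 of 10 points, all y=0), left side is X3 <= 2.0 (7 points)
P_r = 0.3
P_l = 1 - P_r

G_r = calculate_gini([1,0,0])
G_l = calculate_gini([0,3/7,4/7])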

gini_score_for_x3 = P_r*G_r + P_l*G_l

gini_score_for_x3

0.3428571428571429

The Gini index before any splitting is 0.66.

The weighted Gini index for splitting on $X_1$ is 0.6.

The weighted Gini index for splitting on $X_2$ with $t = 0.4$ is 0.45.

The weighted Gini index for splitting on $X_3$ with $t = 2.0$ is approximately 0.343.

We should split on feature $X_3$, since it gives the lowest weighted Gini index.
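
As a cross-check on the hand-entered class proportions above, the same numbers can be recomputed directly from the table. This is only a sketch and not part of the assignment template: the helper names `gini` and `weighted_gini` are made up for illustration, and the split on $X_1$ is expressed as a threshold of 0.5, an assumption that simply separates the values 0 and 1.

```python
import numpy as np

# Data copied from the table in Problem 2 (columns: X1, X2, X3) and labels y
X = np.array([
    [0, 0.1, 3.4], [1, 0.3, 3.0], [0, -0.2, 2.9],
    [0, 1.3, 0.1], [1, 2.2, -0.5], [0, 4.0, 0.3],
    [1, 0.5, 1.2], [1, 3.0, 0.75], [1, 2.2, 0.1], [0, 0.25, 0.1],
])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

def gini(labels, n_classes=3):
    """Gini index 1 - sum_c p_c^2 of a label array (hypothetical helper)."""
    probs = np.bincount(labels, minlength=n_classes) / len(labels)
    return 1.0 - np.sum(probs ** 2)

def weighted_gini(X, y, i, threshold):
    """Size-weighted Gini index after splitting feature i at threshold."""
    left = X[:, i] <= threshold
    n_left, n_right = left.sum(), (~left).sum()
    return (n_left * gini(y[left]) + n_right * gini(y[~left])) / len(y)

scores = {
    "X1": weighted_gini(X, y, 0, 0.5),  # 0.5 separates X1 = 0 from X1 = 1
    "X2": weighted_gini(X, y, 1, 0.4),
    "X3": weighted_gini(X, y, 2, 2.0),
}
best_feature = min(scores, key=scores.get)
print(gini(y), scores, best_feature)
```

Up to floating-point rounding, the scores should agree with the values computed cell by cell above, and `best_feature` should name the same feature.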

<!-- END QUESTION -->

---
## Problem 3: Implementing a Decision Tree

You will now implement an algorithm for learning a decision tree from data. You are given some starter code below that you will need to complete. To keep things simple, your implementation will only work for binary classification.

The class `Node` represents a single node in a decision tree. This class is already completed for you, and contains several useful attributes. In addition, the class `DecisionTree` is partially implemented for you. Before attempting this problem, it is important that you read and understand both of these classes.

In [81]:
class Node:
    """ A class representing a node in a decision tree.
    """
    
    def __init__(self, depth):
        self.depth = depth         # What level of the tree this node is at; depth=0 is the root node
        
        self.split_feature = None  # The index of the feature that this node splits, if any
        self.threshold = None      # The threshold used to split the feature
        
        self.left_child = None     # A node object (or None) representing the left-hand child of this node 
        self.right_child = None    # A node object (or None) representing the right-hand child of this node
        
        self.probs = None          # A numpy array of length 2 representing [p(y=0), p(y=1)] at this node
        
    def is_leaf(self):
        return self.left_child is None and self.right_child is None
        
    def __repr__(self):
        # Gives a nice looking representation if you call print on a node 
        if self.is_leaf():
            return f'Leaf Node \n -| Depth: {self.depth} \n -| Probs {self.probs}'
        else:
            return f'Internal Node \n -| Split feature: {self.split_feature} \n -| Threshold: {self.threshold}'

In [85]:
class DecisionTree:
    """
    A class representing a decision tree.
    """
    
    def __init__(self, max_depth=3):
        self.root = None            # A Node object which is the root of our tree
        self.max_depth = max_depth  # An integer representing the maximum allowed depth of the tree
        
    def class_prob_vector(self, y):
        """
        Given an array of labels y, compute p(y=0) and p(y=1).
        
        returns: a numpy array containing [p(y=0), p(y=1)].
        """
        ### YOUR CODE STARTS HERE ###
        p_0 = len(y[y==0]) / len(y)
        p_1 = len(y[y==1]) / len(y)
        probs = np.array([p_0, p_1])
        ###  YOUR CODE ENDS HERE  ###
        return probs
        
    def leaf_condition(self, node:Node):
        """
        Given a Node object, returns True if this is a leaf node and False otherwise.
        
        A Node is considered a leaf node if all labels at the node belong to the same class,
            or if the node is at the maximum allowed depth of the tree.
        """
        ### YOUR CODE STARTS HERE ###
        # Maximum allowed depth of the tree reached
        if node.depth == self.max_depth:
            return True
        # All labels at the node belong to the same class
        elif 0 in node.probs:
            return True
        else:
            return False
        ###  YOUR CODE ENDS HERE  ###
        
    def gini_score(self, X:np.ndarray, y:np.ndarray, i, threshold):
        """
        Given features X and labels y, computes the Gini index of splitting
            the i-th feature at the given threshold.
        """
        ### YOUR CODE STARTS HERE ###

        # Split the labels according to the threshold on feature i
        left_mask = X[:, i] <= threshold   # rows whose i-th feature is <= threshold
        right_mask = X[:, i] > threshold   # rows whose i-th feature is > threshold
        left_y = y[left_mask]
        right_y = y[right_mask]

        # Fraction of the data points that fall on each side of the split
        P_r = len(right_y) / len(y)
        P_l = 1 - P_r

        # Gini index of each side
        G_r = calculate_gini(self.class_prob_vector(right_y))
        G_l = calculate_gini(self.class_prob_vector(left_y))

        # Weighted Gini index of the split
        gini_index = P_r*G_r + P_l*G_l

        return gini_index
        ###  YOUR CODE ENDS HERE  ###
        
    def find_best_split(self, X:np.ndarray, y:np.ndarray):
        """
        Given features X and labels y, finds the best split based on the Gini index.
        
        returns: an index corresponding to which feature we are splitting,
            as well as the threshold we are splitting the feature at.
        """
        ### YOUR CODE STARTS HERE ###
        best_idx = 0
        best_threshold = 0
        lowest_gini_index = 1
        for idx in range(X.shape[1]): # for each feature
            # Sort the unique values in column idx and take the midpoint between each consecutive pair.
            # Using unique values avoids thresholds that would leave one side of the split empty.
            sorted_Xs = np.unique(X[:,idx])
            possible_thresholds: list[float] = [(sorted_Xs[i]+sorted_Xs[i+1])/2 for i in range(0, len(sorted_Xs)-1)]
            for threshold in possible_thresholds:
                # calculate gini_index
                gini_index: float = self.gini_score(X,y,idx,threshold)
                # update best_idx and best_threshold if gini_index < lowest_gini_index
                if gini_index < lowest_gini_index:
                    lowest_gini_index = gini_index
                    best_idx = idx
                    best_threshold = threshold
        ###  YOUR CODE ENDS HERE  ###       
        return best_idx, best_threshold
        
    def fit(self, X, y):
        """
        Fits the decision tree given features X and labels y.
        """
        assert isinstance(X, np.ndarray), 'X must be a numpy array'
        assert isinstance(y, np.ndarray), 'y must be a numpy array'
        
        self.build_tree(X, y, 0)
    
    def build_tree(self, X, y, depth):
        """
        Recursively builds the decision tree.
        """
                
        # Create a new node
        node = Node(depth)
        if depth == 0:
            self.root = node
            
        # Get the class probabilities for this node
        node.probs = self.class_prob_vector(y)
                
        # Check if this new node is a leaf node; otherwise, split it
        if self.leaf_condition(node):
            return node
        else:            
            # Find which feature to split on and the splitting threshold
            split_idx, split_threshold = self.find_best_split(X, y)
            
            # Create left/right splits
            left_idx = X[:, split_idx] <= split_threshold
            right_idx = X[:, split_idx] > split_threshold

            X_L, y_L = X[left_idx], y[left_idx]
            X_R, y_R = X[right_idx], y[right_idx]
            
            # Recursively split the left/right nodes
            node_L = self.build_tree(X_L, y_L, depth+1)
            node_R = self.build_tree(X_R, y_R, depth+1)
            
            # Fill in node information
            node.split_feature = split_idx
            node.threshold = split_threshold
            node.left_child = node_L
            node.right_child = node_R
            
            return node
        
    def predict(self, X):
        """
        After fitting the decision tree, this function can be called to make predictions
            for every data point in the feature array X. 
        """
        y_hat = []
        
        for x in X:
            # Make a prediction for every row in X
            y_hat.append(self._predict(x))
        
        y_hat = np.array(y_hat)
        return y_hat
                        
    def _predict(self, x):
        """
        Makes predictions on individual datapoints x.
        """
        current_node = self.root
        
        while True:
            if self.leaf_condition(current_node):
                # If we're at a leaf node, make a prediction based on the probabilities
                probs = current_node.probs
                y_hat = np.argmax(probs)
                return y_hat
            else:
                # Otherwise, traverse the tree based on the splits
                go_left = x[current_node.split_feature] <= current_node.threshold
                if go_left:
                    current_node = current_node.left_child
                else:
                    current_node = current_node.right_child
                
    
    def __repr__(self):
        # Pretty printing if we call print on our DecisionTree
        return f'Decision Tree \n -| Max Depth: {self.max_depth}'
    
    def plot(self,current_node:Node):
        # Post-order traversal: print the left subtree, then the right subtree,
        # and finally the current node; leaf nodes are printed directly
        if self.leaf_condition(current_node):
            print(current_node)
            return
        else:
            self.plot(current_node.left_child)
            self.plot(current_node.right_child)
        print(current_node)

### Problem 3.1 class_prob_vector (5 points):

- Complete the function `class_prob_vector`. This function takes in an array of labels `y` and returns a numpy array containing $p(y=0)$ and $p(y=1)$.
- Run the code block given below to test your implementation. If your code is correct, all sanity checks should pass.
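
One compact way to obtain these probabilities is to average boolean masks, since the mean of `y == c` is the fraction of labels equal to `c`. The sketch below is standalone rather than the template's method: the function name `class_probs` is hypothetical, and it assumes `y` is non-empty and contains only the labels 0 and 1.

```python
import numpy as np

def class_probs(y):
    """Return [p(y=0), p(y=1)] for a non-empty array of binary labels."""
    y = np.asarray(y)
    # The mean of a boolean mask is the fraction of True entries
    return np.array([np.mean(y == 0), np.mean(y == 1)])

print(class_probs([0, 0, 1, 1, 1]))
```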

In [86]:
# Use this code block to test your implementation in Problem 3.1
# Don't change anything here -- just run it

dt = DecisionTree(max_depth=3)

y = np.array([0, 0, 1, 1, 1])
expected = np.array([0.4, 0.6])
out = dt.class_prob_vector(y)

print(f'Sanity Check 1 passed: {np.array_equal(expected, out)}')

y = np.array([1, 1, 1])
expected = np.array([0., 1.])
out = dt.class_prob_vector(y)

print(f'Sanity Check 2 passed: {np.array_equal(expected, out)}')

Sanity Check 1 passed: True
Sanity Check 2 passed: True


In [87]:
grader.check("q3.1 Class Prob Vector")

### Problem 3.2 leaf_condition (10 points):

In our decision tree implementation, we will consider a node to be a leaf node if either (a) all labels at the node belong to the same class, or (b) the node is at depth `max_depth`, where `max_depth` is an attribute of our `DecisionTree` that we can specify.

- Complete the function `leaf_condition`. This function should take in a `Node` object and return True if this node is a leaf node (according to the above criteria) and False otherwise. You should be able to determine this based on the attributes already defined in the `Node` class.
- Run the code block given below to test your implementation. If your code is correct, all sanity checks should pass.


In [88]:
# Use this code block to test your implementation in Problem 3.2
# Don't change anything here -- just run it

dt = DecisionTree(max_depth=3)

node = Node(depth=2)
node.probs = np.array([0.5, 0.5])
expected = False
out = dt.leaf_condition(node)
print(f'Sanity Check 1 passed: {np.array_equal(expected, out)}')

node = Node(depth=3)
node.probs = np.array([0.5, 0.5])
expected = True
out = dt.leaf_condition(node)
print(f'Sanity Check 2 passed: {np.array_equal(expected, out)}')

node = Node(depth=1)
node.probs = np.array([1., 0.])
expected = True
out = dt.leaf_condition(node)
print(f'Sanity Check 3 passed: {np.array_equal(expected, out)}')

Sanity Check 1 passed: True
Sanity Check 2 passed: True
Sanity Check 3 passed: True


In [89]:
grader.check("q3.2 Leaf Condition")

### Problem 3.3 gini_score (10 points):

Your next task is to implement a function that computes the Gini index.

- Complete the function `gini_score`. This function takes in features `X` and labels `y`, as well as a feature index `i` and a scalar `threshold`. Given these inputs, the function `gini_score` should return the Gini index (i.e. a single number) obtained by splitting the `i`th feature in `X` at the specified threshold.
- Run the code block given below to test your implementation. If your code is correct, all sanity checks should pass.
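
To see where an expected value such as $2/5$ comes from, it can help to work through one split by hand. The standalone sketch below uses the toy data from this problem's sanity checks and splits feature 0 at threshold 2.5; the helper `gini` and the variable names are illustrative only.

```python
import numpy as np

X = np.array([[1, 1], [2, 8], [4, 9], [6, 7], [7, 4], [8, 11],
              [3, 3], [5, 5], [9, 5], [10, 8], [11, 6], [12, 10]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

left = X[:, 0] <= 2.5              # rows sent to the left child
y_left, y_right = y[left], y[~left]

def gini(labels):
    """Binary Gini index 1 - p0^2 - p1^2 (hypothetical helper)."""
    p1 = np.mean(labels == 1)
    return 1.0 - (p1 ** 2 + (1.0 - p1) ** 2)

g_left = gini(y_left)    # 2 points, both class 0, so the left side is pure
g_right = gini(y_right)  # 10 points: 4 of class 0 and 6 of class 1

# Weight each side by its share of the data
weighted = (len(y_left) * g_left + len(y_right) * g_right) / len(y)
print(g_left, g_right, weighted)
```

The left side contributes nothing because it is pure, and the right side has Gini index $1 - (0.4^2 + 0.6^2) = 0.48$, so the weighted score is $\frac{10}{12} \cdot 0.48 = 0.4$.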

In [90]:
# Use this code block to test your implementation in Problem 3.3
# Don't change anything here -- just run it

dt = DecisionTree(max_depth=3)

X = np.array([[1,1], [2,8], [4,9], [6,7], [7,4], [8,11], [3,3], [5,5], [9,5], [10,8], [11, 6], [12,10]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

expected = 5/11
out = dt.gini_score(X, y, 0, 1.5)
print(f'Sanity Check 1 passed: {np.isclose(expected, out)}')

expected = 2/5
out = dt.gini_score(X, y, 0, 2.5)
print(f'Sanity Check 2 passed: {np.isclose(expected, out)}')

expected = 5/11
out = dt.gini_score(X, y, 1, 2)
print(f'Sanity Check 3 passed: {np.isclose(expected, out)}')

expected = 4/9
out = dt.gini_score(X, y, 1, 6.5)
print(f'Sanity Check 4 passed: {np.isclose(expected, out)}')

Sanity Check 1 passed: True
Sanity Check 2 passed: True
Sanity Check 3 passed: True
Sanity Check 4 passed: True


In [91]:
grader.check("q3.3 Gini Score")

### Problem 3.4 find_best_split (15 points): 

Now, you will use your function `gini_score` to compute the best splits. To do this, you will need to complete the function `find_best_split`. This function takes in features `X` and labels `y`, and returns the feature index and threshold corresponding to the best split as determined by `gini_score`. That is, for every feature and every threshold, you should compute the Gini index of splitting that feature at that threshold, and you should return the index of the feature and the threshold that results in the lowest Gini index.

To determine the thresholds, we will use the **midpoint** strategy. That is, given an array of feature values, we will consider all thresholds given by the midpoints between consecutive feature values. Here's an example. Suppose we are given a feature matrix $X$ with four datapoints and two features, given by
$$X = \begin{bmatrix} 1 & 2 \\ 1.5 & 2.5 \\ 0.75 & -1.0 \\ 3.0 & 0.5 \end{bmatrix}.$$

The second feature takes the values $2, 2.5, -1.0, 0.5$, which sort to $-1.0, 0.5, 2, 2.5$. The thresholds to consider when splitting the second feature would then be $[-0.25, 1.25, 2.25]$.

- Complete the function `find_best_split` as detailed above.
- Run the code block given below to test your implementation. If your code is correct, all sanity checks should pass.
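
The midpoint strategy can be sketched in a few lines of NumPy. The helper name `midpoint_thresholds` is hypothetical, and removing repeated values with `np.unique` is an assumption on my part: the problem statement does not say how duplicates should be handled.

```python
import numpy as np

# The 4x2 example matrix from the text above
X = np.array([[1.0, 2.0], [1.5, 2.5], [0.75, -1.0], [3.0, 0.5]])

def midpoint_thresholds(column):
    """Midpoints between consecutive distinct sorted values of one feature."""
    values = np.unique(column)  # sorted, with duplicates removed
    return (values[:-1] + values[1:]) / 2.0

second_feature = midpoint_thresholds(X[:, 1])
print(second_feature)
```

For the second column this reproduces the thresholds $[-0.25, 1.25, 2.25]$ from the example.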

In [92]:
# Use this code block to test your implementation in Problem 3.4
# Don't change anything here -- just run it

dt = DecisionTree(max_depth=3)

X = np.array([[1,1], [2,8], [4,9], [6,7], [7,4], [8,11], [3,3], [5,5], [9,5], [10,8], [11, 6], [12,10]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

expected = (0, 8.5)
out = dt.find_best_split(X, y)
print(f'Sanity Check 1 passed: {expected == out}')

X_L, y_L = X[X[:, 0] <= 8.5], y[X[:, 0] <= 8.5]
expected = (1, 6)
out = dt.find_best_split(X_L, y_L)
print(f'Sanity Check 2 passed: {expected == out}')

Sanity Check 1 passed: True
Sanity Check 2 passed: True


In [93]:
grader.check("q3.4 Find Best Split")

### Problem 3.5 build_tree (10 points):

For the last step in implementing `DecisionTree`, you will need to complete the function `build_tree`. This function uses the functions you implemented in Problems 3.1-3.4 in order to recursively build your decision tree.

- Complete the function `build_tree`.
- Run the code block given below to test your implementation. If your code is correct, all sanity checks should pass.

In [96]:
# Use this code block to test your implementation in Problem 3.5
# Don't change anything here -- just run it


X = np.array([[1,1], [2,8], [4,9], [6,7], [7,4], [8,11], [3,3], [5,5], [9,5], [10,8], [11, 6], [12,10]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

expected = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1])
dt = DecisionTree(max_depth=3)
dt.fit(X, y)
out = dt.predict(X)
print(f'Sanity Check passed: {np.allclose(expected, out, atol=1e-3)}')

Sanity Check passed: True


---
## Problem 4: Experimenting with Decision Trees

In the final problem of this assignment, you will experiment with the scikit-learn implementation of decision trees on the Breast Cancer Wisconsin dataset. This dataset consists of 569 datapoints with 30 real-valued features and 2 possible labels. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. See the documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) for some more information on this dataset.

Before attempting this problem, you should read and understand the documentation for the `DecisionTreeClassifier`, available [here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).

**Important: for every decision tree classifier you train in this problem, make sure to set `random_state=seed` for reproducibility.**

We will first load in this dataset and create a train/test split. 

In [None]:
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

In [None]:
X_tr.shape

In [None]:
X_te.shape

<!-- BEGIN QUESTION -->

### Problem 4.1 Training a small DT (5 points):

First, you will see how to fit and visualize a decision tree classifier in sklearn.

- Using the class `DecisionTreeClassifier` from scikit-learn, train a decision tree on the training data. Use `max_depth=2`, and leave all other settings as their defaults. Note that, by default, `DecisionTreeClassifier` uses the Gini index to split nodes.
- Calculate the training set error and testing set error of your classifier.
- Use the function `plot_tree` to visualize your decision tree. This is already imported for you at the top of this notebook. See [here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) for the corresponding documentation page.
- For the first split of the data, what feature and what threshold is being used in your classifier? You don't need to find the name of this feature, just its index, i.e. which column of `X`. 
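
One possible way to approach these steps is sketched below; it is not the reference solution. It reloads the data so that it runs on its own, computes the error rate as one minus the accuracy returned by `score`, and reads the root split from the fitted `tree_` attribute. `export_text` is used here only as a text stand-in for the `plot_tree` figure that the problem asks for.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

seed = 1234
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

clf = DecisionTreeClassifier(max_depth=2, random_state=seed)
clf.fit(X_tr, y_tr)

# score() returns mean accuracy, so the error rate is 1 - accuracy
err_tr = 1.0 - clf.score(X_tr, y_tr)
err_te = 1.0 - clf.score(X_te, y_te)
print(f"train error: {err_tr:.4f}, test error: {err_te:.4f}")

# Node 0 of the fitted tree_ structure is the root
root_feature = clf.tree_.feature[0]
root_threshold = clf.tree_.threshold[0]
print(f"root split: X[:, {root_feature}] <= {root_threshold:.3f}")

# Text rendering of the tree; plot_tree(clf) draws the same structure as a figure
print(export_text(clf))
```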

In [None]:
### YOUR CODE STARTS HERE ###
clf = ...
...

clf_err_tr = ...
clf_err_te = ...

print(clf_err_tr)
print(clf_err_te)

...
###  YOUR CODE ENDS HERE  ###

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Problem 4.2 Varying depth (5 points):

You will now vary the maximum depth allowed in your decision tree and see what effect this has on the error rate.

- Train a decision tree for every value of `max_depth` in `[1, 2, ..., 15]`. Use the default settings (other than `max_depth`). 
- Plot the resulting training and testing set accuracies as a function of depth. Be sure to include an x-label, a y-label, and a legend in your plot.
- Describe what you see happen as you increase the depth.
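
A minimal sketch of the depth sweep, under the same train/test split as above, might look like the following. It is one possible approach rather than the reference solution, and the list names `acc_tr` and `acc_te` are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

seed = 1234
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

depths = list(range(1, 16))
acc_tr, acc_te = [], []
for d in depths:
    # One tree per depth, all other settings left at their defaults
    clf = DecisionTreeClassifier(max_depth=d, random_state=seed).fit(X_tr, y_tr)
    acc_tr.append(clf.score(X_tr, y_tr))
    acc_te.append(clf.score(X_te, y_te))

fig, ax = plt.subplots()
ax.plot(depths, acc_tr, marker="o", label="Training accuracy")
ax.plot(depths, acc_te, marker="o", label="Testing accuracy")
ax.set_xlabel("max_depth")
ax.set_ylabel("Accuracy")
ax.legend()
```

In a notebook the figure is displayed when the cell finishes; in a plain script a call to `plt.show()` would be needed.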

In [None]:
### YOUR CODE STARTS HERE ###
...
###  YOUR CODE ENDS HERE  ###

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Problem 4.3 Varying min_leaf (5 points):

Lastly, you will vary the minimum number of datapoints allowed in a leaf node in your decision tree and see what effect this has on the error rate.

- Train a decision tree with `criterion='gini'` for every value of `min_samples_leaf` in `[1, 2, ..., 15]`. 
- Plot the resulting training and testing set accuracies as a function of the minimum leaf samples. Be sure to include an x-label, a y-label, and a legend in your plot.
- You should see that the training error increases as we increase the minimum number of leaf samples. Give an explanation for why this might happen.
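
The sweep over the minimum leaf size can be sketched as below. This covers only the computation, not the plot or the written explanation, and it is one possible approach rather than the reference solution. It uses scikit-learn's `min_samples_leaf` parameter, and the list names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

seed = 1234
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)

leaf_sizes = list(range(1, 16))
acc_tr, acc_te = [], []
for m in leaf_sizes:
    # Every leaf must contain at least m training points
    clf = DecisionTreeClassifier(
        criterion="gini", min_samples_leaf=m, random_state=seed
    ).fit(X_tr, y_tr)
    acc_tr.append(clf.score(X_tr, y_tr))
    acc_te.append(clf.score(X_te, y_te))

for m, a_tr, a_te in zip(leaf_sizes, acc_tr, acc_te):
    print(f"min_samples_leaf={m:2d}  train acc={a_tr:.4f}  test acc={a_te:.4f}")
```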

In [None]:
### YOUR CODE STARTS HERE ###
...
###  YOUR CODE ENDS HERE  ###

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

---
### Statement of Collaboration (5 points)

It is **mandatory** to include a Statement of Collaboration in each submission, with respect to the guidelines below. Include the names of everyone involved in the discussions (especially in-person ones), and what was discussed. If you did not collaborate with anyone, you should write something like "I completed this assignment without any collaboration."

All students are required to follow the academic honesty guidelines posted on the course website. For
programming assignments, in particular, I encourage the students to organize (perhaps using EdD) to
discuss the task descriptions, requirements, bugs in my code, and the relevant technical content before they start
working on it. However, you should not discuss the specific solutions, and, as a guiding principle, you are not
allowed to take anything written or drawn away from these discussions (i.e. no photographs of the blackboard,
written notes, referring to EdD, etc.). Especially after you have started working on the assignment, try
to restrict the discussion to EdD as much as possible, so that there is no doubt as to the extent of your
collaboration.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

