**Important Note**: In this assignment, we will focus on building decision trees where the data contain only binary (0 or 1) features. This allows us to avoid dealing with:

* Multiple intermediate nodes in a split
* The thresholding issues of real-valued features


In [1]:
import numpy as np 
import pandas as pd 
pd.set_option('display.max_colwidth',-1)

# Load the lending club dataset

In [19]:
loans = pd.read_csv('lending-club-data.csv')

Like the previous assignment, reassign the labels to have +1 for a safe loan and -1 for a risky (bad) loan. You should have code analogous to the following:

In [20]:
loans['safe_loans'] = loans['bad_loans'].apply(lambda x:+1 if  x==0 else -1)
del loans['bad_loans']

In [21]:
loans.head(2)

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,...,delinq_2yrs_zero,pub_rec_zero,collections_12_mths_zero,short_emp,payment_inc_ratio,final_d,last_delinq_none,last_record_none,last_major_derog_none,safe_loans
0,1077501,1296599,5000,5000,4975,36 months,10.65,162.87,B,B2,...,1.0,1.0,1.0,0,8.1435,20141201T000000,1,1,1,1
1,1077430,1314167,2500,2500,2500,60 months,15.27,59.83,C,C4,...,1.0,1.0,1.0,1,2.3932,20161201T000000,1,1,1,-1


Unlike the previous assignment where we used several features, in this assignment, we will just be using 4 categorical
features: 

1. grade of the loan 
2. the length of the loan term
3. the home ownership status: own, mortgage, rent
4. number of years of employment.

Since we are building a binary decision tree, we will have to convert these categorical features to a binary representation in a subsequent section using 1-hot encoding.
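
To make the idea of one-hot encoding concrete, here is a minimal sketch on a made-up three-row frame. It assumes a pandas version in which `pd.get_dummies` accepts the `dtype` argument; the toy values are hypothetical, and this is not the encoding code used later in this notebook.

```python
import pandas as pd

# Toy data (hypothetical values, shaped like the loans columns used here).
toy = pd.DataFrame({
    'grade': ['B', 'C', 'C'],
    'term': [' 36 months', ' 60 months', ' 36 months'],
    'safe_loans': [1, -1, 1],
})

# One indicator column per distinct category value. prefix='' and
# prefix_sep='' keep the bare category value as the column name.
toy_one_hot = pd.get_dummies(toy, columns=['grade', 'term'],
                             prefix='', prefix_sep='', dtype=int)

one_hot_columns = sorted(toy_one_hot.columns)
```

Each original categorical column is replaced by one 0/1 column per distinct value, which is the binary representation the tree-building code expects.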

In [22]:
features = ['grade',              # grade of the loan
            'term',               # the term of the loan
            'home_ownership',     # home_ownership status: own, mortgage or rent
            'emp_length',         # number of years of employment
           ]
target = 'safe_loans'

#Extract these feature columns from the dataset, and discard the rest of the feature columns.

loans =loans[features+[target]]

Let's explore what the dataset looks like.

In [23]:
loans.head(3)

Unnamed: 0,grade,term,home_ownership,emp_length,safe_loans
0,B,36 months,RENT,10+ years,1
1,C,60 months,RENT,< 1 year,-1
2,C,36 months,RENT,10+ years,1


In [24]:
loans.columns

Index([u'grade', u'term', u'home_ownership', u'emp_length', u'safe_loans'], dtype='object')

**Notes to people using other tools**

If you are using SFrame, proceed to the section "Subsample dataset to make sure classes are balanced".

**If you are NOT using SFrame**, download the list of indices for the training and test sets:

* module-5-assignment-2-train-idx.json.zip
* module-5-assignment-2-test-idx.json.zip

Then follow these steps:

1. Apply one-hot encoding to loans. Your tool may have a function for one-hot encoding.
2. Load the JSON files into the lists train_idx and test_idx.
3. Perform the train/test split using train_idx and test_idx. In Pandas, for instance:


In [25]:
# Apply one-hot encoding to loans. Your tool may have a function for one-hot encoding.


categorical_variables =[]
for feat_name,feat_type in zip(loans.columns,loans.dtypes):
    if feat_type==object: # In pandas dataframe string types shows as object 
        categorical_variables.append(feat_name)

#df['list_from_dict'] = [[x['name'] for x in list_dict] for list_dict in df['list_dicts']]

for feature in categorical_variables:
    # Map each value to a one-entry dict, e.g. ' 60 months' -> {' 60 months': 1}
    loans_one_hot = loans[feature].apply(lambda x: {x: 1})
    loans_one_hot_encoded = loans_one_hot.values.tolist()  # list of dicts, one per row
    # One column per distinct value: 1 where the value occurs, NaN elsewhere
    loans_unpacked = pd.DataFrame(loans_one_hot_encoded)

    # Change NaN's to 0's and attach each indicator column to loans
    for column in loans_unpacked.columns:
        loans_unpacked[column] = loans_unpacked[column].fillna(0)
        loans[column] = loans_unpacked[column].values
    del loans[feature]  # drop the original categorical column (grade, term, home_ownership, emp_length)
    


Let's see what the feature columns look like now:


In [26]:
features = loans.columns.drop('safe_loans')
# The response variable 'safe_loans' has been removed from the feature list
features

Index([u'A', u'B', u'C', u'D', u'E', u'F', u'G', u' 36 months', u' 60 months',
       u'MORTGAGE', u'OTHER', u'OWN', u'RENT', u'1 year', u'10+ years',
       u'2 years', u'3 years', u'4 years', u'5 years', u'6 years', u'7 years',
       u'8 years', u'9 years', u'< 1 year', u'n/a'],
      dtype='object')

In [27]:
len(features)

25

In [28]:
##Load the JSON files into the lists train_idx and test_idx.

# 1st read the indexes in a json file 
train_val=pd.read_json('module-5-assignment-2-train-idx.json')
test_val=pd.read_json('module-5-assignment-2-test-idx.json')

# list out the values which is ndarray
lst_train =train_val.values.tolist()
lst_test = test_val.values.tolist()
# flattening the list of list to single list 
train_idx = [item for sublist in lst_train for item in sublist]
test_idx = [item  for sublist in lst_test  for item in sublist]

In [29]:
#Perform train/validation split using train_idx and test_idx.

train_data = loans.iloc[train_idx]
test_data = loans.iloc[test_idx]
print len(train_data) , len(test_data)

37224 9284


In [30]:
loans.head(2)

Unnamed: 0,safe_loans,A,B,C,D,E,F,G,36 months,60 months,...,2 years,3 years,4 years,5 years,6 years,7 years,8 years,9 years,< 1 year,n/a
0,1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,-1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


**Note.** Some elements in loans are included in neither train_data nor test_data. This is because the data were subsampled to achieve class balance.

Now proceed to the section "Decision tree implementation", skipping the three sections below:

* Subsample dataset to make sure classes are balanced
* Transform categorical data into binary features
* Train-test split


## Decision tree implementation

In this section, we will implement binary decision trees from scratch.

There are several steps involved in building a decision tree. For that reason, we have split the entire assignment into several sections.

**Function to count number of mistakes while predicting majority class**

Recall from the lecture that prediction at an intermediate node works by predicting the majority class for all data points that belong to this node. Now, we will write a function that calculates the number of misclassified examples when predicting the majority class. This will be used to help determine which feature is the best to split on at a given node of the tree.



**Note:** Keep in mind that in order to compute the number of mistakes for a majority classifier, we only need the label (y values) of the data points in the node.

**Steps to follow:**
* **Step 1:** Calculate the number of safe loans and risky loans.
* **Step 2:** Since we are assuming majority class prediction, all the data points that are **not** in the majority class are considered **mistakes**.
* **Step 3:** Return the number of **mistakes**.
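
The three steps can be sketched compactly as follows. This is only an illustration: the name `majority_class_mistakes` is hypothetical, it accepts any iterable of labels rather than a pandas column, and it is not a substitute for filling in the assignment function.

```python
from collections import Counter

def majority_class_mistakes(labels):
    """Number of labels that differ from the most common label (0 if empty)."""
    labels = list(labels)
    if not labels:
        return 0
    # Step 1: count how many times each class occurs.
    counts = Counter(labels)
    # Steps 2 and 3: everything outside the majority class is a mistake.
    return len(labels) - max(counts.values())

mistakes = majority_class_mistakes([-1, -1, 1, 1, 1])
```

With three safe loans and two risky ones, the majority class is +1, so the two risky loans are the mistakes.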


Now, let us write the function `intermediate_node_num_mistakes` which computes the number of misclassified examples of an intermediate node given the set of labels (y values) of the data points contained in the node. Fill in the places where you find `## YOUR CODE HERE`. There are **three** places in this function for you to fill in.

In [17]:
sum(loans['safe_loans']==-1)


23150

In [31]:
def intermediate_node_num_mistakes(labels_in_node):
    # Corner case: If labels_in_node is empty, return 0
    if len(labels_in_node) == 0:
        return 0
    
    # Count the number of 1's (safe loans)
    ## YOUR CODE HERE
    safe =sum(labels_in_node==1)
    
    # Count the number of -1's (risky loans)
    ## YOUR CODE HERE
    risky = sum(labels_in_node==-1)
                
    # Return the number of mistakes that the majority classifier makes.
    # All the data points that are not in the majority class are considered mistakes
    ## YOUR CODE HERE
    if safe<risky:
        num_mistakes=safe
    else :
        num_mistakes=risky
    return num_mistakes

Because there are several steps in this assignment, we have introduced some stopping points where you can check your code and make sure it is correct before proceeding. To test your `intermediate_node_num_mistakes` function, run the following code until you get a **Test passed!**, then you should proceed. Otherwise, you should spend some time figuring out where things went wrong.

In [40]:
# Test case 1
example_labels = np.array([-1, -1, 1, 1, 1])
if intermediate_node_num_mistakes(example_labels) == 2:
    print 'Test passed!'
else:
    print 'Test 1 failed... try again!'

# Test case 2
example_labels = np.array([-1, -1, 1, 1, 1, 1, 1])
if intermediate_node_num_mistakes(example_labels) == 2:
    print 'Test passed!'
else:
    print 'Test 2 failed... try again!'
    
# Test case 3
example_labels = np.array([-1, -1, -1, -1, -1, 1, 1])
if intermediate_node_num_mistakes(example_labels) == 2:
    print 'Test passed!'
else:
    print 'Test 3 failed... try again!'

# Test case 4
example_labels = np.array([-1, -1, -1, -1, -1, -1, -1])
if intermediate_node_num_mistakes(example_labels) == 0:
    print 'Test passed!'
else:
    print 'Test 4 failed... try again!'

Test passed!
Test passed!
Test passed!
Test passed!


## Function to pick best feature to split on

The function **best_splitting_feature** takes 3 arguments: 
1. The data (a DataFrame here, an SFrame in the original assignment, which includes all of the feature columns and label column)
2. The features to consider for splits (a list of strings of column names to consider for splits)
3. The name of the target/label column (string)

The function will loop through the list of possible features, and consider splitting on each of them. It will calculate the classification error of each split and return the feature that had the smallest classification error when split on.

Recall that the **classification error** is defined as follows:
$$
\mbox{classification error} = \frac{\mbox{# mistakes}}{\mbox{# total examples}}
$$

Follow these steps: 
* **Step 1:** Loop over each feature in the feature list
* **Step 2:** Within the loop, split the data into two groups: one group where all of the data has feature value 0 or False (we will call this the **left** split), and one group where all of the data has feature value 1 or True (we will call this the **right** split). Make sure the **left** split corresponds with 0 and the **right** split corresponds with 1 to ensure your implementation fits with our implementation of the tree building process.
* **Step 3:** Calculate the number of misclassified examples in both groups of data and use the above formula to compute the **classification error**.
* **Step 4:** If the computed error is smaller than the best error found so far, store this **feature and its error**.

This may seem like a lot, but we have provided pseudocode in the comments in order to help you implement the function correctly.

**Note:** Remember that since we are only dealing with binary features, we do not have to consider thresholds for real-valued features. This makes the implementation of this function much easier.
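
As a worked illustration of the classification error of a split, the sketch below evaluates two binary features on six made-up rows. The helper names `count_mistakes` and `split_error`, the feature names `f1` and `f2`, and the data are all hypothetical; rows are plain dictionaries rather than a DataFrame so that the example stands alone.

```python
def count_mistakes(labels):
    """Mistakes made by predicting the majority class among +1/-1 labels."""
    safe = sum(1 for y in labels if y == 1)
    risky = sum(1 for y in labels if y == -1)
    return min(safe, risky)

def split_error(rows, feature, target):
    """Classification error of splitting `rows` on one binary feature."""
    left = [r[target] for r in rows if r[feature] == 0]    # feature == 0
    right = [r[target] for r in rows if r[feature] == 1]   # feature == 1
    return (count_mistakes(left) + count_mistakes(right)) / float(len(rows))

# Toy data: six loans, two binary features (values are made up).
toy_rows = [
    {'f1': 0, 'f2': 0, 'safe_loans': -1},
    {'f1': 0, 'f2': 1, 'safe_loans': -1},
    {'f1': 0, 'f2': 0, 'safe_loans': -1},
    {'f1': 1, 'f2': 1, 'safe_loans': 1},
    {'f1': 1, 'f2': 0, 'safe_loans': 1},
    {'f1': 1, 'f2': 1, 'safe_loans': -1},
]

errors = {f: split_error(toy_rows, f, 'safe_loans') for f in ['f1', 'f2']}
best = min(errors, key=errors.get)
```

Splitting on `f1` leaves one mistake out of six (error 1/6), while splitting on `f2` leaves two (error 2/6), so `f1` is the better split on this toy data.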

Fill in the places where you find `## YOUR CODE HERE`. There are **five** places in this function for you to fill in.

In [36]:
def best_splitting_feature(data ,features,target):
    best_feature = None # Keep track of the best feature
    best_error = 10     # Keep track of the best error so far
    # Note: Since error is always <= 1, we should initialize it with something larger than 1.
    
    # Convert to float to make sure error gets computed correctly.
    num_data_points = float(len(data)) 
    
    #Step 1: Loop over each feature in the feature list
    for feat in features:
         #Step 2: Within the loop, split the data into two groups
        #The left split will have all data points where the feature value is 0
        left_split= data[data[feat]==0]
        # The right split will have all data points where the feature value is 1
        right_split = data[data[feat]==1]
        
        #Step 3: Calculate the number of misclassified examples in both groups of data and use the above formula 
        #to compute the classification error.
        left_misclassified = intermediate_node_num_mistakes(left_split[target])
        right_misclassified = intermediate_node_num_mistakes(right_split[target])
        error = (left_misclassified + right_misclassified) / num_data_points 
        
        #Step 4: If the computed error is smaller than the best error found so far, store this feature and its error.
        if (error < best_error):
            best_error = error
            best_feature= feat
    
    return best_feature # Return the best feature we found        

In [37]:
best_splitting_feature(train_data, features, 'safe_loans')

' 36 months'

To test your best_splitting_feature function, run the following code:

In [38]:
if best_splitting_feature(train_data, features, 'safe_loans') == ' 36 months':
    print 'Test passed!'
else:
    print 'Test failed... try again!'

Test passed!


## Building the tree

With the above functions implemented correctly, we are now ready to build our decision tree. Each node in the decision tree is represented as a dictionary which contains the following keys and possible values:

    { 
       'is_leaf'            : True/False.
       'prediction'         : Prediction at the leaf node.
       'left'               : (dictionary corresponding to the left tree).
       'right'              : (dictionary corresponding to the right tree).
       'splitting_feature'  : The feature that this node splits on.
    }
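
For illustration, here is a hand-built tree in this dictionary format: a single split with two leaves. The feature name `f1`, the predictions, and the helpers `make_leaf` and `tree_depth` are hypothetical and only show how the keys fit together.

```python
def make_leaf(prediction):
    """A leaf node: no children and no splitting feature."""
    return {'is_leaf': True, 'prediction': prediction,
            'left': None, 'right': None, 'splitting_feature': None}

# A depth-1 tree (a stump). The feature name and predictions are made up.
toy_tree = {
    'is_leaf': False,
    'prediction': None,
    'splitting_feature': 'f1',
    'left': make_leaf(-1),    # taken when f1 == 0
    'right': make_leaf(+1),   # taken when f1 == 1
}

def tree_depth(tree):
    """Number of splits on the longest root-to-leaf path."""
    if tree['is_leaf']:
        return 0
    return 1 + max(tree_depth(tree['left']), tree_depth(tree['right']))

toy_depth = tree_depth(toy_tree)
```

Because every internal node stores its children under `'left'` and `'right'`, any recursive function over the tree only needs a base case for `'is_leaf'`.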

First, we will write a function that creates a leaf node given a set of target values. Fill in the places where you find `## YOUR CODE HERE`. There are **three** places in this function for you to fill in.

In [67]:
def create_leaf(target_values):
    # create a leaf node 
    leaf ={'splitting_feature':None,
          'left':None,
          'right' :None,
          'is_leaf' : True,
          'prediction':None}
    
    # Count the number of data points that are +1 and -1 in this node.
    num_ones = len(target_values[target_values==1])
    num_minus_ones = len(target_values[target_values == -1]) 
    
    # For the leaf node, set the prediction to be the majority class.
    # Store the predicted class (1 or -1) in leaf['prediction']
    if num_ones > num_minus_ones:
        leaf['prediction'] = 1
    else:
        leaf['prediction'] = -1
        
    # Return the leaf node  
    return leaf
        

We have provided a function that learns the decision tree recursively and implements 3 stopping conditions:
1. **Stopping condition 1:** All data points in a node are from the same class.
2. **Stopping condition 2:** No more features to split on.
3. **Additional stopping condition:** In addition to the above two stopping conditions covered in lecture, in this assignment we will also consider a stopping condition based on the **max_depth** of the tree. By not letting the tree grow too deep, we will save computational effort in the learning process. 
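
The three stopping conditions can be read as a single ordered check. The sketch below is one possible way to express that order; the helper name `stopping_reason` and the reason strings are hypothetical and are not part of the provided skeleton.

```python
def stopping_reason(labels, remaining_features, current_depth, max_depth):
    """Return a short reason if the node should become a leaf, else None."""
    if len(set(labels)) <= 1:
        return 'pure node'              # stopping condition 1
    if len(remaining_features) == 0:
        return 'no features left'       # stopping condition 2
    if current_depth >= max_depth:
        return 'max depth reached'      # additional stopping condition
    return None

reason = stopping_reason([1, -1, 1], ['f1'], current_depth=3, max_depth=3)
```

Here the node is impure and a feature remains, so only the depth limit stops the recursion.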

Now, we will write down the skeleton of the learning algorithm. Fill in the places where you find `## YOUR CODE HERE`. There are **seven** places in this function for you to fill in.

In [65]:
def decision_tree_create(data, features, target, current_depth = 0, max_depth = 10):
    remaining_features = features[:] # Make a copy of the features.
    target_values = data[target]
    print "--------------------------------------------------------------------"
    print "Subtree, depth = %s (%s data points)." % (current_depth, len(target_values))
    

    # Stopping condition 1
    # (Check if there are mistakes at current node.
    # Recall you wrote a function intermediate_node_num_mistakes to compute this.)
    if  intermediate_node_num_mistakes(target_values) == 0:  ## YOUR CODE HERE
        print "Stopping condition 1 reached."     
        # If there are no mistakes at the current node, make the current node a leaf node
        return create_leaf(target_values)
    
    # Stopping condition 2 (check if there are remaining features to consider splitting on)
    if len(remaining_features)==0:   ## YOUR CODE HERE
        print "Stopping condition 2 reached."    
        # If there are no remaining features to consider, make current node a leaf node
        return create_leaf(target_values)    
    
    # Additional stopping condition (limit tree depth)
    if current_depth >= max_depth:  ## YOUR CODE HERE
        print "Reached maximum depth. Stopping for now."
        # If the max tree depth has been reached, make current node a leaf node
        return create_leaf(target_values)

    # Find the best splitting feature (recall the function best_splitting_feature implemented above)
    ## YOUR CODE HERE
    splitting_feature = best_splitting_feature(data,remaining_features,target)
    
    # Split on the best feature that we found. 
    left_split = data[data[splitting_feature] == 0]
    right_split = data[data[splitting_feature] == 1]      ## YOUR CODE HERE
    remaining_features.remove(splitting_feature)
    print "Split on feature %s. (%s, %s)" % (\
                      splitting_feature, len(left_split), len(right_split))
    
    # Create a leaf node if the split sends every data point to one side,
    # i.e. the other side of the split is empty
    if len(left_split) == len(data):
        print "Creating leaf node."
        return create_leaf(left_split[target])
    if len(right_split) == len(data):
        print "Creating leaf node."
        ## YOUR CODE HERE
        return create_leaf(right_split[target])
        
    # Repeat (recurse) on left and right subtrees
    left_tree = decision_tree_create(left_split, remaining_features, target, current_depth + 1, max_depth)        
    ## YOUR CODE HERE
    right_tree = decision_tree_create(right_split, remaining_features, target, current_depth + 1, max_depth)        

    return {'is_leaf'          : False, 
            'prediction'       : None,
            'splitting_feature': splitting_feature,
            'left'             : left_tree, 
            'right'            : right_tree}

Here is a recursive function to count the nodes in your tree:

In [55]:
def count_nodes(tree):
    if tree['is_leaf']:
        return 1
    return 1 + count_nodes(tree['left']) + count_nodes(tree['right'])

Run the following test code to check your implementation. Make sure you get 'Test passed' before proceeding.




In [56]:
# decision_tree_create calls remaining_features.remove(...), so it needs a plain
# Python list of feature names (not a pandas Index), with the target excluded.
x = train_data.drop('safe_loans', axis=1)
features_new = list(x.columns)
print features_new

['A', 'B', 'C', 'D', 'E', 'F', 'G', ' 36 months', ' 60 months', 'MORTGAGE', 'OTHER', 'OWN', 'RENT', '1 year', '10+ years', '2 years', '3 years', '4 years', '5 years', '6 years', '7 years', '8 years', '9 years', '< 1 year', 'n/a']


In [68]:
small_data_decision_tree = decision_tree_create(train_data,features_new, 
                                                target,max_depth = 3)
if count_nodes(small_data_decision_tree) == 13:
    print 'Test passed!'
else:
    print 'Test failed... try again!'
    print 'Number of nodes found                :', count_nodes(small_data_decision_tree)
    print 'Number of nodes that should be there : 13' 

--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature  36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Reached maximum depth. Sto

## Build the tree!

Now that all the tests are passing, we will train a tree model on the **train_data**. Limit the depth to 6 (**max_depth = 6**) to make sure the algorithm doesn't run for too long. Call this tree **my_decision_tree**. 

**Warning**: This code block may take 1-2 minutes to learn. 

In [69]:
# Make sure to cap the depth at 6 by using max_depth = 6
my_decision_tree = decision_tree_create(train_data, features_new, 'safe_loans', max_depth = 6)

--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature  36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Reached maximum depth. Stopping for 

## Making predictions with a decision tree

As discussed in the lecture, we can make predictions from the decision tree with a simple recursive function. Below, we call this function `classify`, which takes in a learned `tree` and a test point `x` to classify.  We include an option `annotate` that describes the prediction path when set to `True`.

Fill in the places where you find `## YOUR CODE HERE`. There is **one** place in this function for you to fill in.

In [71]:
def classify(tree, x, annotate = False):   
    # if the node is a leaf node.
    if tree['is_leaf']:
        if annotate: 
            print "At leaf, predicting %s" % tree['prediction']
        return tree['prediction'] 
    else:
        # split on feature.
        split_feature_value = x[tree['splitting_feature']]
        if annotate: 
            print "Split on %s = %s" % (tree['splitting_feature'], split_feature_value)
        if split_feature_value == 0:
            return classify(tree['left'], x, annotate)
        else:
            ## YOUR CODE HERE
            return classify(tree['right'], x, annotate)
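
The same traversal can also be written as a loop. The sketch below is an iterative variant on a small hand-built tree; `classify_iterative`, `demo_leaf`, `demo_tree`, and the feature names are hypothetical, and the recursive `classify` above remains the function used in the rest of this notebook.

```python
def classify_iterative(tree, x):
    """Walk from the root to a leaf.

    Returns (prediction, path), where path lists the (feature, value)
    pairs inspected along the way.
    """
    path = []
    node = tree
    while not node['is_leaf']:
        feature = node['splitting_feature']
        value = x[feature]
        path.append((feature, value))
        # 0 goes left, anything else goes right (same rule as classify).
        node = node['left'] if value == 0 else node['right']
    return node['prediction'], path

def demo_leaf(prediction):
    return {'is_leaf': True, 'prediction': prediction,
            'left': None, 'right': None, 'splitting_feature': None}

# Made-up two-level tree: split on f1, then on f2 in the right branch.
demo_tree = {
    'is_leaf': False, 'prediction': None, 'splitting_feature': 'f1',
    'left': demo_leaf(-1),
    'right': {
        'is_leaf': False, 'prediction': None, 'splitting_feature': 'f2',
        'left': demo_leaf(-1),
        'right': demo_leaf(1),
    },
}

prediction, path = classify_iterative(demo_tree, {'f1': 1, 'f2': 0})
```

The returned path plays the role of the `annotate` output: it records which feature was inspected at each step and the value that decided the direction.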

Now, let's consider the first example of the test set and see what my_decision_tree model predicts for this data point.

In [24]:
my_test = dict(zip(test_data.columns, test_data.iloc[0,:]))
my_test

{' 36 months': 0.0,
 ' 60 months': 1.0,
 '1 year': 0.0,
 '10+ years': 0.0,
 '2 years': 1.0,
 '3 years': 0.0,
 '4 years': 0.0,
 '5 years': 0.0,
 '6 years': 0.0,
 '7 years': 0.0,
 '8 years': 0.0,
 '9 years': 0.0,
 '< 1 year': 0.0,
 'A': 0.0,
 'B': 0.0,
 'C': 0.0,
 'D': 1.0,
 'E': 0.0,
 'F': 0.0,
 'G': 0.0,
 'MORTGAGE': 0.0,
 'OTHER': 0.0,
 'OWN': 0.0,
 'RENT': 1.0,
 'n/a': 0.0,
 'safe_loans': -1.0}

In [72]:
print 'Predicted class: %s ' % classify(my_decision_tree, test_data.iloc[0,:])

Predicted class: -1 


In [37]:
# Let's add some annotations to our prediction to see which prediction path led to this predicted class:

In [73]:
classify(my_decision_tree, test_data.iloc[0,:], annotate=True)

Split on  36 months = 0.0
Split on A = 0.0
Split on B = 0.0
Split on C = 0.0
Split on D = 1.0
At leaf, predicting -1


-1

**Quiz Question:** What was the feature that my_decision_tree first split on while making the prediction for test_data[0]?


The root split is on ` 36 months` (the 36-month term indicator).

**Quiz Question:** What was the first feature that led to a right split of test_data[0]?

`D` (grade D)

**Quiz Question:** What was the last feature split on before reaching a leaf node for test_data[0]?

`D` (grade D)

In [41]:
target

'safe_loans'

In [75]:
def evaluate_classification_error(tree, data, target):
    # Apply the classify(tree, x) to each row in your data
    prediction=[]
    for j in xrange(len(data)):
        #print i
        prediction.append(classify(tree, data.iloc[j]))

    #prediction = data.apply(lambda x: classify(tree, x))
    # Once you've made the predictions, calculate the classification error and return it
    ## YOUR CODE HERE
    num_errors = 0 
    for i in xrange(len(data)):
        if data[target].iloc[i]!=prediction[i]:
            num_errors+=1
    return num_errors/float(len(data))
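
The error computation itself does not depend on the tree. As a small standalone sketch, the function below compares two equal-length label sequences; the name `classification_error` and the toy labels are hypothetical.

```python
def classification_error(predictions, actual):
    """Fraction of positions where the prediction differs from the true label."""
    predictions = list(predictions)
    actual = list(actual)
    if len(predictions) != len(actual):
        raise ValueError('predictions and actual must have the same length')
    if not actual:
        return 0.0
    wrong = sum(1 for p, y in zip(predictions, actual) if p != y)
    return wrong / float(len(actual))

# Two of the four toy predictions disagree with the true labels.
toy_error = classification_error([1, -1, 1, 1], [1, 1, 1, -1])
```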


In [76]:
evaluate_classification_error(my_decision_tree, test_data, target) # target is safe_loans

0.3837785437311504

**Quiz Question:** Rounded to the 2nd decimal place, what is the classification error of my_decision_tree on the test_data?

In [46]:
#0.38

## Printing out a decision stump
As discussed in the lecture, we can print out a single decision stump (printing out the entire tree is left as an exercise to the curious reader).




In [89]:
def print_stump(tree, name = 'root'):
    split_name = tree['splitting_feature'] # split_name is something like ' 36 months'
    if split_name is None:
        print "(leaf, label: %s)" % tree['prediction']
        return None
    #split_feature, split_value = split_name.split('.')
    print '                       %s' % name
    print '         |---------------|----------------|'
    print '         |                                |'
    print '         |                                |'
    print '         |                                |'
    print '  [{0} == 0]                      [{0} == 1]'.format(split_name)
    print '         |                                |'
    print '         |                                |'
    print '         |                                |'
    print '    (%s)                         (%s)' \
        % (('leaf, label: ' + str(tree['left']['prediction']) if tree['left']['is_leaf'] else 'subtree'),
           ('leaf, label: ' + str(tree['right']['prediction']) if tree['right']['is_leaf'] else 'subtree'))

In [90]:
print_stump(my_decision_tree)

                       root
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [ 36 months == 0]                      [ 36 months == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (subtree)


**Quiz Question:** What is the feature that is used for the split at the root node?

### Exploring the intermediate left subtree

The tree is a recursive dictionary, so we do have access to all the nodes! We can use
* `my_decision_tree['left']` to go left
* `my_decision_tree['right']` to go right
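
Since every node exposes `'left'` and `'right'`, the whole tree can be walked recursively rather than one stump at a time. The sketch below is one possible way to do that on a made-up tree; `describe_tree`, `sketch_leaf`, `sketch_tree`, and the feature names are hypothetical.

```python
def describe_tree(tree, indent=0):
    """Return a list of text lines, one per node, indented by depth."""
    pad = '  ' * indent
    if tree['is_leaf']:
        return [pad + 'leaf: predict %s' % tree['prediction']]
    lines = [pad + 'split on %s' % tree['splitting_feature']]
    lines += describe_tree(tree['left'], indent + 1)    # feature == 0
    lines += describe_tree(tree['right'], indent + 1)   # feature == 1
    return lines

def sketch_leaf(prediction):
    return {'is_leaf': True, 'prediction': prediction,
            'left': None, 'right': None, 'splitting_feature': None}

# Made-up tree: split on f1, then on f2 in the right branch.
sketch_tree = {
    'is_leaf': False, 'prediction': None, 'splitting_feature': 'f1',
    'left': sketch_leaf(-1),
    'right': {
        'is_leaf': False, 'prediction': None, 'splitting_feature': 'f2',
        'left': sketch_leaf(-1),
        'right': sketch_leaf(1),
    },
}

tree_lines = describe_tree(sketch_tree)
```

Joining `tree_lines` with newlines gives an indented outline in which the first child listed under each split is the left (feature == 0) branch.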

In [87]:
#  feature that is used for the split at the root  - 36 months

In [91]:
print_stump(my_decision_tree['left'], my_decision_tree['splitting_feature'])

                        36 months
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [A == 0]                      [A == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (subtree)


### Exploring the left subtree of the left subtree


In [92]:
print_stump(my_decision_tree['left']['left'], my_decision_tree['left']['splitting_feature'])

                       A
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [B == 0]                      [B == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (subtree)


**Quiz Question:** What is the path of the **first 3 feature splits** considered along the **left-most** branch of **my_decision_tree**?

**Quiz Question:** What is the path of the **first 3 feature splits** considered along the **right-most** branch of **my_decision_tree**?

In [93]:
print_stump(my_decision_tree['left']['left']['left'], my_decision_tree['left']['left']['splitting_feature'])

                       B
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [C == 0]                      [C == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


In [94]:
print_stump(my_decision_tree['left']['left']['left']['left'], my_decision_tree['left']['left']['left']['splitting_feature'])

                       C
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [D == 0]                      [D == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


In [95]:
#Quiz Question: What is the path of the first 3 feature splits considered along the left-most branch of my_decision_tree?

# ' 36 months' -> 'A' -> 'B'

### Exploring the right subtree of the right subtree

In [96]:
print_stump(my_decision_tree['right'], my_decision_tree['splitting_feature'])

                        36 months
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [D == 0]                      [D == 1]
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


In [97]:
print_stump(my_decision_tree['right']['right'], my_decision_tree['right']['splitting_feature'])

(leaf, label: -1)


In [98]:
#Quiz Question: What is the path of the first 3 feature splits considered along the right-most branch of my_decisi
# ' 36 months' -> 'D' -> leaf
# There is no third feature, because the right branch of the second split is a leaf.