# Introduction

### See the whitepaper in this repository for more description of the ideas.

### This notebook is experiments with a tree of centroids with an option to append them to the image.

### Experiments use one of two types of models for prediction: kNN and CNN. 

### kNN treats the 28x28 image as an array of 784 pixels and uses L2 distance metric. 

### CNN is lifted verbatim from the book "Deep Learning with Python" by Francois Chollet, 2018 (page 120-122) the inventor of Keras. The CNN is not SOTA but is useful to see whether adding information about centroids improve a neural net model.

### The training/testing set of images is either small (6,000/1,000) or large (60,000/10,000).

### The bottom line result is the error rate reduction.

## Results from experiments

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| baseline kNN | <font size="5">91.6%</font> | <font size="5">96.88%</font> |
| improved kNN | <font size="5">94.1%</font> | <font size="5">97.12%</font> |

### and

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| baseline CNN | <font size="5">96.29%</font> | <font size="5">99.02%</font> |
| improved CNN | <font size="5">96.90%</font> | <font size="5">99.10%</font> |

### From these results, the _*error rate reduction*_ is

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| kNN | <font size="5">29.76%</font> | <font size="5">7.7%</font> | 
| CNN | <font size="5">16.4%</font> | <font size="5">9%</font> | 


## <a id='toc'></a>
# Table of Contents

## <a href='#section1'>section 1. import libraries and Configuration parameters</a>

## <a href='#section1a'>section 1a. function defs only; no test runs</a>

### list of all functions and a brief description of each.

## <a href='#section2'>section 2. test runs only; no function defs</a>

## <a href='#s2baselines'>section 2a. baselines</a>

##  <a href='#s2improvements'>section 2b. improvements summary</a>

## <a href='#s2quadrants'>section 2c. answering specific questions</a>

### <a href='#s2quadrants'>trying overlapping quadrants</a>

### <a href='#s2noappendimage'>try just the centroid info without the image; Do this with option parent centroids with overlap</a>

### <a href='#s2weightcentroids'>try weighted centroids (no overlap) with image</a>

### <a href='#s2treesizes'>try different sizes for the tree of centroids</a>

### <a href='#s2weightedparents'>compare weighted parent centroids to just the image</a>

### <a href='#s2relativebdy'>try using relation of image to boundary</a>

### <a href='#s2onlyleaves'>try without image (just the centroid info); Use only the leaves of the tree of centroids</a>

### <a href='#s2onlybigtree'>try without image (just the centroid info) and more neighbors, larger tree</a>

### <a href='#s2norelativebdy'>try not getting points relative to the bounding rectangle. </a>

### <a href='#s2weightcentroids'>append the image to the weighted centroids</a>

### <a href='#s2onlyleaves2'>Try weighted centroids with parents</a>

### <a href='#s2overlapparentweight'>...and now overlap the weighted centroids with parents improved CNN and kNN for small set of images</a>

### <a href='#s2overlaponlyleaves'>...and now do without the parents; CNN did worse than baseline for small set of images</a>

### <a href='#s2reprokNNimprove'>Reproduce the improvement for kNN with options weighted centroid but no overlap, nor parents,</a>

## <a href='#section3'>section 3. reproduce results before summary</a>

### Repeating baseline give the exact same result for kNN but CNN fluctuates significantly

### Repeat the largest improvement for kNN small 6,000 set of images
- overlapping quadrants by 1/8; all with weighted centroids

### Repeat the largest improvement for kNN large 60,000 set of images
- overlapping quadrants by 1/8; all with weighted centroids

### Repeat the largest improvement for CNN small 6,000 set of images
- using weighted centroids with parents

### Repeat the largest improvement for CNN large 60,000 set of images
- using weighted centroids with parents

## <a href='#section4'>section 4. summary</a>

## <a href='#section5'>section 5. areas for further work</a>

<a id='section1'></a>
<a href='#toc'>Goto Table of Contents</a>


In [2]:
%matplotlib inline
import math as math
import random as rand
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.decomposition import PCA

from datetime import datetime
import statistics 
%matplotlib inline

#
# start comment out if Config.USE_AWS Amazon Web Service
#
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential, load_model
from keras.datasets import mnist
from keras.utils import to_categorical
# end of comment out if Config.USE_AWS

from numpy.linalg import svd
from numpy.linalg import inv
from numpy.linalg import matrix_rank
from numpy.linalg import norm
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from scipy import stats
    

Using TensorFlow backend.


In [3]:
import cProfile
import re

def debug(detail,the_output):
    if detail < 2:
        print(datetime.now(),the_output)
    return

pd.set_option('display.max_rows', 120)
pd.set_option('display.max_columns', 999)
pd.set_option('display.width', 1000)
np.set_printoptions(edgeitems=150,linewidth=200)

In [4]:
class Config:
    NUM_KERAS_TRAIN_IMAGES=12_000
    NUM_KERAS_TRAIN_LABELS=NUM_KERAS_TRAIN_IMAGES
    
    NUM_KERAS_TEST_IMAGES=2_000
    NUM_KERAS_TEST_LABELS=NUM_KERAS_TEST_IMAGES
    
    # the number of pixels of each image along x and y axis
    NUM_X_PIXELS=28
    NUM_Y_PIXELS=28
    
    NUM_PIXELS_PER_IMAGE=NUM_X_PIXELS*NUM_Y_PIXELS
    
    # the number of output classes i.e. digits 0-9
    NUM_DIGITS=10

    # only points whose intensity if greater than the cutoff are considered for centroids
    # or any other purpose
    PIXEL_CUTOFF=5

    # kNN number of neighbors to vote on final classification
    N_NEIGHBORS=5
    
    # run the train/tests with no options at all;
    # do NOT append the centroid information to the image
    JUST_DO_BASELINE=False
    
    # use the image as well as the centroid information of the image
    APPEND_IMAGE=False
    
    # collect centroid information of each parent i.e. level(s) above the leaves of the tree of centroids
    # of quadrants, subquadrants, subsubquadrants, etc.
    USE_PARENT_CENTROIDS=False
    
    # scale the relative position to match the range of pixel intensities (0-255) so that they are
    # of comparable importance in kNN distances
    WEIGHTED_CENTROID=False
    
    # use quadrant subrectangles that overlap by some percentage;
    # therefore, some points will be in more than one quadrant
    OVERLAP_QUADS_RATIO=1

<a id='section1a'></a>
<a href='#toc'>Goto Table of Contents</a>

## Function defs only; no test runs

## List of all functions and a brief description of each.

### Terminology:

### A rectangle is represented by the UpperLeft corner point and the LowerRight corner point

### The leaves of the tree of centroids are the centroid of each subquadrant of each subquadrant of each... of each quadrant of the bounding rectangle.

### Functions

**Get the minimum bounding rectangle of a set of points**
- get_max_min_rect(points)
    return rect

**Get the center of mass of a set of points**
- get_centroid(points)
    return centroid

**Get all points in an image darker than PIXEL_CUTOFF**
- get_dark_pixel_coords(one_image)
    return points

**Get the height of the tree of centroids 
The leaves are the centroid of each quadrant of each subquadrant of each...etc**
- get_tree_height(rect)
    return height

**Get the rectangle that is one of the four quadrants of the given rectangle**
- get_quadrant(quad_ndx,rect)
    return subrect

**Get the set of points inside a given subrectangle**
- get_subrect_pts(subrect,points)
    return subrect_pts

**Get a numpy array of all centroids within a rectangle and its quadrants and its subquadrants etc.
There is one centroid per tree leaf and, if parent centroids config option is true, one per node
in the tree levels above the leaves.**
- get_centroids_tree(points,rect,tree_level)
    return centroids

**Get all the centroids for one image, primarily it just calls get_centroids_tree()**
- get_centroids(one_image)
    return (centroids,rect)

**Get the size of the rect along the y-axis in number of pixels**
- get_length(rect)
    return length

**Get the size of the rect along the x-axis in number of pixels**
- get_width(rect)
    return width
	
**Get the image's minimum bounding rectangle and then get all the centroids relative to that rectangle**
- get_centroids_relative(one_image)
    return centroids_relative

**Create a deep learning CNN model for an image with a given number of pixels for width and length**
- create_CNN_model(num_x_pixels,num_y_pixels)
    return model

**Evaluate the deep learning CNN model with the training and testing test and labels**
- train_eval_CNN_model(model,train_images,train_labels,\
                         test_images,test_labels,num_extra_rows)
    return scores

**Output a report with accuracy for deep learning CNN model**
- CNN_report(X,y,Xtest,ytest)
    return cvscores

**Output a report with accuracy when using kNN model for prediction**
- kNN_report(centroids_array,y,test_centroids_array,ytest)
    return

**Run the test; assumes the configuration parameters are set to choose what options are used**
- run_tests()
    return


In [5]:
def get_max_min_rect(points):
    x_coords=points[:,0]
    y_coords=points[:,1]
    
    if len(points)>0:
        max_x=np.max(x_coords)
        min_x=np.min(x_coords)

        max_y=np.max(y_coords)
        min_y=np.min(y_coords)
    else:
        max_x=0
        min_x=0
        max_y=0
        min_y=0
    rect=((min_x,max_y),(max_x,min_y))
    return rect
# get_max_min_rect(points)

In [6]:
def get_centroid(points):
    if len(points)>0:
        x_centroid=np.mean(points[:,0])
        y_centroid=np.mean(points[:,1])
    else:
        x_centroid=0
        y_centroid=0
    if Config.WEIGHTED_CENTROID:
        centroid=np.ndarray((0,3))
        if len(points)>0:
            weight=len(points[0])
        else:
            weight=0
        centroid=np.vstack((centroid,(x_centroid,y_centroid,weight)))
    else:
        centroid=np.ndarray((0,2))
        centroid=np.vstack((centroid,(x_centroid,y_centroid)))
#     print(type(centroid),centroid.shape)
    return centroid
# centroid=get_centroid(points)
# print(centroid)

In [7]:
def get_dark_pixel_coords(one_image):
    points=np.where(one_image>Config.PIXEL_CUTOFF)

#     print('points',points)
    points=np.vstack([points[0],points[1]]).T
#     print('points[0]',points[0])
#     print('points[-1]',points[-1])
    
    return points

In [8]:
def get_tree_height(rect):
    height=TREE_HEIGHT
    return height

In [9]:
def get_quadrant(quad_ndx,rect):
#     print('get_quadrant:quad_ndx',quad_ndx,':rect',rect)
    minx=rect[0][0]
    maxy=rect[0][1]
    
    maxx=rect[1][0]
    miny=rect[1][1]
    
    center_pt=((maxx+minx)/2,(maxy+miny)/2)
    
    is_y_axis_length=((maxx-minx)<(maxy-miny))
#     print('is_y_axis_length',is_y_axis_length)

    UR=(maxx,maxy)
    LR=(maxx,miny)

    UL=(minx,maxy)
    LL=(minx,miny)
        
    top_center=np.sum((UL,UR),axis=0)/2
    bottom_center=np.sum((LL,LR),axis=0)/2

#     print('UL',UL)
#     print('LL',LL)
    left_center=np.sum((UL,LL),axis=0)/2
    right_center=np.sum((UR,LR),axis=0)/2
#     print('top_center',top_center)
#     print('bottom_center',bottom_center)
#     print('left_center',left_center)
#     print('right_center',right_center)

    if quad_ndx==0:
        UL_subrect_corner=top_center
        LR_subrect_corner=right_center
    elif quad_ndx==1:
        UL_subrect_corner=UL
        LR_subrect_corner=center_pt
    elif quad_ndx==2:
        UL_subrect_corner=left_center
        LR_subrect_corner=bottom_center
    elif quad_ndx==3:
        UL_subrect_corner=center_pt
        LR_subrect_corner=LR
        
    if Config.OVERLAP_QUADS_RATIO!=1:
        overlap_x=(LR_subrect_corner[0]-UL_subrect_corner[0])*\
                    Config.OVERLAP_QUADS_RATIO
        overlap_y=(UL_subrect_corner[1]-LR_subrect_corner[1])*\
                    Config.OVERLAP_QUADS_RATIO
#         print('overlap_x',overlap_x)
#         print('overlap_y',overlap_y)
        
        UL_subrect_corner=UL_subrect_corner+(-overlap_x,overlap_y)
        LR_subrect_corner=LR_subrect_corner+(overlap_x,-overlap_y)
        subrect=(UL_subrect_corner,LR_subrect_corner)
    else:
        subrect=(UL_subrect_corner,LR_subrect_corner)
#     print('return get_quadrant:subrect',subrect)
    return subrect
Config.OVERLAP_QUADS_RATIO=1
get_quadrant(0,((5, 122), (24, 6)))
# get_quadrant(1,((5, 122), (24, 6)))
# get_quadrant(2,((5, 122), (24, 6)))
# get_quadrant(3,((5, 122), (24, 6)))
# get_quadrant(0,((5, 22), (124, 6)))
# get_quadrant(1,((5, 22), (124, 6)))
# get_quadrant(2,((5, 22), (124, 6)))
# get_quadrant(3,((5, 22), (124, 6)))
Config.OVERLAP_QUADS_RATIO=0.5
get_quadrant(0,((5, 122), (24, 6)))
# get_quadrant(1,((5, 122), (24, 6)))
# get_quadrant(2,((5, 122), (24, 6)))
# get_quadrant(3,((5, 122), (24, 6)))
# get_quadrant(0,((5, 22), (124, 6)))
# get_quadrant(1,((5, 22), (124, 6)))
# get_quadrant(2,((5, 22), (124, 6)))
# get_quadrant(3,((5, 22), (124, 6)))
pass

In [10]:
def get_subrect_pts(subrect,points):
#     print('get_subrect_pts:subrect',subrect)
#     print('get_subrect_pts:len(points)',len(points))
    (UL,LR)=subrect
    ULx=UL[0]
    ULy=UL[1]
    LRx=LR[0]
    LRy=LR[1]
    subrect_pts_ndx=np.where((points[:,0]>=ULx) & (points[:,0]<=LRx)&\
                     (points[:,1]<=ULy) & (points[:,1]>=LRy))
#     print('subrect_pts_ndx',subrect_pts_ndx)
    subrect_pts=points[subrect_pts_ndx]
#     print('return get_subrect_pts:subrect_pts',len(subrect_pts),subrect_pts)
    return subrect_pts
points=np.array(((1,2),(1,3),(4,2),(5,4)),dtype=float)
subrect=((0,3),(1,1))
get_subrect_pts(subrect,points)

array([[1., 2.],
       [1., 3.]])

In [11]:
def get_centroids_tree(points,rect,tree_level):

    if Config.USE_PARENT_CENTROIDS:
        centroids=get_centroid(points)
    else:
        if Config.WEIGHTED_CENTROID:
            centroids=np.ndarray((0,3))
        else:
            centroids=np.ndarray((0,2))

    if tree_level==0:
        centroids=get_centroid(points)
    else:
        for quad_ndx in range(4):
            subrect=get_quadrant(quad_ndx,rect)
            subrect_pts=get_subrect_pts(subrect,points)
            quad_centroids=get_centroids_tree(subrect_pts,subrect,tree_level-1)
            centroids=np.vstack((centroids,quad_centroids))
    
    return centroids
# get_centroids(points,rect,tree_level)

In [12]:
def get_centroids(one_image):

    points=get_dark_pixel_coords(one_image)
    rect=get_max_min_rect(points)
    tree_height=get_tree_height(rect)
    centroids=get_centroids_tree(points,rect,tree_height)
#     print(type(centroids),centroids.shape)
    return (centroids,rect)
# centroids_list=get_centroids(one_image)
# print(centroids_list)

In [13]:
def get_length(rect):
    minx=rect[0][0]
    maxy=rect[0][1]
    
    maxx=rect[1][0]
    miny=rect[1][1]
    
    length=(maxy-miny)
    return length

In [14]:
def get_width(rect):
    minx=rect[0][0]
    maxy=rect[0][1]
    
    maxx=rect[1][0]
    miny=rect[1][1]
    
    width=(maxx-minx)
    return width

In [15]:
def get_centroids_relative(one_image):
    (centroids,boundary_rect)=get_centroids(one_image)
    boundary_x_width=get_width(boundary_rect)
    boundary_y_length=get_length(boundary_rect)
    if Config.WEIGHTED_CENTROID:
        boundary_x_width=boundary_x_width/255# scaleto max pixel intensity
        boundary_y_length=boundary_y_length/255
    if True or USE_REL_BDY:
        LLx=boundary_rect[0][0]
        LLy=boundary_rect[1][1]
        centroids_relative=((centroids[:,0]-LLx),\
                            (centroids[:,1]-LLy))
        centroids_relative=((centroids[:,0]-LLx)/boundary_x_width,\
                            (centroids[:,1]-LLy)/boundary_y_length)#-bottom_of_dark_pixels
#         print('centroids_relative',centroids_relative)
        centroids_relative=np.vstack((np.array(centroids_relative[0]),\
                                      np.array(centroids_relative[1])))
#         centroids_relative=(centroids)/boundary_y_length#-bottom_of_dark_pixels
    else:
        centroids_relative=centroids#(centroids-bottom_of_dark_pixels)/boundary_length
    centroids_relative=centroids_relative.flatten()

#     print('centroids_relative flatten',centroids_relative)
#     print('return get_centroids_relative centroids',type(centroids),centroids.shape)
    return centroids_relative

In [16]:
def create_CNN_model(num_x_pixels,num_y_pixels):
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(32,(3,3), activation='relu',\
                            input_shape=(num_x_pixels,num_y_pixels,1)))
    model.add(layers.MaxPooling2D((2,2)))
    model.add(layers.Conv2D(64,(3,3), activation='relu'))
    model.add(layers.MaxPooling2D((2,2)))
    model.add(layers.Conv2D(64,(3,3), activation='relu'))

    model.add(layers.Flatten())
    model.add(layers.Dense(64,activation='relu'))
    model.add(layers.Dense(Config.NUM_DIGITS,activation='softmax'))
    return model

In [17]:
def train_eval_CNN_model(model,train_images,train_labels,\
                         test_images,test_labels,num_extra_rows):
    train_images=train_images.reshape((len(train_images),\
                                       Config.NUM_X_PIXELS+num_extra_rows,\
                                       Config.NUM_Y_PIXELS,1))#60000
    train_images=train_images.astype('float32')/255

    test_images=test_images.reshape((len(test_images),\
                                     Config.NUM_X_PIXELS+num_extra_rows,\
                                     Config.NUM_Y_PIXELS,1))#10000
    test_images=test_images.astype('float32')/255

    train_labels=to_categorical(train_labels)
    test_labels=to_categorical(test_labels)

    model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])

    debug(0,'Start of fit')
    model.fit(train_images,train_labels,epochs=5,batch_size=64,verbose=1)
    debug(0,'End of fit')

    scores = model.evaluate(test_images,test_labels,verbose=0)
    return scores

In [18]:
def CNN_report(X,y,Xtest,ytest):
    X=np.vstack((X,Xtest))
    Y=np.hstack((y,ytest))

    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    cvscores = []
    
    X_num_images=X.shape[0]
    X_image_num_pixels=X.shape[1]
    num_extra_rows=\
        np.ceil((X_image_num_pixels-Config.NUM_PIXELS_PER_IMAGE)\
                /Config.NUM_X_PIXELS).astype(int)
    if num_extra_rows>0:
        fill_up_size=(X_image_num_pixels-Config.NUM_PIXELS_PER_IMAGE)\
                    %Config.NUM_X_PIXELS
        fill_up_size=Config.NUM_X_PIXELS-fill_up_size
        fill_up_last_row=np.zeros((X_num_images,fill_up_size))
        print('fill_up_last_row.shape',fill_up_last_row.shape)
        X=np.hstack((X,fill_up_last_row))
    print('X_num_images',X_num_images,'X_image_num_pixels',X_image_num_pixels)
    print('num_extra_rows',num_extra_rows)
    
    for train, test in kfold.split(X, Y):
        model=None
        model=create_CNN_model(Config.NUM_X_PIXELS+num_extra_rows,Config.NUM_Y_PIXELS)
        scores=train_eval_CNN_model(model,X[train],Y[train],\
                                    X[test], Y[test],num_extra_rows)
        print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
        cvscores.append(scores[1] * 100)
    print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
    debug(0,'end of verbatim_from_book_CNN')
    return cvscores

In [19]:
def kNN_report(centroids_array,y,test_centroids_array,ytest):
    kNN=KNeighborsClassifier(n_neighbors=Config.N_NEIGHBORS)
    kNN.fit(centroids_array,y)
    predicted_labels=kNN.predict(test_centroids_array)
    print('predicted_labels',predicted_labels)

    ndx_errs=np.where(predicted_labels!=ytest)
    print('ndx_errs',ndx_errs)
    print('num correct is ',len(ytest) - len(ndx_errs[0]),\
          ' an accuracy of ',1 - len(ndx_errs[0])/len(ytest))
    return

In [20]:
def run_tests():
    debug(0,('start'))
    (X, y),(Xtest,ytest)=mnist.load_data()

    X=X[0:Config.NUM_KERAS_TRAIN_IMAGES]
    y=y[0:Config.NUM_KERAS_TRAIN_LABELS]

    Xtest=Xtest[0:Config.NUM_KERAS_TEST_IMAGES]
    ytest=ytest[0:Config.NUM_KERAS_TEST_LABELS]
    
    print('ytest',ytest)
    
    # baseline results
    if Config.JUST_DO_BASELINE:
        X=np.reshape(X,(len(X),Config.NUM_PIXELS_PER_IMAGE))
        Xtest=np.reshape(Xtest,(len(Xtest),Config.NUM_PIXELS_PER_IMAGE))
        CNN_report(X,y,Xtest,ytest)
        kNN_report(X,y,Xtest,ytest)
        return

    if Config.USE_PARENT_CENTROIDS:
        centroids_array=np.ndarray((0,2*(4**(TREE_HEIGHT+1)-1)//3))#682))#170))#42))#10))
    else:
        centroids_array=np.ndarray((0,2*4**TREE_HEIGHT))#682))#170))#42))#10))
    print(type(centroids_array),centroids_array.shape)
    if Config.APPEND_IMAGE:
        centroids_array=np.hstack((centroids_array,\
                                  np.ndarray((0,Config.NUM_PIXELS_PER_IMAGE))))
        print(type(centroids_array),centroids_array.shape)
    for ndx in range(len(X)):
        if ndx%2000==0:
            print(30*'*',ndx,30*'*')
        centroids_relative=get_centroids_relative(X[ndx])
        
        if Config.APPEND_IMAGE:
#             print('main():centroids_array.shape',centroids_array.shape)
#             print('main():X[ndx].shape',X[ndx].shape)
#             print('main():centroids_relative.shape',centroids_relative.shape)
            centroids_relative=np.hstack((centroids_relative,\
                                       np.reshape(X[ndx],(Config.NUM_PIXELS_PER_IMAGE))))
            centroids_array=np.vstack((centroids_array,centroids_relative))
#             np.reshape(centroids_array,((1,len(centroids_array))))
        else:
            centroids_array=np.vstack((centroids_array,centroids_relative))
            
        if ndx<0:
            print('main():centroids_array',centroids_array)
            print('y[ndx]',y[ndx])
            print('one_image',X[ndx])

    centroids_array=np.nan_to_num(centroids_array)

    if Config.USE_PARENT_CENTROIDS:
        test_centroids_array=np.ndarray((0,2*(4**(TREE_HEIGHT+1)-1)//3))#682))#170))#42))#10))
    else:
        test_centroids_array=np.ndarray((0,2*4**TREE_HEIGHT))
    if Config.APPEND_IMAGE:
        test_centroids_array=np.hstack((test_centroids_array,\
                                  np.ndarray((0,Config.NUM_PIXELS_PER_IMAGE))))
        print(type(test_centroids_array),test_centroids_array.shape)
    for ndx in range(len(Xtest)):
        if ndx%2000==1999:
            print(30*'*',ndx,30*'*')
            test_centroids_array=np.nan_to_num(test_centroids_array)
            
            CNN_report(centroids_array,y,test_centroids_array,ytest[:ndx])
            kNN_report(centroids_array,y,test_centroids_array,ytest[:ndx])

        test_centroids_relative=get_centroids_relative(Xtest[ndx])
        if Config.APPEND_IMAGE:
#             print('main():centroids_array.shape',centroids_array.shape)
#             print('main():X[ndx].shape',X[ndx].shape)
#             print('main():centroids_relative.shape',centroids_relative.shape)
            test_centroids_relative=np.hstack((test_centroids_relative,\
                                       np.reshape(Xtest[ndx],(Config.NUM_PIXELS_PER_IMAGE))))
            test_centroids_array=np.vstack((test_centroids_array,test_centroids_relative))
#             np.reshape(centroids_array,((1,len(centroids_array))))
        else:
            test_centroids_array=np.vstack((test_centroids_array,test_centroids_relative))

        if ndx<0:
            print(type(test_centroids_array),test_centroids_array.shape)
            print('main():test_centroids_array',test_centroids_array)

    test_centroids_array=np.nan_to_num(test_centroids_array)

    CNN_report(centroids_array,y,test_centroids_array,ytest)
    kNN_report(centroids_array,y,test_centroids_array,ytest)
    debug(0,('end'))
    return


<a id='section2'></a>
<a href='#toc'>Goto Table of Contents</a>

# End of function defs; start of running tests


<a id='s2baselines'></a>
<a href='#toc'>Goto Table of Contents</a>

## kNN baseline (6,000 acc is 91.6%; 60,000 acc is 96.88%)

In [84]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 09:14:45.643227 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [35]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 09:19:16.465133 start
num correct is  9688  an accuracy of  0.9688


## CNN and kNN baseline (more recent code)  

## For CNN baseline (6,000 acc is 96.06% (+/- 1.42%); 60,000 acc is 99.00% (+/- 0.21%))
## For kNN baseline (6,000 acc is 91.6%; 60,000 acc is 96.88%)

In [152]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 10:29:22.461520 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [181]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 20:33:06.993948 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
X_num_images 70000 X_image_num_pixels 784
num_extra_rows 0
2020-06-02 20:33:08.390259 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 20:35:48.166404 End of fit
acc: 98.87%
2020-06-02 20:35:53.905455 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 20:38:32.533659 End of fit
acc: 99.26%
2020-06-02 20

<a id='s2improvements'></a>
<a href='#toc'>Goto Table of Contents</a>

# Improvements by adding centroid information to each image

## adding 2**TREE_HEIGHT weighted centroids improves 

## - kNN to 97.0% from kNN baseline of 96.88% for 60,000 and to 93.90% from 91.60% for 6,000

## Improved kNN (6,000 93.9%; 60,000 acc is 97.0%)
## For kNN baseline (6,000 acc is 91.6%; 60,000 acc is 96.88%)

In [99]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 11:08:01.346483 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 3 5 6 7 8 9 0 1 2 3 5 6 7 8 9 9 7 0 9 0 1 5 8 8 0 9 3 2 7 8 4 6 1 0 4 9 4 2 0 5 0 1 6 9 3 2
 9 1 6 0 1 1 8 7 7 6 3 6 0 7 2 4 1 7 0 6 7 1 2 5 8 1 0 2 8 7 6 8 7 1 6 2 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 8 9 5 7 0 3 1 6 8 4 1 5 6 4 2 7 8 1 3 4 3 4 7 2 0 5 0 1 9 2 3
 2 3 5 5 7]
ndx_errs (array([ 151,  241,  247,  320,  321,  338,  376,  381,  444,  445,  448,  464,  495,  542,  582,  628,  659,  691,  707,  716,  740,  760,  839,  844,  877,  883,  924,  938,  939,  947,  951,  956,
        957, 1003, 1014, 1015, 1039, 1062, 1077, 1089, 1107, 1112, 1173, 1192, 1226, 1228, 1232, 1242, 1247, 1260, 1283, 1299, 1319, 1325, 1326, 1364, 1393, 1414, 14

In [98]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 11:03:46.507743 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

## Adding weighted, parent centroids for TREE_HEIGHT=3 increased acc:

## for 60,000: for CNN to 99.13% (+/- 0.09%) from baseline of ; increased acc for kNN to 96.97% from baseline of


## increased kNN to (6,000 96.97%;  %)
## from kNN baseline (6,000 91.6%; 60,000 96.88%)

## increased CNN to (6,000 ; 60,000 99.13% (+/- 0.09%))
## and CNN baseline (6,000 96.06% (+/- 1.42%); 60,000 acc is 99.00% (+/- 0.21%))


In [178]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 13:27:17.891703 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 170)
<class 'numpy.ndarray'> (0, 954)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

Epoch 4/5
Epoch 5/5
2020-06-02 16:11:31.146086 End of fit
acc: 98.95%
2020-06-02 16:11:35.177780 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:14:03.590131 End of fit
acc: 98.95%
2020-06-02 16:14:07.544199 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:16:36.812338 End of fit
acc: 99.17%
2020-06-02 16:16:40.727313 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:19:11.053668 End of fit
acc: 99.06%
2020-06-02 16:19:15.181542 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:21:45.319868 End of fit
acc: 98.83%
2020-06-02 16:21:49.582200 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:24:19.748084 End of fit
acc: 99.00%
2020-06-02 16:24:23.999648 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:26:54.479336 End of fit
acc: 99.14%
2020-06-02 16:26:58.295362 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


****************************** 5999 ******************************
fill_up_last_row.shape (65999, 26)
X_num_images 65999 X_image_num_pixels 954
num_extra_rows 7
2020-06-02 16:38:38.298435 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:41:14.772799 End of fit
acc: 98.85%
2020-06-02 16:41:18.916787 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:44:03.223378 End of fit
acc: 99.12%
2020-06-02 16:44:08.044080 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:46:47.715090 End of fit
acc: 98.99%
2020-06-02 16:46:52.070873 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:49:31.906811 End of fit
acc: 99.17%
2020-06-02 16:49:36.666117 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:52:21.586248 End of fit
acc: 99.05%
2020-06-02 16:52:30.796444 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 16:55:12.995814 End of fit
acc: 99.08%

Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:30:05.990878 End of fit
acc: 99.09%
2020-06-02 17:30:12.687764 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:33:02.543107 End of fit
acc: 99.12%
2020-06-02 17:33:07.282576 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:35:53.434072 End of fit
acc: 99.13%
2020-06-02 17:35:58.113865 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:38:47.052798 End of fit
acc: 98.96%
2020-06-02 17:38:52.090528 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:41:38.696353 End of fit
acc: 98.98%
2020-06-02 17:41:43.401187 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 17:44:33.506395 End of fit
acc: 99.04%
99.00% (+/- 0.15%)
2020-06-02 17:44:37.620925 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 

fill_up_last_row.shape (70000, 26)
X_num_images 70000 X_image_num_pixels 954
num_extra_rows 7
2020-06-02 18:41:09.607828 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 18:44:06.301131 End of fit
acc: 99.10%
2020-06-02 18:44:12.226555 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 18:47:14.027849 End of fit
acc: 99.16%
2020-06-02 18:47:19.392418 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 18:50:21.610132 End of fit
acc: 99.11%
2020-06-02 18:50:27.439952 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 18:53:36.654994 End of fit
acc: 99.21%
2020-06-02 18:53:42.301260 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 18:56:52.414997 End of fit
acc: 98.91%
2020-06-02 18:57:00.260000 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 19:00:10.980063 End of fit
acc: 99.11%
2020-06-02 19:00:16.847021 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 

<a id='s2quadrants'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try overlapping quadrants with different amounts of overlap; all with weighted centroids

## kNN acc overlap 1/8   (6,000 94.1%; 60,000 97.12%)
## kNN acc overlap 1/16 (6,000 94.2%; 60,000 97.09%)
## kNN acc overlap 1/32 (6,000 93.9%; 60,000 )

## from kNN baseline of (6,000 91.6%; 60,000 96.88%)

In [120]:
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 07:31:53.020183 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [125]:
Config.OVERLAP_QUADS_RATIO=0.0625
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 12:39:08.737056 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [126]:
Config.OVERLAP_QUADS_RATIO=0.03125
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 12:43:54.150585 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [127]:
Config.OVERLAP_QUADS_RATIO=0.0625
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 12:47:26.354540 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 7 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 6 7 0 6 8 6 3 9 9 8 8 7
 7 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 8 5 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 6 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5]
ndx_errs (array([  73,  115,  151,  241,  247,  320,  321,  338,  362,  376,  381,  444,  445,  448,  464,  478,  479,  495,  519,  582,  628,  659,  716,  740,  791,  810,  839,  924,  938,  939,  947,  951,
        956,  957, 1003, 1014, 1039, 1062, 1077, 1089, 1107, 1112, 1114, 1156, 1173, 1181, 1192, 1

In [121]:
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 07:34:34.450129 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 8 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 8 7
 7 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 8 8 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 6 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5]
ndx_errs (array([ 151,  241,  247,  320,  321,  338,  340,  362,  376,  381,  444,  445,  448,  464,  478,  479,  495,  519,  571,  578,  582,  628,  659,  689,  691,  716,  740,  791,  810,  839,  844,  938,
        939,  947,  951,  957, 1003, 1014, 1039, 1062, 1077, 1089, 1107, 1112, 1114, 1173, 1192, 1

<a id='s2noappendimage'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try just the centroid info without the image.
## Do this with option parent centroids with overlap

- kNN acc (6,000        ; 60,000 89.91%)
- kNN acc (6,000        ; 60,000 89.62%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## This shows that we need the image in addition to centroid info for the best results


In [123]:
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 10:35:13.470001 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************

****************************** 5999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 7 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 9 4 6 3 5 5 6 0 4 1 9 5 7 8 9 2 7 4 6 4 3 0 7 0 2 7 1 7 3 2 8 7 7 6 2 7 8 4 7 7 6 1 3 6 4 3 1 9 1 4 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 4 1 8 2 0 2 ... 2 3 4 5 6 4 8 9 3 3 1 3 7 3 2 8 0 9 3 9 9 0 9 1 1 5 8 3 6 3 2 1 8 3 2 6 3 6 2 2 1 0 3 3 1 9
 2 1 9 6 0 4 6 1 9 3 8 9 8 9 6 5 8 3 3 7 1 6 1 0 2 6 2 3 4 2 3 4 4 6 0 0 2 0 1 2 3 4 3 6 7 2 9 0 1 2 3 4 8 6 7 8 9 0 1 2 8 4 5 6 7 8 7 6 6 5 0 6 0 9 9 1 9 3 8 0 4 3 9 1 4 0 5 3 2 1 3 4 0 7 6 0 1 7 0
 6 8 9 5 1]
ndx_errs (array([   9,   48,   63,   73,   78,   87,   92,   95,   97,  144,  151,  165,  184,  206,  207,  217,  232,  241,  247,  250,  259,  266,  282,  301,  307,  316,  318,  320,  338,  340,  359,  362,
        376,  386,  391,  405,  406,  436,  444,  445,  447,  448,  457,  464,  468,  478,  479,  

****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 7 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 9 4 6 3 5 5 6 0 4 1 9 5 7 8 9 2 7 4 6 4 3 0 7 0 2 7 1 7 3 2 8 7 7 6 2 7 8 4 7 7 6 1 3 6 4 3 1 9 1 4 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 4 1 8 2 0 2 ... 2 0 7 7 5 6 2 9 8 0 7 3 4 6 8 7 0 4 8 7 7 5 4 3 0 2 8 1 5 1 0 8 3 3 6 7 0 6 8 6 3 9 9 5 8 7
 7 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 9 1 4 8 8 4 4 7 0 1 9 2 8 7 5 2 6 0 6 5 3 8 3 9 1 2 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 3 2 4 4 4 3 4 4 1 7 2 6 6 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5]
ndx_errs (array([   9,   48,   63,   73,   78,   87,   92,   95,   97,  144,  151,  165,  184,  206,  207,  217,  232,  241,  247,  250,  259,  266,  282,  301,  307,  316,  318,  320,  338,  340,  359,  362,
        376,  386,  391,  405,  406,  436,  444,  445,  447,  448,  457,  464,  468,  478,  479,  

In [124]:
Config.USE_PARENT_CENTROIDS=True
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-01 11:04:46.494003 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 170)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************

****************************** 5999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 7 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 9 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 7 1 7 3 2 8 7 7 6 2 7 8 4 7 3 6 1 3 6 4 3 1 9 1 4 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 9 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 4 1 8 2 0 2 ... 2 3 4 5 6 4 8 4 3 3 1 3 7 3 2 8 0 9 3 9 9 0 9 1 1 5 8 2 6 3 2 1 8 3 2 6 3 6 2 2 1 0 3 3 1 9
 2 1 9 6 0 4 6 1 9 3 8 7 8 9 6 5 8 3 3 7 1 6 1 0 2 6 2 3 4 2 3 4 4 6 0 0 2 0 1 2 2 4 3 6 7 0 9 0 1 2 3 4 8 6 7 8 9 0 1 2 8 4 5 6 7 8 9 8 6 5 0 6 0 9 9 1 9 3 3 0 4 3 9 1 4 0 5 3 2 1 3 4 0 7 6 0 1 7 0
 6 8 9 3 1]
ndx_errs (array([   9,   48,   73,   78,   92,   95,   97,  115,  121,  144,  151,  158,  165,  173,  184,  206,  207,  217,  232,  241,  247,  250,  259,  266,  282,  307,  316,  318,  320,  338,  340,  359,
        362,  369,  376,  386,  391,  405,  406,  436,  444,  445,  447,  448,  457,  464,  468,  

****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 7 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 9 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 7 1 7 3 2 8 7 7 6 2 7 8 4 7 3 6 1 3 6 4 3 1 9 1 4 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 9 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 4 1 8 2 0 2 ... 2 0 7 7 5 6 2 9 8 0 7 3 4 6 8 7 0 4 8 7 7 5 4 3 0 2 8 1 5 1 0 8 3 3 6 7 0 6 8 6 3 9 9 5 8 7
 7 1 0 1 7 8 9 0 1 0 9 0 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 9 1 9 8 8 4 4 7 0 1 9 2 8 7 5 2 6 0 6 5 3 8 5 9 1 2 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 3 2 4 4 4 3 4 4 1 7 2 6 6 0 1 2 3 4 5 6 7 8 9 0
 1 2 5 4 5]
ndx_errs (array([   9,   48,   73,   78,   92,   95,   97,  115,  121,  144,  151,  158,  165,  173,  184,  206,  207,  217,  232,  241,  247,  250,  259,  266,  282,  307,  316,  318,  320,  338,  340,  359,
        362,  369,  376,  386,  391,  405,  406,  436,  444,  445,  447,  448,  457,  464,  468,  

<a id='s2weightcentroids'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try weighted centroids (no overlap) with image

- kNN acc (6,000      ; 60,000 93.6%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## This is no improvement

In [107]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-01 06:33:49.154334 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2treesizesweighted'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try different sizes for the tree of weighted centroids


## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

### A tree size of 3 seems to be best


In [104]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=2

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 15:22:21.258761 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [101]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=4

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 15:05:36.526985 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2weightedparents2'></a>
<a href='#toc'>Goto Table of Contents</a>

## Compare weighted parent centroids to just the image

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


In [100]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.USE_PARENT_CENTROIDS=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 15:00:51.343737 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [70]:
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 12:36:15.479827 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 0 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 0 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 6 4 3 5 5 1 7 2 1 6 9 1 3 9 5 5 1 6 2 2 8 6 7 1 4 6 0 6 0 3 3 2 2 3 6 8 9 8 5 3 8 5 4 5 2 0 5 6 3 2 8 3 9 9 5 7 9 4 6 7 1 3 1 3 6 6 0 9 0 1 9 9 2 8 8 0 1 6 9 7
 5 3 4 7 4]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 3 4 5 6 7 8 9 8 7 1 3 7 5 2 8 0 7 5 9 9 0 9 1 1 5 8 8 6 3 2 1 8 3 2 6 5 6 9 0 1 0 3 3 1 9
 2 1 9 6 0 4 6 1 7 3 8 9 2 9 6 5 8 3 5 7 1 6 1 0 9 6 2 5 4 2 3 4 4 6 0 0 2 0 1 2 3 9 3 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 8 4 5 6 7 8 9 8 6 5 0 6 8 9 4 1 9 5 3 0 4 8 9 1 4 0 5 5 2 1 5 4 0 7 6 0 1 7 0
 6 8 9 3 1]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 3 5 6 7 8 9 0 1 2 3 5 6 7 8 9 9 7 0 9 0 1 5 8 8 0 9 3 2 7 8 4 6 1 0 4 9 4 2 0 5 0 1 6 9 3 2
 9 1 6 0 1 1 8 7 7 6 3 6 0 7 2 4 1 7 0 6 7 1 2 5 8 1 8 2 8 7 6 8 7 1 6 2 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 8 9 5 7 0 3 1 6 8 4 1 5 6 4 2 7 8 1 3 4 3 4 7 2 0 5 0 1 9 2 3
 2 3 5 5 7]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 6 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7
 7 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 5 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 6 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 5 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1192, 1202, 1226, 1242, 1247, 1260, 1289, 1299, 1319, 1325, 1326, 13

<a id='s2relativebdy'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try using relation of image to boundary

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


In [73]:
Config.APPEND_IMAGE=True
Config.USE_PARENT_CENTROIDS=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3
USE_REL_BDY=True

run_tests()

Config.APPEND_IMAGE=False
Config.USE_PARENT_CENTROIDS=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 15:53:35.022865 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 170)
<class 'numpy.ndarray'> (0, 954)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 0 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 0 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 6 4 3 5 5 1 7 2 1 6 9 1 3 9 5 5 1 6 2 2 8 6 7 1 4 6 0 6 0 3 3 2 2 3 6 8 9 8 5 3 8 5 4 5 2 0 5 6 3 2 8 3 9 9 5 7 9 4 6 7 1 3 1 3 6 6 0 9 0 1 9 9 2 8 8 0 1 6 9 7
 5 3 4 7 4]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 3 4 5 6 7 8 9 8 7 1 3 7 5 2 8 0 7 5 9 9 0 9 1 1 5 8 8 6 3 2 1 8 3 2 6 5 6 9 0 1 0 3 3 1 9
 2 1 9 6 0 4 6 1 7 3 8 9 2 9 6 5 8 3 5 7 1 6 1 0 9 6 2 5 4 2 3 4 4 6 0 0 2 0 1 2 3 9 3 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 8 4 5 6 7 8 9 8 6 5 0 6 8 9 4 1 9 5 3 0 4 8 9 1 4 0 5 5 2 1 5 4 0 7 6 0 1 7 0
 6 8 9 3 1]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 3 5 6 7 8 9 0 1 2 3 5 6 7 8 9 9 7 0 9 0 1 5 8 8 0 9 3 2 7 8 4 6 1 0 4 9 4 2 0 5 0 1 6 9 3 2
 9 1 6 0 1 1 8 7 7 6 3 6 0 7 2 4 1 7 0 6 7 1 2 5 8 1 8 2 8 7 6 8 7 1 6 2 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 8 9 5 7 0 3 1 6 8 4 1 5 6 4 2 7 8 1 3 4 3 4 7 2 0 5 0 1 9 2 3
 2 3 5 5 7]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 6 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7
 7 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 5 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 0 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 9 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 6 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 0 9 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 5 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
ndx_errs (array([  33,  115,  195,  241,  247,  300,  318,  320,  321,  341,  358,  381,  444,  445,  464,  479,  495,  542,  551,  565,  582,  583,  628,  646,  659,  691,  707,  740,  791,  839,  844,  877,
        881,  924,  938,  939,  947,  951,  957, 1014, 1039, 1062, 1068, 1082, 1089, 1107, 1112, 1192, 1202, 1226, 1242, 1247, 1260, 1289, 1299, 1319, 1325, 1326, 13

In [74]:
Config.APPEND_IMAGE=True
Config.USE_PARENT_CENTROIDS=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=4
USE_REL_BDY=True

run_tests()

Config.APPEND_IMAGE=False
Config.USE_PARENT_CENTROIDS=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 20:59:52.515593 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 682)
<class 'numpy.ndarray'> (0, 1466)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


KeyboardInterrupt: 

<a id='s2onlyleaves'></a>
<a href='#toc'>Goto Table of Contents</a>

## try without image (just the centroid info); Use only the leaves of the tree of centroids

## Try more kNN neighbors and a bigger tree

- kNN acc (6,000    ; 60,000 93.95%)
- kNN acc (6,000    ; 60,000 92.20%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## Without the image, centroid info is not enough for best results

In [52]:
USE_REL_BDY=True
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=4

run_tests()

2020-05-28 19:25:11.832647 start
<class 'numpy.ndarray'> (0, 512)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 *********************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3664  an accuracy of  0.9162290572643161


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5541  an accuracy of  0.9236539423237207


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7474  an accuracy of  0.9343667958494812


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9394  an accuracy of  0.9394939493949395


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9395  an accuracy of  0.9395
2020-05-28 21:37:22.176638 end


In [71]:
USE_REL_BDY=True
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=4

run_tests()

2020-05-29 10:15:41.947293 start
<class 'numpy.ndarray'> (0, 512)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3585  an accuracy of  0.8964741185296324


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5426  an accuracy of  0.9044840806801133


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7336  an accuracy of  0.9171146393299162


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9219  an accuracy of  0.921992199219922


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9220  an accuracy of  0.922
2020-05-29 12:30:32.921778 end


<a id='s2onlybigtree'></a>
<a href='#toc'>Goto Table of Contents</a>

## try without image (just the centroid info) and more neighbors, larger tree

- kNN acc (6,000 83.9%; 60,000 ) Try weighted centroid without image
- kNN acc (6,000 ; 60,000 91.22%)
- kNN acc (6,000 ; 60,000 88.47%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## This shows that we need the image for best results. The centroids by themselves is not enough.

In [102]:
Config.WEIGHTED_CENTROID=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-31 15:13:11.757654 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [72]:
USE_REL_BDY=True
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=5

run_tests()

2020-05-29 12:33:19.023909 start
<class 'numpy.ndarray'> (0, 2048)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3545  an accuracy of  0.8864716179044762


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5363  an accuracy of  0.893982330388398


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7246  an accuracy of  0.905863232904113


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9121  an accuracy of  0.9121912191219121


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9122  an accuracy of  0.9122
2020-05-29 21:22:15.245199 end


In [73]:
USE_REL_BDY=True
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

2020-05-29 21:22:15.480034 start
<class 'numpy.ndarray'> (0, 128)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3459  an accuracy of  0.86496624156039


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5233  an accuracy of  0.8723120520086681


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7040  an accuracy of  0.8801100137517189


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  8846  an accuracy of  0.8846884688468847


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  8847  an accuracy of  0.8847
2020-05-29 21:52:31.007097 end


<a id='s2norelativebdy'></a>
<a href='#toc'>Goto Table of Contents</a>
## Try not getting points relative to the bounding rectangle. 

## Try several configurations

- kNN acc (6,000 ; 60,000 94.17%)
- kNN acc (6,000 ; 60,000 90.85%)
- kNN acc (6,000 ; 60,000 91.15%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## This shows that we must look at points relative to the bounding rectangle.

In [48]:
USE_REL_BDY=False
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=4

run_tests()

2020-05-28 14:05:28.133736 start
<class 'numpy.ndarray'> (0, 512)
****************************** 0 ******************************
main():centroids_array [[        nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan
          nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan
          nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan         nan
          nan         nan         nan         nan 15.         19.         16.         19.                 nan         nan 17.         20.         17.5        19.         19.         19.
  19.         18.         17.5        18.         17.66666667 16.66666667 19.  

****************************** 1000 ******************************
****************************** 2000 ******************************
****************************** 3000 ******************************
****************************** 4000 ******************************
****************************** 5000 ******************************
****************************** 6000 ******************************
****************************** 7000 ******************************
****************************** 8000 ******************************
****************************** 9000 ******************************
****************************** 10000 ******************************
****************************** 11000 ******************************
****************************** 12000 ******************************
****************************** 13000 ******************************
****************************** 14000 ******************************
****************************** 15000 ********************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 1999 ******************************
num correct is  1839  an accuracy of  0.919959979989995


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2999 ******************************
num correct is  2754  an accuracy of  0.9183061020340113


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3679  an accuracy of  0.9199799949987497


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 4999 ******************************
num correct is  4598  an accuracy of  0.9197839567913583


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5552  an accuracy of  0.9254875812635439


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 6999 ******************************
num correct is  6515  an accuracy of  0.9308472638948422


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7488  an accuracy of  0.9361170146268284


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 8999 ******************************
num correct is  8461  an accuracy of  0.9402155795088343


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9416  an accuracy of  0.9416941694169417


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9417  an accuracy of  0.9417
2020-05-28 16:41:40.285282 end


In [27]:
USE_REL_BDY=False
Config.N_NEIGHBORS=5
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 06:00:58.422655 start
<class 'numpy.ndarray'> (0, 128)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3595  an accuracy of  0.8989747436859215


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5411  an accuracy of  0.9019836639439907


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7261  an accuracy of  0.9077384673084136


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9114  an accuracy of  0.9114911491149115


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9115  an accuracy of  0.9115
2020-05-30 06:32:46.927915 end


In [26]:
USE_REL_BDY=False
Config.USE_PARENT_CENTROIDS=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-05-30 05:20:55.465422 start
<class 'numpy.ndarray'> (0, 170)
****************************** 0 ******************************


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
****************************** 8000 ******************************
****************************** 10000 ******************************
****************************** 12000 ******************************
****************************** 14000 ******************************
****************************** 16000 ******************************
****************************** 18000 ******************************
****************************** 20000 ******************************
****************************** 22000 ******************************
****************************** 24000 ******************************
****************************** 26000 ******************************
****************************** 28000 ******************************
****************************** 30000 ***************

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 3999 ******************************
num correct is  3577  an accuracy of  0.8944736184046012


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 5999 ******************************
num correct is  5392  an accuracy of  0.8988164694115686


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 7999 ******************************
num correct is  7228  an accuracy of  0.9036129516189524


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


****************************** 9999 ******************************
num correct is  9084  an accuracy of  0.9084908490849085


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)


num correct is  9085  an accuracy of  0.9085
2020-05-30 06:00:58.404735 end


<a id='s2weightcentroids'></a>
<a href='#toc'>Goto Table of Contents</a>

## append the image to the weighted centroids

- kNN acc (6,000 93.6%; 60,000 )
- CNN acc (6,000 94.39% (+/- 1.10%); 60,000 )

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

- kNN acc (6,000 91.6%; 60,000 )
- CNN acc (6,000 94.61% (+/- 1.72%); 60,000 )

## The results are not the best in any situation.

In [169]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 10:58:59.086326 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

acc: 95.40%
2020-06-02 11:03:20.839472 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-02 11:03:36.383244 End of fit
acc: 93.39%
94.39% (+/- 1.10%)
2020-06-02 11:03:37.021901 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 9 9 8 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 3 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 7 7 9 2 2 4 1 5 8 8 4 2 3 0 6 4 2 9 1 9 5 7 7 2 6 2 6 8 5 7 7 4 1 0 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 7 7 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 5 6 5 4 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 7 2 5 0 6 5 6 3 7 2 0 8 8 5

In [174]:
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 11:34:23.991176 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2onlyleaves2'></a>
<a href='#toc'>Goto Table of Contents</a>

## Try weighted centroids with parents

- kNN acc (6,000 93.4%; 60,000 )
- CNN acc (6,000 97.01% (+/- 0.80%); 60,000 )

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


In [175]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 11:42:05.852666 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2overlapparentweight'></a>
<a href='#toc'>Goto Table of Contents</a>


## ...and now overlap the weighted centroids with parents improved CNN and kNN for small set of images

- weighted overlap 1/16 
- CNN (6,000 96.69% (+/- 1.29%))
- kNN (6,000 94.2%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


In [176]:
Config.OVERLAP_QUADS_RATIO=0.0625
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.OVERLAP_QUADS_RATIO=1
Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 12:04:11.394332 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2overlaponlyleaves'></a>
<a href='#toc'>Goto Table of Contents</a>

## ...and now do without the parents; CNN did worse than baseline for small set of images

- weighted overlap 1/16 
- CNN (6,000 95.26% (+/- 1.11%))
- kNN (6,000 94.2%)

## whereas the baseline is

## kNN acc (6,000 91.6% 60,000 96.88%)
## CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


In [177]:
Config.OVERLAP_QUADS_RATIO=0.0625
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.OVERLAP_QUADS_RATIO=1
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-02 12:10:47.621524 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2reprokNNimprove'></a>
<a href='#toc'>Goto Table of Contents</a>

## Reproduce the improvement for kNN with options weighted centroid but no overlap, nor parents, 

- kNN acc (6,000 93.6%; 60,000 96.98%)

## whereas the baseline is

- kNN acc (6,000 91.6% 60,000 96.88%)

## The error rate reduction for kNN is (2/8.4)=23.8% for 6,000 and (0.1/3.12)=3.2% for 60,000

- kNN error rate reduction (6,000 ; 60,000)

## Although kNN improved, CNN improved only for the small sample size (6,000)

- CNN acc (6,000 94.80% (+/- 1.28%); 60,000 98.29% (+/- 0.22%))

## whereas the baseline is
- CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## In general, small sample size (6,000) always seems to benefit more than large (60,000)

In [188]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-05 09:09:19.155800 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

2020-06-05 12:09:14.706813 End of fit
acc: 98.53%
2020-06-05 12:09:22.629495 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 12:12:17.377481 End of fit
acc: 98.44%
2020-06-05 12:12:25.036148 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 12:15:22.229774 End of fit
acc: 98.53%
2020-06-05 12:15:29.574965 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 12:18:30.391221 End of fit
acc: 98.30%
98.35% (+/- 0.19%)
2020-06-05 12:18:36.307681 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 5 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 0 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 

2020-06-05 13:28:21.868709 End of fit
acc: 98.37%
2020-06-05 13:28:30.733680 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 13:31:48.619913 End of fit
acc: 98.41%
2020-06-05 13:31:56.619786 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 13:35:18.426565 End of fit
acc: 98.54%
2020-06-05 13:35:26.873571 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 13:38:50.000644 End of fit
acc: 98.66%
2020-06-05 13:38:58.354689 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 13:42:21.207627 End of fit
acc: 97.78%
2020-06-05 13:42:29.263458 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 13:45:45.307709 End of fit
acc: 98.65%
98.46% (+/- 0.26%)
2020-06-05 13:45:52.055119 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 14:56:19.703204 End of fit
acc: 97.72%
2020-06-05 14:56:28.268772 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 14:59:59.449074 End of fit
acc: 98.19%
2020-06-05 15:00:07.916907 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 15:03:41.954325 End of fit
acc: 98.36%
2020-06-05 15:03:50.198906 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 15:07:25.582635 End of fit
acc: 98.57%
2020-06-05 15:07:34.088005 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 15:11:09.093726 End of fit
acc: 98.31%
2020-06-05 15:11:17.333997 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 15:14:52.747739 End of fit
acc: 98.40%
2020-06-05 15:15:01.306249 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 15:18:32.377845 End of fit
acc: 98.49%
2020-06-05 15:18:41.280889 Start of fit
Epoch 1/5
Epoch 2/5


In [187]:
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-05 06:41:30.613845 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='section3'></a>
<a href='#toc'>Goto Table of Contents</a>

## Reproduce results before summary

## Repeating baseline give the exact same result for kNN but CNN fluctuates significantly

## The average of all baseline runs for 60,000 was 99.02%

- CNN acc (6,000                   ; 60,000 99.06% (+/- 0.13%))
- CNN acc (6,000                   ; 60,000 99.00% (+/- 0.20%))
- CNN acc (6,000                   ; 60,000 99.08% (+/- 0.12%))
- CNN acc (6,000                   ; 60,000 98.96% (+/- 0.16%))

## whereas the first baseline for the large set of images was

CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## For small set of images, repeated baselines were

- CNN acc (6,000 96.61% (+/- 1.20%); 60,000 )
- CNN acc (6,000 96.20% (+/- 0.85%); 60,000 )

## whereas the first baseline for the small set of images was

- CNN acc (6,000 96.06% (+/- 1.42%); 60,000 )

## The average of all CNN baseline runs for 6,000 was 96.29%

In [22]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 07:28:59.377255 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
X_num_images 70000 X_image_num_pixels 784
num_extra_rows 0
2020-06-06 07:29:00.072729 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 07:30:55.653793 End of fit
acc: 99.03%
2020-06-06 07:30:57.007375 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 07:33:00.936946 End of fit
acc: 99.23%
2020-06-06 07

In [23]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 08:58:49.004025 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
X_num_images 70000 X_image_num_pixels 784
num_extra_rows 0
2020-06-06 08:58:49.703194 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 09:00:49.906982 End of fit
acc: 99.14%
2020-06-06 09:00:51.421512 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 09:03:01.949640 End of fit
acc: 98.46%
2020-06-06 09

In [26]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 09:32:29.252506 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
X_num_images 70000 X_image_num_pixels 784
num_extra_rows 0
2020-06-06 09:32:29.957462 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 09:34:30.626460 End of fit
acc: 99.17%
2020-06-06 09:34:32.339282 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 09:36:34.635808 End of fit
acc: 99.07%
2020-06-06 09

In [29]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 10:05:41.664250 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
X_num_images 70000 X_image_num_pixels 784
num_extra_rows 0
2020-06-06 10:05:42.396844 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 10:07:45.902762 End of fit
acc: 99.04%
2020-06-06 10:07:47.788836 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 10:09:55.012911 End of fit
acc: 98.96%
2020-06-06 10

In [30]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 14:42:44.645986 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

In [31]:
Config.JUST_DO_BASELINE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.JUST_DO_BASELINE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-06 14:45:37.909202 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

## Repeat the largest improvement for kNN small set of images

## overlapping quadrants by 1/8; all with weighted centroids

## The initial run was
- kNN acc overlap 1/8 (6,000 94.1%; 60,000 )

## The baseline average for small set of images using kNN is
- kNN acc overlap 1/8 (6,000 91.6%; 60,000 )

## Error reduction is (2.5/8.4)=29.76%

In [32]:
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-06 15:05:14.730571 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

## Repeat the largest improvement for kNN large 60,000 set of images

## overlapping quadrants by 1/8; all with weighted centroids

- kNN acc weighted overlap 1/8 (6,000 ; 60,000 97.12%)

## The initial run was
- kNN acc weighted overlap 1/8 (6,000 ; 60,000 97.12%)

## and the baseline is
- kNN acc weighted overlap 1/8 (6,000 ; 60,000 96.88%)

## an error reduction of (.24/3.12)=7.7%

In [33]:
Config.OVERLAP_QUADS_RATIO=0.125
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.APPEND_IMAGE=False
Config.WEIGHTED_CENTROID=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
Config.OVERLAP_QUADS_RATIO=1

2020-06-06 15:13:33.155523 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 128)
<class 'numpy.ndarray'> (0, 912)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

Epoch 4/5
Epoch 5/5
2020-06-06 18:07:55.318230 End of fit
acc: 98.39%
2020-06-06 18:07:58.847702 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 18:10:35.596281 End of fit
acc: 98.16%
2020-06-06 18:10:39.121534 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 18:13:19.443405 End of fit
acc: 98.26%
98.33% (+/- 0.19%)
2020-06-06 18:13:22.398833 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 0 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 5 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 6 4 3 5 5 1 7 2 1 6 9 1 3 9 5 5 1 6 2 2 8 6 7 1 4 6 0 6 0 3 3 2 2 3 6 8 9 8 5 3 8 5 4 5 2 0 5 6 3 2 8 3 9 9 3 7 9 4 6 7 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 18:41:46.451464 End of fit
acc: 98.26%
2020-06-06 18:41:50.435140 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 18:44:36.485191 End of fit
acc: 98.11%
2020-06-06 18:44:40.624973 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 18:47:22.388252 End of fit
acc: 98.50%
98.15% (+/- 0.45%)
2020-06-06 18:47:25.703702 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 2 3 4 5 6 7 8 9 8 7 1 3 7 5 2 8 0 7 3 9 9 0 9 1 1 5 8 8 6 3 2 1 8 3 2 6 5 6 7 6 1 0 5 3 1 9
 2 1 9 6 0 4 6 1 7 3 8 9 2 9 6 5 8 3 5 7 1 6 1 0 9 6 2 5 4 2 3 4 4 6 0 0 2 0 1 2 3 4 3 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 8 4 5 6 

Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 19:55:51.525240 End of fit
acc: 98.47%
2020-06-06 19:55:56.070733 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 19:58:52.002410 End of fit
acc: 98.30%
2020-06-06 19:58:56.513568 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 20:01:55.357887 End of fit
acc: 98.18%
2020-06-06 20:02:00.026473 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 20:04:59.822288 End of fit
acc: 98.24%
2020-06-06 20:05:04.400425 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-06 20:08:03.231169 End of fit
acc: 98.27%
98.35% (+/- 0.20%)
2020-06-06 20:08:07.158140 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5

<a id='s2bestCNNsmall'></a>
<a href='#toc'>Goto Table of Contents</a>

## Repeat the largest improvement for CNN small set of images

## using weighted centroids with parents

- CNN acc (6,000 97.26% (+/- 0.77%); 60,000 )
- CNN acc (6,000 96.43% (+/- 1.40%); 60,000 )

## and the original was
- CNN acc (6,000 97.01% (+/- 0.80%); 60,000 )

## for an average of 96.90%

## whereas the average of all CNN baseline runs for 6,000 was 96.29%
- CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))


## So the error reduction is (.90-.29)/3.71=16.4%

In [189]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-05 16:01:02.915798 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 16:11:28.378141 End of fit
acc: 95.83%
97.26% (+/- 0.77%)
2020-06-05 16:11:33.916244 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 7 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 8 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 3 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 7 7 9 2 2 4 1 5 8 8 4 2 6 0 6 4 2 9 1 9 5 7 7 2 6 2 6 8 5 7 7 9 1 0 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 7 7 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 5 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 7 2 5 0 6 5 6 3 7 2 0 8 8 5 9 1 1 4 0 7 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 

In [38]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=6_000
Config.NUM_KERAS_TEST_IMAGES=1_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-07 11:56:12.439398 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 9 9 5 5 1 5 6 0 3 4 4 6 5 4 6 5 4 5 1 4 4 7 2 3 2 7 1 8 1 8 1 8 5 0 8 9 2 5 0 1 1 1 0 9 0 3 1 6
 4 2 3 6 1 1 1 3 9 5 2 9 4 5 9 3 9 0 3 6 5 5 7 2 2 7 1 2 8 4 1 7 3 3 8 8 7 9 2 2 4 1 5 9 8 7 2 3 0 4 4 2 4 1 9 5 7 7 2 8 2 6 8 5 7 7 9 1 8 1 8 0 3 0 1 9 9 4 1 8 2 1 2 9 7 5 9 2 6 4 1 5 8 2 9 2 0 4 0
 0 2 8 4 7 1 2 4 0 2 7 4 3 3 0 0 3 1 9 6 5 2 5 9 2 9 3 0 4 2 0 7 1 1 2 1 5 3 3 9 7 8 6 5 6 1 3 8 1 0 5 1 3 1 5 5 6 1 8 5 1 7 9 4 6 2 2 5 0 6 5 6 3 7 2 0 8 8 5 4 1 1 4 0 3 3 7 6 1 6 2 1 9 2 8 6 1 9 5
 2 5 4 4 2 8 3 8 2 4 5 0 3 1 7 7 5 7 9 7 1 9 2 1 4 2 9 2 0 4 9 1 4 8 1 8 4 5 9 8 8 3 7 6 0 0 3 0 2 6 6 4 9 3 3 3 2 3 9 1 2 6 8 0 5 6 6 6 3 8 8 2 7 5 8 9 6 1 8 4 1 2 

<a id='s2bestCNNlarge'></a>
<a href='#toc'>Goto Table of Contents</a>

## Repeat the largest improvement for CNN large 60,000 set of images

## using weighted centroids with parents

## The previous run increased CNN to (6,000 ; 60,000 99.13% (+/- 0.09%))

## whereas the baseline is
- CNN acc (6,000 96.06% (+/- 1.42%); 60,000 99.00% (+/- 0.21%))

## This run's result is
99.06% (+/- 0.09%)

## so an average of 

## and an error reduction of


In [36]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-07 04:40:37.819266 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 170)
<class 'numpy.ndarray'> (0, 954)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 07:57:36.000654 End of fit
acc: 99.00%
98.97% (+/- 0.09%)
2020-06-07 07:57:40.851410 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 0 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 6 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 6 4 3 5 5 1 7 2 1 6 9 1 3 9 5 5 1 6 2 2 8 6 7 1 4 6 0 6 0 3 3 2 2 3 6 8 9 8 5 3 8 5 4 5 2 0 5 6 3 2 8 3 9 9 3 7 9 4 6 7 1 3 7 3 6 6 0 9 0 1 9 9 2 8 8 0 1 6 9 7
 5 3 4 7 4]
ndx_errs (array([ 151,  241,  247,  318,  320,  321,  362,  376,  381,  444,  445,  448,  464,  479,  495,  519,  550,  578,  582,  628,  659,  691,  716,  726,  740,  760,  839,  881,  883,  924,  926,  938,
        9

2020-06-07 09:21:27.514743 End of fit
acc: 99.10%
2020-06-07 09:21:34.690033 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 09:25:00.768881 End of fit
acc: 99.22%
99.07% (+/- 0.15%)
2020-06-07 09:25:06.531671 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 3 5 6 7 8 9 0 1 2 3 5 6 7 8 9 9 7 0 9 0 1 5 8 8 0 9 3 2 7 8 4 6 1 0 4 9 4 2 0 5 0 1 6 9 3 2
 9 1 6 0 1 1 8 7 7 6 3 6 0 7 2 4 1 7 0 6 7 1 2 5 8 1 1 2 8 7 6 8 7 1 6 2 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 8 9 5 7 0 3 1 6 8 4 1 5 6 4 2 7 8 1 3 4 3 4 7 2 0 5 0 1 9 2 3
 2 3 5 5 7]
ndx_errs (array([ 151,  241,  247,  318,  320,  321,  362,  376,  381,  444,  445,  448,  464,  479,  495,  

Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 10:57:47.621879 End of fit
acc: 99.21%
2020-06-07 10:57:55.376820 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 11:01:49.840852 End of fit
acc: 99.17%
2020-06-07 11:01:57.973011 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 11:05:50.234182 End of fit
acc: 99.06%
2020-06-07 11:05:58.350666 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-07 11:09:47.640762 End of fit
acc: 99.04%
99.06% (+/- 0.09%)
2020-06-07 11:09:54.116699 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 5 7 0 4 7 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 6 7 0 6 8 6 3 9 9 8 8 7 7
 1 0 1 7 

In [190]:
Config.USE_PARENT_CENTROIDS=True
Config.WEIGHTED_CENTROID=True
Config.APPEND_IMAGE=True
Config.NUM_KERAS_TRAIN_IMAGES=60_000
Config.NUM_KERAS_TEST_IMAGES=10_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES
TREE_HEIGHT=3

run_tests()

Config.USE_PARENT_CENTROIDS=False
Config.WEIGHTED_CENTROID=False
Config.APPEND_IMAGE=False
Config.NUM_KERAS_TRAIN_IMAGES=12_000
Config.NUM_KERAS_TEST_IMAGES=2_000
Config.NUM_KERAS_TEST_LABELS=Config.NUM_KERAS_TEST_IMAGES
Config.NUM_KERAS_TRAIN_LABELS=Config.NUM_KERAS_TRAIN_IMAGES

2020-06-05 16:11:43.715738 start
ytest [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 7 7 5 8 2 9 8 6 7 3 4 6 8 7 0 4 2 7 7 5 4 3 4 2 8 1 5 1 0 2 3 3 5 7 0 6 8 6 3 9 9 8 2 7 7
 1 0 1 7 8 9 0 1 2 3 4 5 6 7 8 0 1 2 3 4 7 8 9 7 8 6 4 1 9 3 8 4 4 7 0 1 9 2 8 7 8 2 6 0 6 5 3 3 3 9 1 4 0 6 1 0 0 6 2 1 1 7 7 8 4 6 0 7 0 3 6 8 7 1 5 2 4 9 4 3 6 4 1 7 2 6 5 0 1 2 3 4 5 6 7 8 9 0 1
 2 3 4 5 6]
<class 'numpy.ndarray'> (0, 170)
<class 'numpy.ndarray'> (0, 954)
****************************** 0 ******************************
****************************** 2000 ******************************
****************************** 4000 ******************************
****************************** 6000 ******************************
*********************

2020-06-05 19:39:22.009957 End of fit
acc: 98.92%
2020-06-05 19:39:32.609796 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 19:43:14.282697 End of fit
acc: 99.00%
98.98% (+/- 0.21%)
2020-06-05 19:43:23.177676 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 0 9 1 1 0 7 5 9 9 1 9 5 9 2 5 0 4 1 0 8 9 0 8 9 8 9 4 2 5 7 9 8 9 8 0 9 9 6 8 9 9 5 9 8 6 1
 0 3 3 5 2 1 6 3 0 2 8 1 5 6 2 3 0 2 2 6 4 3 5 5 1 7 2 1 6 9 1 3 9 5 5 1 6 2 2 8 6 7 1 4 6 0 6 0 3 3 2 2 3 6 8 9 8 5 3 8 5 4 5 2 0 5 6 3 2 8 3 9 9 3 7 9 4 6 7 1 3 7 3 6 6 0 9 0 1 9 9 2 8 8 0 1 6 9 7
 5 3 4 7 4]
ndx_errs (array([ 151,  241,  247,  318,  320,  321,  362,  376,  381,  444,  445,  448,  464,  479,  495,  

Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 21:09:08.691100 End of fit
acc: 99.06%
2020-06-05 21:09:20.016693 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 21:13:19.720766 End of fit
acc: 99.06%
2020-06-05 21:13:34.403289 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 21:17:34.913136 End of fit
acc: 99.19%
2020-06-05 21:17:46.185055 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 21:21:46.656612 End of fit
acc: 99.35%
99.13% (+/- 0.11%)
2020-06-05 21:21:56.530118 end of verbatim_from_book_CNN
predicted_labels [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6
 9 6 0 5 4 9 9 2 1 9 4 8 7 3 9 7 4 4 4 9 2 5 4 7 6 7 9 0 5 8 5 6 6 5 7 8 1 0 1 6 4 6 7 3 1 7 1 8 2 0 2 ... 3 5 6 7 8 9 0 1 2 3 5 6 7 8 9 9 7 0 9 0 1 5 8 8 0 9 3 2 7 8 4 6 1 0 4 9 4 2 0 5 0 1 6 9 3 2
 9 1 6 0 

Epoch 4/5
Epoch 5/5
2020-06-05 23:02:29.534832 End of fit
acc: 99.09%
2020-06-05 23:02:49.825120 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:07:16.938720 End of fit
acc: 98.99%
2020-06-05 23:07:30.891273 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:11:49.928078 End of fit
acc: 98.84%
2020-06-05 23:12:05.448755 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:16:30.362279 End of fit
acc: 99.14%
2020-06-05 23:16:44.858218 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:21:06.918349 End of fit
acc: 99.19%
2020-06-05 23:21:21.531036 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:25:48.609937 End of fit
acc: 98.94%
2020-06-05 23:26:05.236336 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
2020-06-05 23:30:36.897618 End of fit
acc: 99.03%
2020-06-05 23:30:51.890218 Start of fit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<a id='section4'></a>
<a href='#toc'>Goto Table of Contents</a>

## summary


### See the whitepaper in this repository for more description of the ideas.

### This notebook is experiments with a tree of centroids with an option to append them to the image.

### Experiments use one of two types of models for prediction: kNN and CNN. 

### kNN treats the 28x28 image as an array of 784 pixels and uses L2 distance metric. 

### CNN is lifted verbatim from the book "Deep Learning with Python" by Francois Chollet, 2018 (page 120-122) the inventor of Keras. The CNN is not SOTA but is useful to see whether adding information about centroids improve a neural net model.

### The training/testing set of images is either small (6,000/1,000) or large (60,000/10,000).

### The bottom line result is the error rate reduction.

## Results from experiments

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| baseline kNN | <font size="5">91.6%</font> | <font size="5">96.88%</font> |
| improved kNN | <font size="5">94.1%</font> | <font size="5">97.12%</font> |

### and

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| baseline CNN | <font size="5">96.29%</font> | <font size="5">99.02%</font> |
| improved CNN | <font size="5">96.90%</font> | <font size="5">99.10%</font> |

### From these results, the _*error rate reduction*_ is

| model\#images | <font size="5">6,000</font> | <font size="5">60,000</font> |
| --- | --- | --- |
| kNN | <font size="5">29.76%</font> | <font size="5">7.7%</font> | 
| CNN | <font size="5">16.4%</font> | <font size="5">9%</font> | 

### In the process of getting the above results the following observations were noteworthy and guided which combination of options to try for improvement.

### <a href='#s2weightcentroids'>No improvement for weighted centroids (no overlap) with image</a>

### <a href='#s2treesizesweighted'>A height of 3 for the tree of centroids seems best</a>

### <a href='#s2onlyleaves'>centroid info without image is not an improvement</a>
### <a href='#s2onlyleaves'>we need the image for best results. The centroids by themselves is not enough.

### <a href='#s2norelativebdy'>always get points relative to the bounding rectangle. </a>


### <a href='#s2reprokNNimprove'>kNN improvement for kNN with options weighted centroid but no overlap, nor parents,</a>

<a id='section5'></a>
<a href='#toc'>Goto Table of Contents</a>

## areas for further work

Trying overlapping concentric doughnuts before doing other areas described in the whitepaper.