# Search Space Summary
The purpose of this notebook is to summarize the search space. We report the size of the full search space as well as the number of unique individuals for each classifier. This shows what portion of the search space is meaningful. Many algorithm vectors are distinct but not meaningfully different, because a classifier ignores the parameters it does not use. For example, K Nearest Neighbors uses only one parameter, which has a range of 30 values, yet far more than 30 algorithm vectors share the name "K Nearest Neighbors".
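The K Nearest Neighbors example can be made concrete with a minimal sketch on a hypothetical toy space. For illustration only, it assumes that each algorithm vector stores the classifier name plus a value for every parameter, and that KNN reads only `n_neighbors`. The parameter names and range sizes below are invented and are not the see library's actual `ClassifierParams.ranges`.

```python
from itertools import product

# Hypothetical toy ranges (NOT the see library's real ranges)
ranges = {
    "algorithm": ["K Nearest Neighbors", "SVC"],
    "n_neighbors": list(range(1, 31)),  # 30 values, the only parameter KNN uses
    "C": [0.1, 1.0, 10.0],              # used by SVC only
    "max_depth": list(range(1, 6)),     # used by neither classifier in this toy space
}
keys = list(ranges)

# Enumerate every full vector and keep the ones whose name is KNN
knn_vectors = [
    dict(zip(keys, values))
    for values in product(*ranges.values())
    if values[0] == "K Nearest Neighbors"
]

# For KNN, two vectors are meaningfully different only if n_neighbors differs
knn_configs = {v["n_neighbors"] for v in knn_vectors}

print(len(knn_vectors))  # 30 * 3 * 5 = 450 distinct vectors
print(len(knn_configs))  # 30 meaningfully different KNN models
```

Every value of an unused parameter multiplies the vector count without changing the model. This is why the full search space is so much larger than the set of unique individuals.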

In [1]:
# Path hack so that we can import the see library from the parent directory.
import sys, os
sys.path.insert(0, os.path.abspath('..'))

In [2]:
from see.classifiers import Classifier, ClassifierParams, AdaBoostContainer

Use the search space from Dhahri 2019 (i.e., the Wisconsin Breast Cancer Dataset).

In [3]:
Classifier.use_dhahri_space()
algorithm_space = Classifier.algorithmspace
print(list(algorithm_space.keys())) # Check algorithm space

['Ada Boost', 'Decision Tree', 'Extra Trees', 'Gaussian Naive Bayes', 'Gradient Boosting', 'Linear Discriminant Analysis', 'Logistic Regression', 'K Nearest Neighbors', 'Random Forest', 'SVC']


In [9]:
import pandas as pd

# Report search space size
search_space_size = 1

# Loop through the parameter space
for key in ClassifierParams.pkeys:
    search_space_size *= len(ClassifierParams.ranges[key])

# Includes individuals where changes do not matter
# Use string to force pandas to show value in scientific notation
df_space = pd.DataFrame(
    data=["{:e}".format(search_space_size)], columns=["Size"], index=["Search Space"]
)
df_space.style.set_caption("Includes redundant individuals")

|              | Size         |
|--------------|--------------|
| Search Space | 1.248670e+13 |

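The cell above computes the search space size as the product of the range lengths of every parameter, $|S| = \prod_k |R_k|$. The sketch below shows this on a hypothetical `ranges` dictionary standing in for `ClassifierParams.ranges`. The names and sizes are invented for illustration. It checks the notebook's loop against the equivalent `math.prod` one-liner, which requires Python 3.8+.

```python
import math

# Hypothetical parameter ranges standing in for ClassifierParams.ranges
ranges = {
    "algorithm": list(range(10)),
    "max_depth": list(range(1, 31)),
    "n_estimators": list(range(10, 210, 10)),  # 10, 20, ..., 200 -> 20 values
    "learning_rate": [0.01, 0.05, 0.1, 0.5, 1.0],
}

# Loop form, as in the notebook cell
size_loop = 1
for key in ranges:
    size_loop *= len(ranges[key])

# Equivalent one-liner
size_prod = math.prod(len(r) for r in ranges.values())

assert size_loop == size_prod
print("{:e}".format(size_prod))  # 10 * 30 * 20 * 5 = 30000 -> 3.000000e+04
```

`"{:e}"` keeps six digits after the decimal point by default. The displayed search space size is therefore rounded to seven significant digits.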

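Calculate the number of unique individuals. For each classifier, only the parameters it actually uses (its `paramindexes`) contribute to the count.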

In [5]:
# Count, for each classifier, only the individuals that differ in parameters the classifier actually uses
unique_space = 0

unique_clf = {}

for clf_name in algorithm_space:
    unique_clf_count = 1
    container = algorithm_space[clf_name]()
    for key in container.paramindexes:
        unique_clf_count *= len(ClassifierParams.ranges[key])

    unique_clf[clf_name] = unique_clf_count

    unique_space += unique_clf_count
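
The loop above computes $U = \sum_c \prod_{k \in P_c} |R_k|$, where $P_c$ is the set of parameters classifier $c$ uses. The sketch below checks this closed form against brute-force deduplication on a hypothetical toy space. The ranges and the `used_params` map are invented stand-ins for `ClassifierParams.ranges` and each container's `paramindexes`. A classifier with no tunable parameters contributes exactly one individual, because the product over an empty set is 1.

```python
import math
from itertools import product

# Hypothetical toy ranges (illustrative only, not see's real ranges)
ranges = {
    "n_neighbors": list(range(1, 31)),
    "C": [0.1, 1.0, 10.0],
    "max_depth": list(range(1, 6)),
}
# Hypothetical map from classifier name to the parameters it reads,
# playing the role of each container's paramindexes
used_params = {
    "K Nearest Neighbors": ["n_neighbors"],
    "SVC": ["C"],
    "Decision Tree": ["max_depth"],
    "Linear Discriminant Analysis": [],  # math.prod of nothing is 1
}

# Closed form: sum over classifiers of the product of their used ranges
closed_form = sum(
    math.prod(len(ranges[k]) for k in used) for used in used_params.values()
)

# Brute force: enumerate every (classifier, full vector) pair, project the
# vector onto the parameters that classifier uses, and deduplicate
all_keys = list(ranges)
unique = set()
for name, used in used_params.items():
    for values in product(*ranges.values()):
        vector = dict(zip(all_keys, values))
        unique.add((name, tuple(vector[k] for k in used)))

print(closed_form, len(unique))  # 30 + 3 + 5 + 1 = 39 for both
```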

In [6]:
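df_unique = pd.DataFrame(
    data=list(unique_clf.values()), index=list(unique_clf.keys()), columns=["Unique individuals"]
)
# DataFrame.append was removed in pandas 2.0; use pd.concat to add a "Total" row
total_row = df_unique.sum().rename("Total").to_frame().T
pd.concat([df_unique, total_row])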

|                              | Unique individuals |
|------------------------------|--------------------|
| Ada Boost                    | 140                |
| Decision Tree                | 31                 |
| Extra Trees                  | 620                |
| Gaussian Naive Bayes         | 37                 |
| Gradient Boosting            | 140                |
| Linear Discriminant Analysis | 1                  |
| Logistic Regression          | 60                 |
| K Nearest Neighbors          | 30                 |
| Random Forest                | 620                |
| SVC                          | 360                |
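To put these counts in proportion, the sketch below sums the per-classifier counts from the table and divides by the displayed search space size. It assumes the rounded value `1.248670e+13` printed earlier, so the fraction is approximate.

```python
# Unique-individual counts copied from the table above
unique_counts = {
    "Ada Boost": 140,
    "Decision Tree": 31,
    "Extra Trees": 620,
    "Gaussian Naive Bayes": 37,
    "Gradient Boosting": 140,
    "Linear Discriminant Analysis": 1,
    "Logistic Regression": 60,
    "K Nearest Neighbors": 30,
    "Random Forest": 620,
    "SVC": 360,
}
# Search space size as displayed earlier ("{:e}" rounded it to 7 significant digits)
search_space_size = 1.248670e13

total_unique = sum(unique_counts.values())
fraction = total_unique / search_space_size

print(total_unique)               # 2039
print("{:.3e}".format(fraction))  # about 1.633e-10
```

Under this assumption, only about two thousand of the roughly 1.25e13 algorithm vectors describe meaningfully different classifiers.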
