### Decision Tree testing results
- The task was framed as detecting a score of 5 versus all other scores, so that the two classes form a balanced dataset.
- When tested without text tone features:
    - The accuracy of the model is 73%
    - The tree is hard to interpret
- When tested with text tone features as percentages:
    - The accuracy of the model is 73%
    - The tree rules are easy to interpret
- When tested with text tone features as counts:
    - The accuracy of the model is 82%
    - The tree rules are easy to interpret, though less so than when the features were percentages
- This model could help distinguish top-performing members from other members.
- The tree shows a clear cut point for total workflow time.
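
The five-vs-all framing above can be sketched as follows. This is a minimal illustration, not the notebook's actual preprocessing: the `scores` list is made-up sample data and the helper names `five_vs_all` and `class_balance` are hypothetical. It shows why balance matters when reading the accuracies: with two classes of equal size, always guessing one class scores only 50%.

```python
from collections import Counter


def five_vs_all(scores):
    """Map each score to 1 if it equals 5, otherwise 0 (hypothetical helper)."""
    return [1 if s == 5 else 0 for s in scores]


def class_balance(labels):
    """Return the fraction of samples that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Made-up sample scores, used only for illustration.
scores = [5, 4, 5, 3, 5, 2, 5, 1, 5, 4]
labels = five_vs_all(scores)
balance = class_balance(labels)

# Accuracy of a classifier that always predicts the most common class.
majority_baseline = max(balance.values())
print(balance, majority_baseline)
```

When the baseline is close to 0.5, the reported 73% and 82% accuracies can be compared directly against chance.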

## Install required packages

This script was based on the article below:
https://medium.com/@knoldus/how-to-find-correlation-value-of-categorical-variables-23de7e7a9e26

TODO: add a proper reference for the quote below.

"It calculates the correlation/strength-of-association of features in the data-set with both categorical and continuous features using: Pearson’s R for continuous-continuous cases, Correlation Ratio for categorical-continuous cases, Cramer’s V or Theil’s U for categorical-categorical cases."
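
As one concrete instance of the measures named in the quote, here is a minimal sketch of Cramér's V for two categorical variables, written with the standard library only. It uses the plain (uncorrected) formula V = sqrt(chi² / (n · min(r − 1, c − 1))); the library behind the quote may apply a bias correction, so its values could differ. The function name `cramers_v` and the sample data are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")

    joint = Counter(zip(xs, ys))  # observed count for each (x, y) pair
    row = Counter(xs)             # marginal counts of x
    col = Counter(ys)             # marginal counts of y

    k = min(len(row), len(col)) - 1
    if k == 0:
        # One variable is constant, so there is no association to measure.
        return 0.0

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * k))


# Made-up data: y is fully determined by x in the first case, unrelated in the second.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

A value near 1 indicates a strong association and a value near 0 indicates none. Unlike Theil's U, Cramér's V is symmetric in its two arguments.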

In [None]:
# !pip install psycopg2-binary
!pip --version

!pip install -r requirements.txt

## Do general imports

In [None]:
from classifiers.testing import cycle_test, TestType, TestInputs, DatasetFeatures
import pandas as pd


### Decision Tree Classifier

In [None]:
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

trees = []

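def fit_and_test(inputs: TestInputs):
    """Fit a pruned decision tree, record it in `trees`, and return test predictions."""
    # ccp_alpha > 0 enables cost-complexity pruning, which keeps the tree small enough to read.
    clf = DecisionTreeClassifier(random_state=42, criterion="entropy", ccp_alpha=0.01)
    clf.fit(inputs.x_train, inputs.y_train)
    # Optional: list the features with non-zero importance.
    # most_features_frame = pd.DataFrame(
    #     data=clf.feature_importances_,
    #     columns=["importance"],
    #     index=inputs.x_train.columns,
    # ).sort_values(by=["importance"], ascending=False)
    # most_features_frame = most_features_frame[most_features_frame['importance'] > 0]
    # print(most_features_frame)

    # A plain list of names is accepted by plot_tree across scikit-learn versions.
    features = list(inputs.x_train.columns)
    # clf.classes_ is sorted in the order plot_tree expects and covers every class seen in training.
    class_labels = [str(c) for c in clf.classes_]
    trees.append((clf, features, class_labels))
    return clf.predict(inputs.x_test)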

def plot_tree(the_tree, size, fontsize=12):
    """Plot one (classifier, feature names, class labels) tuple taken from `trees`."""
    clf, features, class_labels = the_tree
    fig = plt.figure(figsize=size)
    ax = fig.add_subplot(111)
    tree.plot_tree(clf, feature_names=features, class_names=class_labels, ax=ax, fontsize=fontsize, filled=True)
    # print(tree.export_text(clf, feature_names=list(features), show_weights=True))

In [None]:
# Five-vs-all, without text tone features
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.FIVE_VS_ALL, dataset_types=DatasetFeatures.WITHOUT_TEXT_TONE, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=12)

In [None]:
# Five-vs-all, with text tone features as counts
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.FIVE_VS_ALL, dataset_types=DatasetFeatures.WITH_TEXT_TONE_AS_COUNTS, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=14)

In [None]:
# Five-vs-all, with text tone features as percentages
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.FIVE_VS_ALL, dataset_types=DatasetFeatures.WITH_TEXT_TONE_AS_PERCENTAGES, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=13)

In [None]:
# Three levels, without text tone features
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.THREE_LEVELS, dataset_types=DatasetFeatures.WITHOUT_TEXT_TONE, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=11)

In [None]:
# Three levels, with text tone features as counts
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.THREE_LEVELS, dataset_types=DatasetFeatures.WITH_TEXT_TONE_AS_COUNTS, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=12)

In [None]:
# Three levels, with text tone features as percentages
trees = []
cycle_test('Decision Tree Classifier', fit_and_test, test_type=TestType.THREE_LEVELS, dataset_types=DatasetFeatures.WITH_TEXT_TONE_AS_PERCENTAGES, add_dummies=True, drop_categories=True)
plot_tree(trees[0], size=(40, 25), fontsize=12)