<a href="https://colab.research.google.com/github/lordbigot/UTS_ML2019_ID13191655/blob/master/A2B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Vector-based normal-finding decision tree**

**Algorithm Brief**

*Concept*

For the creation of a new algorithm, I considered the fundamentals of decision tree classifiers.

Decision tree classifiers consist of a series of internal nodes, which split the data into smaller and smaller sets. Typically, each decision node compares one attribute against a target value. Everything below the value falls into a first partition; everything equal to or above the value falls into a second partition.
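As a minimal sketch of the single-attribute split described above (the function name `threshold_split`, its parameters, and the toy rows are illustrative assumptions, not part of the implementation):

```python
def threshold_split(rows, attribute_index, target_value):
    """Partition rows on one attribute: below the target vs. equal to or above it."""
    below = [row for row in rows if row[attribute_index] < target_value]
    at_or_above = [row for row in rows if row[attribute_index] >= target_value]
    return below, at_or_above


# Hypothetical toy data: two attributes followed by a class label.
rows = [
    [0.2, 0.9, "red"],
    [0.5, 0.1, "yellow"],
    [0.7, 0.4, "yellow"],
]
below, at_or_above = threshold_split(rows, attribute_index=0, target_value=0.5)
```

Here the row whose first attribute is 0.2 lands in the first partition, and the rows with 0.5 and 0.7 land in the second.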

However, it is possible to use more than one attribute in a single step. Consider a linear node, weighing all of the attributes and comparing the result to a static value. Splitting the data set this way reduces the bias that comes from dependence on orthogonal lines to represent the data space. It also significantly increases the complexity of the algorithm, as well as the variance.
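A linear node of this kind might look like the following sketch. The names `linear_split`, `weights`, and `static_value` are assumptions made for illustration; the only idea taken from the text is that a weighted sum of all attributes is compared to a fixed value.

```python
def linear_split(rows, weights, static_value):
    """Partition rows by comparing a weighted sum of attributes to a static value."""
    below, at_or_above = [], []
    for row in rows:
        # zip stops at len(weights), so a trailing class label is ignored.
        weighted_sum = sum(w * x for w, x in zip(weights, row))
        if weighted_sum < static_value:
            below.append(row)
        else:
            at_or_above.append(row)
    return below, at_or_above


# Hypothetical toy data: two attributes followed by a class label.
rows = [
    [0.2, 0.9, "red"],
    [0.5, 0.1, "yellow"],
    [0.7, 0.4, "yellow"],
]
# The boundary here is the diagonal line x0 + x1 = 1.0, which no single
# axis-aligned threshold can express.
below, at_or_above = linear_split(rows, weights=[1.0, 1.0], static_value=1.0)
```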

Despite these costs, linear splitting opens unique opportunities. Consider the following example:

![alt text](https://i.postimg.cc/GtZy5Hh1/justification.png)

In the above diagram, the depicted space is a form of underlying truth. The red space definitively belongs to one class, the yellow space belongs to a different class, and the orange space is disputed. An ideal solution would put lines cleanly through the middle of the orange space.

![alt text](https://i.postimg.cc/c1jwH575/justification-1.png)

If an orthogonal decision tree algorithm were used, the added line would be a plausible first step. Note that the additional red on the left side of the diagram is likely to pull the line further towards the bottom of the space than an ideal solution would place it.

![alt text](https://i.postimg.cc/DyXsbdBf/justification-2.png)

This second line completes a good solution. Any additional lines beyond this point would be overfitting the data.

![alt text](https://i.postimg.cc/q7jCdmMw/justification-3.png)

This diagram shows two possible first steps produced by a tree using linear nodes. Used directly, these lines appear to produce worse results than the orthogonal solution. However, the use of lines enables additional calculation. These two lines can be considered an approximation of a curve, and their intersection a point of interest.

![alt text](https://i.postimg.cc/GpcsV0r9/justification-4.png)

The line that has been added represents the tangent to the proposed approximation of a curve.

![alt text](https://i.postimg.cc/FHXSbdBy/justification-5.png)

The line that has been added represents the normal to the proposed approximation of a curve.

![alt text](https://i.postimg.cc/vZ7V0YJp/justification-6.png)

If the normal is used to partition the data, instead of either earlier linear node, this will split the data into regions that can be judged more fairly.

![alt text](https://i.postimg.cc/Tw152rK6/justification-7.png)

This possible result indicates how the use of the normal may limit the influence of unrelated curves upon the final tree, and may limit variance. Note that the depicted scenario has been deliberately chosen to favour orthogonal lines, and yet the linear normal method displays a significant advantage.

*Inputs*

This algorithm depends on specific gradients, so for best results every data point should be linearly normalised into an array of length n+1: the attribute values (indices 0 to n-1) scaled to between 0.0 and 1.0, and whichever attribute best represents the class at index n.

If, after the training set has been used to find values, the testing set includes results that fall outside the boundaries, the classifier should be able to handle these inputs normally.
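One way to prepare inputs in this form is min-max scaling fitted on the training set only. The sketch below is an assumption about how that could be done; `fit_min_max` and `normalise` are hypothetical helper names, and mapping a constant attribute to 0.0 is an arbitrary choice.

```python
def fit_min_max(training_rows):
    """Return a (minimum, maximum) pair per attribute; the last column is the class."""
    n = len(training_rows[0]) - 1
    columns = [[row[i] for row in training_rows] for i in range(n)]
    return [(min(column), max(column)) for column in columns]


def normalise(row, bounds):
    """Scale the attributes of one row using training bounds; keep the class at index n."""
    scaled = []
    for value, (low, high) in zip(row, bounds):
        span = high - low
        # Assumption: a constant attribute is mapped to 0.0 to avoid dividing by zero.
        scaled.append((value - low) / span if span else 0.0)
    return scaled + [row[-1]]


# Hypothetical training data: two attributes followed by a class label.
training = [[10.0, 200.0, "a"], [20.0, 400.0, "b"], [15.0, 300.0, "a"]]
bounds = fit_min_max(training)
inside = normalise([15.0, 300.0, "a"], bounds)
# A test point beyond the training range is not clipped, so it scales outside 0.0 to 1.0.
outside = normalise([25.0, 100.0, "b"], bounds)
```

Leaving out-of-range test values unclipped keeps the scaling linear, so a linear node can still evaluate them in the usual way.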

*Outputs*

After training data is input, the output should be a decision tree. An efficient way to store this in Python is a list of length n+2. The first n values are multipliers, one for each attribute of a new data point. If the sum of the attributes, each multiplied by its multiplier, is less than 1.0, the point is sent to the nested tree at index n. If the sum is equal to or greater than 1.0, it is instead sent to the nested tree at index n+1. This procedure continues until, in the place where a nested tree would normally be, there is an object representing the decided class.

The output represents the decided class. It is a single value of whatever type the class attributes provided in the training data were. This should not be a list.
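The traversal described above can be sketched as follows. The function name `classify` and the example tree are illustrative assumptions; the list layout (n multipliers followed by two subtrees) and the 1.0 threshold come from the description.

```python
def classify(tree, point):
    """Walk a nested-list tree until a non-list object (the decided class) is reached."""
    node = tree
    # A list is an internal node; anything else is a decided class.
    while isinstance(node, list):
        n = len(node) - 2
        weighted_sum = sum(node[i] * point[i] for i in range(n))
        node = node[n] if weighted_sum < 1.0 else node[n + 1]
    return node


# Hypothetical tree for n = 2 attributes.
# Root: 2.0 * x0 < 1.0 gives "red"; otherwise descend to the nested node.
# Nested node: 2.0 * x1 < 1.0 gives "red"; otherwise "yellow".
tree = [2.0, 0.0, "red", [0.0, 2.0, "red", "yellow"]]

first = classify(tree, [0.2, 0.9])
second = classify(tree, [0.8, 0.3])
third = classify(tree, [0.8, 0.7])
```

Because this sketch tells internal nodes apart from classes by checking for a list, it also shows why a class value should not itself be a list.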

*Intermediate Data Structure*

The classifier is best resolved as a recursive function, which must accept its current set as a parameter.

The classifier needs to store vectors, and can do so as arrays of length n, with one entry per attribute.

The classifier must be capable of identifying the proportions of all labels present in the set it has been passed. This information can be stored as 2 parallel 1-dimensional arrays.
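A minimal sketch of the two parallel arrays, assuming the class sits in the last position of each row (`label_proportions` is a hypothetical name):

```python
def label_proportions(rows):
    """Return parallel lists: the distinct labels and the proportion of rows with each."""
    labels, counts = [], []
    for row in rows:
        label = row[-1]
        if label in labels:
            counts[labels.index(label)] += 1
        else:
            labels.append(label)
            counts.append(1)
    total = len(rows)
    proportions = [count / total for count in counts]
    return labels, proportions


# Hypothetical set: one attribute followed by a class label.
rows = [[0.1, "red"], [0.4, "yellow"], [0.6, "red"], [0.9, "red"]]
labels, proportions = label_proportions(rows)
```

The entry at index i of `proportions` belongs to the label at index i of `labels`, in order of first appearance.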

The classifier must compare gradients. The metric I am using to compare them is derived from entropy, but I have labelled it "certainty" to match its new meaning. A certainty of 1.0 is the best-case scenario, and anything less indicates that the attached gradient cannot divide the data completely in half.

The calculation of the "normal" in n-dimensions requires an abstract representation of the circle that the two gradients lie between. My solution uses the following formulae as its basis:

$$x = a\sin(\alpha)$$

$$y = a\sin(\alpha + \theta)$$

where x represents an attribute of the first vector; y represents the same attribute of the second vector; a represents the amplitude of the sine wave (the distance between the maximum attribute and 0); α represents the angle on the sine wave at which the first vector is situated; and θ represents the angle between the first and second vectors. Note that a is multiplied by the sine here; the formulae do not use the arcsine. It is possible that α will be 180° out of phase and a will be negative, but using Python 3's `math.atan2()` function effectively negates any mistake due to this.
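As a hedged illustration of how these two formulae can be solved for a and α, expanding the second one gives y = x·cos(θ) + a·cos(α)·sin(θ), so a·cos(α) = (y − x·cos(θ)) / sin(θ). Passing a·sin(α) and a·cos(α) to `math.atan2()` then yields α directly. The function name `amplitude_and_phase` is an assumption, angles are in radians, and sin(θ) must be non-zero; this is one way to do the calculation, not necessarily the one used in the implementation.

```python
import math


def amplitude_and_phase(x, y, theta):
    """Solve x = a*sin(alpha), y = a*sin(alpha + theta) for (a, alpha).

    theta is in radians and sin(theta) must be non-zero.
    """
    # From y = x*cos(theta) + a*cos(alpha)*sin(theta):
    a_cos_alpha = (y - x * math.cos(theta)) / math.sin(theta)
    # x is a*sin(alpha), so atan2 recovers alpha in the correct quadrant.
    alpha = math.atan2(x, a_cos_alpha)
    # hypot returns a non-negative amplitude.
    a = math.hypot(x, a_cos_alpha)
    return a, alpha


# Hypothetical values: a = 2.0, alpha = 0.3, theta = 0.8 (radians).
theta = 0.8
x = 2.0 * math.sin(0.3)
y = 2.0 * math.sin(0.3 + theta)
a, alpha = amplitude_and_phase(x, y, theta)
```

Because `math.atan2()` uses the signs of both arguments, a case that could be described with a negative amplitude is returned as a non-negative amplitude with α shifted by half a turn, which reproduces the same x and y.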

**Implementation**