Principal Component Analysis (PCA) is a dimensionality-reduction method: it transforms a large set of variables into a smaller one that still retains most of the information in the original data set. To see its effect on a real model, we will run a small experiment:
- Split the data set (a minimal sketch of this step appears after this list).
- Train the model without PCA.
- Train the model with PCA.
- Evaluate the results of the two models.
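Here is a minimal sketch of the split step, assuming scikit-learn. The article does not specify which data set it uses, so scikit-learn's built-in wine data stands in as a placeholder, and the 80/20 split and random seed are assumptions:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Placeholder dataset: the article's data set is not specified, so the
# built-in wine data stands in. An 80/20 train/test split is assumed.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```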
We will first train a Decision Tree model and measure its accuracy without PCA.
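A hedged sketch of this baseline, continuing from the split above; the classifier settings (default parameters, fixed random_state) are assumptions, not taken from the article:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Baseline: fit a Decision Tree on the raw features and score it on the test set.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Accuracy without PCA: {baseline_accuracy:.2f}")
```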
The accuracy without PCA is 0.9, or 90%.
Next, we apply PCA and calculate the explained variance ratio of each principal component. The results are as follows.
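A hedged sketch of how these figures can be obtained with scikit-learn, continuing from the split above; the exact values depend on the data set used:

```python
from sklearn.decomposition import PCA

# Fit PCA on the training features and print each principal component's
# explained variance ratio (the figures interpreted below).
pca = PCA()
pca.fit(X_train)
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i} explained variance ratio: {ratio:.3f}")
```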
The first principal component has an explained variance ratio of 0.931, meaning it captures far more information than the others and is much more significant. Based on these variances, we can keep the best 2 principal components, since their combined explained variance is 0.977, which is quite high. The results of the accuracy test after using PCA are as follows.
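A hedged sketch of the PCA-based model, continuing from the code above. The 80% figure reported next is from the article; the accuracy you get will depend on your data:

```python
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Keep only the first 2 principal components, project both splits onto them,
# and retrain the same Decision Tree on the reduced features.
pca_2 = PCA(n_components=2).fit(X_train)
X_train_pca = pca_2.transform(X_train)
X_test_pca = pca_2.transform(X_test)

tree_pca = DecisionTreeClassifier(random_state=42)
tree_pca.fit(X_train_pca, y_train)
pca_accuracy = accuracy_score(y_test, tree_pca.predict(X_test_pca))
print(f"Accuracy with 2 principal components: {pca_accuracy:.2f}")
```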
In the experiment above, we can see that with only 2 principal components instead of the full set of attributes, the model still achieves a fairly high accuracy of 80%. With principal components, you can drop less significant attributes from the prediction and speed up the training time of the machine learning model.