Results:
Based on the evaluation results, the CART decision tree is the better model for several reasons: its accuracy, area under the curve (AUC), F1 score and sensitivity are all higher. According to these scores, CART is better at predicting the “defaulted” class, with a lower probability of misclassification. The F1 score in particular, which is the harmonic mean of precision and recall, is considerably higher than for the C5.0 algorithm (0.8242812 as opposed to 0.7957746). To improve the models, I manually forced the model to split according to the Gini index, but this made no difference to the evaluation measures. However, pruning the tree to avoid over-fitting, guided by the cross-validated error, achieved the best results.
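As a rough illustration of how the measures compared above relate to one another, the sketch below derives accuracy, sensitivity, precision and F1 from the four cells of a binary confusion matrix. It is written in Python rather than the R used in the project. The `ConfusionMatrix` class, the function names and the example counts are all hypothetical and are not taken from the project's data, with “defaulted” assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Cell counts of a binary confusion matrix (positive class = "defaulted")."""
    tp: int  # defaulted, predicted defaulted
    fp: int  # not defaulted, predicted defaulted
    fn: int  # defaulted, predicted not defaulted
    tn: int  # not defaulted, predicted not defaulted


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(cm: ConfusionMatrix) -> float:
    """Share of all cases that were classified correctly."""
    return _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn)


def sensitivity(cm: ConfusionMatrix) -> float:
    """Share of actual positives that were detected (also called recall)."""
    return _ratio(cm.tp, cm.tp + cm.fn)


def precision(cm: ConfusionMatrix) -> float:
    """Share of predicted positives that were actually positive."""
    return _ratio(cm.tp, cm.tp + cm.fp)


def f1_score(cm: ConfusionMatrix) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(cm), sensitivity(cm)
    return _ratio(2 * p * r, p + r)


# Hypothetical counts for illustration only (NOT the project's results).
example = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(example):.4f}")
    print(f"sensitivity = {sensitivity(example):.4f}")
    print(f"precision   = {precision(example):.4f}")
    print(f"F1          = {f1_score(example):.4f}")
```

Because F1 combines precision and sensitivity, it can differ noticeably from plain accuracy when the classes are imbalanced, which is one reason to report it alongside accuracy and AUC.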
- MySQL/R Queries/Visualisation
1.1. Import and inspect
1.2. Query MySQL from R using RMySQL/DBI
1.3. dplyr/dbplyr methodology with pipe operations
1.4. Visualisation
- OLAP Operations in R
2.1. Create data
2.2. Generate sale table
2.3. Revenue cube / Multi-dimensional cube
2.4. OLAP Operations
- Decision Support Systems in BI
3.1. Data import from MySQL to RStudio
3.2. Explore and prepare the data
• Frequency, proportion and average
• Visualisation
• Shuffle and re-order the provided data, then split into training and testing sets
3.3. C5.0 algorithm decision tree model
3.4. Decision tree model based on CART
3.5. Improvement of current models