This repository publishes the report that led to the development of this Breast Cancer Classifier. The log file EDA_log_NaomiHindriks.pdf reports on the steps taken to explore and filter the data and to test different learning algorithms. The subsequent report is Verslag_Naomi_Hindriks.pdf. These files can be rendered from the R Markdown files EDA_log_NaomiHindriks.rmd and Verslag_Naomi_Hindriks.Rmd respectively. The data folder contains all the data needed to render the R Markdown files; a description of these files follows below.

The ieee.csl and references.bib files are used in the report to cite the resources that were used. The report is published under the GNU General Public License. ROC-curve.png is an image used in EDA_log_NaomiHindriks.pdf; it was downloaded from an external source.

Data folder:
- breast-cancer-wisconsin.data
  - The original data, downloaded from the UCI Machine Learning Repository.
- breast-cancer-wisconsin.names
  - Information about the original data, explaining the different attributes. Also downloaded from the UCI Machine Learning Repository.
- attribute_info.csv
  - CSV file with information derived from breast-cancer-wisconsin.names, used to couple each data column to the correct attribute name.
- filtered_data.arff
  - The breast-cancer-wisconsin data after the filtering steps described in EDA_log_NaomiHindriks.pdf.
- Classification_options_experiment.exp
  - The options of the first experiment, which can be loaded into Weka as explained in EDA_log_NaomiHindriks.pdf.
- ROC.arff
  - Data for the ROC curve of the final algorithm, exported from Weka while running cross-validation on that algorithm.
- weka_experiment_learing_curve.arff
  - Data for the learning curves of the final algorithm, exported from Weka while running cross-validation on that algorithm.
- All the other ARFF files are the experiments that were run in Weka and loaded into EDA_log_NaomiHindriks.pdf to evaluate the performance of the different algorithms:
  - weka_experiment.arff
  - weka_experiment_IBK_KNN.arff
  - weka_experiment_IBK_attributeSelect.arff
  - weka_experiment_IBK_boosting.arff
  - weka_experiment_IBK_crossValidate.arff
  - weka_experiment_IBK_distanceMetric.arff
  - weka_experiment_IBK_distanceWeight.arff
  - weka_experiment_naive_bayes_attributeSelect.arff
  - weka_experiment_naive_bayes_boosting.arff
  - weka_experiment_naive_bayes_useSupervisedDiscretization.arff
  - weka_experiment_voting.arff
  - weka_experiment_voting_cost.arff
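
For readers who want to inspect breast-cancer-wisconsin.data outside of R, the following is a minimal Python sketch, not the method used in the report. It assumes the file follows the usual UCI layout: eleven comma-separated integer columns, `?` for missing values, and class codes 2 (benign) and 4 (malignant). The column names, the `data/` path and the helper names `parse_rows` and `load_data` are assumptions made for illustration; the authoritative names are in breast-cancer-wisconsin.names and attribute_info.csv.

```python
import csv
import io

# Assumed column names, based on the UCI documentation of this dataset.
# They are not read from attribute_info.csv.
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]

# Assumed class coding: 2 = benign, 4 = malignant.
CLASS_LABELS = {2: "benign", 4: "malignant"}


def parse_rows(handle):
    """Parse comma-separated rows; '?' becomes None, everything else an int."""
    rows = []
    for fields in csv.reader(handle):
        if not fields:  # skip blank lines
            continue
        values = [None if f.strip() == "?" else int(f) for f in fields]
        row = dict(zip(COLUMNS, values))
        row["class"] = CLASS_LABELS[row["class"]]
        rows.append(row)
    return rows


def load_data(path="data/breast-cancer-wisconsin.data"):
    """Read the data file from disk (hypothetical default path)."""
    with open(path, newline="") as handle:
        return parse_rows(handle)


# Two illustrative rows in the file's format, parsed from memory.
sample = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
)
rows = parse_rows(sample)
```

Representing missing values as `None` leaves the decision on how to handle them (drop or impute) to the caller; the filtering actually applied in this project is described in EDA_log_NaomiHindriks.pdf.
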
Naomi Hindriks