This project uses WEKA (data mining software) to apply several machine learning algorithms to a dataset of Indian liver patient records. The coursework brief is included in this repository, together with the final PDF report, which presents the data cleaning steps (normalisation, discretisation, and the handling of missing data and outliers) and the machine learning algorithms applied: Naive Bayes, Multilayer Perceptron, and clustering.
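WEKA provides filters for these cleaning steps (for example `ReplaceMissingValues`, `Normalize` and `Discretize` in `weka.filters.unsupervised.attribute`). The snippet below is not the project's WEKA workflow. It is a minimal, dependency-free sketch of what three of the steps compute, assuming the simplest variants: mean imputation for missing numeric values, min-max normalisation to [0, 1], and equal-width discretisation. The function names and the sample Total Bilirubin values are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def min_max_normalise(values):
    """Rescale linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins, numbered 0 .. k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical Total Bilirubin readings with one missing value.
tb = [0.7, 10.9, None, 1.0, 3.9]
tb_filled = impute_mean(tb)
print(tb_filled)
print(min_max_normalise(tb_filled))
print(equal_width_bins(tb_filled, 3))
```

Imputation runs first here because both normalisation and discretisation need the column's minimum and maximum, which are undefined while values are missing.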
This dataset contains 416 liver patient records and 167 non-liver patient records (583 in total), collected in the north-east of Andhra Pradesh, India. Selector is the class label used to divide the records into two groups (liver patient or not). The dataset contains 441 male patient records and 142 female patient records. Any patient whose age exceeded 89 is listed as being of age "90".
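Because the classes are imbalanced, a classifier that always predicts "liver patient" already reaches roughly 71.4% accuracy, which is a useful floor when reading classifier results (in WEKA this majority-class baseline is what the `ZeroR` classifier gives). The sketch below checks that arithmetic using only the counts stated above; the `recorded_age` helper is a hypothetical illustration of the age-capping rule.

```python
# Counts quoted in the dataset description.
LIVER, NON_LIVER = 416, 167
MALE, FEMALE = 441, 142

total = LIVER + NON_LIVER
# The class breakdown and the gender breakdown describe the same records.
assert total == MALE + FEMALE == 583

# Accuracy obtained by always predicting the majority class (liver patient).
baseline = LIVER / total
print(f"{total} records, majority-class baseline accuracy = {baseline:.1%}")


def recorded_age(true_age: int) -> int:
    """Dataset convention: any age above 89 is stored as 90."""
    return 90 if true_age > 89 else true_age


print([recorded_age(a) for a in (45, 89, 90, 97)])
# prints [45, 89, 90, 90]
```

The age cap also means the top of the Age range is compressed, so any discretisation of Age will place all patients aged 90 and over in the same bin.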
Attribute Information:
- Age - Age of the patient
- Gender - Gender of the patient
- TB - Total Bilirubin
- DB - Direct Bilirubin
- Alkphos - Alkaline Phosphatase
- Sgpt - Alanine Aminotransferase
- Sgot - Aspartate Aminotransferase
- TP - Total Proteins
- ALB - Albumin
- A/G Ratio - Albumin and Globulin Ratio
- Class - Selector field, labelled by experts, used to split the data into two sets: liver disease and no liver disease
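WEKA's native input format is ARFF, which declares each attribute and its type before the data rows. The snippet below is a minimal sketch that writes such a header for the attributes listed above. The attribute types, the Gender values `Male`/`Female`, the relation name and the class labels `liver_disease`/`no_liver_disease` are all assumptions for illustration; the raw data file may encode the class differently, in which case its values would need mapping before loading.

```python
# Attribute order follows the list above. Types are assumptions:
# Gender and Class are treated as nominal, everything else as numeric.
ATTRIBUTES = [
    ("Age", "numeric"),
    ("Gender", "{Male,Female}"),
    ("TB", "numeric"),
    ("DB", "numeric"),
    ("Alkphos", "numeric"),
    ("Sgpt", "numeric"),
    ("Sgot", "numeric"),
    ("TP", "numeric"),
    ("ALB", "numeric"),
    ("A/G Ratio", "numeric"),
    ("Class", "{liver_disease,no_liver_disease}"),  # hypothetical labels
]


def quote(name: str) -> str:
    """Quote names containing anything other than letters and digits."""
    return name if name.isalnum() else f"'{name}'"


def arff_header(relation: str = "liver_patients") -> str:
    """Build the ARFF header; comma-separated data rows follow '@data'."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {quote(name)} {kind}" for name, kind in ATTRIBUTES]
    lines += ["", "@data"]
    return "\n".join(lines)


print(arff_header())
```

Quoting is applied to `A/G Ratio` because the name contains a space, which ARFF only allows inside quotes.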