Repository for the verbal autopsy (VA) paper using the OAA-NBC algorithm

Journal Publication URL: https://gatesopenresearch.org/articles/2-63/v2

To run the experiments, follow these steps (note: all experiments were tested successfully on a Linux machine, not on Windows):

  1. Get VA datasets and convert them into CSV format, with symptoms as columns, the cause of death as the last column, and one record per row. Download Weka, a machine learning software package, from https://www.cs.waikato.ac.nz/ml/weka/. Open the CSV file in Weka and save it as an ARFF file (Weka's format). Make sure all attribute types are numeric, except for the class attribute. Open the ARFF file in a text editor, rename the class attribute to "Cause", and add "others" as one of the class values; e.g., @attribute Cause {1,3,4,5,6,8,9,10,11,13,14,16,17,others}. OAA-NBC uses the "others" value for its one-against-all approach.

    1.1 All the datasets used in the paper are provided in ARFF format in the dataset folder, except the MDS dataset. As an example, we have also provided the variations of the Matlab dataset used in the experiments. There are two variations of the 10 splits of the Matlab dataset, both built with 10-fold cross-validation: one based on a Dirichlet distribution and one based on the original distribution. These variations can also be generated for any data file in .arff format using the code provided here.

  2. Once you have a VA dataset in ARFF format, you can generate 10-fold cross-validation splits (a training and test set pair for each fold) for it. Code examples are in the file: generatefolds.sh

  3. Execute the R code on the data generated in Step 2 (i.e., on the 10 different folders; see the R code for details). Use R to execute the file: openva_execute.r

  4. Compute sensitivity, specificity, PCCC and CSMF accuracy from the data generated by R in Step 3. Example code is in the file: measures.sh

  5. Build OAA-NBC models on the data from Step 2 and compute the same measures (sensitivity, specificity, PCCC and CSMF accuracy). Example code is in the file: ooanbc.sh
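
As an illustration of the file layout that Step 1 describes, the following Python sketch builds ARFF text from a small CSV sample: numeric symptom attributes, followed by a nominal class attribute named Cause that includes the extra "others" value. It is not part of this repository and does not replace the Weka conversion. The function name `csv_to_arff`, the relation name and the sample column names are assumptions made for the example, and attribute names are assumed to contain no spaces or special characters.

```python
import csv
import io


def csv_to_arff(csv_text, relation="va"):
    """Build ARFF text from CSV text (symptom columns first, cause of death last).

    Hypothetical helper for illustration only; not code from this repository.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    symptoms = header[:-1]  # every column except the last is a symptom

    # Distinct causes: numeric labels first in numeric order, then any others.
    causes = sorted(
        {r[-1] for r in records},
        key=lambda c: (not c.isdigit(), int(c) if c.isdigit() else 0, c),
    )

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in symptoms]
    # The class attribute is renamed to "Cause" and gets the extra "others" value.
    lines.append("@attribute Cause {" + ",".join(causes + ["others"]) + "}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in records]
    return "\n".join(lines) + "\n"


# Tiny made-up sample: two symptoms and a cause-of-death column.
sample = "fever,cough,cod\n1,0,3\n0,1,1\n1,1,3\n"
print(csv_to_arff(sample))
```

The printed header can be compared against an ARFF file saved from Weka to check that the class attribute is named Cause and lists "others" as its last value.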

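For readers who want to sanity-check the numbers from Steps 4 and 5, here is a minimal Python sketch of the four measures, using the definitions commonly used in the verbal autopsy literature. It is an assumption that measures.sh and the OAA-NBC code follow exactly these definitions; the function names and the toy labels below are made up for the example.

```python
from collections import Counter


def sensitivity_specificity(true, pred, cause):
    """Per-cause sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    Assumes the cause occurs at least once in `true` and that at least one
    record has a different true cause (otherwise a denominator is zero).
    """
    pairs = list(zip(true, pred))
    tp = sum(t == cause and p == cause for t, p in pairs)
    fn = sum(t == cause and p != cause for t, p in pairs)
    fp = sum(t != cause and p == cause for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def pccc(true, ranked_preds, k, n_causes):
    """Partial chance-corrected concordance: (C - k/N) / (1 - k/N).

    C is the fraction of records whose true cause is among the top k
    predicted causes; N is the number of causes in the cause list (k < N).
    """
    hits = sum(t in ranked[:k] for t, ranked in zip(true, ranked_preds))
    c = hits / len(true)
    return (c - k / n_causes) / (1 - k / n_causes)


def csmf_accuracy(true, pred):
    """CSMF accuracy: 1 - sum_j |true_j - pred_j| / (2 * (1 - min_j true_j))."""
    n = len(true)
    true_counts, pred_counts = Counter(true), Counter(pred)
    causes = set(true) | set(pred)
    abs_error = sum(abs(true_counts[c] - pred_counts[c]) / n for c in causes)
    min_true = min(true_counts[c] for c in causes) / n
    return 1 - abs_error / (2 * (1 - min_true))


# Toy labels (made up): six records, three causes.
true = ["1", "1", "3", "3", "3", "4"]
pred = ["1", "3", "3", "3", "4", "4"]

print(sensitivity_specificity(true, pred, "3"))
print(pccc(true, [[p] for p in pred], k=1, n_causes=3))
print(csmf_accuracy(true, pred))
```

Sensitivity, specificity and PCCC are individual-level measures, while CSMF accuracy compares the predicted and true cause distributions over the whole test set, so the same predictions can score quite differently on the two kinds of measure.
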
Directories in the repository

oaanbcProject: All the Java source code of OAA-NBC, along with the code for generating the n-fold data and the code for computing measures on the output of the R code.

R-code: R code that uses the OpenVA package and Weka (input) files to execute OAA-NBC, InterVA-4, Tariff, InSilicoVA and NBC.

lib: This folder contains the compiled jar file of the OAA-NBC source code. The scripts above use this file.