# Classifier performances for deriving replicates
For each disease, we derive replicates of the mapping of RCTs across diseases by simulating what the mapping of RCTs within regions would have been if the misclassification of RCTs towards groups of diseases were corrected, given the sensitivity and specificity of the classifier for each group of diseases.

To estimate the performance of the classifier for each group of diseases, we use a test set of 2,763 trials manually classified according to the 27-class grouping of diseases used in this work. The test set is described in Atal et al., BMC Bioinformatics 2016.

This script calculates the sensitivity and specificity of the classifier for identifying each disease and for identifying other studies relevant to the burden of diseases, together with the numbers of successes and trials used to derive beta distributions.
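
As a minimal sketch of the quantities described above, the block below computes sensitivity and specificity from confusion-matrix counts and draws from beta distributions built from the numbers of successes and trials. The counts are the Tuberculosis values shown in the `PERF_F` output further down. The helper name `sens_spec` and the Beta(successes + 1, failures + 1) parameterization (a uniform prior) are assumptions made for illustration and may differ from what is used downstream.

```r
# Hypothetical helper: sensitivity and specificity from confusion-matrix counts.
sens_spec <- function(TP, FP, TN, FN) {
    c(sensitivity = TP / (TP + FN),   # successes = TP, trials = TP + FN
      specificity = TN / (TN + FP))   # successes = TN, trials = TN + FP
}

# Counts for Tuberculosis (TP_Dis, FP_Dis, TN_Dis, FN_Dis)
perf <- sens_spec(TP = 14, FP = 2, TN = 2745, FN = 2)
print(perf)

# Assumption: Beta(successes + 1, failures + 1), i.e. a uniform prior.
set.seed(1)
sens_draws <- rbeta(1000, 14 + 1, 2 + 1)
spec_draws <- rbeta(1000, 2745 + 1, 2 + 1)
print(summary(sens_draws))
print(summary(spec_draws))
```

Each draw is one plausible value of the sensitivity or specificity, which is what a replicate of the corrected mapping would need as input.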

## 1. Sensitivities and specificities based on test set

In [2]:
#test set, not included in the repo
test_set <- read.table("/media/igna/Elements/HotelDieu/Cochrane/MappingRCTs_vs_Burden/test_set_classified_to28cats.txt")
dim(test_set)

In [3]:
#We remove injuries (category 28) from the trials relevant to the burden of diseases
test_set$GBDnp <- sapply(strsplit(as.character(test_set$GBDnp),"&&"),function(x){paste(x[x!="28"],collapse="&")})
test_set$GBD28 <- sapply(strsplit(as.character(test_set$GBD28),"&"),function(x){paste(x[x!="28"],collapse="&")})

In [4]:
tst <- strsplit(test_set$GBDnp,"&")
alg <- strsplit(test_set$GBD28,"&")
tst <- lapply(tst,as.numeric)
alg <- lapply(alg,as.numeric)
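
Because the test set is not included in the repo, the label handling above can be illustrated on made-up strings. The sketch below assumes labels are joined by a single `&` and that 28 denotes injuries; the toy vector `labels` is hypothetical.

```r
# Toy labels (hypothetical): categories joined by "&", with 28 = injuries.
labels <- c("1&28", "28", "2&3", "")

# Drop category 28 and re-join the remaining categories with "&".
cleaned <- sapply(strsplit(labels, "&"),
                  function(x) paste(x[x != "28"], collapse = "&"))

# Split again and convert each trial's categories to a numeric vector.
# Trials left without any category become numeric(0).
parsed <- lapply(strsplit(cleaned, "&"), as.numeric)

print(cleaned)
print(parsed)
```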

In [5]:
source('../utils/Evaluation_metrics.R')

In [6]:
dis <- 1:27
Mgbd <- read.table("../Data/27_gbd_groups.txt")

In [7]:
#For each category in 1:27: TP, FP, TN and FN for identifying the disease, and for identifying any other disease
set.seed(7212)

dis <- as.character(1:27)

PERF_F  <- data.frame()
for(i in dis){
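    # Recode each trial: 1 = assigned to disease i, 2 = assigned to at least one other disease
    ALG <- lapply(alg,function(x){
        rs <- c()
        if(i%in%x) rs <- c(1)
        if(sum(setdiff(dis,i)%in%x)!=0) rs <- c(rs,2)
        return(rs)
    })

    # Same recoding for the manual (reference) classification
    DT <- lapply(tst,function(x){
        rs <- c()
        if(i%in%x) rs <- c(1)
        if(sum(setdiff(dis,i)%in%x)!=0) rs <- c(rs,2)
        return(rs)
    })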

    CM <- conf_matrix(ALG,DT,c(1,2))

    PERF <- c(CM[1,],CM[2,])
    PERF_F <- rbind(PERF_F,PERF)
}
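
The function `conf_matrix` comes from `../utils/Evaluation_metrics.R`, which is not shown here. As a rough sketch of the kind of counts the loop collects for one category, the block below tallies TP, FP, TN and FN for the presence of a single label. The helper `count_one_vs_rest`, its arguments, its output order, and the toy data are all hypothetical and need not match `conf_matrix`.

```r
# Hypothetical helper (not the repository's conf_matrix): counts for one label.
count_one_vs_rest <- function(pred, truth, label) {
    pred_has  <- sapply(pred,  function(x) label %in% x)  # classifier assigns the label
    truth_has <- sapply(truth, function(x) label %in% x)  # reference assigns the label
    c(TP = sum(pred_has & truth_has),
      FP = sum(pred_has & !truth_has),
      TN = sum(!pred_has & !truth_has),
      FN = sum(!pred_has & truth_has))
}

# Toy data: one list element per trial, each holding the assigned categories.
alg_toy <- list(c(1, 2), 2, numeric(0), 1)
tst_toy <- list(1, c(1, 2), numeric(0), 3)

counts <- count_one_vs_rest(alg_toy, tst_toy, label = 1)
print(counts)
```

The four counts always sum to the number of trials, which gives a quick consistency check on any row of the performance table.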


In [8]:
#We add the performance of the classifier for identifying trials relevant to the burden of diseases
ALG <- lapply(alg,length)
DT <- lapply(tst,length)
CM <- conf_matrix(ALG,DT,1)
PERF <- c(CM,rep(NA,4))
PERF_F <- rbind(PERF_F,PERF)

In [9]:
PERF_F <- data.frame(PERF_F)
names(PERF_F) <- paste(rep(c("TP","FP","TN","FN"),2),rep(c("_Dis","_Oth"),each=4),sep="")

In [14]:
PERF_F$dis <- c(dis,0)
PERF_F$GBD <- c(as.character(Mgbd$x[-28]),"All")

In [15]:
PERF_F <- PERF_F[,c(9,10,1:8)]

In [16]:
Mgbd

Unnamed: 0,x
1,Tuberculosis
2,HIV/AIDS
3,"Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases"
4,Malaria
5,Neglected tropical diseases excluding malaria
6,Maternal disorders
7,Neonatal disorders
8,Nutritional deficiencies
9,Sexually transmitted diseases excluding HIV
10,Hepatitis


In [17]:
PERF_F

Unnamed: 0,dis,GBD,TP_Dis,FP_Dis,TN_Dis,FN_Dis,TP_Oth,FP_Oth,TN_Oth,FN_Oth
1,1,Tuberculosis,14,2,2745,2,2142,204,267,150
2,2,HIV/AIDS,86,7,2659,11,2072,214,333,144
3,3,"Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases",40,21,2693,9,2113,207,299,144
4,4,Malaria,14,1,2748,0,2142,204,267,150
5,5,Neglected tropical diseases excluding malaria,6,0,2756,1,2150,203,261,149
6,6,Maternal disorders,17,5,2715,26,2130,210,289,134
7,7,Neonatal disorders,4,7,2746,6,2148,205,262,148
8,8,Nutritional deficiencies,11,15,2732,5,2140,201,272,150
9,9,Sexually transmitted diseases excluding HIV,0,3,2759,1,2155,203,255,150
10,10,Hepatitis,14,4,2742,3,2141,208,262,152


In [12]:
write.csv(PERF_F,'../Tables/Performances_per_27disease_data.csv')