## Heatmap representation

Once we have the data extracted for the comorbidities per medication group, we proceed to the visual representation. 


First, we load the required libraries:

In [None]:
library("devtools")
library("SqlServerJtds")
library("SqlTools")
library("FactToCube")
library("ggplot2")

Then we create a table with the total number of patients taking only one of the medications. The table, inputData, contains three columns:
- the drug name (Drug)
- the total number of patients taking that drug (n)
- the name of the table containing that information in the database (tableName)

In [None]:
inputData <- as.data.frame( matrix(ncol=3, nrow=7))
colnames(inputData) <- c("Drug", "n", "tableName")

inputData$Drug <- c( "Methylphenidate", "Guanfacine", "Atomoxetine", "Fluoxetine", 
                     "Citalopram", "Risperidone", "Aripiprazole")

inputData$n <- c( 4373, 3387, 561, 3053, 943, 2670, 1997 )

inputData$tableName <- c( "Methylphenidate_CMs", "Guanfacine_CMs", "Atomoxetine_CMs", 
                          "Fluoxetine_CMs", "Citalopram_CMs", "Risperidone_CMs", 
                          "Aripiprazole_CMs")

Next, we loop over the tables and stack all the results in a single data.frame called output. We also add a column with the total number of patients taking each drug, which makes it easy to estimate the percentage of affected patients per comorbidity.

In [None]:
# cnag is the database connection object; it must be open before this loop runs
for( i in 1:nrow( inputData)){
    
    queryCounts <- paste0( "SELECT * FROM ", inputData$tableName[i], 
                           " ORDER BY Level3_prevalence DESC")
    
    print( i )
    if( i == 1){
        output <- dbGetQuery( cnag, queryCounts )
        output$drug <- inputData$Drug[i]
        output$totalPatients <- inputData$n[i]
        
    }else{
        intermediateOutput <- dbGetQuery( cnag, queryCounts )
        intermediateOutput$drug <- inputData$Drug[i]
        intermediateOutput$totalPatients <- inputData$n[i]
        output <- rbind( output, intermediateOutput )
    }
}
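
The same stacking can be sketched without growing output inside the loop: build one data.frame per drug with lapply() and bind them once with do.call(rbind, ...). In this sketch, fetchTable is a hypothetical stand-in for dbGetQuery( cnag, queryCounts ) that returns made-up rows, so the example runs without a database connection. The drug names and counts are illustrative only, not real results.

```r
# Toy version of inputData (two drugs only; all values are made up).
toyInput <- data.frame(
    Drug      = c("DrugA", "DrugB"),
    n         = c(200, 50),
    tableName = c("DrugA_CMs", "DrugB_CMs"),
    stringsAsFactors = FALSE
)

# Hypothetical stand-in for dbGetQuery(cnag, queryCounts): returns fake rows
# so that the sketch runs without a database.
fetchTable <- function(tableName) {
    data.frame(
        Level3            = c("Anxiety disorders", "Sleep disorders"),
        Level3_prevalence = c(20, 5),
        stringsAsFactors  = FALSE
    )
}

# Build one data.frame per drug, then bind them all in a single step.
perDrug <- lapply(seq_len(nrow(toyInput)), function(i) {
    res <- fetchTable(toyInput$tableName[i])
    res$drug          <- toyInput$Drug[i]
    res$totalPatients <- toyInput$n[i]
    res
})
toyOutput <- do.call(rbind, perDrug)
```

Binding once at the end avoids the special case for i == 1 and the repeated copying that rbind() performs inside a loop.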


We estimate the percentage of patients with each comorbidity:

In [None]:
output$percentage <- round( 100*(output$Level3_prevalence / output$totalPatients), 3)

We keep the comorbidities that are present in at least 1% of the patients within each medication group.

In [None]:
outputSubset <- output[ output$percentage >= 1, ]
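
As a small worked example of the two steps above, the sketch below applies the same formula and the same 1% cut-off to a toy table. It assumes that Level3_prevalence holds a patient count, as the division by totalPatients implies. The category names and numbers are made up for illustration.

```r
# Toy counts for a single drug taken by 1000 patients (values are made up).
toyCounts <- data.frame(
    Level3            = c("Anxiety disorders", "Sleep disorders", "Asthma"),
    Level3_prevalence = c(120, 30, 4),
    totalPatients     = c(1000, 1000, 1000),
    stringsAsFactors  = FALSE
)

# Same formula as above: percentage of patients with each comorbidity.
toyCounts$percentage <- round(100 * (toyCounts$Level3_prevalence / toyCounts$totalPatients), 3)

# Same filter as above: keep comorbidities affecting at least 1% of patients.
toySubset <- toyCounts[toyCounts$percentage >= 1, ]
```

Here the percentages are 12, 3 and 0.4, so the third row falls below the threshold and is dropped.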

## ACT mapping and heatmap representation
To create the heatmap at different levels, we merge the information with the ACT mapping ontology at levels 1 and 3. We selected those levels after careful analysis because they are detailed enough without being too specific. We were looking for a balance between billing codes and clinical relevance.

In [None]:
# map to ACT
actMapping <- dbGetQuery( cnag, "SELECT Level1, Level3 FROM ACT_ICD10_ICD9_3")
actMapping <- actMapping[!duplicated( actMapping), ]

# some Level3 categories have more than one Level1; only the first one is kept (this should be checked)
actMapping <- actMapping[!duplicated( actMapping$Level3 ), ]

# map Level3 to Level1
outputMapped <- merge( outputSubset, actMapping)

#exclude the comorbidities that are not clinically relevant
excludedGroups <- c('Autistic disorder',
                    'Encounter for newborn, infant and child health examinations',
                    'motorized bicycle',
                    'Other unknown and unspecified cause of morbidity or mortality',
                    'Need for prophylactic vaccination and inoculation, Influenza',
                    'Bus occupant injured in transport accident (v70-v79)',
                    'Encounter for other specified aftercare',
                    'Other long term (current) drug therapy',
                    'Body mass index (bmi) pediatric')

outputMapped <- outputMapped[! outputMapped$Level3 %in%  excludedGroups, ]
save(outputMapped, file = "outputMapped.RData")
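
The code comment above notes that some Level 3 categories have more than one Level 1 and should be checked. One possible way to list them before deduplicating is sketched below on a toy mapping. The category names are made up and toyMapping is a hypothetical stand-in for the result of the ACT_ICD10_ICD9_3 query.

```r
# Toy mapping in which "Sleep disorders" appears under two Level1 categories.
toyMapping <- data.frame(
    Level1 = c("Mental disorders", "Mental disorders", "Nervous system", "Respiratory system"),
    Level3 = c("Anxiety disorders", "Sleep disorders", "Sleep disorders", "Asthma"),
    stringsAsFactors = FALSE
)
toyMapping <- toyMapping[!duplicated(toyMapping), ]

# Count the distinct Level1 parents of every Level3 category.
parentsPerLevel3 <- tapply(toyMapping$Level1, toyMapping$Level3,
                           function(x) length(unique(x)))

# Level3 categories with more than one parent need a manual decision.
ambiguous <- names(parentsPerLevel3)[parentsPerLevel3 > 1]
print(ambiguous)

# Same deduplication as above: the first Level1 found for each Level3 wins.
toyDedup <- toyMapping[!duplicated(toyMapping$Level3), ]
```

With duplicated() the surviving Level 1 depends on row order, so listing the ambiguous categories first makes that choice visible.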


Before creating the heatmap, we filter by the percentage of patients having each comorbidity in each medication group. In this case, we use a 10% threshold: a comorbidity is kept if it reaches 10% in at least one medication group.

In [None]:
# data to plot (assumed to be the outputMapped table created above)
toplot <- outputMapped
drugs <- unique(toplot$drug)
for( i in seq_along(drugs)){
    selection <- toplot[ toplot$drug == drugs[i] & 
                             toplot$percentage >= 10, ]
    if(i == 1){
        phenoList <- selection$Level3
    }else{
        subSet <- selection$Level3
        phenoList <- unique( c( phenoList, subSet))
    }
}

toplot <- toplot[ toplot$Level3 %in% phenoList, ]
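
The loop above collects the union of the Level 3 categories that pass the threshold for any drug. The sketch below reproduces that behaviour on a toy table with a vectorized equivalent. The drug names, categories and percentages are made up for illustration.

```r
# Toy table: "Anxiety disorders" reaches 10% for DrugA only.
toyPlot <- data.frame(
    drug       = c("DrugA", "DrugA", "DrugB", "DrugB"),
    Level3     = c("Anxiety disorders", "Asthma", "Anxiety disorders", "Asthma"),
    percentage = c(15, 2, 4, 3),
    stringsAsFactors = FALSE
)

# Union of the categories at or above 10% for at least one drug.
toyPhenoList <- unique(toyPlot$Level3[toyPlot$percentage >= 10])

# Keep those categories for every drug, including drugs below the threshold.
toyFiltered <- toyPlot[toyPlot$Level3 %in% toyPhenoList, ]
```

The DrugB row for "Anxiety disorders" is kept even though its percentage is 4, so each selected comorbidity can be compared across all drugs in the heatmap.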

Finally, we create two heatmaps:
- one without the Level 1 aggregation (htmpOutput)
- another faceted by Level 1 category (htmpOutput2)

In [None]:
# wrap long Level3 labels at 48 characters
htmpOutput <- ggplot(toplot, aes(drug, stringr::str_wrap(Level3, 48), fill = percentage)) +
  geom_tile()+
  scale_fill_gradient(low="white", high="blue") +
  #scale_fill_distiller(palette = "YlOrRd")+
  ggplot2::theme_bw() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1,face="bold"),
                 panel.grid = element_blank(), 
                 axis.text.y = ggplot2::element_text(size=rel(1.0)), 
                 axis.title = ggplot2:: element_text(size=rel(1.05)))+
  labs(title = NULL, x = "Drug", y =  "Level3",fill="Percentage")

htmpOutput
ggsave(filename="htmpOutput.png", plot=htmpOutput, device="png",
 height=9, width=11, units="in", dpi=500)

htmpOutput2 <- htmpOutput + facet_grid( rows = vars( stringr::str_wrap(Level1, 50)), scales = "free", space = "free") +
  theme(strip.text.y = element_text(angle = 0,size=rel(0.85)))
htmpOutput2
ggsave(filename="htmpOutput2.png", plot=htmpOutput2, device="png",
       height=9, width=12, units="in", dpi=500)