
# 📈 Bar Plots & Group Comparison with Kruskal-Wallis Test
Analyze and visualize group differences with bar plots and the Kruskal-Wallis test in R.

---

**🗂️ Last updated:** 18 September 2025  
**🐳 Docker image:** `gnasello/datascience-env:2025-09-18`


## 📦 Load Libraries

In [None]:

library(dataprepUtils)
library(statsUtils)
library(ggplotUtils)
library(dplyr)


## 🏷️ Customize Plot Labels

In [None]:

title <- ""
xlabel <- ""
ylabel <- "Drug amount (µg)"


## 🎨 Set Colors for Groups

In [None]:

scale_color_manual.values <- c(
  "Blank" = "#8b8c8cff",
  "Blank+Laponite" = "#4c4d4dff",
  "AS286361" = "#dca01cff",
  "AS286361+Laponite" = "#386e28ff"
)


## 📁 Import Dataset

In [None]:
filetable <- "data.csv"

filename <- tools::file_path_sans_ext(filetable)
df <- read_and_process_data(
  filetable,
  x_col = "x",
  y_col = "y",
  xlabels_ordered = names(scale_color_manual.values)
)
head(df)
tail(df)


## 🧹 Optional Data Manipulation *(commented out)*

### ✅ Option 1: Keep only specific values in a column

In [None]:

# Uncomment and edit this section to keep only specific values in a chosen column
# values_to_keep <- c("Value1", "Value2")      # <-- Replace with values you want to keep
# column <- "ColumnName"                       # <-- Replace with the column name
# df <- subset(df, df[[column]] %in% values_to_keep)   # overwrite df so later cells use the filtered data
# head(df)


### ❌ Option 2: Remove specific values from a column


In [None]:

# Uncomment and edit this section to remove specific values from a chosen column
# values_to_remove <- c("Value1", "Value2")    # <-- Replace with values you want to remove
# column <- "ColumnName"                       # <-- Replace with the column name
# df <- subset(df, !(df[[column]] %in% values_to_remove))   # overwrite df so later cells use the filtered data
# head(df)
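
Both options can be tried on a toy data frame before touching the real data. The sketch below uses made-up values and hypothetical group names (`A`, `B`, `C`), not the notebook's dataset. It also shows one thing worth checking after filtering: if the grouping column is a factor, `subset()` keeps the unused levels, and `droplevels()` removes them.

```r
# Toy data frame (hypothetical values, not the notebook's dataset)
toy <- data.frame(
  x = factor(c("A", "A", "B", "B", "C", "C")),
  y = c(1.0, 1.2, 2.1, 1.9, 3.0, 3.3)
)

# Option 1: keep only selected groups
kept <- subset(toy, x %in% c("A", "B"))

# Option 2: remove selected groups
removed <- subset(toy, !(x %in% c("A", "B")))

# subset() keeps unused factor levels, so "C" is still listed here
levels(kept$x)

# droplevels() removes levels that no longer occur in the data
kept <- droplevels(kept)
levels(kept$x)
```

Whether leftover levels matter depends on how the downstream plotting and testing functions treat empty groups, so this step is a precaution rather than a requirement.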


## 📊 Summarize the Data

In [None]:

df_summary <- statsUtils::data_summary(df, varname = "y", groupnames = c("x"))
print(df_summary)


## 🧪 Validate Statistical Assumptions

In [None]:
check_anova_assumptions(df, response = 'y', group = 'x')
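
For readers without access to `statsUtils`, the kind of checks usually meant by "ANOVA assumptions" can be sketched with base R. This is an illustration on the built-in `PlantGrowth` dataset, and it is an assumption that `check_anova_assumptions()` performs comparable tests; its actual implementation may differ. When either assumption looks doubtful, a rank-based test such as Kruskal-Wallis is a common fallback.

```r
# Built-in example data: plant weight under three treatments
data(PlantGrowth)

# Normality of the residuals from a one-way ANOVA fit (Shapiro-Wilk test)
fit <- aov(weight ~ group, data = PlantGrowth)
normality <- shapiro.test(residuals(fit))

# Homogeneity of variances across groups (Bartlett's test)
homogeneity <- bartlett.test(weight ~ group, data = PlantGrowth)

# Small p-values suggest the corresponding assumption is violated
normality$p.value
homogeneity$p.value
```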

## 🧮 Run Kruskal-Wallis Test

In [None]:
kruskal_results <- df %>% rstatix::kruskal_test(y ~ x)
kruskal_results

## 📏 Effect size

Eta squared based on the H statistic can be used as the effect size for the Kruskal-Wallis test. It is calculated as `eta2[H] = (H - k + 1)/(n - k)`, where `H` is the Kruskal-Wallis test statistic, `k` is the number of groups, and `n` is the total number of observations (Tomczak and Tomczak 2014).

The estimate ranges from 0 to 1. Multiplied by 100, it gives the percentage of variance in the dependent variable explained by the independent variable.

Interpretation thresholds commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect), and >= 0.14 (large effect).

In [None]:
df %>% rstatix::kruskal_effsize(y ~ x)
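
The formula above can also be evaluated by hand, which is a useful cross-check. The sketch below uses base R's `kruskal.test()` on the built-in `PlantGrowth` dataset rather than the notebook's data. The variable names (`eta2_h`, `magnitude`) are illustrative. Values below 0.01 are left unlabeled (`NA`) because the thresholds quoted above do not name that range.

```r
# Built-in example data: plant weight under three treatments
data(PlantGrowth)

# Kruskal-Wallis test with base R
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

H <- unname(kw$statistic)        # Kruskal-Wallis H statistic
k <- nlevels(PlantGrowth$group)  # number of groups
n <- nrow(PlantGrowth)           # total number of observations

# eta2[H] = (H - k + 1) / (n - k)
eta2_h <- (H - k + 1) / (n - k)

# Map the estimate onto the thresholds quoted above;
# right = FALSE makes the intervals [0.01, 0.06), [0.06, 0.14), [0.14, Inf)
magnitude <- cut(
  eta2_h,
  breaks = c(0.01, 0.06, 0.14, Inf),
  labels = c("small", "moderate", "large"),
  right = FALSE
)

eta2_h
magnitude
```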

## 🔍 Multiple pairwise-comparisons

A significant Kruskal-Wallis test is generally followed by Dunn's test to identify which groups differ. Pairwise Wilcoxon tests between group levels, with a correction for multiple testing, are an alternative.

In [None]:
# Pairwise comparisons
pwc <- df %>%
  rstatix::dunn_test(y ~ x, p.adjust.method = "bonferroni")
pwc
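
The Wilcoxon alternative mentioned above can be sketched with base R's `pairwise.wilcox.test()`. The example runs on the built-in `PlantGrowth` dataset, not the notebook's data, and uses the same Bonferroni correction as the Dunn's test cell. Passing `exact = FALSE` is a choice made here to use the normal approximation, which avoids warnings when the data contain ties.

```r
# Built-in example data: plant weight under three treatments
data(PlantGrowth)

# Pairwise Wilcoxon rank-sum tests with Bonferroni-adjusted p-values
pw <- pairwise.wilcox.test(
  PlantGrowth$weight,
  PlantGrowth$group,
  p.adjust.method = "bonferroni",
  exact = FALSE  # normal approximation instead of exact p-values
)

# Lower-triangular matrix of adjusted p-values, one entry per pair of groups
pw$p.value
```

The result is a matrix rather than the tidy table returned by `rstatix::dunn_test()`, so it would need reshaping before being passed to the annotation steps later in this notebook.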

## 📊 Visualize Group Differences

In [None]:

ylim <- c(0, 1.24)
width <- 3.9
height <- 7.6

p <- create_complete_barplot(
  df,
  width = width,
  height = height,
  ylim = ylim,
  scale_color_manual.values = scale_color_manual.values,
  filename = paste0(filename, "-barplot"),
  ylabel = ylabel
)

p


## 📐 Add p-values to Plot

In [None]:

pwc <- rstatix::add_xy_position(pwc, x = "x")
p_stats <- add_stat_annotations_auto(p, pwc, y.buffer = 0.5)
p_stats


## 🧩 Arrange Plots Side-by-Side

In [None]:

width_aligned <- 2 * width
options(repr.plot.width = width_aligned)

aligned_plots <- ggpubr::ggarrange(
  p, p_stats,
  nrow = 1,
  align = "hv",
  common.legend = FALSE
)

aligned_plots


## 💾 Export Plots

In [None]:

# Save the combined figure as SVG (vector) and PNG (raster)
fileoutput <- paste0(filename, "-barplot_stats.svg")
ggplot2::ggsave(filename = fileoutput, plot = aligned_plots, width = width_aligned, height = height)

fileoutput <- paste0(filename, "-barplot_stats.png")
ggplot2::ggsave(filename = fileoutput, plot = aligned_plots, width = width_aligned, height = height)


## 📚 References


- [Kruskal-Wallis Test](https://www.datanovia.com/en/lessons/kruskal-wallis-test-in-r/)
