In [154]:
library(pheatmap)
library(dplyr)
library(tidyr)
library(tibble)  # provides column_to_rownames(), used below
library(RColorBrewer)
library(viridis)

In [169]:
# Read and prepare the data
metadata <- read.delim("../Metadata.txt", sep="\t", header=TRUE)

heatmap_data <- metadata %>%
  filter(target != "Control") %>%
  group_by(read_length, run_type) %>%
  summarise(count = n_distinct(exp)) %>%
  ungroup() %>%
  mutate(run_type = ifelse(run_type == "single-ended", "SE", "PE"))

# Reshape the data into a matrix
heatmap_matrix <- heatmap_data %>%
  pivot_wider(names_from = read_length, values_from = count, values_fill = 0) %>%
  column_to_rownames("run_type") %>%
  as.matrix()

# Add "nt" suffix to column names
colnames(heatmap_matrix) <- paste0(colnames(heatmap_matrix), " nt")

# Apply log1p transformation
heatmap_matrix_log <- log1p(heatmap_matrix)

# Create a 100-color palette from the viridis package's "magma" colormap
color_palette <- viridis(100, option = "magma")

# Calculate breaks for color scale
breaks <- seq(0, max(heatmap_matrix_log), length.out = 101)

# Create the heatmap
plot <- pheatmap(heatmap_matrix_log,
         color = color_palette,
         breaks = breaks,
         cluster_rows = FALSE,
         cluster_cols = TRUE,
         show_colnames = TRUE,
         show_rownames = TRUE,
         main = "Number of Experiments by Read Length and Sequencing Type (log1p scale)",
         fontsize = 10,
         fontsize_row = 12,
         fontsize_col = 8,
         angle_col = 45,
         display_numbers = FALSE,
         number_color = "white",
         number_format = "%.2f",
         filename = "readlength_sequencing_type_heatmap_clustered_lognorm.png")

print(plot)

[1m[22m`summarise()` has grouped output by 'read_length'. You can override using the `.groups` argument.


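Because the heatmap colors encode `log1p(count)`, the legend shows log values rather than experiment counts. The following is a minimal sketch, in base R only, of how legend ticks could be chosen in count space and positioned on the log1p scale. The toy matrix and the names `counts`, `tick_counts`, and `tick_pos` are made up for illustration and do not come from `Metadata.txt`.

```r
# Toy count matrix (hypothetical values, for illustration only)
counts <- matrix(c(0, 3, 120, 45, 0, 980),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("PE", "SE"), c("36 nt", "50 nt", "100 nt")))

# log1p(x) = log(1 + x): zero counts stay at 0, large counts are compressed
counts_log <- log1p(counts)

# expm1() inverts log1p(), so the raw counts can be recovered
recovered <- expm1(counts_log)

# Pick legend ticks in count space, then place them on the log1p scale
tick_counts <- c(0, 1, 10, 100, 1000)
tick_counts <- tick_counts[tick_counts <= max(counts)]
tick_pos <- log1p(tick_counts)

print(data.frame(count = tick_counts, position = round(tick_pos, 3)))
```

If this approach suits the figure, the ticks could be passed to `pheatmap()` through its `legend_breaks` and `legend_labels` arguments, for example `legend_breaks = tick_pos, legend_labels = as.character(tick_counts)`.
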
In [148]:
help(pheatmap)

pheatmap {pheatmap}    R Documentation

Arguments

mat           numeric matrix of the values to be plotted.
color         vector of colors used in heatmap.
kmeans_k      the number of kmeans clusters to make, if we want to aggregate the rows before drawing heatmap. If NA then the rows are not aggregated.
breaks        a sequence of numbers that covers the range of values in mat and is one element longer than color vector. Used for mapping values to colors. Useful, if needed to map certain values to certain colors, to certain values. If value is NA then the breaks are calculated automatically. When breaks do not cover the range of values, then any value larger than max(breaks) will have the largest color and any value lower than min(breaks) will get the lowest color.
border_color  color of cell borders on heatmap, use NA if no border should be drawn.
cellwidth     individual cell width in points. If left as NA, then the values depend on the size of plotting window.
cellheight    individual cell height in points. If left as NA, then the values depend on the size of plotting window.
scale         character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Corresponding values are "row", "column" and "none"
cluster_rows  boolean values determining if rows should be clustered or hclust object,
cluster_cols  boolean values determining if columns should be clustered or hclust object.
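
The `breaks` description above explains why the heatmap cell pairs 100 colors with 101 breaks: each color fills the bin between two consecutive breaks. The sketch below imitates that documented behavior with base R's `cut()`. It is an illustration under that assumption, not a copy of pheatmap's internal code, and the helper `color_for` and the four-color palette are hypothetical.

```r
# Hypothetical palette of 4 colors and the 5 breaks that delimit their bins
pal <- c("navy", "purple", "orange", "yellow")
brk <- seq(0, 8, length.out = length(pal) + 1)  # 0 2 4 6 8

color_for <- function(x, brk, pal) {
  # breaks must be exactly one element longer than the color vector
  stopifnot(length(brk) == length(pal) + 1)
  # Clamp out-of-range values into the outermost bins, as the help text describes
  x <- pmin(pmax(x, min(brk)), max(brk))
  # cut() returns the bin index of each value; use it to look up the color
  pal[as.integer(cut(x, breaks = brk, include.lowest = TRUE))]
}

values <- c(-1, 0, 1.5, 4.2, 8, 20)
print(setNames(color_for(values, brk, pal), values))
```

Here -1 and 20 fall outside the breaks and take the lowest and highest color respectively, which matches the clamping described in the help text.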
