# Heatmaps

In past sessions we learned to read Excel files, extract data from tables, and plot it. We also learned to use Git. In this session we will plot differential expression transcriptomics data as heatmaps.

We will:
1) Pull the updated Git repository
2) Install the Seaborn Python package
3) Read in the transcriptomics data
4) Filter the data for rows with Log2FC >3 or <-3
5) Merge the two filtered tables
6) Plot the merged data as a heatmap

__Additional Concepts__:
- Relative paths
- Preventing file overwriting using timestamps

*Last edited: Isabella Casini 30.09.2025*

# 1) Pulling (updated Git repository)

1. Open Git Bash and navigate to your local copy of the repository
2. Use the command: "git pull origin main"
3. You should now have the new files (007_Heatmaps.ipynb and the 007_heatmap_data/ folder)

# 2) Install Seaborn package

1. Open Anaconda Prompt
2. Activate your "biotech" environment
	- Hint: conda activate biotech
3. Install "seaborn" using conda
	- Hint: conda install seaborn
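
After installing, it can be worth confirming that the package is visible from the environment your notebook is running in. A minimal sketch using only the standard library; the helper name `is_installed` is made up for this example:

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None

# Check the packages this session relies on
for name in ["pandas", "seaborn"]:
    print(name, "installed:", is_installed(name))
```

If seaborn shows up as not installed, the notebook is probably running in a different environment than the one you installed into.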

# 3) Read in the transcriptomics data (comparing two strains to a reference strain)

(We'll use "Sheet4" and "Sheet5" - strains Marburg vs DeltaH and Z-245 vs DeltaH)

In [None]:
# import required libraries
import pandas as pd # call pandas "pd" for short (midline comment)
import numpy as np

# Plotting libraries
import matplotlib.pyplot as plt # import pyplot from matplotlib and call it "plt"
import matplotlib as mpl

# Plotting heatmaps
import seaborn as sns

# Related to file paths
import os

In [None]:
# Path to the file (change your path to where you save your file)
# pathin = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\ProgrammingWorkshop_Git\ProgrammingWorkshop\007_heatmap_data\Data_S4.xlsx" # what we've used in the past

# Using relative path
pathin = os.path.join("007_heatmap_data", "Data_S4.xlsx") # to go back one directory, use ".."
print(os.path.abspath(pathin))
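
A relative path is resolved against the current working directory, which for a notebook is normally the folder the notebook sits in. The sketch below shows how the same file name resolves with and without "..". It only builds and prints paths; whether the file exists depends on where you run it.

```python
import os

# A relative path is resolved against the current working directory
print("Working directory:", os.getcwd())

# Folder next to the notebook
rel_path = os.path.join("007_heatmap_data", "Data_S4.xlsx")

# One directory up: ".." means "parent directory"
rel_path_up = os.path.join("..", "007_heatmap_data", "Data_S4.xlsx")

print(os.path.abspath(rel_path))
print(os.path.abspath(rel_path_up))

# Check that the file is really there before trying to read it
print("File found:", os.path.exists(rel_path))
```

Because the path is built with os.path.join, the same code works on Windows, macOS, and Linux without changing the separators.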

In [None]:
# Set the gene column to be the index
df_MvsDH = pd.read_excel(pathin, sheet_name='Sheet4', index_col=0)
df_ZvsDH = pd.read_excel(pathin, sheet_name='Sheet5', index_col=0)

In [None]:
# Remove rows with NaN as index in df_MvsDH
df_MvsDH = df_MvsDH[~df_MvsDH.index.isna()]

# Remove rows with NaN as index in df_ZvsDH
df_ZvsDH = df_ZvsDH[~df_ZvsDH.index.isna()]
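
To see what the index filter does, here is a small sketch on a made-up table (the gene names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy table: the last row has no gene name, so its index is NaN
toy = pd.DataFrame(
    {"log2FoldChange": [2.5, -4.0, 0.1]},
    index=["geneA", "geneB", np.nan],
)

mask = toy.index.isna()   # True where the index is missing
toy_clean = toy[~mask]    # "~" inverts the mask: keep rows with a valid index

print(mask)
print(toy_clean)
```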

# 4) Filter each table for rows with Log2FC >3 or <-3

In [None]:
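# Keep only strongly down- or up-regulated genes; "|" means "or"
# Note: <= and >= also keep values that are exactly -3 or 3
df_MvsDH_filtered = df_MvsDH[(df_MvsDH["log2FoldChange"] <= -3) | (df_MvsDH["log2FoldChange"] >= 3)]
df_ZvsDH_filtered = df_ZvsDH[(df_ZvsDH["log2FoldChange"] <= -3) | (df_ZvsDH["log2FoldChange"] >= 3)]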
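
The two-sided condition can also be written with the absolute value. A small sketch on invented data, showing that both forms select the same rows:

```python
import pandas as pd

toy = pd.DataFrame(
    {"log2FoldChange": [3.5, -0.2, -3.0, 1.9, -5.1]},
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)

# Two conditions combined with "|" (or); each condition needs its own parentheses
two_sided = toy[(toy["log2FoldChange"] <= -3) | (toy["log2FoldChange"] >= 3)]

# Equivalent shortcut: compare the absolute value against the cutoff
cutoff = 3
by_abs = toy[toy["log2FoldChange"].abs() >= cutoff]

print(by_abs)
print("Same result:", two_sided.equals(by_abs))
```

Keeping the cutoff in a variable makes it easy to try a different threshold later.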

# 5) Merge the two dataframes keeping only columns of interest

In [None]:
# select only the columns you want
columns_to_keep = ['log2FoldChange', 'padj', 'Gene Group']

# add a suffix to the columns to differentiate between the two dataframes
# (axis=1 places the tables side by side, aligned on the gene index)
df_M_Z_vs_DH = pd.concat(
    [
        df_MvsDH_filtered[columns_to_keep].add_suffix('_MvsDH'),
        df_ZvsDH_filtered[columns_to_keep].add_suffix('_ZvsDH'),
    ],
    axis=1,
)

In [None]:
df_M_Z_vs_DH
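
The NaN values in the merged table come from how pd.concat aligns the two indexes: by default it keeps every gene that appears in either table. A sketch on two invented tables, comparing the default with join="inner":

```python
import pandas as pd

left = pd.DataFrame({"log2FoldChange": [3.2, -4.1]}, index=["geneA", "geneB"])
right = pd.DataFrame({"log2FoldChange": [5.0, -3.3]}, index=["geneB", "geneC"])

# Default (join="outer"): every gene from either table is kept, gaps become NaN
outer = pd.concat([left.add_suffix("_left"), right.add_suffix("_right")], axis=1)

# join="inner": only genes present in both tables are kept
inner = pd.concat(
    [left.add_suffix("_left"), right.add_suffix("_right")], axis=1, join="inner"
)

print(outer)
print(inner)
```

Here only geneB passes the filter in both tables, so it is the only row without NaN.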

In [None]:
# remove rows with NaN values in either log2FoldChange_MvsDH or log2FoldChange_ZvsDH
df_M_Z_vs_DH_filtered = df_M_Z_vs_DH[pd.notna(df_M_Z_vs_DH["log2FoldChange_MvsDH"]) & pd.notna(df_M_Z_vs_DH["log2FoldChange_ZvsDH"])]


In [None]:
df_M_Z_vs_DH_filtered
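
The same row removal can be written with dropna and its subset argument. A sketch on invented data, comparing it with the boolean-mask version:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "log2FoldChange_MvsDH": [3.2, -4.1, np.nan],
        "log2FoldChange_ZvsDH": [np.nan, 5.0, -3.3],
    },
    index=["geneA", "geneB", "geneC"],
)

# Boolean-mask version: keep rows where both columns have a value
masked = toy[pd.notna(toy["log2FoldChange_MvsDH"]) & pd.notna(toy["log2FoldChange_ZvsDH"])]

# dropna version: drop a row if any column listed in "subset" is NaN
dropped = toy.dropna(subset=["log2FoldChange_MvsDH", "log2FoldChange_ZvsDH"])

print(dropped)
print("Same result:", masked.equals(dropped))
```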

# 6) Plot the heatmap

In [None]:
# NEW: define global font settings (family and size)
# Set global font and font size
mpl.rcParams['font.family'] = 'Arial'         # or 'DejaVu Sans', 'Times New Roman', etc.
mpl.rcParams['font.size'] = 10              # global font size

# Set global figure size (width, height) in inches
mpl.rcParams['figure.figsize'] = (8, 6)       # all figures will be 8x6 inches

mpl.rcParams['axes.titlesize'] = 10          # title font size
mpl.rcParams['axes.labelsize'] = 10          # axis label font size
mpl.rcParams['xtick.labelsize'] = 10          # x-axis tick font size
mpl.rcParams['ytick.labelsize'] = 10          # y-axis tick font size
mpl.rcParams['legend.fontsize'] = 10          # legend font size
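
Settings written to mpl.rcParams stay active for the rest of the session. If you only want to change them for one figure, matplotlib's rc_context can apply them temporarily. A minimal sketch; the font sizes are arbitrary example values:

```python
import matplotlib as mpl

print("Global font size before:", mpl.rcParams["font.size"])

# Settings inside the "with" block apply only there
with mpl.rc_context({"font.size": 14, "axes.titlesize": 16}):
    print("Inside the block:", mpl.rcParams["font.size"])

# After the block the previous values are back
print("Global font size after:", mpl.rcParams["font.size"])
```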

In [None]:
## Default settings

# Set the figure size
plt.figure(figsize=(2.5, 10))  # slightly wider for readability

# Plot the heatmap
ax = sns.heatmap(
    df_M_Z_vs_DH_filtered[["log2FoldChange_MvsDH", "log2FoldChange_ZvsDH"]],
    xticklabels=["MM vs DH", "ZZ vs DH"],
    yticklabels=df_M_Z_vs_DH_filtered.index,
    cmap="coolwarm"
)

plt.tight_layout()

# Save the figure
pathout = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\007_heatmap_data"  # change to your own output folder
filename = os.path.join(pathout, "DE_Trans_simple.svg")  # join folder and file name so the file lands inside the folder
# plt.savefig(filename, dpi=300, bbox_inches='tight') # uncomment to save the figure

plt.show()
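
With the default settings, the color scale runs from the smallest to the largest value in the data, so the neutral middle of "coolwarm" does not necessarily sit at Log2FC = 0. One option is to pass center, vmin, and vmax to sns.heatmap. A sketch on invented data (the column names and values are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

toy = pd.DataFrame(
    {"A vs ref": [3.4, -3.1, 6.0], "B vs ref": [4.2, -5.5, 3.3]},
    index=["geneA", "geneB", "geneC"],
)

# Largest absolute value, used as a symmetric color limit
limit = toy.abs().max().max()

plt.figure(figsize=(2.5, 3))
ax = sns.heatmap(
    toy,
    cmap="coolwarm",
    center=0,      # the middle of the colormap corresponds to Log2FC = 0
    vmin=-limit,   # same distance from zero on both sides
    vmax=limit,
)
plt.tight_layout()
plt.show()
```

With symmetric limits, equally strong up- and down-regulation get equally intense colors.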

# 6.1) Customize the plot

In [None]:
# To save the figure with a timestamp we need the datetime module
# (it is part of the Python standard library, so nothing needs to be installed)
from datetime import datetime


In [None]:
# Set font scale globally
# sns.set(font_scale=0.6)

# Set the figure size
plt.figure(figsize=(2.5, 10))  # slightly wider for readability

# Plot the heatmap
ax = sns.heatmap(
    df_M_Z_vs_DH_filtered[["log2FoldChange_MvsDH", "log2FoldChange_ZvsDH"]],
    xticklabels=["MM vs DH", "ZZ vs DH"],
    yticklabels=df_M_Z_vs_DH_filtered.index,
    cmap="coolwarm",
    linewidths=0.5, # with lines between cells
    linecolor="white", # color of lines
    cbar_kws={"shrink": 0.5, "aspect": 10, "pad": 0.3} # colorbar settings
)

# Customize colorbar
cbar = ax.collections[0].colorbar
cbar.set_label("Log2FC",fontsize=10, fontweight="bold")
cbar.ax.tick_params(labelsize=10)#, length=0)  # tick font size, tick length 0 to hide ticks

# Rotate and center the colorbar label
cbar.ax.yaxis.label.set_rotation(90)
cbar.ax.yaxis.label.set_horizontalalignment("center")
# coords = (x, y) in axes fraction; a negative x places the label left of the colorbar
cbar.ax.yaxis.set_label_coords(-1.4, 0.5)

# Formatting of the labels and title
ax.set_title("DE Transcriptomics >3 | <-3", fontsize=12, pad=15)
ax.set_ylabel("DH Genes (Reference Strain)", fontsize=10,fontweight="bold")
ax.set_xlabel("")  # keep empty
ax.tick_params(axis="y", labelsize=8)#, which="both", length=0, rotation=0)  # tick label font size, tick length 0 to hide ticks

# Manually add x-axis labels with rotation and position
ax.set_xticks([0.5, 1.5])  # centers of the heatmap cells
ax.set_xticklabels([])     # hide default
# x and y are in data coordinates (heatmap cells); these values were tuned by hand
# for this dataset, so adjust y if your table has a different number of rows
ax.text(-1.1, 76, "MM vs DH", ha="left", va="bottom", rotation=50, fontsize=10, fontweight="bold")
ax.text(0, 76, "ZZ vs DH", ha="left", va="bottom", rotation=50, fontsize=10, fontweight="bold")

# Add a second x-axis on top for "Group1" and "Group2"
ax2 = ax.twiny()
ax2.set_xlim(ax.get_xlim())  # match heatmap
ax2.set_xticks([0.5, 1.5])  # centers of the two heatmap columns
ax2.set_xticklabels(["Group1", "Group2"], fontsize=8, fontweight="bold", rotation=90)
ax2.xaxis.tick_top()  # place ticks/labels on top
ax2.tick_params(length=2)  # short tick marks; use length=0 to hide them

plt.tight_layout()

# Get the current time in the format that you want, YYYYmmdd_HHMMSS
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Save the figure
pathout = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\007_heatmap_data"  # change to your own output folder
# add timestamp to the filename
filename = os.path.join(pathout, f"DE_Trans_custom_{timestamp}.svg")
# plt.savefig(filename, dpi=300, bbox_inches='tight') # uncomment to save the figure

plt.show()
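
The timestamp idea can be wrapped in a small helper so every figure gets a unique name. A sketch using a relative output folder; the helper `timestamped_path` and the "figures" subfolder are made up for this example:

```python
import os
from datetime import datetime

def timestamped_path(folder, stem, ext=".svg"):
    """Build folder/stem_YYYYmmdd_HHMMSS.ext (does not create any file)."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return os.path.join(folder, f"{stem}_{timestamp}{ext}")

# Relative output folder (hypothetical name)
outdir = os.path.join("007_heatmap_data", "figures")
# os.makedirs(outdir, exist_ok=True)  # uncomment to create the folder if it is missing

filename = timestamped_path(outdir, "DE_Trans_custom")
print(filename)
# plt.savefig(filename, dpi=300, bbox_inches="tight")  # uncomment to save the figure
```

The timestamp has a resolution of one second, so two saves within the same second would still produce the same file name.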