# For Loops and Functions

In this session we will learn about for loops and how to write a basic function. Both help keep code organised and efficient.

We will:
1) Introduce for loops
2) Introduce functions
3) Combine loops and functions
4) Use a function to easily reproduce plots in a for loop

__Additional Concepts__:
- Displaying images in a notebook (local and url)
- Help function for understanding functions
- Using "None" with parameters 

*Last edited: Isabella Casini 17.10.2025*

In [None]:
import pandas as pd
import numpy as np


import matplotlib.pyplot as plt # import pyplot from matplotlib and call it "plt"
import matplotlib as mpl
import seaborn as sns

from IPython.display import Image
from IPython.display import display

import os

from datetime import datetime

# 1) Introduction to "for loops"

In [None]:
# From a URL
display(Image(url='https://media.geeksforgeeks.org/wp-content/uploads/20191101172216/for-loop-python.jpg',width=750))
# From a local file # Put the local path to your image here
# display(Image(filename=r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\for-loop-python.jpg"))

In [None]:
# From a URL
display(Image(url='https://realpython.com/cdn-cgi/image/width=1920,format=auto/https://files.realpython.com/media/UPDATE-Python-for-Loops-Definite-Iteration_Watermarked.32bfd8825dfe.jpg',width=750))

In [None]:
# Example 1: Using a for loop to iterate over a list of fruits

fruits = ['apple', 'banana', 'cherry']

for fruit in fruits:
    print(fruit)

In [None]:
# Example 2: using the number index to access elements in the list

for i in range(len(fruits)):
    print("i:", i)
    print("fruit:", fruits[i])

In [None]:
# Example 3: Using loops with dictionaries

fruit_dict = {'a': 'apple', 'b': 'banana', 'c': 'cherry'}

for key in fruit_dict:
    print("key:", key)
    print("value:", fruit_dict[key])

In [None]:
# Example 4: Using loops with dictionaries (key and value together via .items())

for key, value in fruit_dict.items():
	print("key:", key)
	print("value:", value)
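
Example 2 uses range(len(fruits)) to get the index of each element. As a minimal sketch of an alternative, Python's built-in "enumerate" returns the index and the element together, so the fruits[i] lookup is not needed. The variable name "pairs" is only used here for illustration.

```python
# Same list as in Example 1
fruits = ['apple', 'banana', 'cherry']

# enumerate yields (index, element) pairs on each pass through the loop
for i, fruit in enumerate(fruits):
    print("i:", i)
    print("fruit:", fruit)

# Turn the result into a list to see the pairs that enumerate produces
pairs = list(enumerate(fruits))
print(pairs)
```

Both versions print the same lines; which one to use is mostly a question of readability.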

# 2) Introduction to functions

Functions have a few different components:
- "def", which is put at the beginning to indicate that a function is being defined
- function name
- Docstring - function description in triple quotes (optional, but recommended)
- parameters to pass from outside the function to inside the function (optional)
- function body which is the code
- "return" statement, which lets you return objects from inside the function back to the outside (optional)
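
The components listed above can be seen together in one small function. This is only a sketch: the function name "rectangle_area" and its parameters are made up for illustration and are not used elsewhere in this session.

```python
def rectangle_area(width, height=1.0):  # "def", function name, parameters
    """Return the area of a rectangle (width * height)."""  # docstring
    area = width * height  # function body
    return area  # "return" statement sends the result back to the caller


# Call the function and store the returned object outside the function
result = rectangle_area(3, 2)
print(result)

# height is optional because it has a default value of 1.0
print(rectangle_area(4))
```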

In [None]:
# simple function (with only def, function name, and function body)

def helloworld():
	print("Hello World")

In [None]:
# call your simple function
helloworld()

In [None]:
# function with parameters
def greet(name):
	print("Hello", name)

In [None]:
# call your greeting function
greet("Justin")

In [None]:
# function with return value and a docstring

def square(x):
	'''This function returns the square of a number (x)''' # the docstring goes right after the function header
	square_value = x * x
	# print(f"Square value of {x} is:", square_value)
	return square_value

## 2.5) The "help" function

The built-in help function retrieves the docstring (description) of another function. The better the docstring, the more helpful the output.

Docstrings should include:
- What a function does
- Its parameters - how to use it
- What it returns
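
As a sketch of a docstring that covers the three points above, here is a made-up function "cube" (not part of the session material). The layout with "Parameters:" and "Returns:" is one common convention, not a requirement of Python.

```python
def cube(x):
    """Return the cube of a number.

    Parameters:
    x (int or float): The number to cube.

    Returns:
    int or float: x multiplied by itself twice (x * x * x).
    """
    return x * x * x


help(cube)            # prints the function header and the docstring
print(cube.__doc__)   # the docstring is stored in the __doc__ attribute
```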

In [None]:
# A function with a docstring
help(square)

In [None]:
# greet has no docstring, so help only shows its name and parameters
help(greet)

In [None]:
# let's look at another example - the print function
help(print)

In [None]:
# value to be squared
value = 3

# call your squaring function
squared_value = square(value)
print("Returned value is:", squared_value)

# 3) Let's combine loops and functions

In [None]:
# I want to greet multiple people and give them their squared value
names = ['Justin', 'Haxby', 'Subaru','Isabella']
values = [2,3,4,5]

for i in range(len(names)):
    # print("\ni:", i)
    greet(names[i])
    square_value = square(values[i])
    print(f"Your square value is: {square_value}\n")
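
The loop above uses an index to pick matching entries from two lists. As an alternative sketch, the built-in "zip" walks through both lists at the same time. The dictionary "squared_by_name" is a hypothetical addition to keep the results; "square" is repeated here so the cell runs on its own.

```python
names = ['Justin', 'Haxby', 'Subaru', 'Isabella']
values = [2, 3, 4, 5]

def square(x):
    '''This function returns the square of a number (x)'''
    return x * x

squared_by_name = {}  # hypothetical dictionary to keep each person's result

# zip pairs the first name with the first value, the second with the second, and so on
for name, value in zip(names, values):
    squared_by_name[name] = square(value)
    print("Hello", name)
    print(f"Your square value is: {squared_by_name[name]}\n")
```

Note that zip stops at the end of the shorter list, so both lists should have the same length.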

# 4) Using loops and functions for plotting 3 heatmaps in one figure


## 4.1) Write a function to import the data from last session

In [None]:
# import data 
pathin = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\ProgrammingWorkshop_Git\ProgrammingWorkshop\007_heatmap_data\Data_S4.xlsx" # what we've used in the past


# Using relative path
# pathin = os.path.join("007_heatmap_data", "Data_S4.xlsx") # to go back one directory, use ".."
print(os.path.abspath(pathin))
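
The commented-out relative path above can be tried on its own. This is a minimal sketch, assuming the notebook is started from the folder that contains "007_heatmap_data"; if it is not, os.path.exists will simply print False.

```python
import os

# Build the path from its parts; os.path.join inserts the correct separator
relative_path = os.path.join("007_heatmap_data", "Data_S4.xlsx")

# ".." means "go up one directory" before entering the data folder
parent_path = os.path.join("..", "007_heatmap_data", "Data_S4.xlsx")

# abspath shows where the relative path points, based on the current working directory
print(os.path.abspath(relative_path))

# exists is a quick check that the file is where we think it is
print(os.path.exists(relative_path))
```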

In [None]:
# What we did before in session 007
# # Set the gene column to be the index
# df_MvsDH = pd.read_excel(pathin, sheet_name='Sheet4', index_col=0)
# df_ZvsDH = pd.read_excel(pathin, sheet_name='Sheet5', index_col=0)
# df_ZvsM = pd.read_excel(pathin, sheet_name='Sheet6', index_col=0)
#-------------------------------------------------------------------------------------------------------

def load_data(pathin, sheet_name, index_col=0):
	"""Load data from an Excel file as a pandas DataFrame.
	Parameters:
	pathin (str): The file path to the Excel file.
	sheet_name (str): The name of the sheet to load.
	index_col (int, optional): The column to set as the index. Defaults to 0.
	
	Returns:
	pd.DataFrame: The loaded DataFrame.
	"""
	# pass index_col through so that the parameter is actually used
	df = pd.read_excel(pathin, sheet_name=sheet_name, index_col=index_col)
	return df

In [None]:
# Now we can use our function to load each sheet
list_of_sheets = ['Sheet4', 'Sheet5', 'Sheet6'] # MvsDH, ZvsDH, ZvsM

list_of_dfs = [] # initialize an empty list to hold the dataframes that we will load

for sheet in list_of_sheets:
	df = load_data(pathin, sheet) # default index column is 0
	list_of_dfs.append(df)

## 4.2) Write a function to process the data for the heatmaps

In [None]:
# What we did before in session 007

# # Remove rows with NaN as index in df_MvsDH
# df_MvsDH = df_MvsDH[~df_MvsDH.index.isna()]

# # Remove rows with NaN as index in df_ZvsDH
# df_ZvsDH = df_ZvsDH[~df_ZvsDH.index.isna()]

# # Remove rows with NaN as index in df_ZvsM
# df_ZvsM = df_ZvsM[~df_ZvsM.index.isna()]


# df_MvsDH_filtered = df_MvsDH[(df_MvsDH["log2FoldChange"] <= -3) | (df_MvsDH["log2FoldChange"] >= 3)]
# df_ZvsDH_filtered = df_ZvsDH[(df_ZvsDH["log2FoldChange"] <= -3) | (df_ZvsDH["log2FoldChange"] >= 3)]
# df_ZvsM_filtered = df_ZvsM[(df_ZvsM["log2FoldChange"] <= -3) | (df_ZvsM["log2FoldChange"] >= 3)]
#-------------------------------------------------------------------------------------------------------

def clean_filter_df(df,cutoff):
	"""Clean the DataFrame by removing rows with NaN as index.
	Filter values where "log2FoldChange" is less than or equal to -cutoff or greater than or equal to cutoff.
	
	Parameters:
	df (pd.DataFrame): The DataFrame to clean.
	cutoff (float): The cutoff value for filtering "log2FoldChange".
	
	Returns:
	pd.DataFrame: The cleaned and filtered DataFrame.
	"""
	# Remove rows with NaN as index
	cleaned_df = df[~df.index.isna()]

	# filter based on cutoff
	cleaned_df = cleaned_df[(cleaned_df["log2FoldChange"] <= -cutoff) | (cleaned_df["log2FoldChange"] >= cutoff)]
	return cleaned_df


In [None]:
list_of_cleaned_filtered_dfs = [] # initialize an empty list to hold the cleaned and filtered dataframes

# loop through each dataframe in the list_of_dfs
for df in list_of_dfs:
	cleaned_filtered_df = clean_filter_df(df, cutoff=7) # run the cleaning and filtering function
	# print(cleaned_filtered_df)
	list_of_cleaned_filtered_dfs.append(cleaned_filtered_df) # add the cleaned and filtered dataframe to the new list

In [None]:
list_of_cleaned_filtered_dfs[0]

## 4.3) Exercise:
What could you change in the clean_filter_df function if you wanted to filter by a different column?
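
One possible answer, as a sketch only: add a parameter for the column name and give it a default value. The function name "clean_filter_df_by_column", the "column" parameter, and the toy data (including the "score" column) are made up for illustration.

```python
import pandas as pd

def clean_filter_df_by_column(df, cutoff, column="log2FoldChange"):
    """Remove rows with NaN as index, then keep rows where the chosen
    column is <= -cutoff or >= cutoff.

    Parameters:
    df (pd.DataFrame): The DataFrame to clean.
    cutoff (float): The cutoff value for filtering.
    column (str, optional): The column to filter by. Defaults to "log2FoldChange".

    Returns:
    pd.DataFrame: The cleaned and filtered DataFrame.
    """
    cleaned_df = df[~df.index.isna()]
    keep = (cleaned_df[column] <= -cutoff) | (cleaned_df[column] >= cutoff)
    return cleaned_df[keep]


# Toy data, not the workshop data
toy_df = pd.DataFrame(
    {"log2FoldChange": [8.0, -0.5, -9.0], "score": [1.0, 5.0, 2.0]},
    index=["geneA", "geneB", "geneC"],
)

by_fold_change = clean_filter_df_by_column(toy_df, cutoff=7)
by_score = clean_filter_df_by_column(toy_df, cutoff=4, column="score")
print(by_fold_change)
print(by_score)
```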

## 4.4) Write a function to plot a heatmap

In [None]:
# figure parameters

# Set global font and font size
mpl.rcParams['font.family'] = 'Arial'         # or 'DejaVu Sans', 'Times New Roman', etc.
mpl.rcParams['font.size'] = 10            # global font size

# Set global figure size (width, height) in inches
mpl.rcParams['figure.figsize'] = (8, 6)       # all figures will be 8x6 inches

mpl.rcParams['axes.titlesize'] = 10          # title font size
mpl.rcParams['axes.labelsize'] = 10          # axis label font size
mpl.rcParams['xtick.labelsize'] = 10          # x-axis tick font size
mpl.rcParams['ytick.labelsize'] = 10          # y-axis tick font size
mpl.rcParams['legend.fontsize'] = 10          # legend font size

In [None]:
# compare to what we did in session 007
def plot_heatmap(df, label, columns, mapcolor='coolwarm', mapsize=(2.5, 10), pathout=None):
	"""Plot a heatmap from a DataFrame.
	
	Parameters:
	df (pd.DataFrame): The DataFrame to plot.
	label (str): The label of the heatmap.
	columns (list): List of columns to include in the heatmap.
	mapcolor (str, optional): The colormap to use. Defaults to "coolwarm".
	mapsize (tuple, optional): The size of the figure (width, height) in inches. Defaults to (2.5, 10).
	pathout (str, optional): The path to save the figure. Defaults to None.

	Returns:
	None

	Plots a heatmap using seaborn and matplotlib.
	Saves the figure if pathout is provided.
	"""
	plt.figure(figsize=mapsize)
	ax = sns.heatmap(df[columns], cmap=mapcolor,
			 yticklabels=df.index, # use the gene names as labels; there are too many genes, so we skip some labels further below
			 linewidths=0.5, # with lines between cells
			 linecolor="white", # color of lines
    		 cbar_kws={"shrink": 0.5, "aspect": 10, "pad": 0.3})#, # colorbar settings
			#  center=0, annot=True, fmt=".2f")

	# Customize colorbar
	cbar = ax.collections[0].colorbar
	cbar.set_label("Log2FC",fontsize=10, fontweight="bold")
	cbar.ax.tick_params(labelsize=10)#, length=0)  # tick font size, tick length 0 to hide ticks

	# coords = (x, y) in axes fraction
	cbar.ax.yaxis.label.set_rotation(90)
	cbar.ax.yaxis.label.set_horizontalalignment("center")
	cbar.ax.yaxis.set_label_coords(-1.4, 0.5)  # move further left

	# Formatting of the labels and titles
	ax.set_title("DE Transcriptomics >7 | <-7", fontsize=12, pad=15) # the cutoff in the title is fixed text, so no f-string is needed
	ax.set_ylabel("DH Genes (Reference Strain)", fontsize=10,fontweight="bold")
	ax.set_xlabel("")  # keep empty
	ax.tick_params(axis="y", labelsize=8)#, which="both", length=0, rotation=0)  # tick label font size, tick length 0 to hide ticks

	# Replace the default x-axis tick label with the comparison label (assumes a single column is plotted)
	ax.set_xticklabels([label])

	step = max(1, len(df) // 40)  # show ~40 labels max, adjust as needed
	ax.set_yticks(ax.get_yticks()[::step]) # take every 'step' tick
	ax.set_yticklabels(df.index[::step]) # take every 'step' label

	# Add a second x-axis on top for "Group1"
	ax2 = ax.twiny()
	ax2.set_xlim(ax.get_xlim())  # match heatmap
	ax2.set_xticks([0.5])  # position at the centre of the single heatmap column
	ax2.set_xticklabels(["Group1"], fontsize=8, fontweight="bold",rotation=90)
	ax2.xaxis.tick_top()  # place ticks/labels on top
	ax2.tick_params(length=4)  # tick length; set length=0 to hide the ticks

	plt.tight_layout()

	if pathout:
		# Pull the timestamp (time now) in the format that you want, YYYYmmdd_HHMMSS
		timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
		# add the label and timestamp to the filename; os.path.join avoids the invalid "\D" escape and works on any operating system
		filename = os.path.join(pathout, f"DE_Trans_loop_{label}_{timestamp}.svg")
		plt.savefig(filename, dpi=300, bbox_inches='tight') # save the figure


	plt.show()
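
The pathout=None default in plot_heatmap is an example of using "None" with parameters: the parameter is optional, and the function only saves a file when a path is passed in. As a minimal sketch of the same pattern without any plotting, the hypothetical helper "build_output_name" below returns a timestamped file path, or None when no folder is given. The folder name "figures" is only an example.

```python
import os
from datetime import datetime

def build_output_name(label, pathout=None):
    """Return a timestamped file path, or None if no output folder is given.

    Parameters:
    label (str): Text to include in the filename.
    pathout (str, optional): The output folder. Defaults to None.

    Returns:
    str or None: The file path, or None when pathout is not provided.
    """
    if pathout is None:
        return None  # nothing to save, so there is no filename to build
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return os.path.join(pathout, f"DE_Trans_loop_{label}_{timestamp}.svg")


print(build_output_name("MM vs DH"))                     # no folder given
print(build_output_name("MM vs DH", pathout="figures"))  # folder given
```

Writing "if pathout is None" checks for None only, whereas "if pathout:" also treats an empty string as "no path". Either can be reasonable, depending on what you want to happen for an empty string.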

## 4.5) Use the heatmap plotting function for three different conditions in a for loop

In [None]:
labels = ["MM vs DH", "ZZ vs DH", "ZZ vs MM"]
pathout = r"C:\Users\uqicasin\Documents\Teaching\Program_Workshop\ProgrammingWorkshop_Git\ProgrammingWorkshop\007_heatmap_data" # change to your desired output path
for i in range(len(labels)):
	plot_heatmap(list_of_cleaned_filtered_dfs[i], label=labels[i], columns=["log2FoldChange"],pathout=pathout)