<img src="materials/images/introduction-to-visualization-cover.png"/>

# 👋 Welcome, before you start
<br>

### 📚 Module overview

We will go through two lessons with you:

- <font color=#E98300>**Lesson 1: Heatmap**</font>    `📍You are here.`
    
- [**Lesson 2: Volcano Plot**](Lesson_2_Volcano_Plot.ipynb)
<br>



<div class="alert alert-block alert-info">
<h3>⌨️ Keyboard shortcut</h3>

These common shortcuts can save you time as you work through this notebook:
- Run the current cell: **`Shift + Enter`**.
- Add a cell above the current cell: Press **`A`**.
- Add a cell below the current cell: Press **`B`**.
- Change a code cell to a Markdown cell: Select the cell, and then press **`M`**.
- Delete a cell: Press **`D`** twice.

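Need more help with keyboard shortcuts? Press **`H`** to look them up. The single-key shortcuts work in command mode, so press **`Esc`** first if you are typing inside a cell.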
</div>

---

# Lesson 1: Heatmap

⏲ `This module should take about 20 minutes to complete.`

A <mark>**heatmap**</mark> is a graphical representation of data where values are expressed as colors. It is an effective visual summary of information and enables a large volume of data to be communicated efficiently.

### ✅ `Run` each of the cells below:

## Preview dataset

Below, we will import a dataset of applicants for admission to graduate school.

In [None]:
import pandas as pd
import seaborn as sns

In [None]:
df = pd.read_csv("data/data_heatmap/grad_admit.csv")
df.head()

## View correlation matrix
Here, we view a matrix of the pairwise correlations between the numeric variables in the dataset. By default, `df.corr()` computes Pearson correlation coefficients, which range from -1 to 1.

In [None]:
df.corr(numeric_only=True)
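
To make clear what each entry of the correlation matrix means, here is a minimal sketch that computes one Pearson coefficient by hand and compares it with the value pandas reports. The toy DataFrame and its column names (`score_a`, `score_b`) are made up for illustration and are not taken from `grad_admit.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "score_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score_b": [2.0, 4.0, 5.0, 4.0, 5.0],
})

# Pearson r: sum of products of the centered values, divided by the
# square root of the product of their sums of squares.
x = toy["score_a"] - toy["score_a"].mean()
y = toy["score_b"] - toy["score_b"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# The same coefficient read from the correlation matrix.
r_pandas = toy.corr(numeric_only=True).loc["score_a", "score_b"]

print(round(float(r_manual), 4), round(float(r_pandas), 4))
```

Both numbers should agree, and the diagonal of the matrix is always 1 because every variable is perfectly correlated with itself.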

## View correlation matrix as a heatmap

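Below, we render the correlation matrix as a heatmap in a family of blue colors. The darker the blue, the higher the correlation between a given variable pair; the lighter the blue, the lower the correlation. Because `Blues` is a sequential colormap, any negative correlations appear as the lightest cells. Setting `annot=True` prints each coefficient inside its cell.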

In [None]:
sns.heatmap(df.corr(numeric_only=True), cmap="Blues", annot=True);
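
When a correlation matrix contains negative values, a diverging colormap can keep the sign visible. The sketch below is one possible variation, not part of the original lesson: the toy data and its column names are hypothetical, and `RdBu_r`, `vmin`, `vmax`, and `center` are standard seaborn/matplotlib options chosen here for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data containing one negative relationship;
# these columns are not from grad_admit.csv.
toy = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "test_score": [52, 60, 71, 80, 88],
    "hours_gaming": [9, 8, 6, 4, 2],
})
corr = toy.corr(numeric_only=True)

# Fix the color scale to the full correlation range [-1, 1] and center it on 0.
# With "RdBu_r", positive values shade red and negative values shade blue.
ax = sns.heatmap(corr, cmap="RdBu_r", vmin=-1, vmax=1, center=0, annot=True)
```

Fixing `vmin` and `vmax` also makes heatmaps from different datasets comparable, since the same color always maps to the same coefficient.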

---

# Heatmap

A heatmap is a common method of visualizing <mark>gene expression changes</mark> across hundreds to thousands of genes under different treatment conditions. The heatmap may also be combined with clustering methods, which group genes and/or samples together based on the similarity of their gene expression patterns. This can be useful for identifying genes that are commonly regulated, or biological signatures associated with a particular condition (e.g., a disease or an environmental condition).

Genes are represented in the rows of the matrix and chips/samples in the columns. The heatmap displays this matrix as a colored grid: the number of rows equals the number of genes being analyzed, and the number of columns equals the number of chips/samples.
Each box of the grid is colored according to the numerical value in the corresponding matrix cell (the gene expression value).

<img src="materials/images/images_heatmap/sample_heatmap.png"/>

A heatmap lets you pick out genes based on their expression levels under different conditions. Some genes may not change, but those that do are of the greatest interest because they indicate expression associated with a particular condition. Heatmaps also help identify groups of genes that share similar expression patterns.
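
The idea of grouping genes by the similarity of their expression patterns can be sketched with hierarchical clustering from SciPy. This is a minimal illustration under stated assumptions: the gene names, sample names, and values are hypothetical, and average linkage with Euclidean distance is only one of several reasonable choices.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical expression changes; gene and sample names are made up.
expr = pd.DataFrame(
    {
        "treat_1": [2.1, -1.9, 2.3, -2.2],
        "treat_2": [1.8, -2.1, 2.0, -1.7],
        "control": [0.1, 0.2, -0.1, 0.0],
    },
    index=["gene_A", "gene_B", "gene_C", "gene_D"],
)

# Average-linkage hierarchical clustering on the rows (genes),
# using Euclidean distance between expression profiles.
row_linkage = linkage(expr.values, method="average", metric="euclidean")

# leaves_list returns the dendrogram leaf order; reordering the rows by it
# places genes with similar profiles next to each other.
order = leaves_list(row_linkage)
clustered = expr.iloc[order]
print(clustered.index.tolist())
```

In the reordered table, `gene_A` and `gene_C` (both up in the treatments) sit together, as do `gene_B` and `gene_D` (both down). Plotting `clustered` as a heatmap would show these as contiguous color blocks; seaborn's `clustermap` performs the clustering and the drawing in one call.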

---

### ✅ `Run` each of the cells below:

### Sample gene expression data set
The dataset contains gene names (rows) and treatment conditions (columns).

In [None]:
import pandas as pd
from bioinfokit import visuz  # only visuz (plotting) is used in this lesson

# Load the gene expression table
df = pd.read_csv("data/data_heatmap/gene_expression.csv")
# Set gene names (the first column) as the index
df = df.set_index(df.columns[0])
df.head()

## Heatmap of gene expression data

Heatmaps show relationships between two variables, one plotted on each axis. By observing how cell colors change along each axis, you can see whether the values follow a pattern for one or both variables. Below, we use a colormap that runs from red to green with yellow as the midpoint. The x-axis represents the treatment conditions and the y-axis represents the gene names.

In gene expression heatmaps, the data is displayed in a grid where each row represents a gene and each column represents a sample. The color and intensity of each box represent changes in gene expression, not absolute values. In the following heatmap, <mark>red represents down-regulated genes and green represents up-regulated genes. Yellow represents unchanged expression.</mark>

In [None]:
# rowclus=False and colclus=False keep rows and columns in their original order (no clustering)
visuz.gene_exp.hmap(df=df, rowclus=False, colclus=False, cmap='RdYlGn', tickfont=(6, 4), show=True)
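
The text above notes that the colors represent changes rather than absolute values. One common way to obtain such relative values is a row-wise z-score, sketched below with pandas. This is an assumption for illustration only: it may not be how `gene_expression.csv` was prepared, and the gene names, condition names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical absolute expression values; names are made up.
raw = pd.DataFrame(
    {"cond_1": [10.0, 200.0], "cond_2": [20.0, 400.0], "cond_3": [30.0, 600.0]},
    index=["gene_low", "gene_high"],
)

# Row-wise z-score: subtract each gene's mean and divide by its
# standard deviation, so every gene is on the same relative scale.
row_mean = raw.mean(axis=1)
row_std = raw.std(axis=1)  # sample standard deviation (ddof=1), the pandas default
zscores = raw.sub(row_mean, axis=0).div(row_std, axis=0)

print(zscores)
```

Both genes end up with the values -1, 0, and 1 across the three conditions, even though `gene_high` is expressed at 20 times the level of `gene_low`. Without this kind of scaling, a few highly expressed genes would dominate the color range and hide the pattern in the rest.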

---

# 🌟 Ready for the next one?
<br>

- [**Lesson 2: Volcano Plot**](Lesson_2_Volcano_Plot.ipynb)

---

# Contributions & acknowledgment

Thanks to Antony Ross for contributing the content of this notebook.

---

Copyright (c) 2022 Stanford Data Ocean (SDO)

All rights reserved.