<a href="https://colab.research.google.com/github/nunososorio/SingleCellGenomics2024/blob/main/5_Friday_April12th/Spatial_Transcriptomics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Spatial transcriptomics

In this part of the course we will focus on spatial transcriptomics data. Spatial transcriptomic technologies allow the systematic measurement of gene expression levels throughout a tissue sample, increasing our understanding of cellular organisation and interactions within tissues while also providing biological insights into a wide range of subjects and diseases. Multiple types of spatial techniques have been developed, and they vary in spatial resolution, multiplexing capability, sensitivity, coverage, and throughput.

The methods developed so far can be divided into four categories:<br>
- **Sequencing-based:** 10X Genomics Visium, Stereo-seq, Slide-seq, Light-seq;<br>
- **Probe-based:** NanoString GeoMx;<br>
- **Imaging-based:** NanoString CosMx SMI, STARmap, MERFISH, seqFISH;<br>
- **Image-guided spatially resolved:** NICHE-seq, Geo-seq, Zip-seq.<br>

Some methodologies also give access to other types of omics in addition to transcriptomics (RNA), such as DNA, protein, metabolite, chromatin accessibility, and histone modification, among others. For more information on this topic you can check the following paper, doi: 10.3390/cells12162042

# Load packages

In this tutorial we'll be using the following packages to perform the basics of spatial transcriptomics data analysis.

*Scanpy* will be used to perform the data analysis and visualization, as done in the previous notebook for scRNA-seq.

*Pandas* and *Numpy* will be used to manipulate data matrices.

*seaborn* and *matplotlib* will be used in some cases to plot the results.

For more advanced analyses we can look into *squidpy*.

In [None]:
! pip install scanpy > _

In [None]:
import scanpy as sc
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sc.settings.verbosity = 3  

# Data set

This dataset was retrieved from the GEO database under the accession number ***GSE226208***. The associated paper is entitled "*Shared inflammatory glial cell signature after brain injury, revealed by spatial, temporal and cell-type-specific profiling of the murine cerebral cortex*" and has the following doi: https://doi.org/10.1101/2023.02.24.529840. The goal of the paper was to understand the mechanisms at work after traumatic brain injury (TBI). The study comprises two datasets: a spatial dataset, which is the one we are using, and a scRNA-seq dataset, which we will not use in this course. The two were used to analyse the transcriptomic signature of the injured mouse cerebral cortex.

The main results were the identification of specific states for microglia, astrocytes, and OPCs which comprised some genes related to injury, including genes related to inflammatory responses of the innate immune system (Cxcr3).

The spatial dataset is composed of 2 samples, one from a healthy mouse brain (Intact/ctl) and another from a brain 3 days post-injury (3DPI/dpi3). The brain injuries were stab wounds induced in the cerebral cortex, affecting only the grey matter. Each spatial transcriptomics slide contains 2 tissue slices from each condition, one with the normal orientation and the other upside-down.

In summary, this study serves as a spatial, temporal, and cell-type-specific map of the mouse cortex, revealing the inflammatory signatures of glial cells after trauma.

The technique used to sequence these samples was the Visium technology from 10X, where each slice is covered by about 5000 spots of 55 $\mu$m in diameter, each with a unique barcode. Each spot can capture the transcripts of 1 to 10 cells, so the number of cells per spot depends on the size of each cell and of its neighbours. To capture the mRNA, the tissue is permeabilized so that the mRNA can bind to the capture oligonucleotides present in each spot. The captured mRNA is then reverse-transcribed into cDNA and the sequencing library is prepared. To obtain the spatial information, the tissue is stained with H&E or immunofluorescence (IF) and imaged before permeabilization.

Here we start from the count matrix rather than from the FASTQ files, which means that the authors have already run Space Ranger, the equivalent of Cell Ranger for spatial data.

Here is a figure to explain more about the dataset we will use:

<img src="https://github.com/nunososorio/SingleCellGenomics2024/blob/main/5_Friday_April12th/overview_figure.png?raw=true" alt="AnnData" style="width:600px; height:auto;"/>

# Download the data to use in this tutorial

In [None]:
! wget https://github.com/nunososorio/SingleCellGenomics2024/raw/main/5_Friday_April12th/Data.zip

! unzip Data.zip

# Load the data

In [None]:
ctl = sc.read_visium(path='Data/GSE226208/intact')
dpi3 = sc.read_visium(path='Data/GSE226208/3dpi/')

# Pre-processing

## Check data structure

### Exercise

Check the data structure for each condition

In [None]:
#write your code here

In [None]:
#write your code here

In the following steps we'll make the *var_names* (gene names) unique in each dataset and store a copy of the raw data of each sample in its *raw* attribute, in case we need the raw counts matrix in the downstream analysis.

Lastly, we'll create a new column in the *obs* slot, named *Condition*, recording the origin of each sample, that is, whether the data comes from the Intact cortex or from 3DPI.

In [None]:
ctl.var_names_make_unique()
dpi3.var_names_make_unique()

In [None]:
ctl.raw = ctl.copy()
dpi3.raw = dpi3.copy()

In [None]:
ctl.obs['Condition'] = 'Intact'
dpi3.obs['Condition'] = '3DPI'

Here we assign a color to each condition: pastel blue for the Intact sample and pastel red for the 3DPI sample. This helps when plotting, since we already know which color represents which condition.

In [None]:
ctl_col = '#a1c9f4'
dpi3_col = '#ff9f9b'

Next we calculate some basic metrics for each sample, so that we can compare them and decide on the next steps. The following command calculates several quality metrics; the most important ones, and the ones we'll use, are the total number of counts and the number of genes by counts.
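To make these two metrics concrete, here is a minimal sketch on a toy counts matrix. The matrix values are made up for illustration; on the real data both metrics are computed by `sc.pp.calculate_qc_metrics`.

```python
import numpy as np

# Toy counts matrix (hypothetical values): 3 spots (rows) x 4 genes (columns).
counts = np.array([
    [5, 0, 2, 0],
    [0, 0, 1, 0],
    [3, 4, 0, 7],
])

# total_counts: sum of all counts captured in each spot.
total_counts = counts.sum(axis=1)

# n_genes_by_counts: number of genes with at least one count in each spot.
n_genes_by_counts = (counts > 0).sum(axis=1)

print(total_counts)
print(n_genes_by_counts)
```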

In [None]:
ctl_m = ctl.copy()
dpi3_m = dpi3.copy()

In [None]:
sc.pp.calculate_qc_metrics(ctl_m, inplace=True)
sc.pp.calculate_qc_metrics(dpi3_m, inplace=True)

In [None]:
print(f"{ctl_m} \n\n {dpi3_m}")

Here the 2 metrics mentioned above will be plotted side by side for each sample, and the plots will be colored according to the chosen colors.

In [None]:
fig, axs = plt.subplots(1,4, figsize=(15,4))
fig.suptitle('Covariates for filtering')
sns.histplot(ctl_m.obs['total_counts'], kde=False, bins=60, ax = axs[0], color=ctl_col)
sns.histplot(dpi3_m.obs['total_counts'], kde=False, bins=60, ax = axs[1], color=dpi3_col)
sns.histplot(ctl_m.obs['n_genes_by_counts'], kde=False, bins=60, ax = axs[2], color=ctl_col)
sns.histplot(dpi3_m.obs['n_genes_by_counts'], kde=False, bins=60, ax = axs[3], color=dpi3_col)

**NOTE:** Here we can ask how they perceive the samples, that is, whether their quality looks similar based on the distributions in the histograms, since we will use this to decide whether to merge the samples.

# Merge data

When the project we are working on is composed of more than one dataset from different samples, we can merge all the small datasets into a larger one containing all the samples. This helps the filtering process, since we work with only one big object instead of several small ones, and it also makes the filtering more uniform across samples. However, when the quality of the samples differs considerably, the filtering should be performed on each sample individually, and only after that should we merge the samples.

## Merge counts data

Since we are working with spatial data, merging our samples into one object requires a few preparatory steps so that we can still visualize the data projected on the image. For that we need the package *stlearn*. This package also offers a wide range of downstream analyses and is worth checking if you need to analyze spatial data in the future.

To use this package we only need to install it, as shown in the next code cell.

In [None]:
! pip install stlearn > _

Once installed, we load it and then convert our individual datasets from the scanpy format to the stlearn format. This step adds two new columns to the *obs* slot, called *imagecol* and *imagerow*, which are the ones used later to merge both images. The merging code also uses the *numpy* package to manipulate the arrays containing the image data and the spot coordinates.

In [None]:
import stlearn as st

In [None]:
ctl = st.convert_scanpy(ctl)
dpi3 = st.convert_scanpy(dpi3)

Once our data is converted we can now merge both samples into one big dataset.

To merge two or more samples into one data object we use the AnnData method *concatenate*, as follows. We select one of the samples as the base to which all the other samples are concatenated. In our case we'll concatenate the 3DPI sample to the Intact sample.
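With its default arguments, *concatenate* also records where each spot came from: it adds a *batch* column to *obs* (`"0"` for the first sample, `"1"` for the second) and appends the batch number to each barcode so that barcodes shared between samples stay unique. The following pandas sketch only imitates that bookkeeping on toy tables; the barcodes are hypothetical and this is not the AnnData implementation.

```python
import pandas as pd

# Toy obs tables (hypothetical barcodes) standing in for ctl.obs and dpi3.obs.
ctl_obs = pd.DataFrame({"Condition": ["Intact", "Intact"]}, index=["AAAC-1", "AAAG-1"])
dpi3_obs = pd.DataFrame({"Condition": ["3DPI", "3DPI"]}, index=["AAAC-1", "AAAT-1"])

merged = []
for i, obs in enumerate([ctl_obs, dpi3_obs]):
    obs = obs.copy()
    obs["batch"] = str(i)                 # "0" for the first sample, "1" for the second
    obs.index = obs.index + "-" + str(i)  # the suffix keeps shared barcodes unique
    merged.append(obs)

merged_obs = pd.concat(merged)
print(merged_obs)
```

The *batch* column is what allows us to select the spots of the second sample later on, when shifting their coordinates on the combined image.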

**NOTE:** Here we can ask if they think we should merge both samples into one object.

In [None]:
adata = ctl.concatenate(dpi3)

### Exercise

Check the data structure and also check both *var* and *obs* slots of the data. Do you see any difference when comparing to scRNA-seq data?

In [None]:
#write your code here


In [None]:
#write your code here


In [None]:
#write your code here


## Merge images data

Even though *scanpy* does not offer an option to integrate spatial images, we can do it manually. However, we first need to convert our samples from the *scanpy* format to the *stlearn* format, as we already did when merging the counts data. With this step done, we do not need to split our data into two distinct datasets to visualize it on the spatial images.

To merge our images we will also use the package *numpy*, since we need to manipulate the arrays containing the pixels and coordinates of each image. This step allows us to plot the slices side by side as if they were a single image, avoiding the need to plot each sample separately.
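The idea can be sketched on toy arrays before applying it to the real images. The image sizes and spot coordinates below are hypothetical; the point is only that the second sample's column coordinates must be shifted by the width of the first image.

```python
import numpy as np

# Toy stand-ins (hypothetical sizes) for the two "hires" images: height x width x RGB.
img_ctl = np.zeros((4, 6, 3))
img_dpi3 = np.ones((4, 5, 3))

# Stack the images side by side; both must have the same height.
combined = np.hstack([img_ctl, img_dpi3])

# Toy pixel coordinates (column, row) of two spots in the second image.
spots_dpi3 = np.array([[1.0, 2.0], [4.0, 3.0]])

# Shift the column coordinate by the width of the first image so the
# spots land on the right-hand half of the combined image.
shifted = spots_dpi3.copy()
shifted[:, 0] += img_ctl.shape[1]

print(combined.shape)
print(shifted)
```

In the cell below the shift is written as the constant 2000, which presumes that the first image is 2000 pixels wide. Reading the width from the image's `shape`, as in this sketch, is one way to avoid depending on that assumption.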

In [None]:
# Initialize the spatial
adata.uns["spatial"] = ctl.uns["spatial"]

# Horizontally stack 2 images from section 1 and section 2 datasets
combined = np.hstack([ctl.uns["spatial"]["D_Intact"]["images"]["hires"],
                      dpi3.uns["spatial"]["A_3dpi_V"]["images"]["hires"]])

# Map the image to the concatenated adata object
adata.uns["spatial"]["D_Intact"]["images"]["hires"] = combined

# Manually change the coordinate of spots to the right
adata.obs.loc[adata.obs.batch == "1","imagecol"] = adata.obs.loc[adata.obs.batch == "1","imagecol"].values + 2000

# Change to the .obsm["spatial"]
factor = adata.uns["spatial"]["D_Intact"]["scalefactors"]["tissue_hires_scalef"]
adata.obsm["spatial"] = adata.obs[["imagecol","imagerow"]].values / factor

### Exercise
Save the merged data object into your working directory.

In [None]:
# Write your code here

# QC

One of the first steps of quality control is to compute QC metrics such as the ones calculated previously (total counts and number of genes by counts). Here we will additionally flag the mitochondrial (mt) and ribosomal (ribo) genes and calculate the percentage of counts coming from each class. To do so, we add a new column to the *var* slot with True or False values for each class. We use the string method *startswith* to find the genes whose names start with ***mt-*** (mitochondrial) or with ***Rps***/***Rpl*** (ribosomal).


We will then use the function *sc.pp.calculate_qc_metrics*, already used above.
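As a minimal sketch of what these percentages mean, the following toy example flags genes by prefix and computes the share of each spot's counts that comes from them. The gene list and counts are made up; on the real data the percentages are produced by `sc.pp.calculate_qc_metrics`.

```python
import numpy as np
import pandas as pd

# Toy gene names and counts (hypothetical values): 2 spots x 5 genes.
genes = pd.Index(["mt-Co1", "Rps3", "Rpl4", "Gfap", "Mtor"])
counts = np.array([
    [10, 5, 5, 30, 0],
    [2, 0, 8, 10, 0],
])

# Boolean flag per gene; startswith is case-sensitive, so "Mtor" is not flagged as mt.
is_mt = np.asarray(genes.str.startswith("mt-"), dtype=bool)
is_ribo = np.asarray(genes.str.startswith(("Rps", "Rpl")), dtype=bool)

# Percentage of each spot's counts coming from the flagged genes.
total = counts.sum(axis=1)
pct_counts_mt = 100 * counts[:, is_mt].sum(axis=1) / total
pct_counts_ribo = 100 * counts[:, is_ribo].sum(axis=1) / total

print(pct_counts_mt)
print(pct_counts_ribo)
```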

## QC metrics

In [None]:
adata.var["mt"] = adata.var_names.str.startswith("mt-")

adata.var["ribo"] = adata.var_names.str.startswith(("Rps","Rpl"))

#fill in the blanks
#Calculate qc metrics for mt and ribo genes
sc.pp.calculate_qc_metrics(, qc_vars=[], inplace=True)


In [None]:
adata

## Cell cycle

As for the scRNA-seq data analysis, here we can also add cell cycle information to the data.
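As a reminder of how a phase is derived from the two scores, the rule can be sketched as follows. The scores are made-up numbers, and the rule shown (G1 when both scores are negative, otherwise the phase with the higher score) is, as far as we know, the one applied by `sc.tl.score_genes_cell_cycle`; treat it as an illustration rather than the library's code.

```python
import pandas as pd

# Hypothetical S and G2M scores for four spots.
scores = pd.DataFrame({
    "S_score": [0.30, -0.10, 0.05, -0.20],
    "G2M_score": [0.10, -0.40, 0.25, 0.15],
})

phase = pd.Series("S", index=scores.index)              # start by calling every spot S
phase[scores["G2M_score"] > scores["S_score"]] = "G2M"  # G2M where its score is higher
phase[(scores < 0).all(axis=1)] = "G1"                  # G1 where both scores are negative

print(phase.tolist())
```

Keep in mind that a spot may contain several cells, so here the phase is a summary of the spot rather than the state of a single cell.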

### Exercise

Look inside the Data folder you downloaded at the beginning of the notebook for the files containing the cell cycle genes and load them in the following code cell. Then calculate the cell cycle score.

In [None]:
#Fill in the blank spaces
#Load the files containing the genes for each phase of the cell cycle
s_genes = [x.strip() for x in open(  )]
g2m_genes = [x.strip() for x in open(  )]

In [None]:
#fill in the blank spaces
sc.tl.score_genes_cell_cycle(  )

Then we plot the calculated values and choose the thresholds for our data. This visualization can be done with histograms or violin plots.

In [None]:
fig, axs = plt.subplots(1,4, figsize=(15,4))
fig.suptitle('Covariates for filtering')
sns.histplot(adata.obs['total_counts'], kde=False, ax = axs[0])
sns.histplot(adata.obs['n_genes_by_counts'], kde=False, bins=60, ax = axs[1])
sns.histplot(adata.obs['pct_counts_mt'], kde=False, bins=60, ax = axs[2])
sns.histplot(adata.obs['pct_counts_ribo'], kde=False, bins=60, ax = axs[3])

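Here we zoom in on the tails of the histograms to choose the threshold for each filtering parameter: the lower tail for total counts and number of genes, and the upper tail for the mt and ribo percentages.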

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(15, 4))
sns.histplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=60, ax=axs[0])
sns.histplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[1])
sns.histplot(adata.obs["pct_counts_mt"][adata.obs["pct_counts_mt"] > 25], kde=False, bins=60, ax=axs[2])
sns.histplot(adata.obs["pct_counts_ribo"][adata.obs["pct_counts_ribo"] > 10], kde=False, bins=60, ax=axs[3])

In [None]:
sc.pl.violin(adata, keys=['total_counts','n_genes_by_counts', 'pct_counts_mt', 'pct_counts_ribo'], rotation=90, multi_panel=True)

### Exercise

Choose the thresholds to filter the data. Since the resolution is not single cell, we should take this into account when choosing the values. Keep in mind that here we are not filtering out cells but spots, which may contain more than one cell.

**Hint**: The code used to filter spatial data is the same used to filter the scRNA-seq data.
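For the metrics that have no dedicated filtering function, such as the mt and ribo percentages, the filter comes down to a boolean mask over the spots. The sketch below shows the pattern on a toy table; the values and thresholds are placeholders chosen only for this example, not recommendations for the dataset.

```python
import pandas as pd

# Hypothetical per-spot QC table, standing in for adata.obs.
obs = pd.DataFrame({
    "total_counts": [300, 8000, 15000, 60000],
    "n_genes_by_counts": [150, 3000, 5000, 9000],
    "pct_counts_mt": [35.0, 12.0, 8.0, 10.0],
})

# Placeholder thresholds, chosen only for this toy table.
min_counts, max_counts, max_pct_mt = 500, 50000, 30.0

# One boolean per spot: True means the spot passes every criterion.
keep = (
    (obs["total_counts"] >= min_counts)
    & (obs["total_counts"] <= max_counts)
    & (obs["pct_counts_mt"] < max_pct_mt)
)
filtered = obs[keep]

print(keep.tolist())
```

A mask built this way from `adata.obs` can be used to subset the rows (spots) of the AnnData object.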


In [None]:
#write your code here

#filter counts
sc.pp.filter_cells(adata, min_counts =)
sc.pp.filter_cells(adata, max_counts =)

#filter genes
sc.pp.filter_cells(adata, min_genes =)
sc.pp.filter_genes(adata, min_cells =)

#filter mito genes
#write your code here

#filter ribo genes
#write your code here

In [None]:
sc.pl.violin(adata, keys=['total_counts','n_genes_by_counts', 'pct_counts_mt', 'pct_counts_ribo'], rotation=90, multi_panel=True)

### Exercise
Save the filtered data into your working directory

In [None]:
# Write your code here

# Normalization and log transformation

After filtering out the low-quality spots we proceed to normalization and log transformation. Here we will use the default method to normalize the data, which normalizes each spot (a "cell" in scanpy's terminology) by its total counts over all genes, so that every spot has the same total count after normalization. Other methods, such as SCTransform or GLM-PCA, could also have been used and can offer higher sensitivity.
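The two steps can be written out by hand on a toy matrix. This is a minimal numpy sketch with made-up counts; it assumes the default behaviour of `sc.pp.normalize_total`, where the common target is the median of the total counts per spot before normalization.

```python
import numpy as np

# Toy counts (hypothetical values): 3 spots x 3 genes.
counts = np.array([
    [10.0, 0.0, 30.0],
    [1.0, 1.0, 2.0],
    [5.0, 5.0, 10.0],
])

totals = counts.sum(axis=1, keepdims=True)  # total counts per spot
target = np.median(totals)                  # common target: the median total
normalized = counts / totals * target       # every spot now sums to the target
logged = np.log1p(normalized)               # log(1 + x), so zeros stay zero

print(normalized.sum(axis=1))
```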

In [None]:
sc.pp.normalize_total(adata, inplace = True)
sc.pp.log1p(adata)

In this case we will not regress out any variable, such as the percentage of mt genes or the cell cycle, since these variables may play a role in the variation of the cell types after injury.

# Dimensionality reduction

Once filtering is done, the next step, as with scRNA-seq data, is to reduce the dimension of the dataset we are working with.

The first step is to select the highly variable genes in our data, followed by principal component analysis and uniform manifold approximation and projection.

## Highly Variable Genes (HVG)

In this case we will decide how many genes we want to keep, which is 4000. This value is up to you; many tutorials use 2000 genes, but for the purpose of this course we will keep a few more.

It is also possible to define thresholds on different metrics, in which case the function selects the genes that pass those thresholds as the HVG of our dataset. The method to use depends on the data you are working with and on which one best fits the goal of your analysis. For more information, check the documentation of the function *highly_variable_genes*.
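To give an intuition for what "top n genes" means, here is a deliberately simplified sketch that ranks genes by their variance-to-mean ratio (dispersion) and keeps the highest ones. The data are simulated, and this is not scanpy's procedure: the default method in *highly_variable_genes* additionally bins genes by mean expression and normalizes the dispersion within each bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix (simulated): 200 spots x 50 genes.
X = rng.poisson(lam=2.0, size=(200, 50)).astype(float)
# Make the first 5 genes much more variable by scaling them in half of the spots.
X[:100, :5] *= 6

mean = X.mean(axis=0)
var = X.var(axis=0, ddof=1)
dispersion = var / mean  # variance-to-mean ratio per gene

n_top_genes = 5  # placeholder; the tutorial keeps 4000
top = np.argsort(dispersion)[::-1][:n_top_genes]

# Boolean mask per gene, analogous to the highly_variable column in var.
highly_variable = np.zeros(X.shape[1], dtype=bool)
highly_variable[top] = True

print(sorted(top.tolist()))
```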

### Exercise

Calculate the HVG, selecting the top 4000 genes.

In [None]:
#fill in the blanks
sc.pp.highly_variable_genes(, inplace=True)

In [None]:
sc.pl.highly_variable_genes(adata)

Here we keep only the HVG in the data, removing all the genes that are not highly variable.

In [None]:
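# Boolean mask marking the highly variable genes
var_genes_all = adata.var.highly_variable

# Keep only the HVG; .copy() turns the resulting view into an independent object
adata = adata[:, var_genes_all].copy()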

adata

## Principal Component Analysis (PCA)

The PCA is performed with the same goal as in the analysis of scRNA-seq data: to reduce the dimension of the dataset to the lowest possible number of principal components (PCs) that still retain most of the variation within the data.

Here the PCA will be performed with the default settings; nonetheless, these values can be adjusted to the data we are working with.

In our case we don't need to set the parameter ***use_highly_variable***, since our dataset already contains only the HVG.
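For readers who want to see what the PCA computes, the following sketch derives the PC coordinates and the variance ratio of each PC from a singular value decomposition of the centered matrix. The data are random toy values and the variable names are our own; scanpy's implementation may use a different solver and settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # toy data (simulated): 100 spots x 20 genes
X[:, 0] *= 5                    # give one gene most of the variance

Xc = X - X.mean(axis=0)         # center each gene
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_comps = 10
X_pca = U[:, :n_comps] * S[:n_comps]  # spot coordinates in PC space

# Variance carried by each PC, and its share of the total variance.
explained_variance = S**2 / (X.shape[0] - 1)
variance_ratio = explained_variance / explained_variance.sum()

print(variance_ratio[:3])
```

The `variance_ratio` values are what a variance ratio plot displays: they decrease from one PC to the next, and the point where the curve flattens suggests how many PCs to keep.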

### Exercise

Calculate the PCA using the function *sc.pp.pca()*

In [None]:
#write here your code

Next we plot the variance of the calculated PCs in order to select how many we want to keep. Usually the rule is that we select the last one before the curve of the plot starts to flatten. However we can select a little fewer or a little more.

### Exercise

Plot the variance ratio for the calculated PCs, and show 50 PCs on the plot.

**Hint:** The code is the same as the one used for scRNA-seq data.

In [None]:
#write here your code
sc.pl.pca_variance_ratio()

## Uniform Manifold Approximation and Projection (UMAP)

This method is used to visualize the data in 2D, making it simpler to interpret. To compute the UMAP we use the number of PCs we selected in the previous step.

### Exercise

Compute the UMAP using the number of PC that you think fits the best. Use the functions *sc.pp.neighbors()* and *sc.tl.umap()*.

In [None]:
#fill in the blanks and write your code
sc.pp.neighbors(n_neighbors=15)


Now we can visualize the data in reduced dimensions, for example by plotting the metrics calculated in the QC step in a 2D space.

### Exercise

Plot the metrics used in quality control and the variable *Condition* on the UMAP, so you can see where the spots with the highest and lowest values of each metric are.

**Metrics:** *'total_counts'*, *'n_genes_by_counts'*, *'pct_counts_mt'*, *'pct_counts_ribo'*.

In [None]:
#fill in the missing values from the function
sc.pl.umap(adata, color=, ncols=2, cmap='viridis')
sc.pl.umap(adata, color=, palette=[dpi3_col, ctl_col])

# Clustering

For spatial data, clustering can be performed as for scRNA-seq data, and the same principles apply here.

### Exercise
Fill in the commented lines with your code. To make things easier, we will create a list in which we save the name given to each cluster resolution, so we can use it later in the analysis.

In [None]:
#create your empty list here


# complete the commented lines
for  in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4]: #choose the name of your iteration variable; in each loop it will take a different resolution value
    sc.tl.leiden(, resolution= ,  key_added='clusters_%s'%  )
    #append the name of each cluster resolution to the list you created; use the following string formatting to name it, e.g. clustering with resolution 0.1 will be called clusters_0.1
    ('clusters_%s'%   )


### Exercise

Plot all the cluster resolutions on a UMAP; use the list we created above to make it easier.

In [None]:
# fill in the blank spaces
sc.pl.umap( ncols=3, wspace = 0.6)

Since scanpy does not offer a tool to integrate the images of the dataset, we now need to split our dataset again into two different objects, but this time all the pre-processing and dimensionality reduction of the data has already been performed. For that we split our ***adata*** object into ***adata_ctl*** and ***adata_dpi3*** using the column ***Condition*** in the *obs* slot of the data, indicating which condition to keep in each object.

In [None]:
adata_ctl = adata[adata.obs['Condition'] == 'Intact',:]

adata_dpi3 = adata[adata.obs['Condition'] == '3DPI',:]

In [None]:
adata

In [None]:
print(f"{adata_ctl} \n\n {adata_dpi3}")

Once our data is split in two, we can plot it on the spatial images.

### Exercise

Plot the different control metrics for each condition on the spatial image: *'total_counts'*, *'n_genes_by_counts'*, *'pct_counts_mt'*, *'pct_counts_ribo'*.

In [None]:
sc.settings.set_figure_params(dpi=150)

#fill in the empty spaces

sc.pl.spatial(, img_key = "hires", color=)
sc.pl.spatial(, img_key = "hires", color=)

### Exercise

Now plot the clusters on the spatial image for each condition. Complete the lines with missing code.

**Hint:** remember that we created a list with all cluster names previously!

In [None]:
#write your code here
sc.pl.spatial(, img_key = "hires", color=, size=1.5, wspace=0.5)
sc.pl.spatial(, img_key = "hires", color=, size=1.5, wspace=0.5)


### Exercise

Now we will plot the clusters on the integrated image of the dataset (both the Intact and 3DPI samples side by side). This will help us notice differences in the clusters between samples.

To do this exercise we will use a for loop to iterate over the cluster names and the function *st.pl.cluster_plot()* from *stlearn* to plot the clusters on the integrated image. Fill in the empty spaces.

In [None]:
# fill the blanks

for  in :
    st.pl.cluster_plot(a, use_label=, crop=False, size=1.4, cell_alpha=1)

In [None]:
adata

# Rank marker genes

Now we rank the genes in each cluster to get its marker genes, which we then use to annotate the cluster with the corresponding cell type.
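The exercise below ranks genes with the Wilcoxon rank-sum test, which asks whether a gene tends to have higher values inside a cluster than in the remaining spots. The following sketch computes a z-score for that comparison through the equivalent Mann-Whitney U statistic. The helper `rank_sum_z` and the expression values are hypothetical, and the sketch uses the normal approximation without tie correction, so its numbers need not match those reported by scanpy.

```python
import numpy as np

def rank_sum_z(in_group, rest):
    """Mann-Whitney U z-score (normal approximation, no tie correction)."""
    in_group = np.asarray(in_group, dtype=float)
    rest = np.asarray(rest, dtype=float)
    n1, n2 = len(in_group), len(rest)
    # U counts the pairs where the in-group value is larger; ties count as one half.
    diff = in_group[:, None] - rest[None, :]
    u = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    mean_u = n1 * n2 / 2
    sd_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical log-expression of two genes in one cluster versus all other spots.
marker_in, marker_rest = [3.1, 2.8, 3.5, 2.9], [0.2, 0.0, 0.5, 0.1, 0.3]
flat_in, flat_rest = [1.0, 1.2, 0.9, 1.1], [1.05, 0.95, 1.15, 1.0, 1.1]

z_marker = rank_sum_z(marker_in, marker_rest)  # clearly higher in the cluster
z_flat = rank_sum_z(flat_in, flat_rest)        # similar in and out of the cluster

print(z_marker, z_flat)
```

Genes with a large positive score are the candidates for marker genes of that cluster.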

### Exercise

Use a for loop to iterate over the names of the cluster resolutions, the function *sc.tl.rank_genes_groups()* to rank the genes, and the function *sc.pl.rank_genes_groups_dotplot()* to plot the marker genes of each cluster as a dot plot. The method used to rank the marker genes will be *wilcoxon*.

Fill in the blank spaces.

In [None]:
#complete the commented lines

for  in :
    #here print the name of the resolution being used in each iteration
    sc.tl.rank_genes_groups(, , inplace = True, , use_raw=False)
    sc.pl.rank_genes_groups_dotplot(, n_genes=15, groupby = , vmax=5)


In [None]:
adata

# Cluster annotation

When annotating the different clusters of a spatial transcriptomics dataset we need to keep in mind that each spot may contain more than one cell, depending on the size of the cells in that spot. For example, immune cells are much smaller than neurons or oligodendrocytes. For that reason it can sometimes be difficult to attribute a single cell type to a cluster; in that case we can use a more general classification.

**Question:** Do you think that some clusters may be composed of different cell types? To help you choose the best resolution, you can check in the paper how many clusters the authors considered.

**NOTE:** To annotate your clusters without a reference dataset you can use the following mouse brain expression map (http://mousebrain.org/adolescent/genesearch.html) to check which cells express the genes obtained for each cluster. Also, in case you need to check where a certain region of the brain is located, you can check this atlas: https://atlas.brain-map.org/.

### Exercise:

Annotate the cell types you identified, first by cell type and then by cell group. This means that you first consider all cell types and then group the different subtypes together. For that you will create two new columns in the *obs* slot of the adata object called **Cell_type** and **Cell_type_groups**.
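The two-level annotation can be pictured as two dictionary look-ups: one from cluster number to cell type, and one from cell type to cell group. The sketch below shows the pattern on a toy pandas table; the cluster numbers and labels are hypothetical placeholders, not the expected answer for this dataset.

```python
import pandas as pd

# Hypothetical cluster labels, standing in for one of the clusters_<resolution> columns.
annotated = pd.DataFrame({"cluster": ["0", "1", "2", "1", "0"]})

# Hypothetical annotations; replace them with the cell types you identify.
cell_type = {"0": "Neurons_L2/3", "1": "Neurons_L5", "2": "Astrocytes"}
cell_type_group = {
    "Neurons_L2/3": "Neurons",
    "Neurons_L5": "Neurons",
    "Astrocytes": "Astrocytes",
}

# First level: cluster number -> cell type. Second level: cell type -> cell group.
annotated["Cell_type"] = annotated["cluster"].map(cell_type)
annotated["Cell_type_groups"] = annotated["Cell_type"].map(cell_type_group)

# Store both annotations as categorical columns, which plotting functions handle well.
annotated = annotated.astype({"Cell_type": "category", "Cell_type_groups": "category"})

print(annotated)
```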


In [None]:
#Fill in the blanks

adata.obs[] = adata.obs[].replace({  })#here write a dict where the keys are the numbers of your clusters at your chosen resolution and the values are the corresponding cell types

In [None]:
#write your code here

#Now create the obs Cell_type_groups, using the same cluster resolution used above, but group by cell groups, avoiding subtypes.

### Exercise

Plot the cell types on the integrated spatial images, so you can see where the clusters are. Use the function *st.pl.cluster_plot()*, which we already used above.

In [None]:
#fill in the missing values
st.pl.cluster_plot(, use_label=, crop=False, size=1.4, cell_alpha=1)
st.pl.cluster_plot(, use_label=, crop=False, size=1.4, cell_alpha=1)

### Exercise

Plot both cell type annotations and Condition using the UMAP projection.

In [None]:
#write your code here

### Exercise

Calculate the marker genes as done previously for the different cluster resolutions, but now use the **Cell_type** annotation, and plot them using a dotplot. For that use the following functions: *sc.tl.rank_genes_groups()*, *sc.pl.rank_genes_groups_dotplot()*.

In [None]:
#Fill in the blank spaces
sc.tl.rank_genes_groups(, , inplace = True, key_added="wilcoxon_Cell_type", method='wilcoxon', use_raw=False)
sc.pl.rank_genes_groups_dotplot(, n_genes=15, key="wilcoxon_Cell_type", groupby = 'Cell_type', vmax=5)

**NOTE:** As you can see, it can be quite difficult to find the most specific genes for each cluster. In cases like this we can add some filtering parameters to our rank genes function that clean up the genes selected to characterize the cluster. This step can also be done before cluster annotation if needed.
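To illustrate what such filtering parameters do, the following sketch applies three criteria to toy data: the fraction of spots expressing the gene inside the cluster, the same fraction outside it, and the fold change of the mean expression. The expression values are made up, the thresholds are the ones given in the next exercise, and the computation is a simplification that may differ in detail from *sc.tl.filter_rank_genes_groups()*.

```python
import numpy as np

# Hypothetical expression of 3 genes: 4 spots in the cluster, 6 spots outside it.
in_group = np.array([
    [2.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [2.5, 1.0, 0.0],
    [1.5, 1.0, 1.0],
])
out_group = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 4.0],
    [0.0, 1.0, 4.0],
])

min_in_group_fraction, max_out_group_fraction, min_fold_change = 0.1, 0.5, 0.25

frac_in = (in_group > 0).mean(axis=0)    # fraction of cluster spots expressing each gene
frac_out = (out_group > 0).mean(axis=0)  # fraction of the other spots expressing each gene
fold_change = in_group.mean(axis=0) / out_group.mean(axis=0)

# A gene is kept only if it passes all three criteria.
keep = (
    (frac_in > min_in_group_fraction)
    & (frac_out < max_out_group_fraction)
    & (fold_change > min_fold_change)
)

print(keep.tolist())
```

In this toy example the first gene passes all criteria, the second is dropped because it is expressed everywhere, and the third is dropped because its mean expression is much lower inside the cluster than outside it.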

### Exercise

Using the function *sc.tl.filter_rank_genes_groups()* and the following filtering parameters, filter the ranked genes and plot them using the function *sc.pl.rank_genes_groups_dotplot()*.

**Parameters:** *min_in_group_fraction=0.1*, *max_out_group_fraction=0.5*, *min_fold_change=0.25*.

Fill in the blank spaces.

In [None]:
#fill in the spaces
sc.tl.filter_rank_genes_groups(, , key = "wilcoxon_Cell_type", key_added='wilcoxon_filtered_leiden_Cell_type'   )

#fill in the spaces to plot the dotplot
sc.pl.rank_genes_groups_dotplot(, n_genes=15, key="wilcoxon_filtered_leiden_Cell_type" , groupby="Cell_type", vmin=0, vmax=5)

### Exercise

Perform the same two exercises you did for ranking the *Cell_type* marker genes, but now do it with the **Cell_type_groups**.

**Hint:** Don't forget to change the name of the function variables.

In [None]:
#write your code here

In [None]:
#write your code here (filter the ranked genes for Cell_type_groups)


# Marker genes

### Exercise 

Based on the dot plot above, select the 3 marker genes that you consider best represent each cluster. If you know some canonical markers for the cell types identified, you can also use those in addition to the ones in the dot plot. Do this step for both annotations (*Cell_type* and *Cell_type_groups*) and plot them on the spatial map and on the UMAP. If you want, you can also use a dot plot that groups the clusters with their marker genes.


In [None]:
#create your list here


### Cell type

#### Spatial image

In [None]:
st.settings.set_figure_params(dpi=120)


#fill in the blanks to plot each gene individually
for  in :
    st.pl.gene_plot(, gene_symbols=, use_raw=True, size=0.5, cmap='viridis')

#### UMAP

In [None]:
#fill the blanks
sc.pl.umap(, , use_raw=False, ncols=3, cmap='viridis', vmax=5)

#### Dot plot 

To plot the marker genes using a dot plot you need to create a dictionary, where the keys are the different cell types and each value is a list with the 3 marker genes you selected for that cell type.

In [None]:
#create your dictionary here

In [None]:
#Fill the blank on the code
dp = sc.pl.dotplot(, , , return_fig=True, vmax=5, cmap='Reds')
dp.add_totals().show() #this line adds a bar plot at the end of the dot plot showing the number of spots in each cluster

### Cell type groups

In [None]:
#create here your list with the 3 marker genes you selected to each cell group


#### Spatial image

In [None]:
#write your code here
st.settings.set_figure_params(dpi=120)



#### UMAP

In [None]:
#write your code here


#### Dot plot

In [None]:
#create your dictionary here


In [None]:
#Fill the blank on the code


# Final remarks

In this tutorial we focused on the most basic steps to analyse spatial data, starting from a counts matrix. If you want to take this analysis one step further you can look into the *stlearn* package (https://stlearn.readthedocs.io/en/latest/index.html), where you can find more downstream analyses such as spatial trajectory inference, cell-cell interaction, cell-type deconvolution, and other types of analysis and plotting.

As a next step it would be interesting to perform cell type deconvolution, to get a better idea of the cell types present in each spot. Another nice option would be to take an annotated dataset from the same type of experiment and transfer its cell annotations to the spatial data.

It is also possible to combine spatial data with scRNA-seq data. This is usually done to infer where certain genes identified in the scRNA-seq data are expressed on the spatial map.