<div style="border: 2px solid #8A9AD0; margin: 1em 0.2em; padding: 0.5em;">

# Inferring Trajectories using Scanpy (Python)

by [Wendi Bacon](https://training.galaxyproject.org/hall-of-fame/nomadscientist/), [Julia Jakiela](https://training.galaxyproject.org/hall-of-fame/wee-snufkin/), [Mehmet Tekman](https://training.galaxyproject.org/hall-of-fame/mtekman/)

CC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)

**Questions**

- How can I infer lineage relationships between single cells based on their RNA, without a time series?

**Objectives**

- Execute multiple plotting methods designed to maintain lineage relationships between cells
- Interpret these plots

**Time Estimation: 2H**
</div>


<h1 id="run-the-tutorial">Run the tutorial!</h1>
<p>From now on, you can view this tutorial in the Jupyter notebook, which will allow you to read the material and simultaneously execute the code cells! You may have to change certain numbers in the code blocks, so do read carefully. The tutorial is adapted from the <a href="https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html">Scanpy Trajectory inference tutorial</a>.</p>
<h2 id="install-modules--activate-them">Install modules &amp; activate them</h2>


In [None]:
pip install scanpy

In [None]:
pip install fa2

In [None]:
pip install python-igraph

In [None]:
pip install louvain

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
from matplotlib import rcParams
import scanpy as sc

<h2 id="import-dataset">Import dataset</h2>
<p>You can now import files from your Galaxy history directly using the following code. This will depend on what number in your history the final annotated object is. If your object is dataset #2 in your history, then you import it as follows:</p>


In [None]:
thymusobject = get(2)

<p>You now need to read it in as an h5ad object.</p>


In [None]:
adata = sc.read_h5ad(thymusobject)

<h2 id="draw-force-directed-graph">Draw force-directed graph</h2>
<p>First, we will calculate a <a href="https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.draw_graph.html">force-directed graph</a>, as an alternative to tSNE, which will likely work better for trajectory analysis.</p>


In [None]:
sc.tl.draw_graph(adata)

<p>And now time to plot it!
<em>Note: We’re saving to <strong>png</strong>, but you can also choose pdf</em></p>


In [None]:
sc.pl.draw_graph(adata, color='cell_type', legend_loc='on data', save = 'Plot1.png')

<figure id="figure-1" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot1.png" alt="Plot1-Force-Directed Graph. " width="366" height="265" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 1</strong>:</span> Plot1-Force-Directed Graph</figcaption></figure>
<p>Well now this is exciting! Our DP-late is more clearly separating, and we might also suppose that DP-M1, DP-M2, and DP-M3 are actually earlier on in the differentiation towards mature T-cells. And we’re only just getting started!</p>
<h2 id="diffusion-maps">Diffusion maps</h2>
<p>We’ll now perform an <em>optional step</em>, that basically takes the place of the PCA. Instead of using PCs, we can use <a href="https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.diffmap.html">diffusion maps</a>.</p>


In [None]:
sc.tl.diffmap(adata)
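<p>To get a feel for what a diffusion map does, here is a minimal NumPy sketch on made-up points. It is a simplified illustration, not Scanpy's implementation: the function name <code>toy_diffusion_map</code>, the Gaussian kernel width <code>sigma</code> and the toy data are all assumptions made for this example. The idea is to turn distances between cells into transition probabilities of a random walk, then use the leading eigenvectors of that walk as new coordinates.</p>

```python
import numpy as np

def toy_diffusion_map(points, sigma=1.0, n_components=2):
    """Return (eigenvalues, components) of a simplified diffusion map."""
    # Pairwise squared Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Gaussian kernel: nearby cells get weights near 1, distant cells near 0.
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Symmetric normalisation; it shares its eigenvalues with the
    # row-normalised random-walk transition matrix.
    degree = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    # eigh returns ascending order; reorder so the largest comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Convert to right eigenvectors of the transition matrix.
    right = eigvecs / np.sqrt(degree)[:, None]
    # Skip the first component, which is constant and carries no information.
    return eigvals[: n_components + 1], right[:, 1 : n_components + 1]

rng = np.random.default_rng(0)
# Toy "cells" laid out along a curved path with a little noise.
t = np.linspace(0.0, 1.0, 50)
toy_cells = np.column_stack([t, t ** 2]) + rng.normal(scale=0.01, size=(50, 2))
toy_eigvals, toy_components = toy_diffusion_map(toy_cells, sigma=0.2)
```

<p>The largest eigenvalue of a random walk is always 1 and its eigenvector is constant, which is why the sketch drops it and keeps the following components as coordinates.</p>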

<p>Now that we have our diffusion map, we need to re-calculate neighbors using the diffusion map instead of the PCs. Then we re-draw and plot a new force directed graph using the new neighbors.</p>


In [None]:
sc.pp.neighbors(adata, n_neighbors=15, use_rep='X_diffmap')
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='cell_type', legend_loc='on data', save = 'Plot2.png')

<figure id="figure-2" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot2.png" alt="Diffusion Map. " width="366" height="265" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 2</strong>:</span> Diffusion Map</figcaption></figure>
<p>Oh dear! This doesn’t look great. Maybe the DP-M4 cells are a whole other trajectory? That doesn’t seem right. Saying that, this spreads out our T-mature cells, which makes a lot more sense when it comes to T-cell biology (we expect T-cells to differentiate into two types of T-cells, Cd8+Cd4- and Cd4+Cd8-). If you wanted to, you could also re-cluster your cells (since you’ve changed the neighborhood graph on which the clusterisation depends). You could use this:
<code class="language-plaintext highlighter-rouge">sc.tl.louvain(adata, resolution=0.6)</code>
However, we tried that, and it called far too many clusters given the depth of sequencing in this dataset. Let’s stick with our known cell types and move from there.</p>
<h2 id="working-in-a-group-decision-time">Working in a group? Decision-time!</h2>
<p>If you are working in a group, you can now split up the decisions: one person acts as the <em>control</em>, and everyone else varies the numbers so that you can compare results throughout the tutorials.</p>
<ul>
<li>Control
<ul>
<li>Go straight to the PAGA section</li>
</ul>
</li>
<li>Everyone else:
<ul>
<li>you could re-call clusters with <code style="color: inherit">sc.tl.louvain(adata, resolution=0.6)</code> or use other resolutions! (Tip, go low!)
<ul>
<li>Please note that in this case, you will want to change the PAGA step <code style="color: inherit">sc.pl.paga</code> to group by <code style="color: inherit">louvain</code> rather than <code style="color: inherit">cell_type</code>. You can certainly still plot both; we only didn’t because, with our old Louvain calls, the cell_type and louvain categories are identical.</li>
</ul>
</li>
<li>you could undo the diffusion map step by running the following
<code style="color: inherit">sc.pp.neighbors(adata, n_neighbors=15, use_rep='X_pca')</code>
<code style="color: inherit">sc.tl.draw_graph(adata)</code></li>
<li>you could also change the number of neighbors used in the pp.neighbors step (this is the same as the Galaxy tool <strong>Scanpy ComputeGraph</strong>)</li>
</ul>
</li>
<li>Everyone else: You will want to compare FREQUENTLY with your control team member.</li>
</ul>
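<p>One of the options above is to change the number of neighbors. As a rough intuition for why that matters, the sketch below builds a plain k-nearest-neighbour graph on random toy points with NumPy and counts its edges for a few values of k. The helper <code>knn_adjacency</code> and the toy data are assumptions made for illustration; Scanpy's own neighborhood graph is weighted and computed differently, so only the trend carries over.</p>

```python
import numpy as np

def knn_adjacency(points, n_neighbors):
    """Symmetric True/False adjacency matrix of a k-nearest-neighbour graph."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # In this toy version a cell is never its own neighbour.
    np.fill_diagonal(dists, np.inf)
    # Column indices of the n_neighbors closest cells for every row.
    nearest = np.argsort(dists, axis=1)[:, :n_neighbors]
    adjacency = np.zeros(dists.shape, dtype=bool)
    rows = np.repeat(np.arange(len(points)), n_neighbors)
    adjacency[rows, nearest.ravel()] = True
    # Keep an edge if either cell lists the other as a neighbour.
    return adjacency | adjacency.T

rng = np.random.default_rng(1)
toy_cells = rng.normal(size=(100, 5))
# Number of undirected edges for three choices of k.
edge_counts = {k: int(knn_adjacency(toy_cells, k).sum() // 2) for k in (5, 15, 30)}
```

<p>More neighbors means a denser graph, which tends to smooth over small groups of cells; fewer neighbors keeps local detail but can fragment the graph. That is worth keeping in mind when comparing with your control team member.</p>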
<h2 id="paga">PAGA</h2>
<p><a href="https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.paga.html">PAGA</a> is used to generalise relationships between groups, or likely clusters, in this case.</p>


In [None]:
sc.tl.paga(adata, groups='cell_type')
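<p>The core idea behind PAGA is to summarise the cell-level neighbor graph at the level of groups. The toy sketch below only counts edges between groups in a tiny hand-made graph. It is not PAGA's actual statistic, which also accounts for how many edges would be expected if cells were connected at random; the helper <code>group_edge_counts</code> and the six-cell chain are assumptions made for this example.</p>

```python
import numpy as np

def group_edge_counts(adjacency, labels):
    """Count graph edges between every pair of groups."""
    groups = sorted(set(labels))
    labels = np.asarray(labels)
    # One-hot membership matrix: rows are cells, columns are groups.
    membership = np.stack([labels == g for g in groups], axis=1).astype(int)
    counts = membership.T @ adjacency.astype(int) @ membership
    # Within-group edges are seen from both ends, so halve the diagonal.
    counts[np.diag_indices_from(counts)] //= 2
    return groups, counts

# Toy graph: a chain of six cells, 0-1-2-3-4-5.
chain = np.zeros((6, 6), dtype=int)
for i in range(5):
    chain[i, i + 1] = chain[i + 1, i] = 1

# Hypothetical labels standing in for adata.obs['cell_type'].
toy_labels = ['DN', 'DN', 'DP', 'DP', 'T-mature', 'T-mature']
toy_groups, toy_counts = group_edge_counts(chain, toy_labels)
```

<p>In this toy chain, DN connects to DP and DP connects to T-mature, but DN and T-mature share no edge, so the group-level graph recovers the ordering DN, DP, T-mature.</p>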

<p>Now we want to plot our PAGA, but we might also be interested in colouring our plot by genes as well. In this case, remembering that we are dutifully counting our genes by their EnsemblIDs rather than Symbols (which do not exist for all EnsemblIDs), we have to look up our gene of interest (CD4, CD8a) and plot the corresponding IDs.</p>


In [None]:
sc.pl.paga(adata, color=['cell_type', 'ENSMUSG00000023274', 'ENSMUSG00000053977'], title=['Cell type', 'CD4', 'Cd8a'], save = 'Plot4.png')

<figure id="figure-3" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/pagaPlot4.png" alt="PAGA. " width="1386" height="264" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 3</strong>:</span> PAGA</figcaption></figure>
<p>Well now that is interesting! This analysis would find that DP-M1 and DP-M4 are both driving towards differentiation, which is not something we had necessarily been able to specify before by just looking at our cluster graphs or applying our biological knowledge.</p>
<h2 id="re-draw-force-directed-graph">Re-draw force-directed graph</h2>
<p>A force-directed graph can be initialised randomly, or we can prod it in the right direction. We’ll prod it with our PAGA calculations. Note that you could also try prodding it with tSNE or UMAP. A lot of these tools can be used on top of each other or with each other in different ways; this tutorial is just one example. Similarly, you could use any <strong>obs</strong> information for grouping, so you could do this for <em>louvain</em> or <em>cell_type</em>, for instance.</p>


In [None]:
sc.tl.draw_graph(adata, init_pos='paga')

sc.pl.draw_graph(adata, color=['cell_type'], title=['Cluster'], legend_loc='on data', save = 'Plot5.png')
sc.pl.draw_graph(adata, color=['genotype'], title=['Genotype'], save = 'Plot6.png')
sc.pl.draw_graph(adata, color=['ENSMUSG00000023274', 'ENSMUSG00000053977'], title=['CD4', 'Cd8a'], save = 'Plot7.png')

<figure id="figure-4" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot5.png" alt="Force-Directed + PAGA - Cell type. " width="366" height="265" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 4</strong>:</span> Force-Directed + PAGA - Cell type</figcaption></figure>
<figure id="figure-5" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot6.png" alt="Force-Directed + PAGA - Genotype. " width="452" height="265" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 5</strong>:</span> Force-Directed + PAGA - Genotype</figcaption></figure>
<figure id="figure-6" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot7.png" alt="Force-Directed + PAGA - Markers. " width="827" height="269" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 6</strong>:</span> Force-Directed + PAGA - Markers</figcaption></figure>
<p><strong>Note</strong> - we are aware that something about these graphs has gotten a bit odd in the recent Scanpy updates. Watch this space for a fix!</p>
<p>Well aren’t those charts interesting! Using the diffusion map to drive the force-directed graph, we see correct ordering of our cells (from DN to DP to T-mature, which was lost with the diffusion map alone) as well as two apparent branches leaving the mature T-cell population, which is what we’d biologically expect. In terms of our experiment, we’re seeing a clear trajectory issue whereby the knockout cells are not found along the trajectory into T-mature (which, well, we kind of already figured out with just the cluster analysis, but we can feel even more confident about our results!). More importantly, we can see the T-mature population dividing itself, which we did not see in the clustering via UMAP/tSNE alone, and we can verify that as the leftmost branch has CD4 but the rightmost branch does not. This suggests a branch point into CD4+ and CD8+ single-positive cells. Exciting! However, it is important to note that the branches there are quite small and sparsely populated, which can indicate artifact branches (i.e. trajectory analysis does its best to find branches, particularly diffusion map, so you can pretty easily force branches to appear even if they are not biologically real!). However, to be frank, we were surprised not to find this clearer in the main cluster map, as we know that the T-cells should diverge at that point, so if anything this is a relief that our data is believable!</p>
<p>And now, just for fun, we can compare the scatter graph with our PAGA side by side.</p>


In [None]:
sc.pl.paga_compare(
    adata, threshold=0.03, title='', right_margin=0.2, size=10, edge_width_scale=0.5,
    legend_fontsize=12, fontsize=12, frameon=False, edges=True, save=True)

<figure id="figure-7" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/paga_compare.png" alt="PAGA Compare. " width="819" height="244" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 7</strong>:</span> PAGA Compare</figcaption></figure>
<p><strong>Note</strong> - we are aware that something about these graphs has gotten a bit odd in the recent Scanpy updates. Watch this space for a fix!</p>
<h2 id="diffusion-pseudotime">Diffusion pseudotime</h2>
<p>We know that our cells are initialising at DN. We can feed that information into our algorithms to then calculate a trajectory.</p>
<p>First, let’s name our ‘root’.</p>


In [None]:
adata.uns['iroot'] = np.flatnonzero(adata.obs['cell_type']  == 'DN')[0]
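<p>If the line above looks cryptic, this small sketch unpacks it step by step on a made-up array of labels (the array <code>toy_cell_types</code> is a stand-in for <code>adata.obs['cell_type']</code> and is not part of the tutorial data).</p>

```python
import numpy as np

# Hypothetical per-cell labels standing in for adata.obs['cell_type'].
toy_cell_types = np.array(['DP-M1', 'DN', 'T-mature', 'DN', 'DP-late'])

is_dn = toy_cell_types == 'DN'        # boolean mask, True where the cell is DN
dn_positions = np.flatnonzero(is_dn)  # integer positions of every DN cell
root_index = int(dn_positions[0])     # the first DN cell becomes the root
```

<p>Two things are worth noticing. The root is a single cell, and taking the first DN cell is an arbitrary choice that depends on the order of cells in the object. Also, if the label is misspelled or absent, <code>dn_positions</code> is empty and indexing it with <code>[0]</code> raises an <code>IndexError</code>.</p>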

<h2 id="working-in-a-group-decision-time">Working in a group? Decision-time!</h2>
<p>If you called new clusters using the louvain algorithm, you might want to choose one of those clusters to be your root cell instead, so change the <code style="color: inherit">cell_type</code> above for <code style="color: inherit">louvain</code> and then name the cluster number. Use the plots you created to help you pick the number!</p>
<p>Onto the <a href="https://scanpy.readthedocs.io/en/stable/api/scanpy.tl.dpt.html">diffusion pseudotime</a>, where we infer multiple time points within the same piece of data!</p>


In [None]:
sc.tl.dpt(adata)
sc.pl.draw_graph(adata, color=['cell_type', 'dpt_pseudotime'], legend_loc='on data', save = 'Plot8.png')

<figure id="figure-8" style="max-width: 90%; margin:auto;"><img src="../../images/scrna-casestudy/draw_graph_faPlot8.png" alt="Force-Directed + Pseudotime. " width="819" height="269" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 8</strong>:</span> Force-Directed + Pseudotime</figcaption></figure>
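<p>As a rough mental model of pseudotime, you can think of it as "how far is each cell from the root?", rescaled so the root sits at 0 and the farthest cell at 1. The sketch below does exactly that with plain Euclidean distance on made-up coordinates. This is a deliberate simplification: diffusion pseudotime measures distance through the diffusion process on the neighbor graph rather than in a straight line, and <code>toy_pseudotime</code> and <code>toy_embedding</code> are hypothetical names used only here.</p>

```python
import numpy as np

def toy_pseudotime(embedding, root_index):
    """Distance of every cell from the root, rescaled to the range 0 to 1."""
    distances = np.linalg.norm(embedding - embedding[root_index], axis=1)
    return distances / distances.max()

# Hypothetical 2-D coordinates for five cells along a differentiation path.
toy_embedding = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [2.0, 0.0],
    [3.0, 0.0],
    [4.0, 0.0],
])
toy_time = toy_pseudotime(toy_embedding, root_index=0)
```

<p>Changing <code>root_index</code> changes every value, which is why choosing the root with biological knowledge matters so much.</p>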
<p>This is nice, as it supports our conclusions thus far on the trajectory of the T-cell differentiation. With single-cell, the more ways you can prove to yourself what you’re seeing is real, the better! If we did not find consistent results, we would need to delve in further to see whether the problem lies with the algorithm (not all algorithms fit all data!) or the biology.</p>
<p>Where might we go from here? We might consider playing with our louvain resolutions, to get the two branches to be called as different clusters, and then comparing them to each other for gene differences or genotype differences. We might also use different objects (for instance, what if we regressed out cell cycle genes?) and see if that changes the results. Perhaps we would eliminate the DN double-branch input. Or perhaps that’s real, and we should investigate that. What would you do?</p>
<h2 id="working-in-a-group-the-finale">Working in a group? The finale!</h2>
<p>Look at each other’s images! How do yours differ, and what decisions were made? Previously, when calling clusters in the ‘Filter, Plot and Explore Single-cell RNA-seq Data’ tutorial, the interpretation at the end was largely consistent, no matter what decisions were made throughout (mostly!). Is this the case with your trajectory analyses? You may find that it is not, which is why pseudotime analysis even more crucially depends on your understanding of the underlying biology (we have to choose the root cells, for instance, or recognise that DN cells should not be found in the middle of the DPs) as well as choosing the right analysis. That’s why it is a huge field! With analysing scRNA-seq data, it’s almost like you need to know about 75% of your data and make sure your analysis shows that, for you to then identify the 25% new information.</p>
<h1 id="export-your-data-figures-and-notebook">Export your data, figures, and notebook</h1>
<p>It’s now time to export your data! First, we need to get Jupyter to see it as a file.</p>


In [None]:
adata.write('Trajectorythymus.h5ad')

<p>Now you can export it, as well as all your lovely plots! If you go into the <em>figures</em> folder at the left, you’ll see your plots and can choose which ones to export. The following code will push them into your Galaxy history. You can also directly download them onto your computer from the file window at the left.</p>


In [None]:
put("Trajectorythymus.h5ad")

In [None]:
put("figures/draw_graph_faPlot1.png")
put("figures/draw_graph_faPlot2.png")
put("figures/draw_graph_faPlot5.png")
put("figures/draw_graph_faPlot6.png")
put("figures/draw_graph_faPlot7.png")
put("figures/draw_graph_faPlot8.png")
put("figures/paga_compare.pdf")
put("figures/pagaPlot4.png")
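<p>If one of the <code>put</code> calls above complains about a missing file, it can help to list what was actually saved. The file names combine a prefix from the plotting function with whatever you passed to <code>save</code>, so a renamed plot will not match the names above. The helper below is a small sketch using only the Python standard library; <code>find_saved_figures</code> is a hypothetical name, and the default folder <code>figures</code> is taken from the paths used in this notebook.</p>

```python
from pathlib import Path

def find_saved_figures(folder='figures', suffixes=('.png', '.pdf')):
    """Return the sorted names of figure files found in `folder`."""
    directory = Path(folder)
    # An absent folder simply means nothing has been saved yet.
    if not directory.is_dir():
        return []
    return sorted(
        p.name for p in directory.iterdir() if p.suffix.lower() in suffixes
    )

saved_figures = find_saved_figures()
```

<p>You can then print <code>saved_figures</code> and adjust the paths in your <code>put</code> calls to match.</p>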

<p>The cell below will only work if you haven’t changed the name of the notebook. If you renamed it, simply type its new name in the parentheses.</p>


In [None]:
put("single-cell-scrna-case_JUPYTER-trajectories.ipynb")

<p>This may take a moment, so go check your Galaxy history to make sure your images, anndata object, and notebook (.ipynb) have all made it back into your Galaxy history. Once they are all there, you can exit this browser and return to the Galaxy tutorial!</p>
<p>If things have gone wrong, you can also download this <a href="https://zenodo.org/record/7075718/files/Trajectories_Answer_Key.ipynb">answer key tutorial</a>.</p>
<h1 id="citation">Citation</h1>
<p>Please note, this is largely based on the trajectories tutorial found on the Scanpy site itself <a href="https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html">https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html</a>.</p>
<h1 id="after-jupyter">After Jupyter</h1>
<p>Congratulations! You’ve made it through Jupyter!</p>
<blockquote class="hands_on" style="border: 2px solid #dfe5f9; margin: 1em 0.2em">
<div class="box-title hands-on-title" id="hands-on-closing-jupyterlab"><i class="fas fa-pencil-alt" aria-hidden="true" ></i> Hands-on: Closing JupyterLab</div>
<ol>
<li>Click <strong>User</strong>: <strong>Active Interactive Tools</strong></li>
<li>Tick the box of your Jupyter Interactive Tool, and click <strong>Stop</strong></li>
</ol>
</blockquote>
<p>If you want to run this notebook again, or share it with others, it now exists in your history. You can use this ‘finished’ version just the same way as you downloaded the directions file and uploaded it into the Jupyter environment.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Congratulations! You’ve made it to the end! You might be interested in the <a href="https://usegalaxy.eu/u/wendi.bacon.training/h/cs4inferring-trajectories-using-python-in-galaxyanswer-key">Answer Key History</a> or the <a href="https://zenodo.org/record/7054806/files/Trajectories_Answer_Key.ipynb?download=1">Answer Key Jupyter Notebook</a>.</p>
<p>In this tutorial, you moved from called clusters to inferred relationships and trajectories using pseudotime analysis. You found an alternative to PCA (diffusion map), an alternative to tSNE (force-directed graph), a means of identifying cluster relationships (PAGA), and a metric for pseudotime (diffusion pseudotime) to identify early and late cells. If you were working in a group, you found that such analysis is slightly more sensitive to your decisions than the simpler filtering/plotting/clustering is. We are inferring and assuming relationships and time, so that makes sense!</p>
<p>To discuss with like-minded scientists, join our <a href="https://gitter.im/Galaxy-Training-Network/galaxy-single-cell?utm_source=badge&amp;utm_medium=badge&amp;utm_campaign=pr-badge">Gitter</a> channel for all things Galaxy-single cell!</p>


# Key Points

- Trajectory analysis is less robust than pure plotting methods, as such 'inferred relationships' are a bigger mathematical leap
- As always with single-cell analysis, you must know enough biology to deduce if your analysis is reasonable, before exploring or deducing novel insight

# Congratulations on successfully completing this tutorial!

Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/scrna-case_JUPYTER-trajectories/tutorial.html#feedback) and check there for further resources!
