Author: Erno Hänninen

Created: 19.02.2023

Title: explore_hypothalamic_nuclei.ipynb

Description: 
- Hypothalamic nuclei annotations are available only for the Herb dataset, so this notebook identifies and annotates the same nuclei in the Zhou dataset
- Annotating clusters from marker gene expression requires manual inspection, so this notebook contains a lot of plotting
- In addition to marker gene expression, nuclei identification was guided by the hypothalamic nuclei already annotated in the Herb data
- For each identified nucleus, a list was created containing the cells that are re-annotated as that nucleus

Procedure
- Read the scVI-integrated data
- Subset all neurons, and the Zhou neurons, from the integrated data
- Recluster the Zhou neurons
- Explore marker gene expression and the nuclei annotated in the Herb data by plotting
- For each identified nucleus, create a list storing the corresponding cell identifiers
- Use these lists to rename the cells in the AnnData object
- Some nuclei in the Herb data could not be identified in the Zhou data. The Herb cells of those nuclei were re-annotated as neurons
- Save the data with the updated cell types for later use

List of non-standard modules:
- scanpy, matplotlib, pandas

Conda environment used:
- PYenv

Usage:
- The notebook was executed using the Jupyter Notebook web interface. All dependencies required by Jupyter are installed in the PYenv Conda environment. See the README file for further details

In [1]:
# Python packages
import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.pyplot import rc_context

In [None]:
# Import scvi integrated data
scvi_adata = sc.read("Data/scvi_adata.h5ad")
scvi_adata

In [7]:
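# Subset the neurons from the integrated data, then split them by source dataset
adata_neurons = scvi_adata[scvi_adata.obs["Cell_types_4"].isin(["Neuron"])]
# Copy the Zhou subset so it can be reclustered as an independent object
adata_neurons_zhou = adata_neurons[adata_neurons.obs["source"].isin(["Zhou"])].copy()
neurons_herb = adata_neurons[adata_neurons.obs["source"].isin(["Herb"])]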

In [None]:
# Plot the Herb subtypes on top of all neurons (drawn as background)
ax = sc.pl.umap(adata_neurons, frameon=False, show=False, size=15)
sc.pl.umap(neurons_herb, color=["Cell_subpopulations"], frameon=False, ax=ax, size=15, show=False)

In [10]:
# Reclustering subsetted data (new clusters are needed so we can annotate those)
sc.tl.pca(adata_neurons_zhou)
sc.pp.neighbors(adata_neurons_zhou)
sc.tl.leiden(adata_neurons_zhou, resolution=2.5)
sc.pl.umap(adata_neurons_zhou, color=["leiden"], wspace=0.45, legend_loc="on data", legend_fontsize="xx-small", legend_fontweight="normal")


In [18]:
# The Leiden algorithm contains some randomness -> save the clustered data so later cells can rely on the same cluster numbers
adata_neurons.write("Data/adata_scvi_neurons.h5ad")
adata_neurons_zhou.write("Data/adata_neurons_zhou.h5ad")
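Later cells refer to clusters by number (for example "18" and "30"), so they only make sense against the saved clustering. The following is a minimal sketch of a load-or-compute pattern that reuses a saved result when it exists. The helper `load_or_compute` is hypothetical and not part of scanpy, and JSON with a toy dictionary stands in for the AnnData file so the example is self-contained.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(path, compute, load, save):
    """Return the cached result at `path`, computing and saving it only if missing.

    Hypothetical helper, not part of scanpy. For the objects in this notebook,
    `load` could be `sc.read` and `save` could be `lambda obj, p: obj.write(p)`.
    """
    path = Path(path)
    if path.exists():
        return load(path)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    save(result, path)
    return result


# Toy stand-in for a clustering result: cell id -> cluster label.
calls = []


def fake_clustering():
    calls.append(1)  # count how often the expensive step actually runs
    return {"cell_1": "18", "cell_2": "30"}


def load_json(p):
    return json.loads(Path(p).read_text())


def save_json(obj, p):
    Path(p).write_text(json.dumps(obj))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "Data" / "clusters.json"
    first = load_or_compute(cache_file, fake_clustering, load_json, save_json)
    second = load_or_compute(cache_file, fake_clustering, load_json, save_json)
```

The second call reads the file instead of reclustering, so the cluster labels stay identical between sessions.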


In [2]:
# Reload the saved data when restarting the notebook (uncomment to use)
#adata_neurons = sc.read("Data/adata_scvi_neurons.h5ad")
#adata_neurons_zhou = sc.read("Data/adata_neurons_zhou.h5ad")
#neurons_herb = adata_neurons[adata_neurons.obs["source"].isin(["Herb"])]


## PVH / PVN

## These markers helped me to locate PVN

In [None]:
# POU3F2 is from the Herb dot plot, the other markers are from Dropbox
sc.pl.umap(adata_neurons_zhou, color=["AVP", "OXT", "POU3F2", "SIM1", "OTP", "CRH", "TRH"])

In [None]:
# Plotting PVH on herb data
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["PVH"])],color=["Cell_subpopulations"],
    frameon=False,ax=ax,size=15,show=False)

# Plotting clusters 18 and 30 from the reclustered Zhou data, which overlap the Herb PVH cluster
ax = sc.pl.umap(adata_neurons, frameon=False, show=False, size=15)
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["18", "30"])], color=["leiden"],
    frameon=False, ax=ax, size=15, show=False)

# Two Zhou clusters (18 and 30) overlap the PVH region
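The overlap above is judged by eye from the UMAP plots. As a rough numeric cross-check, one could rank the Zhou clusters by how close their UMAP centroid lies to the centroid of a Herb nucleus. This is only a sketch with invented coordinates. With real data the coordinates would be taken from `adata.obsm["X_umap"]`, and centroid distance in UMAP space is a crude heuristic, not the method used in this notebook.

```python
import math


def centroid(points):
    """Mean x and mean y of a list of (x, y) UMAP coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def rank_clusters_by_distance(reference_points, cluster_points):
    """Sort cluster labels by the distance of their centroid to the reference centroid."""
    ref = centroid(reference_points)
    distances = {
        label: math.dist(ref, centroid(points))
        for label, points in cluster_points.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Invented toy coordinates; real ones would come from adata.obsm["X_umap"].
herb_pvh = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2)]
zhou_clusters = {
    "18": [(1.1, 1.0), (0.9, 1.0)],
    "30": [(1.5, 1.5), (1.5, 1.3)],
    "7": [(8.0, -3.0), (8.2, -3.2)],
}

ranking = rank_clusters_by_distance(herb_pvh, zhou_clusters)
closest_two = [label for label, _ in ranking[:2]]
```

A short ranking like this can narrow down which clusters to inspect, but marker expression remains the deciding evidence.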

## Plotting PVN markers on cluster 18

In [None]:
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["18"])], color=["AVP", "OXT", "POU3F2", "SIM1", "OTP", "CRH", "TRH"], ncols=3)

## Plotting PVN markers on top of cluster 30

In [None]:
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["30"])], color=["AVP", "OXT", "POU3F2", "SIM1", "OTP", "CRH", "TRH", "SIM2", "PITX2"], ncols=3)

In [None]:
# PVN is OTP+, so cluster 18 is reclustered to separate its OTP+ cells
pvn_cluster = adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["18"])].copy()
sc.tl.pca(pvn_cluster)
sc.pp.neighbors(pvn_cluster)
sc.tl.leiden(pvn_cluster, resolution=0.15)
sc.pl.umap(pvn_cluster, color=["leiden", "OTP"])

In [11]:
# Subcluster 0 of the reclustered cluster 18 is OTP+ -> store these cells in a list
# Add all cells of the original cluster 30 to the same list
pvn_cells = list(pvn_cluster[pvn_cluster.obs["leiden"].isin(["0"])].obs.index)
pvn_cells += list(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["30"])].obs.index)
print(len(pvn_cells))

1385
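The cell above combines two groups of cell identifiers into one list. A minimal sketch of the same idea using `itertools.chain` from the standard library, with a check that no cell is listed twice. The identifiers are invented for illustration.

```python
from itertools import chain

# Invented cell identifiers standing in for the two groups of PVN cells.
recluster_0_cells = ["zhou_AAAC", "zhou_AAAG"]
cluster_30_cells = ["zhou_CCCT", "zhou_GGGA", "zhou_TTTA"]

# chain.from_iterable walks each group in order and yields a flat sequence.
pvn_cells_demo = list(chain.from_iterable([recluster_0_cells, cluster_30_cells]))

# A cell should not be listed twice; comparing against a set exposes duplicates.
has_duplicates = len(pvn_cells_demo) != len(set(pvn_cells_demo))
```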


## VMH

In [None]:
# Locate VMH by plotting its markers
sc.pl.umap(adata_neurons_zhou, color=["ARPP21", "NR5A1", "SLIT3", "NPTX2", "SOX14", "SIX3", "FEZF1"])

In [None]:
# Plotting VMH from herb data
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["VMH"])],color=["Cell_subpopulations"],
    frameon=False,ax=ax,size=15,show=False,)

# Cluster 7 from reclustered zhou data overlaps vmh
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["7"])],color=["leiden"],
    frameon=False, ax=ax, size=15,show=False)

## Plotting VMH markers on cluster 7

In [None]:
print("cluster 7")
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["7"])], color=["ARPP21", "NR5A1", "SLIT3", "NPTX2", "SOX14", "SIX3", "FEZF1", "POMC"],ncols=3, use_raw=True)

In [None]:
# POMC should not be highly expressed in VMH, so cluster 7 is reclustered
vmh_cluster = adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["7"])].copy()
sc.tl.pca(vmh_cluster)
sc.pp.neighbors(vmh_cluster)
sc.tl.leiden(vmh_cluster, resolution=0.67)
sc.pl.umap(vmh_cluster, color="leiden")

In [None]:
# All subclusters except 5 in the reclustered data can be annotated as VMH
# Store the VMH cell ids in a list
vmh_cells = list(vmh_cluster[~vmh_cluster.obs["leiden"].isin(["5"])].obs.index)
len(vmh_cells)
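The selection above keeps every cell except those in one subcluster. A small sketch on a toy table shows how the negated `isin` mask behaves. The table stands in for `vmh_cluster.obs`, and the identifiers and labels are invented.

```python
import pandas as pd

# Toy stand-in for vmh_cluster.obs; identifiers and labels are invented.
obs = pd.DataFrame(
    {"leiden": ["0", "1", "5", "2", "5"]},
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)

# Leiden labels are stored as strings, so the excluded cluster is "5", not 5.
keep = ~obs["leiden"].isin(["5"])
vmh_cells_demo = obs.loc[keep].index.tolist()
```

Passing the integer `5` instead of the string `"5"` would match nothing, and every cell would be kept without any error being raised.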

## ARC

In [None]:
# Markers used to locate ARC
sc.pl.umap(adata_neurons_zhou, color=[ "NPY", "POMC", "TBX3", "OTP", "KISS1", "AGRP", "PRDM12","GHRH"], ncols=3)

In [None]:
# Plotting ARC neurons from the Herb data
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["ARC"])],
    color=["Cell_subpopulations"],frameon=False,ax=ax,size=15,show=False,)

## Clusters 10, 4 and 5 from the reclustered Zhou data overlap Herb's ARC population

## Neuronal cluster 10

In [None]:
print("ARC markers : ")
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["10"])], color=["NPY", "POMC", "TBX3", "OTP", "KISS1", "AGRP", "PRDM12", "GHRH"], ncols=3)
#print("Additional ARC markers from dropbox and from Herb dotplot:")
#sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["10"])], color=[ "SIX3", "SIX6", "NR5A2", "GAL", "HMX2", "RAX", "ISL1"], ncols=3)

## Neuronal cluster 4

In [None]:
print("ARC markers : ")
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["4"])], color=["NPY", "POMC", "TBX3", "OTP", "KISS1", "AGRP", "PRDM12", "GHRH"], ncols=3)
#print("Additional ARC markers from dropbox and from Herb dotplot:")
#sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["4"])], color=[ "SIX3", "SIX6", "NR5A2", "GAL", "HMX2", "RAX", "ISL1"], ncols=3)

## Neuronal cluster 5

In [None]:
print("ARC markers : ")
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["5"])], color=["NPY", "POMC", "TBX3", "OTP", "KISS1", "AGRP", "PRDM12", "GHRH"], ncols=3)
#print("Additional ARC markers from dropbox and from Herb dotplot:")
#sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["5"])], color=[ "SIX3", "SIX6", "NR5A2", "GAL", "HMX2", "RAX", "ISL1"], ncols=3)
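The marker panels above are compared by eye, one cluster at a time. A per-cluster summary table can complement the plots. The sketch below uses invented expression values, and it assumes the expression has already been extracted into a DataFrame, which with real data could be done with `sc.get.obs_df`.

```python
import pandas as pd

# Invented expression values; with real data a table like this could be
# obtained with sc.get.obs_df(adata_neurons_zhou, keys=["leiden", "POMC", "AGRP"]).
df = pd.DataFrame(
    {
        "leiden": ["10", "10", "4", "4", "7", "7"],
        "POMC": [3.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        "AGRP": [0.0, 0.0, 2.0, 4.0, 0.0, 0.0],
    }
)

markers = ["POMC", "AGRP"]

# Mean expression of each marker per cluster.
mean_expr = df.groupby("leiden")[markers].mean()

# Fraction of cells per cluster with non-zero expression of each marker.
frac_expressing = df[markers].gt(0).groupby(df["leiden"]).mean()
```

In this toy table, cluster "10" is POMC-positive, cluster "4" is AGRP-positive, and cluster "7" expresses neither marker.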




In [26]:
# Clusters 10, 4 and 5 are all ARC. Note that the ARC progenitors were not distinguished here
arc_cluster_10 = list(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["10"])].obs.index)
arc_cluster_4 = list(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["4"])].obs.index)
arc_cluster_5 = list(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["5"])].obs.index)

# Store the items to list of lists and flatten the list
arc_cells = [arc_cluster_10, arc_cluster_4, arc_cluster_5]
arc_cells = [item for sublist in arc_cells for item in sublist]
len(arc_cells)

5033

## LHA

In [None]:
# Locate LHA by plotting its markers
sc.pl.umap(adata_neurons_zhou,color=["LHX9","HCRT", "PDYN", "PCSK1","NPTX2", "RFX4","NEK7", "PLAGL1", "SCG2", "CBLN1", "VGF"])

In [None]:
# Plot LH cluster from Herb data
ax = sc.pl.umap(adata_neurons,frameon=False, show=False, size=15)
sc.pl.umap(neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["LH"])],color=["Cell_subpopulations"],
    frameon=False,ax=ax,size=15,show=False)

# In the Zhou data no cluster fully overlaps the LH cluster from the Herb data
# However, clusters 1 and 37, which surround the LH cluster, express LH marker genes
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["1", "37"])],
    color=["leiden"],frameon=False,ax=ax,size=15,show=False)

## Plotting LH markers on Zhou clusters 1 and 37

In [None]:
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["1", "37"])], color=["leiden","LHX9","HCRT", "PDYN", "PCSK1","NPTX2", "RFX4","NEK7", "PLAGL1", "SCG2", "CBLN1", "VGF"], legend_loc="on data", use_raw=True, legend_fontsize="xx-small", legend_fontweight="normal", ncols=3)



In [None]:
# Cluster 37 is LHA, and cluster 1 most likely also contains LHA cells -> recluster cluster 1
lha_cluster = adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["1"])].copy()
sc.tl.pca(lha_cluster)
sc.pp.neighbors(lha_cluster)
sc.tl.leiden(lha_cluster, resolution=0.52)
sc.pl.umap(lha_cluster, color="leiden", size=5)

In [None]:
sc.pl.umap(lha_cluster[lha_cluster.obs["leiden"].isin(["0"])],color=["HCRT", "PDYN", "PCSK1"],frameon=False,size=25)
#sc.pl.umap(lha_cluster[lha_cluster.obs["leiden"].isin(["1"])],color=["HCRT", "PDYN", "PCSK1"],frameon=False,size=25)

In [37]:
# Subcluster 0 of the reclustered cluster 1 also appears to be LHA
# -> store the cells of the original cluster 37 and of subcluster 0 in a list
lha_recluster_0 = lha_cluster[lha_cluster.obs["leiden"].isin(["0"])].obs.index
lha_cluster_37 = adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["37"])].obs.index
# Store the items to list of lists and flatten the list
lha_cells = [lha_recluster_0, lha_cluster_37]
lha_cells = [item for sublist in lha_cells for item in sublist]
len(lha_cells)

2569

## TM nucleus (Tuberomammillary Terminal)

In [None]:
ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["TM"])],color=["LEPR"],
    frameon=False,ax=ax,size=15,show=False,title="TM on Herb neurons")

ax = sc.pl.umap(adata_neurons,frameon=False,show=False,size=15)
sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["14"])],color=["HDC"],
    title=["Plotting HDC expression on Zhou cluster 14"],frameon=False,ax=ax,size=10,show=False)

sc.pl.umap(adata_neurons_zhou, color=["HDC", "TBX3", "LEPR"])

In [40]:
# Cluster 14 is TM
#sc.pl.umap(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["14"])],color=["HDC"])
tm_cells = list(adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["14"])].obs.index)
print(len(tm_cells))

1022


## SMN and MN nuclei (Supramammillary Nucleus, Mammillary Nucleus)

In [None]:
# MN is included as well because SMN has few markers that are positive for SMN but negative for MN
ax = sc.pl.umap( adata_neurons, frameon=False, show=False, size=15)
sc.pl.umap( neurons_herb[neurons_herb.obs["Cell_subpopulations"].isin(["SMN", "MN"])],
    color=["Cell_subpopulations"], frameon=False,ax=ax, size=15,show=False)

## Plotting SMN and MN markers on Zhou neurons

In [None]:
# "FOXB1", "LHX1" are markers for MN
# PITX2 is a marker for both SMN and MN

sc.pl.umap(
    adata_neurons_zhou,
    color=["LMX1A", "BARHL1", "IRX3", "FOXA1", "PITX2", "FOXB1", "LHX1"],
    frameon=False,
    size=20,
    legend_loc="on data",
    legend_fontsize="xx-small",
    legend_fontweight="normal",
    ncols=4,
)

In [None]:
# Cluster 20 appears to contain both SMN and MN -> recluster it
smn_mn_cluster = adata_neurons_zhou[adata_neurons_zhou.obs["leiden"].isin(["20"])].copy()
sc.tl.pca(smn_mn_cluster)
sc.pp.neighbors(smn_mn_cluster)
sc.tl.leiden(smn_mn_cluster, resolution=0.15)
sc.pl.umap(smn_mn_cluster, color="leiden", size=25)

In [None]:
# Based on LMX1A, BARHL1, IRX3 and FOXA1 expression we are able to discriminate SMN 
sc.pl.umap(smn_mn_cluster, color=["LMX1A", "BARHL1", "IRX3", "FOXA1"], size=25)

# MN can be identified based on LHX1 and FOXB1
sc.pl.umap(smn_mn_cluster, color=["LHX1", "FOXB1"], size=40)


In [None]:
# Store MN and SMN cells to list
mn_cells = list(smn_mn_cluster[smn_mn_cluster.obs["leiden"].isin(["0"])].obs.index)
smn_cells = list(smn_mn_cluster[smn_mn_cluster.obs["leiden"].isin(["1"])].obs.index)

In [71]:
# Update the subtypes in the integrated data
scvi_adata.obs["Cell_subpopulations_updated"] = scvi_adata.obs["Cell_subpopulations"]
zhou_subtypes = [pvn_cells, vmh_cells, arc_cells, lha_cells, tm_cells, mn_cells, smn_cells]
subtypes = ["PVH", "VMH", "ARC", "LH", "TM", "MN", "SMN"]
for cells, subtype in zip(zhou_subtypes, subtypes):
    scvi_adata.obs.loc[cells, "Cell_subpopulations_updated"] = subtype

# Herb subtypes that were not identified in the Zhou data (ZI, ID, SCN, NA, Intermediates) are renamed to Neuron
# Rename PVH to PVN and LH to LHA
scvi_adata.obs["Cell_subpopulations_updated"] = scvi_adata.obs["Cell_subpopulations_updated"].replace({
    "ZI": "Neuron", "ID": "Neuron", "SCN": "Neuron", "NA": "Neuron", "Intermediates": "Neuron",
    "PVH": "PVN", "LH": "LHA",
})
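The relabelling step above can be followed on a toy label column. The sketch below uses an invented `pandas` Series in place of `scvi_adata.obs`, with made-up cell identifiers and only two of the nuclei.

```python
import pandas as pd

# Toy labels: Herb cells carry nucleus names, Zhou cells start as "Neuron".
labels = pd.Series(
    ["PVH", "ZI", "Neuron", "Neuron", "Neuron"],
    index=["herb_1", "herb_2", "zhou_1", "zhou_2", "zhou_3"],
    dtype="object",
)

# Invented per-nucleus cell lists, analogous to pvn_cells, tm_cells and so on.
cell_lists = {"PVH": ["zhou_1"], "TM": ["zhou_2"]}

# Assign each nucleus name to the cells in its list.
updated = labels.copy()
for nucleus, cells in cell_lists.items():
    updated.loc[cells] = nucleus

# Collapse unidentified nuclei into "Neuron" and harmonise the nucleus names.
updated = updated.replace({"ZI": "Neuron", "PVH": "PVN"})
```

If the label column is categorical rather than plain strings, assigning a label that is not yet one of its categories raises an error, so new names would have to be added as categories first.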

In [None]:
scvi_adata.obs["Cell_subpopulations_updated"].value_counts()

In [None]:
# Plotting
sc.pl.umap(scvi_adata[scvi_adata.obs["Cell_types_4"].isin([ "Neuron"])], color="Cell_subpopulations_updated")
temp_neurons = scvi_adata[scvi_adata.obs["Cell_types_4"].isin([ "Neuron"])]
sc.pl.umap(temp_neurons[~temp_neurons.obs["Cell_subpopulations_updated"].isin(["Neuron"])], color=["Cell_subpopulations_updated", "source", "PITX2", "LMX1A", "HDC", "HCRT", "SIM1", "FEZF1", "NR5A1", "TBX3", "GHRH"], size=5)
temp_neurons_2 = temp_neurons[~temp_neurons.obs["Cell_subpopulations_updated"].isin(["Neuron"])]
sc.pl.umap(temp_neurons_2[temp_neurons_2.obs["source"].isin(["Herb"])], color=["Cell_subpopulations_updated"])
sc.pl.umap(temp_neurons_2[temp_neurons_2.obs["source"].isin(["Zhou"])], color=["Cell_subpopulations_updated"], size=7)



In [81]:
# Save adata for later use
scvi_adata.write("Data/scvi_subtypes.h5ad")