# Make celltype metadata

This notebook maps the integer ID of each cluster to a biological cell type name. The assignments were read off [Figure 5D](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481139/figure/F5/) in the paper.

[![Figure 5](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481139/bin/nihms687993f5.jpg)](https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=4481139_nihms687993f5.jpg)

In [2]:
import os

import pandas as pd

import common

# Assign notebook and folder names
notebook_name = '02_make_celltype_metadata'
figure_folder = os.path.join(common.FIGURE_FOLDER, notebook_name)
data_folder = os.path.join(common.DATA_FOLDER, notebook_name)
print('Figure folder:', figure_folder)
print('Data folder:', data_folder)

# Make the folders
! mkdir -p $figure_folder
! mkdir -p $data_folder

Figure folder: ../figures/02_make_celltype_metadata
Data folder: ../data/02_make_celltype_metadata


## Cluster IDs (numbers) to the type of cell from the paper

The mapping below is hardcoded by hand from the figure, so it could contain transcription errors.

In [3]:
# Keys are cell type names; values are either a single cluster id or a range
# of ids (range(3, 24) covers clusters 3 through 23)
cluster_name_to_ids = {'Horizontal cells': 1, 'Retinal ganglion cells': 2,
                       'Amacrine cells': range(3, 24), 'Rods': 24,
                       'Cones': 25, 'Bipolar cells': range(26, 34),
                       'Muller glia': 34, 'Astrocytes': 35,
                       'Fibroblasts': 36, 'Vascular endothelium': 37,
                       'Pericytes': 38, 'Microglia': 39}
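Because the mapping is typed in by hand, a quick structural check can catch some mistakes. The sketch below is not part of the original analysis: the helper `flatten_ids` and the variables `expected_ids`, `duplicates`, and `missing` are hypothetical names, and the assumption that the clusters run from 1 to 39 is read off the mapping itself. It only detects gaps and overlaps in the ids; it cannot tell whether a name was attached to the wrong cluster.

```python
from collections import Counter

# Same mapping as in the cell above, repeated so this check runs on its own.
cluster_name_to_ids = {'Horizontal cells': 1, 'Retinal ganglion cells': 2,
                       'Amacrine cells': range(3, 24), 'Rods': 24,
                       'Cones': 25, 'Bipolar cells': range(26, 34),
                       'Muller glia': 34, 'Astrocytes': 35,
                       'Fibroblasts': 36, 'Vascular endothelium': 37,
                       'Pericytes': 38, 'Microglia': 39}


def flatten_ids(mapping):
    """Return every cluster id in the mapping as one flat list of ints."""
    flat = []
    for ids in mapping.values():
        # A value is either a single int or an iterable of ints (a range).
        flat.extend([ids] if isinstance(ids, int) else ids)
    return flat


all_ids = flatten_ids(cluster_name_to_ids)

# Assumption: clusters are numbered 1 through 39 with no gaps.
expected_ids = set(range(1, 40))

# Ids assigned to more than one cell type, and ids assigned to none.
duplicates = sorted(i for i, n in Counter(all_ids).items() if n > 1)
missing = sorted(expected_ids - set(all_ids))

print('Duplicated ids:', duplicates)
print('Missing ids:', missing)
```

If either list is non-empty, one of the ranges probably has an off-by-one error, since `range` excludes its end value.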

In [4]:
import itertools


In [5]:
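# Expand each entry into (cluster_id, cell_type) pairs: a single int gives one
# pair, a range gives one pair per id in the range
pairs = [zip(v, itertools.cycle([k])) if not isinstance(v, int) else [(v, k)]
         for k, v in cluster_name_to_ids.items()]
# Flatten the per-cell-type groups into a single list of pairs
pairs = list(itertools.chain(*pairs))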
pairs

[(1, 'Horizontal cells'),
 (2, 'Retinal ganglion cells'),
 (3, 'Amacrine cells'),
 (4, 'Amacrine cells'),
 (5, 'Amacrine cells'),
 (6, 'Amacrine cells'),
 (7, 'Amacrine cells'),
 (8, 'Amacrine cells'),
 (9, 'Amacrine cells'),
 (10, 'Amacrine cells'),
 (11, 'Amacrine cells'),
 (12, 'Amacrine cells'),
 (13, 'Amacrine cells'),
 (14, 'Amacrine cells'),
 (15, 'Amacrine cells'),
 (16, 'Amacrine cells'),
 (17, 'Amacrine cells'),
 (18, 'Amacrine cells'),
 (19, 'Amacrine cells'),
 (20, 'Amacrine cells'),
 (21, 'Amacrine cells'),
 (22, 'Amacrine cells'),
 (23, 'Amacrine cells'),
 (24, 'Rods'),
 (25, 'Cones'),
 (26, 'Bipolar cells'),
 (27, 'Bipolar cells'),
 (28, 'Bipolar cells'),
 (29, 'Bipolar cells'),
 (30, 'Bipolar cells'),
 (31, 'Bipolar cells'),
 (32, 'Bipolar cells'),
 (33, 'Bipolar cells'),
 (34, 'Muller glia'),
 (35, 'Astrocytes'),
 (36, 'Fibroblasts'),
 (37, 'Vascular endothelium'),
 (38, 'Pericytes'),
 (39, 'Microglia')]

In [6]:
celltypes = [name for i, name in pairs]
ids = ['cluster_' + str(i).zfill(2) for i, name in pairs]
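The notebook does not say why the ids are zero-padded with `zfill(2)`. One likely benefit is that padded labels sort as strings in the same order as the underlying numbers. A small sketch with hypothetical example ids (`example_ids`, `padded`, and `unpadded` are illustrative names, not from the notebook):

```python
# A few example cluster numbers, chosen to mix one- and two-digit ids.
example_ids = [1, 2, 10, 39]

# Labels built the same way as in the cell above, with zero padding.
padded = ['cluster_' + str(i).zfill(2) for i in example_ids]

# The same labels without padding, for comparison.
unpadded = ['cluster_' + str(i) for i in example_ids]

# String sorting compares character by character, so 'cluster_10'
# sorts before 'cluster_2' unless the numbers are padded.
print(sorted(padded))
print(sorted(unpadded))
```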

In [15]:
cluster_names = pd.Series(celltypes, index=ids, name='cluster_name')
cluster_names

cluster_01          Horizontal cells
cluster_02    Retinal ganglion cells
cluster_03            Amacrine cells
cluster_04            Amacrine cells
cluster_05            Amacrine cells
cluster_06            Amacrine cells
cluster_07            Amacrine cells
cluster_08            Amacrine cells
cluster_09            Amacrine cells
cluster_10            Amacrine cells
cluster_11            Amacrine cells
cluster_12            Amacrine cells
cluster_13            Amacrine cells
cluster_14            Amacrine cells
cluster_15            Amacrine cells
cluster_16            Amacrine cells
cluster_17            Amacrine cells
cluster_18            Amacrine cells
cluster_19            Amacrine cells
cluster_20            Amacrine cells
cluster_21            Amacrine cells
cluster_22            Amacrine cells
cluster_23            Amacrine cells
cluster_24                      Rods
cluster_25                     Cones
cluster_26             Bipolar cells
cluster_27             Bipolar cells
...

In [16]:
csv = os.path.join(data_folder, 'cluster_ids_to_celltypes.csv')
csv

'../data/02_make_celltype_metadata/cluster_ids_to_celltypes.csv'

In [17]:
cluster_names.to_csv(csv, index=True, index_label='cluster_id', 
                     header=True)
! head $csv

cluster_id,cluster_name
cluster_01,Horizontal cells
cluster_02,Retinal ganglion cells
cluster_03,Amacrine cells
cluster_04,Amacrine cells
cluster_05,Amacrine cells
cluster_06,Amacrine cells
cluster_07,Amacrine cells
cluster_08,Amacrine cells
cluster_09,Amacrine cells
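A downstream notebook would presumably load this file back into pandas. The sketch below shows one way to do that and confirm the round trip, assuming the same `to_csv` arguments as above. It writes a three-row stand-in Series to a temporary directory rather than to the real data folder; `tmp_dir`, `reloaded`, and `round_trip_ok` are hypothetical names.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the full cluster_names Series built earlier.
cluster_names = pd.Series(
    ['Horizontal cells', 'Retinal ganglion cells', 'Amacrine cells'],
    index=['cluster_01', 'cluster_02', 'cluster_03'],
    name='cluster_name',
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv = os.path.join(tmp_dir, 'cluster_ids_to_celltypes.csv')
    # Write with the same arguments as the notebook cell above.
    cluster_names.to_csv(csv, index=True, index_label='cluster_id',
                         header=True)
    # Use the cluster_id column as the index and pull out the one data column.
    reloaded = pd.read_csv(csv, index_col='cluster_id')['cluster_name']

# Compare as plain dicts so index names and dtypes do not affect the check.
round_trip_ok = reloaded.to_dict() == cluster_names.to_dict()
print('Round trip preserved the mapping:', round_trip_ok)
```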
