# Data Loader for the Cell Atlas of Aqueous Humor Outflow Pathways

Paper link: https://www.pnas.org/doi/10.1073/pnas.2001250117

For the data, use the links below, or search for GSE146188 on https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi and open its SubSeries section.

- Mouse: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146186/suppl/GSE146186_Mouse_count_matrix.csv.gz
- Pig: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146187/suppl/GSE146187_Pig_count_matrix.csv.gz
- Human: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148371/suppl/GSE148371_Human_count_matrix.csv.gz
- MacaF: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148373/suppl/GSE148373_MacaF_count_matrix.csv.gz
- MacaM: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148374/suppl/GSE148374_MacaqueM_count_matrix.csv.gz

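Download each file with `wget '<url>' -O data/<species>.csv.gz`, after creating the `data/` directory. The `-r` (recursive) flag is not needed: combined with `-O`, wget only warns that all content will be placed in the single file you specified.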

In [3]:
!mkdir -p data
!wget 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146186/suppl/GSE146186_Mouse_count_matrix.csv.gz' -O data/mouse.csv.gz
!wget 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146187/suppl/GSE146187_Pig_count_matrix.csv.gz' -O data/pig.csv.gz
!wget 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148371/suppl/GSE148371_Human_count_matrix.csv.gz' -O data/human.csv.gz
!wget 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148373/suppl/GSE148373_MacaF_count_matrix.csv.gz' -O data/macaF.csv.gz
!wget 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148374/suppl/GSE148374_MacaqueM_count_matrix.csv.gz' -O data/macaM.csv.gz

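A pure-Python alternative to the `wget` cell, sketched under the assumption that only the standard library is available. It downloads each URL listed above into `data/<name>.csv.gz` and skips files that already exist. `SPECIES_URLS` and `download_all` are hypothetical names, not part of the original notebook.

```python
import os
import urllib.request

# Hypothetical mapping from local file stem to the GEO supplementary URLs listed above.
SPECIES_URLS = {
    "mouse": "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146186/suppl/GSE146186_Mouse_count_matrix.csv.gz",
    "pig": "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146187/suppl/GSE146187_Pig_count_matrix.csv.gz",
    "human": "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148371/suppl/GSE148371_Human_count_matrix.csv.gz",
    "macaF": "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148373/suppl/GSE148373_MacaF_count_matrix.csv.gz",
    "macaM": "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE148nnn/GSE148374/suppl/GSE148374_MacaqueM_count_matrix.csv.gz",
}


def download_all(urls, out_dir="data"):
    """Download each URL to <out_dir>/<name>.csv.gz, skipping files already present."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, url in urls.items():
        dest = os.path.join(out_dir, f"{name}.csv.gz")
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
        paths[name] = dest
    return paths
```

Usage: `download_all(SPECIES_URLS)`. Because existence is the only check, a partial file left behind by an interrupted run would also be skipped, so delete it before rerunning.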

In [4]:
!gunzip ./data/*.csv.gz
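`gunzip` replaces each `.gz` file with its decompressed version. Where the shell command is unavailable (for example on Windows without a Unix toolset), the same step can be sketched with Python's `gzip` module. `decompress_all` is a hypothetical helper; unlike `gunzip`, it keeps the original `.gz` files.

```python
import glob
import gzip
import os
import shutil


def decompress_all(data_dir="data"):
    """Decompress each <name>.csv.gz in data_dir to <name>.csv and return the output paths."""
    outputs = []
    for gz_path in sorted(glob.glob(os.path.join(data_dir, "*.csv.gz"))):
        out_path = gz_path[: -len(".gz")]
        # Stream in chunks so a large count matrix is never held fully in memory.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(out_path)
    return outputs
```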

# Metadata file

The metadata file was downloaded from the Broad Institute Single Cell Portal (study SCP780) at: https://singlecell.broadinstitute.org/single_cell/study/SCP780/cell-atlas-of-aqueous-humor-outflow-pathways-in-eyes-of-humans-and-four-model-species-provides-insights-into-glaucoma-pathogenesis#study-summary

We have included it in the repo at `./data/all_five_species_metafile.csv`.

# Convert to AnnData `.h5ad` Files

In [10]:
import scanpy as sc
import pandas as pd
import numpy as np

In [16]:
pig_path = "data/pig.csv"
macaM_path = "data/macaM.csv"
macaF_path = "data/macaF.csv"
human_path = "data/human.csv"
mouse_path = "data/mouse.csv"

In [17]:
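meta_path = "data/all_five_species_metafile.csv"
meta = pd.read_csv(meta_path)
# Drop the first data row. In Single Cell Portal metadata files this is presumably
# the TYPE annotation row (e.g. "TYPE,group"), which describes columns, not a cell.
meta = meta.drop(0)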
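`meta.drop(0)` removes the first data row after parsing, which appears to be the Single Cell Portal `TYPE` annotation row. An equivalent approach, sketched here on a made-up in-memory sample that assumes that layout, skips the row at parse time with `skiprows=[1]`, so the result keeps a clean 0-based index. In the notebook this would read `meta = pd.read_csv(meta_path, skiprows=[1])`.

```python
import io

import pandas as pd

# Made-up sample mimicking the assumed layout: header, TYPE row, then one row per cell.
sample = io.StringIO(
    "NAME,Cluster\n"
    "TYPE,group\n"
    "H1TMS1_AAACCTGAGCGTTCCG-1,Macrophage\n"
    "H1TMS1_AAACCTGAGGTAGCTG-1,SchwannCell-nmy\n"
)

# skiprows=[1] skips file line 1 (0-based), i.e. the annotation row just under the header.
meta_sample = pd.read_csv(sample, skiprows=[1])
print(meta_sample)
```

The index difference does not matter downstream, since every species section re-indexes by `NAME`.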

In [18]:
meta.head()

|   | NAME | Cluster |
|---|---|---|
| 1 | H1TMS1_AAACCTGAGCGTTCCG-1 | Macrophage |
| 2 | H1TMS1_AAACCTGAGGTAGCTG-1 | SchwannCell-nmy |
| 3 | H1TMS1_AAACCTGAGTTGTAGA-1 | Pericyte |
| 4 | H1TMS1_AAACCTGCAGCTGTAT-1 | SchwannCell-nmy |
| 5 | H1TMS1_AAACCTGCAGGAATCG-1 | BeamCella |


### Pig

In [19]:
# The CSV stores genes as rows and cells as columns; transpose to cells x genes.
pig_ad = sc.read(pig_path).T

In [20]:
# Attach per-cell metadata in the same order as the matrix's cell barcodes.
pig_ad.obs = meta.set_index("NAME").loc[pig_ad.obs_names]

In [21]:
pig_ad.X

array([[0., 1., 1., ..., 0., 0., 0.],
       [0., 2., 0., ..., 0., 0., 0.],
       [0., 4., 0., ..., 0., 0., 0.],
       ...,
       [0., 2., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.]], dtype=float32)
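The printed matrix is dense and mostly zeros, so each `.h5ad` written in this notebook stores every zero explicitly. A possible optimization, not part of the original pipeline, is converting `X` to SciPy CSR format before `write`. This sketch assumes `scipy` is installed (AnnData depends on it) and uses a hypothetical `to_csr` helper.

```python
import numpy as np
from scipy import sparse


def to_csr(X):
    """Return X as a float32 CSR matrix plus the fraction of entries that are zero."""
    csr = sparse.csr_matrix(X, dtype=np.float32)
    # Converting from a dense array stores only nonzero values, so nnz counts nonzeros.
    zero_fraction = 1.0 - csr.nnz / (csr.shape[0] * csr.shape[1])
    return csr, zero_fraction
```

Usage, before writing: `pig_ad.X, frac = to_csr(pig_ad.X)`. The higher `frac` is, the more space CSR saves.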

In [22]:
pig_ad.obs["cell_type"] = pig_ad.obs["Cluster"]

In [23]:
pig_ad.write(pig_path.replace("csv", "h5ad"))
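Every species section repeats the Pig steps above: read and transpose, attach metadata with `.loc`, and copy `Cluster` into `cell_type`. A hypothetical `build_obs` helper, sketched with pandas only, could factor out the two metadata steps. It also fails early with a `ValueError` that reports how many barcodes are missing from the metafile and whether any `NAME` is duplicated.

```python
import pandas as pd


def build_obs(obs_names, meta):
    """Return metafile rows aligned to obs_names, plus a `cell_type` column.

    obs_names: cell barcodes in matrix order (e.g. `pig_ad.obs_names`).
    meta: the metafile DataFrame with `NAME` and `Cluster` columns.
    """
    indexed = meta.set_index("NAME")
    if not indexed.index.is_unique:
        raise ValueError("metafile contains duplicate NAME entries")
    wanted = pd.Index(obs_names)
    missing = wanted.difference(indexed.index)
    if len(missing) > 0:
        raise ValueError(
            f"{len(missing)} cell(s) have no metadata, e.g. {list(missing[:3])}"
        )
    # Same alignment as `meta.set_index("NAME").loc[...]` in the notebook.
    obs = indexed.loc[wanted].copy()
    obs["cell_type"] = obs["Cluster"]
    return obs
```

Usage: `pig_ad.obs = build_obs(pig_ad.obs_names, meta)` replaces both the `.loc` cell and the `cell_type` cell for each species.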

### Mouse

In [24]:
mouse_ad = sc.read(mouse_path).T

In [25]:
mouse_ad.obs = meta.set_index("NAME").loc[mouse_ad.obs_names]

In [26]:
mouse_ad.X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [27]:
mouse_ad.obs["cell_type"] = mouse_ad.obs["Cluster"]

In [28]:
mouse_ad.write(mouse_path.replace("csv", "h5ad"))

### Human

In [29]:
human_ad = sc.read(human_path).T

In [30]:
human_ad.obs = meta.set_index("NAME").loc[human_ad.obs_names]

In [31]:
human_ad.X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 4., 0.]], dtype=float32)

In [32]:
human_ad.obs["cell_type"] = human_ad.obs["Cluster"]

In [33]:
human_ad.write(human_path.replace("csv", "h5ad"))

### MacaM

In [34]:
macaM_ad = sc.read(macaM_path).T

In [35]:
macaM_ad.obs = meta.set_index("NAME").loc[macaM_ad.obs_names]

In [36]:
macaM_ad.X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [37]:
macaM_ad.obs["cell_type"] = macaM_ad.obs["Cluster"]

In [38]:
macaM_ad.write(macaM_path.replace("csv", "h5ad"))

### MacaF

In [39]:
macaF_ad = sc.read(macaF_path).T

In [40]:
macaF_ad.obs = meta.set_index("NAME").loc[macaF_ad.obs_names]

In [41]:
macaF_ad.X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [42]:
macaF_ad.obs["cell_type"] = macaF_ad.obs["Cluster"]

In [43]:
macaF_ad.write(macaF_path.replace("csv", "h5ad"))

In [44]:
macaF_ad.obs

|   | Cluster | cell_type |
|---|---|---|
| MacaCWLS1_AAACCTGAGCTAACAA-1 | 5_Schwann cell | 5_Schwann cell |
| MacaCWLS1_AAACCTGAGTGATCGG-1 | 2_BeamA | 2_BeamA |
| MacaCWLS1_AAACCTGCACCAGGCT-1 | 2_BeamA | 2_BeamA |
| MacaCWLS1_AAACCTGCATGGGAAC-1 | 15_Beam X | 15_Beam X |
| MacaCWLS1_AAACCTGTCGGAAACG-1 | 1_Corneal epithelium | 1_Corneal epithelium |
| ... | ... | ... |
| MacaTMRimLS1_TTTGTTGGTGAACCGA-1 | 17_Corneal epithelium | 17_Corneal epithelium |
| MacaTMRimLS1_TTTGTTGTCAACCTTT-1 | 1_Corneal epithelium | 1_Corneal epithelium |
| MacaTMRimLS1_TTTGTTGTCACGGGCT-1 | 1_Corneal epithelium | 1_Corneal epithelium |
| MacaTMRimLS1_TTTGTTGTCGAGTTGT-1 | 3_Macrophage | 3_Macrophage |
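The MacaF labels carry a numeric cluster prefix (`5_Schwann cell`, `2_BeamA`) that the human labels in `meta.head()` lack (`Macrophage`, `BeamCella`). If labels are to be compared across species, one option, which is an assumption and not something the original notebook does, is to leave `Cluster` untouched and strip a leading `<digits>_` from `cell_type` only. `strip_cluster_prefix` is a hypothetical helper.

```python
import pandas as pd


def strip_cluster_prefix(labels):
    """Remove a leading '<digits>_' prefix, e.g. '5_Schwann cell' -> 'Schwann cell'."""
    # pd.Series(...) keeps the index of an existing Series, so assignment back to obs aligns.
    return pd.Series(labels).str.replace(r"^\d+_", "", regex=True)
```

Usage: `macaF_ad.obs["cell_type"] = strip_cluster_prefix(macaF_ad.obs["Cluster"])`. Note that this merges clusters that share a name, such as `1_Corneal epithelium` and `17_Corneal epithelium`, which may or may not be desired.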


### Double-Check All Objects

In [45]:
display(pig_ad)
display(mouse_ad)
display(human_ad)
display(macaM_ad)
display(macaF_ad)

AnnData object with n_obs × n_vars = 6709 × 25880
    obs: 'Cluster', 'cell_type'

AnnData object with n_obs × n_vars = 5067 × 31053
    obs: 'Cluster', 'cell_type'

AnnData object with n_obs × n_vars = 24023 × 33660
    obs: 'Cluster', 'cell_type'

AnnData object with n_obs × n_vars = 5158 × 32386
    obs: 'Cluster', 'cell_type'

AnnData object with n_obs × n_vars = 9155 × 36162
    obs: 'Cluster', 'cell_type'