# phs002268.v1.p1 - University of Florida
## HuBMAP: A 3-D Tissue Map of the Human Lymphatic System
### Study Description
The major goal of the 3-D Tissue Map of the Human Lymphatic System is to use microscopic and biomolecular procedures to facilitate co-registration pipelines and common 3D reconstruction algorithms. Tissue collected from human spleen, thymus and lymph node will be spatially resolved at the single-cell level, both within and across individuals. The approach involves sequencing the transcriptomes of dissociated cells and mapping them to histological sections using CO-Detection by indEXing (CODEX) and/or imaging mass spectrometry, two highly multiplexed methods that detect target epitopes with tagged antibodies. Additionally, light-sheet fluorescence microscopy provides higher-level structural context over a larger tissue volume. The molecular data provided by this project are obtained through single-cell RNA-seq.

### Study Attribution
#### Principal Investigators
* Mark Atkinson. Department of Pathology, Immunology, and Laboratory Medicine, University of Florida Diabetes Institute, College of Medicine, Gainesville FL, USA.
* Bernd Bodenmiller. Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
* Todd Brusko. Department of Pathology, Immunology, and Laboratory Medicine, University of Florida Diabetes Institute, College of Medicine, Gainesville FL, USA.
* Harry Nick. Department of Neuroscience, University of Florida, College of Medicine, Gainesville FL, USA.
* Clive Wasserfall. Department of Pathology, Immunology, and Laboratory Medicine, University of Florida Diabetes Institute, College of Medicine, Gainesville FL, USA.



In [1]:
```python
import hubmapdbgap
import hubmapbags

# This study ID is assigned by NIH (dbGaP accession)
dbgap_study_id = 'phs002268'

# Replace the placeholder with your HuBMAP token
token = '<this-is-my-token>'
```
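Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead; the variable name `HUBMAP_TOKEN` and the helper `get_token` are hypothetical choices for this example, not part of `hubmapbags` or `hubmapdbgap`.

```python
import os


def get_token(var_name: str = "HUBMAP_TOKEN") -> str:
    """Return the HuBMAP token stored in an environment variable.

    HUBMAP_TOKEN is a hypothetical variable name assumed for this sketch.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {var_name} environment variable to your HuBMAP token")
    return token
```

With the variable exported in the shell, `token = get_token()` would replace the placeholder assignment.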

In [2]:
```python
# Clean the local cache
hubmapbags.utilities.clean()

# Generate the daily report
df = hubmapbags.reports.daily(token=token)

# List all data providers in the dataframe
df['group_name'].unique()
```

```
array(['University of California San Diego TMC', 'Stanford TMC',
       'California Institute of Technology TMC',
       'University of Florida TMC',
       'TTD - Pacific Northwest National Laboratory', 'Stanford RTI',
       'Broad Institute RTI', 'TMC - University of Connecticut',
       'Northwestern RTI', 'General Electric RTI',
       'EXT - Human Cell Atlas', 'Vanderbilt TMC', 'IEC Testing Group',
       "TMC - Children's Hospital of Philadelphia",
       'TTD - Penn State University and Columbia University',
       'TMC - Pacific Northwest National Laboratory', 'Purdue TTD',
       'TMC - University of Connecticut and Scripps',
       'TMC - University of Pennsylvania'], dtype=object)
```

In [3]:
```python
# Keep published, primary, protected datasets from the University of Florida TMC
df = df[(df['status']=='Published') & (df['dataset_type']=='Primary') & (df['is_protected']==True)]
df = df[df['group_name']=='University of Florida TMC']

# TODO: find out why the report contains duplicate rows for the same hubmap_id
df = df.drop_duplicates(subset=['hubmap_id'])
```
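A first step toward the open question about duplicates is to check whether the repeated rows are identical or differ in some column, before `drop_duplicates` discards them. The sketch below uses a small made-up DataFrame whose column names mirror the report; `show_duplicates` and `differing_columns` are hypothetical helpers, and the sketch does not explain where the duplicates come from.

```python
import pandas as pd

# Toy report: one hubmap_id appears twice. Values are invented for illustration.
report = pd.DataFrame(
    {
        "uuid": ["u1", "u1", "u2"],
        "hubmap_id": ["HBM000.AAAA.000", "HBM000.AAAA.000", "HBM111.BBBB.111"],
        "status": ["Published", "Published", "Published"],
    }
)


def show_duplicates(frame: pd.DataFrame, key: str = "hubmap_id") -> pd.DataFrame:
    """Return every row whose key occurs more than once, grouped together."""
    mask = frame.duplicated(subset=[key], keep=False)  # keep=False flags all copies
    return frame[mask].sort_values(key)


def differing_columns(frame: pd.DataFrame, key: str = "hubmap_id") -> list:
    """List columns whose values differ within at least one duplicated key."""
    dups = show_duplicates(frame, key)
    # dropna=False counts NaN as a distinct value, so NaN vs. a value is detected
    varying = dups.groupby(key).nunique(dropna=False).max() > 1
    return [col for col, flag in varying.items() if flag]
```

An empty result from `differing_columns` means the duplicated rows are exact copies, so dropping them loses nothing; a non-empty result names the columns worth investigating. Run it on the report returned by `hubmapbags.reports.daily` before the `drop_duplicates` line.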

In [4]:
```python
hubmap_ids = list(df['hubmap_id'])
print(f'List of total datasets to include in study is {len(hubmap_ids)}')
df
```

```
List of total datasets to include in study is 26
```

| Unnamed: 0 | uuid | hubmap_id | status | group_name | data_type | dataset_type | created_datetime | published_datetime | is_protected |
|---|---|---|---|---|---|---|---|---|---|
| 1824 | 1d154d589fb8cffbb8d1f056d0f718f6 | HBM354.GTPP.329 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 15:58:17.260 | 2021-05-02 17:20:55.461 | True |
| 1825 | 76e5f8debfe7159dda861a3ad79b87df | HBM844.GDWH.846 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 16:00:10.771 | 2021-05-02 17:20:55.115 | True |
| 1826 | 5ba6a5b81e95c93c26a33980f6e957d7 | HBM962.BSVN.575 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 15:50:37.156 | 2021-05-02 17:20:54.815 | True |
| 1827 | fbfb5aba4fad7d6565e19387e2d12e0b | HBM929.BRVT.339 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 15:55:36.345 | 2021-05-02 17:20:54.519 | True |
| 1828 | f4cd67b641b87d97b5b2749382cfc921 | HBM992.DRDZ.999 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 13:51:49.456 | 2021-05-02 17:20:54.217 | True |
| 1829 | 76627e9a5a369c5230b9da216fe20eb3 | HBM743.FGPZ.867 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 13:48:44.882 | 2021-05-02 17:20:53.865 | True |
| 1830 | 67c8950caee99625a5c7a5dda03c4be8 | HBM376.TQFR.354 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 13:46:11.022 | 2021-05-02 17:20:53.552 | True |
| 1831 | cc70b4ebb578528b29b8d6fa5eb499dd | HBM626.FHJD.938 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 13:40:07.650 | 2021-05-02 17:20:53.240 | True |
| 1832 | 1a7c7284d18dd06273dba5861fc19f7b | HBM644.LHFR.583 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 15:46:56.712 | 2021-05-02 17:20:52.898 | True |
| 1833 | 01844e5a048e03aa434ee0afb71c813d | HBM263.TLLN.467 | Published | University of Florida TMC | ['scRNA-Seq-10x'] | Primary | 2020-11-29 15:44:19.317 | 2021-05-02 17:20:52.494 | True |
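The `data_type` values above look like stringified Python lists (for example `"['scRNA-Seq-10x']"`), which is what a list column typically becomes after a round trip through CSV; the `Unnamed: 0` column hints at such a round trip, but that is an assumption. A minimal sketch for tallying data types across the selected datasets, parsing the strings with `ast.literal_eval`; `tally_data_types` is a hypothetical helper.

```python
import ast
from collections import Counter

import pandas as pd


def tally_data_types(frame: pd.DataFrame, column: str = "data_type") -> Counter:
    """Count data types, accepting real lists or list-like strings.

    Strings such as "['scRNA-Seq-10x']" are parsed with ast.literal_eval,
    which evaluates only Python literals. A bare string like "CODEX" that is
    not a list literal raises ValueError rather than being guessed at.
    """
    counts = Counter()
    for value in frame[column]:
        items = ast.literal_eval(value) if isinstance(value, str) else value
        counts.update(items)
    return counts
```

Calling `tally_data_types(df)` on the filtered report gives a quick check that every dataset going into the submission has the expected assay type; in the rows displayed above, each entry is `['scRNA-Seq-10x']`.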


In [5]:
```python
data = hubmapdbgap.create.submission(hubmap_ids, dbgap_study_id=dbgap_study_id, token=token, prepend_sample_id=False)
```

```
Removing existing folder phs002268
Adding 26 to the main dataframe
Gathering dataset metadata
Processing dataset HBM354.GTPP.329
Processing dataset HBM844.GDWH.846
Processing dataset HBM962.BSVN.575
Processing dataset HBM929.BRVT.339
Processing dataset HBM992.DRDZ.999
Processing dataset HBM743.FGPZ.867
Processing dataset HBM376.TQFR.354
Processing dataset HBM626.FHJD.938
Processing dataset HBM644.LHFR.583
Processing dataset HBM263.TLLN.467
Processing dataset HBM295.MMTQ.529
Processing dataset HBM868.BXNB.448
Processing dataset HBM727.CLDW.546
Processing dataset HBM267.GWND.796
Processing dataset HBM978.DVCP.278
Processing dataset HBM457.SQKR.279
Processing dataset HBM895.WHGJ.263
Processing dataset HBM252.HMBK.543
Processing dataset HBM556.QMSM.776
Processing dataset HBM724.ZKSM.924
Processing dataset HBM984.GRBB.858
Processing dataset HBM226.LBVC.946
Processing dataset HBM373.RTKK.586
Processing dataset HBM749.WHLC.649
Processing dataset HBM472.NTNN.543
Processing dataset HBM336.FWTN.
7it [00:01,  4.69it/s]

Gathering sample attributes
26it [00:08,  3.18it/s]

Creating sample mapping
Downloading spreadsheets
Processing dataset inventories
  0%|          | 0/26 [00:00<?, ?it/s]

KeyError: '10.17504/protocols.io.be79jhr6'
```
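The run stops at the first `KeyError` (here a protocols.io DOI) while processing dataset inventories, so the log does not show which datasets trigger it or whether other keys are missing too. One generic way to triage such failures, sketched under the assumption that you can express the failing work as a per-dataset function: run it over every ID and collect the `KeyError`s instead of stopping at the first. `run_collecting_key_errors` and the `step` callable are hypothetical and do not reflect how `hubmapdbgap` is structured internally.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_collecting_key_errors(
    items: Iterable[str], step: Callable[[str], object]
) -> Tuple[Dict[str, object], List[Tuple[str, object]]]:
    """Apply step to each item, recording KeyErrors instead of stopping.

    Returns (results, failures), where failures pairs each item with the
    key that was missing when it was processed.
    """
    results: Dict[str, object] = {}
    failures: List[Tuple[str, object]] = []
    for item in items:
        try:
            results[item] = step(item)
        except KeyError as err:
            # err.args[0] holds the missing key, e.g. a protocol DOI
            failures.append((item, err.args[0] if err.args else None))
    return results, failures
```

If every dataset fails on the same DOI, the missing entry is likely shared metadata rather than a problem with individual datasets; if only a few fail, those datasets' protocol references are the place to look.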