# HuBMAP: scRNA-seq

The [HuBMAP (Human BioMolecular Atlas Program) consortium](https://hubmapconsortium.org/) is an initiative that maps human cells to create a comprehensive atlas. Its [Data Portal](https://portal.hubmapconsortium.org/) is the platform where researchers can access, visualize, and download (single-cell) tissue data.

Lamin mirrors most of the datasets for simplified access here: [laminlabs/hubmap](https://lamin.ai/laminlabs/hubmap).

If you use the data academically, please cite the original publication [Jain et al. 2023](https://www.nature.com/articles/s41556-023-01194-w).

Here, we show how the HuBMAP instance is structured and how datasets can be queried and accessed.

Connect to the source instance:

In [None]:
# pip install 'lamindb[jupyter,bionty,wetlab]'
!lamin connect laminlabs/hubmap

```{note}

If you want to transfer artifacts or metadata into your own instance, use `.using("laminlabs/hubmap")` when accessing registries and then `.save()` ({doc}`/transfer`).

```
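The transfer described in the note can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the function name `transfer_artifact` is hypothetical, and it assumes you are connected to your own writable instance rather than to `laminlabs/hubmap` itself.

```python
def transfer_artifact(key: str, source: str = "laminlabs/hubmap"):
    """Copy one artifact record from `source` into the currently connected instance.

    Hypothetical helper; assumes the current instance is your own, writable one.
    """
    import lamindb as ln  # imported lazily so that defining the helper has no side effects

    # Query the Artifact registry of the source instance instead of the current one
    artifact = ln.Artifact.using(source).get(key=key)
    # Saving a record that was queried from another instance transfers it
    artifact.save()
    return artifact
```

The `key` argument is the artifact key as it appears in the source instance.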

In [None]:
import lamindb as ln

## Getting HuBMAP datasets and data products

HuBMAP groups several data products, which are the individual raw datasets, into higher-level datasets.
For example, the dataset [HBM983.LKMP.544](https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e) has four data products:

1. [raw_expr.h5ad](https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/raw_expr.h5ad)
2. [expr.h5ad](https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/expr.h5ad)
3. [secondary_analysis.h5ad](https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/secondary_analysis.h5ad)
4. [scvelo_annotated.h5ad](https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/scvelo_annotated.h5ad)

The [laminlabs/hubmap](https://lamin.ai/laminlabs/hubmap) instance registers each of these data products as an {class}`~lamindb.Artifact`; together, they form a {class}`~lamindb.Collection`.

The `key` attribute of an `ln.Artifact` or `ln.Collection` corresponds to the ID in its HuBMAP URL.
For example, the ID at the end of the URL [https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e](https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e) is the `key` of the corresponding collection:

In [None]:
small_intenstine_collection = ln.Collection.get(key="20ee458e5ee361717b68ca72caf6044e")
small_intenstine_collection
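Because the collection `key` is the last path segment of the portal URL, it can also be extracted programmatically. The following is a minimal sketch using only the Python standard library; the helper name `collection_key_from_portal_url` is hypothetical and not part of `lamindb`.

```python
from urllib.parse import urlparse


def collection_key_from_portal_url(url: str) -> str:
    """Return the dataset ID at the end of a HuBMAP portal dataset URL."""
    path = urlparse(url).path.rstrip("/")
    prefix = "/browse/dataset/"
    if not path.startswith(prefix):
        raise ValueError(f"Not a HuBMAP portal dataset URL: {url}")
    # Everything after the prefix is the dataset ID
    return path[len(prefix):]


portal_url = "https://portal.hubmapconsortium.org/browse/dataset/20ee458e5ee361717b68ca72caf6044e"
key = collection_key_from_portal_url(portal_url)
print(key)
```

The returned string can then be passed as `key` to `ln.Collection.get`.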

We can list all associated data products as follows:

In [None]:
small_intenstine_collection.artifacts.all().df()

Note that the `key` of each of these artifacts corresponds to its assets URL.
For example, [https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/expr.h5ad](https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/expr.h5ad) is the direct URL to the `expr.h5ad` data product.
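To make the structure of an assets URL concrete, here is a small sketch that splits one into its ID and file name. The helper `split_assets_url` is hypothetical. Whether an artifact `key` in the instance has exactly the form `<ID>/<file name>` is an assumption that should be checked against the `key` column of the dataframe above.

```python
from urllib.parse import urlparse


def split_assets_url(url: str) -> tuple[str, str]:
    """Split a HuBMAP assets URL into (ID, file name)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"Unexpected assets URL layout: {url}")
    asset_id, file_name = parts
    return asset_id, file_name


assets_url = "https://assets.hubmapconsortium.org/f6eb890063d13698feb11d39fa61e45a/expr.h5ad"
asset_id, file_name = split_assets_url(assets_url)
print(asset_id, file_name)
```

Note that the ID in the assets URL differs from the dataset ID in the portal URL.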

Artifacts can be loaded directly into memory:

In [None]:
small_intenstine_af = (
    small_intenstine_collection.artifacts.filter(key__icontains="raw_expr.h5ad")
    .distinct()
    .one()
)
adata = small_intenstine_af.load()
adata

## Querying single-cell datasets

Currently, only the `Artifacts` of the `raw_expr.h5ad` data products are annotated with metadata.
The available metadata includes `ln.Reference`, `bt.Tissue`, `bt.Disease`, `bt.ExperimentalFactor`, and many more, where `bt` refers to the `bionty` module.
See [the instance](https://lamin.ai/laminlabs/hubmap) for more details.

In [None]:
# Get one dataset with a specific type of heart failure
heart_failure_adata = (
    ln.Artifact.filter(diseases__name="heart failure with reduced ejection fraction")
    .first()
    .load()
)
heart_failure_adata
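Instead of loading only the first match, it can be useful to inspect all matching artifacts before loading anything. The sketch below combines the `key` and disease lookups already used in this guide. The helper name `list_raw_expr_artifacts` is hypothetical, and the function assumes an active connection to an instance such as `laminlabs/hubmap`.

```python
def list_raw_expr_artifacts(disease_name: str):
    """Return a dataframe of `raw_expr.h5ad` artifacts annotated with `disease_name`."""
    import lamindb as ln  # imported lazily; requires a connected instance

    return (
        ln.Artifact.filter(
            key__icontains="raw_expr.h5ad",  # only these artifacts carry metadata
            diseases__name=disease_name,
        )
        .distinct()
        .df()
    )
```

For example, `list_raw_expr_artifacts("heart failure with reduced ejection fraction")` would list the candidates for the query above.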