📝 Add a quickstart & update ingest #231

Merged
merged 4 commits into from
Sep 26, 2022
3 changes: 3 additions & 0 deletions docs/guide/index.md
@@ -11,6 +11,8 @@ Walk-through of basic functionality
- {doc}`introspect`
- {doc}`share-cross-org`

If you are short on time, use the [quickstart](../quickstart).

Walk-through of exemplary R&D operations

- {doc}`rds-setup`
@@ -26,5 +28,6 @@ Walk-through of exemplary R&D operations
:hidden:

basic
../quickstart
rds
```
66 changes: 20 additions & 46 deletions docs/guide/ingest.ipynb
@@ -20,6 +20,8 @@
"outputs": [],
"source": [
"import lamindb as ln\n",
"import sklearn.datasets\n",
"import scanpy as sc\n",
"\n",
"ln.nb.header()"
]
@@ -37,7 +39,7 @@
"id": "765c1d7a-abc1-4941-9585-c98eb4d8bea4",
"metadata": {},
"source": [
"Let's first ingest a simple image file from [Paradisi *et al.* (2005)](https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/1471-2121-6-27):\n",
"Example: A single image file from [Paradisi *et al.* (2005)](https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/1471-2121-6-27):\n",
"\n",
"<img width=\"150\" alt=\"Laminopathic nuclei\" src=\"https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg\">"
]
@@ -58,7 +60,7 @@
"id": "74eb8e33",
"metadata": {},
"source": [
"To track this dataset, we stage it for ingestion via `.add`:"
"To track this dataset, stage it for ingestion:"
]
},
{
@@ -76,7 +78,7 @@
"id": "0c78cb53",
"metadata": {},
"source": [
"Staged files can be viewed via `.status`, having been assigned a unique id and version:"
"Check what we staged:"
]
},
{
@@ -102,7 +104,7 @@
"id": "897b54c0-9bb1-49e1-b63a-08b1413df2a1",
"metadata": {},
"source": [
"You can also ingest a data object loaded into memory, for instance, a `DataFrame` here:"
"Example: A `DataFrame` storing the iris dataset:"
]
},
{
@@ -112,8 +114,6 @@
"metadata": {},
"outputs": [],
"source": [
"import sklearn.datasets\n",
"\n",
"df = sklearn.datasets.load_iris(as_frame=True).frame\n",
"\n",
"df.head()"
@@ -124,7 +124,7 @@
"id": "d172ed7d",
"metadata": {},
"source": [
"When ingesting in-memory objects, a `name` parameter needs to be passed:"
"When ingesting in-memory objects, a `name` argument needs to be passed:"
]
},
{
@@ -137,24 +137,6 @@
"ln.db.ingest.add(df, name=\"iris\")"
]
},
{
"cell_type": "markdown",
"id": "ba3d9ac2",
"metadata": {},
"source": [
"Upon ingestion, the data object will be saved as a corresponding file format. In this case, a dataframe is saved as a `.feather` file in LaminDB. See [here](https://lamin.ai/docs/lnschema-core/lnschema_core.dobject) for more details!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f7367ea",
"metadata": {},
"outputs": [],
"source": [
"ln.db.ingest.status"
]
},
{
"cell_type": "markdown",
"id": "e2484f6f",
@@ -178,9 +160,7 @@
"id": "1ffea7de",
"metadata": {},
"source": [
"By providing _feature models_ at ingestion, can use LaminDB as a queryable data warehouse that stores links[^relations] and monitor data integrity.\n",
"\n",
"A feature model creates a link between a feature and a reference table that defines the entity underlying the feature.\n",
"By passing a _feature model_ to `db.ingest`, LaminDB creates links[^relations] to underlying entities and behaves much like a data warehouse.\n",
"\n",
"[^relations]: We mostly use the term link as a synonym for relations and references."
]
@@ -190,7 +170,7 @@
"id": "8801c81c",
"metadata": {},
"source": [
"Let us explain this by considering a scRNA-seq count matrix in form of an `AnnData` object in memory"
"Example: An scRNA-seq count matrix in form of an `AnnData` object in memory"
]
},
{
@@ -200,8 +180,6 @@
"metadata": {},
"outputs": [],
"source": [
"import scanpy as sc\n",
"\n",
"data = sc.read(ln.datasets.file_mouse_sc_lymph_node())\n",
"\n",
"data.var.head()"
@@ -214,7 +192,9 @@
"source": [
"The features in this dataset represent the entity `gene` and are indexed by Ensembl gene ids.\n",
"\n",
"Bionty provides a number of feature models for all basic biological entities that are typically measured. Below we show an example of genes as entities. Also see [ingesting flow cytometry data with cell markers](https://lamin.ai/docs/db/faq/flow).\n",
"Bionty provides a number of feature models for all basic biological entities that are typically measured.\n",
"\n",
"For linking against protein complexes, see a guide on [ingesting flow cytometry data with cell markers](https://lamin.ai/docs/db/faq/flow).\n",
"\n",
"```{note}\n",
"\n",
@@ -257,12 +237,7 @@
"\n",
"Ingesting data with a `feature_model` enables querying for features with a number of ids, names, and feature properties.\n",
"\n",
"For example, here we ingest genes with their Ensembl ids, but we can also query for them based on [gene symbol, NCBI ids, gene type, etc](https://lamin.ai/docs/db/guide/query-load#Query-data-objects-by-linked-entities).\n",
"\n",
"```{note}\n",
"\n",
"Unmapped features will only be queryable by their own field, in this example, by ensembl_gene_id.\n",
"```"
"For example, here we ingest genes with their Ensembl ids, but we can also query for them based on [gene symbol, NCBI ids, gene type, etc](https://lamin.ai/docs/db/guide/query-load#Query-data-objects-by-linked-entities)."
]
},
{
@@ -274,9 +249,8 @@
"source": [
"ln.db.ingest.add(\n",
" data,\n",
" name=\"mouse_sc_lymph_node\",\n",
" name=\"Mouse Lymph Node scRNA-seq\",\n",
" feature_model=feature_model,\n",
" featureset_name=\"mouse_1k\", # optional\n",
")"
]
},
@@ -317,7 +291,7 @@
"\n",
"```{note}\n",
"\n",
"For the purpose of this tutorial, we ingest the pipeline output from within this notebook. Typically, this is done from the command line.\n",
"For the purpose of this guide, we ingest the pipeline output from within this notebook. Typically, this is done from the command line.\n",
"```"
]
},
@@ -364,7 +338,7 @@
"metadata": {},
"outputs": [],
"source": [
"import lnbfx\n",
"import lnbfx # https://lamin.ai/docs/lnbfx\n",
"\n",
"bfx_run = lnbfx.BfxRun(\n",
" pipeline=lnbfx.lookup.pipeline.cell_ranger_v7_0_0,\n",
@@ -435,15 +409,15 @@
"id": "27ea0197",
"metadata": {},
"source": [
"We see that several links are made in the background: the data object is associated with its source (this Jupyter notebook, `jupynb`) and the user who operates the notebook (`test-user1`).\n",
"\n",
"`ln.db.ingest` detects whether data comes from a notebook, a pipeline, a connector, or a custom graphical user interface."
"We see that several links are made in the background: the data object is associated with its source (this Jupyter notebook, `jupynb`) and the user who operates the notebook."
]
}
],
"metadata": {
"kernelspec": {
"language": "python"
"display_name": "Python 3.9.12 ('base1')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {