# Create a local database

Here, we work with local file storage and a local SQLite database backend that manages queries, data versioning, and schema versioning.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from nbproject import header

| | |
|---|---|
| uid | tCnjF9Hs8h65 |
| time_init | 2022-06-06 14:21 |
| time_run | 2022-06-06 20:28 |


## Ingest

**Problem:** We have some data from somewhere and we want to store it. We call this process _ingestion_.

**Context:** External data follows external conventions (schema & formats), whereas internal data is generated and pipelined under an internal schema & formats. The two may therefore need ingestion processes with very different levels of _curation_: cleaning, standardization, normalization, and annotation.

**Examples:** Let's choose examples across the whole spectrum!

### Download public flow cytometry data

We'll start with Alpert19 and download a file from the publicly accessible persistent URL on ImmPort under study SDY478.

The URL for downloading the data is: https://browser.immport.org/browser?path=SDY478%2FResultFiles%2FCyTOF_result.

Downloading requires an ImmPort account, which is free for anyone.

We downloaded the file into a temporary directory:

In [3]:
! ls ~/Downloads

070314-Mike-Study 15-2013-plate 1-15-004-1-13_cells_found.521438.fcs


This file is far from ready to be integrated with our existing data.

So far, it's annotated only by the download URL and the semantic context of its study number on ImmPort.

Ideally, we'd annotate the file with all of that semantic context at once (study, publications, experiments, reagents, tissues, organisms, etc.). This is best done by querying the metadata fields from ImmPort through an API and writing them to our own metadata storage.

For now, let's be satisfied with merely assigning an ID within our local storage.
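
As a rough illustration of what assigning such an ID could look like, here is a minimal sketch that draws a random base62 string. The helper `make_id` and the default length of 20 characters are assumptions for illustration, not how lamindb generates its IDs.

```python
import secrets
import string

# Base62 alphabet: digits plus upper- and lowercase ASCII letters.
BASE62 = string.digits + string.ascii_letters


def make_id(n_chars: int = 20) -> str:
    """Return a random base62 string (hypothetical helper, not lamindb's API)."""
    return "".join(secrets.choice(BASE62) for _ in range(n_chars))


file_id = make_id()
print(file_id)
```

Random IDs like this carry no meaning themselves, so all semantic context has to live in the metadata that is stored alongside them.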

## Setting up lamindb

Let's configure lamindb with a local directory for storing objects and a local SQLite database for storing metadata about these objects.

In [4]:
!lamindb configure --storage "$HOME/Library/Mobile Documents/com~apple~CloudDocs/Lamin/Sandbox"


Create a database with an initial schema.

In [5]:
import lamindb as lndb

In [6]:
db = lndb.DB()

Create an on-disk representation of the database.

In [7]:
db.create()

created database at /Users/falexwolf/Library/Mobile Documents/com~apple~CloudDocs/Lamin/Sandbox/lamin.db
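
To make the idea of a local SQLite database holding metadata about stored objects more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `file`, its columns, and the use of an in-memory database are assumptions for illustration; this is not the schema that lamindb creates.

```python
import sqlite3

# An in-memory database keeps the sketch free of side effects;
# pass a file path instead to get an on-disk database.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row of metadata per stored file.
conn.execute(
    """
    CREATE TABLE file (
        id TEXT PRIMARY KEY,      -- ID assigned at ingestion
        name TEXT NOT NULL,       -- original file name
        notebook_id TEXT,         -- notebook that ingested the file
        time_ingested TEXT        -- ISO 8601 timestamp
    )
    """
)
conn.commit()

# List the tables that now exist in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
```

Keeping the metadata in a single SQLite file means the database can sit right next to the stored objects and needs no server.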


## Ingesting a first data file

In [8]:
from pathlib import Path

In [9]:
lnid = lndb.ingest(
    Path.home()
    / "Downloads/070314-Mike-Study 15-2013-plate 1-15-004-1-13_cells_found.521438.fcs"
)

ingested file H9VtLNWnMYazfLaJAWIc from notebook tCnjF9Hs8h65
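
Conceptually, an ingestion step like this has to place the file in storage under a new ID and record where it came from. The following is a minimal sketch under those assumptions, using only the standard library. The function `ingest_sketch`, the table layout, and the choice to name the stored file by its ID plus the original suffix are all hypothetical and do not reflect lamindb's implementation.

```python
import secrets
import shutil
import sqlite3
import string
import tempfile
from pathlib import Path

BASE62 = string.digits + string.ascii_letters


def ingest_sketch(
    filepath: Path, storage: Path, conn: sqlite3.Connection, notebook_id: str
) -> str:
    """Copy a file into storage under a new random ID and record its metadata."""
    file_id = "".join(secrets.choice(BASE62) for _ in range(20))
    storage.mkdir(parents=True, exist_ok=True)
    # Store the file under its ID, keeping the original suffix.
    shutil.copy2(filepath, storage / f"{file_id}{filepath.suffix}")
    conn.execute(
        "INSERT INTO file (id, name, notebook_id) VALUES (?, ?, ?)",
        (file_id, filepath.name, notebook_id),
    )
    conn.commit()
    return file_id


# Usage with throwaway data so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    source = root / "example.fcs"
    source.write_bytes(b"placeholder content")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (id TEXT PRIMARY KEY, name TEXT, notebook_id TEXT)")
    new_id = ingest_sketch(source, root / "storage", conn, notebook_id="tCnjF9Hs8h65")
    stored_files = [p.name for p in (root / "storage").iterdir()]
    rows = conn.execute("SELECT id, name, notebook_id FROM file").fetchall()
    conn.close()

print(f"ingested file {new_id} from notebook {rows[0][2]}")
```

Recording the notebook ID next to the file ID is what lets a stored file be traced back to the notebook that ingested it.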
