# Get started

LaminDB is a distributed data management system similar to how git is a distributed version control system.

Just like you work with repositories in git, you work with <ins>instances</ins> in LaminDB.
However, unlike git (and dvc), LaminDB is queryable by metadata.

An instance is a data warehouse with storage (local directory, S3, GCP, Azure) and a SQL database (SQLite, Postgres, BigQuery) for querying it.
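To make the "storage plus SQL database" idea concrete, here is a minimal conceptual sketch using only the Python standard library. It is not LaminDB's implementation: the table name `data_object`, its columns, the file name `metadata.sqlite`, and the helper functions are all hypothetical and exist only for illustration. It shows files living in a directory while a SQLite table makes them queryable by metadata.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_toy_instance(root: Path) -> sqlite3.Connection:
    """Create a storage directory plus a SQLite file that indexes it."""
    root.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(root / "metadata.sqlite")  # hypothetical file name
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_object ("
        "name TEXT PRIMARY KEY, suffix TEXT, created_by TEXT)"
    )
    return conn


def register(conn, root, name, content, created_by):
    """Write a file into storage and record its metadata in the database."""
    (root / name).write_bytes(content)
    conn.execute(
        "INSERT INTO data_object VALUES (?, ?, ?)",
        (name, Path(name).suffix, created_by),
    )
    conn.commit()


def find_by_creator(conn, created_by):
    """Query the metadata table instead of scanning the storage directory."""
    rows = conn.execute(
        "SELECT name FROM data_object WHERE created_by = ? ORDER BY name",
        (created_by,),
    )
    return [name for (name,) in rows]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "toy_instance"
    conn = create_toy_instance(root)
    register(conn, root, "iris.csv", b"a,b\n1,2\n", "user_a")
    register(conn, root, "notes.txt", b"hello", "user_b")
    result = find_by_creator(conn, "user_a")
    print(result)
    conn.close()
```

The point of the split is that the files themselves stay in ordinary storage, while questions such as "who created what?" are answered by the database.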

## Set up a user identity

As a first-time user, sign up with your email address. This creates a user identity that LaminDB can link to data and analyses, which enables collaboration.

You can sign up either on the command line via `lndb signup --email <your-email>` or via [hub.lamin.ai](https://hub.lamin.ai). Use your GitHub-associated email if you have one. For example:

    !lndb signup --email "raspbear@gmx.de"

    ℹ️ Please *confirm* the sign-up email. After that, proceed to `lndb init`!

    Generated login secret: MmR4YuQEyb0yxu7dAwJZTjLzR1Az2lN4Q4IduDlO.
    Email & secret persist in: /Users/falexwolf/.lndb/current_user.env.
    Going forward, credentials are auto-loaded. In case of loss, you can always recover your secret via email.

After confirming the sign-up email, you can initialize your first instance immediately via `lndb init`.

## Log in user

If `raspbear@gmx.de` has already completed the sign-up but has not yet logged into LaminDB in the current compute environment, call:

In [1]:
!lndb login --email "raspbear@gmx.de" --secret "MmR4YuQEyb0yxu7dAwJZTjLzR1Az2lN4Q4IduDlO"

## Initialize and configure an instance

For a simple demo project, let us configure a local instance with storage in `mydata/` and a local SQLite database for managing it.

You can also pass `s3://my-bucket` directly to `--storage`, or a Postgres URL to `--dbconfig`.
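As a rough illustration of how a plain directory name differs from a cloud location such as `s3://my-bucket`, the following sketch inspects the URL scheme with the standard library. The function `describe_storage` is hypothetical and says nothing about how `lndb` itself parses its arguments.

```python
from urllib.parse import urlparse


def describe_storage(location: str) -> str:
    """Return 'local' for a plain path, else the URL scheme (e.g. 's3')."""
    scheme = urlparse(location).scheme
    # An empty scheme means a plain path; a single-letter scheme is most
    # likely a Windows drive letter, so treat both as local.
    if len(scheme) <= 1:
        return "local"
    return scheme


kinds = {loc: describe_storage(loc) for loc in ["mydata", "s3://my-bucket"]}
print(kinds)
```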

In [2]:
!lndb init --storage mydata --schema bionty,wetlab  # a generic biology schema module based on bionty and wetlab

ℹ️ Loading schema modules: core, bionty, wetlab.
ℹ️ Created instance mydata: mydata/mydata.lndb


The instance configuration persists in `/Users/falexwolf/.lndb/current_instance.env`. All instance data is in `mydata`, with all metadata in the SQLite file `mydata.lndb`.
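Because the metadata lives in an ordinary SQLite file, it can in principle be inspected with Python's built-in `sqlite3` module. The sketch below lists the tables in a SQLite file without assuming any table names. The helper `list_tables` is hypothetical, and the path is the one reported by `lndb init` above.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


db_file = Path("mydata") / "mydata.lndb"  # path reported by `lndb init`
# Only connect if the file exists: sqlite3.connect would otherwise create it.
if db_file.exists():
    print(list_tables(db_file))
```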

## Ingest data

In [3]:
import lamindb as db
import sklearn.datasets

db.header()  # this is nbproject.header()

| 0 | 1 |
|---|---|
| id | GgD4VJbXtOOS |
| version | draft |
| time_init | 2022-06-23 14:16 |
| time_run | 2022-07-31 11:24 |
| pypackage | lamindb==0.1.2 scikit-learn==1.1.1 |


To demonstrate ingesting data that is queryable only by provenance, let us choose data that has little semantic meaning in the context of modern biology.

The `iris` dataset stores flower phenotypes in the form of sepal and petal sizes, which we do not aim to query in this tutorial.
See this [Wikipedia article](https://en.wikipedia.org/wiki/Iris_flower_data_set) for more information.

In [4]:
sklearn.datasets.load_iris(as_frame=True).frame.to_csv("iris.csv")

Add the file to the to-be-ingested list:

In [5]:
db.do.ingest.add("iris.csv")

Check the to-be-ingested list with assigned dobject ids and versions (here, version '1' of this data object):

In [6]:
db.do.ingest.status

{PosixPath('iris.csv'): ('FQs038cEXiYpuEniYoxuj', '1')}

Let's ingest the file via:

In [None]:
db.do.ingest.commit()
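The add, status, commit sequence above resembles a staging area. The sketch below is a toy model of that pattern, not LaminDB's implementation: the class `StagingArea`, the base62 alphabet, and the random id scheme are assumptions chosen only to mirror the 21-character id and the version `'1'` shown in the status output.

```python
import secrets
import string
from pathlib import Path

ALPHABET = string.ascii_letters + string.digits  # base62, an assumption


class StagingArea:
    """Toy staging area: collect files, then hand them over in one commit."""

    def __init__(self):
        self._staged = {}

    def add(self, path, version="1"):
        # Assign a random 21-character id, mirroring the id length shown above.
        object_id = "".join(secrets.choice(ALPHABET) for _ in range(21))
        self._staged[Path(path)] = (object_id, version)

    @property
    def status(self):
        """Return a copy of the staged entries: path -> (id, version)."""
        return dict(self._staged)

    def commit(self):
        """Return the staged entries and clear the staging list."""
        committed, self._staged = self._staged, {}
        return committed


staging = StagingArea()
staging.add("iris.csv")
print(staging.status)
committed = staging.commit()
```

Staging first and committing later lets you review ids and versions before anything is written to the instance.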