# Get started

LaminDB is a distributed data management system, in the same way that git is a distributed version control system.

Just like you work with repositories in git, you work with <ins>instances</ins> in LaminDB.

Unlike git (and DVC), LaminDB lets you query data by metadata.

A LaminDB instance is a data warehouse that combines storage (a local directory, S3, GCP, or Azure) with a SQL database (SQLite, Postgres, or Google BigQuery) for querying it.
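The toy model below makes the storage-plus-database split concrete using only Python's standard library, not LaminDB itself. A SQLite table holds one row of metadata per stored file, so finding files becomes a SQL query instead of a directory walk. The table name `dobject`, its columns, and the example rows are all hypothetical.

```python
import sqlite3

# A toy "instance index": one row of metadata per file held in storage.
# Table and column names are hypothetical, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dobject (id TEXT PRIMARY KEY, name TEXT, suffix TEXT, storage_root TEXT)"
)
conn.executemany(
    "INSERT INTO dobject VALUES (?, ?, ?, ?)",
    [
        ("a1", "iris", ".csv", "s3://my-bucket"),
        ("b2", "pbmc", ".h5ad", "s3://my-bucket"),
        ("c3", "notes", ".txt", "mydata"),
    ],
)

# Find files by their metadata rather than by browsing directories.
csv_files = conn.execute(
    "SELECT name FROM dobject WHERE suffix = ? ORDER BY name", (".csv",)
).fetchall()
in_bucket = conn.execute(
    "SELECT name FROM dobject WHERE storage_root = ? ORDER BY name", ("s3://my-bucket",)
).fetchall()
print(csv_files)  # [('iris',)]
print(in_bucket)  # [('iris',), ('pbmc',)]
```

A version-control system such as git only knows files by path and history. Any questions about file metadata have to be answered by scanning, which is the gap a queryable index like this fills.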

## Set up your user identity

As a first-time user, sign up with your email in the cloud.

This gives you an unambiguous identity that is attached to your data and analyses, and it enables collaboration in the cloud.

Sign up on the command line via `lndb signup --email <your-email>`. Use your GitHub-associated email if you have one. For example:

    $ lndb signup --email "raspbear@gmx.de"

    ℹ️ Please *confirm* the sign-up email. After that, proceed to `lndb init`!

    Generated login secret: MmR4YuQEyb0yxu7dAwJZTjLzR1Az2lN4Q4IduDlO.
    Email & secret persist in: /Users/falexwolf/.lndb/current_user.env.
    Going forward, credentials are auto-loaded. In case of loss, you can always recover your secret via email.

After confirming the sign-up email, you can immediately initialize your first instance via `lndb init`.

## Log in user

If `raspbear@gmx.de` has already signed up but has not yet logged in to LaminDB in the current compute environment, call:

    $ lndb login --email "raspbear@gmx.de" --secret "MmR4YuQEyb0yxu7dAwJZTjLzR1Az2lN4Q4IduDlO"

## Initialize and configure an instance

For demonstration purposes, let us configure a local instance with storage in `mydata/` and a local `sqlite` database for managing it.

You can also pass a cloud location such as `s3://my-bucket` directly to `--storage`, or a Postgres URL to `--dbconfig`.

    $ lndb init --storage mydata --schema biology  # a generic biology schema module based on bionty and biolab

    ℹ️ Using instance: mydata/mydata.lndb


The instance configuration will persist in `/Users/falexwolf/.lndb/current_instance.env`.
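This file and `current_user.env` from the sign-up step are, judging by their `.env` extension, presumably plain dotenv files. Their exact keys are not shown here. The sketch below therefore makes no assumptions about them and simply parses generic `KEY=VALUE` lines so you can inspect what was persisted. `read_env_file` is a hypothetical helper, not part of LaminDB.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding quotes, since dotenv files often quote values.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


# Location taken from the output above; the keys inside are not documented here.
env_path = Path.home() / ".lndb" / "current_instance.env"
if env_path.exists():
    print(read_env_file(env_path))
```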

## Ingest data

We're good to go! Let's ingest some data.

    import lamindb as db
    import sklearn.datasets

    db.header()

| Field | Value |
| --- | --- |
| id | GgD4VJbXtOOS |
| version | draft |
| time_init | 2022-06-23 14:16 |
| time_run | 2022-07-30 09:32 |
| pypackage | lamindb==0.1.2 scikit-learn==1.1.1 |
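The `pypackage` row records which package versions were in use. To cross-check those versions in your own environment, you can look them up with the standard library's `importlib.metadata`. This is only a sketch of the idea, not how `db.header()` computes the row, and `package_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names: list[str]) -> str:
    """Format installed versions as 'name==x.y.z', skipping missing packages."""
    parts = []
    for name in names:
        try:
            parts.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            continue  # not installed in this environment
    return " ".join(parts)


print(package_versions(["lamindb", "scikit-learn"]))
```

Packages that are not installed are skipped, so the resulting string lists only the distributions actually present in the current environment.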


For demonstration purposes, let's load the Iris dataset from scikit-learn and write it to a CSV file:

    sklearn.datasets.load_iris(as_frame=True).frame.to_csv("iris.csv")

Add the file to the list of files to be ingested:

    db.do.ingest.add("iris.csv")

Check the list of files to be ingested, together with their assigned Dobject IDs:

    db.do.ingest.status

    {PosixPath('iris.csv'): ('wNDfCM0ESIiOLVPXaLIhZ', '1')}
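Each entry maps a file path to a pair that looks like `(id, version)`. The sketch below models such a to-be-ingested list in plain Python to show the bookkeeping involved. It is not LaminDB's implementation. The `Staging` class and `new_id` helper are hypothetical, and the random 21-character base62 IDs are an assumption based only on the single ID shown above.

```python
import secrets
import string
from pathlib import Path

# Base62 alphabet: upper- and lower-case ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def new_id(length: int = 21) -> str:
    """Return a random base62 ID (length chosen to mimic the ID shown above)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class Staging:
    """Hypothetical to-be-ingested list mapping each file path to (id, version)."""

    def __init__(self) -> None:
        self.status: dict[Path, tuple[str, str]] = {}

    def add(self, filepath: str, version: str = "1") -> None:
        path = Path(filepath)
        # Re-adding a file that is already staged keeps its original ID.
        if path not in self.status:
            self.status[path] = (new_id(), version)


staging = Staging()
staging.add("iris.csv")
print(staging.status)
```

Assigning the ID at staging time, before anything is written, means you can inspect and reference the future record before committing it.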

Finally, ingest the file by committing the list:

    db.do.ingest.commit()
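An instance combines storage with a SQL database, so committing presumably places each staged file in the instance's storage and registers it in the database. The toy model below illustrates those two steps with `shutil` and `sqlite3`. The `commit` function, the table layout, and naming stored files by their ID are all assumptions for illustration, not what `db.do.ingest.commit()` actually does.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def commit(staged: dict[Path, tuple[str, str]], storage: Path, conn: sqlite3.Connection) -> None:
    """Copy staged files into storage under their IDs and index them (toy model)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dobject (id TEXT, version TEXT, name TEXT, suffix TEXT)"
    )
    storage.mkdir(parents=True, exist_ok=True)
    for path, (dobj_id, version) in staged.items():
        # Store the file under its ID so renames of the source don't matter.
        shutil.copy(path, storage / f"{dobj_id}{path.suffix}")
        conn.execute(
            "INSERT INTO dobject VALUES (?, ?, ?, ?)",
            (dobj_id, version, path.stem, path.suffix),
        )
    conn.commit()
    staged.clear()  # nothing is left to ingest after a successful commit


# Demo in a throwaway directory.
workdir = Path(tempfile.mkdtemp())
src = workdir / "iris.csv"
src.write_text("sepal_length,target\n5.1,0\n")
staged = {src: ("wNDfCM0ESIiOLVPXaLIhZ", "1")}
conn = sqlite3.connect(":memory:")
commit(staged, workdir / "mydata", conn)
print(conn.execute("SELECT id, name, suffix FROM dobject").fetchall())
# [('wNDfCM0ESIiOLVPXaLIhZ', 'iris', '.csv')]
```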