# Ingesting a local ML dataset

As part of our demonstration of a simple ML workflow with LaminDB and PyTorch, this notebook shows how to ingest locally stored data into a LaminDB instance.

```{note}
- For an introduction to this four-part demonstration, please see [LaminDB use case: integrating with PyTorch to train a model on the MNIST dataset](./mnist-intro.ipynb).
- For ingesting the same dataset stored in the cloud, please see [Ingesting a remote ML dataset](./mnist-ingest-remote.ipynb).
- For building the PyTorch `Dataset` and training the autoencoder, please see [Integrating with PyTorch and training an autoencoder](./mnist-train.ipynb).
- For extending the LaminDB schema, please see [Extending the LaminDB schema](./mnist-extend-schema.ipynb).
```

## Creating a LaminDB instance

Our first step is to create a LaminDB instance; we ingest the local files into it afterwards.

In [None]:
import lndb

lndb.init(storage="mnist-local")

Let's take a look at the instance we just set up.

In [None]:
lndb.settings.instance

Now that the instance is set up, we ingest the relevant data objects from our local folder into it so that we can track and query them.

During this step, LaminDB commits object metadata to the instance database (in this case, a local SQLite instance), which serves as the metadata and governance layer of our [data lakehouse](https://www.databricks.com/glossary/data-lakehouse).
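
To make the idea of a metadata layer concrete, here is a minimal sketch using only the Python standard library. It is not LaminDB's actual schema or API: the `dobject` table, its columns, and the file name `digit_0.png` are hypothetical and chosen for illustration. The sketch records a file's name, suffix, and size in a SQLite database while the file itself stays in storage.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway "storage" folder holding one dummy data file.
    storage = Path(tmp)
    data_file = storage / "digit_0.png"  # hypothetical file name
    data_file.write_bytes(b"\x00" * 16)

    # Hypothetical metadata table; NOT LaminDB's real schema.
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE dobject ("
        "id INTEGER PRIMARY KEY, name TEXT, suffix TEXT, size INTEGER)"
    )

    # Only metadata goes into the database; the bytes stay in storage.
    con.execute(
        "INSERT INTO dobject (name, suffix, size) VALUES (?, ?, ?)",
        (data_file.stem, data_file.suffix, data_file.stat().st_size),
    )
    con.commit()

    rows = con.execute("SELECT name, suffix, size FROM dobject").fetchall()
    con.close()

print(rows)  # [('digit_0', '.png', 16)]
```

Keeping metadata in a queryable database and the data itself in plain storage is what lets later steps find objects by their attributes without opening any files.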

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

```{note}

`ln.nb` is an access point to one of our open-source modules, nbproject.

The call to `ln.nb.header()` initializes the notebook and enables key data provenance features.

Whenever a data object is ingested into the instance from an initialized notebook, LaminDB automatically identifies the notebook where it came from and inserts the relevant provenance records in the database.

For more details, check out our guide on [ingesting and tracking data from notebook runs](https://lamin.ai/docs/db/guide/nb).
```

## Ingesting data objects and linking them to a folder

Let's first get the paths to the locally-stored data objects.

In [None]:
from pathlib import Path

files = Path("mnist_100/").glob("*")
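
Note that `Path.glob` returns a generator, which can be iterated only once, and that `"*"` also matches subdirectories. If the listing needs to be reused or restricted to regular files, one possible variant is sketched below. The helper `list_data_files` is a hypothetical name, and the demo runs on a temporary directory rather than on `mnist_100/`.

```python
import tempfile
from pathlib import Path


def list_data_files(folder):
    """Return the regular files directly inside `folder`, sorted by path."""
    return sorted(p for p in Path(folder).glob("*") if p.is_file())


# Demonstrate on a throwaway directory rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.png").write_bytes(b"")
    (root / "a.png").write_bytes(b"")
    (root / "subdir").mkdir()  # directories are filtered out

    names = [p.name for p in list_data_files(root)]

print(names)  # ['a.png', 'b.png']
```

Sorting makes the ingestion order deterministic, and materializing the result as a list allows it to be iterated more than once.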

We now ingest each of the data objects based on their local path and link them to the relevant metadata.

In our case, the metadata we want to link each `DObject` to is a `DFolder` entity so that we can later query data objects based on folders.

In [None]:
# create folder for linking to mnist data objects
mnist_folder = lns.DFolder(name="mnist")

# create dobjects and link them to mnist folder
dobjects = [ln.DObject(filepath) for filepath in files]
mnist_folder.dobjects = dobjects

# ingest all data objects
ln.add(mnist_folder)
ln.add(dobjects);
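
To illustrate why linking data objects to a folder makes them queryable by folder, here is a minimal relational sketch using only the standard library. It assumes a many-to-many link table, which may differ from how LaminDB stores this relationship; the table names `dfolder`, `dobject`, and `dfolder_dobject` and the object names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical tables; NOT LaminDB's real schema.
con.executescript(
    """
    CREATE TABLE dfolder (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dobject (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dfolder_dobject (
        dfolder_id INTEGER REFERENCES dfolder(id),
        dobject_id INTEGER REFERENCES dobject(id),
        PRIMARY KEY (dfolder_id, dobject_id)
    );
    """
)

# Create one folder and link two data objects to it.
folder_id = con.execute(
    "INSERT INTO dfolder (name) VALUES (?)", ("mnist",)
).lastrowid
for name in ["img_0", "img_1"]:
    dobject_id = con.execute(
        "INSERT INTO dobject (name) VALUES (?)", (name,)
    ).lastrowid
    con.execute(
        "INSERT INTO dfolder_dobject VALUES (?, ?)", (folder_id, dobject_id)
    )

# An object that is not linked to any folder.
con.execute("INSERT INTO dobject (name) VALUES (?)", ("unrelated",))

# Query data objects by folder name through the link table.
linked = [
    row[0]
    for row in con.execute(
        """
        SELECT dobject.name
        FROM dobject
        JOIN dfolder_dobject ON dfolder_dobject.dobject_id = dobject.id
        JOIN dfolder ON dfolder.id = dfolder_dobject.dfolder_id
        WHERE dfolder.name = ?
        ORDER BY dobject.name
        """,
        ("mnist",),
    )
]
con.close()

print(linked)  # ['img_0', 'img_1']
```

Only the two linked objects come back, which is the kind of folder-based lookup that the link between data objects and the folder is meant to enable.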