# Zero-copy ingestion of a remote ML dataset

Our first step is to create a LaminDB instance on top of the existing remote storage, so that we can then ingest the files already stored there.

In [None]:
import lndb

lndb.init(name="mnist-remote", storage="s3://bernardo-test-bucket-1")

Let's take a look at the instance we just set up.

In [None]:
lndb.settings.instance

Now that the instance points at the existing storage, we need to ingest the relevant data objects from that storage so that we can track and query them.

During this step, LaminDB commits object metadata to the instance database (in this case, a local SQLite database).
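To make the idea of committing only metadata more concrete, here is a minimal conceptual sketch using Python's built-in `sqlite3` module. It is not LaminDB's actual schema or code path: the table name `dobject`, its columns, and the example URIs are all hypothetical and chosen purely for illustration. The point is that only small records describing each object go into the database, while the objects themselves stay where they are.

```python
import posixpath
import sqlite3

# Hypothetical URIs standing in for objects that already live in remote storage.
uris = [
    "s3://example-bucket/mnist_100/0.png",
    "s3://example-bucket/mnist_100/1.png",
]

# An in-memory database stands in for the local SQLite file of an instance.
conn = sqlite3.connect(":memory:")

# Hypothetical metadata table: one row per remote object, no file contents.
conn.execute(
    "CREATE TABLE dobject (id INTEGER PRIMARY KEY, uri TEXT UNIQUE, suffix TEXT)"
)

# Record the URI and file suffix of each object; nothing is downloaded.
conn.executemany(
    "INSERT INTO dobject (uri, suffix) VALUES (?, ?)",
    [(uri, posixpath.splitext(uri)[1]) for uri in uris],
)
conn.commit()

rows = conn.execute("SELECT uri, suffix FROM dobject ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Because each row holds only a pointer to the object, queries run against the local database while the data remains in the bucket.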

In [None]:
import lamindb as ln

ln.nb.header()

Let's first get the URIs of the remotely stored data objects.

In [None]:
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("bernardo-test-bucket-1")
dobject_uris = [
    f"s3://{bucket.name}/{obj.key}"
    for obj in bucket.objects.filter(Prefix="mnist_100/")  # avoid shadowing the built-in `object`
]
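One detail worth checking: a prefix listing can also return a zero-byte "folder placeholder" key that ends in `/` (for example, when the folder was created through the S3 console). The sketch below shows one way to build the URIs while skipping such keys. The helper `build_uris` and the example key names are hypothetical, and the function works on plain strings so it can be tried without AWS credentials.

```python
def build_uris(bucket_name, keys):
    """Return s3:// URIs for the given keys, skipping folder placeholder keys."""
    return [
        f"s3://{bucket_name}/{key}"
        for key in keys
        if not key.endswith("/")  # keys ending in "/" are placeholders, not data
    ]


# Hypothetical keys, shaped like the result of a prefix listing.
example_keys = ["mnist_100/", "mnist_100/0.png", "mnist_100/1.png"]
example_uris = build_uris("bernardo-test-bucket-1", example_keys)
print(example_uris)
```

With boto3, the keys could be collected as `[obj.key for obj in bucket.objects.filter(Prefix="mnist_100/")]` and passed to the helper.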

Let's now ingest each data object based on its URI and link it to the relevant metadata.

In our case, we link each `DObject` to a `DFolder` so that we can later query data objects by folder.

In [None]:
# create folder for linking to mnist data objects
mnist_folder = ln.DFolder(name="mnist")

# create dobjects and link them to mnist folder
dobjects = [ln.DObject(data=cloudpath) for cloudpath in dobject_uris]
mnist_folder.dobjects = dobjects

# ingest all data objects
ln.add(mnist_folder)
ln.add(dobjects);