This Jupyter notebook must run on an x86_64 CPU. We recommend running it on a Linux machine. It works both with and without an NVIDIA GPU.

In [90]:
"""
Get the Lightly API token and save it here.
For the full docs see https://docs.lightly.ai/docs/install-lightly#api-token
"""

lightly_token = "CHANGE_ME"
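A token hard-coded in a notebook is easy to leak when the notebook is shared. As a minimal sketch, assuming the token was exported as an environment variable before starting Jupyter, it could be read like this. The `resolve_token` helper and the choice of `LIGHTLY_TOKEN` as the variable name are assumptions made for this example and are not part of the Lightly SDK.

```python
import os

PLACEHOLDER = "CHANGE_ME"


def resolve_token(default: str = PLACEHOLDER, env_var: str = "LIGHTLY_TOKEN") -> str:
    """Return the token from the environment, falling back to `default`.

    Hypothetical helper for this notebook. Raises ValueError if neither the
    environment variable nor `default` holds a real token.
    """
    token = os.environ.get(env_var, default).strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(f"Set {env_var} or replace the placeholder token.")
    return token
```

Calling `lightly_token = resolve_token(lightly_token)` keeps the rest of the notebook unchanged and fails early if the placeholder was never replaced.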

In [4]:
"""
Set the path to the dataset.
Here we use the clothing-small dataset and download it. It has about 4k images.
If you want to use your own dataset, just set the path to it.
"""
from pathlib import Path

dataset_path = Path("./dataset_clothing")
!git clone https://github.com/alexeygrigorev/clothing-dataset-small.git {str(dataset_path)}

# Optional: Set the dataset path to a directory with fewer images, so that this example finishes faster.
dataset_path = dataset_path / "validation"

!tree --filelimit=10 {str(dataset_path)}

Cloning into 'dataset_clothing'...


remote: Enumerating objects: 3839, done.
Resolving deltas: 100% (10/10), done.
dataset_clothing/validation
├── dress [32 entries exceeds filelimit, not opening dir]
├── hat [14 entries exceeds filelimit, not opening dir]
├── longsleeve [49 entries exceeds filelimit, not opening dir]
├── outwear [24 entries exceeds filelimit, not opening dir]
├── pants [49 entries exceeds filelimit, not opening dir]
├── shirt [29 entries exceeds filelimit, not opening dir]
├── shoes [26 entries exceeds filelimit, not opening dir]
├── shorts [25 entries exceeds filelimit, not opening dir]
├── skirt [12 entries exceeds filelimit, not opening dir]
└── t-shirt [81 entries exceeds filelimit, not opening dir]

10 directories, 0 files
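If `tree` is not installed, the per-class counts can also be obtained in Python. The sketch below assumes the layout shown above (one subdirectory per class) and counts only files with common image suffixes. The `count_images_per_class` helper and the suffix list are illustrative assumptions.

```python
from pathlib import Path

# Assumption: extend this set if your dataset uses other image formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root: Path) -> dict[str, int]:
    """Count image files in each immediate subdirectory of `root`."""
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For example, `count_images_per_class(dataset_path)` returns a dictionary mapping each class directory name to its number of images.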


In [92]:
"""
Test that docker is installed and working.
Instructions work for Linux. For other OS see https://docs.docker.com/engine/install/
If these commands fail, follow our docker installation guide at https://docs.lightly.ai/docs/install-lightly#docker
"""
import subprocess

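def is_nvidia_gpu_available():
    """Return True if `nvidia-smi` exists and exits successfully."""
    try:
        subprocess.run(["nvidia-smi"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # FileNotFoundError is raised when nvidia-smi is not installed at all.
        return False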

if is_nvidia_gpu_available():
    !sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
else:
    !sudo docker run --rm hello-world




Hello from Docker!
This message shows that your installation appears to be working correctly.
... 
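When the test above fails, it helps to know whether Docker is missing entirely or installed but not usable. The following sketch distinguishes the two cases using only the standard library. The `docker_status` helper and its return values are assumptions for this example. Note that it calls `docker` without `sudo`, so a permissions problem is reported as "not_usable".

```python
import shutil
import subprocess


def docker_status(timeout: float = 30.0) -> str:
    """Classify the local Docker setup as 'missing', 'not_usable', or 'ok'."""
    # Step 1: is the docker CLI on the PATH at all?
    if shutil.which("docker") is None:
        return "missing"
    # Step 2: can the CLI talk to the daemon? `docker info` exits non-zero
    # when the daemon is not running or the user lacks permissions.
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return "not_usable"
    return "ok" if result.returncode == 0 else "not_usable"
```

A result of "missing" points to the installation guide, while "not_usable" suggests starting the daemon or fixing group permissions.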


In [93]:
""" Install the Lightly worker and do a quick sanity check. """
!docker pull lightly/worker:latest
!docker run --shm-size="1024m" --rm -it lightly/worker:latest sanity_check=True

latest: Pulling from lightly/worker
Status: Image is up to date for lightly/worker:latest
docker.io/lightly/worker:latest
[2024-03-25 13:23:47] Lightly Worker Solution v2.11.1
[2024-03-25 13:23:47] Congratulations! It looks like the Lightly container is running!


In [94]:
""" Install the Lightly Python SDK. """
!pip3 install lightly



In [95]:
""" Register the Lightly Worker. """

from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token=lightly_token)

# Create a Lightly Worker. If a worker with this name already exists, the id of the existing
# worker is returned.
worker_id = client.register_compute_worker(name="clothing-worker")
print(f"{worker_id=}")

worker_id='65806b455ca68c93b29ad6b3'


In [96]:

""" Create a dataset in the Lightly platform and configure the datasource. """
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token=lightly_token)

# Create the dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="clothing-small",
    dataset_type=DatasetType.IMAGES
)

# Configure the datasource.
client.set_local_config(
    relative_path="",
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="",
    purpose=DatasourcePurpose.LIGHTLY,
)

In [97]:
""" Schedule a run on the dataset to select 50 samples. """

scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={"shutdown_when_job_finished": True},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(f"{scheduled_run_id=}")

scheduled_run_id='66017ae99aaa3857efbebb6a'
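To build intuition for what selecting a diverse subset from embeddings means, here is a generic greedy farthest-point heuristic in plain Python. This is only an illustration of the idea of spreading picks across embedding space. It is not Lightly's implementation of the `DIVERSITY` strategy, which is not described here, and the `farthest_point_selection` function is hypothetical.

```python
import math


def farthest_point_selection(embeddings, n_samples):
    """Greedily pick indices so each new pick is far from all earlier picks."""
    if not embeddings or n_samples <= 0:
        return []
    n_samples = min(n_samples, len(embeddings))
    selected = [0]  # Arbitrary start; any index works for this sketch.
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    min_dist[0] = -1.0  # -1 marks points that are already selected.
    while len(selected) < n_samples:
        next_idx = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(next_idx)
        min_dist = [
            min(d, math.dist(e, embeddings[next_idx]))
            for d, e in zip(min_dist, embeddings)
        ]
        min_dist[next_idx] = -1.0
    return selected


# Two tight clusters plus one outlier: the picks cover all three regions.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
print(farthest_point_selection(points, 3))  # prints [0, 3, 4]
```

The near-duplicates at indices 1 and 2 are skipped because each lies close to an already selected point, which is the behavior a diversity-based selection aims for.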


In [98]:
"""
Run the Lightly Worker to process the run. It mounts the dataset defined earlier.
"""
lightly_path = "./lightly_runs"

# See if there is another running Lightly Worker that might pick up the job instead.
!docker ps

!echo "input_mount:"
!echo {str(dataset_path.absolute())}
!echo "lightly_mount:"
!echo {lightly_path}

gpus = "--gpus all" if is_nvidia_gpu_available() else ""
!docker run --shm-size="1024m" {gpus} --rm -it \
    -v {str(dataset_path.absolute())}:/input_mount:ro \
    -v {lightly_path}:/lightly_mount \
    -e LIGHTLY_TOKEN={lightly_token} \
    -e LIGHTLY_WORKER_ID={worker_id} \
    lightly/worker:latest



CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS     NAMES
0e07286b3970   lightly/worker:latest   "/bin/bash onprem-do…"   2 minutes ago   Up 2 minutes             agitated_shamir
input_mount:
/GitHub/lightly-solution-all-in-one-notebook/dataset_clothing/validation
lightly_mount:
./lightly_runs
[2024-03-25 13:24:05] Lightly Worker Solution v2.11.1
[2024-03-25 13:24:05] You are using docker build: Tue Mar 12 07:56:29 UTC 2024.
[2024-03-25 13:24:05] Starting worker with id '65806b455ca68c93b29ad6b3'...
[2024-03-25 13:24:05] Worker 2.11.1 can only process jobs scheduled with Lightly Python client 1.5 or higher.
[2024-03-25 13:24:05] Worker with labels '[]' started. Waiting for jobs...
[2024-03-25 13:24:06] Found 1 open jobs.
[2024-03-25 13:24:06] Started job with job_id '66017ae99aaa3857efbebb6a'.
...
...
[2024-03-25 13:25:34] Done!
[2024-03-25 13:25:36] Finished compute worker run successfully.
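Outside a notebook, the same `docker run` invocation can be assembled as an argument list and passed to `subprocess.run`, which avoids shell quoting issues with paths and tokens. The sketch below mirrors the flags used in the cell above, with two deliberate differences that are assumptions of this example: it omits `-it` because a subprocess usually has no TTY, and it resolves both mount paths to absolute paths. The `build_worker_command` helper is hypothetical.

```python
import shlex
from pathlib import Path


def build_worker_command(
    input_dir,
    lightly_dir,
    token,
    worker_id,
    use_gpu=False,
    image="lightly/worker:latest",
):
    """Return the docker command for the Lightly Worker as a list of arguments."""
    cmd = ["docker", "run", "--shm-size=1024m"]
    if use_gpu:
        cmd += ["--gpus", "all"]
    cmd += [
        "--rm",
        # Input data is mounted read-only, as in the notebook cell.
        "-v", f"{Path(input_dir).resolve()}:/input_mount:ro",
        "-v", f"{Path(lightly_dir).resolve()}:/lightly_mount",
        "-e", f"LIGHTLY_TOKEN={token}",
        "-e", f"LIGHTLY_WORKER_ID={worker_id}",
        image,
    ]
    return cmd


# Example with placeholder values; shlex.join renders a copy-pasteable command.
example = build_worker_command("dataset", "lightly_runs", "TOKEN", "WORKER_ID")
print(shlex.join(example))
```

To start the worker, pass the list to `subprocess.run(cmd, check=True)`. Avoid printing the command with a real token in shared logs.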

Congratulations! You successfully ran the Lightly solution.
Now you can view and explore the dataset interactively on the [Lightly Platform](https://app.lightly.ai).
To see not only the metadata and distribution but also the images themselves, serve them from your local disk to your local browser with the `lightly-serve` CLI command:

In [99]:
!lightly-serve input_mount={str(dataset_path)} lightly_mount={lightly_path}

Starting server, listening at 'localhost:3456'
Serving files in 'dataset_clothing/validation' and './lightly_runs'


If your browser runs on a different machine than your notebook, you also need to forward a port; see our [docs](https://docs.lightly.ai/docs/local-storage#view-local-data-in-remote-machine-in-lightly-platform).
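If images do not show up in the browser, a quick first check is whether anything is listening on the port that `lightly-serve` reported (`localhost:3456` in the output above). The sketch below uses only the standard library. The `is_port_open` helper is an assumption for this example, and it only tests TCP reachability, not whether the server behind the port is actually `lightly-serve`.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 3456, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

Run `is_port_open()` on the machine where the browser runs: `False` there while `True` on the notebook machine suggests the port forwarding is not set up yet.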