# 🍫 Building a ControlNet pipeline for interior design with Fondant

> ⚠️ Please note that this notebook **is not** compatible with **Google Colab**. To complete the tutorial, you must
> initiate Docker containers. Starting Docker containers within Google Colab is not supported.

This example demonstrates an end-to-end Fondant pipeline that collects and processes data for fine-tuning a [ControlNet](https://github.com/lllyasviel/ControlNet) model, focusing on images related to interior design.


### Pipeline overview


The pipeline consists of five components:

1. [**Prompt Generation**](components/generate_prompts): This component generates a set of seed prompts using a rule-based approach that combines room types, prefixes, and styles, for example “comfortable bedroom, minimalist interior design”. As input, it takes a list of room types (bedroom, kitchen, laundry room, ...), a list of interior styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can easily be adapted to other domains. The output of this component is a list of seed prompts.

2. [**Image URL Retrieval**](https://github.com/ml6team/fondant/tree/main/components/prompt_based_laion_retrieval): This component retrieves images from the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset based on the seed prompts. The retrieval itself is based on the similarity between the CLIP embeddings of the prompts and the captions in the LAION dataset. This component doesn’t return the actual images yet, only their URLs; the next component in the pipeline downloads the images.

3. [**Download Images**](https://github.com/ml6team/fondant/tree/main/components/download_images): This component downloads the actual images based on the URLs retrieved by the previous component. It takes in the URLs as input and returns the actual images, along with some metadata (like their height and width).

4. [**Add Captions**](https://github.com/ml6team/fondant/tree/main/components/caption_images): This component captions all images using [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). The model takes in an image and generates a caption that describes its content. The component takes a Hugging Face model ID as an argument, so you can swap in other image-captioning models from the [Hugging Face Hub](https://huggingface.co/models).

5. [**Add Segmentation Maps**](https://github.com/ml6team/fondant/tree/main/components/segment_images): This component segments the images using the [UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet) model. Each segment in a segmentation map belongs to one of 150 possible categories, listed [here](https://huggingface.co/openmmlab/upernet-convnext-small/blob/main/config.json#L110).
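To make step 1 concrete, here is a minimal, self-contained sketch of the rule-based prompt generation in plain Python. It assumes the prompt template used by the custom `generate_prompts` component later in this notebook and uses short, illustrative subsets of its lists; it is not the component itself, which wraps the same idea in a `DaskLoadComponent`.

```python
import itertools

# Illustrative subsets of the lists used by the custom component
rooms = ["Bedroom", "Kitchen", "Laundry room"]
prefixes = ["comfortable", "luxurious", "simple"]
styles = ["contemporary", "minimalist", "art deco"]


def make_interior_prompt(room: str, prefix: str, style: str) -> str:
    # Same template as the custom generate_prompts component
    return f"{prefix.lower()} {room.lower()}, {style.lower()} interior design"


# Every (room, prefix, style) combination becomes one seed prompt
prompts = [
    make_interior_prompt(room, prefix, style)
    for room, prefix, style in itertools.product(rooms, prefixes, styles)
]

print(len(prompts))  # 27
print(prompts[0])    # comfortable bedroom, contemporary interior design
```

Because the prompts are a Cartesian product, the number of prompts grows multiplicatively with the length of each list, which is why the pipeline below limits the number of rows for this example.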

## Environment

#### This section checks the prerequisites of your environment. Read any errors or warnings carefully.

**Ensure a Python version between 3.8 and 3.10 is available**

In [11]:
import sys
if sys.version_info < (3, 8, 0) or sys.version_info >= (3, 11, 0):
    raise Exception(f"A Python version between 3.8 and 3.10 is required. You are running {sys.version}")

**Check if docker compose is installed and the docker daemon is running**

In [38]:
!docker compose version >/dev/null
!docker info >/dev/null



**Check if GPU is available**

In [22]:
import logging
import subprocess

try:
    subprocess.check_output('nvidia-smi')
    logging.info("Found GPU, using it!")
    number_of_accelerators = 1
    accelerator_name = "GPU"
except Exception:
    logging.warning("We recommend running this pipeline on a GPU, but none could be found, using CPU instead")
    number_of_accelerators = None
    accelerator_name = None



**Check if Fondant is installed**

In [25]:
try:
    import fondant
except ImportError:
    logging.warning("Please install Fondant from the `requirements.txt` at the root of this repository")

## Implement the pipeline

### Creating a pipeline

First, we initialize the pipeline by specifying a name and a description for it and setting a `base_path`. The `base_path` is used to store the pipeline artifacts and the data generated by the components.

In [60]:
from pathlib import Path

from fondant.pipeline import ComponentOp, Pipeline

BASE_PATH = "./data_dir"
Path(BASE_PATH).mkdir(parents=True, exist_ok=True)

pipeline = Pipeline(
    pipeline_name="controlnet-pipeline",
    pipeline_description="Pipeline that collects data to train ControlNet",
    base_path=BASE_PATH
)

### Adding a (custom) component

The first component of our pipeline is the `generate_prompts` component, which generates seed prompts. This is a custom component implemented in this repository. You can find it at [./components/generate_prompts](./components/generate_prompts).

To create an operation for a custom component, we create a `ComponentOp` and pass in the `component_dir` where the component is located.

We can pass in arguments to change the behavior of the component. Here we pass in `n_rows_to_load: 10`, which limits the number of generated prompts to 10 for the purpose of this example.

For an overview of the available arguments, check the [`fondant_component.yaml`](./components/generate_prompts/fondant_component.yaml) specification.

In [61]:
generate_prompts_op = ComponentOp(
    component_dir="components/generate_prompts",
    arguments={
        "n_rows_to_load": 10
    },
)

Once we've created an operation for our component, we can add it to our pipeline.

In [62]:
pipeline.add_op(generate_prompts_op)

Now, our pipeline consists of a single component that generates prompts.

### Adding more (reusable) components

We can now proceed to add more components. 

We will use components available on the [Fondant Hub](https://fondant.ai/en/latest/components/hub/), for which we can create operations using the `ComponentOp.from_registry(...)` method.

*NOTE: The `prompt_based_laion_retrieval` component uses a public CLIP service that can only handle a few requests at a time. If you run into [timeout issues](https://github.com/rom1504/clip-retrieval/issues/267), you might want to host your own CLIP service by following this [guide](https://github.com/rom1504/clip-retrieval/blob/main/docs/laion5B_h14_back.md).*

In [63]:
laion_retrieval_op = ComponentOp.from_registry(
    name="prompt_based_laion_retrieval",
    arguments={
        "num_images": 3,
        "aesthetic_score": 9,
        "aesthetic_weight": 0.5,
        "url": "https://knn.laion.ai/knn-service"
    },
)

download_images_op = ComponentOp.from_registry(
    name="download_images",
    arguments={
        "timeout": 1,
        "retries": 0,
        "image_size": 512,
        "resize_mode": "center_crop",
        "resize_only_if_bigger": False,
        "min_image_size": 0,
        "max_aspect_ratio": 2.5,
    },
)

caption_images_op = ComponentOp.from_registry(
    name="caption_images",
    arguments={
        "model_id": "Salesforce/blip-image-captioning-base",
        "batch_size": 8,
        "max_new_tokens": 50,
    },
    number_of_accelerators=number_of_accelerators,
    accelerator_name=accelerator_name,
)

segment_images_op = ComponentOp.from_registry(
    name="segment_images",
    arguments={
        "model_id": "openmmlab/upernet-convnext-small",
        "batch_size": 8,
    },
    number_of_accelerators=number_of_accelerators,
    accelerator_name=accelerator_name,
)
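The retrieval in step 2 boils down to a nearest-neighbour search over CLIP embeddings. The sketch below illustrates only that idea, with cosine similarity and a brute-force top-k ranking; the embeddings and URLs are made up, and the real LAION service uses actual CLIP models and a precomputed index rather than this loop. The `k` parameter plays roughly the role of `num_images` above, assuming that argument sets the number of images retrieved per prompt.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(url, cosine_similarity(query, emb)) for url, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings keyed by (hypothetical) image URL;
# real CLIP embeddings have many more dimensions
prompt_embedding = [1.0, 0.0, 0.0]
entry_embeddings = {
    "https://example.com/a.jpg": [1.0, 0.0, 0.0],
    "https://example.com/b.jpg": [0.6, 0.8, 0.0],
    "https://example.com/c.jpg": [0.0, 1.0, 0.0],
    "https://example.com/d.jpg": [-1.0, 0.0, 0.0],
}

result = top_k(prompt_embedding, entry_embeddings, k=3)
for url, score in result:
    print(f"{score:.2f}  {url}")
```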

Now, we can use the components in our pipeline. We will chain them into a pipeline by defining dependencies between the different pipeline steps.

In [64]:
pipeline.add_op(laion_retrieval_op, dependencies=generate_prompts_op)
pipeline.add_op(download_images_op, dependencies=laion_retrieval_op)
pipeline.add_op(caption_images_op, dependencies=download_images_op)
pipeline.add_op(segment_images_op, dependencies=caption_images_op)
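Fondant sorts this component graph topologically before running it, so each component runs only after its dependencies. As an illustration of the idea (not Fondant's actual implementation), here is Kahn's algorithm applied to the dependency graph we just defined; the step names mirror the services Fondant compiles (`generate_prompts`, `laion_retrieval`, and so on). It avoids `graphlib`, which only exists from Python 3.9, so it also runs on 3.8.

```python
from collections import deque
from typing import Dict, List


def topological_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: return an execution order that respects all dependencies."""
    # Number of unmet dependencies per step
    in_degree = {step: len(deps) for step, deps in dependencies.items()}
    # Reverse edges: upstream step -> steps that depend on it
    dependents: Dict[str, List[str]] = {step: [] for step in dependencies}
    for step, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(step)

    ready = deque(step for step, degree in in_degree.items() if degree == 0)
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for child in dependents[step]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("The pipeline graph contains a cycle")
    return order


# step -> list of steps it depends on (declared in arbitrary order on purpose)
pipeline_graph = {
    "segment_images": ["caption_images"],
    "caption_images": ["download_images"],
    "download_images": ["laion_retrieval"],
    "laion_retrieval": ["generate_prompts"],
    "generate_prompts": [],
}

order = topological_order(pipeline_graph)
print(order)
```

Even though the graph is declared from the last step to the first, the order starts at `generate_prompts`, the only step without dependencies.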

## Optional: writing the dataset to the Hugging Face Hub

To write the final dataset to the Hugging Face Hub, we will use the `write_to_hf_hub` component from the [Fondant Hub](https://fondant.ai/en/latest/components/hub/).

You'll need a Hugging Face Hub account for this. If you don't have one, you can either create one, or skip this step.

In [None]:
USERNAME = ""  # your Hugging Face username
HF_TOKEN = ""  # a Hugging Face access token with write permissions

`write_to_hf_hub` is a special type of reusable Fondant component which is **generic**. This means that it can handle different data schemas, but we have to tell it which schema to use.

We do this by overwriting its `fondant_component.yaml` file with the schema of the data we want it to write. To achieve this, we create a `fondant_component.yaml` file in the directory `components/write_to_hub_controlnet` with the following content:

In [None]:
%%writefile components/write_to_hub_controlnet/fondant_component.yaml
name: Write to hub
description: Component that writes a dataset to the hub
image: fndnt/write_to_hf_hub:0.6.2  # We use a docker image from the Fondant Hub instead of implementing our own.

consumes:  # We fill in our data schema here. The component will write this data to the Hugging Face Hub.
  images:
    fields:
      data:
        type: binary

  captions:
    fields:
      text:
        type: string

  segmentations:
    fields:
      data:
        type: binary

args:  # We repeat the arguments from the original `fondant_component.yaml`
  hf_token:
    description: The hugging face token used to write to the hub
    type: str
  username:
    description: The username under which to upload the dataset
    type: str
  dataset_name:
    description: The name of the dataset to upload
    type: str
  image_column_names:
    description: A list containing the image column names. Used to format the images to the HF Hub format
    type: list
    default: []
  column_name_mapping:
    description: Mapping of the consumed fondant column names to the written hub column names
    type: dict
    default: {}

We then create an operation for it, as if it were a custom component:

In [None]:
write_to_hub_controlnet = ComponentOp(
    component_dir="components/write_to_hub_controlnet",
    arguments={
        "username": USERNAME,
        "hf_token": HF_TOKEN,
        "dataset_name": "controlnet-interior-design",
        "image_column_names": ["images_data"],
    },
)

Finally, we add it to the pipeline:

In [None]:
pipeline.add_op(write_to_hub_controlnet, dependencies=segment_images_op)
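The `image_column_names` argument refers to columns by a flattened name such as `images_data`. Judging from this notebook (`prompts_text` in the `generate_prompts` component, `images_data` here), column names appear to follow a `<subset>_<field>` convention. Assuming that convention, the sketch below derives the column names from the `consumes` section of the custom `fondant_component.yaml` (transcribed as a Python dict to avoid a YAML dependency) and checks that every entry of `image_column_names` is one of them.

```python
from typing import Any, Dict, List

# The `consumes` section of components/write_to_hub_controlnet/fondant_component.yaml
consumes: Dict[str, Any] = {
    "images": {"fields": {"data": {"type": "binary"}}},
    "captions": {"fields": {"text": {"type": "string"}}},
    "segmentations": {"fields": {"data": {"type": "binary"}}},
}


def column_names(spec: Dict[str, Any]) -> List[str]:
    """Flatten subsets and fields into column names (assumed `<subset>_<field>` convention)."""
    return [
        f"{subset}_{field}"
        for subset, subset_spec in spec.items()
        for field in subset_spec["fields"]
    ]


columns = column_names(consumes)
image_column_names = ["images_data"]  # same value as passed to write_to_hub_controlnet
missing = [name for name in image_column_names if name not in columns]

print(columns)  # ['images_data', 'captions_text', 'segmentations_data']
print(missing)  # []
```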

## Running the pipeline

This pipeline will generate prompts, retrieve matching images from the LAION dataset, download them, and generate corresponding captions and segmentation maps. If you added the optional `write_to_hf_hub` component, it will also write the resulting dataset to the Hugging Face Hub.

Fondant provides multiple runners to run our pipeline:
- A Docker runner for local execution
- A Vertex AI runner for managed execution on Google Cloud
- A Kubeflow Pipelines runner for execution anywhere

Here we will use the `DockerRunner` for local execution, which utilizes docker-compose under the hood.

The runner will first build the custom component and download the reusable components from the component hub. Afterwards, you will see the components execute one by one.

In [65]:
from fondant.compiler import DockerCompiler
from fondant.runner import DockerRunner

DockerCompiler().compile(pipeline=pipeline, output_path="docker-compose.yml")
DockerRunner().run("docker-compose.yml")



## Exploring the dataset 

You can also explore the dataset using the Fondant explorer, which lets you visualize the output dataset of each component step. Run the cell below to start it, then open the explorer at http://localhost:8501.

In [66]:
from fondant.explorer import run_explorer_app

run_explorer_app(
    base_path=BASE_PATH,
    container="fndnt/data_explorer",
    tag="latest",
    port=8501,
)


To stop the Explorer and continue the notebook, press the stop button at the top of the notebook.

## Creating your own dataset

To create your own dataset, you can update the `generate_prompts` component to generate prompts describing the images you want.

Make your changes in the cell below and run it; the cell contents will be written to the `./components/generate_prompts/src/main.py` file.

In [67]:
%%writefile components/generate_prompts/src/main.py
"""
This component generates a set of initial prompts that will be used to retrieve images
from the LAION-5B dataset.
"""
import itertools
import logging
import typing as t

import dask.dataframe as dd
import pandas as pd

from fondant.component import DaskLoadComponent

logger = logging.getLogger(__name__)

interior_styles = [
    "art deco",
    "bauhaus",
    "bouclé",
    "maximalist",
    "brutalist",
    "coastal",
    "minimalist",
    "rustic",
    "hollywood regency",
    "midcentury modern",
    "modern organic",
    "contemporary",
    "modern",
    "scandinavian",
    "eclectic",
    "bohemian",
    "industrial",
    "traditional",
    "transitional",
    "farmhouse",
    "country",
    "asian",
    "mediterranean",
    "southwestern",
]

interior_prefix = [
    "comfortable",
    "luxurious",
    "simple",
]

rooms = [
    "Bathroom",
    "Living room",
    "Hotel room",
    "Lobby",
    "Entrance hall",
    "Kitchen",
    "Family room",
    "Master bedroom",
    "Bedroom",
    "Kids bedroom",
    "Laundry room",
    "Guest room",
    "Home office",
    "Library room",
    "Playroom",
    "Home Theater room",
    "Gym room",
    "Basement room",
    "Garage",
    "Walk-in closet",
    "Pantry",
    "Gaming room",
    "Attic",
    "Sunroom",
    "Storage room",
    "Study room",
    "Dining room",
    "Loft",
    "Studio room",
    "Apartment",
]


def make_interior_prompt(room: str, prefix: str, style: str) -> str:
    """Generate a prompt for the interior design model.

    Args:
        room: room name
        prefix: prefix for the room
        style: interior style

    Returns:
        prompt for the interior design model
    """
    return f"{prefix.lower()} {room.lower()}, {style.lower()} interior design"


class GeneratePromptsComponent(DaskLoadComponent):
    def __init__(self, *args, n_rows_to_load: t.Optional[int] = None) -> None:
        """
        Generate a set of initial prompts that will be used to retrieve images from the
        LAION-5B dataset.

        Args:
            n_rows_to_load: Optional argument that defines the number of rows to load.
                Useful for testing pipeline runs on a small scale
        """
        self.n_rows_to_load = n_rows_to_load

    def load(self) -> dd.DataFrame:
        room_tuples = itertools.product(rooms, interior_prefix, interior_styles)
        prompts = map(lambda x: make_interior_prompt(*x), room_tuples)

        pandas_df = pd.DataFrame(prompts, columns=["prompts_text"])

        if self.n_rows_to_load:
            pandas_df = pandas_df.head(self.n_rows_to_load)

        df = dd.from_pandas(pandas_df, npartitions=1)

        return df



If you now recompile and rerun your pipeline, Fondant picks up the changes and automatically rebuilds the component with the changes included.

In [69]:
DockerCompiler().compile(pipeline=pipeline, output_path="docker-compose.yml")
DockerRunner().run("docker-compose.yml")

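On a rerun, Fondant can skip a component whose previous execution matches the current one, and when a component is not cached it invalidates the cache for that component and all subsequent ones. The sketch below illustrates one way such chained cache keys could work: each key hashes a component's specification, its arguments, and the key of the component before it, so a change upstream propagates downstream. The hashing scheme and the simplified spec dictionaries are hypothetical; Fondant's actual cache key computation may take other inputs into account.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(spec: Dict[str, Any], arguments: Dict[str, Any], upstream_key: Optional[str]) -> str:
    """Hypothetical cache key: a hash over everything that determines a component's output."""
    payload = json.dumps(
        {"spec": spec, "arguments": arguments, "upstream": upstream_key},
        sort_keys=True,  # stable serialization, so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical, simplified component specifications
prompts_spec = {"name": "generate_prompts", "image": "generate_prompts:dev"}
captions_spec = {"name": "caption_images", "image": "caption_images:dev"}

# First run: the caption key depends on the prompt key
prompts_key = cache_key(prompts_spec, {"n_rows_to_load": 10}, None)
captions_key = cache_key(captions_spec, {"batch_size": 8}, prompts_key)

# Rerun with identical inputs: identical key, so the execution could be reused
rerun_prompts_key = cache_key(prompts_spec, {"n_rows_to_load": 10}, None)

# Changing an upstream argument changes its key and, through chaining, the downstream key
changed_prompts_key = cache_key(prompts_spec, {"n_rows_to_load": 20}, None)
changed_captions_key = cache_key(captions_spec, {"batch_size": 8}, changed_prompts_key)

print(rerun_prompts_key == prompts_key)      # True
print(changed_captions_key == captions_key)  # False
```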

If you restart the Explorer, you'll see that you can now select a second pipeline in the left panel and inspect your new dataset.

In [None]:
run_explorer_app(
    base_path=BASE_PATH,
    container="fndnt/data_explorer",
    tag="latest",
    port=8501,
)



## Scaling up

If you're happy with your dataset, it's time to scale up. Check [our documentation](https://fondant.ai/en/latest/pipeline/#compiling-and-running-a-pipeline) for more information about the available runners.