## 🍫 Building a RAG indexing pipeline with Fondant

> ⚠️ Please note that this notebook **is not** compatible with **Google Colab**. To complete the tutorial, you must 
> initiate Docker containers. Starting Docker containers within Google Colab is not supported.

This repository demonstrates a Fondant data pipeline that ingests text
data into a vector database. The pipeline is built from four Fondant components: three reusable
components from the Fondant Hub and one custom chunking component.
Additionally, we provide a Docker Compose setup for Weaviate, enabling local testing and
development.

### Pipeline overview

The primary goal of this sample is to showcase how you can use a Fondant pipeline and reusable
components to load, chunk, and embed text, and to ingest the text embeddings into a vector
database.

Pipeline steps:

- [Data Loading](https://github.com/ml6team/fondant/tree/main/components/load_from_parquet): The
  pipeline begins by loading text data from a Parquet file, which serves as the
  source for subsequent processing. For this minimal example, we use a dataset from the Hugging Face Hub.
- [Text Chunking](https://github.com/ml6team/fondant/tree/main/components/chunk_text): Text data is
  chunked into manageable sections to prepare it for embedding. This step is crucial for
  performant RAG systems.
- [Text Embedding](https://github.com/ml6team/fondant/tree/main/components/embed_text): We use
  a small Hugging Face model to generate the text embeddings.
  The `embed_text` component also makes it easy to switch to different models.
- [Write to Weaviate](https://github.com/ml6team/fondant/tree/main/components/index_weaviate): The
  final step of the pipeline involves writing the embedded text data to
  a Weaviate database.

## Environment
### This section checks the prerequisites of your environment. Read any errors or warnings carefully. 

**Ensure a Python version between 3.8 and 3.10 is available**

In [20]:
import sys
if sys.version_info < (3, 8, 0) or sys.version_info >= (3, 11, 0):
    raise Exception(f"A Python version between 3.8 and 3.10 is required. You are running {sys.version}")

**Check if docker compose is installed and the docker daemon is running**

In [21]:
!docker compose version
!docker ps && echo "Docker running"

Docker Compose version v2.19.1
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
Docker running


**Check if GPU is available**

In [22]:
import logging
import subprocess

try:
    subprocess.check_output('nvidia-smi')
    logging.info("Found GPU, using it!")
    number_of_accelerators = 1
    accelerator_name = "GPU"
except Exception:
    logging.warning("We recommend to run this pipeline on a GPU, but none could be found, using CPU instead")
    number_of_accelerators = None
    accelerator_name = None
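
The cell above catches every exception. A slightly narrower variant is sketched below as a hypothetical helper, `detect_accelerator` (not part of Fondant). It catches only the errors `subprocess.check_output` raises when the binary is missing (an `OSError` such as `FileNotFoundError`) or exits with a non-zero status (`CalledProcessError`), and it returns the two values instead of assigning them inside `try`/`except`, which makes the logic easy to test.

```python
import logging
import subprocess
from typing import Optional, Tuple


def detect_accelerator(command: str = "nvidia-smi") -> Tuple[Optional[int], Optional[str]]:
    """Return (number_of_accelerators, accelerator_name) for use in `Resources`.

    Hypothetical helper: mirrors the GPU check above with narrower error handling.
    """
    try:
        # Raises OSError (e.g. FileNotFoundError) if the binary is missing and
        # CalledProcessError if it exits with a non-zero status.
        subprocess.check_output([command], stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        logging.warning("No GPU found, using CPU instead")
        return None, None
    logging.info("Found GPU, using it!")
    return 1, "GPU"


number_of_accelerators, accelerator_name = detect_accelerator()
```

On a machine without `nvidia-smi`, both values are `None`, which is what the later `Resources(...)` and `cluster_type` logic expects for CPU execution.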



**Install Fondant**

In [35]:
!pip install -r ../requirements.txt
# TODO: remove after component inspection PR is merged 
!pip install "fondant[component,aws,azure,gcp]@git+https://github.com/ml6team/fondant@fix-code-inspection-notebook"


## Implement the pipeline

First of all, we need to initialize the pipeline, which includes specifying a name for the pipeline, providing a description, and setting a `base_path`. The `base_path` is used to store the pipeline artifacts and the data generated by the components.
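
To see what ends up under the `base_path` after a run, the hypothetical helper below lists which components wrote a manifest in each run. It assumes the directory layout `<base_path>/<pipeline_name>/<run_id>/<component>/manifest.json`, which is inferred from the paths in the run log later in this notebook rather than from any documented Fondant guarantee; the `cache` folder next to the runs holds `.txt` cache keys and is not matched.

```python
from pathlib import Path
from typing import Dict, List


def list_manifests(base_path: str, pipeline_name: str) -> Dict[str, List[str]]:
    """Map each run id to the components that saved a manifest in that run.

    Assumed layout (inferred from the run log):
    <base_path>/<pipeline_name>/<run_id>/<component>/manifest.json
    """
    runs: Dict[str, List[str]] = {}
    for manifest in sorted(Path(base_path, pipeline_name).glob("*/*/manifest.json")):
        component_dir = manifest.parent
        run_id = component_dir.parent.name
        runs.setdefault(run_id, []).append(component_dir.name)
    return runs
```

After a run, a call such as `list_manifests(BASE_PATH, "ingestion-pipeline")` should show one entry per run id, with the components that completed successfully.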

In [36]:
from pathlib import Path
from fondant.pipeline import Pipeline, Resources

BASE_PATH = "./data"
Path(BASE_PATH).mkdir(parents=True, exist_ok=True)

pipeline = Pipeline(
    name="ingestion-pipeline",  # Add a unique pipeline name to easily track your progress and data
    description="Pipeline to prepare and process data for building a RAG solution",
    base_path=BASE_PATH,  # The demo pipeline uses a local directory to store the data.
)

For demonstration purposes, we will use a dataset available on the Hugging Face Hub, loaded with the reusable Fondant component `load_from_hf_hub`. Note that the `load_from_hf_hub` component does not define a fixed schema for the data it produces, which means we need to provide it ourselves with the `produces` argument. This argument takes a mapping from field names to `pyarrow` types.

In [42]:
import pyarrow as pa

text = pipeline.read(
    "components/components/load_from_hf_hub",
    arguments={
        # Add arguments
        "dataset_name": "wikitext@~parquet",
        "n_rows_to_load": 100,
    },
    produces={
        "text": pa.string()
    }
)

## Implement a custom component 

You can build Fondant pipelines using reusable components from the component hub, but you can also implement your own custom components. The easiest way to do so is to build a `lightweight_component`: you can implement and test the component code in a notebook and use the same code as part of your pipeline.

Here, we will implement a custom chunking component using LangChain.

Text data is chunked into manageable sections to prepare it for embedding, a step that is crucial for efficient RAG systems. LangChain provides an interface to chunk text snippets efficiently, and we wrap that interface in a custom Fondant `lightweight_component`. Check out [our documentation](https://fondant.ai/en/latest/components/lightweight_components/) for more information.
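
To build intuition for the `chunk_size` and `chunk_overlap` arguments used below, here is a deliberately simplified splitter in plain Python. It is a fixed-size sliding window over characters, not LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to break on separators such as paragraph breaks and spaces; `sliding_window_chunks` is a hypothetical name for illustration only.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified illustration: unlike RecursiveCharacterTextSplitter, it ignores
    natural separators and cuts at exact character offsets.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # each new chunk starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, `sliding_window_chunks("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`, where consecutive chunks share one character. With the values used later in this notebook (`chunk_size=512`, `chunk_overlap=32`), each window would start 480 characters after the previous one.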

In [43]:
import logging
import typing as t

import pandas as pd
import pyarrow as pa
from fondant.component import PandasTransformComponent
from fondant.pipeline import lightweight_component
from langchain.text_splitter import RecursiveCharacterTextSplitter


#TODO: Move all imports defined within functions under the class definition after https://github.com/ml6team/fondant/pull/835 is merged 
@lightweight_component(
    consumes={"text":pa.string()},
    produces={"text":pa.string(), "original_document_id":pa.string()},
    extra_requires=["langchain==0.0.329"]
)
class ChunkTextComponent(PandasTransformComponent):
    """Component that chunks text into smaller segments.
    More information about the different chunking strategies can be found here:
      - https://python.langchain.com/docs/modules/data_connection/document_transformers/
      - https://www.pinecone.io/learn/chunking-strategies/.
    """

    def __init__(
        self,
        *,
        chunk_size: int,
        chunk_overlap: int,
    ):
        """
        Args:
            chunk_size: the maximum size of each chunk, in characters
            chunk_overlap: the maximum overlap between consecutive chunks, in characters
        """
        import logging
        import typing as t 
        from langchain.text_splitter import RecursiveCharacterTextSplitter

        self.logger = logging.getLogger(__name__)
        self.chunker = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap
        )

    def chunk_text(self, row) -> t.List[t.Tuple]:
        # The row's index label (the document id) is available as `row.name`
        doc_id = row.name
        text_data = row["text"]
        docs = self.chunker.create_documents([text_data])

        return [
            (doc_id, f"{doc_id}_{chunk_id}", chunk.page_content)
            for chunk_id, chunk in enumerate(docs)
        ]

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        import itertools
        
        self.logger.info(f"Chunking {len(dataframe)} documents...")

        results = dataframe.apply(
            self.chunk_text,
            axis=1,
        ).to_list()

        # Flatten results
        results = list(itertools.chain.from_iterable(results))

        # Turn into dataframes
        results_df = pd.DataFrame(
            results,
            columns=["original_document_id", "id", "text"],
        )
        results_df = results_df.set_index("id")

        return results_df
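
`chunk_text` returns one list of `(original_document_id, id, text)` tuples per input row, and `transform` flattens those lists before building a dataframe indexed by `id`. The reshaping can be checked without pandas or LangChain. The hypothetical helper `explode_chunks` below takes already-split chunks per document and reproduces the same flattening and the `{doc_id}_{chunk_id}` id scheme.

```python
import itertools
from typing import Dict, List, Tuple


def explode_chunks(chunks_per_doc: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Turn {doc_id: [chunk, ...]} into flat (original_document_id, id, text) rows."""
    per_doc_rows = [
        # Same id scheme as chunk_text: "<doc_id>_<position of the chunk in the doc>"
        [(doc_id, f"{doc_id}_{chunk_id}", chunk) for chunk_id, chunk in enumerate(chunks)]
        for doc_id, chunks in chunks_per_doc.items()
    ]
    # Same flattening as transform: one list of rows per document -> one flat list
    return list(itertools.chain.from_iterable(per_doc_rows))
```

Because each chunk id combines the document id with the chunk's position, the ids stay unique as long as the input document ids are unique, which matters once `id` becomes the output index.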


We can now add the implemented chunking component to the pipeline using `Dataset.apply()`. This method doesn't execute the component yet; instead, it adds the component to the pipeline's execution graph and returns a lazy `Dataset` instance.
Besides our custom component, we also add the reusable components `embed_text` and `index_weaviate` from the [Fondant Hub](https://fondant.ai/en/latest/components/hub/). `index_weaviate` is added with `Dataset.write()`, because it writes the data to Weaviate instead of producing a new dataset.
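
To make the lazy behaviour concrete, here is a toy model in plain Python. `LazyDataset` is a hypothetical class for illustration and is not Fondant's `Dataset`: `apply()` only records a step and returns a new object, and nothing runs until `run()` is called, similar to how this pipeline only executes once a runner is invoked.

```python
from typing import Callable, List, Optional

Step = Callable[[list], list]


class LazyDataset:
    """Toy model of lazy pipeline construction; not Fondant's actual implementation."""

    def __init__(self, steps: Optional[List[Step]] = None):
        self.steps: List[Step] = list(steps or [])

    def apply(self, step: Step) -> "LazyDataset":
        # Nothing is executed here: we return a new dataset with one more recorded step.
        return LazyDataset(self.steps + [step])

    def run(self, data: list) -> list:
        # Execution happens only now, step by step in the recorded order.
        for step in self.steps:
            data = step(data)
        return data
```

Chaining `LazyDataset().apply(f).apply(g)` calls neither `f` nor `g`; they only run, in order, inside `run(data)`.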

In [47]:
import utils

# TODO: remove /components after using a stable release 

chunks = text.apply(
    ChunkTextComponent,
    arguments={
        "chunk_size": 512, "chunk_overlap": 32
    }
)


embeddings = chunks.apply(
    "components/components/embed_text",
    arguments={
        "model_provider": "huggingface",
        "model": "all-MiniLM-L6-v2"
    },
    resources=Resources(
        accelerator_number=number_of_accelerators,
        accelerator_name=accelerator_name,
    ),
    cluster_type="local" if number_of_accelerators is not None else "default",
)

embeddings.write(
    "components/components/index_weaviate",
    arguments={
        "weaviate_url": f"http://{utils.get_host_ip()}:8081",
        "class_name": "index",
    },
    cache=False
)

Our pipeline now looks as follows:

`load_from_hf_hub` -> `ChunkTextComponent` -> `embed_text` -> `index_weaviate`
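
The compiler log further down reports that Fondant sorts the pipeline component graph topologically before compiling it. As an illustration of what that ordering means (this is Kahn's algorithm in a hypothetical `topological_order` helper, not Fondant's own code), the chain above can be ordered from a mapping of each component to the components it consumes from. The component names match the service names in the run log.

```python
from collections import deque
from typing import Dict, List


def topological_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm. `dependencies` maps each component to its upstream components;
    every upstream component must also appear as a key."""
    indegree = {node: len(deps) for node, deps in dependencies.items()}
    dependents: Dict[str, List[str]] = {node: [] for node in dependencies}
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(node)
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1  # one fewer unprocessed upstream component
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(dependencies):
        raise ValueError("cycle detected in component graph")
    return order
```

For a linear chain like this one, the result is simply the order in which the components are declared; the sort matters once a graph has branches.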

## Running the pipeline

The pipeline will load and process text data, then ingest the processed data into a vector database. Before executing the pipeline, we need to start the Weaviate database; otherwise, the pipeline execution will fail.

To do this, we can use the Docker Compose setup provided in the `weaviate_service` folder.
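
`docker compose up --detach` returns as soon as the containers start, which can be before Weaviate accepts requests. A small polling helper can wait for Weaviate's `/v1/.well-known/ready` readiness endpoint to return HTTP 200. `wait_for_weaviate` is a hypothetical name, and the usage below assumes the Compose file publishes Weaviate on port 8081 of the host, as the `weaviate_url` argument in the pipeline suggests.

```python
import time
import urllib.request


def wait_for_weaviate(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll Weaviate's readiness endpoint until it returns HTTP 200 or `timeout` expires."""
    ready_url = url.rstrip("/") + "/v1/.well-known/ready"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(ready_url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503 while starting), refused or reset
            # connections and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_for_weaviate("http://localhost:8081")` (port assumed, see above) could run right after the Compose command and before `DockerRunner().run(pipeline)`.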

In [48]:
!docker compose -f weaviate_service/docker-compose.yaml up --detach --quiet-pull

[1A[1B[0G[?25l[+] Running 0/0
 ⠙ contextionary Pulling                                                   [34m0.1s [0m
 ⠙ weaviate Pulling                                                        [34m0.1s [0m
[?25h[1A[1A[1A[0G[?25l[+] Running 0/2
 ⠹ contextionary Pulling                                                   [34m0.2s [0m
 ⠹ weaviate Pulling                                                        [34m0.2s [0m
[?25h[1A[1A[1A[0G[?25l[+] Running 0/2
 ⠸ contextionary Pulling                                                   [34m0.3s [0m
 ⠸ weaviate Pulling                                                        [34m0.3s [0m
[?25h[1A[1A[1A[0G[?25l[+] Running 0/2
 ⠼ contextionary Pulling                                                   [34m0.4s [0m
 ⠼ weaviate Pulling                                                        [34m0.4s [0m
[?25h[1A[1A[1A[0G[?25l[+] Running 0/2
 ⠴ contextionary Pulling                                              

Finally, we can execute our pipeline.
Fondant provides multiple runners to run our pipeline:

- A Docker runner for local execution
- A Vertex AI runner for managed execution on Google Cloud
- A SageMaker runner for managed execution on AWS
- A Kubeflow Pipelines runner for execution anywhere

Here we will use the `DockerRunner` for local execution, which uses Docker Compose under the hood.

The runner will download the reusable components from the component hub. Afterwards, you will see the components execute one by one.

In [49]:
from fondant.pipeline.runner import DockerRunner

DockerRunner().run(pipeline)

INFO:root:Found reference to un-compiled pipeline... compiling
INFO:fondant.pipeline.compiler:Compiling ingestion-pipeline to .fondant/compose.yaml
INFO:fondant.pipeline.compiler:Base path found on local system, setting up ./data as mount volume
INFO:fondant.pipeline.pipeline:Sorting pipeline component graph topologically.
INFO:fondant.pipeline.pipeline:All pipeline component specifications match.
INFO:fondant.pipeline.compiler:Compiling service for load_from_hugging_face_hub
INFO:fondant.pipeline.compiler:Found Dockerfile for load_from_hugging_face_hub, adding build step.
INFO:fondant.pipeline.compiler:Compiling service for chunktextcomponent
INFO:fondant.pipeline.compiler:Compiling service for embed_text
INFO:fondant.pipeline.compiler:Found Dockerfile for embed_text, adding build step.
INFO:fondant.pipeline.compiler:Compiling service for index_weaviate
INFO:fondant.pipeline.compiler:Found Dockerfile for index_weaviate, adding build step.
INFO:fondant.pipeline.compiler:Successfully co

Starting pipeline run...




 Network ingestion-pipeline_default  Creating
 Network ingestion-pipeline_default  Created
 Container ingestion-pipeline-load_from_hugging_face_hub-1  Creating
 Container ingestion-pipeline-load_from_hugging_face_hub-1  Created
 Container ingestion-pipeline-chunktextcomponent-1  Creating
 Container ingestion-pipeline-chunktextcomponent-1  Created
 Container ingestion-pipeline-embed_text-1  Creating
 Container ingestion-pipeline-embed_text-1  Created
 Container ingestion-pipeline-index_weaviate-1  Creating
 Container ingestion-pipeline-index_weaviate-1  Created


Attaching to ingestion-pipeline-chunktextcomponent-1, ingestion-pipeline-embed_text-1, ingestion-pipeline-index_weaviate-1, ingestion-pipeline-load_from_hugging_face_hub-1


ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:18,625 | fondant.cli | INFO] Component `LoadFromHubComponent` found in module main
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:18,631 | fondant.component.executor | INFO] Dask default local mode will be used for further executions.Our current supported options are limited to 'local' and 'default'.
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:18,633 | fondant.component.executor | INFO] No matching execution for component detected
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:18,633 | root | INFO] Executing component
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:18,633 | main | INFO] Loading dataset from the hub...
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:24,002 | main | INFO] Renaming columns...
ingestion-pipeline-load_from_hugging_face_hub-1  | [2024-02-05 11:33:25,162 | main | INFO] Required numb

ingestion-pipeline-load_from_hugging_face_hub-1 exited with code 0
ingestion-pipeline-chunktextcomponent-1          | Collecting langchain==0.0.329 (from -r requirements.txt (line 1))
ingestion-pipeline-chunktextcomponent-1          |   Obtaining dependency information for langchain==0.0.329 from https://files.pythonhosted.org/packages/42/4e/86204994aeb2e4ac367a7fade896b13532eae2430299052eb2c80ca35d2c/langchain-0.0.329-py3-none-any.whl.metadata
ingestion-pipeline-chunktextcomponent-1          |   Downloading langchain-0.0.329-py3-none-any.whl.metadata (16 kB)
ingestion-pipeline-chunktextcomponent-1          | Collecting SQLAlchemy<3,>=1.4 (from langchain==0.0.329->-r requirements.txt (line 1))
ingestion-pipeline-chunktextcomponent-1          | [2024-02-05 11:43:13,246 | fondant.cli | INFO] Component `ChunkTextComponent` found in module main
ingestion-pipeline-chunktextcomponent-1          | [2024-02-05 11:43:13,251 | fondant.component.executor | INFO] Dask default local mode will be used for further executions.Our current supported options are limited to 'local' and 'default'.
ingestion-pipeline-chunktextcomponent-1          | [2024-02-05 11:43:13,264 | fondant.component.executor | INFO] Previous component `load_from_hugging_face_hub` is not cached. Invalidating cache for current and subsequent components
ingestion-pipeline-chunktextcomponent-1          | [2024-0



ingestion-pipeline-chunktextcomponent-1          | [2024-02-05 11:43:13,952 | fondant.component.executor | INFO] Saving output manifest to /data/ingestion-pipeline/ingestion-pipeline-20240205120133/chunktextcomponent/manifest.json
ingestion-pipeline-chunktextcomponent-1          | [2024-02-05 11:43:13,952 | fondant.component.executor | INFO] Writing cache key with manifest reference to /data/ingestion-pipeline/cache/2c243c959e599093d5e21d51223bbaa5.txt


ingestion-pipeline-chunktextcomponent-1 exited with code 0


ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:19,431 | fondant.cli | INFO] Component `EmbedTextComponent` found in module main
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:19,436 | fondant.component.executor | INFO] Dask default local mode will be used for further executions.Our current supported options are limited to 'local' and 'default'.
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:19,440 | fondant.component.executor | INFO] Previous component `chunktextcomponent` is not cached. Invalidating cache for current and subsequent components
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:19,440 | fondant.component.executor | INFO] Caching disabled for the component
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:19,440 | root | INFO] Executing component
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:23,330 | sentence_transformers.SentenceTransformer | INF

ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:46,466 | fondant.component.executor | INFO] Saving output manifest to /data/ingestion-pipeline/ingestion-pipeline-20240205120133/embed_text/manifest.json
ingestion-pipeline-embed_text-1                  | [2024-02-05 11:43:46,466 | fondant.component.executor | INFO] Writing cache key with manifest reference to /data/ingestion-pipe

ingestion-pipeline-embed_text-1 exited with code 0


ingestion-pipeline-index_weaviate-1              | [2024-02-05 11:43:50,266 | fondant.cli | INFO] Component `IndexWeaviateComponent` found in module main
ingestion-pipeline-index_weaviate-1              | [2024-02-05 11:43:50,273 | fondant.component.executor | INFO] Dask default local mode will be used for further executions.Our current supported options are limited to 'local' and 'default'.
ingestion-pipeline-index_weaviate-1              | [2024-02-05 11:43:50,283 | fondant.component.executor | INFO] Caching disabled for the component
ingestion-pipeline-index_weaviate-1              | [2024-02-05 11:43:50,283 | root | INFO] Executing component
ingestion-pipeline-index_weaviate-1              |             Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.
ingestion-pipeline-index_weaviate-1              | [2024-02-05 11:43:50,784 | root | INFO] Columns of dataframe: []
ingestion-pipeline-index_weaviate-1  

ingestion-pipeline-index_weaviate-1 exited with code 1
Finished pipeline run.


## Exploring the dataset

You can also explore the dataset using the Fondant Explorer, which lets you visualize your output dataset at each component step. The first start might take a while, because the Explorer Docker image needs to be downloaded. Once it is running, you can browse to
http://localhost:8501/

In [None]:
from fondant.explore import run_explorer_app

run_explorer_app(base_path=BASE_PATH)

To stop the Explorer, run the cell below.

In [None]:
from fondant.explore import stop_explorer_app

stop_explorer_app()

## Clean up your environment

After your pipeline has run successfully, you should clean up your environment and stop the Weaviate database.

In [None]:
!docker compose -f weaviate_service/docker-compose.yaml down

In [None]:
stop_explorer_app()

## Scaling up
If you're happy with your dataset, it's time to scale up. Check [our documentation](https://fondant.ai/en/latest/pipeline/#compiling-and-running-a-pipeline) for more information about the available runners.

