## Use Pipeline

Pipelines are used to transform unstructured data into searchable vector collections. Currently, three types of pipelines are available: ingestion pipelines, search pipelines, and deletion pipelines.

In this notebook, we use the document records in the [example dataset](https://docs.zilliz.com/docs/example-dataset) to demonstrate how to create and run pipelines so that you can search among your unstructured data.


### Preparations

The example dataset contains over 5,000 articles from [medium.com](https://medium.com). To demonstrate how to create a data ingestion pipeline, we need to scrape these articles and save them as separate text files.

The following script:

- Reads the example dataset,
- Accesses the link of each record in the dataset to scrape the page content, and
- Saves each page's text in a separate text file, then writes the first `SCRAP_COUNT` records to `New_Medium_Data.json` for the next step.

In [1]:
import os
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

SCRAP_COUNT = 200  # number of records to read from the dataset and scrape

# Open the article at `link` in headless Chrome
# and save its text content as ../passages/<doc_id>.txt
def get_passage(doc_id, link):
    options = Options()
    options.add_argument('--headless')
    driver = None
    try:
        driver = webdriver.Chrome(options=options)
        driver.get(link)
        html = driver.find_element(By.TAG_NAME, 'body').get_attribute('innerHTML')
        soup = BeautifulSoup(html, 'html.parser')
        with open(f'../passages/{doc_id}.txt', 'w', encoding='utf-8') as f:
            f.write(soup.get_text())
    except Exception as e:
        print(f'Failed to get {doc_id}')
        print(e)
    finally:
        if driver is not None:
            driver.quit()  # close the browser so Chrome processes don't pile up


df = pd.read_csv('../New_Medium_Data.csv')
df = df.iloc[:SCRAP_COUNT].copy()
# The vector column is stored as a string like "[0.1, 0.2, ...]"; parse it into a list of floats
df['vector'] = df['vector'].apply(lambda x: [float(i) for i in x[1:-1].split(',')])

# Scrape each article (slow: one browser session per link)
os.makedirs('../passages', exist_ok=True)
for _, row in df.iterrows():
    get_passage(row['id'], row['link'])

df.to_json('../New_Medium_Data.json', orient='records')

The scraping process takes time. You can change `SCRAP_COUNT` to set how many records are read from the dataset and scraped from medium.com. Once scraping is done, you must manually upload the scraped documents to your cloud storage bucket before continuing. Put them under the `passages/` prefix, because the next script looks up objects named `passages/<id>.txt`.
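The upload can also be scripted. The sketch below assumes two things: the scraped files are in `../passages`, and the bucket is the one named in the next step (`medium-passages`). It maps each local file to the `passages/<id>.txt` object name that the signing script expects, then uploads it with the `google-cloud-storage` client. The helper names `plan_uploads` and `upload_passages` are hypothetical, and authentication is left to your environment's default credentials.

```python
import os


def plan_uploads(local_dir, prefix="passages/"):
    """Return (local_path, blob_name) pairs for every .txt file in local_dir.

    Blob names follow the passages/<id>.txt layout that the signing
    script looks up.
    """
    pairs = []
    for name in sorted(os.listdir(local_dir)):
        if name.endswith(".txt"):
            pairs.append((os.path.join(local_dir, name), prefix + name))
    return pairs


def upload_passages(bucket_name, local_dir="../passages"):
    """Upload the scraped passages to a Google Cloud Storage bucket."""
    # Imported lazily so plan_uploads() works without the library installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for local_path, blob_name in plan_uploads(local_dir):
        bucket.blob(blob_name).upload_from_filename(local_path)


# Example (requires credentials and network access):
# upload_passages("medium-passages")
```

Splitting planning from uploading lets you check the object names before anything is written to the bucket.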

### Sign cloud object URLs

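Zilliz Cloud pipelines require signed URLs, so you should sign each object in your cloud storage bucket before using it in a data ingestion pipeline. Signed URLs expire (the script below sets a 60-minute lifetime), so run the ingestion pipeline before they do.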

The following script:

- Reads the JSON file generated in the previous step.
- Signs the cloud object URL of each record that has a local copy in `../passages`.
- Appends the signed URL to the record in the dataset.
- Removes all records that do not have a corresponding signed URL.

In [2]:
import datetime
import json
import os
import pandas as pd
from google.oauth2 import service_account
from google.cloud import storage

# Replace these placeholders with your own values
GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY_FILE = "/path/to/your-service-account-key.json"
GOOGLE_QUOTA_PROJECT_ID = "your-gcp-project-id"
GOOGLE_STORAGE_BUCKET_NAME = "medium-passages"

# Create the storage client once and reuse it for every blob
credentials = service_account.Credentials.from_service_account_file(
    GOOGLE_SERVICE_ACCOUNT_PRIVATE_KEY_FILE)
storage_client = storage.Client(project=GOOGLE_QUOTA_PROJECT_ID, credentials=credentials)

def generate_download_signed_url_v4(bucket_name, blob_name):
    """Generates a v4 signed URL for downloading a blob.

    Note that this method requires a service account key file. You cannot use
    this if you are using Application Default Credentials from Google Compute
    Engine or from the Google Cloud SDK.
    """
    blob = storage_client.bucket(bucket_name).blob(blob_name)

    url = blob.generate_signed_url(
        version="v4",
        # This URL is valid for 60 minutes
        expiration=datetime.timedelta(minutes=60),
        # Allow GET requests using this URL.
        method="GET",
    )

    print("Generated GET signed URL:")
    print(url)
    print("You can use this URL with any user agent, for example:")
    print(f"curl '{url}'")
    return url

df = pd.read_json('../New_Medium_Data.json')

# Only sign objects for which a local copy was scraped
files = set(os.listdir('../passages'))
df['signed_url'] = df['id'].apply(
    lambda x: generate_download_signed_url_v4(GOOGLE_STORAGE_BUCKET_NAME, f'passages/{x}.txt')
    if f'{x}.txt' in files else None)
df = df[df['signed_url'].notnull()]  # remove records without a signed URL

# DataFrame.to_json() escapes "/" as "\/"; round-trip through the json module
# so that the saved URLs are written as plain strings
records = json.loads(df.to_json(orient='records'))
with open('../New_Medium_Data.json', 'w') as f:
    json.dump(records, f)

Generated GET signed URL:
https://storage.googleapis.com/medium-passages/passages/0.txt?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=storage-viewer%40anthony-364406.iam.gserviceaccount.com%2F20231130%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20231130T033222Z&X-Goog-Expires=3600&X-Goog-SignedHeaders=host&X-Goog-Signature=843c8944347ec9825ebaf7642a19e01651fa17a30834fd480fb7b9149e75fd611fc528c9efe82b252843ddebbaa132362b80acf859a49c6adcc8b70d0dd431aeed3816760ff5dc7ed3649faaa0347b33125983d0989721f8b5e42a4c63eeecf3d60ac1dcac27f21994b4839d68db9902e81de066f19e5c511aeab41171b31dd40cf1a9a44a8bc9d6c6d812aa78d17d6cf5b2f1b836d35d466277fcefd81ecc339ff8f191cc87d0e912f759b15977029ec97f36da67f5cdd0cc74ad8e92e0cf03f6f96a80c8ad2c996ce291c7b180d2bb154c917dc53a3b5bea89351f08afe39649a1896ed62c410f4d418c9aa087a30c77262987b4be0db7951b3abe56d13989
You can use this URL with any user agent, for example:
curl 'https://storage.googleapis.com/medium-passages/passages/0.txt?X-Goog-Algorithm=GOOG4-RSA-SHA25
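Each signed URL stops working once the expiration set above has passed, so you may want to check whether a URL is still valid before running the ingestion pipeline. The sketch below reads the `X-Goog-Date` (signing time) and `X-Goog-Expires` (lifetime in seconds) query parameters visible in the generated URL. The helper names `signed_url_expiry` and `is_expired` are hypothetical.

```python
import datetime
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the UTC datetime at which a GCS v4 signed URL expires."""
    query = parse_qs(urlparse(url).query)
    # X-Goog-Date is the signing time in basic ISO 8601 form, e.g. 20231130T033222Z
    signed_at = datetime.datetime.strptime(
        query["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=datetime.timezone.utc)
    # X-Goog-Expires is the validity period in seconds
    lifetime = datetime.timedelta(seconds=int(query["X-Goog-Expires"][0]))
    return signed_at + lifetime


def is_expired(url, now=None):
    """Return True if the signed URL has already expired."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now >= signed_url_expiry(url)
```

For example, you could call `is_expired(data[0]['signed_url'])` before the ingestion run and re-run the signing script if it returns `True`.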

### Demonstration starts

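From this section on, all demonstrations use RESTful API requests sent with `curl`.

Before that, set up the following environment variables. Fill in your Zilliz Cloud API key in `YOUR_CLUSTER_TOKEN`, and replace the API endpoint prefix, cluster ID, and cluster endpoint with the values for your own cluster and its cloud region.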

In [3]:
os.environ["YOUR_CLUSTER_TOKEN"] = "YOUR_ZILLIZ_CLOUD_API_KEY"  # replace with your API key
os.environ["ZILLIZ_CLOUD_API_ENDPOINT_PREFIX"] = "https://controller.api.gcp-us-west1.cloud-uat3.zilliz.com"
os.environ["ZILLIZ_CLOUD_CLUSTER_ID"] = "in03-db58c34c4cc4dd2"
os.environ["ZILLIZ_CLOUD_CLUSTER_ENDPOINT"] = "https://in03-db58c34c4cc4dd2.api.gcp-us-west1.cloud-uat3.zilliz.com"

#### List pipelines

You can list your pipelines as follows:

In [4]:
!curl --silent --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/" | jq .

{
  "code": 200,
  "data": []
}

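If you prefer to stay in Python rather than shelling out to `curl`, you can build the same request with the standard library. This is a minimal sketch that reuses the environment variables set above and mirrors only the headers and URL of the curl command. The helper names `pipeline_request` and `call` are hypothetical.

```python
import json
import os
import urllib.request


def pipeline_request(path="", method="GET", payload=None):
    """Build a request against the pipelines endpoint: <prefix>/v1/pipelines/<path>."""
    url = os.environ["ZILLIZ_CLOUD_API_ENDPOINT_PREFIX"] + "/v1/pipelines/" + path
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["YOUR_CLUSTER_TOKEN"],
        },
    )


def call(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example: list pipelines, equivalent to the curl command above
# print(json.dumps(call(pipeline_request()), indent=2))
```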

#### Create a pipeline

Zilliz Cloud offers three types of pipelines: ingestion pipelines for data ingestion, search pipelines for semantic search, and deletion pipelines for removing documents from collections.

The following steps demonstrate how to create each type in turn.

- Create a data ingestion pipeline

In [5]:
os.environ["PAYLOAD"] = json.dumps({
    "name": "medium_articles_ingestion",
    "description": "Ingestion of medium articles",
    "type": "INGESTION",
    "functions": [
        {
            "name": "medium_articles_index_func",
            "action": "INDEX_DOC",
            "inputField": "signed_url",
            "language": "ENGLISH"
        },
        {
            "name": "medium_articles_index_preserve_title",
            "action": "PRESERVE",
            "inputField": "title",
            "outputField": "title",
            "fieldType": "VarChar"
        },
        {
            "name": "medium_articles_index_preserve_link",
            "action": "PRESERVE",
            "inputField": "link",
            "outputField": "link",
            "fieldType": "VarChar"
        },
        {
            "name": "medium_articles_index_preserve_publication",
            "action": "PRESERVE",
            "inputField": "publication",
            "outputField": "publication",
            "fieldType": "VarChar"
        }
    ],
    "clusterId": "in03-db58c34c4cc4dd2",
    "newCollectionName": "medium_articles"
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-cde31766ffc8b8285b841d",
    "name": "medium_articles_ingestion",
    "type": "INGESTION",
    "description": "Ingestion of medium articles",
    "status": "SERVING",
    "functions": [
      {
        "action": "INDEX_DOC",
        "name": "medium_articles_index_func",

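The three `PRESERVE` entries in the ingestion payload differ only in the field they copy. One way to avoid repeating them is a small helper that builds the function list. This is a sketch: `preserve_function` is a hypothetical name, and it reproduces only the keys used in this notebook's payload.

```python
def preserve_function(field, field_type="VarChar"):
    """Build a PRESERVE function entry that copies `field` into the collection unchanged."""
    return {
        "name": f"medium_articles_index_preserve_{field}",
        "action": "PRESERVE",
        "inputField": field,
        "outputField": field,
        "fieldType": field_type,
    }


# Same function list as the ingestion payload, with the PRESERVE entries generated
functions = [
    {
        "name": "medium_articles_index_func",
        "action": "INDEX_DOC",
        "inputField": "signed_url",
        "language": "ENGLISH",
    },
    *[preserve_function(f) for f in ("title", "link", "publication")],
]
```

The resulting `functions` list can replace the literal list in the ingestion payload.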
You can check the collection created with the pipeline as follows:

In [6]:
!curl --silent --http1.1 --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_CLUSTER_ENDPOINT}/v1/vector/collections/describe?collectionName=medium_articles&dbName=default" | jq .

{
  "code": 200,
  "data": {
    "collectionName": "medium_articles",
    "shardsNum": 1,
    "description": "parrot",
    "load": "LoadStateLoaded",
    "enableDynamicField": false,
    "fields": [
      {
        "name": "id",
        "type": "Int64",
        "prim

- Create a search pipeline



In [7]:
os.environ["PAYLOAD"] = json.dumps({
    "name": "medium_articles_search",
    "description": "Ingestion of medium articles",
    "type": "SEARCH",
    "functions": [
        {
            "name": "medium_articles_search_func",
            "action": "SEARCH_DOC_CHUNK",
            "clusterId": "in03-db58c34c4cc4dd2",
            "inputField": "query_text",
            "collectionName": "medium_articles"
        }
    ]
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-71e95af38d958f50d8178b",
    "name": "medium_articles_search",
    "type": "SEARCH",
    "description": "Ingestion of medium articles",
    "status": "SERVING",
    "functions": [
      {
        "action": "SEARCH_DOC_CHUNK",
        "name": "medium_articles_search_func",

- Create a deletion pipeline

In [8]:
os.environ["PAYLOAD"] = json.dumps({
    "name": "medium_articles_deletion",
    "description": "Ingestion of medium articles",
    "type": "DELETION",
    "functions": [
        {
            "name": "medium_articles_deletion_func",
            "action": "PURGE_DOC_INDEX",
            "inputField": "doc_name",
        }
    ],
    "clusterId": "in03-db58c34c4cc4dd2",
    "collectionName": "medium_articles"
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-48871e5feceb3c80be99b3",
    "name": "medium_articles_deletion",
    "type": "DELETION",
    "description": "Ingestion of medium articles",
    "status": "SERVING",
    "functions": [
      {
        "action": "PURGE_DOC_INDEX",
        "name": "medium_articles_deletion_func",


#### View pipelines

Now you can call the list pipelines API endpoint again to view all the created pipelines, or use the describe API endpoint to view a specific pipeline.

- List pipelines

In [9]:
!curl --silent --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/" | jq .

{
  "code": 200,
  "data": [
    {
      "pipelineId": "pipe-cde31766ffc8b8285b841d",
      "name": "medium_articles_ingestion",
      "type": "INGESTION",
      "description": "Ingestion of medium articles",
      "status": "SERVING",
      "functions": [
        {
          "action": "INDEX_DOC",
          "name":

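Copying pipeline IDs by hand is error-prone, especially after you re-create pipelines. A small helper can map each pipeline name to its ID from the list response, where `data` is a list of objects with `name` and `pipelineId`. This is a sketch: `pipeline_ids_by_name` is a hypothetical name, and the response is assumed to have already been parsed into a Python dict, for example with `json.loads`.

```python
def pipeline_ids_by_name(list_response):
    """Map pipeline name -> pipelineId from a parsed list-pipelines response."""
    if list_response.get("code") != 200:
        raise RuntimeError(f"Unexpected response: {list_response}")
    return {p["name"]: p["pipelineId"] for p in list_response["data"]}
```

You could then store, for instance, the search pipeline's ID in an environment variable of your choosing (such as a hypothetical `SEARCH_PIPELINE_ID`) and reference it in the `curl` URLs instead of a hard-coded ID.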
- Describe a specific pipeline



In [11]:
!curl --silent --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/pipe-cde31766ffc8b8285b841d" | jq .

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-cde31766ffc8b8285b841d",
    "name": "medium_articles_ingestion",
    "type": "INGESTION",
    "description": "Ingestion of medium articles",
    "status": "SERVING",
    "functions": [
      {
        "action": "INDEX_DOC",
        "name": "medium_articles_index_func",

#### Run pipelines

Recall the dataset prepared in the first section. Now we are going to:

- Feed it to the data ingestion pipeline so that the documents are chunked, vectorized, and saved into the collection created along with the pipeline.
- Run the search pipeline to conduct a semantic similarity search among the documents.
- Run the deletion pipeline to remove a specific document from the collection.

Let's get started!

- Run the data ingestion pipeline

In [13]:
with open('../New_Medium_Data.json') as f:
    data = json.load(f)


os.environ["PAYLOAD"] = json.dumps({
    "data": {
        "signed_url": data[0]['signed_url'],
        "title": data[0]['title'],
        "link": data[0]['link'],
        "publication": data[0]['publication'],
    }
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/pipe-cde31766ffc8b8285b841d/run" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "doc_name": "0.txt",
    "num_chunks": 14
  }
}


Now you can go to Zilliz Cloud to check the inserted chunks in your collection. To make the search demonstration more meaningful, you can ingest more documents.
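To ingest more than the first record, you can build one run payload per record and send them one at a time. The sketch below assumes the records come from the `New_Medium_Data.json` file produced earlier and uses the input fields of the ingestion pipeline created above. `ingestion_payloads` is a hypothetical helper.

```python
# Input fields consumed by the ingestion pipeline's INDEX_DOC and PRESERVE functions
INGESTION_FIELDS = ("signed_url", "title", "link", "publication")


def ingestion_payloads(records):
    """Yield one run-pipeline payload per record, keeping only the pipeline's input fields."""
    for record in records:
        yield {"data": {field: record[field] for field in INGESTION_FIELDS}}
```

In the notebook, you could loop over `ingestion_payloads(data)`, set `os.environ["PAYLOAD"] = json.dumps(payload)` for each payload, and repeat the `curl` command above. Extra columns such as `vector` are dropped because the pipeline does not take them as input.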

- Run the search pipeline. Replace the pipeline ID in the URL with the ID of your own search pipeline.

In [162]:
os.environ["PAYLOAD"] = json.dumps({
    "data": {
        "query_text": "How can I organize my knowledge base using vector database?"
    },
    "params": {
        "limit": 3,
        "outputFields": ["title", "doc_name", "chunk_text"]
    }
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/pipe-71e95af38d958f50d8178b/run" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "result": [
      {
        "id": 445951244000285400,
        "distance": 0.7854970693588257,
        "title": "Configuring SSL for Hasura GraphQL on DigitalOcean Kubernetes",
        "chunk_text": "understanding of the world.FreeDistraction-free reading. No ads.Organize your knowledge with lists and highlights.Tell your story. Find your audience.Sign up for freeMembershipAccess the best member-only stories.Support independent authors.Li

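The search response nests hits under `data.result`, and each hit carries a `distance` plus the requested `outputFields`. A sketch for producing a compact summary, assuming the response has been parsed into a dict. `summarize_hits` is a hypothetical helper, and it keeps hits in the order the API returned them.

```python
def summarize_hits(search_response, width=60):
    """Return (distance, title, snippet) tuples from a parsed search-pipeline response."""
    hits = search_response["data"]["result"]
    return [
        (round(hit["distance"], 4), hit.get("title", ""), hit.get("chunk_text", "")[:width])
        for hit in hits
    ]
```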
- Run the deletion pipeline. Replace the pipeline ID in the URL with the ID of your own deletion pipeline.

In [164]:
os.environ["PAYLOAD"] = json.dumps({
    "data": {
        "doc_name": "0.txt"
    }
})

!curl --silent --http1.1 --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/pipe-48871e5feceb3c80be99b3/run" \
    -d "${PAYLOAD}" | jq .

{
  "code": 200,
  "data": {
    "num_deleted_chunks": 14
  }
}


#### Clean up

You can drop pipelines that are no longer needed. The following request drops the search pipeline.

In [15]:
!curl --silent --http1.1 --request DELETE \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${YOUR_CLUSTER_TOKEN}" \
    --url "${ZILLIZ_CLOUD_API_ENDPOINT_PREFIX}/v1/pipelines/pipe-71e95af38d958f50d8178b" | jq .

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-71e95af38d958f50d8178b",
    "name": "medium_articles_search",
    "type": "SEARCH",
    "description": "Ingestion of medium articles",
    "status": "SERVING",
    "functions": [
      {
        "action": "SEARCH_DOC_CHUNK",
        "name": "medium_articles_search_func",
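To drop the two pipelines that remain after the search pipeline is removed, you can send the same DELETE request for each ID. A sketch using the standard library: the IDs are the ones returned when the ingestion and deletion pipelines were created in this notebook (yours will differ), and `delete_requests` is a hypothetical helper. Sending is left commented out.

```python
import os
import urllib.request

# IDs returned by the create requests in this notebook; replace with your own
PIPELINE_IDS = [
    "pipe-cde31766ffc8b8285b841d",  # medium_articles_ingestion
    "pipe-48871e5feceb3c80be99b3",  # medium_articles_deletion
]


def delete_requests(pipeline_ids):
    """Build one DELETE <prefix>/v1/pipelines/<id> request per pipeline."""
    prefix = os.environ["ZILLIZ_CLOUD_API_ENDPOINT_PREFIX"]
    token = os.environ["YOUR_CLUSTER_TOKEN"]
    return [
        urllib.request.Request(
            f"{prefix}/v1/pipelines/{pipeline_id}",
            method="DELETE",
            headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        )
        for pipeline_id in pipeline_ids
    ]


# for req in delete_requests(PIPELINE_IDS):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```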