numaproj-contrib/apache-pulsar-source-go

Apache Pulsar Source for Numaflow

This document describes how to set up an Apache Pulsar source in a Numaflow pipeline. The source reads messages from an Apache Pulsar topic and feeds them into the pipeline, where they can be processed and routed to sinks such as Redis.

Quick Start

Follow this guide to set up an Apache Pulsar source in your Numaflow pipeline.

Prerequisites

Step-by-Step Guide

1. Configure Apache Pulsar

Ensure that you have an Apache Pulsar topic and a corresponding subscription ready for use. You can create these using the Pulsar admin CLI or the Pulsar dashboard.
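As a rough sketch of the CLI route, the snippet below wraps the two pulsar-admin calls in a shell function. It assumes the topic lives in Pulsar's default public/default tenant and namespace and is persistent; the function name create_topic_and_subscription is hypothetical and only defined here, not invoked. Adjust the names to match your cluster.

```shell
# Assumption: default tenant/namespace; change these if your cluster differs.
TENANT="public"
NAMESPACE="default"
TOPIC="persistent://${TENANT}/${NAMESPACE}/test-topic"
SUBSCRIPTION="test-subscription"

# Hypothetical helper: creates a non-partitioned topic, then a subscription on it.
# Requires pulsar-admin on the PATH and configured to reach your Pulsar cluster.
create_topic_and_subscription() {
  pulsar-admin topics create "$TOPIC"
  pulsar-admin topics create-subscription --subscription "$SUBSCRIPTION" "$TOPIC"
}
```

Running create_topic_and_subscription from a shell that can reach the Pulsar admin endpoint executes both commands in order. The short names test-topic and test-subscription match the values used in the pipeline manifest in the next step.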

2. Deploy a Numaflow Pipeline with Apache Pulsar Source

Create a Kubernetes manifest (e.g., pulsar-source-pipeline.yaml) with the following configuration:

apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: apache-pulsar-source
spec:
  limits:
    readBatchSize: 10
  vertices:
    - name: in
      source:
        udsource:
          container:
            image: "quay.io/numaio/numaflow-go/apache-pulsar-source-go:latest"
            env:
              - name: PULSAR_TOPIC
                value: "test-topic"
              - name: PULSAR_SUBSCRIPTION_NAME
                value: "test-subscription"
              - name: PULSAR_HOST
                value: "pulsar-broker.numaflow-system.svc.cluster.local:6650"
              - name: PULSAR_ADMIN_ENDPOINT
                value: "http://pulsar-broker.numaflow-system.svc.cluster.local:8080"
    - name: log-sink
      sink:
        log: {}
  edges:
    - from: in
      to: log-sink

Replace test-topic, test-subscription, and the PULSAR_HOST and PULSAR_ADMIN_ENDPOINT values with the settings for your own Apache Pulsar deployment.

Apply the configuration to your cluster:

kubectl apply -f pulsar-source-pipeline.yaml

3. Verify the Pipeline

Check the Numaflow pipeline's logs to ensure that it successfully connects to Apache Pulsar and ingests data from the specified topic.
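One possible way to run this check is sketched below. It assumes the pipeline was applied to the namespace of your current kubectl context, that Numaflow labels vertex pods with numaflow.numaproj.io/pipeline-name and numaflow.numaproj.io/vertex-name, and that pulsar-client is available for publishing a test message. The helper names publish_test_message and verify_pipeline are hypothetical and are only defined here, not invoked.

```shell
PIPELINE="apache-pulsar-source"
SINK_VERTEX="log-sink"
TOPIC="test-topic"
# Assumption: label keys that Numaflow sets on vertex pods.
SELECTOR="numaflow.numaproj.io/pipeline-name=${PIPELINE},numaflow.numaproj.io/vertex-name=${SINK_VERTEX}"

# Hypothetical helper: publish one message so the source has something to read.
publish_test_message() {
  pulsar-client produce "$TOPIC" --messages "hello-numaflow"
}

# Hypothetical helper: show pipeline status, its pods, and recent log-sink output.
verify_pipeline() {
  kubectl get pipeline "$PIPELINE"
  kubectl get pods -l "numaflow.numaproj.io/pipeline-name=${PIPELINE}"
  kubectl logs -l "$SELECTOR" --all-containers --tail=20
}
```

Calling publish_test_message followed by verify_pipeline should surface the published payload in the log-sink output if the source is connected; connection errors would more likely appear in the logs of the pod for the in vertex.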

4. Clean up

To delete the Numaflow pipeline and remove any Docker volumes created for Pulsar:

kubectl delete -f pulsar-source-pipeline.yaml
# Remove the Docker volumes (replace with your own volume names)
docker volume rm pulsar-data-volume pulsar-config-volume

Additional Resources