## Kaskada: Materializing Results to a Pulsar Topic
A materialization is a Kaskada resource, similar to a query, that runs automatically whenever new data is loaded into any table it references. Materializations can populate feature vectors in a variety of feature stores for low-latency inference in production. This example incrementally materializes results to a Pulsar topic.

For more information, see https://kaskada-ai.github.io/docs-site/kaskada/main/reference/working-with-materializations.html. 

In [None]:
import kaskada.api.release as release
import os
from getpass import getpass
# Prompt for the token instead of hard-coding it in the notebook.
os.environ[release.ReleaseClient.GITHUB_ACCESS_TOKEN_ENV] = getpass(prompt='GitHub Access Token:')

In [None]:
from kaskada.api.session import LocalBuilder
session = LocalBuilder().build()

### Create the table and load data

In [None]:
import kaskada.table

kaskada.table.create_table('transactions', 'transaction_time', 'id')

In [None]:
kaskada.table.load('transactions', 'data/transactions_part1.parquet')

### Create a referenceable query

In [None]:
%load_ext fenlmagic

In [None]:
%%fenl --result-behavior final-results --var test_query 

transactions

### Create a materialization

In [None]:
from kaskada import materialization as materialize
from kaskada.materialization import PulsarDestination

# A Pulsar topic is identified by a "tenant", a "namespace", and a "name".
# Together, they form the `topic_url` in the format: `persistent://<tenant>/<namespace>/<name>`.
#
# Note that if you change the "tenant" or "namespace", they must already exist.
# The "public" tenant and "default" namespace are created by default.
#
# Each "name" identifies a single topic. If the query changes, the original topic
# must be manually deleted before the name can be reused.
#
# The "broker_service_url" is the address the client uses to connect to the broker.
# The Pulsar container is exposed with the hostname "pulsar".
destination = PulsarDestination(
    tenant="public",
    namespace="default",
    topic_name="my_topic",
    broker_service_url="pulsar://pulsar:6650",
)

# Creating a materialization runs the query and materializes results to your Pulsar topic. 
materialize.create_materialization(
    name = "test_materialization",
    expression = test_query.expression,
    destination = destination,
    views = [] 
)
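The topic URL format described in the comments above can be sketched as a small helper. This is only an illustration: `pulsar_topic_url` is a hypothetical function, not part of the Kaskada or Pulsar client APIs, and the check for `/` in each part is an assumption made to keep the three components unambiguous.

```python
def pulsar_topic_url(tenant: str, namespace: str, name: str) -> str:
    """Build a URL of the form persistent://<tenant>/<namespace>/<name>.

    Hypothetical helper for illustration; not part of any library.
    """
    for label, part in (("tenant", tenant), ("namespace", namespace), ("name", name)):
        # Assumption: each component is non-empty and contains no '/',
        # so the resulting URL splits back into exactly three parts.
        if not part or "/" in part:
            raise ValueError(f"{label} must be non-empty and contain no '/': {part!r}")
    return f"persistent://{tenant}/{namespace}/{name}"


# The values used for the destination in this notebook.
topic_url = pulsar_topic_url("public", "default", "my_topic")
print(topic_url)
```

Deriving the URL from the same three values passed to `PulsarDestination` helps keep the consumer subscribed to the topic the materialization writes to.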

### Consume events from your Pulsar topic 


In [None]:
import pulsar

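# Connect to the same broker the materialization writes to.
client = pulsar.Client('pulsar://pulsar:6650')
topic_url = "persistent://public/default/my_topic"
# Start from the earliest message so results produced before subscribing are not missed.
consumer = client.subscribe(topic_url, subscription_name='my-subscription', initial_position=pulsar.InitialPosition.Earliest)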

for _ in range(2):
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)

### Load new data 

In [None]:
# Loading data into a table referenced by an existing materialization will cause the query 
# to materialize incremental results to your destination. 
#
# In this example, we expect all events in `transactions_part2.parquet` to be materialized 
# to our topic.
kaskada.table.load('transactions', 'data/transactions_part2.parquet')

In [None]:
for _ in range(3):
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)

In [None]:
# `transactions_part3.parquet` contains late data: some of its events occurred before
# the latest event in a previously loaded file. Kaskada re-runs the query from a point
# in time early enough that the late events are processed in order relative to the
# existing input. We therefore expect results to be materialized to the topic starting
# from the earliest event time in `transactions_part3.parquet`.
kaskada.table.load('transactions', 'data/transactions_part3.parquet')
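The late-data behavior described in the comments above can be pictured with a toy model. This is a conceptual sketch only, not Kaskada's implementation: `events_to_reprocess` is a hypothetical function, event times are plain integers, and the sample values are made up for illustration.

```python
def events_to_reprocess(existing_times, late_times):
    """Return the event times that are processed again after late data arrives.

    Toy model: processing resumes at the earliest late event time, and every
    event at or after that time (old or new) is processed in time order.
    """
    resume_from = min(late_times)
    merged = sorted(existing_times + late_times)
    return [t for t in merged if t >= resume_from]


# Made-up event times: two late events fall between already-loaded events.
existing = [1, 2, 5, 6]
late = [3, 4]
print(events_to_reprocess(existing, late))
```

In this model, events before the earliest late event are untouched, which matches the expectation that new results start from the earliest event time in the late file.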

In [None]:
# This loops infinitely, awaiting more messages in the topic. 
# You can interrupt the cell to break out of execution.
while True:
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)
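As a bounded alternative to the infinite loop, the sketch below stops once no message arrives within a timeout. `drain_messages` is a hypothetical helper; it relies on the `timeout_millis` argument of the Pulsar Python client's `consumer.receive`, and it assumes a timeout surfaces as a raised exception. Catching `Exception` broadly is a simplification that would also hide connection errors.

```python
def drain_messages(consumer, max_messages=10, timeout_millis=5000):
    """Receive up to max_messages, stopping early when receive() times out.

    Hypothetical helper: `consumer` is any object with receive() and
    acknowledge(), such as the one returned by client.subscribe().
    """
    payloads = []
    for _ in range(max_messages):
        try:
            msg = consumer.receive(timeout_millis=timeout_millis)
        except Exception:
            # Assumption: a timeout is raised as an exception. Narrow this
            # to the client's timeout exception type in real code.
            break
        payloads.append(msg.data())
        # Acknowledge so the broker does not redeliver the message.
        consumer.acknowledge(msg)
    return payloads


# Example usage with the consumer created earlier in this notebook:
# for payload in drain_messages(consumer):
#     print("Received message: '%s'" % payload)
```

Returning the payloads rather than printing them makes it easier to compare what arrives after each load.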