## Materializing Results to a Pulsar Topic
Kaskada allows you to create a materialization, a resource similar to a query, that automatically runs when new data is loaded into any table the materialization references. Materializations can populate feature vectors in a variety of feature stores for low-latency inference in production. In this example, we incrementally materialize results to a Pulsar topic.

For more information, see https://kaskada-ai.github.io/docs-site/kaskada/main/reference/working-with-materializations.html. 

```python
import os
from getpass import getpass

import kaskada.api.release as release

# The release client reads a GitHub access token from this environment variable
# when it accesses GitHub releases. Never hard-code the token in the notebook.
os.environ[release.ReleaseClient.GITHUB_ACCESS_TOKEN_ENV] = getpass(prompt="GitHub Access Token:")
```


```python
from kaskada.api.session import LocalBuilder

# download(False) reuses engine binaries already in the local cache instead of
# downloading them. Use `LocalBuilder().build()` to download the latest release.
session = LocalBuilder().download(False).build()
```

```
INFO:kaskada.api.release:Using latest release version: engine@v0.1.1
INFO:kaskada.api.release:Skipping download. Using binary: /Users/jordan.frazier/.cache/kaskada/bin/engine@v0.1.1/kaskada-engine
INFO:kaskada.api.release:Skipping download. Using binary: /Users/jordan.frazier/.cache/kaskada/bin/engine@v0.1.1/kaskada-manager
INFO:kaskada.api.session:Initializing manager process
INFO:kaskada.api.session:Initializing compute process
INFO:kaskada.api.session:Successfully connected to session.
```


### Create the table and load data


```python
import kaskada.table

# Arguments: table name, time column name, entity key column name.
kaskada.table.create_table('transactions', 'transaction_time', 'id')
```

| Field | Value |
|---|---|
| `table.table_name` | `transactions` |
| `table.entity_key_column_name` | `id` |
| `table.time_column_name` | `transaction_time` |
| `table.version` | `0` |
| `table.create_time` | `2023-03-01T12:23:58.004733` |
| `table.update_time` | `2023-03-01T12:23:58.004733` |
| `request_details.request_id` | `79204ce91941ee981b391edff16a0fdf` |


```python
kaskada.table.load('transactions', '../testdata/transactions/transactions_part1.parquet')
```

| Field | Value |
|---|---|
| `data_token_id` | `6be8cbfd-ee7f-4e1a-9ca8-e3a85bef25f2` |
| `request_details.request_id` | `8f366a85f3224d58a580c576f148fe60` |


### Create a referenceable query

```python
# Load the fenl IPython extension, which provides the %%fenl cell magic.
%load_ext fenlmagic
```

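The `--var my_query` option assigns the query result to the Python variable `my_query`, which the materialization below references, so run this cell before creating the materialization.

```python
%%fenl --result-behavior final-results --var my_query
transactions
```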
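The `--result-behavior final-results` flag asks for the final result for each entity rather than a result for every input event (`all-results`). The difference can be modeled roughly in plain Python. This is a hypothetical sketch, not the Kaskada API: the functions `all_results` and `final_results` and the `(entity, time, amount)` tuples are made up for illustration and are not taken from the transactions data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy events: (entity id, event time, amount).
Event = Tuple[str, int, float]


def all_results(events: List[Event]) -> List[Event]:
    """One result per event, ordered by event time (roughly `all-results`)."""
    return sorted(events, key=lambda e: e[1])


def final_results(events: List[Event]) -> Dict[str, Event]:
    """Only the latest event per entity (roughly `final-results`)."""
    latest: Dict[str, Event] = {}
    for event in all_results(events):
        latest[event[0]] = event  # a later event overwrites an earlier one
    return latest


events = [("a", 1, 10.0), ("b", 2, 5.0), ("a", 3, 7.5)]
print(len(all_results(events)))  # 3
print(final_results(events))     # {'a': ('a', 3, 7.5), 'b': ('b', 2, 5.0)}
```

Entity `a` appears twice in the input but only once in the final results, keyed to its latest event.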

### Create a materialization

```python
from kaskada import materialization as materialize
from kaskada.materialization import PulsarDestination

# A Pulsar topic is composed of a "tenant", "namespace", and "name".
# Together, they form the topic URL `persistent://<tenant>/<namespace>/<name>`.
#
# If you use a tenant or namespace other than the defaults, it must already exist.
# The "public" tenant and "default" namespace exist by default.
#
# Each topic name identifies a single topic. If the query changes, you must
# manually delete the original topic before reusing its name.
destination = PulsarDestination(tenant="public", namespace="default", name="my_topic")

# Creating a materialization runs the query and materializes the results to your
# Pulsar topic. `my_query` is defined by the `%%fenl ... --var my_query` cell above.
materialize.create_materialization(
    name="my_materialization",
    query=my_query.query,
    destination=destination,
    views=[],
)
```
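The mapping from tenant, namespace, and name to a topic URL described in the comments above can be written as a small helper. `pulsar_topic_url` is hypothetical, not part of the Kaskada or Pulsar client APIs, and the check that each component is non-empty and contains no `/` is an assumption made for illustration.

```python
def pulsar_topic_url(tenant: str, namespace: str, name: str) -> str:
    """Build a persistent Pulsar topic URL (hypothetical helper)."""
    # Assumption: each component must be non-empty and must not contain "/".
    for part in (tenant, namespace, name):
        if not part or "/" in part:
            raise ValueError(f"invalid topic component: {part!r}")
    return f"persistent://{tenant}/{namespace}/{name}"


topic_url = pulsar_topic_url("public", "default", "my_topic")
print(topic_url)  # persistent://public/default/my_topic
```

The resulting string is the same topic URL that the consumer in the next section subscribes to.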



### Consume events from your Pulsar topic 


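Run the consumer in a separate Python process or notebook: its loop blocks until interrupted, so later cells cannot run in the same kernel while it is active.

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
topic = "persistent://public/default/my_topic"

# Start from the earliest message so that results written before this
# subscription was created are also received.
consumer = client.subscribe(
    topic,
    subscription_name="my-sub",
    initial_position=pulsar.InitialPosition.Earliest,
)

try:
    # Block and print each message as it arrives; interrupt the kernel to stop.
    while True:
        msg = consumer.receive()
        print("Received message: %r" % msg.data())
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass
finally:
    client.close()
```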

### Load new data

```python
# Loading data into a table referenced by an existing materialization causes the
# query to materialize incremental results to your destination.
#
# In this example, we expect all events in `transactions_part2.parquet` to be
# materialized to our topic.
kaskada.table.load('transactions', '../testdata/transactions/transactions_part2.parquet')
```

```python
# `transactions_part3.parquet` contains late data: its events occurred before the
# latest event in a previously loaded file. Kaskada re-runs the query from a point
# in time that lets the late events be processed in order with the existing input.
# Therefore, we expect results to be materialized to your topic starting from the
# earliest event time in `transactions_part3.parquet`.
kaskada.table.load('transactions', '../testdata/transactions/transactions_part3.parquet')
```

| Field | Value |
|---|---|
| `data_token_id` | `10873462-ae04-40ac-b0e5-cb1b391c76fc` |
| `request_details.request_id` | `ec13263f9c4c0bb54216f07126928375` |
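The in-order and late-data behavior described in the two load comments can be modeled with a short, simplified sketch. This is not Kaskada's implementation: the function `results_to_rematerialize` and the toy `(time, entity)` batches are hypothetical, and treating an event at exactly the latest existing time as late is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical toy events: (event time, entity id). Times are plain integers.
Event = Tuple[int, str]


def results_to_rematerialize(existing: List[Event], new: List[Event]) -> List[Event]:
    """Simplified model of which events yield (re-)materialized results after a load.

    Assumption: an event at exactly the latest existing time is treated as late.
    """
    if not new:
        return []
    latest_existing: Optional[int] = max((t for t, _ in existing), default=None)
    earliest_new = min(t for t, _ in new)
    if latest_existing is None or earliest_new > latest_existing:
        # In-order data: only the newly loaded events produce results.
        return sorted(new)
    # Late data: replay old and new events together, in time order,
    # starting from the earliest late event.
    return sorted(e for e in existing + new if e[0] >= earliest_new)


batch1 = [(1, "a"), (2, "b"), (5, "a")]
batch2 = [(6, "b"), (7, "a")]  # in order: after every event in batch1
batch3 = [(3, "c"), (8, "b")]  # late: time 3 precedes time 7

print(results_to_rematerialize(batch1, batch2))
# [(6, 'b'), (7, 'a')]
print(results_to_rematerialize(batch1 + batch2, batch3))
# [(3, 'c'), (5, 'a'), (6, 'b'), (7, 'a'), (8, 'b')]
```

In the late case, previously loaded events at times 5, 6, and 7 are replayed alongside the new events, which matches the expectation that results are re-emitted starting from the earliest late event.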
