## Kaskada: Materializing Results to a Pulsar Topic
Kaskada lets you create a materialization: a resource similar to a query that runs automatically whenever new data is loaded into any table it references. Materializations can populate feature vectors in a variety of feature stores for low-latency inference in production. This example shows results being incrementally materialized to a Pulsar topic.

For more information, see https://kaskada-ai.github.io/docs-site/kaskada/main/reference/working-with-materializations.html. 

In [2]:
import kaskada.api.release as release
import os
from getpass import getpass
os.environ[release.ReleaseClient.GITHUB_ACCESS_TOKEN_ENV] = getpass(prompt='Github Access Token:')

Github Access Token:········


In [None]:
from kaskada.api.session import LocalBuilder
session = LocalBuilder().build()


### Create the table and load data

In [None]:
import kaskada.table

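# Create the 'transactions' table, using 'transaction_time' as the time column
# and 'id' as the entity key column.
kaskada.table.create_table('transactions', 'transaction_time', 'id')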


In [5]:
kaskada.table.load('transactions', 'data/transactions_part1.parquet')

field,value
data_token_id,0bfbb41a-673e-4d01-8cb7-fcc268971fd4
request_id,6b4237ad0e6e2448f695a5ca931949a4


### Create a referenceable query

In [6]:
%load_ext fenlmagic

In [7]:
%%fenl --result-behavior final-results --var test_query 

transactions

Unnamed: 0,_time,_subsort,_key_hash,_key,id,price,quantity,purchaser,purchaser_id,credit_provider,email,transaction_time,idx
0,2013-02-05 11:53:07.000000001,18446744073709551615,194650352360165,f81fcc64-e02e-418b-8bfd-96cdab3f6b17,f81fcc64-e02e-418b-8bfd-96cdab3f6b17,203.21,9,Cynthia Campbell,b62b8e9e399d074b1c3189ad9e706c53,American Express,Cynthia.Campbell@example.com,2000-04-07 11:33:06,17923
1,2013-02-05 11:53:07.000000001,18446744073709551615,202890757993855,776ddcc3-a9c8-4ca2-8a5b-729c7cd3aa45,776ddcc3-a9c8-4ca2-8a5b-729c7cd3aa45,153.44,9,Harold Stone,a6ef6bebf1ea8bb23abfe3e3368c550f,VISA 13 digit,Harold.Stone@example.com,2009-01-05 09:08:26,39724
2,2013-02-05 11:53:07.000000001,18446744073709551615,414607641634714,efc29486-5d2f-447c-abe4-948bc901c131,efc29486-5d2f-447c-abe4-948bc901c131,25.90,1,Tony Jones,87ed48e2f83db823a697ed8cf79cd6e1,JCB 16 digit,Tony.Jones@example.com,2006-09-12 12:27:54,33953
3,2013-02-05 11:53:07.000000001,18446744073709551615,608822328614928,124e01a6-ef27-494d-b04b-c12a5b087ff9,124e01a6-ef27-494d-b04b-c12a5b087ff9,98.74,6,Aaron Dougherty,772f8b1479587b85209345f0e05a36f5,Maestro,Aaron.Dougherty@example.com,2012-02-22 18:16:26,47572
4,2013-02-05 11:53:07.000000001,18446744073709551615,1126833284249765,ae12e699-060f-4dfa-afbf-4ac77e2a4576,ae12e699-060f-4dfa-afbf-4ac77e2a4576,81.12,7,Victoria Ross,33b82c5a7e5846d2abd51c82f44e81fe,Mastercard,Victoria.Ross@example.com,2002-08-08 04:27:02,23668
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49995,2013-02-05 11:53:07.000000001,18446744073709551615,18446074954814204992,68567c17-e3e7-4838-b175-487eb19d57cc,68567c17-e3e7-4838-b175-487eb19d57cc,1.46,5,Darren Haynes,8f4c4a47e385061fcc719a294e98c497,Mastercard,Darren.Haynes@example.com,2004-04-15 13:01:56,27922
49996,2013-02-05 11:53:07.000000001,18446744073709551615,18446106367110200566,c86fe5b0-48d2-4621-8c15-39159d72ac32,c86fe5b0-48d2-4621-8c15-39159d72ac32,242.35,7,Zachary Peterson,6d91b1bba438fe2d9025267cc3a1f94a,Diners Club / Carte Blanche,Zachary.Peterson@example.com,1999-08-12 19:33:48,16277
49997,2013-02-05 11:53:07.000000001,18446744073709551615,18446394539458624218,1bcbdbb4-521c-495b-b6d9-1d5fbb3919a8,1bcbdbb4-521c-495b-b6d9-1d5fbb3919a8,31.18,5,Darren Haynes,8f4c4a47e385061fcc719a294e98c497,JCB 16 digit,Darren.Haynes@example.com,1996-12-24 23:01:23,9683
49998,2013-02-05 11:53:07.000000001,18446744073709551615,18446445840915012776,ae56c8fc-cbff-4ee6-93a3-d55e934ec502,ae56c8fc-cbff-4ee6-93a3-d55e934ec502,13.37,6,Timothy Tran,d477f4d7bd73fb43ad59ee744c151d79,JCB 16 digit,Timothy.Tran@example.com,2004-06-04 01:49:46,28261

field,value
state,SUCCESS
query_id,80ed13a0-268b-4278-85c5-1b27b458d892
time_preparing,0.117s
time_computing,0.133s
output_files,1
can_execute,True
schema,(see Schema tab)
request_id,6c13406163bd3a0a9d4af8b440b300d1
expression,transactions

Unnamed: 0,column_name,column_type
0,id,string
1,price,f64
2,quantity,i64
3,purchaser,string
4,purchaser_id,string
5,credit_provider,string
6,email,string
7,transaction_time,string
8,idx,i64


### Create a materialization

In [10]:
from kaskada import materialization as materialize
from kaskada.materialization import PulsarDestination

# A Pulsar topic is composed of a "tenant", "namespace", and "name". 
# Together, they comprise the `topic_url` in the format: `persistent://<tenant>/<namespace>/<name>`. 
#
# Note that if you change the "tenant" or "namespace", they must already exist.
# The "public" tenant and "default" namespace are created by default.
#
# A "name" may be used for a single topic. If the query changes, the original topic
# must be manually deleted to reuse the name. 
#
# The "broker_service_url" is how the client connects to the broker. The pulsar container is exposed 
# with the hostname "pulsar". 
destination = PulsarDestination(
    tenant="public",
    namespace="default",
    topic_name="my_topic",
    broker_service_url="pulsar://pulsar:6650",
)

# Creating a materialization runs the query and materializes results to your Pulsar topic. 
materialize.create_materialization(
    name = "test_materialization",
    expression = test_query.expression,
    destination = destination,
    views = [] 
)

field,value
materialization_name,my_materialization
query,transactions
destination,
slice,None (full dataset used for query)
schema,(see Schema tab)
create_time,2023-03-07T14:11:47.054408
can_execute,True
request_id,34926dbf062513877fee21bebedd9863

Unnamed: 0,column_name,column_type
0,id,string
1,price,f64
2,quantity,i64
3,purchaser,string
4,purchaser_id,string
5,credit_provider,string
6,email,string
7,transaction_time,string
8,idx,i64
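
The destination's components map onto the topic URL format described in the comments above. As a minimal sketch, the hypothetical helper `pulsar_topic_url` below (not part of the Kaskada client) assembles that URL from its parts so one definition can be reused wherever the topic is referenced. It assumes only the `persistent://<tenant>/<namespace>/<name>` format stated above.

```python
def pulsar_topic_url(tenant: str, namespace: str, topic_name: str) -> str:
    """Build a topic URL of the form persistent://<tenant>/<namespace>/<name>.

    Hypothetical helper for illustration; not part of the Kaskada client.
    """
    parts = {"tenant": tenant, "namespace": namespace, "topic_name": topic_name}
    for label, value in parts.items():
        # Reject empty components and embedded slashes, which would change
        # the number of path segments in the resulting URL.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"persistent://{tenant}/{namespace}/{topic_name}"


topic_url = pulsar_topic_url("public", "default", "my_topic")
print(topic_url)  # prints persistent://public/default/my_topic
```

Validating the components up front surfaces a malformed name before any connection to the broker is attempted.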


### Consume events from your Pulsar topic 


In [None]:
import pulsar

client = pulsar.Client('pulsar://pulsar:6650')
# Matches the `persistent://<tenant>/<namespace>/<name>` format of the destination created above.
myTopic = "persistent://public/default/my_topic"
consumer = client.subscribe(myTopic, subscription_name='my-subscription', initial_position=pulsar.InitialPosition.Earliest)

for _ in range(2):
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)

### Load new data 

In [None]:
# Loading data into a table referenced by an existing materialization will cause the query 
# to materialize incremental results to your destination. 
#
# In this example, we expect all events in `transactions_part2.parquet` to be materialized 
# to our topic.
kaskada.table.load('transactions', 'data/transactions_part2.parquet')


In [None]:
for _ in range(3):
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)

In [8]:
# `transactions_part3.parquet` contains late data: some of its events occurred before
# the latest event in a previously loaded file. Kaskada re-runs the query from a point
# in time early enough that the late events are processed in order relative to the
# existing input data. We therefore expect to see results materialized to the topic
# starting from the earliest event time in `transactions_part3.parquet`.
kaskada.table.load('transactions', 'data/transactions_part3.parquet')

field,value
data_token_id,10873462-ae04-40ac-b0e5-cb1b391c76fc
request_id,ec13263f9c4c0bb54216f07126928375
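
To make the late-data behavior concrete, here is a conceptual sketch in plain Python. It is not Kaskada's implementation: the function name `resume_point` and the sample timestamps are invented for illustration. It assumes only what the comment above describes, namely that results are recomputed starting from the earliest event time in a newly loaded file whenever that time precedes the latest event already processed.

```python
from datetime import datetime
from typing import Iterable, Optional


def resume_point(
    latest_processed: datetime, new_event_times: Iterable[datetime]
) -> Optional[datetime]:
    """Return the time to recompute from, or None if the batch has no late data.

    Conceptual illustration only; not how Kaskada is implemented.
    """
    times = list(new_event_times)
    if not times:
        return None
    earliest_new = min(times)
    # Late data: at least one new event precedes an event already processed.
    if earliest_new < latest_processed:
        return earliest_new
    # All new events are later, so they can be processed incrementally.
    return None


# Invented timestamps for illustration.
latest = datetime(2013, 2, 5)
on_time_batch = [datetime(2013, 3, 1), datetime(2013, 3, 2)]
late_batch = [datetime(2012, 12, 31), datetime(2013, 3, 3)]

print(resume_point(latest, on_time_batch))  # prints None
print(resume_point(latest, late_batch))     # prints 2012-12-31 00:00:00
```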


In [None]:
# This loops indefinitely, waiting for more messages on the topic.
# Interrupt the cell to stop; the client is then closed to release its connection.
try:
    while True:
        msg = consumer.receive()
        print("Received message: '%s'" % msg.data())
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass
finally:
    client.close()
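
The loops above call `consumer.receive()` without a timeout, so a cell blocks until a message arrives. A bounded alternative might look like the sketch below. It assumes that `receive` accepts a `timeout_millis` keyword and raises an exception when the timeout elapses, as the Pulsar Python client's `Consumer.receive` does. The exact exception class differs between client versions, so the hypothetical helper `receive_available` takes it as a parameter; the broad default of `Exception` is a placeholder that should be narrowed to whatever your installed client raises. `FakeConsumer` and `FakeMessage` are stand-ins so the sketch runs without a broker.

```python
def receive_available(consumer, max_messages, timeout_millis=5000,
                      timeout_errors=(Exception,)):
    """Receive up to max_messages, stopping early once a receive times out.

    `consumer` is any object exposing receive(timeout_millis=...) and
    acknowledge(msg). Hypothetical helper for illustration.
    """
    payloads = []
    for _ in range(max_messages):
        try:
            msg = consumer.receive(timeout_millis=timeout_millis)
        except timeout_errors:
            # No message arrived within the timeout; stop waiting.
            break
        payloads.append(msg.data())
        consumer.acknowledge(msg)
    return payloads


class FakeMessage:
    """Stand-in message exposing data(), like the messages received above."""

    def __init__(self, payload):
        self._payload = payload

    def data(self):
        return self._payload


class FakeConsumer:
    """Stand-in consumer so the sketch runs without a broker."""

    def __init__(self, payloads):
        self._pending = list(payloads)
        self.acked = []

    def receive(self, timeout_millis=None):
        if not self._pending:
            raise TimeoutError("no message within timeout")
        return FakeMessage(self._pending.pop(0))

    def acknowledge(self, msg):
        self.acked.append(msg.data())


fake = FakeConsumer([b"a", b"b"])
print(receive_available(fake, max_messages=5, timeout_millis=10))  # prints [b'a', b'b']
```

With a real consumer, the same call returns whatever arrived before the first timeout, which keeps a notebook cell from hanging when fewer messages were produced than expected.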