<p align="center">
<img src="img/lasair.png" alt="drawing" width="50"/>
</p>
<h1 align="center">  Kafka Tutorial: </h1>
<h2 align="center">  Consuming Public Alerts </h2>


---

## What you'll learn

This notebook walks you through writing your own code to consume alerts from Lasair filters. You will learn:

* How to set up a Kafka consumer with the `lasair` helper function
* Which stream options are available for Lasair filters
* How to navigate the data produced by these stream options


## Prerequisites


### **Have you installed the `lasair` client?**

You can do this through pip!

```bash
pip install lasair
```

### You also need the following Python libraries:
  - `json` (part of the Python standard library, so there is nothing to install)
  - `pandas`
  - `matplotlib`



---

<h2 align="center">  1. Setting up your first consumer </h2>



In [2]:
import json
from pathlib import Path
from lasair import LasairError, lasair_consumer
import random # This is only needed for the tutorial

### Config for the `lasair_consumer`

To connect to a Kafka stream you need a few things:
* The socket (Host:Port) of the server (where is the server on the internet, and how do I get into it?): `lasair-lsst-kafka_pub.lsst.ac.uk:9092`
* The endpoint (where do I send my requests on the server?): `https://api.lasair.lsst.ac.uk/api`
* The topic (corresponding to your filter of choice): `lasair_2Zooniverse`
* The `group_id`, which keeps track of which alerts you've already seen and which ones you have yet to receive.
* The number of alerts (N) to poll for.
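
One way to keep the settings listed above together is a small container object. The sketch below is only an illustration: `StreamConfig` and `make_group_id` are hypothetical names that are not part of the `lasair` client, and the server and topic values are simply the ones quoted in the list.

```python
import uuid
from dataclasses import dataclass, field


def make_group_id(prefix: str = "tutorial") -> str:
    """Return a new, unique group_id (hypothetical helper, not a lasair function)."""
    # A group_id that has never been used has no record of alerts already seen.
    return f"{prefix}-{uuid.uuid4().hex}"


@dataclass
class StreamConfig:
    """Hypothetical container for the connection settings described above."""
    kafka_server: str  # Host:Port of the Kafka server
    topic: str         # topic matching your filter
    n_alerts: int = 3  # how many alerts to poll for
    group_id: str = field(default_factory=make_group_id)


config = StreamConfig(
    kafka_server="lasair-lsst-kafka_pub.lsst.ac.uk:9092",
    topic="lasair_2Zooniverse",
)
print(config)
```

Using `uuid` instead of a random integer is a design choice made here for the sketch; any string that has not been used before should serve the same purpose.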


In [3]:
#endpoint     = "https://api.lasair.lsst.ac.uk/api"
endpoint     = "https://api.lasair.lsst-dev.lsst.ac.uk/api"
#kafka_server = "lasair-lsst-kafka_pub.lsst.ac.uk:9092"
kafka_server = "lasair-lsst-dev-kafka_pub.lsst.ac.uk:9092"
#topic        = "lasair_83lvra_feeder_full"
topic        = "lasair_83lasair_tutorial_basic_stream"
group_id     = "tutorial"+str(int(random.random()*10000000000)) # CREATING RANDOM ID
N            = 3

Now let's use the Lasair client to make our consumer

In [4]:
consumer = lasair_consumer(
    kafka_server,    
    group_id,        
    topic            
)

### Polling your first alerts

In [5]:
n = 0
while n < N: # while we have not read N messages
    # ask the lasair consumer to poll the NEXT MESSAGE IN THE QUEUE
    msg = consumer.poll(timeout=20)
    
    if msg is None:
        # No message arrived within the timeout, so we assume we have reached the end of the queue!
        break

    if msg.error():
        # If there is an error we want to raise an exception (raising also exits the loop)
        raise LasairError("Error while consuming message: {}".format(msg.error()))

    # If we have a message we need to parse its JSON content into a Python dictionary
    jmsg = json.loads(msg.value())

    # Then we can write it out!
    print(json.dumps(jmsg, indent=2))
    n += 1
print('You have reached the end of the queue')

{
  "diaObjectId": 313928193916534808,
  "lastDiaSourceMjdTai": 61069.08298840221,
  "latestR": 0.983908,
  "nDiaSources": 51,
  "ra": 52.32789570550791,
  "decl": -26.86992277816011,
  "separationArcsec": 1.081
}
{
  "diaObjectId": 314003013720080448,
  "lastDiaSourceMjdTai": 61088.212443683144,
  "latestR": 0.964177,
  "nDiaSources": 18,
  "ra": 150.1314100332278,
  "decl": 2.4053689123972593,
  "separationArcsec": 0.696,
  "UTC": "2026-02-17 05:13:10"
}
{
  "diaObjectId": 314051320824201308,
  "lastDiaSourceMjdTai": 61088.09744609636,
  "latestR": 0.953794,
  "nDiaSources": 3,
  "ra": 52.55095983903494,
  "decl": -28.077585952842437,
  "separationArcsec": 9.058,
  "UTC": "2026-02-17 05:13:10"
}
You have reached the end of the queue
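
The polling loop above can also be wrapped in a generator so that it can be reused for every topic. This is a minimal sketch under stated assumptions: `iter_alerts`, `FakeMessage` and `FakeConsumer` are hypothetical names introduced here, the fake classes only mimic the `poll()`, `error()` and `value()` calls used in this tutorial so that the example runs without a Kafka connection, and a plain `RuntimeError` stands in for `LasairError` to keep the block self-contained.

```python
import json


def iter_alerts(consumer, max_alerts, timeout=20):
    """Yield up to max_alerts decoded alerts, stopping early if a poll times out."""
    for _ in range(max_alerts):
        msg = consumer.poll(timeout=timeout)
        if msg is None:
            # No message arrived within `timeout` seconds.
            return
        if msg.error():
            raise RuntimeError(f"Error while consuming message: {msg.error()}")
        yield json.loads(msg.value())


# Stand-ins so the sketch runs offline; a real consumer would replace these.
class FakeMessage:
    def __init__(self, payload):
        self._payload = payload

    def error(self):
        return None

    def value(self):
        return json.dumps(self._payload).encode("utf-8")


class FakeConsumer:
    def __init__(self, payloads):
        self._messages = [FakeMessage(p) for p in payloads]

    def poll(self, timeout=20):
        # Return the next queued message, or None when nothing is left.
        return self._messages.pop(0) if self._messages else None


fake = FakeConsumer([{"diaObjectId": 1}, {"diaObjectId": 2}])
alerts = list(iter_alerts(fake, max_alerts=3))
print(alerts)
```

With a real consumer the call would look like `for alert in iter_alerts(consumer, N): ...`, assuming the consumer behaves as it does in the cell above.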




<h2 align="center">  2. Saving the data with the correct format </h2>


In a real-life setting you won't print large dictionaries to your notebook or terminal; you will want them in a `.json` file.

Let's select an output directory for our data. **NOTE: this tutorial points to the `/tmp` directory**, so the data will be cleared when you restart your system. Feel free to select a different location.

In [8]:
output_dir   = "/tmp/lasair_consumer_output" # this won't work on windows
OUTPUT_PATH = Path(output_dir)
OUTPUT_PATH.mkdir(exist_ok=True, parents=True) # If sub directory doesn't exist, create it. If it does exist, do nothing. If the parent directories don't exist, create them too.
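
If you are not on a Unix-like system, the hard-coded `/tmp` path above may not exist. A possible alternative, sketched here as a suggestion rather than as part of the original tutorial, is to ask Python for the platform's temporary directory. The variable name `portable_output_path` is introduced only for this example.

```python
import tempfile
from pathlib import Path

# tempfile.gettempdir() returns the temporary directory of the current platform
portable_output_path = Path(tempfile.gettempdir()) / "lasair_consumer_output"

# Create the directory (and any missing parents); do nothing if it already exists
portable_output_path.mkdir(parents=True, exist_ok=True)
print(portable_output_path)
```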

Since we have already consumed those alerts, our position in the queue has moved on. If we want the same alerts we printed above, we need a new `group_id`:

In [9]:
group_id     = "tutorial"+str(int(random.random()*10000000000)) # CREATING RANDOM ID

consumer = lasair_consumer(
    kafka_server,    
    group_id,        
    topic            
)

Now we poll and we dump each message in a file which will have the structure:

```
[
    {MESSAGE_ALERT1},
    {MESSAGE_ALERT2},
    ....
    {MESSAGE_ALERTN},
]
```

Each message contains fields and sub-dictionaries.

Now our consumer **is a little more involved** than it was above, because we need to make sure the brackets and commas are in the right place:

In [10]:
n = 0
first = True
# To ensure we don't leave our file open we work within a `with` block
with open(OUTPUT_PATH / "message_BASIC.tmp.json", "w", encoding="utf-8") as f:
    # first we write the opening square bracket for the json list
    f.write("[\n")
    while n < N:
        msg = consumer.poll(timeout=20)
        if msg is None:
            break
        if msg.error():
            raise LasairError("Error while consuming message: {}".format(msg.error()))
        # 2. If we make it here it means we have messages. 
        raw = msg.value()
        # msg.value() may be bytes or str depending on client
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")

        # 3. Get the JSON data for our alert.
        result = json.loads(raw)
        
        # write comma before each object after the first
        if not first:
            f.write(",\n")
        first = False
        
        json.dump(result, f, indent=2, ensure_ascii=False)


        n += 1
    f.write("]\n")


Above we saved the data to a `.tmp.json` file, which we will now rename. This practice is often called "saving files atomically", and it is a way to avoid overwriting a good file with corrupted data. If the `while` loop above breaks halfway through, we will be able to tell the good files from the bad ones. (Vim, for example, keeps swap files for a similar reason.)


Once we are happy everything has run properly we can replace our tmp file name with its final name.

In [11]:
# Rename the temporary file to its final .json name
import os

os.replace(OUTPUT_PATH / "message_BASIC.tmp.json", OUTPUT_PATH / "message_BASIC.json")
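
The write-then-rename steps above can be folded into a single helper. The sketch below is one possible way to do it and is not the tutorial's own code: `write_json_atomically` is a hypothetical name, and unlike the loop above it assumes the alerts have already been collected in a list, so the brackets and commas are handled by a single `json.dump` call.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(records, final_path):
    """Write records to a temporary file, then rename it to final_path."""
    final_path = Path(final_path)
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    # The final name only appears once the file has been written completely.
    os.replace(tmp_path, final_path)
    return final_path


# Demonstration in a throwaway directory with a made-up record
demo_dir = Path(tempfile.mkdtemp())
saved = write_json_atomically([{"diaObjectId": 1}], demo_dir / "message_DEMO.json")
print(saved.read_text(encoding="utf-8"))
```

Collecting everything in memory first is simpler but uses more memory for large alerts; the streaming approach shown in the tutorial avoids that at the cost of managing the brackets by hand.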

<h2 align="center"> 3. Reading a Basic Alert File </h2>

I am going to show you how to handle these with pandas since it already has excellent JSON support. 
This tutorial will not give you a "raw python" solution. 

In [12]:
import pandas as pd

Pandas already has a `read_json` function, which works quite well even for nested data structures (which we will need later).

In [13]:
dat = pd.read_json(OUTPUT_PATH/"message_BASIC.json") 

In [14]:
dat.head()

|   | diaObjectId | lastDiaSourceMjdTai | latestR | nDiaSources | ra | decl | separationArcsec | UTC |
|---|---|---|---|---|---|---|---|---|
| 0 | 313963356964782123 | 61069.082988 | 0.977299 | 2 | 52.822487 | -27.574832 | 1.599 | |
| 1 | 313871013231722745 | 61088.212444 | 0.936848 | 23 | 149.15694 | 0.772347 | 0.326 | 2026-02-17 05:13:10 |
| 2 | 313936975777235040 | 61088.097015 | 0.935171 | 42 | 52.509732 | -28.27798 | 0.646 | 2026-02-17 05:13:10 |


As you can see, the columns we have here are the same as those listed in the SQL query for the [Lasair Tutorial Basic Stream Filter](https://lasair-lsst-dev.lsst.ac.uk/filters/130/).
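
When navigating these columns it can help to turn the date column into something human-readable. The sketch below assumes, from its name only, that `lastDiaSourceMjdTai` is a Modified Julian Date on the TAI time scale; this is an assumption and is not confirmed by the tutorial. It also ignores the offset of a few tens of seconds between TAI and UTC, and `mjd_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17 00:00
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime (no TAI/UTC correction)."""
    return MJD_EPOCH + timedelta(days=mjd)


print(mjd_to_datetime(61088.0))
```

For work that needs precise time scales, a dedicated astronomy time library would be a safer choice than this shortcut.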

**WARNING: UPDATE LINK**





When creating your filter you can select from a few types of streams:
* Kafka stream: Just the fields you selected during Filter creation
* Lite lightcurve: The fields you selected at filter creation + the lightcurve history 
* Full Alert: The fields you selected + the full alert packet. 

In the example above we've only looked at the most basic form of output. 
Now we are going to play with the lightcurve and full alert modes.
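
A quick way to see which kind of stream a message came from is to check whether any of its values are nested. This is a minimal sketch: `nested_fields` is a hypothetical helper, and the example records (including the `history` key) are made up for illustration and do not reflect the real layout of Lasair messages.

```python
def nested_fields(record):
    """Return the keys of `record` whose values are themselves dicts or lists."""
    return sorted(k for k, v in record.items() if isinstance(v, (dict, list)))


# Made-up records: a flat one and one with a nested list
basic_like = {"diaObjectId": 1, "ra": 52.3, "decl": -26.8}
richer_like = {"diaObjectId": 1, "ra": 52.3, "history": [{"mjd": 61088.0}]}

print(nested_fields(basic_like))
print(nested_fields(richer_like))
```

A flat record like the first one returns an empty list, while any nested content shows up by key, which tells you where to look next.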


[Docs Reference: Alert Streams](https://lasair-lsst.readthedocs.io/en/main/core_functions/alert-streams.html#alert-streams)

<h2 align="center"> 5. Lite Lightcurve Alerts [NOT YET AVAILABLE]</h2>

To get the lite lightcurve data we have to **change our topic to point to the right filter**.

In [15]:
topic = "lasair_83lasair_tutorial_lite_lightcurve"

We also need to recreate our consumer to point to the right topic (note that changing the `group_id` here is unnecessary)

In [16]:
consumer = lasair_consumer(
    kafka_server,    
    group_id,        
    topic            
)

In [17]:
n = 0
first = True
# To ensure we don't leave our file open we work within a `with` block
with open(OUTPUT_PATH / "message_LiteLC.tmp.json", "w", encoding="utf-8") as f:
    # first we write the opening square bracket for the json list
    f.write("[\n")
    while n < N:
        msg = consumer.poll(timeout=20)
        if msg is None:
            break
        if msg.error():
            raise LasairError("Error while consuming message: {}".format(msg.error()))
        # 2. If we make it here it means we have messages. 
        raw = msg.value()
        # msg.value() may be bytes or str depending on client
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")

        # 3. Get the JSON data for our alert.
        result = json.loads(raw)
        
        # write comma before each object after the first
        if not first:
            f.write(",\n")
        first = False
        
        json.dump(result, f, indent=2, ensure_ascii=False)


        n += 1
    f.write("]\n")

os.replace(str(OUTPUT_PATH / f"message_LiteLC.tmp.json"), str(OUTPUT_PATH / f"message_LiteLC.json"))

In [18]:
dat_llc = pd.read_json(OUTPUT_PATH/"message_LiteLC.json") 

<h2 align="center"> 6. Full Alert Packet Data </h2>

In [19]:
topic = "lasair_83lvra_feeder_full"
consumer = lasair_consumer(
    kafka_server,    
    group_id,        
    topic            
)
N = 3


In [None]:
n = 0
first = True
# To ensure we don't leave our file open we work within a `with` block
# NOTE: we use a file name specific to this section so we don't overwrite the lite lightcurve output
with open(OUTPUT_PATH / "message_FULL.tmp.json", "w", encoding="utf-8") as f:
    # first we write the opening square bracket for the json list
    f.write("[\n")
    while n < N:
        msg = consumer.poll(timeout=20)
        if msg is None:
            break
        if msg.error():
            raise LasairError("Error while consuming message: {}".format(msg.error()))
        # 2. If we make it here it means we have messages. 
        raw = msg.value()
        # msg.value() may be bytes or str depending on client
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")

        # 3. Get the JSON data for our alert.
        result = json.loads(raw)
        
        # write comma before each object after the first
        if not first:
            f.write(",\n")
        first = False
        
        json.dump(result, f, indent=2, ensure_ascii=False)


        n += 1
    f.write("]\n")

os.replace(str(OUTPUT_PATH / "message_FULL.tmp.json"), str(OUTPUT_PATH / "message_FULL.json"))