# SciDX Streaming Capabilities Demonstration 

This demonstration showcases the **SciDX Streaming capabilities**, leveraging both the **SciDX POP Library** for managing data objects and the **Streaming Library** for real-time data streaming and processing. 

## Objectives 

We will: 
1. **Register Earthscope data streams** using the SciDX POP Library (*Data Provider*).
2. **Discover and apply filters** to customize data streams for specific use cases (*Data Consumer*).
3. **Consume and visualize real-time data streams**.
  
## Workflow 

Below is a diagram illustrating the interaction between the data provider and consumer in the streaming workflow:

![Data Stream Library](data_stream_library.png) 

### Key Components: 
- **POP Library:** Registers, discovers, and manages data objects through the POP API (acts as the Data Provider).
- **Streaming Library:** Creates, manages, and consumes real-time data streams, including applying filters and consuming messages (acts as the Data Consumer).

## Step 1: Setting Up the POP (Data Provider) and Streaming (Data Consumer) Clients

In this step, we will:
1. **Import necessary modules** for handling data streams.
2. **Initialize the Point of Presence (POP) client** to manage data registration and discovery.
3. **Initialize the Streaming client** to manage and consume real-time data streams.


In [1]:
import asyncio
import time
from scidx_streaming import StreamingClient
from pointofpresence import APIClient
from earthscope_demo import earthscope_topic_metadata, filters, API_URL, USERNAME, PASSWORD, EARTHSCOPE_USERNAME, EARTHSCOPE_PASSWORD

Here, we:
1. Initialize the `APIClient` to handle data registration and discovery.
2. Initialize the `StreamingClient` to handle real-time data streams.

In [2]:
# Initialize the POP client for data registration and discovery
client = APIClient(base_url=API_URL, username=USERNAME, password=PASSWORD)

# Initialize the Streaming client for real-time data streaming
streaming = StreamingClient(client)
print(f"Streaming Client initialized. User ID: {streaming.user_id}")

Streaming Client initialized. User ID: d4402055-669b-4ea9-b98b-053877a61ea1


## Step 2: Register an Earthscope Data Stream (Data Provider)

In this step, we use the **POP client** and the metadata describing an **Earthscope data stream** to register that stream in our POP.

In [3]:
# Register the Earthscope data stream with the POP client
client.register_kafka_topic(earthscope_topic_metadata)

{'id': '5e29e6a1-79f6-4f43-b941-de3774c7bcd8'}

## Step 3: Search for the Registered Earthscope Data Stream (Data Provider) 

Now that we have registered the data stream, we will: 
1. Use the **POP client** to search for datasets using the `search_datasets` method.
2. Verify that the **Earthscope data stream** is correctly registered by searching for it.

This ensures the dataset is discoverable for use by the Data Consumers.

In [4]:
# Search for the registered Earthscope data stream
search_results = client.search_datasets("earthscope_kafka_gnss_observations")
print(f"Number of datasets found: {len(search_results)}")

Number of datasets found: 1
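
Later cells index into `search_results` directly, so it can be worth failing early if the search returns zero or several matches. The sketch below is one possible guard. `require_single_result` is a hypothetical helper, not part of the POP Library, and it assumes only that the search returns a list of dictionaries carrying an `id` key, as the cleanup cell at the end of this demonstration suggests.

```python
def require_single_result(results, label="dataset"):
    """Return the only item in `results`, or raise if the count is not exactly one.

    Hypothetical helper for illustration; not part of the POP Library.
    """
    if len(results) != 1:
        raise LookupError(f"Expected exactly one {label}, found {len(results)}")
    return results[0]


# Stand-in for what a search might return (the id is a placeholder value).
example_results = [{"id": "example-id"}]
dataset = require_single_result(example_results)
print(dataset["id"])
```

In the notebook, this could be called as `require_single_result(search_results)` right after the search.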


## Transition: From Data Provider to Data Consumer

The **POP client** (Data Provider) has completed its role in: 
1. Registering the **Earthscope data stream**.
2. Verifying its discoverability.

We now transition to the **Streaming client** (Data Consumer) to: 
1. Create customized data streams by applying filters.
2. Consume and visualize the real-time data.

## Step 4: Create a Stream with Filtered Data from the Earthscope Topic (Data Consumer) 

In this step, we’ll create a Kafka data stream for the registered **Earthscope topic**. The *filtering capabilities* allow us to refine the data stream by applying conditions, alerts, and transformations.

### Filtering Logic Breakdown: 
- **Station Selection**: Filters data to include specific stations (`SNCL`), ensuring the stream only processes data from:
    - **P505.PW.LY_.00**
    - **DHLG.CI.LY_.20**
    - **P159.PW.LY_.00**
- **Alert System**:
  - **High Quality Data (Blue Alert)**: Activated when `Q > 2,000,000`.
  - **Low Quality Data (Red Alert)**: Triggered when `Q ≤ 2,000,000`.
- **Dynamic Rate Adjustment**:
  - For data flagged with a **Red Alert**, the `rate` field is set to `2`.

These filters allow us to isolate meaningful subsets of the data, trigger alerts dynamically, and transform the data stream for more actionable insights.

Here’s the filtering logic applied in this demonstration:
```python
filters = [ 
    "SNCL IN ['P505.PW.LY_.00', 'DHLG.CI.LY_.20', 'P159.PW.LY_.00']", 
    "IF Q > 2000000 THEN alert = blue", 
    "IF Q <= 2000000 THEN alert = red", 
    "IF alert = 'red' THEN rate = 2" 
]
```
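
To make the rule semantics concrete, the following sketch emulates the four filter expressions on a single record in plain Python. It is an illustration under stated assumptions, not the Streaming Library's implementation: `apply_filters`, `STATIONS`, and `Q_THRESHOLD` are hypothetical names, the rules are assumed to run in the listed order, and the last rule is read literally as assigning `rate = 2`.

```python
# Hypothetical constants mirroring the filter expressions above.
STATIONS = {"P505.PW.LY_.00", "DHLG.CI.LY_.20", "P159.PW.LY_.00"}
Q_THRESHOLD = 2_000_000


def apply_filters(record):
    """Emulate the four filter rules on one record (a dict).

    Returns a transformed copy, or None if the station is not selected.
    """
    # Rule 1: keep only the selected stations.
    if record.get("SNCL") not in STATIONS:
        return None
    out = dict(record)
    # Rules 2 and 3: tag the record with an alert level based on Q.
    out["alert"] = "blue" if out["Q"] > Q_THRESHOLD else "red"
    # Rule 4: red-alert records get rate = 2 (assumed to be an assignment).
    if out["alert"] == "red":
        out["rate"] = 2
    return out


sample = {"SNCL": "DHLG.CI.LY_.20", "Q": 1909041.0, "rate": 1}
print(apply_filters(sample))
```

The sample values are taken from the output shown later in this demonstration, where records with `Q` below the threshold carry `alert = red` and `rate = 2`.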

In [5]:
# Create a Kafka stream for Earthscope data with filters applied
stream = await streaming.create_kafka_stream(
    keywords=["earthscope_kafka_gnss_observations"],
    match_all=True,
    filter_semantics=filters,
    username=EARTHSCOPE_USERNAME,
    password=EARTHSCOPE_PASSWORD
)

# Retrieve the stream's topic name
topic = stream.data_stream_id
print(f"Stream created: {topic}")

Stream created: data_stream_d4402055-669b-4ea9-b98b-053877a61ea1_2


## Step 5: Consuming the Filtered Stream Data 

With the Kafka stream created, we now: 
1. Initialize a **data consumer** using the `consume_kafka_messages` method.
2. Start **real-time consumption** of filtered data.

The consumer continuously listens for incoming messages and populates a dynamic DataFrame. 

**Note**: It may take a few seconds for data to populate due to real-time processing.

In [6]:
# Start consuming the filtered Kafka stream
consumer = streaming.consume_kafka_messages(topic)
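
Because data may take a few seconds to arrive, it can help to poll until some rows are available before inspecting the DataFrame. The sketch below shows one way to do that. `wait_for_rows` is a hypothetical helper, not part of the Streaming Library; it takes any zero-argument callable that returns the current row count, so it makes no assumption about the consumer beyond `len(consumer.dataframe)` working.

```python
import time


def wait_for_rows(get_row_count, min_rows=1, timeout=30.0, poll_interval=0.5):
    """Poll `get_row_count()` until it reports at least `min_rows`.

    Returns True once enough rows are available, or False if `timeout`
    seconds pass first. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_row_count() >= min_rows:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Stand-in row counter that "fills up" over successive polls.
fake_counts = iter([0, 0, 4, 12])
ready = wait_for_rows(lambda: next(fake_counts), min_rows=10, poll_interval=0.01)
print(ready)
```

In the notebook, the call might look like `wait_for_rows(lambda: len(consumer.dataframe), min_rows=10)`.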

In [8]:
# Display the first 10 rows of the consumed data
consumer.dataframe.head(10)

| | time | Q | type | SNCL | coor | err | rate | alert |
|---|---|---|---|---|---|---|---|---|
| 0 | [1736258066000, 1736258067000, 1736258068000, ... | [2908001.0, 2908001.0, 2908001.0, 2908001.0, 2... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P505.PW.LY_.00, P505.PW.LY_.00, P505.PW.LY_.0... | [[0.093, -0.044, 0.089], [0.093, -0.047, 0.087... | [[0.008, 0.011, 0.03], [0.006, 0.008, 0.023], ... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [blue, blue, blue, blue, blue, blue, blue, blu... |
| 1 | [1736259284000, 1736259285000, 1736259286000, ... | [1909041.0, 1510181.0, 1510191.0, 1909041.0, 1... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [DHLG.CI.LY_.20, P159.PW.LY_.00, P159.PW.LY_.0... | [[-0.001, -0.033, 0.029], [0.203, -0.043, -0.1... | [[0.03, 0.04, 0.084], [0.023, 0.029, 0.049], [... | [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ... | [red, red, red, red, red, red, red, red, red, ... |
| 2 | [1736258117000, 1736258118000, 1736258119000, ... | [2908001.0, 2908001.0, 2908001.0, 2908001.0, 2... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P505.PW.LY_.00, P505.PW.LY_.00, P505.PW.LY_.0... | [[0.094, -0.043, 0.106], [0.097, -0.039, 0.096... | [[0.006, 0.008, 0.023], [0.007, 0.009, 0.025],... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [blue, blue, blue, blue, blue, blue, blue, blu... |
| 3 | [1736258143000, 1736258144000, 1736258145000, ... | [2908001.0, 2908001.0, 2908001.0, 2908001.0, 2... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P505.PW.LY_.00, P505.PW.LY_.00, P505.PW.LY_.0... | [[0.09, -0.04, 0.086], [0.089, -0.04, 0.085], ... | [[0.008, 0.012, 0.031], [0.006, 0.008, 0.023],... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [blue, blue, blue, blue, blue, blue, blue, blu... |
| 4 | [1736259315000, 1736259314000, 1736259315000, ... | [1510231.0, 1909041.0, 1909041.0, 1510221.0, 1... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P159.PW.LY_.00, DHLG.CI.LY_.20, DHLG.CI.LY_.2... | [[0.199, -0.042, -0.165], [0.001, -0.035, 0.03... | [[0.027, 0.035, 0.059], [0.03, 0.039, 0.084], ... | [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ... | [red, red, red, red, red, red, red, red, red, ... |
| 5 | [1736259345000, 1736259344000, 1736259345000, ... | [1510261.0, 1909041.0, 1909041.0, 1510271.0, 1... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P159.PW.LY_.00, DHLG.CI.LY_.20, DHLG.CI.LY_.2... | [[0.198, -0.046, -0.166], [-0.001, -0.037, 0.0... | [[0.032, 0.041, 0.068], [0.03, 0.039, 0.083], ... | [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ... | [red, red, red, red, red, red, red, red, red, ... |
| 6 | [1736258167000, 1736258168000, 1736258169000, ... | [2808001.0, 2808001.0, 2808071.0, 2808081.0, 2... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P505.PW.LY_.00, P505.PW.LY_.00, P505.PW.LY_.0... | [[0.093, -0.038, 0.098], [0.095, -0.038, 0.095... | [[0.01, 0.014, 0.038], [0.011, 0.015, 0.039], ... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [blue, blue, blue, blue, blue, blue, blue, blu... |
| 7 | [1736258210000, 1736258211000, 1736258212000, ... | [2808011.0, 2808021.0, 2808031.0, 2808051.0, 2... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU] | [P505.PW.LY_.00, P505.PW.LY_.00, P505.PW.LY_.0... | [[0.094, -0.041, 0.102], [0.091, -0.041, 0.099... | [[0.011, 0.015, 0.042], [0.012, 0.016, 0.043],... | [1, 1, 1, 1, 1, 1, 1, 1] | [blue, blue, blue, blue, blue, blue, blue, blue] |
| 8 | [1736259358000, 1736259355000, 1736259356000, ... | [1510221.0, 1909041.0, 1909041.0, 1510231.0, 1... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P159.PW.LY_.00, DHLG.CI.LY_.20, DHLG.CI.LY_.2... | [[0.199, -0.045, -0.163], [-0.001, -0.037, 0.0... | [[0.026, 0.033, 0.055], [0.03, 0.039, 0.083], ... | [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] | [red, red, red, red, red, red, red, red, red, ... |
| 9 | [1736259362000, 1736259361000, 1736259362000, ... | [1510271.0, 1909041.0, 1909041.0, 1510261.0, 1... | [ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ENU, ... | [P159.PW.LY_.00, DHLG.CI.LY_.20, DHLG.CI.LY_.2... | [[0.199, -0.046, -0.163], [-0.002, -0.036, 0.0... | [[0.033, 0.042, 0.07], [0.03, 0.039, 0.083], [... | [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ... | [red, red, red, red, red, red, red, red, red, ... |
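
Each row of the output above holds a batch: every column is a list, with one entry per observation. For plotting or per-observation analysis, it may be convenient to flatten a batch into individual records. The sketch below does this in plain Python. `flatten_batch` is a hypothetical helper, and it assumes the `time` values are Unix timestamps in milliseconds, which matches their magnitude but is not confirmed by the library.

```python
from datetime import datetime, timezone


def flatten_batch(row):
    """Yield one dict per observation from a batch row of equal-length lists.

    Hypothetical helper; assumes `time` holds Unix epoch milliseconds.
    """
    keys = list(row)
    lengths = {len(row[key]) for key in keys}
    if len(lengths) != 1:
        raise ValueError("all columns in a batch must have the same length")
    for values in zip(*(row[key] for key in keys)):
        record = dict(zip(keys, values))
        # Convert epoch milliseconds to a timezone-aware UTC datetime.
        record["time"] = datetime.fromtimestamp(record["time"] / 1000, tz=timezone.utc)
        yield record


# Small stand-in batch using values from the first row of the output above.
batch = {
    "time": [1736258066000, 1736258067000],
    "Q": [2908001.0, 2908001.0],
    "SNCL": ["P505.PW.LY_.00", "P505.PW.LY_.00"],
    "rate": [1, 1],
    "alert": ["blue", "blue"],
}
records = list(flatten_batch(batch))
print(records[0]["time"].isoformat())  # 2025-01-07T13:54:26+00:00
```

Keeping the helper independent of pandas means it can be applied to a row converted with `row.to_dict()` or to any mapping of column name to list.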


## Step 6: Stopping Data Consumption and Cleaning Up 

To wrap up, we will: 
1. Stop the data consumer to halt data processing.
2. Delete the created Kafka stream using the Streaming client.
3. Remove the registered dataset using the POP client.

This ensures all resources and background tasks are properly released.

In [9]:
# Stop the Kafka consumer
consumer.stop()

# Delete the Kafka stream
await streaming.delete_stream(stream)

# Delete the registered dataset from the POP system
client.delete_resource_by_id(search_results[0]["id"])
print("Cleanup completed: Stream and registered dataset deleted.")

Shutting down Kafka consumer for topic: public.gnss.positions.normalized.geojson.compact
Kafka consumer for topic: public.gnss.positions.normalized.geojson.compact has been shut down.
Cleanup completed: Stream and registered dataset deleted.
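
The cleanup cell above runs its three steps in sequence, so a failure in one step would skip the rest. One possible way to make cleanup more robust is to attempt every step and collect any failures, as sketched below. `cleanup_all` is a hypothetical helper, not part of either library; it accepts both regular and `async` callables so that it can wrap the mix of calls used in this demonstration.

```python
import asyncio


async def cleanup_all(steps):
    """Run each (name, callable) step in order, continuing past failures.

    Returns a list of (name, exception) pairs for the steps that failed.
    Hypothetical helper for illustration.
    """
    failures = []
    for name, step in steps:
        try:
            result = step()
            # Await the result if the step was an async function.
            if asyncio.iscoroutine(result):
                await result
        except Exception as exc:
            failures.append((name, exc))
    return failures


# Stand-in steps: one async, one failing, one synchronous.
async def fake_delete_stream():
    print("stream deleted")

def fake_failing_step():
    raise RuntimeError("simulated failure")

def fake_delete_dataset():
    print("dataset deleted")

demo_failures = asyncio.run(cleanup_all([
    ("stream", fake_delete_stream),
    ("broken", fake_failing_step),
    ("dataset", fake_delete_dataset),
]))
print([name for name, _ in demo_failures])
```

Inside the notebook, where an event loop is already running, the helper would be awaited directly, for example `await cleanup_all([("consumer", consumer.stop), ("stream", lambda: streaming.delete_stream(stream)), ("dataset", lambda: client.delete_resource_by_id(search_results[0]["id"]))])`.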
