<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px;">
    <div style="flex: 0 0 auto; margin-left: 0; margin-bottom: 0; width: 300px; height: 200px; ">
        <img src="https://www.anl.gov/sites/www/files/2020-05/SAGE-image.jpg" alt="SAGE Logo"/>
    </div>
    <div style="flex: 0 0 auto; margin-left: auto; margin-bottom: 0;">
        <img src="https://nairrpilot.org/app/site/media/ndp.jpg" alt="NDP Logo" width="200"/>
    </div>
</div>

# sciDX SAGE Streaming Tutorial

In this tutorial, we will stream and manage data from **SAGE sensors** using the **sciDX platform**. You will learn how to register, filter, and process real-time data streams.

This tutorial covers:

1. **Registering SAGE sensor data streams** with the sciDX API.
2. **Applying filters** to customize data streams for specific needs.
3. **Consuming and visualizing real-time data** to analyze sensor data dynamically.

### Initializing

## Step 1: Setting Up the sciDX Client and Logging In

To start, we’ll import the necessary modules, set up our API URL and credentials, and then initialize the sciDX client. This client will allow us to interact with the sciDX platform for registering and consuming sensor data.

In [1]:
# Import the sciDX client and demo-specific modules
from scidx_streaming import StreamingClient
from pointofpresence import APIClient
from sage_demo import sensor_data, filters, SageDataProcessing, plot_temp_alerts
from dotenv import load_dotenv
import os

In [2]:
# Load environment variables from .env file
load_dotenv(dotenv_path="/home/jovyan/work/streaming_library/.env")

API_URL = os.getenv("API_URL")
USERNAME = os.getenv("USERNAME")
PASSWORD = os.getenv("PASSWORD")
EARTHSCOPE_USERNAME = os.getenv("EARTHSCOPE_USERNAME")
EARTHSCOPE_PASSWORD = os.getenv("EARTHSCOPE_PASSWORD")
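
Because `os.getenv` returns `None` for any variable missing from the `.env` file, a typo there tends to surface only later, as a failed login. The following is a minimal sketch of an early check. `require_env` is a hypothetical helper written for this illustration and is not part of the sciDX libraries.

```python
import os


def require_env(names, environ=os.environ):
    """Return {name: value} for every name, or raise if any is unset or empty.

    `require_env` is a hypothetical helper for this tutorial, not a sciDX API.
    """
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}
```

Calling `require_env(["API_URL", "USERNAME", "PASSWORD"])` right after `load_dotenv(...)` would stop the notebook at this cell if a value is absent.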

In [3]:
# Initialize API Client
client = APIClient(base_url=API_URL, username=USERNAME, password=PASSWORD)

# Initialize Streaming Client
streaming = StreamingClient(client)
print(f"User ID: {streaming.user_id}")

User ID: d4402055-669b-4ea9-b98b-053877a61ea1


## Step 2: Registering BME280 Sensor Data from Multiple SAGE Nodes

In this step, we’ll register data streams from BME280 sensors, specifically for temperature, pressure, and humidity, into sciDX. Each sensor’s data will be registered as a unique resource with its respective URL.

### Data Streams
- **Temperature**
- **Pressure**
- **Humidity**

In [4]:
# Register each sensor data stream from the `sensor_data` list
for sensor in sensor_data:
    client.register_url(sensor)

## Step 3: Searching for Registered BME280 Sensor Data

With the BME280 sensor data registered, we can now search for these resources on the sciDX platform. Using the prefix `sage_demo_bme280`, we’ll retrieve all datasets related to this sensor.

In [5]:
# Search for all registered BME280 sensor datasets
search_results = client.search_datasets("sage_demo_bme280")
print(f"Number of datasets found: {len(search_results)}")

Number of datasets found: 3


## Step 4: Creating and Filtering a Data Stream Using sciDX Filtering Capabilities

In this step, we’ll create a Kafka data stream on sciDX with filters to select specific data and apply custom alerts. Here’s how the data is filtered and transformed:

- **State Selection**: Filters to only include data from California, Montana, Oregon, North Dakota, Michigan, and Illinois.
- **Data Mapping**: Maps sensor readings to temperature, pressure, and humidity fields.
- **State Assignment**: Associates each sensor with its corresponding state name.
- **Conditional Alerts**:
  - **Heatwave Alert**: Activates if temperature > 35°C or humidity < 25%.
  - **State-Specific Alerts**: Certain states have unique temperature thresholds that will trigger alerts:
    - Montana: Temperature > 40°C
    - Oregon: Temperature > 30°C
  - **Pressure Alert**: Activates if pressure exceeds 101,000 Pa.
- **Pressure Adjustment**: For certain temperature alerts, reduces pressure readings by 5%.

These filters give us a tailored view of the data, isolating specific conditions and providing custom alerts.
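
To make the rules above concrete, here is a sketch of the state selection and alert conditions in plain Python. It is not the sciDX filter syntax, and it does not reproduce the `filters` object imported from `sage_demo`, whose contents are not shown in this tutorial. The function name and every alert label other than `Heatwave` are assumptions made for illustration. The pressure adjustment is left out because the description does not say which alerts trigger it.

```python
# States kept by the state-selection filter (taken from the list above).
ALLOWED_STATES = {
    "California", "Montana", "Oregon", "North Dakota", "Michigan", "Illinois",
}

# State-specific temperature thresholds in °C (taken from the list above).
STATE_TEMP_THRESHOLDS = {"Montana": 40.0, "Oregon": 30.0}


def classify(state, temperature=None, pressure=None, humidity=None):
    """Return the alert labels for one reading, or None if the state is filtered out.

    Units: temperature in °C, pressure in Pa, humidity in %.
    Hypothetical helper; label names other than "Heatwave" are assumptions.
    """
    if state not in ALLOWED_STATES:
        return None

    alerts = []
    too_hot = temperature is not None and temperature > 35.0
    too_dry = humidity is not None and humidity < 25.0
    if too_hot or too_dry:
        alerts.append("Heatwave")

    limit = STATE_TEMP_THRESHOLDS.get(state)
    if limit is not None and temperature is not None and temperature > limit:
        alerts.append(f"{state} temperature")

    if pressure is not None and pressure > 101_000:
        alerts.append("Pressure")

    return alerts
```

For example, `classify("North Dakota", humidity=12.556)` returns `["Heatwave"]`, because the humidity is below 25%.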

In [6]:
# Create the Kafka stream through the streaming client, applying the SAGE demo filters
stream = await streaming.create_kafka_stream(
    keywords=["sage_demo"],
    match_all=True,
    filter_semantics=filters
)

topic = stream.data_stream_id

print(f"Stream created: {topic}")

Stream created: data_stream_d4402055-669b-4ea9-b98b-053877a61ea1_2


## Step 5: Consuming the Filtered Stream Data

With the Kafka stream created and filters applied, we can now start consuming the filtered data in real time. Our consumer will listen continuously to incoming messages and populate a dynamic DataFrame as new data arrives.

In [7]:
# Start consuming Kafka messages from the created topic
consumer = streaming.consume_kafka_messages(topic)

### Viewing a Simple Data Summary

We can preview the data received so far. The cell below displays the first 10 rows of the consumer’s DataFrame.

In [18]:
# Display the first 10 rows of the raw received data
consumer.dataframe.head(10)

Unnamed: 0,name,timestamp,value,meta.host,meta.job,meta.node,meta.plugin,meta.sensor,meta.task,meta.vsn,meta.zone,humidity,state,alert
0,"[env.relative_humidity, env.relative_humidity]","[2025-01-06T23:30:25.104570364Z, 2025-01-06T23...","[12.556, 17.175]","[000048b02d35a9ce.ws-nxcore, 000048b02d3ae2f2....","[Pluginctl, Pluginctl]","[000048b02d35a9ce, 000048b02d3ae2f2]","[waggle/plugin-iio:0.6.0, waggle/plugin-iio:0....","[bme280, bme280]","[wes-iio-bme280, wes-iio-bme280]","[W085, W06F]","[core, core]","[12.556, 17.175]","[North Dakota, Montana]","[Heatwave, Heatwave]"
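
In the output above, each cell holds a list with one entry per reading, so a single row bundles several readings. The sketch below splits such a row into one record per reading using only the standard library. `explode_row` is a hypothetical helper, and the sample values are copied from the row shown above. On a pandas DataFrame, `DataFrame.explode` serves a similar purpose.

```python
def explode_row(row):
    """Split a dict of equal-length lists into a list of per-reading dicts.

    Hypothetical helper for illustration; not part of the sciDX libraries.
    """
    lengths = {len(values) for values in row.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must hold lists of the same length")
    keys = list(row)
    # zip(*columns) walks the lists in parallel, yielding one tuple per reading.
    return [dict(zip(keys, values)) for values in zip(*row.values())]


# A subset of the columns from the row shown above.
raw_row = {
    "name": ["env.relative_humidity", "env.relative_humidity"],
    "value": [12.556, 17.175],
    "meta.vsn": ["W085", "W06F"],
    "state": ["North Dakota", "Montana"],
    "alert": ["Heatwave", "Heatwave"],
}

readings = explode_row(raw_row)
```

Here `readings` contains two dictionaries, one for node `W085` in North Dakota and one for node `W06F` in Montana.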


Unhandled error in URL stream processing: Cannot connect to host data.sagecontinuum.org:443 ssl:default [Connect call failed ('165.124.33.154', 443)]
Unhandled error in URL stream processing: Cannot connect to host data.sagecontinuum.org:443 ssl:default [Connect call failed ('165.124.33.154', 443)]


## Step 6: Processing and Visualizing Data

With data now being consumed, let’s set up a processor to organize and analyze the stream for easy viewing.

In [None]:
# Set up the data processor
processor = SageDataProcessing(consumer)

### Viewing Processed Data

Let’s view the last 5 rows of the processed, aggregated data for a snapshot of the current state.

In [None]:
processor.get_aggregated_df().tail(5)

## Step 7: Stopping Data Consumption and Processing

To wrap up, we’ll stop the data consumer and processor, ending the data flow and background tasks.

In [None]:
# Stop the data consumer and processor
processor.stop()
consumer.stop()
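
If a cell raises an exception before this step runs, the consumer and processor may keep running in the background. One way to guard against that is a small context manager that always calls `stop()` on exit. This is a sketch: `stopping` is a hypothetical helper, and it assumes only that each object exposes a `stop()` method, as `processor` and `consumer` do above.

```python
from contextlib import contextmanager


@contextmanager
def stopping(*resources):
    """Yield the resources, then call .stop() on each one in reverse order.

    Hypothetical helper; it relies only on each resource having a stop() method.
    """
    try:
        yield resources
    finally:
        # Reverse order stops the most recently created resource first.
        for resource in reversed(resources):
            resource.stop()
```

With `with stopping(consumer, processor): ...`, the processor is stopped first and the consumer second, matching the order used in the cell above, even when the body of the `with` block fails.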

### Cleaning Up Resources

Finally, we’ll delete the stream and the registered entries for the SAGE sensors to free up resources on the sciDX platform, so that nothing unused remains after this tutorial.

In [19]:
consumer.stop()
await streaming.delete_stream(stream)

{'message': "Stream 'data_stream_d4402055-669b-4ea9-b98b-053877a61ea1_2' deleted successfully"}

In [20]:
# Get the resource IDs from the search results
resource_ids = [result['id'] for result in search_results]

# Delete each resource using the client's delete_resource_by_id method
for resource_id in resource_ids:
    delete_response = client.delete_resource_by_id(resource_id)
    print(f"Deleted resource {resource_id}: {delete_response}")

Deleted resource c8e3cc04-53d6-4071-bd61-f867dd8afefa: {'message': 'c8e3cc04-53d6-4071-bd61-f867dd8afefa deleted successfully'}
Deleted resource 9c597ac7-b85c-4cf0-b8da-862c23385d43: {'message': '9c597ac7-b85c-4cf0-b8da-862c23385d43 deleted successfully'}
Deleted resource 9af9b424-9a98-4e59-92a8-6d2c07a54992: {'message': '9af9b424-9a98-4e59-92a8-6d2c07a54992 deleted successfully'}


In [None]:
# Clear all the user data streams
await streaming.delete_stream('all')