<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px;">
    <div style="flex: 0 0 auto; margin-left: 0; margin-bottom: 0; width: 300px; height: 200px; ">
        <img src="https://www.anl.gov/sites/www/files/2020-05/SAGE-image.jpg" alt="SAGE Logo"/>
    </div>
    <div style="flex: 0 0 auto; margin-left: auto; margin-bottom: 0;">
        <img src="https://nairrpilot.org/app/site/media/ndp.jpg" alt="NDP Logo" width="200"/>
    </div>
</div>

# SciDX SAGE Streaming Tutorial

In this tutorial, we will stream and manage data from **SAGE sensors** using the **SciDX platform**. You will learn how to register, filter, and process real-time data streams efficiently.

This tutorial covers:

1. **Registering SAGE sensor data streams** with the sciDX API.
2. **Applying filters** to customize data streams for specific needs.
3. **Consuming and visualizing real-time data** to analyze sensor data dynamically.

### Initializing

## Step 1: Setting Up the sciDX Client and Logging In

To start, we’ll import the necessary modules, set up our API URL and credentials, and then initialize the sciDX client. This client will allow us to interact with the sciDX platform for registering and consuming sensor data.

```python
# Import the sciDX client and demo-specific modules
from scidx.client import sciDXClient
from sage_demo import sensor_data, filters, SageDataProcessing, plot_temp_alerts

# Define the URL of the sciDX API
API_URL = "https://vdc-192.chpc.utah.edu/scidx"

# Set user credentials (the tutorial's demo account)
username = "demo@sage.com"
password = "sage"
```

```python
# Initialize the sciDXClient with the API URL
client = sciDXClient(API_URL)

# Log in to the sciDX platform using the username and password defined above
client.login(username, password)
```
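The setup cells above hard-code the demo account's credentials, which is fine for this tutorial. For your own account, a minimal sketch that reads them from environment variables instead, falling back to the demo values, could look like this. The variable names `SCIDX_USERNAME` and `SCIDX_PASSWORD` and the helper `load_credentials` are hypothetical; they are not part of the sciDX client.

```python
import os

# Hypothetical environment-variable names; choose whatever fits your setup.
USERNAME_VAR = "SCIDX_USERNAME"
PASSWORD_VAR = "SCIDX_PASSWORD"


def load_credentials(default_user="demo@sage.com", default_password="sage"):
    """Return (username, password), preferring environment variables.

    Falls back to the tutorial's demo credentials when a variable is unset.
    """
    user = os.environ.get(USERNAME_VAR, default_user)
    secret = os.environ.get(PASSWORD_VAR, default_password)
    return user, secret


username, password = load_credentials()
```

You would then call `client.login(username, password)` exactly as before.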

## Step 2: Registering BME280 Sensor Data from Multiple SAGE Nodes

In this step, we’ll register data streams from BME280 sensors, specifically for temperature, pressure, and humidity, into sciDX. Each sensor’s data will be registered as a unique resource with its respective URL.

### Data Streams
- **Temperature**
- **Pressure**
- **Humidity**

```python
# Register each sensor data stream from the `sensor_data` list
for sensor in sensor_data:
    response = client.register_url(sensor)
    print(f"{sensor['resource_title']} registered successfully with ID: {response['id']}")
```

```text
BME280 Temperature registered successfully with ID: 6e5eb6e0-7050-4610-9257-302566316696
BME280 Pressure registered successfully with ID: 3354de76-4fc4-48fb-bc8e-1f462d823d70
BME280 Humidity registered successfully with ID: 9aebd5f8-8dc4-49fb-b380-eb7536dcbb13
```
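If you want to keep the returned IDs around (for example, to delete exactly these resources later), a small helper can collect them as the loop runs. This sketch relies only on what the registration cell shows: each entry in `sensor_data` has a `resource_title` key, and `client.register_url(...)` returns a dict with an `id` key. `register_all` and `OfflineRegisterClient` are hypothetical names; the offline class only stands in for `sciDXClient` so the helper can be tried without an account, and the demo entries omit the URL and other fields a real registration needs.

```python
def register_all(client, sensors):
    """Register each sensor entry and return {resource_title: resource_id}."""
    registered = {}
    for sensor in sensors:
        response = client.register_url(sensor)
        registered[sensor["resource_title"]] = response["id"]
    return registered


class OfflineRegisterClient:
    """Hypothetical stand-in for sciDXClient that hands out sequential fake IDs."""

    def __init__(self):
        self._next_id = 0

    def register_url(self, entry):
        self._next_id += 1
        return {"id": f"offline-{self._next_id}"}


# Demo entries carry only the title; real `sensor_data` entries include more fields.
demo_sensors = [{"resource_title": "BME280 Temperature"}, {"resource_title": "BME280 Pressure"}]
registered = register_all(OfflineRegisterClient(), demo_sensors)
```

With the real client, `registered = register_all(client, sensor_data)` could replace the registration loop, leaving you with a title-to-ID mapping.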


## Step 3: Searching for Registered BME280 Sensor Data

With the BME280 sensor data registered, we can now search for these resources on the sciDX platform. Using the prefix `sage_demo_bme280`, we’ll retrieve all datasets related to this sensor.

```python
# Search for all registered BME280 sensor datasets
search_results = client.search_resource(search_term="sage_demo_bme280")
print(f"Number of datasets found: {len(search_results)}")
```

```text
Number of datasets found: 3
```
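As a quick sanity check, you can compare the IDs printed during registration with the IDs the search returns. The sketch below assumes only that each search result is a dict with an `id` key, which the cleanup step at the end of this tutorial also relies on; `check_registration` is a hypothetical helper, and the short IDs in the example are placeholders.

```python
def check_registration(registered_ids, search_results):
    """Return (missing, unexpected) sets of IDs.

    missing: registered IDs that the search did not return.
    unexpected: search hits that were not among the registered IDs.
    """
    found = {result["id"] for result in search_results}
    expected = set(registered_ids)
    return expected - found, found - expected


# Placeholder IDs for illustration only.
missing, unexpected = check_registration(["a", "b", "c"], [{"id": "a"}, {"id": "b"}, {"id": "d"}])
```

With the real data, pass the three IDs from Step 2 together with `search_results`; two empty sets mean the search found exactly the resources you registered.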


## Step 4: Creating and Filtering a Data Stream Using sciDX Filtering Capabilities

In this step, we’ll create a Kafka data stream on sciDX with filters to select specific data and apply custom alerts. Here’s how the data is filtered and transformed:

- **State Selection**: Keeps only data from California, Montana, Oregon, North Dakota, Michigan, and Illinois.
- **Data Mapping**: Maps sensor readings to temperature, pressure, and humidity fields.
- **State Assignment**: Associates each sensor with its corresponding state name.
- **Conditional Alerts**:
  - **Heatwave Alert**: Activates if temperature > 35°C or humidity < 25%.
  - **State-Specific Alerts**: Certain states have unique temperature thresholds that will trigger alerts:
    - Montana: Temperature > 40°C
    - Oregon: Temperature > 30°C
  - **Pressure Alert**: Activates if pressure exceeds 101,000 Pa.
- **Pressure Adjustment**: For certain temperature alerts, reduces pressure readings by 5%.

These filters give us a tailored view of the data, isolating specific conditions and providing custom alerts.
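To reason about which readings should raise an alert, it can help to write the rules above as plain Python. This is only a restatement of the listed conditions, not sciDX's filter syntax or the contents of the `filters` object imported from `sage_demo`. The alert names other than "Heatwave" are hypothetical labels, and the 5% pressure adjustment is left out because the text does not say which temperature alerts trigger it.

```python
# States kept by the "State Selection" rule.
SELECTED_STATES = {"California", "Montana", "Oregon", "North Dakota", "Michigan", "Illinois"}

# State-specific temperature thresholds in °C.
STATE_TEMP_THRESHOLDS_C = {"Montana": 40.0, "Oregon": 30.0}

HEATWAVE_TEMP_C = 35.0
HEATWAVE_HUMIDITY_PCT = 25.0
PRESSURE_LIMIT_PA = 101_000.0


def evaluate_alerts(state, temperature_c, humidity_pct, pressure_pa):
    """Return the alerts the described rules would raise, or None if the state is filtered out."""
    if state not in SELECTED_STATES:
        return None
    alerts = []
    if temperature_c > HEATWAVE_TEMP_C or humidity_pct < HEATWAVE_HUMIDITY_PCT:
        alerts.append("Heatwave")
    threshold = STATE_TEMP_THRESHOLDS_C.get(state)
    if threshold is not None and temperature_c > threshold:
        alerts.append(f"{state} temperature")  # hypothetical label
    if pressure_pa > PRESSURE_LIMIT_PA:
        alerts.append("Pressure")  # hypothetical label
    return alerts


print(evaluate_alerts("California", 35.05, 8.13, 83145.039062))  # ['Heatwave']
print(evaluate_alerts("Oregon", 22.97, 49.745, 100145.039062))   # []
```

Applied to the processed table in Step 6, these rules agree with the stream: California at 35.05 °C and 8.13% humidity gets a Heatwave alert, while Oregon at 22.97 °C and 49.745% humidity gets none.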

```python
# Create the Kafka stream using sciDXClient with specified filters for SAGE data
stream_response = client.create_kafka_stream(
    keywords=["sage_demo"],
    filter_semantics=filters
)

print("Stream Created. Topic:", stream_response['topic'])
```

```text
Stream Created. Topic: data_stream_604e99e6-a9a6-4cd1-9c47-d55e3b92eee0_1
```


## Step 5: Consuming the Filtered Stream Data

With the Kafka stream created and filters applied, we can now start consuming the filtered data in real time. Our consumer will listen continuously to incoming messages and populate a dynamic DataFrame as new data arrives.

```python
# Start consuming Kafka messages from the created topic
consumer = client.consume_kafka_messages(topic=stream_response['topic'])
```
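Because the consumer keeps appending rows as messages arrive (the summary below already reports 2,842 rows), a long-running session can accumulate a lot of data. If memory matters, one common pattern is a bounded buffer that keeps only the newest messages. The sketch below is a hypothetical helper built on `collections.deque`; it is not how the sciDX consumer stores data.

```python
from collections import deque


class RollingBuffer:
    """Hypothetical bounded buffer that keeps only the newest `maxlen` messages."""

    def __init__(self, maxlen=1000):
        self._rows = deque(maxlen=maxlen)  # the oldest message is evicted when full
        self.total_seen = 0                # counts every message, including evicted ones

    def add(self, message):
        self._rows.append(message)
        self.total_seen += 1

    def rows(self):
        return list(self._rows)


buffer = RollingBuffer(maxlen=3)
for seq in range(5):
    buffer.add({"seq": seq})
```

After five messages with `maxlen=3`, only the last three remain, while `total_seen` still reports how many arrived in total.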

#### Viewing a Simple Data Summary

We can view a summary of the most recently received data for selected columns. The summary also reports the current total number of rows and columns.

```python
# Display a summary of the raw received data
consumer.summary(['timestamp', 'state', 'alert', 'temperature', 'humidity', 'pressure'])
```

| | timestamp | state | alert | temperature | humidity | pressure |
|---|---|---|---|---|---|---|
| 2841 | ['2024-11-12T19:44:45.347679417Z', '2024-11-12T19:44:51.650720965Z'] | ['Montana', 'California'] | ['None', 'None'] | [24.14, 34.95] | | |

| | Total Rows | Total Columns |
|---|---|---|
| 0 | 2842 | 16 |

```text
Exiting summary display...
```
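The summary above shows the newest row for the requested columns plus the overall row and column counts. The empty humidity and pressure cells suggest that individual messages carry only some of the fields. A minimal sketch of computing the same kind of summary from a list of dict records, where missing fields become `None`, is shown below; `summarize` is a hypothetical helper, not the consumer's implementation.

```python
def summarize(records, columns):
    """Return (last_row, total_rows, total_columns) for a list of dict records."""
    if not records:
        return {}, 0, 0
    newest = records[-1]
    last_row = {col: newest.get(col) for col in columns}  # missing fields become None
    all_fields = set()
    for record in records:
        all_fields.update(record)  # adds each record's keys
    return last_row, len(records), len(all_fields)


records = [
    {"timestamp": "2024-11-12T19:44:45Z", "state": "Montana", "temperature": 24.14},
    {"timestamp": "2024-11-12T19:44:51Z", "state": "California", "temperature": 34.95},
]
last_row, total_rows, total_columns = summarize(records, ["state", "temperature", "humidity"])
```

Here `total_columns` counts every distinct field seen across all records, which is one reasonable reading of "Total Columns".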


## Step 6: Processing and Visualizing Data

With data now being consumed, let’s set up a processor to organize and analyze the stream for easy viewing and plotting.

```python
# Set up the data processor
processor = SageDataProcessing(consumer)
```
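The processed table later in this step has one row per timestamp and state, with temperature, humidity, and pressure side by side. One plausible way to build such rows from per-measurement readings is to group them by state and by timestamp truncated to the second, then merge the fields. The sketch below illustrates that idea under those assumptions; it is not the implementation of `SageDataProcessing`, and `aggregate` and `parse_second` are hypothetical helpers. Timestamps are parsed from the ISO 8601 form seen in the raw summary, keeping only whole seconds.

```python
from datetime import datetime, timezone

FIELDS = ("temperature", "humidity", "pressure")


def parse_second(ts):
    """Parse e.g. '2024-11-12T19:44:45.347679417Z' to a UTC datetime, dropping fractional seconds."""
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def aggregate(readings):
    """Merge readings into one row per (second, state), sorted by time, then state."""
    rows = {}
    for reading in readings:
        second = parse_second(reading["timestamp"])
        key = (second, reading["state"])
        row = rows.setdefault(key, {"timestamp": second, "state": reading["state"]})
        for field in FIELDS:
            if reading.get(field) is not None:
                row[field] = reading[field]  # a later value for the same field wins
    return [rows[key] for key in sorted(rows)]


sample = [
    {"timestamp": "2024-11-12T19:50:54.100000000Z", "state": "Oregon", "temperature": 22.97},
    {"timestamp": "2024-11-12T19:50:54.900000000Z", "state": "Oregon", "humidity": 49.745},
    {"timestamp": "2024-11-12T19:50:54.500000000Z", "state": "Michigan", "temperature": 23.17},
]
merged = aggregate(sample)
```

The two Oregon readings fall within the same second, so they merge into a single row, while Michigan gets its own row.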

#### Real-Time Temperature Alerts

We can now plot temperature data in real time, with alerts highlighting any critical conditions across states.

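```python
# Top-level `await` works in Jupyter/IPython; outside a notebook, drive the call with an event loop (e.g. asyncio.run(...))
await plot_temp_alerts(processor)
```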

*[Interactive figure (Plotly `FigureWidget`): real-time temperature plot with alerts shown as red circle markers labeled "Heatwave".]*

```text
Last 10 Alerts:
At 19:45:24, Heatwave alert in North Dakota.
At 19:45:54, Heatwave alert in North Dakota.
At 19:46:24, Heatwave alert in North Dakota.
At 19:46:54, Heatwave alert in North Dakota.
At 19:47:24, Heatwave alert in North Dakota.
At 19:47:54, Heatwave alert in North Dakota.
At 19:48:24, Heatwave alert in North Dakota.
At 19:48:54, Heatwave alert in North Dakota.
At 19:49:24, Heatwave alert in North Dakota.
At 19:49:54, Heatwave alert in North Dakota.
Visualization stopped.
```
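The "Last 10 Alerts" list above is a rolling window over the most recent alerts. A minimal way to keep such a window and print it in the same format uses a `deque` with a fixed `maxlen`; `AlertLog` is a hypothetical helper, not part of `sage_demo`.

```python
from collections import deque


class AlertLog:
    """Rolling log of the most recent alerts; the oldest entries drop off automatically."""

    def __init__(self, size=10):
        self._alerts = deque(maxlen=size)

    def record(self, time_str, alert, state):
        self._alerts.append((time_str, alert, state))

    def lines(self):
        # Same wording as the notebook output, e.g. "At 19:45:24, Heatwave alert in North Dakota."
        return [f"At {t}, {a} alert in {s}." for t, a, s in self._alerts]


log = AlertLog(size=2)
for time_str in ("19:45:24", "19:45:54", "19:46:24"):
    log.record(time_str, "Heatwave", "North Dakota")
```

With `size=2`, the first alert is evicted and only the two most recent lines remain.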


#### Viewing Processed Data

Let’s view the last five rows of the processed, aggregated data to get a snapshot of the current stream.

```python
processor.get_aggregated_df().tail(5)
```

| | timestamp | state | alert | temperature (°C) | humidity (%) | pressure (Pa) |
|---|---|---|---|---|---|---|
| 185 | 2024-11-12 19:50:50+00:00 | California | Heatwave | 35.05 | 8.13 | 83145.039062 |
| 186 | 2024-11-12 19:50:54+00:00 | Michigan | Heatwave | 23.17 | 17.726 | 95903.0 |
| 187 | 2024-11-12 19:50:54+00:00 | North Dakota | Heatwave | 29.6 | 15.336 | 97771.886718 |
| 188 | 2024-11-12 19:50:54+00:00 | Oregon | | 22.97 | 49.745 | 100145.039062 |
| 189 | 2024-11-12 19:51:00+00:00 | Illinois | | 25.24 | 25.698 | 100755.875 |


## Step 7: Stopping Data Consumption and Processing

To wrap up, we’ll stop the data consumer and processor, ending the data flow and background tasks.

```python
# Stop the data processor and the data consumer
processor.stop()
consumer.stop()
```
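If a cell fails between starting and stopping the stream, the consumer and processor may keep running in the background. One way to guarantee that both are stopped is `contextlib.ExitStack`, which runs registered callbacks on exit even when an exception is raised. The sketch assumes only that both objects have the `stop()` method used above; `run_with_cleanup` and the `Recorder` stand-in are hypothetical. Because callbacks run last-in, first-out, registering `consumer.stop` first means `processor.stop()` runs first, matching the order in the cell above.

```python
from contextlib import ExitStack


def run_with_cleanup(consumer, processor, work):
    """Run work(); afterwards always call processor.stop() and then consumer.stop()."""
    with ExitStack() as stack:
        stack.callback(consumer.stop)   # registered first, so it runs last
        stack.callback(processor.stop)  # registered last, so it runs first
        return work()


class Recorder:
    """Hypothetical stand-in that records when stop() is called."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def stop(self):
        self.log.append(self.name)


calls = []
try:
    run_with_cleanup(Recorder("consumer", calls), Recorder("processor", calls), lambda: 1 / 0)
except ZeroDivisionError:
    pass  # the error still propagates, but both stop() calls have already run
```

`work` must be an ordinary function; for the asynchronous plotting call, a plain `try`/`finally` around the `await` achieves the same effect.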

#### Cleaning Up Resources

Finally, we’ll delete the registered entries for the SAGE sensors to free up resources on the sciDX platform. This cleanup ensures that no unused resources remain after this tutorial.

```python
# Get the resource IDs from the search results
resource_ids = [result['id'] for result in search_results]

# Delete each registered resource using the sciDXClient's delete_resource method
for resource_id in resource_ids:
    delete_response = client.delete_resource(resource_id)
    print(f"Deleted resource {resource_id}: {delete_response}")
```

```text
Deleted resource 3354de76-4fc4-48fb-bc8e-1f462d823d70: {'message': '3354de76-4fc4-48fb-bc8e-1f462d823d70 deleted successfully'}
Deleted resource 6e5eb6e0-7050-4610-9257-302566316696: {'message': '6e5eb6e0-7050-4610-9257-302566316696 deleted successfully'}
Deleted resource 9aebd5f8-8dc4-49fb-b380-eb7536dcbb13: {'message': '9aebd5f8-8dc4-49fb-b380-eb7536dcbb13 deleted successfully'}
```
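The cleanup loop above stops at the first failed deletion if `delete_resource` raises. A variant that attempts every ID and reports failures at the end is sketched below. The tutorial does not document which exceptions `delete_resource` raises, so the sketch catches `Exception` broadly; `delete_all` and `OfflineDeleteClient` are hypothetical, and the offline class only simulates deletion.

```python
def delete_all(client, resource_ids):
    """Try to delete every resource; return (deleted_ids, {failed_id: exception})."""
    deleted, failed = [], {}
    for resource_id in resource_ids:
        try:
            client.delete_resource(resource_id)
            deleted.append(resource_id)
        except Exception as exc:  # error types are not documented in the tutorial
            failed[resource_id] = exc
    return deleted, failed


class OfflineDeleteClient:
    """Hypothetical stand-in for sciDXClient that 'deletes' IDs from an in-memory set."""

    def __init__(self, known_ids):
        self.known_ids = set(known_ids)

    def delete_resource(self, resource_id):
        if resource_id not in self.known_ids:
            raise KeyError(resource_id)
        self.known_ids.remove(resource_id)
        return {"message": f"{resource_id} deleted successfully"}


deleted, failed = delete_all(OfflineDeleteClient({"a", "b"}), ["a", "b", "c"])
```

With the real client, `deleted, failed = delete_all(client, resource_ids)` attempts all deletions and leaves any failures in `failed` for inspection.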
