<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px;">
    <div style="flex: 0 0 auto; margin-left: 0; margin-bottom: 0;">
        <img src="https://scidx.sci.utah.edu/wp-content/uploads/2024/12/logo-sm.png" alt="scidx Logo"/>
    </div>
</div>

# SciDX Streaming Capabilities Demonstration: NetCDF Streaming 

This demonstration shows how to stream a NetCDF data file with the **SciDX Streaming capabilities**. It uses the **SciDX POP Library** to register and discover data objects and the **Streaming Library** to stream and process the data in real time.


In [None]:
from scidx_streaming import StreamingClient
from pointofpresence import APIClient

Here, we:
1. Initialize the `APIClient` to handle data registration and discovery.
2. Initialize the `StreamingClient` to handle real-time data streams.

In [None]:
# Get the token from https://token.ndp.utah.edu
API_URL = "155.101.6.191:8003"
TOKEN = "TOKEN_PLACEHOLDER"
# Initialize the POP client for data registration and discovery
client = APIClient(base_url=API_URL, token=TOKEN)

# Initialize the Streaming client for real-time data streaming
streaming = StreamingClient(client)
print(f"Streaming Client initialized. User ID: {streaming.user_id}")
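
Hard-coding a token in a shared notebook makes it easy to leak. The following is a minimal sketch of reading the endpoint and token from the environment instead. The variable names `SCIDX_API_URL` and `SCIDX_TOKEN` and the helper `load_settings` are hypothetical; they are not part of either library.

```python
import os


def load_settings(env=None):
    """Return the API URL and token from an environment-like mapping."""
    env = os.environ if env is None else env
    # SCIDX_TOKEN and SCIDX_API_URL are hypothetical names chosen for this sketch.
    token = env.get("SCIDX_TOKEN", "").strip()
    if not token:
        raise RuntimeError("SCIDX_TOKEN is not set; get a token from https://token.ndp.utah.edu")
    # Fall back to the endpoint used in this notebook when no URL is given.
    api_url = env.get("SCIDX_API_URL", "155.101.6.191:8003")
    return {"base_url": api_url, "token": token}


# Demo with an explicit mapping instead of the real environment.
settings = load_settings({"SCIDX_TOKEN": "example-token"})
print(settings["base_url"])
```

The result could then be passed on as `APIClient(base_url=settings["base_url"], token=settings["token"])`.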

# Basic Use:

### 1. Register a NetCDF file

In this step, we use the **POP client** and the metadata below to register an online **NetCDF** file in our POP.

In [None]:
# Register the NetCDF data with the POP client
netcdf_metadata = {
    "resource_name": "netcdf_example_noaa",
    "resource_title": "Example NetCDF",
    "owner_org": "saleem_test",
    "resource_url": "https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com/v01r01/final/1987/AL/02/TCPRIMED_v01r01-final_AL021987_SSMI_F08_000766_19870813094358.nc",
    "file_type": "NetCDF",
    "notes": "Some additional notes about the resource.",
    "processing": {
        "group": "overpass_metadata"
    }
}

try:
    print(client.register_url(netcdf_metadata))
    print('Correctly registered')
except ValueError as e: # If the dataset already exists just show the error
    print(e)

### 2. Search for the registered entry

This confirms that the dataset is discoverable by data consumers.

In [None]:
# Search for the registered NetCDF dataset
search_results = client.search_datasets("netcdf_example_noaa", server="local")
print(f"Number of datasets found: {len(search_results)}")
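
The cleanup step later indexes `search_results[0]["id"]`, which fails with an unhelpful `IndexError` if the search returned nothing. The sketch below shows one way to guard that lookup. The helper `first_dataset_id` is hypothetical, and it only assumes what the notebook already relies on: that the results are a list of dicts with an `"id"` key.

```python
def first_dataset_id(results):
    """Return the id of the first search hit, or raise a clear error if there is none."""
    if not results:
        raise LookupError("search returned no datasets; was the registration successful?")
    return results[0]["id"]


# Stand-in for search results: a list of dicts with an "id" key (made-up values).
example_results = [{"id": "abc-123"}, {"id": "def-456"}]
print(first_dataset_id(example_results))
```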

### 3. Create a Data Stream from the registered entry

In [None]:
# Create a Kafka stream without filters
stream = await streaming.create_kafka_stream(
    keywords=["netcdf_example_noaa"]
)

# Retrieve the stream's topic name
topic = stream.data_stream_id
print(f"Stream created: {topic}")

### 4. Consume the Streamed Data 

In [None]:
# Start consuming the Kafka stream
consumer = streaming.consume_kafka_messages(topic)

In [None]:
# After a few seconds, the consumed data is available as a DataFrame
consumer.dataframe
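
Instead of waiting a fixed number of seconds, the buffer can be polled until rows arrive. This is a minimal sketch, assuming only that the polled object supports `len()` (as a pandas DataFrame does). The helper `wait_for_rows` and the `FakeConsumer` stand-in are hypothetical and not part of the Streaming Library.

```python
import time


def wait_for_rows(get_frame, timeout=30.0, interval=0.5):
    """Poll get_frame() until it returns a non-empty object or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        frame = get_frame()
        if len(frame) > 0:
            return frame
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no rows arrived within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a consumer whose buffer fills on the third poll.
class FakeConsumer:
    def __init__(self):
        self._polls = 0

    @property
    def dataframe(self):
        self._polls += 1
        return [{"value": 1}] if self._polls >= 3 else []


fake = FakeConsumer()
rows = wait_for_rows(lambda: fake.dataframe, timeout=5.0, interval=0.01)
print(len(rows))
```

With the real consumer, the call would look like `wait_for_rows(lambda: consumer.dataframe)`.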

### 5. Stop Data Consumption and Clean Up

To wrap up, we will: 
1. Stop the data consumer to halt data processing.
2. Delete the created Kafka stream using the Streaming client.
3. Remove the registered dataset using the POP client.

This ensures all resources and background tasks are properly released.

In [None]:
# Stop the Kafka consumer
consumer.stop()

# Delete the Kafka stream
await streaming.delete_stream(stream)

# Delete the registered dataset from the POP system
client.delete_resource_by_id(search_results[0]["id"])
print("Cleanup completed: Stream and registered dataset deleted.")
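
If one cleanup step raises, the steps after it never run and resources are left behind. The sketch below runs every step and collects the failures instead of stopping at the first one. The helper `run_cleanup` and the stand-in steps are hypothetical; they only mimic the mix of synchronous and awaitable calls used in the cell above.

```python
import asyncio  # only needed for asyncio.run when running outside a notebook
import inspect


async def run_cleanup(steps):
    """Run every (label, callable) pair; return the failures instead of stopping early."""
    failures = []
    for label, step in steps:
        try:
            result = step()
            # Await coroutine-based steps such as an asynchronous stream deletion.
            if inspect.isawaitable(result):
                await result
        except Exception as exc:
            failures.append((label, exc))
    return failures


# Stand-ins: one synchronous step, one failing step, one asynchronous step.
calls = []


def stop_consumer():
    calls.append("consumer")


def delete_dataset():
    raise ValueError("dataset not found")


async def delete_stream():
    calls.append("stream")


demo_steps = [
    ("consumer", stop_consumer),
    ("dataset", delete_dataset),
    ("stream", delete_stream),
]
```

In a notebook cell this would be run as `failures = await run_cleanup(demo_steps)`, and in a plain script as `asyncio.run(run_cleanup(demo_steps))`. With the real objects, the steps would wrap `consumer.stop`, `lambda: streaming.delete_stream(stream)`, and the `client.delete_resource_by_id(...)` call.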

# Advanced Use:

### 1. Register and Pre-process a NetCDF file

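This time the metadata includes a `mapping` that picks the variables of interest from the `passive_microwave/S2` group and gives them the short names (`lat`, `lon`, `horizontal`, `vertical`, `bins`) that the filters use later.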

In [None]:
# Register the NetCDF data with the POP client
netcdf_metadata = {
    "resource_name": "netcdf_example_noaa_advanced",
    "resource_title": "Example NetCDF",
    "owner_org": "saleem_test",
    "resource_url": "https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com/v01r01/final/1987/AL/02/TCPRIMED_v01r01-final_AL021987_SSMI_F08_000766_19870813094358.nc",
    "file_type": "NetCDF",
    "notes": "Some additional notes about the resource.",
    "mapping": {
        "lat": "latitude",
        "lon": "longitude",
        "horizontal": "x",
        "vertical": "y",
        "bins": "angle_bins"
    },
    "processing": {
        "group": "passive_microwave/S2"
    }
}

try:
    print(client.register_url(netcdf_metadata))
    print('Correctly registered')
except ValueError as e: # If the dataset already exists just show the error
    print(e)
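
To make the effect of `mapping` concrete, here is a minimal sketch of the renaming on a single record. It assumes that the keys are the short names exposed in the stream and the values are the variable names in the NetCDF group, and that unmapped variables are dropped; the POP service may behave differently. The helper `apply_mapping` and the sample values are hypothetical.

```python
def apply_mapping(record, mapping):
    """Return a new record holding only the mapped variables under their short names."""
    return {short: record[source] for short, source in mapping.items()}


mapping = {
    "lat": "latitude",
    "lon": "longitude",
    "horizontal": "x",
    "vertical": "y",
    "bins": "angle_bins",
}

# Made-up values for one record; "scan_time" is a hypothetical unmapped variable.
raw = {"latitude": 26.5, "longitude": -80.1, "x": 12, "y": 7, "angle_bins": 42, "scan_time": 0}
print(apply_mapping(raw, mapping))
```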

### 2. Search for the registered entry

This confirms that the dataset is discoverable by data consumers.

In [None]:
# Search for the registered NetCDF dataset
search_results = client.search_datasets("netcdf_example_noaa_advanced", server="local")
print(f"Number of datasets found: {len(search_results)}")

### 3. Create a Data Stream from the registered entry with Filters

The *filtering capabilities* allow us to refine the data stream by applying conditions, alerts, and transformations.

#### Filtering capabilities: 

| **Type**                        | **Explanation**                                             | **Example**                                       |
|---------------------------------|-------------------------------------------------------------|---------------------------------------------------|
| Column Comparisons              | Column-to-column comparisons                                | `x > y`                                           |
| Mathematical Operations         | Addition, subtraction, multiplication and division          | `x > 10*y`                                        |
| IN Operator                     | Check if values are in a list                               | `station IN ['A', 'B']`                           |
| Conditional Logic (IF-THEN-ELSE)| Apply rules based on conditional statements                 | `IF x > 20 THEN alert = High ELSE y = 10`         |
| Logical Operators (AND, OR)     | Combine multiple conditions using AND and OR operators       | `IF x > 10 OR z = 20 THEN alert = High ELSE alert = Low` |
| Window-Based Filtering          | Calculate aggregates (mean, sum, max, min) over sliding windows | `IF window_filter(9, sum, x > 20) THEN alert = High` |
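
The window-based row is the least self-explanatory, so here is a sketch of one plausible reading of `window_filter(n, sum, x > threshold)`: the alert fires when the sum of the last `n` values of `x` exceeds the threshold. This reading, the handling of partially filled windows, and the helper `sliding_sum_alerts` are all assumptions, not the Streaming Library's implementation.

```python
from collections import deque


def sliding_sum_alerts(values, window=9, threshold=20):
    """Yield (value, alert); alert is True when the sum of the last `window` values exceeds `threshold`."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in values:
        recent.append(value)
        # Assumption: a partially filled window is evaluated like a full one.
        yield value, sum(recent) > threshold


readings = [5, 5, 5, 5, 5, 0, 0, 0]
flags = [alert for _, alert in sliding_sum_alerts(readings, window=3, threshold=12)]
print(flags)
```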


In [None]:
filters = [
    "lat > 25",
    "vertical < horizontal",
    "IF bins < 50 THEN alert = 0 ELSE alert = 1"
]

# Create a filtered Kafka stream from the registered NetCDF dataset
stream = await streaming.create_kafka_stream(
    keywords=["netcdf_example_noaa_advanced"],
    match_all=True,
    filter_semantics=filters
)

# Retrieve the stream's topic name
topic = stream.data_stream_id
print(f"Stream created: {topic}")
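
The three filters above can be read as the plain-Python logic sketched below. It assumes that the two comparisons drop non-matching records and that the `IF ... THEN ... ELSE` rule adds an `alert` field to each remaining record; the helper `apply_filters` and the sample records are hypothetical and only illustrate the intended semantics.

```python
def apply_filters(records):
    """Keep records passing both comparisons and tag each kept record with an alert level."""
    kept = []
    for record in records:
        if not record["lat"] > 25:
            continue  # "lat > 25"
        if not record["vertical"] < record["horizontal"]:
            continue  # "vertical < horizontal"
        # "IF bins < 50 THEN alert = 0 ELSE alert = 1"
        kept.append({**record, "alert": 0 if record["bins"] < 50 else 1})
    return kept


# Made-up records: the last two fail the first and second comparison respectively.
sample = [
    {"lat": 30.0, "horizontal": 10, "vertical": 4, "bins": 20},
    {"lat": 30.0, "horizontal": 10, "vertical": 4, "bins": 70},
    {"lat": 20.0, "horizontal": 10, "vertical": 4, "bins": 20},
    {"lat": 30.0, "horizontal": 3, "vertical": 4, "bins": 20},
]
for row in apply_filters(sample):
    print(row)
```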

### 4. Consume the Streamed Data

In [None]:
# Start consuming the filtered Kafka stream
consumer = streaming.consume_kafka_messages(topic)

In [None]:
consumer.dataframe

### 5. Stop Data Consumption and Clean Up

To wrap up, we will: 
1. Stop the data consumer to halt data processing.
2. Delete the created Kafka stream using the Streaming client.
3. Remove the registered dataset using the POP client.

This ensures all resources and background tasks are properly released.

In [None]:
# Stop the Kafka consumer
consumer.stop()

# Delete the Kafka stream
await streaming.delete_stream(stream)

# Delete the registered dataset from the POP system
client.delete_resource_by_id(search_results[0]["id"])
print("Cleanup completed: Stream and registered dataset deleted.")