# Set up Signals for real-time calculation

Welcome to the [Snowplow Signals](https://docs.snowplow.io/docs/signals/) Quick Start [tutorial](https://docs.snowplow.io/tutorials/signals-quickstart/start).

This notebook is intended to be used for the final part of the tutorial, allowing you to explore attribute retrieval without having to integrate Signals into an application. It's hosted in Google Colab so you won't need to configure anything locally.

It also shows how you'd go about defining attribute groups and services using the [Python SDK](https://github.com/snowplow-incubator/snowplow-signals-sdk) instead of the UI, and explains why we recommend the UI for managing your configuration.

If you prefer to run the cells in one go with Run all, update your details in the required places first - they're marked with `UPDATE THIS`.

## Install and connect to Signals

Find your API credentials in [BDP Console](https://console.snowplowanalytics.com), on the **Signals** > **Overview** page. Add them to the Colab notebook secrets, using these names:
  * Signals API URL -> `SP_API_URL`
  * API key -> `SP_API_KEY`
  * API key ID -> `SP_API_KEY_ID`
  * Organization ID -> `SP_ORG_ID`

Then, run the following cell to install the SDK.

In [None]:
%pip install snowplow-signals

To connect to your deployment, initialize the Signals object with your API credentials:


In [None]:
from snowplow_signals import Signals
from google.colab import userdata

sp_signals = Signals(
    api_url=userdata.get('SP_API_URL'),
    api_key=userdata.get('SP_API_KEY'),
    api_key_id=userdata.get('SP_API_KEY_ID'),
    org_id=userdata.get('SP_ORG_ID'),
)

## Retrieve calculated attributes

Follow the tutorial instructions to find your current session ID from outgoing web events. Provide it as `identifier` below, to retrieve attributes that Signals has just calculated about your session.

In [None]:
response = sp_signals.get_service_attributes(
    name="quickstart_service",
    attribute_key="domain_sessionid",
    identifier="d99f6db1-7b28-46ca-a3ef-f0aace99ed86", # UPDATE THIS
)

df = response.to_dataframe()
df
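
A mistyped or truncated session ID is an easy mistake to make when copying it from outgoing events. As a minimal sketch, assuming that `domain_sessionid` values are UUID strings in the hyphenated form shown in the example above, you could sanity-check the value before using it. The `is_valid_session_id` helper is hypothetical and isn't part of the Signals SDK.

```python
import uuid


def is_valid_session_id(value):
    """Return True if value is a UUID string in canonical hyphenated form."""
    try:
        # uuid.UUID accepts several formats, so compare against the canonical one
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_session_id("d99f6db1-7b28-46ca-a3ef-f0aace99ed86"))
print(is_valid_session_id("d99f6db1-7b28-46ca"))
```

The first call uses the placeholder ID from the cell above and passes the check; the second is truncated and fails it.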

To retrieve individual attributes rather than using a service, use the `get_attributes()` method.

In [None]:
response = sp_signals.get_attributes(
    name="quickstart_group",
    version=1,
    attributes=["page_view_count"],
    attribute_key="domain_sessionid",
    identifiers=["472f97c1-eec1-45fe-b081-3ff695c30415"],  # UPDATE THIS
)

df = response.to_dataframe()
df

## Defining attribute groups and services using the SDK

We strongly recommend using the BDP Console to manage your Signals configuration.

The following sections show how you'd use the Python SDK to define attribute groups and services. Defining them in code makes mistakes more likely: selecting the wrong events or properties, reusing a name that already exists, or forgetting to increase the attribute group version as you evolve the definitions.

### Define what attributes to calculate

In this tutorial you will define an attribute group containing three attributes based on page view events.

#### Define attributes

The first attribute counts the number of page view events within the last 15 minutes. It uses the `counter` aggregation. The time window is defined by the `period` parameter.

The second attribute stores the last seen browser name (e.g. "Safari"), using the `last` aggregation. The `property` tells Signals where to look in the event for the value.

Browser information is appended to every event by the [YAUAA enrichment](https://docs.snowplow.io/docs/pipeline/enrichments/available-enrichments/yauaa-enrichment/) as an entity with schema URI `iglu:nl.basjes/yauaa_context/jsonschema/1-0-1`. Within the event payload, this URI becomes `contexts_nl_basjes_yauaa_context_1`. The `property` defined in this attribute uses the `agentName` field from the YAUAA entity. Note the `[0]` index to access the entity data.
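The mapping from schema URI to payload field name can be sketched in plain Python. This is a simplified illustration of the naming convention described above, not code from Signals or the SDK: the `entity_column_name` helper and the sample `event` dictionary are hypothetical, and the sketch assumes a well-formed `iglu:vendor/name/format/version` URI.

```python
import re


def entity_column_name(schema_uri):
    """Derive the payload field name for an entity from its Iglu schema URI."""
    # Expected shape: iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
    path = schema_uri.split(":", 1)[1]
    vendor, name, _format, version = path.split("/")
    model = version.split("-")[0]  # only the major (model) version is kept

    def snake(part):
        # Split camelCase, then replace any non-alphanumeric run with "_"
        part = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", part)
        return re.sub(r"[^A-Za-z0-9]+", "_", part).lower()

    return f"contexts_{snake(vendor)}_{snake(name)}_{model}"


column = entity_column_name("iglu:nl.basjes/yauaa_context/jsonschema/1-0-1")
print(column)

# Hypothetical, heavily simplified event payload. Entities are stored as a
# list, which is why the property path needs the [0] index.
event = {column: [{"agentName": "Safari"}]}
browser = event[column][0]["agentName"]
print(browser)
```

Because only the major version appears in the field name, the `1-0-1` schema and any other `1-x-x` version of it share the same field.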

The third attribute stores the first seen referrer host, based on the `refr_urlhost` [atomic event property](https://docs.snowplow.io/docs/fundamentals/canonical-event/#platform-specific-fields) and the `first` aggregation. By using a `criteria` filter on `page_referrer`, it's only calculated for page views where the referrer isn't an empty string. This is a trivial example just to demonstrate how to use filters.

In [None]:
from snowplow_signals import Attribute, Event, Criteria, Criterion
from datetime import timedelta

page_view_count = Attribute(
    name="page_view_count",
    description="Page views in the last 15 minutes.",
    type="int32",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="page_view",
            version="1-0-0",
        )
    ],
    aggregation="counter",
    period=timedelta(minutes=15),
)

most_recent_browser = Attribute(
    name="most_recent_browser",
    description="The last browser name tracked.",
    type="string",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="page_view",
            version="1-0-0",
        )
    ],
    aggregation="last",
    property="contexts_nl_basjes_yauaa_context_1[0].agentName",
)

first_referrer = Attribute(
    name="first_referrer",
    description="The first referrer tracked.",
    type="string",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="page_view",
            version="1-0-0",
        )
    ],
    aggregation="first",
    property="refr_urlhost",
    criteria=Criteria(
        all=[
            Criterion(
                property="page_referrer",
                operator="!=",
                value=""
            )
        ]
    ),
    default_value=None
)
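
To build intuition for what the three definitions above ask Signals to compute, here is a plain-Python sketch of the `counter`, `last`, and `first` aggregations applied to one session's events. It only illustrates the semantics and is not how Signals implements them: the `sketch_events` data, the helper functions, and the `browser` key (standing in for the YAUAA `agentName` field) are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)

# Hypothetical page view events for a single session, oldest first
sketch_events = [
    {"ts": now - timedelta(minutes=40), "browser": "Firefox",
     "page_referrer": "", "refr_urlhost": None},
    {"ts": now - timedelta(minutes=10), "browser": "Safari",
     "page_referrer": "https://www.example.com/a", "refr_urlhost": "www.example.com"},
    {"ts": now - timedelta(minutes=2), "browser": "Safari",
     "page_referrer": "https://news.example.org/", "refr_urlhost": "news.example.org"},
]


def count_in_window(events, period, at):
    """counter: number of events that fall inside the time window."""
    return sum(1 for e in events if at - e["ts"] <= period)


def last_value(events, prop):
    """last: property value from the most recent event."""
    return events[-1][prop] if events else None


def first_value(events, prop, criteria=lambda e: True, default=None):
    """first: property value from the earliest event that passes the filter."""
    return next((e[prop] for e in events if criteria(e)), default)


sketch_page_view_count = count_in_window(sketch_events, timedelta(minutes=15), now)
sketch_most_recent_browser = last_value(sketch_events, "browser")
sketch_first_referrer = first_value(
    sketch_events,
    "refr_urlhost",
    criteria=lambda e: e["page_referrer"] != "",  # mirrors the Criterion above
)

print(sketch_page_view_count, sketch_most_recent_browser, sketch_first_referrer)
```

The oldest event falls outside the 15-minute window and has an empty referrer, so it is ignored by both the count and the filtered `first` aggregation. The variable names are prefixed with `sketch_` so that running this cell doesn't overwrite the `Attribute` objects defined above.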

#### Define an attribute group

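Single attribute definitions can't be published to Signals on their own, as they don't make sense outside the context of an attribute group. The group also specifies the attribute key that its attributes are calculated against.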

Because of the `domain_sessionid` attribute key, Signals will calculate these attributes as follows:
* How many page views in the last 15 minutes for each session
* The last seen browser name for each session
* The first seen referrer for each session

Update the code to use your own email address before running.

In [None]:
from snowplow_signals import StreamAttributeGroup, domain_sessionid

my_attribute_group = StreamAttributeGroup(
    name="quickstart_group_notebook",
    version=1,
    attribute_key=domain_sessionid,
    owner="user@company.com", # UPDATE THIS
    attributes=[
        page_view_count,
        most_recent_browser,
        first_referrer
    ],
)
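
The role of the attribute key can be pictured as a group-by: events are partitioned by the key's value, and each attribute is then calculated separately within each partition. The following is a minimal sketch of that idea with hypothetical sample data and a hypothetical `group_by_key` helper. It doesn't use the SDK and isn't a description of Signals internals.

```python
from collections import defaultdict

# Hypothetical, simplified events from two different sessions
sample_events = [
    {"domain_sessionid": "session-a", "event_name": "page_view"},
    {"domain_sessionid": "session-b", "event_name": "page_view"},
    {"domain_sessionid": "session-a", "event_name": "page_view"},
]


def group_by_key(events, key):
    """Partition events by the value of the given attribute key."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[key]].append(event)
    return dict(grouped)


per_session = group_by_key(sample_events, "domain_sessionid")

# One page view count per session, rather than one global count
counts = {session_id: len(evts) for session_id, evts in per_session.items()}
print(counts)
```

Choosing a different attribute key would change the partitions, and therefore the values, without changing the attribute definitions themselves.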

#### Test the definitions

Test the group definitions against events in your atomic event table for the last hour.

In [None]:
data = sp_signals.test(
    attribute_group=my_attribute_group,
)
data

#### Define a service

Services allow you to retrieve attributes in bulk, from multiple attribute groups.

Update the code to use your own email address before running.

In [None]:
from snowplow_signals import Service

my_service = Service(
    name="quickstart_service_notebook",
    owner="user@company.com", # UPDATE THIS
    attribute_groups=[my_attribute_group]
)

### Deploy configuration

Publish the attribute group and service to Signals. They'll go live immediately: Signals will start processing events from your real-time stream, and will populate your Profiles Store with computed attributes.

In [None]:
sp_signals.publish([my_attribute_group, my_service])