## Batch attributes using Signals

This notebook uses the SDK to create a new view. The view serves as the basis for computing batch attributes through dbt models that are autogenerated from the attribute definitions.

<div class="alert alert-info">
<b>Please note:</b>
Batch attributes are attributes that are computed in the warehouse. They typically span one or more days, which would not be possible to compute in stream.
</div>

### Flow of data

```mermaid
flowchart LR
    sp(Snowplow Pipeline)
    stream[/Autogenerated dbt model/]
    signals(Signals)

    sp --> stream
    stream --> signals
```

---

In [None]:
from snowplow_signals import Signals
from dotenv import load_dotenv
import os

load_dotenv()

sp_signals = Signals(
    api_url=os.environ["SNOWPLOW_API_URL"],
    api_key=os.environ["SNOWPLOW_API_KEY"],
    api_key_id=os.environ["SNOWPLOW_API_KEY_ID"],
    org_id=os.environ["SNOWPLOW_ORG_ID"],
)
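
If one of the environment variables is missing, the cell above fails with a `KeyError` on the first absent name. As a minimal sketch, a preflight check can report all missing settings at once. The helper `missing_settings` is hypothetical and not part of the SDK; only the variable names are taken from the cell above.

```python
import os
from typing import Mapping

# Variable names taken from the setup cell above.
REQUIRED_VARS = (
    "SNOWPLOW_API_URL",
    "SNOWPLOW_API_KEY",
    "SNOWPLOW_API_KEY_ID",
    "SNOWPLOW_ORG_ID",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Run this after `load_dotenv()` so that values from a local `.env` file are taken into account.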

### Define the entity (coming soon!)

For now, you can only compute attributes for `domain_userid`. Soon this will be configurable, so you will be able to change the default to, for example, `user_id` or any other Snowplow user field.

### Define a new attribute

There are four main types of attributes that you are likely to want to define:
1. `Time Windowed Attributes`: Actions that happened in the last x days. The period needs to be defined as a `timedelta` in days.

2. `First Touch Attributes`: Events (or properties) that happened for the first time for a given entity. The period needs to be left as None.

3. `Last Touch Attributes`: Events (or properties) that happened for the last time for a given entity. The period needs to be left as None.

4. `Lifetime Attributes`: Attributes calculated over all the available data for the entity. The period needs to be left as None.

We have illustrated each of these four types with an example in the code block below.

1. `products_added_to_cart_last_7_days` (time windowed): This attribute collects the unique product names from add-to-cart ecommerce events in the last 7 days

2. `total_product_price_clv` (lifetime): This attribute sums the product price of add-to-cart events across the customer lifetime

3. `first_mkt_source` (first touch): This attribute takes the first page_view event and reads the mkt_source property for a specific entity (e.g. domain_userid)

4. `last_device_class` (last touch): This attribute takes the last page_view event and reads the YAUAA deviceClass property for a specific entity (domain_userid)

Each block creates a single attribute definition, including the logic for how it should be calculated (its filters and aggregation).
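
To make the four attribute types concrete, here is a toy, in-memory illustration of what the aggregations used in the definitions below (`unique_list`, `sum`, `first`, `last`) compute for a single entity. It is a sketch in plain Python, not the SDK and not the SQL that the generated dbt models run; the event tuples and property names are made up for the example.

```python
from datetime import datetime, timedelta

# Toy events for one entity: (timestamp, event_name, properties).
now = datetime(2024, 1, 10, 12, 0)
events = [
    (now - timedelta(days=30), "page_view", {"mkt_source": "newsletter", "device_class": "Desktop"}),
    (now - timedelta(days=20), "add_to_cart", {"name": "Hat", "price": 15.0}),
    (now - timedelta(days=3), "add_to_cart", {"name": "Scarf", "price": 25.0}),
    (now - timedelta(days=2), "add_to_cart", {"name": "Scarf", "price": 25.0}),
    (now - timedelta(days=1), "page_view", {"mkt_source": "google", "device_class": "Phone"}),
]

cart = [e for e in events if e[1] == "add_to_cart"]
views = [e for e in events if e[1] == "page_view"]

# Time windowed: unique product names added to cart in the last 7 days.
recent = [e for e in cart if e[0] >= now - timedelta(days=7)]
products_last_7_days = sorted({e[2]["name"] for e in recent})

# Lifetime: sum of product prices over all available add-to-cart events.
total_price = sum(e[2]["price"] for e in cart)

# First touch / last touch: property of the earliest / latest matching event.
first_source = min(views, key=lambda e: e[0])[2]["mkt_source"]
last_device = max(views, key=lambda e: e[0])[2]["device_class"]

print(products_last_7_days, total_price, first_source, last_device)
# ['Scarf'] 65.0 newsletter Phone
```

Note how only the time-windowed attribute depends on `now`; the other three look at the full history, which is why their period is left as None.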

In [None]:
from snowplow_signals import (
    Attribute,
    Criteria,
    Criterion,
    Event,
)
from datetime import timedelta

products_added_to_cart_last_7_days = Attribute(
    name="products_added_to_cart_last_7_days",
    type="string_list",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="snowplow_ecommerce_action",
            version="1-0-2",
        )
    ],
    aggregation="unique_list",
    property="contexts_com_snowplowanalytics_snowplow_ecommerce_product_1[0].name",
    criteria=Criteria(
        all=[
            Criterion(
                property="unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1:type",
                operator="=",
                value="add_to_cart",
            ),
        ],
    ),
    period=timedelta(days=7),
)

total_product_price_clv = Attribute(
    name="total_product_price_clv",
    type="float",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="snowplow_ecommerce_action",
            version="1-0-2",
        )
    ],
    aggregation="sum",
    property="contexts_com_snowplowanalytics_snowplow_ecommerce_product_1[0].price",
    criteria=Criteria(
        all=[
            Criterion(
                property="unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1:type",
                operator="=",
                value="add_to_cart",
            ),
        ],
    ),
)

first_mkt_source = Attribute(
    name="first_mkt_source",
    type="string",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="page_view",
            version="1-0-0",
        )
    ],
    aggregation="first",
    property="mkt_source",
)

last_device_class = Attribute(
    name="last_device_class",
    type="string",
    events=[
        Event(
            vendor="com.snowplowanalytics.snowplow",
            name="page_view",
            version="1-0-0",
        )
    ],
    aggregation="last",
    property="contexts_nl_basjes_yauaa_context_1[0]:deviceClass",
)


Now adapt the example code block above to fit your needs. Add as many attributes as you like; attributes that are defined together are processed in the same dbt project and end up in a single attribute table.

<div class="alert alert-info">
<b>Please note:</b>
The name of the attribute becomes a column name in the warehouse, so be mindful of your warehouse's naming limitations (keep it descriptive but concise).
</div>
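
As a minimal sketch of such a check, the helper below flags names that are unlikely to work well as column names. It is hypothetical and not part of the SDK. The 63-character limit is an assumption based on PostgreSQL's default identifier length; other warehouses allow longer names, so adjust it to your target.

```python
import re

# Assumption: 63 characters is PostgreSQL's default identifier limit; other
# warehouses allow longer names, so adjust MAX_LENGTH for your target.
MAX_LENGTH = 63
NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def name_problems(name: str) -> list[str]:
    """Return reasons why `name` may be a poor column name (hypothetical helper)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase letters, digits and underscores, not starting with a digit")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


for candidate in ["total_product_price_clv", "7_day_cart", "x" * 80]:
    print(candidate[:30], name_problems(candidate))
```

An empty list means the name passed both checks; it does not guarantee that the name is free of reserved words in your warehouse.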

### Wrapping the attribute in a view

All attributes need to be included in views. A view can be thought of as a "table" of attributes that are processed together in the warehouse, using dbt in an incremental fashion to save costs. Views are immutable and versioned.
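
The cost saving from incremental processing comes from reading only the events that arrived since the previous run. The sketch below illustrates the idea for a lifetime sum with a toy high-water mark. It is an assumption-laden illustration in plain Python; `RunningTotals` is a hypothetical class and does not reflect the actual logic of the generated dbt models.

```python
from dataclasses import dataclass, field


@dataclass
class RunningTotals:
    """Toy stand-in for an incrementally maintained attribute table."""
    high_water_mark: int = 0  # timestamp of the newest processed event
    totals: dict[str, float] = field(default_factory=dict)

    def process(self, events: list[tuple[int, str, float]]) -> int:
        """Fold in only events newer than the high-water mark; return how many were read."""
        new = [e for e in events if e[0] > self.high_water_mark]
        for ts, user, price in new:
            self.totals[user] = self.totals.get(user, 0.0) + price
            self.high_water_mark = max(self.high_water_mark, ts)
        return len(new)


state = RunningTotals()
day_1 = [(1, "user_a", 10.0), (2, "user_b", 5.0)]
day_2 = day_1 + [(3, "user_a", 2.5)]

print(state.process(day_1))  # 2: the first run reads both events
print(state.process(day_2))  # 1: the second run reads only the new event
print(state.totals)          # {'user_a': 12.5, 'user_b': 5.0}
```

A running sum is the simplest case. Time-windowed attributes also have to drop events that fall out of the window, so they need more bookkeeping than this sketch shows.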

Below you can see how you can create a view using the attribute definitions provided earlier.

In [None]:
from snowplow_signals import View, domain_userid

view = View(
    name="batch_ecommerce_attributes",
    version=1,
    entity=domain_userid,
    offline=True,
    attributes=[
        products_added_to_cart_last_7_days,
        total_product_price_clv,
        first_mkt_source,
        last_device_class
    ],
)

### Testing the view

Execute the view against the atomic events table to verify that it works correctly. To keep the test simple, the data is filtered to the last hour only, regardless of the period defined, and the results are limited to 10 users.

In [None]:
data = sp_signals.test(
    view,
    app_ids=["website"],
)
data

### Applying the view to Signals

The following block pushes the view definition to the Signals API and makes it available for processing.

In [None]:
sp_signals.apply([view])

### Generating the dbt project 

Follow our [tutorial](https://deploy-preview-1197--snowplow-docs.netlify.app/tutorials/snowplow-signals-cli/start/) on how to use the Signals CLI tool to generate the dbt project for this view. Once you are happy with the output in your warehouse and the attributes table for your view has been generated, you can also use the CLI to materialize the table. From that point onwards, Signals will automatically update your attributes and send the new records to the pipeline.



### Fetching the data

Once the attributes are materialized and the cron job has picked up the values (by default, it runs every 5 minutes), you can check that the attribute values are consumed by the Signals pipeline. Make sure you specify a test user identifier that is present in your latest attributes table (`domain_userid` by default):

In [None]:
# Fetch the view that was defined and applied earlier in this notebook.
view = sp_signals.get_view(name="batch_ecommerce_attributes", version=1)

response = sp_signals.get_online_attributes(
    source=view,
    identifiers=[
        "20928cd8f9927721e57a327b00be987f72237bff7a32cb840db3c202e5805995",
    ],
)

response.to_dataframe()
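
Because the values only become available after the cron job has run, the first lookup may come back empty. A minimal sketch of a retry loop is shown below. `wait_for` is a hypothetical helper, not part of the SDK, and the stand-in `responses` iterator replaces a real lookup. In practice `fetch` would wrap the `get_online_attributes` call and return `None` while no values are present; how an empty response looks is an assumption you need to verify.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    attempts: int = 5,
    delay_seconds: float = 60.0,
) -> Optional[T]:
    """Call `fetch` until it returns a non-None value or attempts run out (hypothetical helper)."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None


# Stand-in for a real lookup: returns nothing twice, then a value.
responses = iter([None, None, {"total_product_price_clv": 65.0}])
print(wait_for(lambda: next(responses), attempts=5, delay_seconds=0))
# {'total_product_price_clv': 65.0}
```

With the default 5-minute schedule mentioned above, five attempts one minute apart cover one full update cycle.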