## Batch features using Signals

This notebook uses the SDK to create a new feature view. The view is computed in batch by an auto-generated dbt project with the same name.

<div class="alert alert-info">
<b>Please note:</b>
Batch features are features that are computed in the warehouse. They typically span one or more days, which would not be possible to compute in stream.
</div>

### Flow of data

```mermaid
flowchart LR
    sp(Snowplow Pipeline)
    stream[/Autogenerated dbt model/]
    signals(Signals)

    sp --> stream
    stream --> signals
```

---

In [None]:
from snowplow_signals import Signals

sp_signals = Signals(api_url='https://0fcfdf97-6447-4208-8cd0-39f82befbd07.svc.snplow.net')

### Define a new feature

There are 4 main types of features that you will likely want to define:
1. actions that happened in the `last x number of days`
2. events or properties that happened for the `first time`
3. events or properties that happened for the `last time`
4. `overall aggregations`

We have illustrated each of these 4 types with an example block below.

1. `products_added_to_cart_feature_last_7_days`: This feature collects the unique product names from add-to-cart ecommerce events in the last 7 days (note how the period is set: `period=timedelta(days=7)`, which corresponds to the ISO 8601 duration `P7D`).

2. `total_product_price_clv`: This feature sums the product price across the customer lifetime (`period` left as None).

3. `first_mkt_source`: This feature takes the first `page_view` event and reads the `mkt_source` property for a specific entity (`domain_userid`).

4. `last_device_class`: This feature takes the last `page_view` event and retrieves the YAUAA `deviceClass` property for a specific entity (`domain_userid`).

Each block creates a single feature definition, including the logic for how it should be calculated (its filters and aggregation).
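To make the four aggregation types concrete, here is a minimal pure-Python sketch of what each one computes for a single user. It does not use the Signals SDK: the `events` list, its field names, and the helpers `in_period` and `where` are hypothetical stand-ins for rows in the atomic events table, and the real computation is done by the generated dbt models.

```python
from datetime import datetime, timedelta

# Hypothetical events for one user (illustrative values, not real Snowplow data).
events = [
    {"ts": datetime(2024, 1, 1, 9), "event": "page_view", "mkt_source": "newsletter", "device": "Desktop"},
    {"ts": datetime(2024, 1, 2, 10), "event": "add_to_cart", "product": "socks", "price": 5.0},
    {"ts": datetime(2024, 1, 9, 11), "event": "add_to_cart", "product": "hat", "price": 12.5},
    {"ts": datetime(2024, 1, 10, 12), "event": "add_to_cart", "product": "hat", "price": 12.5},
    {"ts": datetime(2024, 1, 10, 13), "event": "page_view", "mkt_source": "search", "device": "Phone"},
]
now = datetime(2024, 1, 11)  # assumed "current time" of the batch run


def in_period(rows, period):
    """Keep rows inside the lookback window; period=None means the whole lifetime."""
    if period is None:
        return list(rows)
    return [r for r in rows if r["ts"] >= now - period]


def where(rows, event):
    """Keep rows of one event type (stands in for the feature's filter)."""
    return [r for r in rows if r["event"] == event]


carts = where(events, "add_to_cart")
views = sorted(where(events, "page_view"), key=lambda r: r["ts"])

# "unique_list" over the last 7 days (sorted here only to get a stable order).
products_last_7_days = sorted({r["product"] for r in in_period(carts, timedelta(days=7))})
# Sum aggregation over the customer lifetime (no period).
total_price = sum(r["price"] for r in in_period(carts, None))
# "first" and "last" pick a property from the earliest and latest matching event.
first_source = views[0]["mkt_source"]
last_device = views[-1]["device"]
```

Only the two `hat` events fall inside the 7-day window, so the unique list has a single entry, while the lifetime sum includes all three add-to-cart events.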







In [None]:
from snowplow_signals import (
    Feature,
    FilterCombinator,
    FilterCondition,
)
from datetime import timedelta

products_added_to_cart_feature_last_7_days = Feature(
    name="products_added_to_cart_last_7_days",
    dtype="STRING_LIST",
    events=[
        "iglu:com.snowplowanalytics.snowplow.ecommerce/snowplow_ecommerce_action/jsonschema/1-0-2"
    ],
    type="unique_list",
    property="contexts_com_snowplowanalytics_snowplow_ecommerce_product_1[0].name",
    scope="user",
    filter=FilterCombinator(
        combinator="and",
        condition=[
            FilterCondition(
                property="unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1:type",
                operator="equals",
                value="add_to_cart",
            ),
        ],
    ),
    period=timedelta(days=7),
)

total_product_price_clv = Feature(
    name="total_product_price_clv",
    dtype="FLOAT",
    events=[
        "iglu:com.snowplowanalytics.snowplow.ecommerce/snowplow_ecommerce_action/jsonschema/1-0-2"
    ],
    type="aggregation(sum)",
    property="contexts_com_snowplowanalytics_snowplow_ecommerce_product_1[0].price",
    filter=FilterCombinator(
        combinator="and",
        condition=[
            FilterCondition(
                property="unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1:type",
                operator="equals",
                value="add_to_cart",
            ),
        ],
    ),
)

first_mkt_source = Feature(
    name="first_mkt_source",
    dtype="STRING",
    events=[
        "iglu:com.snowplowanalytics.snowplow/page_view/jsonschema/1-0-0"
    ],
    type="first",
    property="mkt_source",
)

last_device_class = Feature(
    name="last_device_class",
    dtype="STRING",
    events=[
        "iglu:com.snowplowanalytics.snowplow/page_view/jsonschema/1-0-0"
    ],
    type="last",
    property="contexts_nl_basjes_yauaa_context_1[0]:deviceClass",
)


Now rewrite the example code block above to fit your needs. Add as many features as you like: all features defined together are processed in the same dbt project and end up in a single feature table.
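When adapting the `period` argument, it can help to see how a `timedelta` relates to ISO 8601 duration notation (`P7D` for seven days). The helper below, `to_iso8601_duration`, is a hypothetical sketch and not part of the Signals SDK; how the SDK serializes periods internally is not confirmed here.

```python
from datetime import timedelta


def to_iso8601_duration(period: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration.

    Only days, hours, minutes and whole seconds are emitted;
    fractional seconds are truncated.
    """
    total_seconds = int(period.total_seconds())
    if total_seconds < 0:
        raise ValueError("negative periods are not supported")
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)

    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")


seven_days = to_iso8601_duration(timedelta(days=7))
```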

<div class="alert alert-info">
<b>Please note:</b>
The name of the feature will be a column name in the warehouse, so be mindful of the warehouse's limitations (keep it descriptive but concise).
</div>
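One way to catch problematic names early is a small check before defining features. The sketch below is an assumption-laden illustration: `check_feature_name` is a hypothetical helper (not part of the Signals SDK), and both the 63-character limit and the snake_case rule are assumed conservative defaults, so consult your warehouse's documentation for its actual identifier rules.

```python
import re

# Assumed conservative limit; check your warehouse's documentation for the real one.
MAX_COLUMN_NAME_LENGTH = 63
# Lowercase snake_case: a letter or underscore, then letters, digits or underscores.
_VALID_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def check_feature_name(name: str, max_length: int = MAX_COLUMN_NAME_LENGTH) -> list[str]:
    """Return a list of problems with a feature name (an empty list means none found)."""
    problems = []
    if not _VALID_NAME.fullmatch(name):
        problems.append("use lowercase letters, digits and underscores, not starting with a digit")
    if len(name) > max_length:
        problems.append(f"name is {len(name)} characters, limit is {max_length}")
    return problems


ok = check_feature_name("total_product_price_clv")
bad = check_feature_name("7 day carts")
```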

### Wrapping the feature in a feature view

All features need to be included in feature views. A feature view can be thought of as a "table" of features that are processed together in the warehouse by dbt, incrementally to save costs. Feature views are immutable and versioned.

Below you can see how to create a feature view from the feature definitions provided earlier.

In [None]:
from snowplow_signals import FeatureView, user_entity

feature_view = FeatureView(
    name="batch_ecommerce_features",
    version=1,
    entities=[
        user_entity,
    ],
    features=[
        products_added_to_cart_feature_last_7_days,
        total_product_price_clv,
        first_mkt_source,
        last_device_class
    ],
)

### Testing the feature view

Execute the feature view against the atomic events table to verify that it works correctly. To keep the test simple, the data is filtered to the last hour only, regardless of the period defined, and the results are limited to 10 users.

In [None]:
data = sp_signals.test(
    feature_view=feature_view,
    app_ids=["website"],
)
data

### Applying the feature view to Signals

The following block pushes the feature view definition to the Signals API and makes it available for processing.

In [None]:
sp_signals.apply([feature_view])

### Generating the dbt project 

Fetch and copy the API URL for the feature view. You will need it to generate the dbt project, including the SQL models that will incrementally update your features, in your GitHub repository. The models follow this flow:

`filtered_events -> daily_aggregates -> features`
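The three steps in this flow can be pictured with a small pure-Python sketch. Only the stage names come from the flow above; the event tuples, the `since` parameter, and the output field names are hypothetical, and the SQL that the generated dbt project actually runs may differ. The point is to show why a daily aggregate layer makes incremental updates cheap: a later run only recomputes the newest days and reuses the rest.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw events: (user_id, event_type, day, price).
raw_events = [
    ("u1", "add_to_cart", date(2024, 1, 1), 5.0),
    ("u1", "page_view", date(2024, 1, 1), 0.0),
    ("u1", "add_to_cart", date(2024, 1, 9), 12.5),
    ("u2", "add_to_cart", date(2024, 1, 9), 3.0),
    ("u1", "add_to_cart", date(2024, 1, 10), 12.5),
]


def filtered_events(events):
    """Step 1: keep only the events the features need."""
    return [e for e in events if e[1] == "add_to_cart"]


def daily_aggregates(events, existing=None, since=None):
    """Step 2: sum prices per (user, day); only days on or after `since` are recomputed."""
    aggregates = dict(existing or {})
    if since is not None:
        # Drop days that will be recomputed; older days are reused as-is.
        aggregates = {k: v for k, v in aggregates.items() if k[1] < since}
    fresh = defaultdict(float)
    for user, _, day, price in events:
        if since is None or day >= since:
            fresh[(user, day)] += price
    aggregates.update(fresh)
    return aggregates


def features(aggregates, today, window_days=7):
    """Step 3: roll the daily rows up into one row per user."""
    start = today - timedelta(days=window_days)
    out = defaultdict(lambda: {"total_price": 0.0, "price_in_window": 0.0})
    for (user, day), total in aggregates.items():
        out[user]["total_price"] += total
        if day >= start:
            out[user]["price_in_window"] += total
    return dict(out)


# The first run processes everything; a later run only touches the newest day.
first_run = daily_aggregates(filtered_events(raw_events[:4]))
second_run = daily_aggregates(
    filtered_events(raw_events), existing=first_run, since=date(2024, 1, 10)
)
result = features(second_run, today=date(2024, 1, 11))
```

In this sketch the incremental second run produces the same daily aggregates as a full recomputation over all events would.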


In [None]:
print("URL for the feature view:", f"{sp_signals.api_client.api_url}/api/v1/registry/feature_views/{feature_view.name}")