<img width="200" src="https://mmlspark.blob.core.windows.net/graphics/emails/vw-blue-dark-orange.svg" />

# Contextual-Bandits using Vowpal Wabbit

[Azure Personalizer](https://azure.microsoft.com/en-us/products/cognitive-services/personalizer) emits logs in DSJSON format. This example demonstrates how to perform off-policy evaluation on those logs with Vowpal Wabbit.
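
To give a feel for the log format, the following is a minimal sketch of reading one DSJSON line with the standard `json` module. The record is made up for illustration and is not taken from the dataset used below; the field names (`_label_cost`, `_label_probability`, `_labelIndex`, `a`, `p`, `c`, `_multi`, `EventId`) follow the DSJSON conventions as commonly documented, and real Personalizer logs may carry additional fields.

```python
import json

# Made-up record for illustration only; not taken from the dataset used here.
line = (
    '{"_label_cost": -1.0, "_label_probability": 0.8, "_label_Action": 2, '
    '"_labelIndex": 1, "EventId": "example-event-id", '
    '"a": [2, 1], "p": [0.8, 0.2], '
    '"c": {"User": {"timeOfDay": "morning"}, '
    '"_multi": [{"Action": {"id": "a"}}, {"Action": {"id": "b"}}]}}'
)

event = json.loads(line)

# Pull out the fields that matter for off-policy evaluation.
summary = {
    "event_id": event["EventId"],
    "cost": event["_label_cost"],  # cost of the logged action (negative reward)
    "logged_probability": event["_label_probability"],
    "chosen_action_index": event["_labelIndex"],  # 0-based index into c._multi
    "num_actions": len(event["c"]["_multi"]),
}
print(summary)
```

The notebook itself does not parse the JSON by hand; the SynapseML transformer used later handles that step.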

#### Read dataset

In [None]:
from pyspark.sql import SparkSession

# Bootstrap Spark Session
spark = SparkSession.builder.getOrCreate()

from synapse.ml.core.platform import *

from synapse.ml.core.platform import materializing_display as display

In [None]:
import pyspark.sql.types as T
from pyspark.sql import functions as F

schema = T.StructType(
    [
        T.StructField("input", T.StringType(), False),
    ]
)

df = (
    spark.read.format("text")
    .schema(schema)
    .load("wasbs://publicwasb@mmlspark.blob.core.windows.net/decisionservice.json")
)
# print dataset basic info
print("records read: " + str(df.count()))
print("Schema: ")
df.printSchema()

In [None]:
display(df)

#### Use VowpalWabbitDSJsonTransformer to parse the DSJSON logs

In [None]:
from synapse.ml.vw import VowpalWabbitDSJsonTransformer

df_ready = (
    VowpalWabbitDSJsonTransformer()
    .setDsJsonColumn("input")
    .transform(df)
    .withColumn("splitId", F.lit(0))
    .repartition(2)
)
df_ready.printSchema()

# exclude JSON to avoid overflow
display(df_ready.drop("input"))

#### Model Training

VowpalWabbitGeneric:
* trains a model for each split (= group)
* synchronizes across partitions after every split
* stores the 1-step-ahead predictions in the model
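
The idea of a 1-step-ahead prediction can be sketched in plain Python: each example is predicted with the model as it stood before that example was used for learning. The running-mean learner and the function name `one_step_ahead` are hypothetical stand-ins chosen for brevity; this is not how Vowpal Wabbit is implemented.

```python
def one_step_ahead(stream):
    """Predict each value before learning from it (toy running-mean learner)."""
    total, count = 0.0, 0
    predictions = []
    for y in stream:
        # Predict first, using only the examples seen so far.
        predictions.append(total / count if count else 0.0)
        # Then update the model with the current example.
        total += y
        count += 1
    return predictions


preds = one_step_ahead([1.0, 3.0, 5.0])
print(preds)
```

Because every prediction is made on an example the model has not yet learned from, such predictions can be used for evaluation without a separate held-out set.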

In [None]:
from synapse.ml.vw import VowpalWabbitGeneric

model = (
    VowpalWabbitGeneric(
        passThroughArgs="--cb_adf --cb_type mtr --clip_p 0.1 -q GT -q MS -q GR -q OT -q MT -q OS --dsjson --preserve_performance_counters"
    )
    .setInputCol("input")
    .setSplitCol("splitId")
    .setPredictionIdCol("EventId")
    .fit(df_ready)
)

#### Model Prediction

In [None]:
df_predictions = model.getOneStepAheadPredictions()
df_headers = df_ready.drop("input")

df_headers_predictions = df_headers.join(df_predictions, "EventId")
display(df_headers_predictions)

In [None]:
from synapse.ml.vw import VowpalWabbitCSETransformer

metrics = VowpalWabbitCSETransformer().transform(df_headers_predictions)

display(metrics)
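
As background on off-policy evaluation, here is a minimal sketch of two standard estimators, inverse propensity scoring (IPS) and its self-normalized variant (SNIPS), in plain Python. The function name `ips_snips` and the toy numbers are assumptions made for illustration, and this is not a description of what `VowpalWabbitCSETransformer` computes internally.

```python
def ips_snips(rewards, logged_probs, target_probs):
    """Estimate the value of a target policy from logged bandit data.

    rewards:      observed reward of the logged action, per event
    logged_probs: probability the logging policy gave the logged action
    target_probs: probability the target policy gives the same action
    """
    # Importance weight: how much more (or less) likely the target policy
    # is to take the logged action than the logging policy was.
    weights = [t / p for t, p in zip(target_probs, logged_probs)]
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))
    ips = weighted_reward / len(rewards)    # unbiased, can have high variance
    snips = weighted_reward / sum(weights)  # normalized by the total weight
    return ips, snips


# Toy data, made up for illustration.
rewards = [1.0, 0.0, 1.0, 0.0]
logged_probs = [0.5, 0.5, 0.25, 0.5]
target_probs = [1.0, 0.0, 0.5, 0.5]

ips, snips = ips_snips(rewards, logged_probs, target_probs)
print(ips, snips)
```

Small logged probabilities produce large importance weights, which is one reason logged probabilities are often clipped from below, as the `--clip_p 0.1` argument does in the training step above.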

The metrics are calculated separately for each field of the reward column.

In [None]:
per_reward_metrics = metrics.select("reward.*")

display(per_reward_metrics)