The Contextual Meta-Bandit (CMB) selects among models based on context and learns online, treating model selection as a Reinforcement Learning problem. It can be used for recommender system ensembles, A/B testing, and other dynamic model selection problems.
For each interaction, the environment provides an observation (which contains contextual information). The Meta-Bandit uses it to select one of the recommenders and lets the selected one decide the action (the recommended item). The environment receives this action and calculates the reward given to the Meta-Bandit.
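The interaction described above can be sketched as a single function. This is a minimal illustration of the data flow only, not the project's implementation: `run_interaction`, the two stand-in arms, the selection rule, and the reward function are all hypothetical names and behaviors introduced here.

```python
from typing import Callable, Dict, Tuple

Context = Dict[str, float]


def run_interaction(
    context: Context,
    select_arm: Callable[[Context], str],
    arms: Dict[str, Callable[[Context], int]],
    reward_fn: Callable[[Context, int], float],
) -> Tuple[str, int, float]:
    """One step of the loop: context -> arm -> action -> reward."""
    arm_name = select_arm(context)       # the meta-bandit picks a recommender
    action = arms[arm_name](context)     # the chosen recommender picks an item
    reward = reward_fn(context, action)  # the environment scores the action
    return arm_name, action, reward


# Hypothetical stand-ins, for illustration only.
arms = {
    "arm1": lambda ctx: 7,  # always recommends item 7
    "arm2": lambda ctx: 3,  # always recommends item 3
}


def select_arm(ctx: Context) -> str:
    # Toy rule that depends on the context; a real policy would be learned.
    return "arm1" if ctx["f1"] > 0 else "arm2"


def reward_fn(ctx: Context, item: int) -> float:
    # Toy environment: only item 7 is rewarded.
    return 1.0 if item == 7 else 0.0


step = run_interaction({"f1": 1, "f2": 0}, select_arm, arms, reward_fn)
```

The returned triple (arm, action, reward), together with the context, is what a learning policy would need in order to improve its next selection.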
- python=3.6.7
- pandas=0.25.1
- scipy=1.3.1
- numpy=1.17.0
- scikit-learn=0.21.2
- requests=2.18.4
- unidecode==1.1.1
- creme==0.5.1
- bentoml==0.7.8
- pyyaml
- responses==0.10.14
- prometheus_client
CMB can be installed using conda:
>>> conda env create -f environment.yml
>>> conda activate meta-bandit-selector
arms:
  arm1: http://arm.localhost/predict
  arm2: http://arm.localhost/predict
  arm3: http://arm.localhost/predict
bandit_policy_params:
  epsilon: 0.1
  seed: 42
We implemented two different policies for the bandit.
- e-greedy (policy/e_greedy.py)
- Softmax (policy/softmax.py)
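To make the difference between the two policies concrete, here is a minimal sketch of each selection rule over per-arm value estimates. It does not reproduce the code in `policy/e_greedy.py` or `policy/softmax.py`; the function names, the `values` dictionary, and the `temperature` parameter are assumptions made for illustration.

```python
import math
import random
from typing import Dict


def epsilon_greedy(values: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Explore uniformly with probability epsilon, otherwise exploit the best arm."""
    names = sorted(values)
    if rng.random() < epsilon:
        return rng.choice(names)
    return max(names, key=lambda name: values[name])


def softmax_probabilities(values: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn value estimates into selection probabilities."""
    top = max(values.values())  # subtract the max for numerical stability
    weights = {arm: math.exp((v - top) / temperature) for arm, v in values.items()}
    total = sum(weights.values())
    return {arm: w / total for arm, w in weights.items()}


def softmax_select(values: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Sample an arm in proportion to its softmax probability."""
    probs = softmax_probabilities(values, temperature)
    names = sorted(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical value estimates for the three configured arms.
estimates = {"arm1": 0.2, "arm2": 0.7, "arm3": 0.1}
rng = random.Random(42)
greedy_pick = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
probs = softmax_probabilities(estimates)
sampled = softmax_select(estimates, temperature=1.0, rng=rng)
```

With e-greedy, exploration is uniform across arms and controlled by `epsilon`; with softmax, arms with higher estimates are explored more often than arms with lower ones.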
>> python package.py --h
usage: package.py [-h] [--config-path CONFIG_PATH]
[--polity-module POLITY_MODULE] [--polity-cls POLITY_CLS]
Process some integers.
optional arguments:
-h, --help show this help message and exit
--config-path CONFIG_PATH
--polity-module POLITY_MODULE
--polity-cls POLITY_CLS
Example of packaging the meta-bandit with e-greedy:
>> python package.py \
--config-path config/config_egreedy.yml \
--polity-module policy.e_greedy \
--polity-cls EGreedyPolicy
>> bentoml serve MetaBanditClassifier:latest
A prediction request must include both the context information used by the meta-bandit and the input that is forwarded to the selected arm.
{
"context": {
// context information for Meta-Bandit
},
"input": {
// input arms
}
}
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '{
"context": {
"f1": 1,
"f2": 0
},
"input": {
"user": 1,
"items": [
0,
1,
3,
7,
4,
6,
5,
2
]
}
}' \
http://localhost:5000/predict
Request result:
{
"result": {..}, // Arm Result
"bandit": {
"arm": "arm1"
}
}
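The same request can be sent from Python. The sketch below uses only the standard library; `build_predict_payload` and `predict` are hypothetical helper names, and the URL assumes the service is running locally on port 5000 as in the curl example.

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_payload(context: Dict[str, Any], arm_input: Dict[str, Any]) -> bytes:
    """Encode the request body with the two top-level keys shown above."""
    return json.dumps({"context": context, "input": arm_input}).encode("utf-8")


def predict(url: str, context: Dict[str, Any], arm_input: Dict[str, Any],
            timeout: float = 5.0) -> Dict[str, Any]:
    """POST the payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=build_predict_payload(context, arm_input),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_predict_payload(
    {"f1": 1, "f2": 0},
    {"user": 1, "items": [0, 1, 3, 7, 4, 6, 5, 2]},
)

# With the service running, the selected arm can be read from the response:
# result = predict("http://localhost:5000/predict", {"f1": 1, "f2": 0},
#                  {"user": 1, "items": [0, 1, 3, 7, 4, 6, 5, 2]})
# chosen_arm = result["bandit"]["arm"]
```

Keeping the selected arm from the response is useful because it has to be sent back later together with the reward.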
The /update method updates the meta-bandit oracle for online learning. After each action, send its outcome back to the meta-bandit: the context, the arm that was selected, and the reward.
{
"context": {
// context
},
"arm": "arm1",
"reward": 1
}
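To illustrate how such feedback can drive online learning, here is a minimal sketch in which a per-arm running mean of rewards stands in for the oracle. This is an assumption made for illustration: the class `RunningMeanOracle` is hypothetical, it ignores the context entirely, and the project's actual oracle is not described here.

```python
from collections import defaultdict
from typing import Any, Dict


class RunningMeanOracle:
    """Tracks the mean observed reward per arm, one feedback record at a time."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.means: Dict[str, float] = defaultdict(float)

    def update(self, feedback: Dict[str, Any]) -> None:
        # The record has the same shape as the /update body shown above.
        arm = feedback["arm"]
        reward = float(feedback["reward"])
        self.counts[arm] += 1
        # Incremental mean: no need to store past rewards.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


oracle = RunningMeanOracle()
oracle.update({"context": {"f1": 1, "f2": 0}, "arm": "arm1", "reward": 1})
oracle.update({"context": {"f1": 0, "f2": 1}, "arm": "arm1", "reward": 0})
oracle.update({"context": {"f1": 1, "f2": 1}, "arm": "arm2", "reward": 1})
```

A contextual oracle would replace the running mean with a model that predicts the reward from the context and the arm, updated one record at a time in the same way.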
We use Prometheus (prometheus.io) to monitor the meta-bandit. It exposes the performance metrics of the oracle and the execution count of each arm.
BENTOML_MetaBanditClassifier_oracle_metric_sum
BENTOML_MetaBanditClassifier_oracle_metric_count
BENTOML_MetaBanditClassifier_oracle_metric_create
BENTOML_MetaBanditClassifier_arm_total
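If the oracle metric is a Prometheus summary, as the `_sum` and `_count` suffixes suggest, its mean value is the ratio of the two. The sketch below computes it from scraped text; the parser is a simplified assumption (one `name value` pair per line, no labels or timestamps), and the sample values are made up for illustration.

```python
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Parse simple 'name value' lines, skipping blanks and '#' comment lines."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def mean_oracle_metric(samples: Dict[str, float]) -> float:
    """Mean of the summary: total of observed values divided by their count."""
    total = samples["BENTOML_MetaBanditClassifier_oracle_metric_sum"]
    count = samples["BENTOML_MetaBanditClassifier_oracle_metric_count"]
    return total / count if count else 0.0


# Made-up scrape output, for illustration only.
scraped = """
# TYPE BENTOML_MetaBanditClassifier_oracle_metric summary
BENTOML_MetaBanditClassifier_oracle_metric_sum 30.0
BENTOML_MetaBanditClassifier_oracle_metric_count 40.0
"""
mean_value = mean_oracle_metric(parse_samples(scraped))
```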
Example of arm exploration in a recommendation system:
....
>> python -m unittest tests/src/test_*
Please cite the associated paper for this work if you use this code [Paper]:
@inproceedings{santana2020contextual,
title={Contextual Meta-Bandit for Recommender Systems Selection},
author={Santana, Marlesson RO and Melo, Luckeciano C and Camargo, Fernando HF and Brand{\~a}o, Bruno and Soares, Anderson and Oliveira, Renan M and Caetano, Sandor},
booktitle={Fourteenth ACM Conference on Recommender Systems},
pages={444--449},
year={2020}
}
https://dl.acm.org/doi/10.1145/3383313.3412209
Copyright ---
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.