Add slate bandit dataset #82

fullflu · 2021-03-14T15:35:32Z

Overview

Add SyntheticSlateBanditDataset (based on SyntheticBanditDataset)

Tasks

Implement behavior policy
Test behavior policy
Implement five reward structures
Implement click models
Test reward
Fix documentation
Ignore E203 (flake8): https://black.readthedocs.io/en/stable/the_black_code_style.html#slices
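
For context on the last task: Black puts spaces around the colon in slices whose bounds are non-trivial expressions, and flake8 reports that spacing as E203 (whitespace before ':'). A small illustration, using made-up variable names that do not come from this PR:

```python
# Illustrative values only; none of these names come from the PR.
ham = list(range(10))
lower, upper, offset = 1, 4, 2

# Black's formatting for a slice with complex bounds.
# flake8 flags the space before ":" as E203 unless that check is ignored.
window = ham[lower + offset : upper + offset]

# Simple bounds are formatted without the extra spaces.
head = ham[1:4]
```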

usaito · 2021-03-17T12:30:46Z

@fullflu Thanks! I've looked through the slate dataset class and suggest some rephrasing as listed below.

impression_id -> slate_id
pscore_joint_above -> pscore_cascade
pscore_joint_all -> pscore
pscore_marginal -> pscore_item_position
action_set = np.arange(self.n_actions) -> unique_action_set = np.arange(self.n_actions)
You sometimes use get, which is ambiguous. I suggest more concrete verbs such as obtain, calc, or define.
You set reward_structure to RIPS, but RIPS is an estimator, not an assumption. I think reward_structure should be one of cascade (=RIPS), item_position (=IIPS), or None (=SIPS, which allows any complex interaction).
def self.sample_action( -> self.sample_action_and_obtain_pscore(
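
To make the three proposed pscore names concrete, here is a minimal sketch for a uniform random behavior policy that fills a slate without replacement. The function name and the returned dictionary are assumptions for illustration, not the PR's API.

```python
import numpy as np


def uniform_slate_pscores(n_unique_action: int, len_list: int) -> dict:
    """Propensities of one slate under a uniform random policy (hypothetical helper).

    Actions are drawn without replacement, one position at a time.
    """
    # Probability of the pick at each position given the earlier picks:
    # 1/n, 1/(n-1), ..., 1/(n-len_list+1).
    step_prob = 1.0 / (n_unique_action - np.arange(len_list))
    # pscore_cascade: joint probability of the actions at this position and above.
    pscore_cascade = np.cumprod(step_prob)
    # pscore: joint probability of the whole slate, repeated for every position.
    pscore = np.full(len_list, pscore_cascade[-1])
    # pscore_item_position: marginal probability of an action at a given position.
    pscore_item_position = np.full(len_list, 1.0 / n_unique_action)
    return {
        "pscore_cascade": pscore_cascade,
        "pscore": pscore,
        "pscore_item_position": pscore_item_position,
    }


pscores = uniform_slate_pscores(n_unique_action=4, len_list=3)
```

Following the mapping in the comment above, pscore_cascade would pair with RIPS, pscore_item_position with IIPS, and pscore with SIPS.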

[nits]

You alternate between np.arange and range. Unless there is a reason to use both, I suggest using only np.arange (simply because I use it in other parts of the package).
https://github.com/fullflu/zr-obp/blob/46ea0ce278cab090dc589ed69743e61eeefd63b1/obp/dataset/synthetic_slate.py#L250-L254

I think you can implement the same process as follows

# calculate joint pscore
pscore_i *= score_[sampled_action_index]
pscore_joint_above[i * self.len_list + position_] = pscore_i
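
A self-contained sketch of how the running product can serve as the joint pscore while a slate is sampled position by position. The softmax over the remaining actions, the function name, and the single-round logit vector are assumptions made for this illustration; the snippet above indexes a flat array across rounds, whereas this sketch handles one round only.

```python
import numpy as np


def sample_slate_with_joint_pscore(logit, len_list, random_state=12345):
    """Sample one slate without replacement and track the joint pscore (sketch)."""
    rng = np.random.default_rng(random_state)
    unique_action_set = np.arange(logit.shape[0])
    action = np.zeros(len_list, dtype=int)
    pscore_joint_above = np.zeros(len_list)
    pscore_i = 1.0
    for position_ in np.arange(len_list):
        # softmax over the actions that are still available
        remaining_logit = logit[unique_action_set]
        score_ = np.exp(remaining_logit - remaining_logit.max())
        score_ /= score_.sum()
        sampled_action_index = rng.choice(unique_action_set.shape[0], p=score_)
        action[position_] = unique_action_set[sampled_action_index]
        # calculate joint pscore: multiply in the probability of this pick
        pscore_i *= score_[sampled_action_index]
        pscore_joint_above[position_] = pscore_i
        # remove the chosen action so it cannot be shown twice
        unique_action_set = np.delete(unique_action_set, sampled_action_index)
    return action, pscore_joint_above


# Equal logits give a uniform policy, which makes the result easy to check by hand.
action, pscore_joint_above = sample_slate_with_joint_pscore(np.zeros(4), len_list=3)
```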


fullflu · 2021-04-03T09:53:45Z

fullflu · 2021-04-03T12:46:24Z

Error: Unable to resolve action psf/black@stable, unable to find version stable

psf/black#2079

usaito · 2021-04-03T21:08:37Z

@fullflu

Thanks! Overall, the slate stuff is great; I can't wait to see some simulation results!

Please address the following minor comments.

[must]

Please specify the output of the function (and why does random_state come first?). Also, please explain what this function does.
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L581
"slot weighted" is very confusing. How about "reward function incorporating additive interactions among combinatorial actions"?
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L601-602

[imo]

It may be better to rename it to n_unique_action. We also refer to the number of combinatorial actions as the number of actions, so we may want to distinguish these two concepts. What do you think?
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L156
I think it is better to generate exam_weight automatically in the class (e.g., $\theta(k) = 1/k$, where $k$ is the position).
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L69-L71
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L212-215
how about the following code?

        slot_weight_matrix = np.identity(len_list)
        for position_ in np.arange(len_list):
            slot_weight_matrix[:, position_] = -1 / np.exp(
                np.abs(np.arange(len_list) - position_)
            )
        return slot_weight_matrix

instead of the current one below.
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L269-L278

action_effect_matrix -> action_interaction_matrix
action_interaction_exponential_reward_function
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L709
action_interaction_matrix (the same wording as used in def action_effect_additive_reward_function())
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L714
the output should be named "expected_reward_factual"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L800-L805
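
The exam_weight suggestion above could look roughly like this. The helper name is hypothetical, and the array layout (one row per round, one column per position) is an assumption made for the example.

```python
import numpy as np


def generate_exam_weight(len_list: int) -> np.ndarray:
    """Examination weight theta(k) = 1/k for positions k = 1, ..., len_list."""
    return 1.0 / np.arange(1, len_list + 1)


exam_weight = generate_exam_weight(len_list=3)
# Toy expected rewards for 2 rounds x 3 positions (made-up numbers).
expected_reward_factual = np.array([[0.6, 0.6, 0.6], [0.3, 0.9, 0.3]])
# Broadcasting applies the same position weight to every round.
weighted = expected_reward_factual * exam_weight
```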

[ask]

I couldn't see why you are using separate words "action_effect_matrix" and "slot_weight_matrix" for the same concept. I think you should use only action_interaction_matrix for both.
I couldn't figure out why this is needed
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L263

[nits]

"... sampled based on ..."
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L64-66
the error message should be updated (now it uses "-IPS")
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L210
"Calculate the marginal propensity score, i.e., the probability that an action (specified by action_list) is presented at a position."
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L283
"Actions sampled by a behavior policy."
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L324
"When n_actions and len_list are large, giving True to this parameter may lead to a large computational time."
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L314
"When True is given, actions are sampled by the uniform random behavior policy."
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L318
"sampled_action"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L358
"impute joint pscore"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L393
"when return_exact_uniform_pscore_item_position is True, behavior_policy_function must be specified"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L495
"sample actions and calculate pscores"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L514
"behavior_policy_logit_ has an invalid shape", "expected_reward_factual has an invalid shape"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L513
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L559
Can you use np.arange(, just to be consistent?
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L380
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L572
"muptiplied by" -> "times"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L774
"Expected rewards given factual actions"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L642
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L753
action_weight -> action_interaction_weight
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L791
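
Shape checks carrying the error messages suggested above might be sketched as follows. The expected shapes (n_rounds by n_unique_action for the logits, n_rounds by len_list for the rewards) and the function name are assumptions, not taken from the PR.

```python
import numpy as np


def check_shapes(
    behavior_policy_logit_: np.ndarray,
    expected_reward_factual: np.ndarray,
    n_rounds: int,
    n_unique_action: int,
    len_list: int,
) -> None:
    """Raise ValueError when an array does not have the assumed shape."""
    if behavior_policy_logit_.shape != (n_rounds, n_unique_action):
        raise ValueError("behavior_policy_logit_ has an invalid shape")
    if expected_reward_factual.shape != (n_rounds, len_list):
        raise ValueError("expected_reward_factual has an invalid shape")
```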

fullflu · 2021-04-04T12:08:32Z

[must]

Please specify the output of the function (and why does random_state come first?). Also, please explain what this function does.
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L581
"slot weighted" is very confusing. How about "reward function incorporating additive interactions among combinatorial actions"?
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L601-602

[imo]

It may be better to rename it to n_unique_action. We also refer to the number of combinatorial actions as the number of actions, so we may want to distinguish these two concepts. What do you think?
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L156
I think it is better to generate exam_weight automatically in the class (e.g., $\theta(k) = 1/k$, where $k$ is the position).
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L69-L71
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L212-215
how about the following code?

        slot_weight_matrix = np.identity(len_list)
        for position_ in np.arange(len_list):
            slot_weight_matrix[:, position_] = -1 / np.exp(
                np.abs(np.arange(len_list) - position_)
            )
        return slot_weight_matrix

instead of the current one below.

https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L269-L278

--> Skipped (those lines are needed to generate an upper triangular matrix)

action_effect_matrix -> action_interaction_matrix
action_interaction_exponential_reward_function
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L709
action_interaction_matrix (the same wording as used in def action_effect_additive_reward_function())
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L714
the output should be named "expected_reward_factual"
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L800-L805

[ask]

I couldn't see why you are using separate words "action_effect_matrix" and "slot_weight_matrix" for the same concept. I think you should use only action_interaction_matrix for both.
I couldn't figure out why this is needed
https://github.com/fullflu/zr-obp/blob/2a8b38b19698cb847eaa202b7c53f587c2ad526b/obp/dataset/synthetic_slate.py#L263
--> This line is needed to reset the diagonal components from -1 to 1.
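
The exchange above can be checked numerically. In the sketch below, the loop is the reviewer's suggestion; the np.triu and np.fill_diagonal steps are one assumed way to obtain an upper triangular matrix with ones on the diagonal, and may differ from what the PR actually does.

```python
import numpy as np

len_list = 3

# Reviewer's suggested loop: every column is overwritten, including the diagonal.
slot_weight_matrix = np.identity(len_list)
for position_ in np.arange(len_list):
    slot_weight_matrix[:, position_] = -1 / np.exp(
        np.abs(np.arange(len_list) - position_)
    )
# The diagonal is now -1 / exp(0) = -1, and the matrix is symmetric.
diagonal_after_loop = np.diag(slot_weight_matrix).copy()

# Assumed post-processing: keep the upper triangle and reset the diagonal to 1.
upper_triangular = np.triu(slot_weight_matrix)
np.fill_diagonal(upper_triangular, 1.0)
```

Without the extra steps, the suggested loop alone leaves -1 on the diagonal and fills both triangles, which matches the two reasons given in the reply.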

[nits]

usaito · 2021-04-11T09:10:09Z

@fullflu Thanks! Some additional comments.

dataset/synthetic_slate.py

[ask]

Don't you need type hints?
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L252
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L263

[nits]

"Number of unique actions"
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L34
"... which must be one of ..."
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L50
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L205
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L209
It seems we cannot set this argument now
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L106
We no longer need the exam_weight argument
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L157
the function names should be like ...action_interaction_matrix(
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L252-L253
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L263-L264
expected_reward_factual *= self.exam_weight,
https://github.com/fullflu/zr-obp/blob/18e8958509e0ce38286c16d28af7187d4e41b1b2/obp/dataset/synthetic_slate.py#L414

dataset/test_synthetic_slate.py

[nits]

fullflu added 3 commits March 15, 2021 00:28

add slate bandit dataset

142bb20

add basic reward functions

3d6ca4e

add reward generator

46ea0ce

fullflu added 4 commits March 21, 2021 17:44

fix sample_reward_given_expected_reward; add reward structure where rewards depend on those of other slots

4d7326c

clean reward functions; add several comments

95a5440

fix reward structures; add test of slate dataset

b88b0b8

add option of calculating exact pscore marginal of random policy

e48e5de

fullflu added 2 commits April 3, 2021 19:23

apply review

4c208f4

add comment; add click models

a52d860

fullflu changed the title ~~[WIP] add slate bandit dataset~~ Add slate bandit dataset Apr 3, 2021

merge master

2a8b38b

fullflu added 9 commits April 4, 2021 21:16

add comment and fix argument order (generate_symmetic_matrix)

9a6edd9

unify slot_weight and action_effect -> action_interaction

46bcc7a

n_actions -> n_unique_action

4d96d75

result -> expected_reward_factual

247bedd

remove exam weight from initialization

c12b0e0

fix nits

a8b9179

add input validation of slate dataset

42be0a5

ignore E203

5efae77

remove TODO comment

18e8958

fullflu added 2 commits April 17, 2021 14:07

apply review of synthetic_slate

8f80615

fix bugs; apply review of test_synthetic_slate

8066412

usaito merged commit 9bd83c2 into st-tech:master Apr 18, 2021
