refactor: faster calc_ground_truth_policy_value in SyntheticSlateBanditDataset #102

aiueola · 2021-05-30T00:52:17Z

new feature

Allowed the len_list >= n_unique_actions setting when is_factorizable=True.
https://github.com/aiueola/zr-obp/blob/97ff9987716a7e1351ae30e135313a54e6e9fbbe/obp/dataset/synthetic_slate.py#L201
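
A minimal sketch of why this setting is valid, assuming a factorizable policy chooses each slot independently, so the same action may appear more than once in a slate. `sample_slate` is a hypothetical helper for illustration only (it is not part of obp), and it samples uniformly rather than from a behavior policy:

```python
import numpy as np


def sample_slate(n_unique_actions, len_list, is_factorizable, rng):
    """Hypothetical helper: draw one slate of action indices uniformly at random."""
    if not is_factorizable and len_list > n_unique_actions:
        # Without replacement, a slate cannot be longer than the action set.
        raise ValueError(
            "len_list must be <= n_unique_actions when is_factorizable=False"
        )
    # Assumption: factorizable means slots are filled independently,
    # i.e. sampling with replacement.
    return rng.choice(n_unique_actions, size=len_list, replace=is_factorizable)


rng = np.random.default_rng(0)
slate = sample_slate(n_unique_actions=3, len_list=5, is_factorizable=True, rng=rng)
```

With `is_factorizable=False` the same call raises `ValueError`, because five distinct actions cannot be drawn from three.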

bug fix

Fixed _calc_epsilon_greedy_pscore for the is_factorizable=True case.
https://github.com/aiueola/zr-obp/blob/97ff9987716a7e1351ae30e135313a54e6e9fbbe/obp/dataset/synthetic_slate.py#L1122
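
For context, one common way to define an epsilon-greedy propensity score under a factorizable policy is sketched below. This is an assumption about the general technique, not a copy of the fixed method: the function name, its signature, and the per-slot formulation are hypothetical, and the linked code is the authoritative version.

```python
import numpy as np


def epsilon_greedy_pscore_factorizable(epsilon, slate, logits):
    """Hypothetical sketch: per-slot epsilon-greedy pscores for one round.

    Assumes every slot independently picks the greedy action with probability
    (1 - epsilon) + epsilon / n_actions, and any other action with
    probability epsilon / n_actions.
    """
    n_actions = logits.shape[0]
    greedy_action = int(np.argmax(logits))
    explore_prob = epsilon / n_actions
    per_slot = np.where(
        slate == greedy_action, 1.0 - epsilon + explore_prob, explore_prob
    )
    # Cumulative product over slots; the last entry is the slate-level pscore.
    pscore_cascade = np.cumprod(per_slot)
    return per_slot, pscore_cascade, pscore_cascade[-1]


logits = np.array([0.1, 2.0, -1.0])
slate = np.array([1, 0, 1])
per_slot, cascade, joint = epsilon_greedy_pscore_factorizable(0.3, slate, logits)
```

Under this formulation the slate-level pscores of all n_actions ** len_list possible slates sum to one, which is a convenient sanity check for a test.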

refactor

Implemented a faster calc_ground_truth_policy_value by avoiding for loops over n_rounds.
- Refactored the pscore calculation when is_factorizable=True.
  https://github.com/aiueola/zr-obp/blob/f126964583c8d265eeb323301874a8060fbddd28/obp/dataset/synthetic_slate.py#L828
- Refactored action_interaction_reward_function (used to calculate expected_reward_factual) to use batch processing.
  https://github.com/aiueola/zr-obp/blob/f126964583c8d265eeb323301874a8060fbddd28/obp/dataset/synthetic_slate.py#L852
Unified action_interaction_additive_reward_function and action_interaction_decay_reward_function into action_interaction_reward_function, both to make the implementation faster and to prevent memory errors.
https://github.com/aiueola/zr-obp/blob/f126964583c8d265eeb323301874a8060fbddd28/obp/dataset/synthetic_slate.py#L1162
Added tqdm for the pscore calculation when is_factorizable=False.
https://github.com/aiueola/zr-obp/blob/f126964583c8d265eeb323301874a8060fbddd28/obp/dataset/synthetic_slate.py#L838
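
The idea of replacing the loop over n_rounds with batch processing can be illustrated as follows. This is a sketch only, assuming a factorizable softmax policy whose slate-level pscore is the product of per-slot action probabilities. `pscore_loop` and `pscore_batch` are hypothetical names and are not functions in obp:

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pscore_loop(logits, slates):
    """Baseline: one Python-level iteration per round."""
    n_rounds = slates.shape[0]
    out = np.empty(n_rounds)
    for i in range(n_rounds):
        probs = softmax(logits[i : i + 1])[0]
        out[i] = np.prod(probs[slates[i]])
    return out


def pscore_batch(logits, slates):
    """Batched: gather the chosen-action probabilities for all rounds at once."""
    probs = softmax(logits)  # shape (n_rounds, n_unique_actions)
    chosen = np.take_along_axis(probs, slates, axis=1)  # shape (n_rounds, len_list)
    return chosen.prod(axis=1)


rng = np.random.default_rng(12345)
n_rounds, n_unique_actions, len_list = 1000, 10, 5
logits = rng.normal(size=(n_rounds, n_unique_actions))
slates = rng.integers(0, n_unique_actions, size=(n_rounds, len_list))
assert np.allclose(pscore_loop(logits, slates), pscore_batch(logits, slates))
```

Both functions return the same values; the batched version simply moves the per-round work into NumPy array operations.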

result

Setting: len_list=5, n_unique_actions=10, n_rounds=1000, is_factorizable=True
pscore when is_factorizable=True: 1.5min
expected_reward_factual:
- before: standard_additive = 7.5min, cascade_additive = 7min, standard_decay = 17min, cascade_decay = 17min, independent = 17min
- after: standard_additive = 7min, cascade_additive = 6min, standard_decay = 8.5min, cascade_decay = 6.5min, independent = 4min
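
For readers who want the relative improvement, the speedup per reward structure can be derived from the timings above. The snippet only restates the reported numbers (in minutes); the variable names are illustrative:

```python
# Reported timings in minutes, copied from the before/after lines above.
before = {
    "standard_additive": 7.5,
    "cascade_additive": 7.0,
    "standard_decay": 17.0,
    "cascade_decay": 17.0,
    "independent": 17.0,
}
after = {
    "standard_additive": 7.0,
    "cascade_additive": 6.0,
    "standard_decay": 8.5,
    "cascade_decay": 6.5,
    "independent": 4.0,
}

# Speedup factor = time before / time after.
speedup = {name: before[name] / after[name] for name in before}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

The gain is largest for the decay and independent structures and small for the additive ones.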

test

Added and edited the corresponding tests.

others

Minor fixes to typos and docstrings.

usaito · 2021-05-30T04:04:35Z

@aiueola Thanks!

[imo]

I think action_interaction_reward_function should take action and enumerated_action as separate arguments (is_enumerated could then be removed). That's because enumerated_action is not sampled from a probability distribution, so the two are conceptually different.
https://github.com/aiueola/zr-obp/blob/b20d8158b67683c0a479ff8fe8fe6451a2a132c8/obp/dataset/synthetic_slate.py#L1171

Some minor points are below.

The description of the function ("Reward function incorporating additive interactions among combinatorial action") should be updated.
https://github.com/aiueola/zr-obp/blob/b20d8158b67683c0a479ff8fe8fe6451a2a132c8/obp/dataset/synthetic_slate.py#L1175
By "If not is_enumerated=False", do you mean "If is_enumerated=True"? If so, please avoid indirect expressions as much as possible.
https://github.com/aiueola/zr-obp/blob/b20d8158b67683c0a479ff8fe8fe6451a2a132c8/obp/dataset/synthetic_slate.py#L1187
How about

is_additive = reward_structure in ["standard_additive", "cascade_additive"]
is_cascade = reward_structure in ["cascade_additive", "cascade_decay"]

instead of
https://github.com/aiueola/zr-obp/blob/b20d8158b67683c0a479ff8fe8fe6451a2a132c8/obp/dataset/synthetic_slate.py#L1263
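
To make the suggestion concrete, here is a small sketch of how the two flags map onto the five reward structures mentioned in this PR. `reward_structure_flags` is a hypothetical helper for illustration, not a function in obp:

```python
REWARD_STRUCTURES = [
    "standard_additive",
    "cascade_additive",
    "standard_decay",
    "cascade_decay",
    "independent",
]


def reward_structure_flags(reward_structure):
    """Hypothetical helper: derive (is_additive, is_cascade) from the name."""
    if reward_structure not in REWARD_STRUCTURES:
        raise ValueError(f"unknown reward_structure: {reward_structure!r}")
    is_additive = reward_structure in ["standard_additive", "cascade_additive"]
    is_cascade = reward_structure in ["cascade_additive", "cascade_decay"]
    return is_additive, is_cascade


flags = {name: reward_structure_flags(name) for name in REWARD_STRUCTURES}
```

Note that "independent" yields (False, False), the same pair as "standard_decay", so that case still needs its own check.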

aiueola added 13 commits May 29, 2021 09:45

tmp

5fa5758

rm conflict

f9c8e6c

action_interaction_reward_function

905bee7

fix

f6525d1

fix

0e0f62f

black

ec9aa51

batch processing for calc_ground_truth_policy_value

c95125a

test

32cf231

fix test

0a0131b

minor fix

b726d84

faster calc_ground_truth_policy_value

586f1d4

minor fix

f126964

minor fix

b20d815

aiueola added 2 commits May 30, 2021 14:04

fix

979cf48

bug fix

97ff998

usaito merged commit 3cb08b5 into st-tech:master May 30, 2021
