Hello,
In "ucb1.py" in the "rl" folder, which solves the bandit problem, what is the point of using n (the total number of plays) in the UCB formula below?
mean + np.sqrt(2*np.log(n) / nj)
I tested the following two formulas (without n) instead, and they worked fine:
mean + np.sqrt(2 / nj)
and even
mean + (1 / nj)
I also tested them with different total numbers of plays, and the agents' final results were very similar.
I would be grateful if you could elaborate on the role of n in the formula.
Best,
Parnia
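
For reference, here is a minimal sketch of how the three bonus terms in the question behave for an arm whose play count nj stays fixed while the total play count n grows. The function names are hypothetical and are not taken from ucb1.py. The values of n and nj are arbitrary illustrations.

```python
import numpy as np

def ucb1_bonus(n, nj):
    # Bonus from the formula in the question: sqrt(2 * ln(n) / nj)
    return np.sqrt(2 * np.log(n) / nj)

def sqrt_bonus(nj):
    # First alternative from the question: no dependence on n
    return np.sqrt(2 / nj)

def inverse_bonus(nj):
    # Second alternative from the question: no dependence on n
    return 1 / nj

nj = 10  # assumed: an arm played 10 times and then not selected again
for n in (100, 10_000, 1_000_000):
    print(n, ucb1_bonus(n, nj), sqrt_bonus(nj), inverse_bonus(nj))
```

With nj held fixed, only the log(n) term changes between rows: it keeps growing as n grows, while the two alternatives stay constant for that arm.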