
Blackjack uses np_random.choice() in a way that creates order-of-mag slowdown #1130

Closed
BStudent opened this issue Aug 16, 2018 · 3 comments


BStudent commented Aug 16, 2018

WHAT TO DO

Change the draw_card(...) function to the revised form shown, which greatly speeds up the Blackjack env:

Proposed change

The commented-out portion is the existing / unmodified code:

def draw_card(np_random):
    # Replacement: index the deck with a uniform draw from the env's own
    # (seeded) np_random RandomState, so results stay reproducible:
    return deck[int(len(deck) * np_random.random_sample())]   # <-- replacement code
    # return int(np_random.choice(deck))  # <-- existing code

Why?

In concrete quantitative terms, the example provided here shows that replacing np_random.choice(...) with an equally simple expression yields a 27x speedup of the random draw and, for this example program, a 4x overall speedup.

Researching on Stack Overflow shows the issue is known and appears in several posts in various forms:
np_random.choice() - presumably derived from numpy.random.choice() - is not terribly efficient in the first place, ranging from a complete bottleneck to only moderately inefficient even under the ideal conditions of large vectorized operations.
Used as it is in the blackjack.py::draw_card function of OpenAI Gym, it is astonishingly slow: by a factor of 27x, to be specific.
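The claim above can be checked with a quick microbenchmark (my own sketch, not from the issue); exact timings vary by machine and NumPy version, and the deck below mirrors the 13-card deck in blackjack.py (ace = 1, face cards = 10):

```python
import timeit

import numpy as np

# 13-card deck as in blackjack.py (ace = 1, face cards counted as 10)
deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
rs = np.random.RandomState(0)  # stand-in for the env's np_random

n = 20_000
t_choice = timeit.timeit(lambda: int(rs.choice(deck)), number=n)
t_sample = timeit.timeit(lambda: deck[int(len(deck) * rs.random_sample())],
                         number=n)

print(f"choice:        {t_choice:.3f}s")
print(f"random_sample: {t_sample:.3f}s ({t_choice / t_sample:.1f}x faster)")
```

The gap comes almost entirely from choice() converting the Python list to an ndarray on every call.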

Example

When implementing a blackjack player with a fixed strategy (provided by Udacity), I noticed that neither the server nor my local machine was getting the performance I'd expect. I ran cProfile on a toy (text) problem for 50,000 iterations:

if DO_Part_1:
        
    doProf = False
    if doProf:
        print('profiling')
        niter = 50000
        cProfile.run('mc_prediction_qp(env, niter, generate_episode_from_limit_stochastic)', 'mc_prof_stats2')
        print('done profiling')
        exit()

Profile with original code using choice (sorted by time-in-function, excluding time-in-children):

Of the 6.56 seconds the code ran, the calls to choice(...) take 3.92 seconds:

         3188536 function calls in 6.553 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   355579    3.924    0.000    4.040    0.000 {method 'choice' of 'mtrand.RandomState' objects}
        1    0.556    0.556    6.553    6.553 <ipython-input-1-b79eda292fd7>:85(mc_prediction_q)
    79215    0.289    0.000    1.771    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:91(step)
    50000    0.254    0.000    5.980    0.000 <ipython-input-1-b79eda292fd7>:63(generate_episode_from_limit_stochastic)
    79215    0.232    0.000    0.275    0.000 /home/bstudent/Develop/der-gym-meister/gym/spaces/discrete.py:16(contains)
   344365    0.176    0.000    0.381    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:24(sum_hand)
   276364    0.162    0.000    1.920    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:12(draw_card)
   129215    0.138    0.000    0.304    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:110(_get_obs)
   473580    0.121    0.000    0.147    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:20(usable_ace)
   449006    0.116    0.000    0.116    0.000 {built-in method builtins.sum}
    79215    0.096    0.000    0.096    0.000 {built-in method numpy.core.multiarray.arange}
    79215    0.093    0.000    0.116    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/numpy/core/getlimits.py:376(__new__)
    50000    0.092    0.000    1.565    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:113(reset)
   100000    0.076    0.000    1.349    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:16(draw_hand)
   108617    0.047    0.000    0.177    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:30(is_bust)
   158434    0.043    0.000    0.043    0.000 {built-in method builtins.isinstance}
    58804    0.035    0.000    0.155    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:34(score)
   155579    0.034    0.000    0.034    0.000 {method 'append' of 'list' objects}
    29402    0.029    0.000    0.029    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:5(cmp)
    79215    0.023    0.000    0.023    0.000 {method 'get' of 'dict' objects}
    50000    0.012    0.000    0.012    0.000 {built-in method builtins.len}
     1680    0.002    0.000    0.002    0.000 {built-in method numpy.core.multiarray.zeros}
       20    0.001    0.000    0.001    0.000 {method 'acquire' of '_thread.lock' objects}

After changing the code as shown in the Appendix: 27x local and 4x total-program speedup

The random_sample() function effectively replaces np_random.choice() and now appears as the fifth-ranked time consumer, taking 0.15 sec for the same number of calls that took 3.92 sec with choice(...) (about a 27x speedup).

         2871036 function calls in 1.658 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.323    0.323    1.657    1.657 <ipython-input-3-68851fc02b73>:88(mc_prediction_q)
    79192    0.188    0.000    0.747    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:92(step)
   343982    0.156    0.000    0.315    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:25(sum_hand)
   276395    0.155    0.000    0.269    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:13(draw_card)
   355587    0.147    0.000    0.147    0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
    50000    0.125    0.000    1.323    0.000 <ipython-input-3-68851fc02b73>:63(generate_episode_from_limit_stochastic)
   473174    0.097    0.000    0.118    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:21(usable_ace)
   449463    0.093    0.000    0.093    0.000 {built-in method builtins.sum}
   129192    0.088    0.000    0.238    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:111(_get_obs)
    50000    0.064    0.000    0.407    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:114(reset)
   100000    0.055    0.000    0.250    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:17(draw_hand)
   108534    0.037    0.000    0.138    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:31(is_bust)
    79192    0.035    0.000    0.047    0.000 /home/bstudent/Develop/der-gym-meister/gym/spaces/discrete.py:16(contains)
    58684    0.029    0.000    0.143    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:35(score)
   155587    0.022    0.000    0.022    0.000 {method 'append' of 'list' objects}
    29342    0.019    0.000    0.019    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:6(cmp)
    79196    0.012    0.000    0.012    0.000 {built-in method builtins.isinstance}
    50000    0.007    0.000    0.007    0.000 {built-in method builtins.len}
     1680    0.002    0.000    0.002    0.000 {built-in method numpy.core.multiarray.zeros}
       20    0.001    0.000    0.001    0.000 {method 'acquire' of '_thread.lock' objects}
        1    0.000    0.000    1.657    1.657 <string>:1(<module>)
      280    0.000    0.000    0.001    0.000 <ipython-input-3-68851fc02b73>:91(<lambda>)
      280    0.000    0.000    0.001    0.000 <ipython-input-3-68851fc02b73>:96(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:90(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:97(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:98(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:92(<lambda>)
       10    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/zmq/sugar/socket.py:333(send)
       10    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/ipykernel/iostream.py:195(schedule)
        1    0.000    0.000    1.658    1.658 {built-in method builtins.exec}
       12    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/threading.py:1104(is_alive)
        2    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/threading.py:215(__init__)
        4    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/ipykernel/iostream.py:366(write)

APPENDIX: profiled code (calculates the state-value function Q for a fixed strategy):

def generate_episode_from_limit_stochastic(bj_env):
    episode = []
    state = bj_env.reset()
    while True:
        # Very slow code using np.random.choice(...) is commented out:
        # probs = [0.8, 0.2] if state[0] > 18 else [0.2, 0.8]
        # action = np.random.choice(np.arange(2), p=probs)
        prob0 = 0.8 if state[0] > 18 else 0.2
        action = 0 + (np.random.random_sample() >= prob0)

        next_state, reward, done, info = bj_env.step(action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode
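As a sanity check (my own, not part of the issue), the threshold form above can be verified to sample the same 0/1 action distribution as the commented-out choice call, here in vectorized form for speed:

```python
import numpy as np

rs = np.random.RandomState(42)
prob0 = 0.8  # probability of action 0 (the state[0] > 18 branch)

# Vectorized equivalent of the per-step draw:
#   action = 0 + (rs.random_sample() >= prob0)
n = 100_000
actions = (rs.random_sample(n) >= prob0).astype(int)
freq1 = actions.mean()  # empirical P(action == 1); should be close to 0.2

print(f"P(action=1) ~= {freq1:.4f}")
```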

def mc_prediction_qp(env, num_episodes, generate_episode, gamma=1.0):
    returns_sum = defaultdict(lambda: np.zeros(env.action_space.n))
    N = defaultdict(lambda: np.zeros(env.action_space.n))
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    
    for i_episode in range(1, num_episodes+1):
        if i_episode % 25000 == 0:
            print("\rEpisode {}/{}.".format(i_episode, num_episodes), end="")
            sys.stdout.flush()

        States_, Actions_, Returns_ = zip(* generate_episode(env))
        R_ = Returns_[-1]   # NOTE: all steps get the un-discounted final state return as their value
        Turns = len(States_)
        for T_ in range(Turns):
            S_t                     = States_[T_]
            A_t                     = Actions_[T_]
            N[S_t][A_t]             = N[S_t][A_t] + 1
            returns_sum[S_t][A_t]   = returns_sum[S_t][A_t] + R_
   
    for S_ in N.keys():
        Nnot0           = N[S_]!=0
        Q[S_][Nnot0]  =  returns_sum[S_][Nnot0] / N[S_][Nnot0]
   
    return Q
@christopherhesse
Contributor

Thanks for the writeup! If anyone wants to make a PR with something like deck[int(len(deck)*np.random.random_sample())] in there, we could review that.

@christopherhesse
Contributor

Going to close this for now due to lack of activity, but a PR would be welcome.

@rhalbersma

The inefficiency lies not in np_random.choice() itself but in the conversion of the Python list deck into a NumPy array on every call. And the fix is not to use random_sample() and convert its floating-point result to an int on a 13-point integer scale, but rather to use randint() over the range 0..13 (exclusive). See e.g. https://stackoverflow.com/questions/29574605/performance-of-choice-vs-randint

Alternatively, the deck could be created directly as a NumPy array, which avoids the conversion altogether.
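Both suggestions can be sketched as follows (my sketch, assuming a RandomState-like np_random and the 13-card deck from blackjack.py):

```python
import numpy as np

deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
np_deck = np.array(deck)  # convert once at module load, not on every draw

def draw_card_randint(np_random):
    # randint avoids both the float round-trip and the list->array conversion
    return deck[np_random.randint(len(deck))]

def draw_card_np_deck(np_random):
    # choice on a pre-built ndarray skips the per-call conversion
    return int(np_random.choice(np_deck))

rs = np.random.RandomState(0)
print(draw_card_randint(rs), draw_card_np_deck(rs))
```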
