
Blackjack uses np_random.choice() in a way that creates order-of-mag slowdown #1130

Closed
BStudent opened this issue Aug 16, 2018 · 3 comments


BStudent commented Aug 16, 2018

WHAT TO DO

Change the draw_card(...) function to the revised form shown, which greatly speeds up the Blackjack env:

Proposed change

The commented-out portion is the existing / unmodified code:

def draw_card(np_random):
    # Replacement: index the deck with a uniform draw from the env's own
    # (seeded) np_random RandomState, so results stay reproducible:
    return deck[int(len(deck) * np_random.random_sample())]   # <-- replacement code
    # return int(np_random.choice(deck))  # <-- existing code

Why?

In concrete quantitative terms, the example provided here shows that replacing np_random.choice(...) with an equally simple expression yields a 27x speedup of the random draw and, for this example program, a 4x overall speedup.

Researching on Stack Overflow shows the issue is known and appears in several posts in various forms:
np_random.choice() - presumably derived from numpy.random.choice() - is not terribly efficient in the first place, ranging from a complete bottleneck to only moderately inefficient even under the ideal conditions of large vectorized operations.
Used as it is in the blackjack.py::draw_card function of OpenAI Gym, it is astonishingly slow: by a factor of 27x, to be specific.
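The claim above can be checked with a quick microbenchmark (my own sketch, not from the issue); exact timings vary by machine and NumPy version, and the deck below mirrors the 13-card deck in blackjack.py (ace = 1, face cards = 10):

```python
import timeit

import numpy as np

# 13-card deck as in blackjack.py (ace = 1, face cards counted as 10)
deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
rs = np.random.RandomState(0)  # stand-in for the env's np_random

n = 20_000
t_choice = timeit.timeit(lambda: int(rs.choice(deck)), number=n)
t_sample = timeit.timeit(lambda: deck[int(len(deck) * rs.random_sample())],
                         number=n)

print(f"choice:        {t_choice:.3f}s")
print(f"random_sample: {t_sample:.3f}s ({t_choice / t_sample:.1f}x faster)")
```

The gap comes almost entirely from choice() converting the Python list to an ndarray on every call.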

Example

When implementing a blackjack player with a fixed strategy (provided by Udacity), I noticed that neither the server nor my local machine was getting the performance I'd expect. I ran cProfile on a toy (text) problem for 50,000 iterations:

if DO_Part_1:
        
    doProf = False
    if doProf:
        print('profiling')
        niter = 50000
        cProfile.run('mc_prediction_qp(env, niter, generate_episode_from_limit_stochastic)', 'mc_prof_stats2')
        print('done profiling')
        exit()

Profile with original code using choice (sorted by time-in-function, excluding time-in-children):

Of the 6.56 seconds the code ran, the calls to choice(...) take 3.92 seconds:

         3188536 function calls in 6.553 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   355579    3.924    0.000    4.040    0.000 {method 'choice' of 'mtrand.RandomState' objects}
        1    0.556    0.556    6.553    6.553 <ipython-input-1-b79eda292fd7>:85(mc_prediction_q)
    79215    0.289    0.000    1.771    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:91(step)
    50000    0.254    0.000    5.980    0.000 <ipython-input-1-b79eda292fd7>:63(generate_episode_from_limit_stochastic)
    79215    0.232    0.000    0.275    0.000 /home/bstudent/Develop/der-gym-meister/gym/spaces/discrete.py:16(contains)
   344365    0.176    0.000    0.381    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:24(sum_hand)
   276364    0.162    0.000    1.920    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:12(draw_card)
   129215    0.138    0.000    0.304    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:110(_get_obs)
   473580    0.121    0.000    0.147    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:20(usable_ace)
   449006    0.116    0.000    0.116    0.000 {built-in method builtins.sum}
    79215    0.096    0.000    0.096    0.000 {built-in method numpy.core.multiarray.arange}
    79215    0.093    0.000    0.116    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/numpy/core/getlimits.py:376(__new__)
    50000    0.092    0.000    1.565    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:113(reset)
   100000    0.076    0.000    1.349    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:16(draw_hand)
   108617    0.047    0.000    0.177    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:30(is_bust)
   158434    0.043    0.000    0.043    0.000 {built-in method builtins.isinstance}
    58804    0.035    0.000    0.155    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:34(score)
   155579    0.034    0.000    0.034    0.000 {method 'append' of 'list' objects}
    29402    0.029    0.000    0.029    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:5(cmp)
    79215    0.023    0.000    0.023    0.000 {method 'get' of 'dict' objects}
    50000    0.012    0.000    0.012    0.000 {built-in method builtins.len}
     1680    0.002    0.000    0.002    0.000 {built-in method numpy.core.multiarray.zeros}
       20    0.001    0.000    0.001    0.000 {method 'acquire' of '_thread.lock' objects}

After changing the code as shown in the Appendix: 27x local and 4x total-program speedup

The random_sample() function effectively replaces np_random.choice() and now appears as the fifth-ranked time consumer, taking 0.15 sec for the same number of calls that took 3.92 sec with choice(...) (about a 27x speedup).

         2871036 function calls in 1.658 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.323    0.323    1.657    1.657 <ipython-input-3-68851fc02b73>:88(mc_prediction_q)
    79192    0.188    0.000    0.747    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:92(step)
   343982    0.156    0.000    0.315    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:25(sum_hand)
   276395    0.155    0.000    0.269    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:13(draw_card)
   355587    0.147    0.000    0.147    0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
    50000    0.125    0.000    1.323    0.000 <ipython-input-3-68851fc02b73>:63(generate_episode_from_limit_stochastic)
   473174    0.097    0.000    0.118    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:21(usable_ace)
   449463    0.093    0.000    0.093    0.000 {built-in method builtins.sum}
   129192    0.088    0.000    0.238    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:111(_get_obs)
    50000    0.064    0.000    0.407    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:114(reset)
   100000    0.055    0.000    0.250    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:17(draw_hand)
   108534    0.037    0.000    0.138    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:31(is_bust)
    79192    0.035    0.000    0.047    0.000 /home/bstudent/Develop/der-gym-meister/gym/spaces/discrete.py:16(contains)
    58684    0.029    0.000    0.143    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:35(score)
   155587    0.022    0.000    0.022    0.000 {method 'append' of 'list' objects}
    29342    0.019    0.000    0.019    0.000 /home/bstudent/Develop/der-gym-meister/gym/envs/toy_text/blackjack.py:6(cmp)
    79196    0.012    0.000    0.012    0.000 {built-in method builtins.isinstance}
    50000    0.007    0.000    0.007    0.000 {built-in method builtins.len}
     1680    0.002    0.000    0.002    0.000 {built-in method numpy.core.multiarray.zeros}
       20    0.001    0.000    0.001    0.000 {method 'acquire' of '_thread.lock' objects}
        1    0.000    0.000    1.657    1.657 <string>:1(<module>)
      280    0.000    0.000    0.001    0.000 <ipython-input-3-68851fc02b73>:91(<lambda>)
      280    0.000    0.000    0.001    0.000 <ipython-input-3-68851fc02b73>:96(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:90(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:97(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:98(<lambda>)
      280    0.000    0.000    0.000    0.000 <ipython-input-3-68851fc02b73>:92(<lambda>)
       10    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/zmq/sugar/socket.py:333(send)
       10    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/ipykernel/iostream.py:195(schedule)
        1    0.000    0.000    1.658    1.658 {built-in method builtins.exec}
       12    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/threading.py:1104(is_alive)
        2    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/threading.py:215(__init__)
        4    0.000    0.000    0.000    0.000 /home/bstudent/anaconda3/envs/tfgpuV18/lib/python3.6/site-packages/ipykernel/iostream.py:366(write)

APPENDIX: profiled code (calculates the state-value function Q for a fixed strategy):

def generate_episode_from_limit_stochastic(bj_env):
    episode = []
    state = bj_env.reset()
    while True:
        # Very slow code using np.random.choice(...) is commented out:
        # probs = [0.8, 0.2] if state[0] > 18 else [0.2, 0.8]
        # action = np.random.choice(np.arange(2), p=probs)
        prob0 = 0.8 if state[0] > 18 else 0.2
        action = 0 + (np.random.random_sample() >= prob0)

        next_state, reward, done, info = bj_env.step(action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode
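As a sanity check (my own, not part of the issue), the threshold form above can be verified to sample the same 0/1 action distribution as the commented-out choice call, here in vectorized form for speed:

```python
import numpy as np

rs = np.random.RandomState(42)
prob0 = 0.8  # probability of action 0 (the state[0] > 18 branch)

# Vectorized equivalent of the per-step draw:
#   action = 0 + (rs.random_sample() >= prob0)
n = 100_000
actions = (rs.random_sample(n) >= prob0).astype(int)
freq1 = actions.mean()  # empirical P(action == 1); should be close to 0.2

print(f"P(action=1) ~= {freq1:.4f}")
```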

def mc_prediction_qp(env, num_episodes, generate_episode, gamma=1.0):
    returns_sum = defaultdict(lambda: np.zeros(env.action_space.n))
    N = defaultdict(lambda: np.zeros(env.action_space.n))
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    
    for i_episode in range(1, num_episodes+1):
        if i_episode % 25000 == 0:
            print("\rEpisode {}/{}.".format(i_episode, num_episodes), end="")
            sys.stdout.flush()

        States_, Actions_, Returns_ = zip(* generate_episode(env))
        R_ = Returns_[-1]   # NOTE: all steps get the un-discounted final state return as their value
        Turns = len(States_)
        for T_ in range(Turns):
            S_t                     = States_[T_]
            A_t                     = Actions_[T_]
            N[S_t][A_t]             = N[S_t][A_t] + 1
            returns_sum[S_t][A_t]   = returns_sum[S_t][A_t] + R_
   
    for S_ in N.keys():
        Nnot0           = N[S_]!=0
        Q[S_][Nnot0]  =  returns_sum[S_][Nnot0] / N[S_][Nnot0]
   
    return Q
@christopherhesse
Contributor

Thanks for the writeup! If anyone wants to make a PR with something like deck[int(len(deck)*np.random.random_sample())] in there, we could review that.

@christopherhesse
Contributor

Going to close this for now due to lack of activity, but a PR would be welcome.

@rhalbersma

The inefficiency lies not in np_random.choice() itself but in the conversion of the Python list deck into a NumPy array on every call. And the fix is not to use random_sample() and convert its floating-point result to an int on a 13-point integer scale, but rather to use randint() over the range 0..13 (exclusive). See e.g. https://stackoverflow.com/questions/29574605/performance-of-choice-vs-randint

Alternatively, the deck could be created directly as a NumPy array, which avoids the conversion altogether.
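Both suggestions can be sketched as follows (my sketch, assuming a RandomState-like np_random and the 13-card deck from blackjack.py):

```python
import numpy as np

deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
np_deck = np.array(deck)  # convert once at module load, not on every draw

def draw_card_randint(np_random):
    # randint avoids both the float round-trip and the list->array conversion
    return deck[np_random.randint(len(deck))]

def draw_card_np_deck(np_random):
    # choice on a pre-built ndarray skips the per-call conversion
    return int(np_random.choice(np_deck))

rs = np.random.RandomState(0)
print(draw_card_randint(rs), draw_card_np_deck(rs))
```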
