
Minsize argument for Dict.empty() #8110

Closed
stefanfed opened this issue May 25, 2022 · 9 comments · Fixed by #8183
Labels
feature_request, performance - run time (Performance issue occurring at run time)

Comments

@stefanfed
Contributor

Feature request

Being able to allocate memory for a Dict based on problem-specific information would minimize resizes and increase performance. A Dict is initialized by a call to the function numba_dict_new_minsize in dictobject.c. The starting number of buckets is fixed at D_MINSIZE = 8, which only allows for 5 entries without a resize.

The code comments state that this is suitable for the common case of a small dictionary used for passing keyword arguments. I believe an optional size argument would give users the ability to achieve significantly better performance with large hash tables. This would be in line with Numba's focus on fast numerical computation.

I've tried to modify Numba so that I could pass a size argument to Dict.empty() and have it call numba_dict_new instead of numba_dict_new_minsize. Unfortunately, I haven't been successful so far. Please consider adding such an option in a future release. Thank you.
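To illustrate, here is roughly what I have in mind (the parameter name below is just a placeholder, and the commented-out call shows the proposed form):

from numba import types
from numba.typed import Dict

# Current behaviour: the dictionary always starts with D_MINSIZE (8) buckets,
# so inserting more than 5 entries already triggers at least one resize.
d = Dict.empty(key_type=types.int64, value_type=types.float64)

# Proposed behaviour (placeholder keyword): pre-size the table for the
# expected number of entries so the insertion loop below never resizes.
# d = Dict.empty(types.int64, types.float64, n_keys=100_000)
for i in range(100_000):
    d[i] = float(i)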

@gmarkall
Member

Many thanks for the feature request, and for identifying the required changes. It sounds like you've made a good attempt at making the change so far - would you like to push your branch up somewhere and perhaps I can suggest how to complete the implementation?

@gmarkall added the "performance - run time" label on May 31, 2022
@gmarkall
Member

Noting from the triage meeting that a similar request is regularly made for typed lists as well.

@stefanfed
Contributor Author

Apologies for the delay in responding. I'm working on uploading the (unsuccessful) changes I made to a fork.

I believe typed lists have this feature under the empty_list method. The argument is called allocated.
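For example (going by the typed list API as I understand it):

from numba import types
from numba.typed import List

# Pre-allocate space for 1024 items up front; `allocated` is the existing
# pre-allocation hint mentioned above.
l = List.empty_list(types.int64, allocated=1024)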

@stefanfed reopened this on May 31, 2022
stefanfed added a commit to stefanfed/numba that referenced this issue Jun 10, 2022
@stefanfed
Contributor Author

Hi @gmarkall,

I've pushed my changes to a branch at https://github.com/stefanfed/numba/tree/dict_empty_allocated

This implementation of Dict.empty() works in both object and nopython modes, but only when the optional argument allocated is left at its default of None (or omitted); in that case it appears to behave the same as before the changes. When an integer is provided, it prints the following error message and crashes the interpreter:

>>> Dict.empty(nb.int64, nb.int64, allocated=8)
LLVM ERROR: Symbol not found: numba_dict_new

I haven't figured out how these functions are linked into LLVM from dictobject.h. It seems numba_dict_new is exported in the same way as numba_dict_new_minsize, based on the preceding NUMBA_EXPORT_FUNC(int) macro, though I'm not sure what that macro does exactly. Most likely, changes are needed in another file for LLVM to recognize numba_dict_new. My biggest problem is that my understanding of how Numba works with LLVM internally is lacking.

The other problem is that allocated must be a power of 2 according to the notes in dictobject.h. I'm unsure what the proper behavior would be if a number that isn't a power of 2 is passed. I'd suggest rounding it up to the nearest power of 2. However, it could also be rounded down, or it could fail, leaving the check up to the user.

I'm also wondering what the ideal method of rounding up to the next power of 2 would be, and at what stage to do it. Perhaps it should be done in the C implementation?

I'd appreciate your help in completing this. Thanks.

@gmarkall
Member

Hi @stefanfed - many thanks for pushing your branch!

Regarding this problem:

>>> Dict.empty(nb.int64, nb.int64, allocated=8)
LLVM ERROR: Symbol not found: numba_dict_new

The numba_dict_new function needs to be added to LLVM's symbol table. This is done using llvmlite.binding.add_symbol(). One place in Numba where a collection of these are added is in the _load_global_helpers() function:

numba/core/base.py (lines 155 to 158 in 2f89d45):

for c_helpers in (_helperlib.c_helpers, _dynfunc.c_helpers):
    for py_name, c_address in c_helpers.items():
        c_name = "numba_" + py_name
        ll.add_symbol(c_name, c_address)

The c_helpers dicts of these modules are constructed using a function with a macro called declmethod (for _helperlib, for example):

numba/_helpermod.c (lines 28 to 49 in 2f89d45):

static PyObject *
build_c_helpers_dict(void)
{
    PyObject *dct = PyDict_New();
    if (dct == NULL)
        goto error;

#define _declpointer(name, value) do {                 \
    PyObject *o = PyLong_FromVoidPtr(value);           \
    if (o == NULL) goto error;                         \
    if (PyDict_SetItemString(dct, name, o)) {          \
        Py_DECREF(o);                                  \
        goto error;                                    \
    }                                                  \
    Py_DECREF(o);                                      \
} while (0)

#define declmethod(func) _declpointer(#func, &numba_##func)
#define declpointer(ptr) _declpointer(#ptr, &numba_##ptr)

    declmethod(fixed_fmod);

In order to make this work for the numba_dict_new method, I would suggest adding a similar construct that builds a helpers dict for typed dict methods containing the numba_dict_new method, and adding it to the list of c_helpers dicts that get initialized in _load_global_helpers().
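As a rough sketch of the direction (exact placement to be worked out), the entry itself would follow the same pattern as the existing ones:

/* Inside a build_c_helpers_dict()-style function (placement illustrative):
 * declmethod(func) stores &numba_##func under the name "func", and
 * _load_global_helpers() then registers it via ll.add_symbol() as
 * "numba_" + name, which is what resolves the missing numba_dict_new
 * symbol at run time. */
declmethod(dict_new);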

It seems numba_dict_new has been exported in the same way as numba_dict_new_minsize based on the preceding NUMBA_EXPORT_FUNC(int) macro. I'm not sure what that does exactly, to be honest.

Basically it just adds __attribute__ ((visibility("hidden"))) on Unix-like platforms so that the symbols are defined but not exported by default (I believe this is already the default on Windows anyway).
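In other words, it's roughly equivalent to something like this (a sketch based on the description above, not the exact definition in Numba's headers):

/* Sketch: hidden visibility on Unix-like platforms, plain declaration
 * elsewhere (hidden is effectively the default on Windows). */
#if defined(_MSC_VER)
#define NUMBA_EXPORT_FUNC(_rettype) _rettype
#else
#define NUMBA_EXPORT_FUNC(_rettype) __attribute__ ((visibility("hidden"))) _rettype
#endif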

The other problem is that allocated must be a power of 2 according to the notes in dictobject.h. I'm unsure what the proper behavior would be if a number that isn't a power of 2 is passed. I'd suggest rounding it up to the nearest power of 2. However, it could also be rounded down, or it could fail, leaving the check up to the user.

I think rounding up and documenting somewhere that the behaviour is to round up would be acceptable.

I'm also wondering what the ideal method rounding to the nearest power of 2 would be and at what stage to do it. Perhaps it should be done in the C implementation?

My guess would be that this would be the easiest place to ensure that only power-of-2 sizes are allocated. Having no check or adjustment on the C side and relying on something on the Python side to round things up to power-of-2 sizes leaves more opportunity for a future change to accidentally bypass the rounding-up, and I'd imagine it will also be easier to reason about the correctness of your changes at the point of review.

I'd appreciate your help in completing this. Thanks.

Many thanks for your efforts so far! I hope we can get this to completion - please do let us know if we can provide further guidance.

stefanfed added a commit to stefanfed/numba that referenced this issue Jun 19, 2022
See numba#8110
The new 'allocated' parameter in Dict.empty() is the number of entries the dictionary should take without requiring a resize.
@stefanfed
Contributor Author

Hi @gmarkall,

I was able to fix the LLVM error by adding a single line to _helpermod.c. Thanks for pointing me there.

I added a function for rounding up to the next power of 2 in dictobject.c. To ensure no resizes are performed, I round up from 1.5 times the requested size because the load factor is set to 2/3 (after which the dictionary is resized). So the new parameter allocated is the number of dictionary entries that can be added without requiring a resize.
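In other words, the sizing works roughly like this (illustrative only; next_power_of_2 stands in for the rounding helper I added, and buckets_for_n_keys is just a name for this sketch):

#include <Python.h>  /* Py_ssize_t */

/* With a 2/3 load factor, a table of B buckets holds about B * 2 / 3
 * entries before a resize, so fitting n_keys entries needs at least
 * ceil(1.5 * n_keys) buckets, rounded up to a power of two. */
static Py_ssize_t
buckets_for_n_keys(Py_ssize_t n_keys)
{
    Py_ssize_t needed = n_keys + (n_keys + 1) / 2;  /* ceil(1.5 * n_keys) */
    return next_power_of_2(needed);                 /* hypothetical helper */
}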

The implementation appears to work correctly now. I tested it with various inputs where the number of keys to be added is known ahead of time. The performance gain is over 2x for some ideal inputs (like a monotonically increasing range of ints), and it grows in steps with the input size as expected. When the ints are shuffled, the ratio goes down but the amount of time saved appears to stay the same. I saw no performance regressions.

I have a concern about the way I round up to the next power of 2:
https://github.com/stefanfed/numba/blob/078c70148de5f10041cb11f57c4f5e8ecdecba33/numba/cext/dictobject.c#L282-L296

Because Py_ssize_t is signed, it overflows when the next size is 2^63 on 64-bit, for example. I really doubt anyone has anywhere near enough memory to get there, so this won't be a problem on 64-bit (though it could cause a crash if someone tried it). However, it could be a substantial problem on 32-bit. I wasn't sure how to resolve this - I'd appreciate your advice.
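One overflow-aware direction might be something like this (just a sketch, not what's in my branch, assuming D_MINSIZE and Py_ssize_t as used in dictobject.c):

#include <Python.h>  /* Py_ssize_t, PY_SSIZE_T_MAX */

/* Round n up to the next power of two, returning -1 instead of
 * overflowing when the result would not fit in a Py_ssize_t. */
static Py_ssize_t
round_up_pow2(Py_ssize_t n)
{
    Py_ssize_t p = 8;  /* D_MINSIZE in dictobject.c */
    while (p < n) {
        if (p > PY_SSIZE_T_MAX / 2)
            return -1;  /* next doubling would overflow */
        p <<= 1;
    }
    return p;
}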

I'm not sure why Py_ssize_t is used for the size parameter as opposed to size_t - the size can never be negative. Could I change that, or would it cause problems elsewhere? Also, I'll note that the rounding method may not be the most efficient, since it doesn't take advantage of bit-manipulation instructions (such as count leading zeros) available in most modern desktop processors. This likely isn't a priority, but it could be optimized.

Another change I hesitated to make involves numba_dict_new:
https://github.com/stefanfed/numba/blob/078c70148de5f10041cb11f57c4f5e8ecdecba33/numba/cext/dictobject.c#L560-L585

Currently, the only functionality of numba_dict_new_minsize is to call numba_dict_new with size=D_MINSIZE. Since we can guarantee D_MINSIZE is a power of 2, we don't need to round it up. Therefore, in numba_dict_new, I only attempt to round up if size != D_MINSIZE. An alternative is to copy all the code from numba_dict_new to numba_dict_new_minsize without the rounding up part. Would you say that's a good idea?

Should I open a pull request? I noticed that you require unit tests. I'm not 100% sure on this part. What kind of tests would you like me to write?

@stefanfed
Contributor Author

On second thought, numba_dict_new needs to be changed because the size parameter takes on 2 different meanings. In the minsize case, it's literally the allocation size, while in other cases it's the number of keys that can be inserted without a resize. So in the single case where someone passes allocated=8, it won't behave as I intend it to. I'll update it with the changes shortly.

stefanfed added a commit to stefanfed/numba that referenced this issue Jun 19, 2022
@stefanfed
Contributor Author

I restructured a few things in my latest commit. I renamed allocated to n_keys, because it makes more sense with the desired behavior: it's the number of keys to make space for so that a resize isn't needed while adding them. I also created a new function called numba_dict_new_noresize in dictobject.c which contains all of the new functionality:
https://github.com/stefanfed/numba/blob/acbf5c274b4509488b6a47380088c0589874b376/numba/cext/dictobject.c#L563-L581

No other functions have been touched. The new function is added to the LLVM symbol table and called from the Python implementation. The overflow concern with rounding up to the next power of 2 remains; I'll have a closer look at solutions, though ideally this argument would not use a signed type.
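For context, the shape of numba_dict_new_noresize is roughly the following (a sketch only - the real code is at the link above, and I'm assuming here that the signature mirrors the existing numba_dict_new):

/* Sketch (assumed signature): size the table so that n_keys insertions fit
 * under the 2/3 load factor, then delegate to the existing constructor.
 * round_up_pow2 is the overflow-aware helper sketched in my previous
 * comment. */
int
numba_dict_new_noresize(NB_Dict **out, Py_ssize_t n_keys,
                        Py_ssize_t key_size, Py_ssize_t val_size)
{
    Py_ssize_t buckets = round_up_pow2(n_keys + (n_keys + 1) / 2);
    if (buckets < 0)
        return -1;  /* requested size would overflow */
    return numba_dict_new(out, buckets, key_size, val_size);
}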

@gmarkall
Member

I was able to fix the LLVM error by adding a single line to _helpermod.c. Thanks for pointing me there.

Ah - I hadn't noticed that _helpermod.c already contained some of the dict_new methods - thanks for spotting that it's simpler than I suggested it could be 🙂

Because Py_ssize_t is signed, it overflows when the next size is 2^63 on 64-bit, for example. I really doubt anyone has anywhere near enough memory to get to there, so this won't be a problem on 64-bit (though it could cause a crash if someone tried it). However, it could be a substantial problem on 32-bit. I wasn't sure how to resolve this - I'd appreciate your advice.

If my maths is correct:

In [2]: 2 ** (8 * 4 - 2)
Out[2]: 1073741824

On a 32-bit system, that is still a large size (1GB of contiguous allocation) for a system that can only have 2GB of addressable memory per process (On both Windows and Linux x86, at least). I'd be inclined not to worry about this (do let me know if I seem to have an incorrect calculation or assumption here though please 🙂)

Currently, the only functionality of numba_dict_new_minsize is to call numba_dict_new with size=D_MINSIZE. Since we can guarantee D_MINSIZE is a power of 2, we don't need to round it up. Therefore, in numba_dict_new, I only attempt to round up if size != D_MINSIZE. An alternative is to copy all the code from numba_dict_new to numba_dict_new_minsize without the rounding up part. Would you say that's a good idea?

Probably not - it sounds more succinct as it is.

Should I open a pull request?

Yes, I think your progress is looking great and we're a lot of the way there with this, so I think opening a PR would be totally appropriate and a really helpful way for us to iterate this to completion.

I noticed that you require unit tests. I'm not 100% sure on this part. What kind of tests would you like me to write?

I'll have a look through what the possibilities are here, since I think you're correct in identifying that it's not obvious how to test this. In the meantime, please make any remaining changes and open the PR, and I'll add some suggestions for testing in comments on the PR.

Thanks for your work and perseverance so far!

stefanfed added a commit to stefanfed/numba that referenced this issue Jun 21, 2022
stefanfed added a commit to stefanfed/numba that referenced this issue Jul 6, 2022
stefanfed added a commit to stefanfed/numba that referenced this issue Jul 6, 2022