@jit decorator with signature including List(List(int64)) not working #4678
Worth noting: I do not really need to learn all the different ways of using decorators to achieve an end. All I want is to use Numba to speed up a group_by like the one pandas implements, so I am open to suggestions to that end. I got a signature that does not throw an exception (below), but it goes into an endless loop. An example in the documentation showing how to declare a signature with a list of lists of a supported data type would go a long way toward helping people who want to use Numba.
Thanks for the report. It's strongly recommended that you don't specify type signatures unless you really need to; Numba is pretty good at just working them out. There are numerous issues in the above; the docs about typing problems are probably worth a read. I'd recommend using

Here's some code that compiles and, I think, probably does something like what you want:

from numba import njit
import numpy as np

@njit
def _set_indices(keys_as_int, n_keys):
    # The inner comprehension is a work-around to create an empty
    # reflected list of a specific type.
    indices = [[0 for _ in range(0)] for _ in range(max(keys_as_int) + 1)]
    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    return [np.array(elt) for elt in indices]

def group_by(keys):
    _, first_occurrences, keys_as_int = np.unique(keys, return_index=True,
                                                  return_inverse=True)
    n_keys = max(keys_as_int) + 1
    print(str(keys_as_int) + str(n_keys))
    return _set_indices(keys_as_int, n_keys)

result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
print(str(result))

and a version using the new

from numba import njit, types
from numba.typed import List
import numpy as np

@njit
def _set_indices(keys_as_int, n_keys):
    indices = List()
    for _ in range(max(keys_as_int) + 1):
        indices.append(List.empty_list(types.int64))
    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    return indices

def group_by(keys):
    _, first_occurrences, keys_as_int = np.unique(keys, return_index=True,
                                                  return_inverse=True)
    n_keys = max(keys_as_int) + 1
    print(str(keys_as_int) + str(n_keys))
    return _set_indices(keys_as_int, n_keys)

result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
print(str(result))
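
Since the stated goal is only to obtain the row indices of each group, it may be worth noting that the grouping can also be done without any list of lists. The following is a minimal sketch of a NumPy-only alternative, not something proposed in this thread: it sorts the integer key codes with a stable sort and splits the sorted row positions at the group boundaries. The function name `group_indices` is hypothetical.

```python
import numpy as np

def group_indices(keys):
    """Return one array of row positions per distinct key (hypothetical helper)."""
    # Map every key to an integer code in [0, n_keys).
    _, codes = np.unique(keys, return_inverse=True)
    codes = codes.ravel()
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(codes, kind="stable")
    # Group sizes give the boundaries at which to split the sorted positions.
    counts = np.bincount(codes)
    return np.split(order, np.cumsum(counts)[:-1])

groups = group_indices(np.array(['aaa', 'aab', 'aac', 'aaa', 'aac']))
print([g.tolist() for g in groups])
```

Whether this is faster than the compiled versions above depends on the data and has not been measured here.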
Thank you. If this is your general advice on using signatures, maybe a note in the documentation would keep others from using them when they are not necessary?
Using V0.45 of Numba
I am using the @jit signature to define the types of the incoming arguments, but when calling the function I get:
I know the list is empty, but my signature defines its type, so I am not sure why Numba does not use that signature. If I have to prime the list of lists, that takes time with 10M rows.
I have tried the different forms of signatures (the string form and the tuple form) and it still gives the error. It is not clear to me from the documentation why these signatures do not define the arguments as passed in, and why Numba still relies on inferring types.
I expected the signature to enforce data types on the incoming arguments, with no need to infer them.
Actual error