@jit decorator with signature including List(List(int64)) not working #4678

Closed
davidwynter opened this issue Oct 9, 2019 · 3 comments
Labels
no action required No action was needed to resolve.

davidwynter commented Oct 9, 2019

Using Numba v0.45.

I am using the @jit signature to define the types of the incoming arguments. But in calling the function I get:

ValueError: cannot compute fingerprint of empty list

I know the list is empty, but my signature declares its type, so I am not sure why Numba does not use that signature. Priming the list of lists with dummy values would take significant time with 10M rows.

I have tried the different forms of signature (the string form and the tuple form) and both still give the error. It is not clear to me from the documentation why these signatures do not fix the types of the arguments as passed in, and why Numba still relies on inferring them.

@nb.jit("void(List(int64), int64, List(List(int64)))", nopython=True, cache=True)
def _set_indices(keys_as_int, n_keys, indices):
    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    indices = [([np.array(elt) for elt in indices])]

def group_by(keys):
    _, first_occurrences, keys_as_int = np.unique(keys, return_index=True, return_inverse=True)
    n_keys = max(keys_as_int) + 1
    indices = [[] for _ in range(max(keys_as_int) + 1)]
    print(str(keys_as_int) + str(n_keys) + str(indices))
    _set_indices(keys_as_int, n_keys, indices)
    return indices

result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
print(str(result))

I expected the signature to enforce the data types of the incoming arguments, with no need to infer them. The actual error:

<ipython-input-274-401e07cd4e63> in <module>
----> 1 result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
  2 print(str(result))

<ipython-input-273-acdebb81069c> in group_by(keys)
  4     indices = [[] for _ in range(max(keys_as_int) + 1)]
  5     print(str(keys_as_int) + str(n_keys) + str(indices))
----> 6     _set_indices(keys_as_int, n_keys, indices)
  7     return indices

ValueError: cannot compute fingerprint of empty list

davidwynter commented Oct 10, 2019

It is worth noting that I do not really need to learn all the different ways of using decorators to achieve an end. All I want is to use Numba to speed up group_by as implemented by pandas, so I am open to suggestions to that end.

I got a signature to not throw an exception, shown below, but it goes into an endless loop. An example in the documentation of how to declare a signature with a list of lists of a supported data type would go a long way towards helping people who want to use Numba.

@jit(list[list(nb.int64)](nb.int64[:], nb.int64), nopython=True, cache=True)
def _set_indices(self, keys_as_int, n_keys):
    indices = [[i for i in range(0)] for _ in range(n_keys)]
    #nb.typeof(indices)
    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    return indices

@stuartarchibald
Contributor

Thanks for the report. It's strongly recommended that you don't specify type signatures unless you really need to; Numba is pretty good at just working them out.

There are numerous issues in the code you posted; the docs about typing problems are probably worth a read. I'd recommend using numba.typed.List as your list type: it is a strongly typed list, so type inference should be fine as a result. It'd also be a good idea to read the documentation that explains the different types of list.

Here's some code that compiles and, I think, does something like what you want:

from numba import njit
import numpy as np

@njit
def _set_indices(keys_as_int, n_keys):
    # The inner comprehension is a work-around to create an empty reflected list of a specific type.
    indices = [[0 for _ in range(0)] for _ in range(max(keys_as_int) + 1)]
    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    return [np.array(elt) for elt in indices]

def group_by(keys):
    _, first_occurrences, keys_as_int = np.unique(keys, return_index=True,
                                                  return_inverse=True)
    n_keys = max(keys_as_int) + 1
    print(str(keys_as_int) + str(n_keys))
    return _set_indices(keys_as_int, n_keys)

result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
print(str(result))

And a version using the new typed.List:

from numba import njit, types
from numba.typed import List
import numpy as np

@njit
def _set_indices(keys_as_int, n_keys):
    indices = List()
    for _ in range(max(keys_as_int) + 1):
        indices.append(List.empty_list(types.int64))

    for i, k in enumerate(keys_as_int):
        indices[k].append(i)
    return indices

def group_by(keys):
    _, first_occurrences, keys_as_int = np.unique(keys, return_index=True,
                                                  return_inverse=True)
    n_keys = max(keys_as_int) + 1
    print(str(keys_as_int) + str(n_keys))
    return _set_indices(keys_as_int, n_keys)

result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
print(str(result))

@stuartarchibald stuartarchibald added no action required No action was needed to resolve. and removed needtriage labels Oct 11, 2019
@davidwynter
Author

Thank you. If this is your general advice on using signatures, maybe a note in the documentation would help others avoid using them when it is not necessary?
