Slicing a string from a Numpy structured array throws an error #5802

Closed
2 tasks done
marnixhoh opened this issue Jun 2, 2020 · 4 comments
Labels: question

Comments

marnixhoh commented Jun 2, 2020

This works:

from numba import njit
import numpy as np

values_dtype = np.dtype([
    ('one', 'U25'),
    ('two', 'f8')
])

def my_test():
    values = np.zeros(2, dtype=values_dtype)
    values['one'][0] = '2020-03-03 00:01:00'
    stuff = values['one'][0][0:10]
    return stuff
result = my_test()
print(result)

But when jitting it, it throws an error:

from numba import njit
import numpy as np

values_dtype = np.dtype([
    ('one', 'U25'),
    ('two', 'f8')
])

@njit
def my_test():
    values = np.zeros(2, dtype=values_dtype)
    values['one'][0] = '2020-03-03 00:01:00'
    stuff = values['one'][0][0:10]
    return stuff
result = my_test()
print(result)

Invalid use of Function() with argument(s) of type(s): ([unichr x 25], slice)

numba: 0.49.1
numpy: 1.18.4
python: 3.7.1

@marnixhoh
Author

I just found a workaround. If you convert the value from the structured array with str() before slicing, it works. However, I am not sure what the performance penalty is for doing so.

from numba import njit
import numpy as np

values_dtype = np.dtype([
    ('one', 'U25'),
    ('two', 'f8')
])

@njit
def my_test():
    values = np.zeros(2, dtype=values_dtype)
    values['one'][0] = '2020-03-03 00:01:00'
    stuff = str(values['one'][0])[0:10]
    return stuff
result = my_test()
print(result)

@marnixhoh
Author

I also wanted to point out that some string methods do work as expected on the structured-array value, for example .split():

from numba import njit
import numpy as np

values_dtype = np.dtype([
    ('one', 'U25'),
    ('two', 'f8')
])

@njit
def my_test():
    values = np.zeros(2, dtype=values_dtype)
    values['one'][0] = '2020-03-03 00:01:00'
    stuff = values['one'][0].split(':')
    return stuff
result = my_test()
print(result)

Hope this information is helpful

@stuartarchibald
Contributor

Thanks for the report. Regarding the first post #5802 (comment), I think that this won't work because the type of the returned value would depend on the values in the slice. For example, [1:3] would produce a [unichr x 2], whereas [1:4] would produce a [unichr x 3].
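
A NumPy-only sketch of that point, offered as an illustration rather than as a description of how Numba implements its typing: in a fixed-width unicode type the length is part of the type, so slices of different lengths would need different result types. The names `text` and `slice_width` are hypothetical and exist only for this example.

```python
import numpy as np

# One fixed-width unicode value, as in the 'one' field of the issue.
text = np.array(['2020-03-03 00:01:00'], dtype='U25')

# The width is part of the dtype, so U2 and U3 are distinct types.
two = np.dtype('U2')
three = np.dtype('U3')
types_differ = two != three

# NumPy stores each character in 4 bytes, so the storage size differs too.
sizes = (two.itemsize, three.itemsize)


def slice_width(start, stop):
    """Length of the slice, which is only known once the bounds are known."""
    return len(str(text[0])[start:stop])


print(types_differ, sizes, slice_width(1, 3), slice_width(1, 4))
```

A compiler that fixes every type before running the code cannot pick a single fixed-width result type when the slice bounds are ordinary runtime values.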

With respect to your workaround #5802 (comment), the performance penalty is likely that a string buffer has to be allocated and the characters copied in. Without looking at the machine code, I'd guess that this cannot be optimised away, as it's a change of internal representation.

As to some methods working, it's because the character array will "look like" a string for the method call if the types can be statically determined.

stuartarchibald added the question label and removed the needtriage label on Jun 9, 2020

@stuartarchibald
Contributor

Closing this question as it seems to be resolved. Numba now has a Discourse forum, https://numba.discourse.group/, which is great for questions like this. Please do consider posting there in future :) Thanks!
