Create integer from unicode string with int() #5650

Open
bf opened this issue Apr 29, 2020 · 5 comments


bf commented Apr 29, 2020

I am trying to convert a unicode string to an integer like this:

import numba

@numba.jit(nopython=True)
def str_to_int(str_date):
	return int(str_date[0:4])

print(str_to_int("2016 foo bar"))

and receive the following error:

Invalid use of Function(<class 'int'>) with argument(s) of type(s): (unicode_type)
 * parameterized
In definition 0:
    All templates rejected with literals.
In definition 1:
    All templates rejected without literals.
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<class 'int'>)
[2] During: typing of call at numba-bug.py (5)


File "numba-bug.py", line 5:
def str_to_int(str_date):
	return int(str_date[0:4])

I have also tried to replace the int(str_date[0:4]) with int(str_date[0]) * 1000 + int(str_date[1]) * 100 + int(str_date[2]) * 10 + int(str_date[3]) but the same error appears.

Is this a real bug or am I using the library wrong? Unfortunately I was unable to find more info on substring handling in the docs. Thank you!

esc added the needtriage label Apr 29, 2020
Member

esc commented Apr 29, 2020

@bf thanks for asking about this on the Numba issue tracker. Purely from memory, I don't think this is implemented yet and IIRC there was a PR recently to address this. I'll have a closer look now.

Author

bf commented Apr 29, 2020

Thank you very much for your quick reply.

Luckily, I have found a workaround: by converting to a bytearray first, I can work with the strings as I originally intended.

But I am unsure about the performance impact, because converting a str to a bytearray in Python land involves UTF-8 encoding, which might be the most expensive operation in this context.

@numba.jit(nopython=True)
def date_str_to_int(bytes_date):
    str_byte = (
        byte_to_int(bytes_date[0]) * 10**5 + 
        byte_to_int(bytes_date[1]) * 10**4 + 
        byte_to_int(bytes_date[2]) * 10**3 + 
        byte_to_int(bytes_date[3]) * 10**2 +
        byte_to_int(bytes_date[5]) * 10**1 +
        byte_to_int(bytes_date[6])
    )

    return str_byte

@numba.jit(nopython=True)
def byte_to_int(b):
    # print("in", b)
    out = int(b) - 48
    # print("out", out)
    return out

str_with_date = "2016-03"
str_bytes = bytearray(str_with_date, encoding="utf-8")

date_str_to_int(str_bytes)
 => (int) 201603
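
A loop-based variant of the same workaround, offered only as a rough sketch: it walks the bytearray, keeps ASCII digits, and skips anything else (such as the "-" separator), so the positions are not hard-coded. The name digits_to_int is hypothetical, and the fallback decorator is an assumption added so the snippet also runs where Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def digits_to_int(buf):
    # Hypothetical helper: accumulate every ASCII digit in buf, left to right.
    result = 0
    for i in range(len(buf)):
        d = int(buf[i]) - 48  # 48 is the byte value of "0"
        if 0 <= d <= 9:
            result = result * 10 + d  # non-digit bytes are skipped
    return result


print(digits_to_int(bytearray("2016-03", encoding="utf-8")))  # 201603
```

Silently skipping non-digits is a design choice that suits fixed date formats; input that may be malformed would need an explicit check instead.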

Member

esc commented Apr 29, 2020

So the pull-request I had in mind was actually the other way around:

#5463

esc changed the title from "Unable to create integer from unicode string with int()" to "Create integer from unicode string with int()" Apr 29, 2020
Member

esc commented Apr 29, 2020

@bf I have converted this into a feature request as it doesn't seem to be implemented yet.

Member

esc commented May 4, 2020

Here is an example implementation that I wrote to illustrate how it would (could) be done:

from numba import njit


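@njit
def str_to_int(s):
    # Assumes s contains only ASCII digits; no sign handling or validation.
    final_index, result = len(s) - 1, 0
    for i, v in enumerate(s):
        # ord(v) - 48 maps "0".."9" to 0..9; scale by the digit's power of ten.
        result += (ord(v) - 48) * (10 ** (final_index - i))
    return result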


print(str_to_int("1"))
print(str_to_int("12"))
print(str_to_int("123"))
print(str_to_int("1234"))
print(str_to_int("12345"))
print(str_to_int("123456"))
print(str_to_int("1234567"))
print(str_to_int("12345678"))
print(str_to_int("123456789"))
print(str_to_int("1234567890"))
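
The example above assumes the input contains only digits. As a possible extension (a sketch, not what Numba itself implements), the variant below accumulates digits with result * 10 + digit instead of computing powers of ten, accepts an optional leading sign, and raises ValueError on anything else. The name parse_int is hypothetical, and the fallback decorator is an assumption added so the snippet also runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def parse_int(s):
    # Hypothetical helper: parse an optionally signed base-10 integer.
    n = len(s)
    if n == 0:
        raise ValueError("empty string")
    start = 0
    negative = False
    if s[0] == "-":
        negative = True
        start = 1
    elif s[0] == "+":
        start = 1
    if start == n:
        raise ValueError("no digits")
    result = 0
    for i in range(start, n):
        digit = ord(s[i]) - 48  # 48 is the code point of "0"
        if digit < 0 or digit > 9:
            raise ValueError("non-digit character")
        result = result * 10 + digit
    if negative:
        result = -result
    return result


print(parse_int("2016 foo bar"[0:4]))  # 2016
print(parse_int("-42"))  # -42
```

Like the example it extends, this only handles ASCII digits and does not guard against overflow of the native integer type.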

2 participants