Create integer from unicode string with int() #5650

bf · 2020-04-29T16:05:07Z

I am trying to convert a unicode string to an integer like this:

import numba

@numba.jit(nopython=True)
def str_to_int(str_date):
	return int(str_date[0:4])

print(str_to_int("2016 foo bar"))

and receive the following error:

Invalid use of Function(<class 'int'>) with argument(s) of type(s): (unicode_type)
 * parameterized
In definition 0:
    All templates rejected with literals.
In definition 1:
    All templates rejected without literals.
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<class 'int'>)
[2] During: typing of call at numba-bug.py (5)


File "numba-bug.py", line 5:
def str_to_int(str_date):
	return int(str_date[0:4])

I have also tried to replace the int(str_date[0:4]) with int(str_date[0]) * 1000 + int(str_date[1]) * 100 + int(str_date[2]) * 10 + int(str_date[3]) but the same error appears.

Is this a real bug or am I using the library wrong? Unfortunately I was unable to find more info on substring handling in the docs. Thank you!

esc · 2020-04-29T17:28:18Z

@bf thanks for asking about this on the Numba issue tracker. Purely from memory, I don't think this is implemented yet and IIRC there was a PR recently to address this. I'll have a closer look now.

bf · 2020-04-29T17:33:55Z

Thank you very much for your quick reply.

Luckily, I have found a workaround: by converting to a bytearray first, I can work with the strings as I originally intended.

But I am unsure about the performance impact, because converting a str to a bytearray in Python land involves a UTF-8 encode, which might be the most expensive operation in this context.

import numba

@numba.jit(nopython=True)
def byte_to_int(b):
    # ASCII digits '0'..'9' are the byte values 48..57
    return int(b) - 48

@numba.jit(nopython=True)
def date_str_to_int(bytes_date):
    # Expects "YYYY-MM" as UTF-8 bytes; index 4 (the '-') is skipped
    return (
        byte_to_int(bytes_date[0]) * 10**5 +
        byte_to_int(bytes_date[1]) * 10**4 +
        byte_to_int(bytes_date[2]) * 10**3 +
        byte_to_int(bytes_date[3]) * 10**2 +
        byte_to_int(bytes_date[5]) * 10**1 +
        byte_to_int(bytes_date[6])
    )

str_with_date = "2016-03"
str_bytes = bytearray(str_with_date, encoding="utf-8")

date_str_to_int(str_bytes)
 => (int) 201603
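
For anyone wondering why subtracting 48 works in the workaround above: ASCII digits occupy the byte values 48 to 57, and for ASCII-only text the UTF-8 bytes equal the Unicode code points. Here is a minimal plain-Python sketch of that idea. `digit_value` is a hypothetical helper name used only for illustration, and the range check is an addition the workaround does not have.

```python
def digit_value(byte):
    # Hypothetical helper: map an ASCII digit byte (48..57) to its value 0..9.
    if not 48 <= byte <= 57:
        raise ValueError("not an ASCII digit")
    return byte - ord("0")  # ord("0") == 48


text = "2016-03"
raw = bytearray(text, encoding="utf-8")

# For ASCII-only input, each encoded byte equals the character's code point.
same_as_code_points = list(raw) == [ord(c) for c in text]

# Skip the '-' separator and convert the remaining bytes to digit values.
digits = [digit_value(b) for b in raw if b != ord("-")]

# Combine the digits left to right into a single integer.
value = 0
for d in digits:
    value = value * 10 + d

print(same_as_code_points, digits, value)
```

Since the bytes and the code points agree for this kind of input, the same arithmetic can in principle be done with `ord()` on the characters of the str, without the encode step.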

esc · 2020-04-29T17:35:39Z

So the pull-request I had in mind was actually the other way around:

#5463

esc · 2020-04-29T17:38:14Z

@bf I have converted this into a feature request as it doesn't seem to be implemented yet.

esc · 2020-05-04T14:51:21Z

Here is an example implementation that I wrote to illustrate how it would (could) be done:

from numba import njit


@njit
def str_to_int(s):
    final_index, result = len(s) - 1, 0
    for i,v in enumerate(s):
        result += (ord(v) - 48) * (10 ** (final_index - i))
    return result


print(str_to_int("1"))
print(str_to_int("12"))
print(str_to_int("123"))
print(str_to_int("1234"))
print(str_to_int("12345"))
print(str_to_int("123456"))
print(str_to_int("1234567"))
print(str_to_int("12345678"))
print(str_to_int("123456789"))
print(str_to_int("1234567890"))
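
The example above assumes every character is a digit. Below is a rough sketch of a slightly more defensive variant that accepts an optional sign, rejects non-digit characters, and accumulates with `result * 10 + digit` instead of powers of ten. `parse_int` is a hypothetical name, not something Numba provides, and the `ImportError` fallback is only there so the sketch still runs as plain Python where Numba is not installed. It has not been checked against every Numba version.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def njit(func):
        return func


@njit
def parse_int(s):
    # Hypothetical helper: parse a base-10 integer from a str, digit by digit.
    n = len(s)
    if n == 0:
        raise ValueError("empty string")

    # Handle an optional leading sign.
    sign = 1
    start = 0
    if s[0] == "-":
        sign = -1
        start = 1
    elif s[0] == "+":
        start = 1
    if start == n:
        raise ValueError("no digits")

    result = 0
    for i in range(start, n):
        digit = ord(s[i]) - 48  # '0' is code point 48
        if digit < 0 or digit > 9:
            raise ValueError("invalid digit")
        result = result * 10 + digit
    return sign * result


print(parse_int("2016"))
print(parse_int("-42"))
```

Unlike CPython's `int()`, this sketch does not strip whitespace or accept underscores, and under Numba the result is a fixed-width integer, so very long digit strings would overflow.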

esc added the needtriage label Apr 29, 2020

esc changed the title ~~Unable to create integer from unicode string with int()~~ Create integer from unicode string with int() Apr 29, 2020

esc added feature_request and removed needtriage labels Apr 29, 2020

DannyWeitekamp mentioned this issue May 18, 2020

Overload float(), int() and str() to do number <-> string conversion #5723

Open

KenanHanke mentioned this issue Dec 6, 2022

Add overload to handle int(s: str) #8648

Closed

stuartarchibald mentioned this issue Dec 7, 2022

[Unicode] support int(str, base=10) as cpython does #8558

Closed
