String to int conversion #3937

Alexander-Makaryev · 2019-04-03T09:45:23Z

This is a port of the native implementation from CPython.

stuartarchibald · 2019-04-18T21:24:27Z

@Alexander-Makaryev thanks for this PR, is it ready for review? If not, please just comment on here when you'd like it reviewed and the core devs will schedule it. Thanks!

Alexander-Makaryev · 2019-04-22T06:52:40Z

@stuartarchibald yes, it is ready for review.
Another one (#3985) is ready too. It is built on top of this branch and its implementation is very close to this PR, so I think it would be a good idea for one developer to review both PRs at the same time.

stuartarchibald · 2019-04-23T10:15:48Z

@Alexander-Makaryev great, thanks for confirming, I'll schedule them for review.

sklam

(I have only reviewed the organization of the code, not much of the C code itself.)
Instead of creating an ext_string_conversion extension, I'd suggest adding the feature to _helperlib so that the new C functions are exposed via the c_helpers. For reference, you can look at how numba_dict_new_minsize is implemented in _dictobject.c and exposed in _helpermod.c (with declmethod(dict_new_minsize);). Lastly, dictobject.py creates an intrinsic that references it via .get_or_insert_function(fnty, name='numba_dict_new_minsize').

sklam · 2019-04-25T12:40:20Z

setup.py

@@ -276,9 +276,13 @@ def check_file_at_path(path2file):
                                depends=['numba/_pymodule.h'],
                                include_dirs=["numba"])

+    ext_string_conversion = Extension(name='string_conversion_ext',


This is making a top-level module. It should be nested inside numba. i.e. name='numba._string_conversion_ext'. But better yet is to build numba/_string_conversion.c into numba._helperlib.

sklam · 2019-04-25T12:40:52Z

numba/unicode.py

@@ -31,6 +31,9 @@
 from numba.targets.hashing import _Py_hash_t
 from numba.unsafe.bytes import memcpy_region

+import llvmlite.binding as ll
+import string_conversion_ext


string_conversion_ext should not be a top-level module. See comment in setup.py

sklam · 2019-04-25T12:48:18Z

numba/unicode.py

@@ -810,3 +813,13 @@ def iternext_unicode(context, builder, sig, args, result):
        # bump index for next cycle
        nindex = cgutils.increment_index(builder, index)
        builder.store(nindex, iterobj.index)
+
+
+ll.add_symbol('str2int', string_conversion_ext.str2int)


Please reuse the machinery with _helperlib to expose C functions. The manual registration of str2int here will not work with ahead-of-time compiled code.

stuartarchibald · 2019-04-25T13:06:58Z

Thanks for the PR. Further to @sklam's comments, and having taken a quick look at this, I think considerably more extensive testing needs to be written ahead of re-review. Thanks.

shssf · 2019-05-24T16:05:44Z

@sklam is this the approach you meant?
@stuartarchibald I made the changes I mentioned. The code is not working. Please help.

stuartarchibald · 2019-05-24T17:50:14Z

@shssf Yes, the approach presented is far closer to that needed for use in Numba, thanks.

Following on from gitter chat, I've added this branch https://github.com/stuartarchibald/numba/tree/pr_3937 which has stuartarchibald@a9bf601 as a patch to fix and demo the code working:

./runtests.py numba.tests.test_unicode.TestUnicodeStr2Int.test_str2int_demo

There are still a large number of things that work essentially by accident, but I figure it's probably easier to fix something that half works with a bunch of problems than something that doesn't work at all. I also added a load of comments to the code to try to explain literally what is going on; these do not, however, comment on the issues in the current code. The main issues at present are:
1) The code assumes that strings are equivalent to char *, which they are not: Numba's internal representation handles unicode! :)
2) Most of the types need more thought to make it work everywhere.
3) Overloading int will need more thought in general.

Hope this helps, we can definitely discuss more.

shssf · 2019-05-25T04:51:26Z

@stuartarchibald Thank you very much!
Are we going to support kind-2 and kind-4 strings in the string conversion procedure? If so, could you recommend where I can find good examples of digits represented in unicode kind-4 for tests?

stuartarchibald · 2019-05-28T08:42:04Z

@shssf No problem. In terms of kind-2 and kind-4 strings, I think that yes this needs to work, consider the following:

from numba import njit

@njit
def foo():
    string = "🐍⚡123"
    part = string[2:]
    return int(part)

print(foo.py_func())
print(foo())

The above is something that could reasonably occur in practice but will be handled incorrectly by the current code. Further, I'm not convinced that the strto*(3) family produces behaviour sufficiently equivalent to that of the CPython implementation.
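To make the last point concrete, here is a minimal sketch of a few inputs where CPython's int() does something that strtol(3) does not. Only the CPython side is executed; the strtol behaviour is described in comments. The helper name try_int is hypothetical and not part of this PR.

```python
def try_int(text):
    """Return int(text), or None when CPython rejects the input."""
    try:
        return int(text)
    except ValueError:
        return None


# int() accepts surrounding whitespace, including trailing whitespace;
# strtol skips leading whitespace only and stops before the trailing part.
padded = try_int(" 42\n")

# int() rejects trailing garbage outright; strtol parses the leading
# digits and leaves it to the caller to inspect endptr.
garbage = try_int("12abc")

# Underscore digit separators are valid for int() on Python 3.6+;
# strtol stops at the underscore.
grouped = try_int("1_000")

# int() has arbitrary precision; strtol clamps to LONG_MAX/LONG_MIN
# and sets errno to ERANGE.
big = try_int("9" * 30)
```

Any replacement for int() would need an explicit decision for each of these cases rather than inheriting whatever the C library does.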

shssf · 2019-05-28T08:52:40Z

@stuartarchibald In this case (part = string[2:]), Numba will not support other digits like this https://ltl-taiwan.com/chinese-numbers/#chapter-1

stuartarchibald · 2019-05-28T10:21:35Z

@shssf The purpose of the demonstration was to highlight that wider kind unicode representations of numbers that could be encoded as ASCII are possible and trivial to achieve.

@njit
def foo():
    string = "🐍⚡123" # Unicode kind-4 (🐍 is outside the BMP)
    part = string[2:] # This slice is also unicode kind-4
    return int(part) # the `int` call now has to handle a unicode kind-4 repr of "123"
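A rough pure-Python sketch of what handling a wider-kind repr of "123" involves is given here, assuming the PEP 393 layout of fixed-width code units of 1, 2 or 4 bytes. The names pep393_kind, to_code_units and parse_decimal are hypothetical helpers for illustration, not functions from Numba or from this PR, and the parser only covers unsigned ASCII digits.

```python
import sys


def pep393_kind(text):
    """Width in bytes of one code unit under PEP 393 (1, 2 or 4)."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def to_code_units(text, kind):
    """Lay out text as fixed-width code units in native byte order."""
    return b"".join(ord(ch).to_bytes(kind, sys.byteorder) for ch in text)


def parse_decimal(buf, kind):
    """Parse an unsigned base-10 integer from a buffer of code units."""
    if kind not in (1, 2, 4) or not buf or len(buf) % kind:
        raise ValueError("invalid buffer")
    value = 0
    for offset in range(0, len(buf), kind):
        # Read one whole code unit, not one byte.
        code_point = int.from_bytes(buf[offset:offset + kind], sys.byteorder)
        digit = code_point - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError("not an ASCII decimal digit")
        value = value * 10 + digit
    return value


wide = to_code_units("123", 2)   # the digits "123" stored as kind-2
parsed = parse_decimal(wide, 2)
```

The kind-2 buffer for "123" is six bytes long and contains zero bytes, so a routine that treats it as a NUL-terminated char * would not see the whole number.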

Numba will not support other digits like this https://ltl-taiwan.com/chinese-numbers/#chapter-1

Correct. Numba has to reproduce exactly what CPython does; if CPython doesn't support it, then Numba doesn't have to either.
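As a quick way to see where CPython draws that line, the following sketch checks a few non-ASCII digit strings against the built-in int(). It relies only on the standard library; the helper name cpython_int_or_none is hypothetical. CPython accepts characters in the Unicode decimal digit category (Nd) and rejects Chinese numerals, which are in a letter category.

```python
import unicodedata


def cpython_int_or_none(text):
    """Return int(text), or None when CPython raises ValueError."""
    try:
        return int(text)
    except ValueError:
        return None


arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE
fullwidth = "\uff11\uff12\uff13"     # FULLWIDTH DIGIT ONE, TWO, THREE
chinese = "\u4e00\u4e8c\u4e09"       # CJK ideographs for one, two, three

results = {
    "arabic_indic": cpython_int_or_none(arabic_indic),
    "fullwidth": cpython_int_or_none(fullwidth),
    "chinese": cpython_int_or_none(chinese),
}

# General categories of the first character of each string.
categories = [unicodedata.category(s[0])
              for s in (arabic_indic, fullwidth, chinese)]
```

Matching CPython exactly would therefore mean accepting the Nd digits as well as ASCII ones, which is a further reason the strings cannot be handed to a byte-oriented C routine unchanged.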

stuartarchibald · 2019-08-12T12:29:53Z

Just checking in on the state of this PR. @shssf @Alexander-Makaryev was wondering if you have time to address the changes above, if not, don't worry, one of the core developers can take over to do the fixes. Thanks.

shssf · 2019-08-12T14:35:19Z

@stuartarchibald Sorry we ran out of time to continue this.

stuartarchibald · 2019-08-15T15:15:33Z

@shssf thanks, I am going to close this for now; there are merge conflicts, and additional effort is required to fully support unicode strings. Please feel free to reopen if updates are made to address these concerns. Thanks again.

implementation with cpython _PyLong_FromBytes

ed8302f

stuartarchibald added the 2 - In Progress label Apr 3, 2019

Alexander-Makaryev and others added 11 commits April 10, 2019 15:53

changed tests, added comments with cpython sources links

9891402

used _pymodule.h

098ae43

changed names of functions

ad40901

support of Python 2.7

ea40275

flake8 errors fixed

3b393ca

changed result type

da71614

py34 or later in tests with unicode

6fd0862

added name to UnicodeType

d765a0b

Merge branch 'master' into feature/string-conversion

1b74d29

fixing types in python/c functions

988a83b

cleanup, sharing of small ints removed

59d5758

Alexander-Makaryev changed the title ~~[WIP] String to int conversion~~ String to int conversion Apr 17, 2019

Alexander-Makaryev mentioned this pull request Apr 18, 2019

String to float conversion #3985

Closed

Alexander-Makaryev added 2 commits April 21, 2019 17:36

returned original funcs names

3d9cf03

reverted previous commit. it was bad idea

c069f8c

stuartarchibald added 3 - Ready for Review and removed 2 - In Progress labels Apr 23, 2019

sklam requested changes Apr 25, 2019

stuartarchibald added 4 - Waiting on author and removed 3 - Ready for Review labels Apr 25, 2019

DrTodd13 force-pushed the master branch from e8ac495 to ca731bf Compare April 29, 2019 21:41

stuartarchibald force-pushed the master branch from e8ac495 to 14d8e85 Compare April 29, 2019 22:29

stuartarchibald mentioned this pull request May 9, 2019

Support for string lower() method #4049

Closed

PR3937. Not working changes. Proposal for a new design.

6f07f33

shssf added 2 commits May 26, 2019 14:56

PR3937. Comments addressed.

c668345

PR3937. style fix

95fc09c

stuartarchibald closed this Aug 15, 2019
