ENH: Parse complex number from string #14227

zjpoh · 2019-08-08T06:23:12Z

Addressing #13891.

I implemented parsing from string to complex number. Here are a couple of assumptions that I made:

If only one component is specified, the other component is assumed to be zero. For example, '1j' gives 0+1j and '1' gives 1+0j.
I follow the convention used for other dtypes: spaces around the separator are allowed and ignored, but spaces between the numbers, +, -, or j are not allowed. For example, "1+1j, 1+1j" is allowed, but "1 + 1j,1+1j" and "1+1 j" are not.
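
The two assumptions above can be illustrated with a small pure-Python sketch. It is not the C implementation in this PR: it leans on Python's built-in complex(), which happens to reject embedded spaces and the "1j+1" ordering in the same way. parse_complex_tokens and is_accepted are hypothetical helper names, and complex() also accepts forms such as "(1+1j)" or "nan" that this PR does not discuss.

```python
def parse_complex_tokens(text, sep=","):
    """Split on `sep`, strip surrounding whitespace, parse each token.

    Hypothetical helper for illustration only. It raises ValueError on a
    malformed token instead of mimicking numpy's read-until-failure behavior.
    """
    values = []
    for token in text.split(sep):
        # Whitespace around the separator is ignored ...
        token = token.strip()
        # ... but complex() rejects whitespace inside a number.
        values.append(complex(token))
    return values


def is_accepted(text, sep=","):
    """Return True if every token parses under the rules sketched above."""
    try:
        parse_complex_tokens(text, sep)
    except ValueError:
        return False
    return True
```

With this sketch, "1j" parses to 0+1j and "1" to 1+0j, while "1 + 1j,1+1j" and "1+1 j" are rejected.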

Other comments.

Clearer documentation of which dtypes are supported would be helpful.
An explanation and examples in the documentation of when things break, and maybe raising a warning, would be helpful. For example, np.fromstring("1,2,3 4", sep=",") # array([1, 2, 3]).

This is my first real numpy PR. Please let me know any advice or comments that you have! Thanks.

charris · 2019-08-08T17:35:15Z

numpy/core/src/multiarray/arraytypes.c.src

+
+    @type@ output;
+
+    if (endptr && (*endptr[0] == '+') || (*endptr[0] == '-')) {


The compiler is suggesting an extra set of parentheses here for clarification; that is why the tests are failing.
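
As a rough analogy in Python (chosen only to keep the examples in one language; the code under review is C), `and` binds tighter than `or`, just as `&&` binds tighter than `||` in C. The sketch below uses None to stand in for a NULL pointer and shows that the ungrouped condition still inspects the pointer when it is missing. Both function names are hypothetical.

```python
def ungrouped(ptr):
    # Parsed as: (ptr is not None and ptr[0] == "+") or (ptr[0] == "-")
    return ptr is not None and ptr[0] == "+" or ptr[0] == "-"


def grouped(ptr):
    # The null check now guards both comparisons.
    return ptr is not None and (ptr[0] == "+" or ptr[0] == "-")
```

Calling grouped(None) simply returns False, whereas ungrouped(None) falls through to the second comparison and fails when it indexes None.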

@zjpoh ping

Sorry. I'm not available until next Monday. I'll fix it next Monday.

rgommers · 2019-08-09T01:58:32Z

If only one component is specified, the other component is assumed to be zero.

this is right

follow the assumption for other dtypes, where spaces between separator is allowed and will be ignored but spaces in between numbers, +, -, or j are not allowed.

sounds very reasonable

Clearer documentation on what are the supported dtypes will be helpful.

yes, would be nice to edit the docstring. also would be good to say in the dtype description that complex dtypes are only supported since numpy 1.18.0

Explanation and examples of when things break in the documentation and maybe raising a warning will be helpful.

not so sure this is necessary, since there are many ways to incorrectly format the strings. one or two examples could be fine, but optional I'd say.

This is my first real numpy PR. Please let me know any advices / comments that you have! T

From a first read, it looks pretty good. The clear PR description is also nice. Looks like you did everything right:)

rgommers · 2019-08-09T02:00:02Z

numpy/core/tests/test_longdouble.py

+        # Both components specified
+        assert_equal(np.fromstring("1+1j,2-2j, -3+3j,  -4e1", sep=",", dtype=ctypes),
+                     np.array([1. + 1.j, 2. - 2.j, - 3. + 3.j, - 40.]))
+


bonus points: you could use assert_raises and ensure the correct exception kind is given for incorrectly formatted strings

Similar to the other dtypes, if the string is incorrectly formatted, the function simply reads until it's unable to read. For example,

np.fromstring("1+1", sep=",") # array([1.])

I followed a similar pattern and did not check for ill-formatted strings. For example,

np.fromstring("1+1j,2+2", sep=",", dtype="complex") # array([1.+1.j])

What happens if you have np.fromstring("1j+1"), also bad formatting I assume?

Yep. I'm assuming that is invalid. What are your thoughts on this? I also added a test case for this. Thanks.
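
The "reads until it's unable to read" behavior described in this thread can be modeled in pure Python. This is a simplified sketch rather than numpy's actual parser: read_floats_until_failure and the regular expression are assumptions made for illustration, and the real function additionally handles warnings and other dtypes.

```python
import re

# Simplified decimal float pattern; an assumption, not numpy's grammar.
_FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def read_floats_until_failure(text, sep=","):
    """Return the floats parsed before the first malformed position."""
    values, pos = [], 0
    while pos < len(text):
        # Skip whitespace before the number.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        match = _FLOAT.match(text, pos)
        if match is None:
            break
        values.append(float(match.group()))
        pos = match.end()
        # Skip whitespace before the separator.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        # Anything other than the separator ends the scan.
        if not text.startswith(sep, pos):
            break
        pos += len(sep)
    return values
```

In this model, "1+1" yields only the leading 1.0 because the scan stops at the "+", which is neither part of a float nor the separator.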

zjpoh · 2019-08-21T03:52:44Z

I just noticed #13605. I think we should wait until that is merged; then I'll make sure that ill-formatted complex number strings raise the same error. After that, this PR will be ready for review.

seberg · 2019-09-13T06:35:10Z

@zjpoh if you want to pick this up again, the deprecation/error one is finally merged.

zjpoh · 2019-09-17T04:08:45Z

@seberg Thanks for informing! 😄

seberg

If we implement fromtxt, should we also implement the _scan functions, so that fromfile works as well?

seberg · 2019-10-03T23:09:43Z

numpy/core/_add_newdocs.py

@@ -1036,7 +1036,12 @@
        A string containing the data.
    dtype : data-type, optional
        The data type of the array; default: float.  For binary input data,
-        the data must be in exactly this format.
+        the data must be in exactly this format.  Supported dtypes are


Maybe we can just say that most builtin numeric types are supported and extension types may be supported?

seberg · 2019-10-03T23:18:06Z

numpy/core/src/multiarray/arraytypes.c.src

+            output.imag = result;
+        }
+        else {
+            endptr = prev;


This is to indicate an error by not reading everything? I think it could even return a negative value in this case. In any case, I would like a comment explaining this path.

Yep. This is to trigger the error. I'm adding a comment on that.

I'm fine with changing the imaginary part to -1, but can you explain why returning a negative is preferred, since a negative number is as likely as any other number? Thanks.

I meant the integer return of the function. I would have to check myself, but IIRC certain return values may also be accepted to signal an error.
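
A pure-Python sketch of the control flow under discussion, assuming it mirrors the C branches quoted in this review: parse a first number, then either a signed second number followed by j, or a bare j. When the j is missing, the end position is rewound so that the caller sees unread input and can report an error. scan_complex and the regular expression are hypothetical; the real code uses C string-to-double routines.

```python
import re

# Simplified decimal float pattern; an assumption, not numpy's grammar.
_FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def scan_complex(text, pos=0):
    """Return (value, end), where end is the index just past what was read."""
    first = _FLOAT.match(text, pos)
    if first is None:
        return None, pos
    value = float(first.group())
    end = first.end()
    if end < len(text) and text[end] in "+-":
        # Both components: the second number must be followed by 'j'.
        prev = end
        second = _FLOAT.match(text, end)
        if second and text.startswith("j", second.end()):
            return complex(value, float(second.group())), second.end() + 1
        # Rewind so the caller notices that not everything was read.
        return complex(value, 0.0), prev
    if text.startswith("j", end):
        # Real component not specified.
        return complex(0.0, value), end + 1
    return complex(value, 0.0), end
```

A caller would treat an end position short of the token length as the "not everything was read" condition, which is what happens for "2+2" and "1j+1" here.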

seberg · 2019-10-03T23:18:49Z

numpy/core/src/multiarray/arraytypes.c.src

+        }
+
+        // Skip j
+        ++*endptr;


Shouldn't this be inside the *endptr[0] == 'j' branch? Maybe it does not matter, but that seems clearer to me?

seberg · 2019-10-03T23:20:09Z

numpy/core/tests/test_longdouble.py

+        # Both components specified
+        assert_equal(np.fromstring("1+1j,2-2j, -3+3j,  -4e1", sep=",", dtype=ctypes),
+                     np.array([1. + 1.j, 2. - 2.j, - 3. + 3.j, - 40.]))
+


What happens if you have np.fromstring("1j+1"), also bad formatting I assume?

seberg

Just nitpicking now, the code itself looked nice to me (although I did not have a second super close look). Thanks!

seberg · 2019-10-15T22:52:24Z

numpy/core/_add_newdocs.py

+        the data must be in exactly this format. Most builtin numeric types are 
+        supported and extension types may be supported.
+
+        Complex dtypes are only supported since numpy 1.18.0


Should probably have a .. versionadded:: tag.

seberg · 2019-10-15T22:53:07Z

numpy/core/src/multiarray/arraytypes.c.src

+        }
+        else {
+            // Set endptr to previous char to trigger the not everything is
+            // read error


Silly nit, but can you use

/*
 * This is a multiline comment
 * C89 style comment for multiple lines.
 */

seberg · 2019-10-15T22:53:40Z

numpy/core/tests/test_longdouble.py

+                         np.array([1j]))
+
+
+


Tiny Nit: There seems to now be an additional blank line here.

seberg · 2019-10-15T22:54:25Z

numpy/core/tests/test_longdouble.py

+def test_fromstring_complex():
+    for ctype in ["complex", "cdouble", "cfloat"]:
+        # Check spacing between separator
+        assert_equal(np.fromstring("1, 2 ,  3  ,4",sep=",",dtype=ctype),


Small nits also here: Please make sure to put spaces after all commas in function arguments as per PEP8.

seberg · 2019-10-15T22:55:01Z

numpy/core/tests/test_longdouble.py

+        assert_equal(np.fromstring("1j, -2j,  3j, 4e1j",sep=",",dtype=ctype), 
+                     np.array([1.j, -2.j, 3.j, 40.j]))
+        # Both components specified
+        assert_equal(np.fromstring("1+1j,2-2j, -3+3j,  -4e1", sep=",", dtype=ctype),


Should the last one here have an imaginary part?

seberg · 2019-10-15T22:56:30Z

numpy/core/src/multiarray/arraytypes.c.src

+    }
+    else if (endptr && *endptr[0] == 'j') {
+        // Real component not specified
+


One more style nit, I think I would remove the empty line here and below (and possibly above)

zjpoh · 2019-10-16T03:54:52Z

@seberg I missed your previous comment that we could add _scan. I can take a look at that. I don't mind waiting for that and merging the two PRs together. Thanks~

zjpoh · 2019-10-23T18:48:12Z

I found a function in CPython that parses a complex number from a string: https://github.com/python/cpython/blob/1b53a24fb4417c764dd5933bce505f5c94249ca6/Objects/complexobject.c#L784

But I'm not sure what to do with this. Any thoughts?

mattip · 2019-10-28T14:24:20Z

Can we re-use that function? It seems to be designed around converting an arbitrary python object to a complex, not to be part of a stream-parsing routine. But it would be good to reuse the design of the actual string parsing there, if possible.

zjpoh · 2019-10-28T15:20:43Z

Got it. 😃

I'm not available this week. I'll take a look at it next week.

eric-wieser · 2019-10-28T15:41:59Z

numpy/core/src/multiarray/arraytypes.c.src

@@ -1849,7 +1849,60 @@ BOOL_fromstr(char *str, void *ip, char **endptr,
 }

 /**begin repeat
- * #fname = CFLOAT, CDOUBLE, CLONGDOUBLE,
+ * #fname = CFLOAT, CDOUBLE#


Can we handle CLONGDOUBLE here too? I think we have a NumPyOS_ascii_strtold available.

mattip · 2019-10-31T06:48:04Z

Thanks @zjpoh

Parse complex number from string

b112fc3

charris added 01 - Enhancement component: numpy._core labels Aug 8, 2019

charris reviewed Aug 8, 2019

rgommers reviewed Aug 9, 2019

Add parenthesis as suggested by compiler. Update docstring.

f779af0

zjpoh mentioned this pull request Aug 20, 2019

Unexpected behaviour numpy.fromstring #11878

Closed

zjpoh added 2 commits September 26, 2019 22:04

Merge branch 'master' into from_string_complex

27332a8

Add deprecation warning for invalid complex string

3c9926b

seberg self-requested a review September 27, 2019 05:55

seberg reviewed Oct 3, 2019

Update per Sebastian's comments

23fdb82

seberg approved these changes Oct 15, 2019

seberg reviewed Oct 15, 2019

Fix style per Sebastian's comments

390cbbd

zjpoh mentioned this pull request Oct 17, 2019

ENH: Add complex number support for fromfile #14730

Merged

Add release doc

880acfe

eric-wieser reviewed Oct 28, 2019

mattip merged commit c03ce14 into numpy:master Oct 31, 2019

