ENH: Parse complex number from string #14227
@type@ output;

if (endptr && (*endptr[0] == '+') || (*endptr[0] == '-')) {
The compiler is suggesting an extra set of parentheses here for clarification; that is why the tests are failing.
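To illustrate the grouping the compiler warns about: in C, as in Python, logical AND binds more tightly than logical OR, so `a && b || c` is read as `(a && b) || c`. The sketch below is a Python analogy only; the names `ptr_ok`, `is_plus`, and `is_minus` are hypothetical stand-ins for the three operands of the C condition.

```python
# Hypothetical stand-ins for the three operands of the C condition.
def unparenthesized(ptr_ok, is_plus, is_minus):
    # Same shape as: endptr && (c == '+') || (c == '-')
    return ptr_ok and is_plus or is_minus


def parenthesized(ptr_ok, is_plus, is_minus):
    # Presumably intended: endptr && ((c == '+') || (c == '-'))
    return ptr_ok and (is_plus or is_minus)


# With a "null" pointer and a '-' character the two forms disagree.
print(unparenthesized(False, False, True))  # True
print(parenthesized(False, False, True))    # False
```

In the C version the difference matters more, because the unguarded right-hand comparison would dereference the pointer even when the null check fails.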
@zjpoh ping
Sorry. I'm not available until next Monday. I'll fix it next Monday.
this is right
sounds very reasonable
yes, would be nice to edit the docstring. also would be good to say in the
Not so sure this is necessary, since there are many ways to format the strings incorrectly. One or two examples could be fine, but optional, I'd say.
From a first read, it looks pretty good. The clear PR description is also nice. Looks like you did everything right :)
# Both components specified
assert_equal(np.fromstring("1+1j,2-2j, -3+3j, -4e1", sep=",", dtype=ctypes),
             np.array([1. + 1.j, 2. - 2.j, - 3. + 3.j, - 40.]))
Bonus points: you could use assert_raises and ensure the correct exception kind is given for incorrectly formatted strings.
Similar to the other dtypes, if the string is incorrectly formatted, the function simply reads until it is unable to read any further. For example,
np.fromstring("1+1", sep=",") # array([1.])
I followed a similar pattern and did not check for ill-formatted strings. For example,
np.fromstring("1+1j,2+2", sep=",", dtype="complex") # array([1.+1.j])
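The "read until unable to read" behaviour described above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not NumPy's implementation; `read_floats_lenient` and its regular expression are hypothetical, and inf/nan are not handled.

```python
import re

# Matches a decimal float (with optional leading whitespace) at a position.
_FLOAT_PREFIX = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def read_floats_lenient(text, sep=","):
    """Hypothetical helper: parse sep-separated floats, stopping silently
    at the first item not followed by the separator or the end of input."""
    values = []
    pos = 0
    while pos < len(text):
        match = _FLOAT_PREFIX.match(text, pos)
        if match is None:
            break  # nothing parseable at this position
        values.append(float(match.group()))
        pos = match.end()
        # Whitespace between an item and the separator is tolerated.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break  # consumed everything
        if not text.startswith(sep, pos):
            break  # trailing garbage: stop reading, keep what we have
        pos += len(sep)
    return values


print(read_floats_lenient("1+1"))      # [1.0]
print(read_floats_lenient("1,2,3 4"))  # [1.0, 2.0, 3.0]
```

The point of the sketch is that a malformed tail is dropped without an error, which is the behaviour the review discussion goes on to question.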
What happens if you have np.fromstring("1j+1"), also bad formatting I assume?
Yep. I'm assuming that is invalid. What are your thoughts on this? I also added a test case for this. Thanks.
I just noticed #13605. I think we should wait until that is merged; then I'll make sure that ill-formatted complex number strings raise the same error. Then this PR will be ready for review.
@zjpoh if you want to pick this up again, the deprecation/error one is finally merged.
@seberg Thanks for informing! 😄
If we implement fromtxt, should we also implement the _scan functions, so that fromfile works as well?
numpy/core/_add_newdocs.py
Outdated
@@ -1036,7 +1036,12 @@
A string containing the data.
dtype : data-type, optional
The data type of the array; default: float. For binary input data,
the data must be in exactly this format.
the data must be in exactly this format. Supported dtypes are
Maybe we can just say that most builtin numeric types are supported and extension types may be supported?
output.imag = result;
}
else {
endptr = prev;
This is to indicate an error by not reading everything? I think it could even return a negative value in this case, in any case, I think I would like a comment explaining this path.
Yep. This is to trigger the error. I'm adding a comment on that.
I'm fine with changing the imaginary part to -1, but can you explain why returning a negative is preferred, since a negative number is as likely as any other number? Thanks.
I meant the integer return of the function. I would have to check myself, but IIRC certain return values may also be accepted to signal an error.
}

// Skip j
++*endptr;
Shouldn't this be inside the *endptr[0] == 'j' branch? Maybe it does not matter, but that seems clearer to me?
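The branch structure under discussion (real part, optional signed imaginary part, and the 'j' consumed only in the branch that saw it) can be sketched in Python with an end index playing the role of `endptr`. This is an illustrative model, not the PR's C code: `complex_fromstr` is a hypothetical name, and instead of moving the pointer back it simply leaves the end index at the unread sign so that a caller can detect leftover input.

```python
import re

# A decimal float with optional sign and exponent (no inf/nan handling).
_FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def complex_fromstr(text, pos=0):
    """Hypothetical sketch: parse one complex number starting at text[pos].

    Returns (value, end), where end is the index just past the consumed
    characters, in the spirit of strtod's endptr.
    """
    match = _FLOAT.match(text, pos)
    if match is None:
        return 0j, pos  # nothing consumed
    first = float(match.group())
    end = match.end()

    if end < len(text) and text[end] == "j":
        # Real component not specified, e.g. "1j"; skip the 'j' here.
        return complex(0.0, first), end + 1

    if end < len(text) and text[end] in "+-":
        # Possible imaginary component, e.g. "1+1j".
        imag_match = _FLOAT.match(text, end)
        if imag_match is not None:
            imag_end = imag_match.end()
            if imag_end < len(text) and text[imag_end] == "j":
                # Skip the 'j' only inside the branch that saw it.
                return complex(first, float(imag_match.group())), imag_end + 1
        # A sign without "<number>j" after it: stop before the sign so the
        # caller sees that not everything was read.
        return complex(first, 0.0), end

    # Imaginary component not specified, e.g. "1".
    return complex(first, 0.0), end


print(complex_fromstr("1+1j"))  # ((1+1j), 4)
print(complex_fromstr("1+1"))   # ((1+0j), 1)
print(complex_fromstr("1j+1"))  # (1j, 2)
```

A caller would treat `end` falling short of the end of the item as the "not everything is read" condition; whether to signal that through the end position or through a dedicated return value is the design question raised in the comments above.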
Just nitpicking now, the code itself looked nice to me (although I did not have a second super close look). Thanks!
numpy/core/_add_newdocs.py
Outdated
the data must be in exactly this format. Most builtin numeric types are
supported and extension types may be supported.

Complex dtypes are only supported since numpy 1.18.0
Should probably have a .. versionadded: tag.
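For reference, the Sphinx directive is written with a double colon, `.. versionadded:: <version>`, followed by an indented note. The docstring below is a hypothetical excerpt written only to show the syntax; its wording and placement are assumptions, not the text that was merged.

```python
def fromstring_doc_example():
    """Hypothetical docstring excerpt; only the directive syntax matters.

    Parameters
    ----------
    dtype : data-type, optional
        The data type of the array; default: float.

        .. versionadded:: 1.18.0
            Complex dtypes.
    """


# Print the docstring to inspect how the directive is laid out.
print(fromstring_doc_example.__doc__)
```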
}
else {
// Set endptr to previous char to trigger the not everything is
// read error
Silly nit, but can you use
/*
* This is a multiline comment
* C89 style comment for multiple lines.
*/
numpy/core/tests/test_longdouble.py
Outdated
np.array([1j])) | ||
|
||
|
||
|
Tiny Nit: There seems to now be an additional blank line here.
numpy/core/tests/test_longdouble.py
Outdated
def test_fromstring_complex():
    for ctype in ["complex", "cdouble", "cfloat"]:
        # Check spacing between separator
        assert_equal(np.fromstring("1, 2 , 3 ,4",sep=",",dtype=ctype),
Small nits also here: Please make sure to put spaces after all commas in function arguments as per PEP8.
numpy/core/tests/test_longdouble.py
Outdated
assert_equal(np.fromstring("1j, -2j, 3j, 4e1j",sep=",",dtype=ctype),
             np.array([1.j, -2.j, 3.j, 40.j]))
# Both components specified
assert_equal(np.fromstring("1+1j,2-2j, -3+3j, -4e1", sep=",", dtype=ctype),
Should the last one here have an imaginary part?
}
else if (endptr && *endptr[0] == 'j') {
// Real component not specified

One more style nit: I think I would remove the empty line here and below (and possibly above).
@seberg I missed your previous comment on we can add
I found a function in CPython that parses a complex number from a string: https://github.com/python/cpython/blob/1b53a24fb4417c764dd5933bce505f5c94249ca6/Objects/complexobject.c#L784 But I'm not sure what to do with this. Any thoughts?
Can we re-use that function? It seems to be designed around converting an arbitrary Python object to a complex, not to be part of a stream-parsing routine. But it would be good to reuse the design of the actual string parsing there, if possible.
Got it. 😃 I'm not available this week. I'll take a look at it next week.
@@ -1849,7 +1849,60 @@ BOOL_fromstr(char *str, void *ip, char **endptr,
}

/**begin repeat
* #fname = CFLOAT, CDOUBLE, CLONGDOUBLE,
* #fname = CFLOAT, CDOUBLE#
Can we handle CLONGDOUBLE here too? I think we have a NumPyOS_ascii_strtold available.
Thanks @zjpoh
Addressing #13891.
I implemented parsing from string to complex number. Here are a couple of assumptions that I made:
- '1j' gives 0+1j and '1' gives 1+0j.
- Spaces around +, -, or j are not allowed. For example, 1+1j, 1+1j is allowed but 1 + 1j,1+1j or 1+1 j are not.
Other comments:
- As with other dtypes, reading stops at the first item that cannot be parsed, e.g. np.fromstring("1,2,3 4", sep=",") # array([1, 2, 3]).
This is my first real numpy PR. Please let me know any advice or comments that you have! Thanks.
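As a reference point for the assumptions listed in this description, Python's built-in `complex()` applies the same conventions to a single string: a bare imaginary or bare real part is accepted, and whitespace around the inner sign or before `j` is rejected. The helper `parses_as_complex` below is hypothetical and only wraps the built-in; it says nothing about how the PR's C code is written.

```python
def parses_as_complex(text):
    """Return the parsed value, or None if Python rejects the string."""
    try:
        return complex(text)
    except ValueError:
        return None


print(parses_as_complex("1j"))      # 1j
print(parses_as_complex("1"))       # (1+0j)
print(parses_as_complex("1+1j"))    # (1+1j)
print(parses_as_complex("1 + 1j"))  # None
print(parses_as_complex("1+1 j"))   # None
print(parses_as_complex("1j+1"))    # None
```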