Skip to content

Conversation

zjpoh
Copy link
Member

@zjpoh zjpoh commented Aug 8, 2019

Addressing #13891.

I implemented parsing from string to complex number. Here are a couple assumptions that I made

  1. If only one component is specified, the other component is assumed to be zero. For example, '1j' gives 0+1j and '1' gives 1+0j.
  2. I follow the assumption for other dtypes, where spaces between separator is allowed and will be ignored but spaces in between numbers, +, -, or j are not allowed. For example, 1+1j, 1+1j is allowed but 1 + 1j,1+1j or 1+1 j are not.

Other comments.

  • Clearer documentation on what are the supported dtypes will be helpful.
  • Explanation and examples of when things break in the documentation and maybe raising a warning will be helpful. For example, np.fromstring("1,2,3 4", sep=",") # array([1, 2, 3]).

This is my first real numpy PR. Please let me know any advices / comments that you have! Thanks.


@type@ output;

if (endptr && (*endptr[0] == '+') || (*endptr[0] == '-')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler is suggesting an extra set a parenthesis here for clarification, that is why the test failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zjpoh ping

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I'm not available until next Monday. I'll fix it next Monday.

@rgommers
Copy link
Member

rgommers commented Aug 9, 2019

If only one component is specified, the other component is assumed to be zero.

this is right

follow the assumption for other dtypes, where spaces between separator is allowed and will be ignored but spaces in between numbers, +, -, or j are not allowed.

sounds very reasonable

Clearer documentation on what are the supported dtypes will be helpful.

yes, would be nice to edit the docstring. also would be good to say in the dtype description that complex dtypes are only supported since numpy 1.18.0

Explanation and examples of when things break in the documentation and maybe raising a warning will be helpful.

not so sure this is necessary, since there's many ways to incorrectly format the strings. one or two examples could be fine, but optional I'd say.

This is my first real numpy PR. Please let me know any advices / comments that you have! T

From a first read, it looks pretty good. The clear PR description is also nice. Looks like you did everything right:)

# Both components specified
assert_equal(np.fromstring("1+1j,2-2j, -3+3j, -4e1", sep=",", dtype=ctypes),
np.array([1. + 1.j, 2. - 2.j, - 3. + 3.j, - 40.]))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bonus points: you could use assert_raises and ensure the correct exception kind is given for incorrectly formatted strings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other dtypes, if the string is incorrectly formatted, the function simply reads until it's unable to read. For example,

np.fromstring("1+1", sep=",")  # array([1.])

I follow similar pattern and did not check for ill-formatted strings. For example,

np.fromstring("1+1j,2+2", sep=",", dtype="complex")  # array([1.+1.j])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you have np.fromstring("1j+1"), also bad formatting I assume?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap. I'm assuming that is invalid. What is your thoughts on this? I also added a test case for this. Thanks.

@zjpoh
Copy link
Member Author

zjpoh commented Aug 21, 2019

I just notice #13605. I think we should wait until that is merged then I'll make sure that ill-formatted complex number string raise the same error. Then this PR will be ready for review.

@seberg
Copy link
Member

seberg commented Sep 13, 2019

@zjpoh if you want to pick this up again, the deprecation/error one is finally merged.

@zjpoh
Copy link
Member Author

zjpoh commented Sep 17, 2019

@seberg Thanks for informing! 😄

@seberg seberg self-requested a review September 27, 2019 05:55
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement fromtxt, should we also implement the _scan functions, so that fromfile works as well?

@@ -1036,7 +1036,12 @@
A string containing the data.
dtype : data-type, optional
The data type of the array; default: float. For binary input data,
the data must be in exactly this format.
the data must be in exactly this format. Supported dtypes are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just say that most builtin numeric types are supported and extension types may be supported?

output.imag = result;
}
else {
endptr = prev;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to indicate an error by not reading everything? I think it could even return a negative value in this case, in any case, I think I would like a comment explaining this path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap. This is to trigger the error. I'm adding a comment on that.

I'm fine with changing the imaginary part to -1 but can you explain why returning a negative is preferred since a negative number is equally likely as any other number? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the integer return of the function, would have to check mysefl, but IIRC certain return values may also be accepted to signal error.

}

// Skip j
++*endptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be inside the *endptr[0] == 'j' branch? Maybe it does not matter, but that seems clearer to me?

# Both components specified
assert_equal(np.fromstring("1+1j,2-2j, -3+3j, -4e1", sep=",", dtype=ctypes),
np.array([1. + 1.j, 2. - 2.j, - 3. + 3.j, - 40.]))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you have np.fromstring("1j+1"), also bad formatting I assume?

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nitpicking now, the code itself looked nice to me (although I did not have a second super close look). Thanks!

the data must be in exactly this format. Most builtin numeric types are
supported and extension types may be supported.

Complex dtypes are only supported since numpy 1.18.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably havea .. versionadded: tag.

}
else {
// Set endptr to previous char to trigger the not everything is
// read error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly nit, but can you use

/*
 * This is a multiline comment
 * C89 style comment  for multiple lines.
 */

np.array([1j]))



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny Nit: There seems to now be an additional blank line here.

def test_fromstring_complex():
for ctype in ["complex", "cdouble", "cfloat"]:
# Check spacing between separator
assert_equal(np.fromstring("1, 2 , 3 ,4",sep=",",dtype=ctype),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits also here: Please make sure to put spaces after all commas in function arguments as per PEP8.

assert_equal(np.fromstring("1j, -2j, 3j, 4e1j",sep=",",dtype=ctype),
np.array([1.j, -2.j, 3.j, 40.j]))
# Both components specified
assert_equal(np.fromstring("1+1j,2-2j, -3+3j, -4e1", sep=",", dtype=ctype),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the last one here have an imaginary part?

}
else if (endptr && *endptr[0] == 'j') {
// Real component not specified

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On emore style nit, I think I would remove the empty line here and below (and possibly above)

@zjpoh
Copy link
Member Author

zjpoh commented Oct 16, 2019

@seberg I missed your previous comment on we can add _scan. I can take a look at that. I don't mind waiting for that and merge the two PRs together. Thanks~

@zjpoh
Copy link
Member Author

zjpoh commented Oct 23, 2019

I found a function in CPython that parse complex from string https://github.com/python/cpython/blob/1b53a24fb4417c764dd5933bce505f5c94249ca6/Objects/complexobject.c#L784

But I'm not sure what to do with this. Any thoughts?

@mattip
Copy link
Member

mattip commented Oct 28, 2019

Can we re-use that function? It seems to be designed around convertng an arbitrary python object to a complex, not to be part of a stream-parsing routine. But it would be good to reuse the design of the actual string parsing there, if possible

@zjpoh
Copy link
Member Author

zjpoh commented Oct 28, 2019

Got it. 😃

I'm not available this week. I'll take a look at it next week.

@@ -1849,7 +1849,60 @@ BOOL_fromstr(char *str, void *ip, char **endptr,
}

/**begin repeat
* #fname = CFLOAT, CDOUBLE, CLONGDOUBLE,
* #fname = CFLOAT, CDOUBLE#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we handle CLONGDOUBLE here too? I think we have an NumPyOS_ascii_strtold available.

@mattip mattip merged commit c03ce14 into numpy:master Oct 31, 2019
@mattip
Copy link
Member

mattip commented Oct 31, 2019

Thanks @zjpoh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants