Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Various fixes to _dtype_from_pep3118 #9054

Merged
merged 4 commits into from
May 9, 2017

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented May 5, 2017

Fixes #9053, and groundwork for #9049.

Again, probably easier to review commit-by-commit

Adding padding fields means that dtypes with the same fields cannot necessarily
be cast to one another. Padding shouldn't prevent casting.

Partially fixes numpy#9053
@eric-wieser eric-wieser changed the title [WIP] MAINT: refactor _dtype_from_pep3118 in terms of a stream BUG: Various fixes to _dtype_from_pep3118 May 5, 2017
@mhvk
Copy link
Contributor

mhvk commented May 5, 2017

I like the solution with the stream class (though I wonder to what extent you're reinventing ply...). But I cannot say I went through this in enough detail.

@mhvk
Copy link
Contributor

mhvk commented May 5, 2017

p.s. would be nice to have #8774 so you don't have to redefine gcd and lcm...

@eric-wieser
Copy link
Member Author

eric-wieser commented May 5, 2017

#8774 actually falls back on _internals._gcd for object arrays anyway. I also think that that file is minimizing the number of things from np.core it uses, to avoid the possibility of recursion loops either at import-time or runtime.

@eric-wieser
Copy link
Member Author

though I wonder to what extent you're reinventing ply

I've seen in the past that dependencies for numpy are a pain from a distribution perspective (#that-pr-about-deprecating-np.float), and it's better to do simple things ourselves.

@mhvk
Copy link
Contributor

mhvk commented May 5, 2017

Yes, makes sense. And as said, I quite like your stream class; and the path that gets taken is at least a bit clearer than with a lex file (or a regex).

@ahaldane
Copy link
Member

ahaldane commented May 8, 2017

  • The 'stream' cleanup commit looks nice. Perhaps consider making the class global level? I'm fine as is though.
  • The three bugfix commits look good:
    • The itemsize commit is partly like in ENH: properly account for trailing padding in PEP3118 #7798 as you mentioned. I am fine merging it as you have it for now. Although, as noted there we are probably incorrectly interpreting the spec: According to spec ix should be 5 bytes, not 8. If the user wants 8 they need to do ix0i. We might leave that for ENH: properly account for trailing padding in PEP3118 #7798, since the problem is purely theoretical right now: I don't know of public python modules (including numpy) which can create buffers with 'ix' format.
    • The commit using the 'names' list to store the field order looks correct, and even seems to fix bugs we didn't come across (eg the extra test that the 1-item itemsize is unchanged).
    • The autogenerated names commit looks good and is much easier to read than the old code

I read over it well enough that it has my +1 for merging. If there are no other comments I would be happy to merge in a day or two.

@ahaldane
Copy link
Member

ahaldane commented May 9, 2017

All right, pulling the trigger here. Thanks @eric-wieser !

@ahaldane ahaldane merged commit 8618066 into numpy:master May 9, 2017
@eric-wieser
Copy link
Member Author

. Although, as noted there we are probably incorrectly interpreting the spec: According to spec ix should be 5 bytes, not 8

Can you give a reference for that?

@pv
Copy link
Member

pv commented May 9, 2017 via email

@eric-wieser
Copy link
Member Author

eric-wieser commented May 9, 2017

@pv: Here's the problem. The c code you refer to gives 8, but struct.Struct('ix').size gives 5.

@pv
Copy link
Member

pv commented May 9, 2017 via email

@pv
Copy link
Member

pv commented May 9, 2017 via email

@ahaldane
Copy link
Member

ahaldane commented May 9, 2017

Here's the reference, (copied from my comment in #7798):

The struct module says this about how format strings treat trailing padding:

To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero.

My interpretation is that for a format like 'ix', even though the type is implicitly @ (aligned) we should only add 1 trailing padding byte. If we wanted numpy-style padding bytes the format string should instead be something like 'ix0i'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants