Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dtype "aligned" flag not respected by recarray. #5224

Closed
NeilGirdhar opened this issue Oct 23, 2014 · 20 comments
Closed

dtype "aligned" flag not respected by recarray. #5224

NeilGirdhar opened this issue Oct 23, 2014 · 20 comments

Comments

@NeilGirdhar
Copy link
Contributor

As per my stackoverflow question:

import numpy as np
a = np.zeros(4, dtype=np.dtype([('t', '<f8'), ('d', [('a', '<i4'), ('b', '<f8')], (100,))], align=True))
b = np.require(a, requirements=['ALIGNED'])
print([x.flags['ALIGNED'] for x in [a, b]])

prints [False, False] !

@charris writes that "The recarray class accepts an aligned parameter, but looks to lose it in format_parser".

This bug breaks usage of PyOpenGL as per my other stackoverflow question.

As an aside why does "c-contiguous" imply "aligned" for recarray?

@crackwitz
Copy link

occurs in my setup. v1.9.0 on win7 x64.

@crackwitz
Copy link

v1.9.1 doesn't work either

v1.8.2 DOES work

@adamlwgriffiths
Copy link
Contributor

Can confirm this is affecting my PyOpenGL code when using complex dtypes.

@ahaldane
Copy link
Member

Some preliminary notes:

This isn't a bug in recarray, but in ndarray. (recarray happens to have a completely duplicate format_parser, but that isn't what is being used here).

First of all, according to the docs, "align" has no effect in this line:

a = np.zeros(4, dtype=np.dtype([('t', '<f8'), ('d', [('a', '<i4'), ('b', '<f8')], (100,))], align=True))

since this uses the "list" form of dtype specification, for which the docs say:

This style does not accept align in the dtype constructor as it is assumed that all of the memory is accounted for by the array interface description.

I'm not sure exactly what this means yet.

Second, the line

b = np.require(a, requirements=['ALIGNED'])

does not do anything because np.require currently ignores the "ALIGNED" flag. This is a bug.

I hope to look into it further.

@NeilGirdhar
Copy link
Contributor Author

Thanks for looking into this!

If align is not accepted with an array interface description, then the
dtype constructor should raise.

Best,

Neil

On Mon, Jan 26, 2015 at 5:34 PM, ahaldane notifications@github.com wrote:

Some preliminary notes:

This isn't a bug in recarray, but in ndarray. (recarray happens to have a
completely duplicate format_parser, but that isn't what is being used here).

First of all, according to the docs
http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html, "align"
has no effect in this line:

a = np.zeros(4, dtype=np.dtype([('t', '<f8'), ('d', [('a', '<i4'), ('b', '<f8')], (100,))], align=True))

since this uses the "list" form of dtype specification, for which the docs
say:

This style does not accept align in the dtype constructor as it is assumed
that all of the memory is accounted for by the array interface description.

I'm not sure exactly what this means yet.

Second, the line

b = np.require(a, requirements=['ALIGNED'])

does not do anything because np.require currently ignores the "ALIGNED"
flag. This is a bug.

I hope to look into it further.


Reply to this email directly or view it on GitHub
#5224 (comment).

@NeilGirdhar
Copy link
Contributor Author

This appears to be fixed now! Should we close this? (Might be nice to link to the changelist if someone knows it.)

@charris
Copy link
Member

charris commented Sep 22, 2015

I don't see any obvious candidates for the fix PR, but there are a number of recarray fixups. I'll close this now, thanks for the heads up.

@charris charris closed this as completed Sep 22, 2015
@charris
Copy link
Member

charris commented Sep 22, 2015

I'd guess @ahaldane may be responsible.

@ahaldane
Copy link
Member

I'm looking into it. I believe there is a bug here: Now the alignment flag gets set when it should not be. Eg, try the first example here without align=True.

I don't remember touching alignment though. I'm running git bisect right now to figure it out..

@ahaldane
Copy link
Member

Git bisect says 3351dbf is the first bad commit, in PR #5316. I haven't investigated much, but @juliantaylor may be able to help!

The bisect test was

a = np.zeros(4, dtype=np.dtype([('t', '<f8'), ('d', [('a', '<i4'), ('b', '<f8')], (100,))]))
if a.flags['ALIGNED']:
    sys.exit(1)
sys.exit(0)

@NeilGirdhar
Copy link
Contributor Author

@ahaldane Would you mind showing me how you did this bisect?

@ahaldane
Copy link
Member

Sure. Actually my technique is a homemade cobbled thing, maybe somebody knows something better.

But in summary I have a bash script bisect-script.sh

#!/usr/bin/env bash
rm -rf build   # needed since sometimes stale build objects screw further builds up
python2 ./runtests.py --python bisect-test.py

Then I have a python script bisect-test.py:

#!/usr/bin/env python2
import numpy as np
import sys

a = np.zeros(4, dtype=np.dtype([('t', '<f8'), ('d', [('a', '<i4'), ('b', '<f8')], (100,))]))
if a.flags['ALIGNED']:
    sys.exit(1)
sys.exit(0)

Then I look through git log to a point I guess is probably ok, in this case a0ea059, check that commit out, and run python2 ./runtests.py --python bisect-test.py to check it returns 0 (echo $?), which it does. I also happened to try 524d6f9, which returned 1. Then I did

git bisect start
git bisect bad 524d6f9
git bisect good a0ea059
git bisect run ./bisect-script.sh

and waited. It took about 8 builds, or 20 minutes, to finish.

@NeilGirdhar
Copy link
Contributor Author

@ahaldane Awesome, thanks!!

@charris charris reopened this Sep 22, 2015
@ahaldane
Copy link
Member

I looked a little more at this case. While #5316 introduced new behavior, it might be that the new behavior is OK. There is still a bug to fix here, but it is different from what's going on in #5316.

I think the bug is that the align=True keyword is ignored when using the 'list' form of structured dtype specification, as I mentioned above.

What may not be a bug is the behavior change in 1.9.2, where structured dtypes now have alignment of 1 unless you specifically request align=True (when that's possible). This somewhat confusingly means that arrays with [dtypes created without align=True] count as 'aligned'.

I think the way to interpret this is:

  • The ALIGNED flag (a property of an array) means "this array satisfies its dtype's alignment requirements",
  • align=True in the dtype constructor (a property of a dtype) means "align the fields of this structured dtype to memory boundaries and record the alignment constraint in dtype->alignment".

So, in the example given in this issue, it used to be that the 'align' keyword was ignored , and the ALIGNED flag defaulted to False, but now the 'align' keyword is still ignored, but ALIGNED defaults to True.

Also, in this comment @njsmith suggests that dtype('i2,i2') should have alignment of 2, although it currently has alignment of 1. It possibly makes sense to me that alignment of 1 is right: We did not ask for align=True, which means the dtype does not care about alignment, so arrays should feel free to put items with this dtype at arbitrary memory locations.

@ahaldane
Copy link
Member

It's possible the dtype constructor should have both an align keyword and an alignment keyword. align tells numpy to align the fields within the dtype to mem boundaries. alignment tells numpy to only put items with this dtype at certain memory positions. It's conceivable a user could want one and not the other.

@NeilGirdhar
Copy link
Contributor Author

@ahaldane Wouldn't you be able to accomplish that by setting alignment on the dtypes of the fields?

@ahaldane
Copy link
Member

I'm not sure how that would work. Currently fields do have alignments: dtype('u4').alignment is 4, yet dtype('u1,u4') does not align the second field to 4 bytes.

@NeilGirdhar
Copy link
Contributor Author

@ahaldane Maybe it would be good to start by making a list of use cases? My use case is OpenGL vertex data. See: e.g. https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html#//apple_ref/doc/uid/TP40008793-CH107-SW7

Each record must be aligned to 4 bytes. The fields within a record do not need to be aligned; they can start anywhere.

@ahaldane
Copy link
Member

Interesting. This is a use-case where we might indeed want to separate the two effects of align=True. What I was suggesting was allowing

>>> dtype(dict(names=['x','y','z'], formats=['u2','u2','u2'],
...            offsets=[0,2,5], alignment=4, itemsize=8), align=False)

Normally setting align=True would complain about the offset of 5, yet setting align=False sets the alignment to 1. But this way we can still set alignment.

(Although, I have trouble imagining a real-world case where alignment=1 would give us problems. Most arrays would create the items at 8-byte boundaries anyway since that is the itemsize).

@NeilGirdhar
Copy link
Contributor Author

This appears to be fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants