REV: 8daf144 and following. Strides must be clean to be contiguous #2735

Closed

8 participants

@seberg
Member
seberg commented Nov 14, 2012

This reverts those parts of PR gh-2694 that were directly related to the change in flag logic, since it seems that too much code relies at least on mostly clean strides. See also the discussion there.

Some changes remain:

  1. 0-sized arrays are handled somewhat more strictly, forcing them to have matching strides including the dimension of size 0. (This means an array with itemsize=8, strides=(16,) and shape=(0,) is not contiguous.) This is mostly so that it's really always true that if C-contiguous then strides[-1] == itemsize; I can't see it hurting, and it "fixes" a corner case in scikit-learn.
  2. 0-sized arrays can be both C- and F-Contiguous.
  3. 1-sized arrays will be both C- and F-Contiguous (or neither).

Though I would consider 2 and 3 fixes.
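The stricter rule in item 1 can be sketched as a tiny pure-Python predicate (a hypothetical helper for illustration only, not NumPy's actual flag-setting code):

```python
def is_c_contiguous(shape, strides, itemsize):
    """Strict C-contiguity check in the spirit of this PR: strides must
    be 'clean' from the last axis outwards, even when a dimension has
    size 0 (hypothetical sketch, not the real NumPy implementation)."""
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if stride != expected:
            return False
        expected *= dim
    return True

# The corner case from item 1: itemsize=8, shape=(0,), strides=(16,)
is_c_contiguous((0,), (16,), 8)   # False under the stricter rule
is_c_contiguous((0,), (8,), 8)    # True: strides[-1] == itemsize holds
```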

On a side note, I think there should be code to clean up strides a little (better). This applies to:

  1. Newaxis in slicing operations
  2. Copies with order='K'
  3. Reduce operations with keepdims=True

Oh, and I think ISONESEGMENT should at least check a bit more accurately for SIZE < 1; the NDIM check is superfluous anyway.

seberg added some commits Nov 13, 2012
@seberg seberg REV: 8daf144 and following. Strides must be clean to be contiguous
This reverts mostly 8daf144 and c48156d. It leaves some small changes
from the original behavior: size 1 arrays are both C- and F-contiguous,
and size 0 arrays can be one, both, or neither; however, it is a bit
stricter in enforcing that strides are clean, including the (first)
dimension of size 0.
8c99ef1
@seberg seberg BUG: Fix small issue with flags setting in ctors. fa09b2d
@seberg seberg closed this Nov 14, 2012
@seberg seberg reopened this Nov 14, 2012
@charris
Member
charris commented Nov 25, 2012

@rgommers, @njsmith, it looks like this PR should go in.

@seberg
Member
seberg commented Nov 26, 2012

Maybe there should be a better explanation of when exactly the flags are set. Possibly noting that it is still better not to rely on clean strides, and that relying on them may break (most of the code that this broke already has problems), especially on older numpy if the array's size is 0 or 1. The C-API docs are generated from source? I can write a bit in a few days.

@GaelVaroquaux
Contributor

A vote from the scikit-learn side: without this PR, scikit-learn is fairly broken under numpy master. This PR fixes it. +1 for merge on our side!

@charris
Member
charris commented Nov 26, 2012

@seberg The c-api isn't automatically generated, you need to write something in the appropriate c-api*.rst file in doc/source. I don't know which one, though :-(.

@seberg
Member
seberg commented Nov 27, 2012

Ok, thanks. I think I will look at writing a little documentation later. However, if that is the only reason to keep this open, I will just do a separate PR for it.

@charris
Member
charris commented Nov 27, 2012

That seems appropriate. I'm not sure what is holding this up.

@seberg
Member
seberg commented Nov 28, 2012

I am not sure, I guess it just needs a bit of review. Or is there any discussion needed about the actual remaining changes? It may sound weird that a size 1 array can be non-contiguous and that size 0 arrays are handled more strictly, but that is consistent with allowing third-party software to rely on what it already relies on (even if I don't like it). Also, the old behaviour of allowing only one of the two flags to be set (unless ndim < 2) was a bit odd, and that is removed.

@charris
Member
charris commented Nov 28, 2012

I agree that it sounds a bit weird. To some extent, I think we are being driven by unwarranted assumptions in user code. @GaelVaroquaux Gael, what would it take to fix scikits-learn? I know that you have a lot of working sets out in the field and don't want to invalidate them, but is it possible to fix things up so that some time in the future we can finish rationalizing the behavior of these flags?

@stefanv
Contributor
stefanv commented Nov 28, 2012

A size 1 array can and should be both C and F-contiguous.

@seberg
Member
seberg commented Nov 28, 2012

@stefanv OK, I can change it to that; I really don't care. The deal is, people complained about scipy and sk-learn (and probably others out there) being broken. And in reality all of these are already broken for corner cases, because of rules like "a size 1 array should (always) be both C- and F-contiguous". From my point of view, the change here "fixes" the misuse in other code, because I thought it nonsense to support people relying on clean strides for most cases but not for all cases. Of course I will gladly change it back; then I have more reason to say that these are all bugs in scipy and sk-learn, because without this change they are bugs, they just rarely surface.

I do think that this needs to be changed back (it just breaks too much suddenly), but if numpy should or should not make these abuses at least work consistently for the moment, you guys tell me.

@stefanv
Contributor
stefanv commented Nov 28, 2012

@seberg I feel that we should try to be consistent with the mental model of what contiguous means; I'm not sure why scikit-learn breaks, unless they depend on broken behavior inside of numpy? Perhaps I should investigate more closely; perhaps you have already?

@seberg
Member
seberg commented Nov 28, 2012

@stefanv to be honest I am a bit sick of this... There are two models (the first is the case in current master):

  1. The memory is in a contiguous layout
  2. and also things like strides[-1] == itemsize are true (this is the only such assumption I have seen so far; I don't know if anyone relies on more, but I doubt it)

Everything that breaks in current master relies on 2. Now numpy 1.7, etc. actually provide 2. in most cases.

If your mental model is 1., then yes, they all rely on broken behaviour in numpy. In numpy 1.7 this is not even consistently possible (for some rarer cases, namely size 1 and 0 arrays); in current master you cannot rely on 2. in many more cases (so things break). This PR basically makes numpy consistently "broken" if your model is 1., but thus "fixes" the other packages.

The fixes are all very easy: wherever you use the fact that an array is contiguous, do not use the strides, because all you need is itemsize and shape. It really boils down to replacing arr.strides[-1] (or arr.strides[0] for F-order) with arr.itemsize.
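In code, the suggested fix looks like this (a minimal sketch; the array and variable names are made up for illustration):

```python
import numpy as np

a = np.ones((3, 4), dtype=np.float64)
assert a.flags.c_contiguous

# Fragile: assumes the contiguity flag guarantees "clean" strides,
# which does not hold in every corner case (e.g. size-0 or size-1
# arrays, depending on the numpy version):
step_fragile = a.strides[-1]

# Robust: for a C-contiguous array the element step along the last
# axis is the itemsize by definition, so no stride lookup is needed:
step_robust = a.itemsize
```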

@charris
Member
charris commented Nov 28, 2012

@stefanv I believe current scipy also breaks without this code. I'm inclined to merge this fix, but I would also like sk-learn and scipy to fix things up so we can do things properly at some point in the future. Now, precisely how we get to that desired end point is a bit hazy. Suggestions on that point welcome.

@GaelVaroquaux
Contributor

To some extent, I think we are being driven by unwarranted assumptions
in user code.

That's probably true.

@GaelVaroquaux Gael, what would it take to fix scikits-learn?

I am not sure: I didn't write the code that is breaking. I had a quick
look and could fix it in an hour's work. However, it probably deserves
more work. I am currently travelling (amongst other things teaching Python
and scipy); @amueller, @glouppe, @pprett, you who know the tree module,
do you think that you could take a look (and also read this thread on the
numpy tracker to see the suggestions)? This is related to the following
tree failure in sklearn:
scikit-learn/scikit-learn#1406
which I believe is linked to strides now possibly being 0 in
C-contiguous arrays.

but is it possible to fix things up so that some time in the future we
can finish rationalizing the behavior of these flags

+1, we'll have a look and figure out what we can easily fix.

@erg
erg commented Nov 28, 2012

The python community doesn't pay enough attention to detail regarding array shapes, types, and corner cases imho. +1 for anything that would improve the situation.

@seberg
Member
seberg commented Nov 29, 2012

What might mitigate this problem a little would be if we cleaned the strides when a contiguous array is requested (also in the buffer protocol; I think that is already done like that). I do not know whether Cython, etc. use if not array.flags.c_contiguous: array = array.copy() or array = np.array(array, copy=False, order='C').

If Cython, etc. use the latter method, that should actually solve all existing problems in scipy and sk-learn. It might be a little confusing though, and will certainly not solve all occurrences of this usage out there. However, I actually think it would be consistent to do it like that, and maybe it's "good enough" to avoid massive problems out there while still making the flags more consistent with what most people expect.
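The two patterns mentioned above, side by side (a sketch; np.ascontiguousarray is roughly shorthand for the np.array(..., copy=False, order='C') form):

```python
import numpy as np

a = np.ones((4, 4))[:, ::2]     # a non-contiguous view

# Pattern 1: check the flag explicitly, copy only if needed
b = a if a.flags.c_contiguous else a.copy()

# Pattern 2: request a C-contiguous array directly, which gives NumPy
# the chance to normalize strides whenever it does make a copy
c = np.ascontiguousarray(a)
```

Both produce a C-contiguous result; the difference discussed here is only whether NumPy gets a hook at which it could clean up strides.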

@seberg
Member
seberg commented Nov 29, 2012

Actually, it seems to me that Cython uses the buffer protocol, which should probably return cleaned-up strides in any case. I actually like the above idea, even if there is still a risk of breaking something out there. Does anyone have an idea how likely that is?

@njsmith
Member
njsmith commented Nov 29, 2012

I agree with @charris... this is a mess and we really want to get rid of it (especially this nonsense where we need heuristics to guess whether people want their row vectors to be C-contiguous or F-contiguous based on the contiguity of the arrays they came from). But the fact that it has broken all the major C-API-using packages we've looked at so far suggests that we're not going to get away with just changing it in one fell swoop :-(.

(I don't much like any of the partial fixes, since none of them allow a row vector to be flagged as both C- and F-contiguous simultaneously, and that's the most common situation that requires nasty hard-to-test-and-maintain heuristics.)

Here's one possible migration path, very conservative:

  • Add new flags to carry the new semantics, like NPY_ARRAY_C_CONTIGUOUS_FIXED = 0x2000. (Fortunately we have many flag bits still to spare.)
  • Deprecate the old flags. For GCC and MSVC we can do this directly with something like static const int NPY_ARRAY_C_CONTIGUOUS = 0x0001 __attribute__((deprecated)); resp. __declspec(deprecated), and then everyone who uses these flags will get warnings.
  • Tell everyone that the way to fix these warnings is to audit their code to make sure it's not making unwarranted assumptions about strides, and then switch to using the new un-deprecated flags.
  • Do something similar with NPY_CORDER_FIXED, np.array(..., order="C_FIXED"), arr.flags.c_contiguous_fixed ... Ugh, this is looking uglier the more I think about it.
  • After a while, make using the old stuff an error.
  • Optional: After a while, re-add the original names, but now with the new semantics. (Optional because, if we can find a name that is less horrible than ..._FIXED in the first place, then we might be able to skip this step.)

Hmm. This wouldn't be so bad if it were just a matter of having to remember for a few releases that, say, NPY_ARRAY_C_CONTIG and NPY_ARRAY_C_CONTIGUOUS were subtly different things. But there's really no acceptable alternative to order="C", is there? And there is almost certainly code out there that does np.array(..., order="C") and then calls a C function which blithely assumes the strides are 'clean'. I don't see how to detect such code and issue a warning except by changing how order="C" works somehow.

@seberg: You found a bunch of these problems in sklearn with a "quick look" -- did it give you any ideas for how we can find more of these problems semi-automatically?

Anyone else have any ideas on heuristics we could use to issue warnings at appropriate points without drowning the user in warnings?

@seberg
Member
seberg commented Nov 30, 2012

I actually now think that Cython code (and with that sk-learn) is in a way OK, because Cython seems to use the buffer interface, and that probably should (and easily can) clean up the strides of the returned buffer instead of just passing on the original strides.

Numpy could ensure clean strides for np.array(..., order='C') and the equivalent C-API functions; though it wastes a few CPU cycles and is a bit tedious, that does not seem too bad to me even for the future. So maybe it is unnecessary to deprecate order. If users out there only check the flags (the code we have seen break so far doesn't) and then rely on clean strides, a deprecation process for the stride flags would of course still be necessary, I guess.

I have started hacking a bit on having the buffer interface return cleaned-up strides, and on numpy tidying up strides when explicitly asked for them (note that e.g. np.array(..., order='C', copy=False) would still have both flags set as now, but the strides are clean when interpreted as a C-contiguous array).

@njsmith I just grepped for .strides[, PyArray_STRIDES(, etc. ... it's all not too common, and if it's not used with something like -1, I can assume it's fine... You could also make things crash more often by changing numpy. But semi-automatic, I am not sure. It's probably not hard for someone who knows what is going on to find out whether a project should be OK... for example, I also grepped through the Theano code and I think it should be fine.

@stefanv
Contributor
stefanv commented Nov 30, 2012

@njsmith I'm just trying to clarify this in my mind: a row-vector that comes from slicing a C-contiguous array must be F-contiguous False and C-contiguous False, correct?
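For concreteness, whether a sliced vector keeps its contiguity depends on the parent array's layout (behaviour of current NumPy releases; a small illustration of the question, not part of this PR):

```python
import numpy as np

c = np.arange(16.0).reshape(4, 4)   # C-contiguous
f = np.asfortranarray(c)            # F-contiguous copy

row_of_c = c[0]      # elements adjacent in memory -> contiguous
row_of_f = f[0]      # elements 4*itemsize apart   -> not contiguous
col_of_c = c[:, 0]   # likewise not contiguous
```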

@seberg
Member
seberg commented Nov 30, 2012

It's a work in progress, but https://github.com/seberg/numpy/compare/clean_requested_strides fixes those issues I know of in scipy and sk-learn (by cleaning up strides when a buffer/array is specifically requested as contiguous), though I have not tested the buffer interface change yet. This doesn't clean anything in the Python API yet, though. Just to note: that change would leave the flags as they are in master.

@nouiz
Contributor
nouiz commented Feb 1, 2013

@seberg asked my opinion of the strides stuff on gh-2949. I agree that in an ideal world, programs shouldn't use the strides of dimensions of size 1 without checking them. But I think that having a simple, deterministic behavior is important.

I also think that such a change has far-reaching implications that are hard to see; just the amount of code broken by this is an example. So if there is such a change, I think it should be done for NumPy 2.0. People don't want to re-validate all the code they wrote to check for this. Also, I think we should make it easy for people to detect problems in their code; for example, a NumPy build that uses the MAX_SSIZE_T value for those strides would probably make affected programs crash.

Whatever convention is decided, I think the aligned/C-contiguous/F-contiguous flags MUST not depend on the strides of those dimensions. This can be done either by assigning stride values that make those flags compute correctly as in the past, or by updating the computation of those flags to check for the new convention.

So here is the list of possible values that could be assigned as strides for dimensions of size 1:

  1. Always use 0. This is useful for parallel algorithms that just compute index * strides for each dimension; there aren't any special cases for broadcast dimensions. In Theano GPU code, we set the strides of broadcast dimensions to 0 before calling the GPU code.
  2. Always use some dummy value like MAX_SSIZE_T (probably the wrong macro; this is just the idea). This helps detect users who rely on the strides of broadcastable dimensions, as it will probably make their program crash.
  3. Use strides that make plain nested for loops do the broadcasting automatically. I think an example is best to describe this. I want to add two float64 ndarrays with shapes (10, 10) and (1, 10); the second is broadcast along axis 0 and gets strides (-80, 8). In this convention, strides[0] holds the adjustment applied after a full row has been traversed, so the first array uses (0, 8):
    ptr_in1 = in1.data
    ptr_in2 = in2.data
    ptr_out = out.data
    for (int i = 0; i < out_shape[0]; i++, ptr_in1 += in1.strides[0], ptr_in2 += in2.strides[0], ptr_out += out.strides[0])
        for (int j = 0; j < out_shape[1]; j++, ptr_in1 += in1.strides[1], ptr_in2 += in2.strides[1], ptr_out += out.strides[1])
            *ptr_out = *ptr_in1 + *ptr_in2;

Here, the "ptr_in2 += in2.strides[0]" automatically resets ptr_in2 to the start of its single row, so there is no special case for broadcasting. This means fewer special cases, simpler code, and it is faster than having a condition inside the loop.

This convention is only good for serial code. As we move to parallel architectures, I don't think it is wise to do this; the code can pre-arrange the strides it uses instead.

  4. Keep the past values: the previously used strides. This avoids changing how we compute the aligned/C/F flags. Also, this doesn't break much code.
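Option 1 (stride 0 for broadcast dimensions) is what NumPy itself exposes through np.broadcast_to; a quick illustration (not part of this PR):

```python
import numpy as np

row = np.arange(3, dtype=np.int64)   # strides: (8,)
b = np.broadcast_to(row, (4, 3))     # add a broadcast axis of length 4

# The broadcast dimension gets stride 0: every "row" of b aliases the
# same memory, so index * stride needs no broadcasting special case.
b.strides   # (0, 8)
```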

Also, making 1-d arrays C- and F-contiguous at the same time, or neither (as for 1-sized arrays), would simplify implementations.

In conclusion, I think we shouldn't break other people's code by changing strides until NumPy 2.0. Those problems are hard to find and debug. I consider the numpy strides of broadcast dimensions part of the NumPy API, as they are something you need when writing performance code. People write C code with NumPy for one reason only: performance! So even if it was never written down formally anywhere, I think we should consider this part of the NumPy interface and treat it as such: warn people, let them identify problems, and only later make changes that break code.

This discussion seems split across places. If I missed an important part of it, can you point me to it for reading?

In any case, thanks all for your hard work. I know that improving a library without breaking stuff is not trivial.

@pv
Member
pv commented Feb 1, 2013

@seberg: as a way forward --- would it be simplest to first completely revert all the commits in gh-2694 and get that merged ASAP? After that, the "good" parts could go in a new PR (once we have a clearer idea what 3rd party code actually expects). I think it's not good to keep the master branch in a "known to be sort of wonky" state waiting for a decision on how to fix it or which parts to revert.

@seberg
Member
seberg commented Feb 1, 2013

Well... I honestly was not aware that this caused much trouble out there (I did not know about those Cython memoryview issues). To be honest, this PR basically just reverts to the old behaviour.
Yes, it keeps more, but those are bug fixes that make the old behavior sane. At this time your Cython example probably already fails for corner cases (think 0-sized arrays), and this would "fix" that (though maybe not quite, because for 0-sized arrays numpy sometimes multiplies the stride by 0, but for array creation multiplies it by 1).

So no, I personally think it would be better to just review this and then merge it, if it should be done fast. That was the intention at one point anyway (even if after that I thought that just fixing the strides for requested-contiguous arrays might be the better option to push forward, which fails to "fix" np.ndarray[...] Cython misuses).

Thanks for the comment @nouiz, I like the point about the stride value being 0 or such, definitely something to keep in mind.
At this time it seems to me that the only thing that could be done to push toward ultimately changing the behaviour would be to provide something like NPY_ARRAY_C_MEM_CONTIGUOUS and use that inside numpy (it seems more elegant in some places), as well as saying that ultimately that definition may be adopted as the only one.

@pv
Member
pv commented Feb 1, 2013

@seberg: indeed, my point was more of an organizational one (I didn't take a close look at the code changes, except that the full diff is not that big) --- a full revert is easier to understand and review, as it doesn't really involve making decisions, and it removes the unnecessary pressure to do something fast. But I'm just "cheering" from the sidelines here, I trust your judgment :)

@njsmith
Member
njsmith commented Feb 2, 2013

Well, here's a not-great, but still maybe better-than-the-alternatives, proposal:

  • we rearrange the code as necessary so that it contains what's needed for
    both the old-style strict-contiguity requirement plus ordering heuristics,
    and the new-style realistic contiguity, with 1- and 0-length axes defaulting
    to bizarro strides whenever not otherwise specified.
  • we make the choice of these two modes dependent on an environment
    variable (checked once at import time and then stashed in a global)
  • for the first beta, we default to the new-style mode (and put a big
    warning in the release announcement, with a note that we will revert the
    changes if they cause too much of a problem, but seriously please fix your
    code)
  • starting with rc1, we switch the default to the old-style mode (because
    we know perfectly well that people will complain)
  • repeat in the next release, and the one after that, until enough people
    have fixed their code that we stop getting these complaints. Then do a
    release with the new mode enabled by default, hooray. But, crucially, at
    this point we're still providing a fallback for those who missed all the
    warnings - they can set an environment variable as a temporary workaround.
  • 6 months later or so, rip out the old code; rejoice.

It doesn't make me happy but all the alternatives I can think of seem worse.

@GaelVaroquaux
Contributor
  • we make the choice of these two modes dependent on an environment
    variable (checked once at import time and then stashed in a global)

Not an option, IMHO. Think about the side effects and race-conditions of
mixing code from different libraries wanting the 2 different behaviors.

@njsmith
Member
njsmith commented Feb 2, 2013

There are no race-conditions; this is something the user would have to set
explicitly, either because they have some annoying broken library and are
waiting for it to be fixed (after we finally make this the default), or
because they want to test whether their stack is fixed yet (before we make
this the default).

(I suppose some libraries/installs might go to extreme lengths to mess with
the env var directly, but at that point it will still have accomplished the
purpose of making the relevant developers aware of the issue, and aware
that they're going to have to find a better solution if they want support
from us.)

Also, there are no libraries that need the new behaviour: there are only
libraries that need the old behaviour, and libraries that are indifferent
to which behaviour is in effect (since the new behaviour provides strictly
fewer guarantees than the old, and anyway no-one's going to immediately
drop support for the immediately previous version of numpy).

So I think these particular objections are not an issue, though there may
be others.


@GaelVaroquaux
Contributor

So I think these particular objections are not an issue, though there may
be others.

I disagree. By creating this situation, you are opening the door to a
mess. Don't do it.

@njsmith
Member
njsmith commented Feb 2, 2013

I'm not sure how to respond to this. We have a mess now and we're trying to
figure out how to get out of it. My proposal is clunky in a number of ways,
but it's the best idea I have. Even if you can't provide a better
suggestion, can you at least say why you disagree, it opens the door to a
mess, and we shouldn't do it, so that your reasoning might point the way
towards something more constructive? I've already explained why your
previous objections AFAICT don't seem to apply when you look at the details
of the situation.


@GaelVaroquaux
Contributor

Even if you can't provide a better suggestion, can you at least say
why you disagree, it opens the door to a mess, and we shouldn't do
it, so that your reasoning might point the way towards something more
constructive? I've already explained why your previous objections
AFAICT don't seem to apply when you look at the details of the
situation.

Well, I don't agree with your explanation. My experience with software
tells me that having a global state that switches a core behavior of a
library is a very bad idea. Someone is bound to mess with it at some point,
and there will be crazy bug reports flying here and there.

I don't have a better suggestion. I might say: suck it up and force a
change in behavior at some point in the development of numpy. Don't
have the two coincide, or if you do, make it a compilation flag of numpy,
as a temporary solution to help people port their code. In some sense
this is more violent, but it will create errors that are easier to trace.

Ideally, the change of behavior should happen in numpy 2.0, and the
compilation flag should be around before, but only to enable easier
forward porting.

I understand that you want to clean the mess, but the mess is mostly in
code that you do not control (code using numpy, and not code in numpy).
So your best lever here is communication, social engineering, and tools
in numpy to help track problems.

@njsmith
Member
njsmith commented Feb 2, 2013

Okay, it sounds like we actually agree much more than we disagree :-). As I
understand it, you like the basic plan described above, except that:

  1. You don't think we should provide a temporary opt-out switch after we
    make the change; we should just change the default and remove the
    compatibility code at the same time (collapsing my last two stages).
  2. You agree about shipping both behaviours from the same source code for a
    while, just you think that it should be more difficult for end-users to
    choose which version of the library they are using (compiling twice in
    different virtualenvs with different switches, versus setting an envvar).

I don't feel strongly about either of these points, so great, some
consensus is certainly possible :-). I do kind of like having the option of
telling people "hmm, your library crashed with the new numpy? Can you try
with NUMPY_FORCE_OLD_STRIDE_COMPAT=1? If that fixes it then it's your bug
not ours, go read this page and fix your code" -- easy debugging is useful!
-- but I can see the trade-offs too.

Before taking the time to get into that debate in detail now, though, I'd
rather first see if everyone else thinks that this general approach is a
viable one?


@GaelVaroquaux
Contributor

Okay, it sounds like we actually agree much more than we disagree :-). As I
understand it, you like the basic plan described above, except that:

  1. You don't think we should provide a temporary opt-out switch after we
    make the change; we should just change the default and remove the
    compatibility code at the same time (collapsing my last two stages).
  2. You agree about shipping both behaviours from the same source code for a
    while, just you think that it should be more difficult for end-users to
    choose which version of the library they are using (compiling twice in
    different virtualenvs with different switches, versus setting an envvar).

Yes, absolutely. I do think that your overall plan is good. I was just
reacting to a minor detail.

@seberg
Member
seberg commented Feb 3, 2013

Well, this would be very simple... though I do not know how to do such flagging (I guess a macro, but then the environment flag, etc...). In any case, it would basically be just this code, with the flag-setting changes staying in optionally (the other changes are just things that would become unnecessary, but they don't hurt either).

In any case, I somewhat think it is a decent option (hoping that with Cython changing its logic there is not much out there that gets broken), but I really have no idea how much headache it might cause out there, so I don't have a real opinion...

@nouiz
Contributor
nouiz commented Feb 4, 2013

I like the idea of an env variable to help people debug their code. Don't forget, few people compile numpy, and compiling it with an optimized BLAS isn't trivial, so forcing people to do that is wrong, I think.

But I agree that we probably need to swap the behavior in NumPy 2.0. @GaelVaroquaux, do you also object to having an env variable in the 1.* series that enables the new behavior, to allow easy testing of libraries against it? Also, I don't think people will mess with it as you said; they don't have any incentive to enable the new behavior in released versions, just when they debug/develop.

@seberg
Member
seberg commented Feb 4, 2013

Ok, just so that I know where we stand here. First, is there agreement that it is worth making this change in the long term?
Second, we should create an environment variable to help debug this. That would mean numpy reverts to the old behaviour (unless the variable is set, at which point it would even actively give weird strides in many cases, so that errors get triggered with higher probability during testing).

Numpy may then start defaulting to it during the beta test phase in a while, to hopefully make devs aware of whether they may have problems. That would certainly only be after the Cython issues get fixed though, as before that it would trigger too often without the actual developer having any influence.
Then, probably with 2.0, it would be possible to do the actual change.

Either a compile-time or a runtime variable seems fine to me, but while I can code the rest, I would need a hint on how to implement either one to get at that variable.

@charris
Member
charris commented Mar 2, 2013

@seberg Can this be closed now?

@seberg
Member
seberg commented Mar 2, 2013

I wish. No, this is still a problem and must be fixed, in this or some other form, before releasing 1.8...

@njsmith
Member
njsmith commented Mar 2, 2013

@seberg: no-one seems to have objected to that summary of things, so yes, I guess we have agreement!

@njsmith
Member
njsmith commented Mar 6, 2013

@seberg: Now that we're gearing up for 1.8, what's the status on this? Do you think you'll have time to implement the switch logic discussed above soon?

@seberg
Member
seberg commented Mar 6, 2013

@njsmith, I think I can get it ready. Was it a compile-time flag? I basically need to figure out where to add that flag/global variable; after that it is just mixing a bit of this PR with the other... it shouldn't be a huge amount of work.

@seberg
Member
seberg commented Mar 21, 2013

Closing this. For anyone who is interested, gh-3162 is the new pull request, which supersedes this one. It is basically this pull request, but introducing a flag (plus additional debug help) to keep the current master behaviour.

@seberg seberg closed this Mar 21, 2013