MAINT: Remove python <2.7,<3.3 string/unicode workarounds #8832

eric-wieser · 2017-03-25T10:14:18Z

This replaces calls to the string-conversion helpers with u and b prefixes on unicode and bytes literals:

asbytes('hello') → b'hello'
asbytes_nested(['a', 'b']) → [b'a', b'b']
asunicode('hello') → u'hello'
unicode('hello') → u'hello'
sixu('hello') → u'hello'

This is fine because b is supported on 2.7 and 3.x, and u is supported on 2.x and 3.3+. Our minimum versions are now 2.7 and 3.4?

What we can't do is transform asbytes("tests %d" % num), because %-formatting
fails on bytes in python 3.x < 3.5.
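For readers without a Python 2 interpreter handy, here is a rough Python 3 sketch of what the replaced helpers did and why a literal prefix is a drop-in replacement for a single literal. The three helper bodies are a reconstruction of the Python 3 branch of `numpy.compat.py3k`, written from memory rather than copied, so treat them as approximate. They are defined locally so the snippet does not depend on `numpy.compat`.

```python
import sys


def asbytes(s):
    # Bytes pass through untouched; anything else is stringified and encoded
    # as latin-1, which maps each code point below 256 to a single byte.
    if isinstance(s, bytes):
        return s
    return str(s).encode("latin1")


def asunicode(s):
    # The inverse direction: decode bytes as latin-1, stringify the rest.
    if isinstance(s, bytes):
        return s.decode("latin1")
    return str(s)


def asbytes_nested(x):
    # Recurse into iterables (other than str/bytes) and convert the leaves.
    if hasattr(x, "__iter__") and not isinstance(x, (bytes, str)):
        return [asbytes_nested(y) for y in x]
    return asbytes(x)


# For single literals, the prefixes give equal objects with no function call.
assert asbytes("hello") == b"hello"
assert asbytes_nested(["a", "b"]) == [b"a", b"b"]
assert asunicode("hello") == u"hello"

# Formatting is the case that cannot simply become a b-literal: bytes
# %-formatting (PEP 461) only exists from Python 3.5 on, so on 3.4 the text
# has to be formatted as str first and converted afterwards.
num = 3
formatted = asbytes("tests %d" % num)
if sys.version_info >= (3, 5):
    assert b"tests %d" % num == formatted
```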

As a result, sixu is no longer used anywhere.

Since we only need to support Python 2.7 and later, we can replace any call that just passes a single string literal with a b-prefixed literal instead.

u prefixes are supported in Python 2.7 and 3.3+

juliantaylor · 2017-03-25T13:46:52Z

thanks, nice cleanup
though seeing this used so often just reminds me of the poor state our io functions are in...

eric-wieser · 2017-03-25T14:00:08Z

> though seeing this used so often just reminds me of the poor state our io functions are in

Tell me about it. Perhaps now that we don't use these conversion/hack functions much in testing, they can give a deprecation warning on python 3?
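One way the suggestion could look, as a hypothetical sketch: wrap a helper so that calling it on Python 3 emits a `DeprecationWarning`. The `deprecated_on_py3` decorator and the warning text are invented for illustration and are not part of NumPy. Passing `stacklevel=2` to `warnings.warn` attributes the warning to the caller rather than to the wrapper.

```python
import functools
import sys
import warnings


def deprecated_on_py3(func):
    """Hypothetical decorator: warn when `func` is called on Python 3."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if sys.version_info[0] >= 3:
            warnings.warn(
                "%s is deprecated; use a b'' or u'' literal instead" % func.__name__,
                DeprecationWarning,
                stacklevel=2,  # point at the caller, not at this wrapper
            )
        return func(*args, **kwargs)
    return wrapper


@deprecated_on_py3
def asbytes(s):
    # Same conversion as before; only the warning is new.
    if isinstance(s, bytes):
        return s
    return str(s).encode("latin1")


# Record warnings instead of printing them, so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = asbytes("hello")
```

Since `DeprecationWarning` is hidden by default outside `__main__` and test runners, a change like this would mostly surface in downstream test suites, which is roughly the audience it would be aimed at.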

eric-wieser · 2017-03-25T14:00:42Z

Also, is removing functions from numpy.compat ok, or is that public api?

juliantaylor · 2017-03-25T14:12:13Z

technically public api, we could slap deprecation warnings on them but I don't think it's worth the effort.

eric-wieser · 2017-04-10T12:49:34Z

Hmmm... This had a big performance hit on bench_core.CountNonzero.time_count_nonzero(1, 1000000, <type 'str'>), which is a little surprising.

Especially since none of the things this touches should have any effect on the behaviour of count_nonzero...

eric-wieser · 2017-04-10T13:02:13Z

Ah, that regression is clearly from 9791f20, and is due to a deliberate benchmark change.

@pv: Am I reading ASV incorrectly, or does it look like there's a bug in attaching commit ids to benchmarks?

pv · 2017-04-10T13:06:28Z

ASV does not try to track if the *benchmark* code itself is changed. If benchmarks are changed, old results need to be invalidated manually. The results repository in question is here: https://github.com/pv/numpy-bench

eric-wieser · 2017-04-10T13:08:33Z

> ASV does not try to track if the benchmark code itself is changed

Sure, that explains why the regression is there. But why does it attach the regression to this commit, and not to your commit? It has benchmark data for both, and mine came before yours!

On a related note, does this mean this was a bad merge, and the benchmark should have been renamed to avoid creating these false regressions?

pv · 2017-04-10T13:16:17Z

The benchmark setup runs it once per day, using the latest benchmark suite in the master branch.

The benchmark suite and the numpy code should be thought of as independent entities; that they are stored in the same repository is just a matter of convenience for pull requests.

In particular, the benchmark code used to benchmark each commit is independent of the commit benchmarked (otherwise, it would not be possible to fix mistakes in benchmarks afterward). Hence, the benchmark suite and code commit ids are not comparable in general.

In principle, changes in benchmarks could be tracked automatically by looking for source code changes. However, this can result in false positives, so currently you have to do the invalidation manually.

eric-wieser · 2017-04-10T13:24:32Z

Gotcha, that clarifies things

Does this mean that if a pull request introduces a benchmark and a fix, the benchmark will also be run on the previous commit?

Or did that only happen in this case because the two commits were merged on the same day?

pv · 2017-04-10T13:30:12Z

Since it's run only once per day, benchmark additions or modifications also apply to other commits made during the same day, including any such commits *preceding* the change in the benchmarks.

If you are making changes to the benchmark suite that change results, either the benchmark name should be changed or the old results should be invalidated.

Alternatively, you can send a PR to asv that implements automatic invalidation of results :) I don't see how to do this reliably, however, so it should probably be optional.

juliantaylor · 2017-04-10T13:35:15Z

automatic invalidation worked pretty well in the vbench suite. It took the checksum of the benchmark and invalidated the results when it changed.

pv · 2017-04-10T13:37:54Z

Probably the 90+% solution is indeed just to take a hash of the inspect.getsourcelines output for the benchmark and setup methods.
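A rough sketch of that idea, for illustration only: hash the source of a benchmark method together with its setup. `benchmark_fingerprint` is a hypothetical name, not an asv function, and SHA-256 is an arbitrary choice. The bytecode fallback, used when `inspect` cannot locate the source (for example, code typed into a REPL), is an extra assumption beyond what was proposed here.

```python
import hashlib
import inspect


def _source_bytes(func):
    """Return the source of `func`, or a bytecode-based stand-in."""
    try:
        lines, _ = inspect.getsourcelines(func)
        return "".join(lines).encode("utf-8")
    except OSError:
        # inspect raises OSError when no source file is available. Fall back
        # to the compiled code, which changes for most edits to the body
        # (though not for comment-only edits).
        code = func.__code__
        return code.co_code + repr((code.co_consts, code.co_names)).encode("utf-8")


def benchmark_fingerprint(benchmark, setup=None):
    """Hypothetical helper: hash a benchmark and its optional setup method."""
    digest = hashlib.sha256()
    for func in (setup, benchmark):
        if func is not None:
            digest.update(_source_bytes(func))
    return digest.hexdigest()
```

Hashing raw source means that whitespace or comment edits also invalidate results. That is presumably part of the "false positives" concern raised earlier, traded here for simplicity.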

pv · 2017-04-10T13:39:52Z

... and the hash (or the source lines themselves) can be stored in the result JSON files, with the comparison done at the publish stage. Sounds simple.
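Continuing the thought, here is a minimal sketch of the publish-stage comparison. It assumes each stored result is a JSON object carrying the fingerprint it was measured with. The record layout (`name`, `value`, `source_hash`), the function names, and the placeholder hashes and values are all made up for illustration and do not describe asv's real result format.

```python
import json


def dump_result(name, value, source_hash):
    # Store the benchmark fingerprint alongside the measured value.
    return json.dumps({"name": name, "value": value, "source_hash": source_hash})


def fresh_results(stored, current_hashes):
    """Keep only results whose fingerprint matches the current benchmark source.

    `stored` is a list of JSON strings as written by dump_result;
    `current_hashes` maps benchmark name -> fingerprint of today's suite.
    """
    kept = []
    for text in stored:
        record = json.loads(text)
        # Unknown benchmarks map to None, so their results are dropped too.
        if current_hashes.get(record["name"]) == record["source_hash"]:
            kept.append(record)
    return kept


# Placeholder data: two runs of one benchmark before and after an edit,
# plus a result for a benchmark that no longer exists.
stored = [
    dump_result("time_example", 1.0, "old-hash"),
    dump_result("time_example", 2.0, "new-hash"),
    dump_result("time_removed", 3.0, "other-hash"),
]
published = fresh_results(stored, {"time_example": "new-hash"})
```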

eric-wieser · 2017-04-10T13:42:12Z

Perhaps a simple "version" field in benchmarks would do the job too
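Here is a hypothetical sketch of the "version" idea. The benchmark author bumps an attribute by hand whenever a change makes old numbers incomparable, and results recorded under a different version are skipped. Later asv releases document a `version` benchmark attribute with a similar purpose (defaulting to a hash of the benchmark's source), but check the asv documentation for its exact semantics. `ExampleBenchmark` and `is_stale` are invented for illustration.

```python
class ExampleBenchmark:
    """Hypothetical asv-style benchmark with a hand-maintained version."""

    # Bump this whenever a benchmark change makes older timings incomparable.
    version = "2"

    def setup(self):
        self.data = list(range(1000))

    def time_sum(self):
        sum(self.data)


def is_stale(recorded_version, benchmark_cls):
    """Return True if a stored result was measured under another version."""
    current = str(getattr(benchmark_cls, "version", None))
    return recorded_version != current
```

Compared with hashing the source, a manual version avoids invalidating results after cosmetic edits, but it relies on the author remembering to bump it.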

eric-wieser added the 05 - Testing, 03 - Maintenance, component: numpy._core, component: numpy.lib, and component: numpy.linalg labels Mar 25, 2017

eric-wieser force-pushed the avoid-as-bytes branch from 954e4e9 to 02d6854 on March 25, 2017 10:32

eric-wieser added 2 commits March 25, 2017 10:36

MAINT: Remove asbytes_nested where b prefixes would suffice

0960eed

eric-wieser force-pushed the avoid-as-bytes branch from 02d6854 to 0960eed on March 25, 2017 10:36

eric-wieser changed the title from "MAINT: Remove asbytes where a b prefix would suffice" to "MAINT: Remove python <2.7,<3.3 string/unicode workarounds" Mar 25, 2017

eric-wieser added the component: numpy.ma masked arrays label Mar 25, 2017

eric-wieser added 2 commits March 25, 2017 11:54

MAINT: Remove asunicode where a u prefix would suffice

09a21de

u prefixes are supported in Python 2.7 and 3.3+

MAINT: Stop using sixu instead of a u prefix

91548b5

eric-wieser force-pushed the avoid-as-bytes branch 2 times, most recently from c340754 to 6ad2ce7 on March 25, 2017 11:56

eric-wieser added 2 commits March 25, 2017 12:52

MAINT: Replace unicode() with u prefix

ac77cf5

MAINT: Use new syntax for exceptions

1dc20cb

eric-wieser force-pushed the avoid-as-bytes branch from 6ad2ce7 to 1dc20cb on March 25, 2017 12:54

juliantaylor merged commit ab49be1 into numpy:master Mar 25, 2017

This was referenced Mar 25, 2017

ENH: add a warning to np.save that saved object arrays are not necessarily portable #5641

Closed

ENH: Make dtype iterable #8814

Closed

mwtoews mentioned this pull request Oct 26, 2022

MAINT: remove u-prefix for former Unicode strings #22479

Merged
