Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[numpy_pick_utils] BinaryZlibFile couldn't handle unicode filenames. #384

Merged
merged 9 commits into from Aug 2, 2016

Conversation

@bloody76
Copy link
Contributor

bloody76 commented Aug 2, 2016

Hi everyone,

It seems that BinaryZlibFile cannot handle unicode filenames in python2. I just added the unicode type to check, but it's not python3 compliant now I guess .. Does it need to be ? Is there a workaround otherwise to make it works for both py2 and py3 ?

@bloody76

This comment has been minimized.

Copy link
Contributor Author

bloody76 commented Aug 2, 2016

Tell me if that fix is enough for you and if you think its maintainable enough.

@@ -299,7 +299,7 @@ def __init__(self, filename, mode="rb", compresslevel=9):
else:
raise ValueError("Invalid mode: %r" % (mode,))

if isinstance(filename, (str, bytes)):
if isinstance(filename, (str, bytes, (sys.version_info < (3,) and unicode) or str)):

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 2, 2016

Member

Using isinstance(filename, _compat._bytes_or_unicode) is the right way to go about it I reckon.

It would be good to add a regression test in joblib/test/test_numpy_pickle_utils.py to make sure that b'myfilename', 'myfilename' and u'myfilename' can be used in BinaryZlibFile.

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 2, 2016

Member

Hmm, maybe better to do

from ._compat import _basestring

if isinstance(filename, _basestring):

to be more consistent with other places in the code.

This comment has been minimized.

Copy link
@bloody76

bloody76 Aug 2, 2016

Author Contributor

Alright ! I will make a new commit soon

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

Another small thing, it is considered best practice to create a feature branch when you do a PR i.e. not to use master. Try to remember this for your next PR.

@bloody76

This comment has been minimized.

Copy link
Contributor Author

bloody76 commented Aug 2, 2016

Yeah sorry I am not used to fork/PR, thanks for remembering me.

In the end I used _bytes_or_unicode because the _basic_string type doesn't seem to include the bytes type. I have put the tests in a separate commit also.

Is there anything else ? Do you want me to squash the commits correctly ? (one for code and one for tests?)

def teardown_module():
"""Test teardown."""
shutil.rmtree(env['dir'])

This comment has been minimized.

Copy link
@bloody76

bloody76 Aug 2, 2016

Author Contributor

I wil add another newline here when I will squash my commits.

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

Don't worry about squashing your commits too much, we tend to do it via the GitHub web interface when we merge the PR.

You need Travis to be green, you still have some flake8 violations, look at https://travis-ci.org/joblib/joblib/jobs/149166066 for more details. You can also run: ./continuous_integration/flake8_diff.sh locally until you get rid of all the flake8 violations.

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

In the end I used _bytes_or_unicode because the _basic_string type doesn't seem to include the bytes type.

Please stick with _basestring to be consistent with the rest of the codebase. We can think about use _bytes_or_unicode in a separate PR. To be honest, I don't think it is very common to pass b'some_filename' in Python3 for filenames, but open does support it.

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

Also if you could add an entry in CHANGES.rst that would be great.

@bloody76

This comment has been minimized.

Copy link
Contributor Author

bloody76 commented Aug 2, 2016

Oh ok I won't squash then. Ok I thought it would be better consistency to use _bytes_or_unicodes, but I will use _basic_string then.

For the b'filename', do I keep it ? I only put it to check that the three possibilities are working.

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

For the b'filename', do I keep it ? I only put it to check that the three possibilities are working.

Remove it since it is not going to work with Python 3 I reckon. We can reintegrate it later when we revisit whether we should not use _bytes_or_unicode everywhere.

…ryZlibFile constructor.
@bloody76

This comment has been minimized.

Copy link
Contributor Author

bloody76 commented Aug 2, 2016

Ok so I remove it from the tests and code.

CHANGES.rst Outdated
Vincent Latrouite

FIX a bug in the constructor of BinaryZlibFile that would throw an
exception when passing unicode filename.

This comment has been minimized.

Copy link
@lesteve

lesteve Aug 2, 2016

Member

Add "(Python 2 only)" or something like this at the end of the sentence.

@lesteve lesteve merged commit faeaab3 into joblib:master Aug 2, 2016
3 checks passed
3 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 89.791%
Details
@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

LGTM, merging

@lesteve

This comment has been minimized.

Copy link
Member

lesteve commented Aug 2, 2016

Thanks a lot! Feel free to have a look at using _bytes_or_unicode vs _basestring in a consistent manner across the source tree if you have some spare time!

@bloody76

This comment has been minimized.

Copy link
Contributor Author

bloody76 commented Aug 2, 2016

No problem ! Yeah I will look at the differences and how it's implemented as soon as I can :)

yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016
* tag '0.10.2': (55 commits)
  Bump up __version__
  Update release to 0.10.2 in CHANGES.rst
  nosetests should run tests only from the joblib folder.
  API expose joblib.parallel.ParallelBackendBase and joblib.parallel.AutoBatchingMixin
  Update to numpydoc 0.6
  [numpy_pick_utils] Handle unicode filenames in BinaryZlibFile (joblib#384)
  Fix format_stack with compiled code (e.g. .so or .pyd)
  PEP8: cosmit fix (joblib#376)
  FIX typo
  Release 0.10.0
  FIX: __all__ should hold symbol names as strings
  Fix bench_auto_batching.py
  [MRG] Persistence in/from file objects (joblib#351)
  Minor tweaks in auto batching benchmark script
  Improve flake8_diff.sh (joblib#371)
  FIX numpy array persistence with pickle HIGHEST_PROTOCOL (joblib#370)
  DOC: remove easy_install from joblib installation documentation (joblib#363)
  MAINT fix typo
  DOC Add documentation of mmap_mode
  Explicit handling of job cancellation on first collected exception (joblib#361)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.