Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[numpy_pick_utils] BinaryZlibFile couldn't handle unicode filenames. #384

Merged
merged 9 commits into from
Aug 2, 2016

Conversation

bloody76
Copy link

@bloody76 bloody76 commented Aug 2, 2016

Hi everyone,

It seems that BinaryZlibFile cannot handle unicode filenames in python2. I just added the unicode type to check, but it's not python3 compliant now I guess .. Does it need to be ? Is there a workaround otherwise to make it works for both py2 and py3 ?

@bloody76
Copy link
Author

bloody76 commented Aug 2, 2016

Tell me if that fix is enough for you and if you think its maintainable enough.

@@ -299,7 +299,7 @@ def __init__(self, filename, mode="rb", compresslevel=9):
else:
raise ValueError("Invalid mode: %r" % (mode,))

if isinstance(filename, (str, bytes)):
if isinstance(filename, (str, bytes, (sys.version_info < (3,) and unicode) or str)):
Copy link
Member

@lesteve lesteve Aug 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using isinstance(filename, _compat._bytes_or_unicode) is the right way to go about it I reckon.

It would be good to add a regression test in joblib/test/test_numpy_pickle_utils.py to make sure that b'myfilename', 'myfilename' and u'myfilename' can be used in BinaryZlibFile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe better to do

from ._compat import _basestring

if isinstance(filename, _basestring):

to be more consistent with other places in the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright ! I will make a new commit soon

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

Another small thing, it is considered best practice to create a feature branch when you do a PR i.e. not to use master. Try to remember this for your next PR.

@bloody76
Copy link
Author

bloody76 commented Aug 2, 2016

Yeah sorry I am not used to fork/PR, thanks for remembering me.

In the end I used _bytes_or_unicode because the _basic_string type doesn't seem to include the bytes type. I have put the tests in a separate commit also.

Is there anything else ? Do you want me to squash the commits correctly ? (one for code and one for tests?)

def teardown_module():
"""Test teardown."""
shutil.rmtree(env['dir'])

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wil add another newline here when I will squash my commits.

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

Don't worry about squashing your commits too much, we tend to do it via the GitHub web interface when we merge the PR.

You need Travis to be green, you still have some flake8 violations, look at https://travis-ci.org/joblib/joblib/jobs/149166066 for more details. You can also run: ./continuous_integration/flake8_diff.sh locally until you get rid of all the flake8 violations.

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

In the end I used _bytes_or_unicode because the _basic_string type doesn't seem to include the bytes type.

Please stick with _basestring to be consistent with the rest of the codebase. We can think about use _bytes_or_unicode in a separate PR. To be honest, I don't think it is very common to pass b'some_filename' in Python3 for filenames, but open does support it.

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

Also if you could add an entry in CHANGES.rst that would be great.

@bloody76
Copy link
Author

bloody76 commented Aug 2, 2016

Oh ok I won't squash then. Ok I thought it would be better consistency to use _bytes_or_unicodes, but I will use _basic_string then.

For the b'filename', do I keep it ? I only put it to check that the three possibilities are working.

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

For the b'filename', do I keep it ? I only put it to check that the three possibilities are working.

Remove it since it is not going to work with Python 3 I reckon. We can reintegrate it later when we revisit whether we should not use _bytes_or_unicode everywhere.

@bloody76
Copy link
Author

bloody76 commented Aug 2, 2016

Ok so I remove it from the tests and code.

Vincent Latrouite

FIX a bug in the constructor of BinaryZlibFile that would throw an
exception when passing unicode filename.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "(Python 2 only)" or something like this at the end of the sentence.

@lesteve lesteve merged commit faeaab3 into joblib:master Aug 2, 2016
@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

LGTM, merging

@lesteve
Copy link
Member

lesteve commented Aug 2, 2016

Thanks a lot! Feel free to have a look at using _bytes_or_unicode vs _basestring in a consistent manner across the source tree if you have some spare time!

@bloody76
Copy link
Author

bloody76 commented Aug 2, 2016

No problem ! Yeah I will look at the differences and how it's implemented as soon as I can :)

yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016
* tag '0.10.2': (55 commits)
  Bump up __version__
  Update release to 0.10.2 in CHANGES.rst
  nosetests should run tests only from the joblib folder.
  API expose joblib.parallel.ParallelBackendBase and joblib.parallel.AutoBatchingMixin
  Update to numpydoc 0.6
  [numpy_pick_utils] Handle unicode filenames in BinaryZlibFile (joblib#384)
  Fix format_stack with compiled code (e.g. .so or .pyd)
  PEP8: cosmit fix (joblib#376)
  FIX typo
  Release 0.10.0
  FIX: __all__ should hold symbol names as strings
  Fix bench_auto_batching.py
  [MRG] Persistence in/from file objects (joblib#351)
  Minor tweaks in auto batching benchmark script
  Improve flake8_diff.sh (joblib#371)
  FIX numpy array persistence with pickle HIGHEST_PROTOCOL (joblib#370)
  DOC: remove easy_install from joblib installation documentation (joblib#363)
  MAINT fix typo
  DOC Add documentation of mmap_mode
  Explicit handling of job cancellation on first collected exception (joblib#361)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants