
Add support for pathlib.Path. Fix #170 #175

Merged: 9 commits, Mar 22, 2018


clintval (Contributor)

This closes #170. This is my first time contributing, so please let me know if this needs fixing! Thanks!

@@ -163,6 +164,14 @@ def smart_open(uri, mode="rb", **kw):
    if not isinstance(mode, six.string_types):
        raise TypeError('mode should be a string')

    # Support opening ``pathlib.Path`` objects by casting them to strings.
    try:
        from pathlib import Path
Contributor

Oh, pathlib is only available for Python >= 3.4, that's sad.

clintval (Contributor Author), Mar 15, 2018

Instead of checking for python>=3.4 and assuming pathlib support, I opted for the try-except pattern since there is a backport of pathlib that users may wish to use for python<3.4:

https://pypi.python.org/pypi/pathlib2/
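A minimal sketch of the try/except import pattern described above, assuming the backport installs under the module name `pathlib2` (as the PyPI link suggests). The builtin module is tried first. The flag name `PATHLIB_SUPPORT` mirrors the one mentioned later in review; the helper `normalize_uri` is hypothetical and only illustrates casting a Path to `str` before any further URI handling.

```python
# Prefer the builtin module; fall back to the (assumed) pathlib2 backport.
try:
    import pathlib  # builtin on Python >= 3.4
    PATHLIB_SUPPORT = True
except ImportError:
    try:
        import pathlib2 as pathlib  # PyPI backport for older Pythons
        PATHLIB_SUPPORT = True
    except ImportError:
        pathlib = None
        PATHLIB_SUPPORT = False


def normalize_uri(uri):
    """Hypothetical helper: cast a Path to str, pass anything else through."""
    if PATHLIB_SUPPORT and isinstance(uri, pathlib.Path):
        return str(uri)
    return uri
```

Any non-Path argument (a plain string path, an `s3://` URI) is returned unchanged, so the rest of the opening logic only ever sees strings.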

    fpath = Path(os.path.join(CURR_DIR, 'test_data/cp852.tsv.txt'))
    with smart_open.smart_open(fpath, encoding='cp852') as fin:
        fin.read()
except ImportError:
Contributor

It's bad practice to let the test pass silently; better to add a Python version condition that skips the test: https://docs.python.org/2/library/unittest.html#skipping-tests-and-expected-failures
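A sketch of the suggested alternative: mark the test as skipped on interpreters without pathlib, so the report shows it was not run instead of counting it as a pass. The class and test names are hypothetical; the decorator is the standard `unittest.skipIf`.

```python
import sys
import unittest


class VersionSkipTest(unittest.TestCase):
    # On Python < 3.4 this is reported as "skipped", not silently passed.
    @unittest.skipIf(sys.version_info < (3, 4), 'pathlib requires Python >= 3.4')
    def test_path_join(self):
        from pathlib import Path
        self.assertEqual(str(Path('a') / 'b'), str(Path('a/b')))
```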

clintval force-pushed the cv_issue_170 branch 5 times, most recently from b295f04 to dacbe67 (March 15, 2018 18:53)
clintval (Contributor Author) commented Mar 15, 2018

Hi @menshikh-iv ready for another review.

Interestingly, pathlib.Path was introduced in Python 3.4, but the builtin open() did not accept Path objects until Python 3.6. The test now handles both cases.
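Casting with `str()` works on every Python that has pathlib, whereas passing the Path object straight to `open()` relies on the file system path protocol added in 3.6 (PEP 519). A minimal sketch, assuming Python 3; `read_text` and `demo` are hypothetical helpers, not part of smart_open.

```python
import io
import tempfile
from pathlib import Path


def read_text(path, encoding):
    # Cast to str so this also works on Python 3.4/3.5, where open()
    # does not yet accept Path objects (that arrived in 3.6 via PEP 519).
    with io.open(str(path), encoding=encoding) as fin:
        return fin.read()


def demo():
    # Round-trip a short cp852-encoded sample through a Path object.
    text = u'P\u0159\u00edli\u0161'
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'sample.txt'
        with io.open(str(path), 'w', encoding='cp852') as fout:
            fout.write(text)
        return read_text(path, 'cp852') == text
```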

from pathlib import Path
fpath = Path(os.path.join(CURR_DIR, 'test_data/cp852.tsv.txt'))
with smart_open.smart_open(fpath, encoding='cp852') as fin:
    fin.read()
Collaborator

Don't you want to test for equality with the expected file content here?

Contributor Author

Hi @mpenkov this commit is outdated. There is a more recent commit addressing review from @menshikh-iv.

mpenkov dismissed their stale review March 16, 2018 03:02

Reviewed outdated commit. Newer commit addresses problem.

@@ -191,6 +191,24 @@ def test_open_with_keywords_explicit_r(self):
            actual = fin.read()
        self.assertEqual(expected, actual)

    @unittest.skipIf(
        sys.version_info < (3, 4),
Contributor

Better to check whether pathlib can be imported, not the Python version, because I could have pathlib installed from a backport.

menshikh-iv (Contributor)

@mpenkov there's a suspicious dependency conflict for Python 3.3 in CI: https://travis-ci.org/RaRe-Technologies/smart_open/jobs/353985032, any ideas?

@clintval please merge the latest master into the PR.

clintval force-pushed the cv_issue_170 branch 3 times, most recently from d0df9b1 to 9c57e82 (March 16, 2018 17:42)
clintval (Contributor Author) commented Mar 16, 2018

Ok @mpenkov and @menshikh-iv, I now skip the unit test if the module pathlib cannot be found or imported.

I rebased onto origin/master too.

Did you want me to add pathlib2 to test_require to test this library against the backport as well?

@@ -191,6 +192,24 @@ def test_open_with_keywords_explicit_r(self):
            actual = fin.read()
        self.assertEqual(expected, actual)

    @unittest.skipIf(
        pkgutil.find_loader('pathlib') is None,
Contributor

wow, I didn't know about pkgutil, @clintval 👍

menshikh-iv (Contributor)

@clintval yes, add it, and change the skip condition: pathlib and pathlib2 are two different module names, and we only need one of them (either will do).
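A sketch of a skip condition satisfied by either module name, assuming Python 3.4+ for `importlib.util.find_spec`. It is used here in place of `pkgutil.find_loader`, which is deprecated since Python 3.12. The helper `any_module_available` and the test class are hypothetical.

```python
import importlib.util
import unittest


def any_module_available(*names):
    # find_spec returns None (rather than raising) for a missing top-level module.
    return any(importlib.util.find_spec(name) is not None for name in names)


HAS_PATHLIB = any_module_available('pathlib', 'pathlib2')


class PathlibImportTest(unittest.TestCase):
    @unittest.skipUnless(HAS_PATHLIB, 'neither pathlib nor pathlib2 is installed')
    def test_path_is_importable(self):
        try:
            from pathlib import Path
        except ImportError:
            from pathlib2 import Path
        self.assertEqual(Path('a', 'b').name, 'b')
```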

menshikh-iv (Contributor)

BTW, that's a really suspicious failure on Travis for 3.3; I'll investigate it later.

clintval force-pushed the cv_issue_170 branch 2 times, most recently from 5cbc331 to d9f60b5 (March 17, 2018 22:15)
clintval (Contributor Author) commented Mar 17, 2018

@menshikh-iv, opening pathlib.Path and pathlib2.Path instances is now supported!

The tests reflect this support.

See these test logs for how this PR behaves without pathlib2 installed:
https://travis-ci.org/RaRe-Technologies/smart_open/builds/354849770

And these more recent logs for how it behaves with pathlib2 installed:
https://travis-ci.org/RaRe-Technologies/smart_open/builds/354850402

The only unsupported case is having both the builtin pathlib and the pathlib2 backport installed: smart_open will then only open paths from the builtin library, not from the backport. This is because:

>>> isinstance(pathlib.Path(), pathlib2.Path)
False

If you want simultaneous support for pathlib and the pathlib2 backport, let me know and I can modify the conditions (although this sounds like an awfully contrived scenario).
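Because the two `Path` classes are unrelated, supporting both at once would mean checking against a tuple of whichever classes import successfully. A sketch of that alternative, which is not the approach the maintainers chose; `PATH_TYPES` and `is_path` are hypothetical names.

```python
import importlib

# Collect the Path class from every pathlib implementation that imports.
PATH_TYPES = []
for module_name in ('pathlib', 'pathlib2'):
    try:
        PATH_TYPES.append(importlib.import_module(module_name).Path)
    except ImportError:
        pass
PATH_TYPES = tuple(PATH_TYPES)


def is_path(obj):
    # isinstance() accepts a tuple, so a Path from any installed
    # implementation matches even though the classes share no base.
    return isinstance(obj, PATH_TYPES)
```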

menshikh-iv (Contributor)

@clintval I'm +1 for using the built-in pathlib even when pathlib2 is installed.

menshikh-iv (Contributor)

@clintval I resolved the CI problem; everything passes now.
LGTM from me, wdyt @mpenkov?

mpenkov (Collaborator) left a comment

I had a look at the commits. They look good, but I suggest a few minor changes to remove duplication of effort and simplify expression.

# Unit test will skip if either module is unavailable so it's safe
# to assume we can import _at least_ one working ``pathlib``.
except ImportError:
pass

# builtin open() supports pathlib.Path in python>=3.6 only
Collaborator

I don't think we need to bother with this in the tests. From the point of view of the test, it's an unnecessary detail. In this particular case, we just want to read the expected data in the quickest and simplest possible way. So I think it's better to just do:

with open(fpath, 'rb') as fin:
    expected = fin.read().decode('cp852')

@@ -193,16 +194,28 @@ def test_open_with_keywords_explicit_r(self):
        self.assertEqual(expected, actual)

    @unittest.skipIf(
Collaborator

I think a simpler way of writing this is:

@unittest.skipIf(smart_open_lib.PATHLIB_SUPPORT is False, "your reason here")

        """If ``pathlib.Path`` is available we should be able to open and read."""
        fpath = os.path.join(CURR_DIR, 'test_data/cp852.tsv.txt')

        # Import ``pathlib`` if the builtin ``pathlib`` or the backport
Collaborator

I think we're duplicating unnecessary work here. We've already gone through the pain of importing the necessary pathlib module in smart_open_lib. So, we can just get rid of all this import stuff, and use:

from smart_open.smart_open_lib import pathlib
path = pathlib.Path('/foo/bar')

whenever you need access to the actual pathlib module.

…s==39.0.0`. Fix piskvorky#176 (piskvorky#178)

* fix author/maintainer fields

* attemp to resolve problem with botocore

* try other workaround for botocore

* next attemp
clintval force-pushed the cv_issue_170 branch 4 times, most recently from 7483fb9 to 452b194 (March 21, 2018 02:06)
clintval (Contributor Author) commented Mar 21, 2018

Thanks @mpenkov! Great review.

I completely agree and pushed another commit slimming the test down.

Here are the Path instances we support opening:

Python version | pathlib2.Path | pathlib.Path
< 3.4          | yes           | no
≥ 3.4          | no            | yes

mpenkov (Collaborator) left a comment

Looks like my previous comment wasn't clear enough, so let me clarify here.

            actual = fin.read()
        self.assertEqual(expected, actual)
Collaborator

We still want to check that the contents are equal. We just don't want to read the expected file via open/pathlib, because that won't work for all Python versions, as you've correctly pointed out.

So the correct thing to do here is (untested pseudo-code):

expected = open(fpath, 'rb').read().decode('cp852')
self.assertEqual(expected, actual)

You can use a with statement if you like, I guess that's more Pythonic.
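A self-contained sketch of the test shape being asked for: read the expected text with plain `open()` on a string path (binary mode, then decode, which behaves the same on Python 2 and 3), read the actual text through the code under test using a Path, and compare. `open_under_test` is a hypothetical stand-in for `smart_open.smart_open` so the sketch runs without the library; a temporary file with a short sample replaces `test_data/cp852.tsv.txt`.

```python
import io
import os
import tempfile
import unittest
from pathlib import Path

SAMPLE = u'P\u0159\u00edli\u0161'  # short Czech sample encodable in cp852


def open_under_test(uri, encoding):
    # Hypothetical stand-in for smart_open.smart_open: cast Path to str, open as text.
    return io.open(str(uri), encoding=encoding)


class PathReadTest(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.mkdtemp()
        self.fpath = os.path.join(self.tmp, 'cp852.tsv.txt')
        with io.open(self.fpath, 'w', encoding='cp852') as fout:
            fout.write(SAMPLE)

    def tearDown(self):
        os.remove(self.fpath)
        os.rmdir(self.tmp)

    def test_read_path(self):
        # Expected: plain open() on a str path, decoded explicitly.
        with open(self.fpath, 'rb') as fin:
            expected = fin.read().decode('cp852')
        # Actual: the Path-accepting function under test.
        with open_under_test(Path(self.fpath), encoding='cp852') as fin:
            actual = fin.read()
        self.assertEqual(expected, actual)
```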

Contributor Author

I'll add that back in, I misunderstood your previous comment.

Contributor Author

@mpenkov thanks for your patience. I amended the commit.

mpenkov (Collaborator) commented Mar 22, 2018

@menshikh-iv LGTM, please merge when you're ready.

menshikh-iv changed the title from "Support opening pathlib.Path" to "Add support for pathlib.Path. Fix #170" on Mar 22, 2018
menshikh-iv merged commit 0033cb9 into piskvorky:master on Mar 22, 2018
menshikh-iv (Contributor)

Thanks @clintval 👍, good work!

Successfully merging this pull request may close this issue: Support for opening pathlib.Path
3 participants