Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn when creating driver that points to an empty directory #77

Closed
wants to merge 6 commits into from

Conversation

pzdkn
Copy link
Contributor

@pzdkn pzdkn commented Aug 4, 2022

Description

A warning is shown when we iterate over a driver, that points to an empty or non-existent directory.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring including code style reformatting
  • Other (please describe):

Checklist:

  • I have read the contributing guideline doc (external contributors only)
  • Lint and unit tests pass locally with my changes
  • I have kept the PR small so that it can be easily reviewed
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • All dependency changes have been reflected in the pip requirement files.

squirrel/fsspec/fs.py Outdated Show resolved Hide resolved
squirrel/driver/store_driver.py Outdated Show resolved Hide resolved
squirrel/fsspec/fs.py Outdated Show resolved Hide resolved
test/test_driver/test_msgpack.py Show resolved Hide resolved
squirrel/fsspec/fs.py Outdated Show resolved Hide resolved
@AlpAribal
Copy link
Contributor

Shall we base this branch onto #72, there are some common changes in both, it might be safer to see how the changes combined work together.

@AlirezaSohofi
Copy link
Contributor

Shall we base this branch onto #72, there are some common changes in both, it might be safer to see how the changes combined work together.

agree, can you please rebase this branch please @pzdkn?

@pzdkn pzdkn force-pushed the peng-warn-about-empty-url branch from 8dca81c to dcc60a5 Compare August 9, 2022 17:38
@pzdkn
Copy link
Contributor Author

pzdkn commented Aug 10, 2022

I have rebased and tests for both branches pass. So we can merge #72 first, and then take care of this one @AlirezaSohofi @AlpAribal.

@pzdkn pzdkn changed the title Throw error when creating driver that points to an empty directory Warn when creating driver that points to an empty directory Aug 10, 2022
@pzdkn pzdkn changed the base branch from main to peng-store-create-path-when-non-existent August 12, 2022 09:44
Base automatically changed from peng-store-create-path-when-non-existent to main August 12, 2022 09:53
test store creation with empty path and non-empty path

create directory when inserting shards

Update squirrel/store/squirrel_store.py

Co-authored-by: Alireza Sohofi <a.sohofi@gmail.com>

Update test/test_driver/test_sq_store.py

Co-authored-by: Alp Arıbal <AlpAribal@users.noreply.github.com>

move dir-exist check out of set operation

add set operation in test; adapt dir exist flag, because dir doesnt exist after clean

remove created_dir_if_not_exist function

bump minor version

throw error when instantiating a driver that points to an empty or invalid url

refactor warning into FilePathGenerator
Comment on lines 100 to 101
if not self._returned_url:
self._returned_url = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the same check for the nested case as well. I think, right now, we will get an empty warning for a dir containing only dirs but no files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to cover the nested case though? We are not reading in a nested way in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FilePathGenerator is not only menat to be used in SquirrelStore, so it should be general and and able to handle nested cases too.

Comment on lines 25 to 27
with warnings.catch_warnings():
warnings.simplefilter("ignore")
_ = MessagepackDriver(url=tmp_dir).get_iter().collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we catch/filter warnings here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to check that when there are items in the store, then no warnings shall be printed.
This .simplefilter asserts that no warning is shown:

https://stackoverflow.com/questions/45671803/how-to-use-pytest-to-assert-no-warning-is-raised

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is in fact unintuitive, since a directory containing empty directories will no yield warnings

Comment on lines 37 to 38
with warnings.catch_warnings():
warnings.simplefilter("ignore")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we catch/filter warnings here?

@@ -10,6 +11,34 @@
from squirrel.store import SquirrelStore


def test_invalid_url(local_msgpack_url: URL) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rename this test, it does not only use invalid urls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add nested cases here, perhaps via parametrize

yield f"{self.protocol}{url}"
if not self._returned_url:
self._returned_url = True
yield f"{self.protocol}{url}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
yield f"{self.protocol}{url}"
yield f"{self.protocol}{url}"

Comment on lines +102 to +103
if not self._returned_file_url and not self.fs.isdir(path=url):
self._returned_file_url = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not self._returned_file_url and not self.fs.isdir(path=url):
self._returned_file_url = True
if not self._returned_file_url:
self._returned_file_url = True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need not self.fs.isdir(path=url)?

@pzdkn pzdkn closed this Sep 7, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2022
@pzdkn pzdkn deleted the peng-warn-about-empty-url branch September 7, 2022 16:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants