Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF/CLN: pandas/io/pytables.py #36859

Merged
merged 14 commits into from Oct 6, 2020
Merged

Conversation

ivanovmg
Copy link
Member

@ivanovmg ivanovmg commented Oct 4, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Clean-up and minor refactor of pandas/io/pytables.py.

  • Extract method _identify_group
  • Clean-up some docstrings (spacing)
  • Use suppress instead of try/except/pass
  • Format error message using dedent
  • Remove unnecessary empty lines as they compromise readability of some blocks

@@ -716,11 +712,9 @@ def open(self, mode: str = "a", **kwargs):
"which allows\n"
"files to be opened multiple times at once\n"
)

raise err

except Exception as err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're up for it, narrowing down what we catch here would be really helpful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that, looking at the source code of tables. But it turns out that there was (is) no test for this exception in pandas code. So I will need to figure out how to do it, but it will require some mocking, including package version and stuff.

Copy link
Member Author

@ivanovmg ivanovmg Oct 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking into account that pytables version >= 3.4.3 in current installation requirements (https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html), then I can probably remove ver >= 3.1 statement.
In fact, the whole message can be updated to a more relevant one.

where : list, default None
List of Term (or convertible) objects, optional.
List of Term (or convertible) objects, optional.
start : int, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int -> int or None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here and in the adjacent strings.

@@ -1099,7 +1091,7 @@ def put(
<https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#query-via-data-columns>`__.
encoding : str, default None
Provide an encoding for strings.
dropna : bool, default False, do not write an ALL nan row to
dropna : bool, default False, do not write an ALL nan row to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line after "default False"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that this parameter is not even used in the function. Deleted the corresponding strings.

@@ -1807,6 +1771,45 @@ def _read_group(self, group: "Node"):
s.infer_axes()
return s.read()

def _identify_group(self, key: str, append: bool) -> "Node":
"""Identify HDF5 group based on key, delete/create group if needed."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

""" on separate lines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, but is there a local policy for this? I mean according to PEP257 (https://www.python.org/dev/peps/pep-0257/#one-line-docstrings) if it fits in one line, then no need to put triple quotes on multiple lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @datapythonista am i remembering this wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For private docstrings it doesn't make a difference, and both are correct based on Python recommendations. For public docstrings we try to standardize them to what @jbrockmendel says. We shouldn't have single liners anyway, and in our autosummaries we take the first line of the docstring (the short summary), which we want to have of similar sizes to show correctly in the html version. Having it in the line after the """ allows for 3 more characters, so we decided to standardize all public docstrings to that, to be consistent.

for m in masks[1:]:
mask = mask & m
combine_masks = lambda first, second: first & second
mask = reduce(combine_masks, masks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a deal-breaker, but i dont find this clearer than what we have now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I supposed it may be a questionable change. I will revert this.

@@ -5032,14 +5015,12 @@ def _maybe_adjust_name(name: str, version) -> str:
-------
str
"""
try:
with suppress(IndexError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do appropriate checks for this and avoid the exception altogether

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that the version number is a correct sequence of three elements. I also added test as this check never hit.

Comment on lines -724 to -728
# trying to read from a non-existent file causes an error which
# is not part of IOError, make it one
if self._mode == "r" and "Unable to open/create file" in str(err):
raise OSError(str(err)) from err
raise
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this error message is not raised in the current version of tables.file. Not even 3.1. So, I deleted this error raise.

Comment on lines -694 to -701
try:
self._handle = tables.open_file(self._path, self._mode, **kwargs)
except OSError as err: # pragma: no cover
if "can not be written" in str(err):
print(f"Opening {self._path} in read-only mode")
self._handle = tables.open_file(self._path, "r", **kwargs)
else:
raise
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the current code of tables.open_file and did not find error messages containing can not be written. Even in 3.1 there is no such message. Considering the current minimal pytables version (>= 3.4.3), this error handing seems to be irrelevant (if "can not be written" in str(err)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this could have changed, was hard to test :->

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, just a question.

Comment on lines -694 to -701
try:
self._handle = tables.open_file(self._path, self._mode, **kwargs)
except OSError as err: # pragma: no cover
if "can not be written" in str(err):
print(f"Opening {self._path} in read-only mode")
self._handle = tables.open_file(self._path, "r", **kwargs)
else:
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this could have changed, was hard to test :->


# remove the node if we are not appending
if group is not None and not append:
self._handle.remove_node(group, recursive=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened to this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the method _identify_group below.

@ivanovmg ivanovmg requested a review from jreback October 6, 2020 05:49
@jreback jreback added IO HDF5 read_hdf, HDFStore Refactor Internal refactoring of code labels Oct 6, 2020
@jreback jreback added this to the 1.2 milestone Oct 6, 2020
@jreback jreback merged commit 98af5d7 into pandas-dev:master Oct 6, 2020
@jreback
Copy link
Contributor

jreback commented Oct 6, 2020

thanks @ivanovmg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants