REF/CLN: pandas/io/parsers.py #36852

Merged: 13 commits merged into pandas-dev:master on Oct 14, 2020

@ivanovmg (Member) commented Oct 4, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Refactor and clean up pandas/io/parsers.py:

  • Extract method _refresh_kwargs_based_on_dialect
  • Extract method _validate_skipfooter
  • Drop local variable engine_specified
  • Clean up the FutureWarning for deprecated arguments

@jbrockmendel (Member):

Big picture: what's the goal?

@ivanovmg (Member, Author) commented Oct 7, 2020

> Big picture: what's the goal?

Extract some methods to make any further refactoring easier.

```python
if result.get(arg, depr_default) != depr_default:
    depr_warning += msg + "\n\n"
msg = (
    f"The {repr(arg)} argument has been deprecated and will be "
```

Member: Isn't repr here redundant?
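On the repr question: inside an f-string, `{repr(arg)}` and the `!r` conversion produce the same text (the argument name in quotes), so the call can be written more compactly, but it is not equivalent to a plain `{arg}`, which drops the quotes. The sketch below shows the accumulate-then-warn-once pattern from the excerpt using `!r`. `DEPRECATED_DEFAULTS`, `warn_on_deprecated`, and the message wording are hypothetical, not the pandas code.

```python
import warnings
from typing import Any, Dict

# Hypothetical deprecated arguments mapped to their default values.
DEPRECATED_DEFAULTS: Dict[str, Any] = {"old_flag": None, "legacy_mode": False}


def warn_on_deprecated(kwds: Dict[str, Any]) -> str:
    """
    Build one message per deprecated argument that was given a non-default
    value and emit them together as a single FutureWarning.

    Returns the combined message (empty if nothing deprecated was used).
    """
    depr_warning = ""
    for arg, depr_default in DEPRECATED_DEFAULTS.items():
        if kwds.get(arg, depr_default) != depr_default:
            # {arg!r} renders exactly what {repr(arg)} does, e.g. 'old_flag'.
            msg = f"The {arg!r} argument is deprecated."
            depr_warning += msg + "\n\n"
    if depr_warning:
        warnings.warn(depr_warning, FutureWarning, stacklevel=2)
    return depr_warning
```

Collecting the messages first means a caller who passes several deprecated arguments sees one warning instead of several.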

@jbrockmendel (Member):

> Extract some methods to make any further refactoring easier.

Is there further refactoring you have in mind?

@@ -951,6 +903,61 @@ def __init__(self, f, engine=None, **kwds):
```python
    def close(self):
        self._engine.close()

    @staticmethod
```

Contributor: Don't use staticmethods.

Member Author: Moved to module level.

```python
        kwds[param] = dialect_val
    return kwds

@staticmethod
```

Contributor: Same.

```python
    return kwds

@staticmethod
def _validate_skipfooter(kwds):
```

Contributor: Can you type it?

Member Author: Done.
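A typed, documented version of the extracted check might look roughly like this. The rule it encodes, that footer lines cannot be dropped while rows are handed out in chunks or cut off early via `nrows`, because the reader does not yet know where the file ends, is an assumption about why the check exists; the name `validate_skipfooter` and the error messages are illustrative.

```python
from typing import Any, Dict


def validate_skipfooter(kwds: Dict[str, Any]) -> None:
    """
    Check whether ``skipfooter`` is compatible with the other keyword arguments.

    Raises
    ------
    ValueError
        If a non-zero ``skipfooter`` is combined with iteration
        (``iterator`` / ``chunksize``) or with ``nrows``.
    """
    if kwds.get("skipfooter"):
        # Chunked reading yields rows before the end of the file is known.
        if kwds.get("iterator") or kwds.get("chunksize"):
            raise ValueError("'skipfooter' not supported for iteration")
        # nrows stops reading early, so the footer is never located.
        if kwds.get("nrows"):
            raise ValueError("'skipfooter' not supported with 'nrows'")
```

Returning `None` and raising on failure keeps the call site in the constructor to a single line.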

```python
@staticmethod
def _refresh_kwargs_based_on_dialect(kwds, dialect):
```

Contributor: Can you type it and add a docstring?

Member Author: Done.

```python
self._engine_specified = kwds.get("engine_specified", engine_specified)

if kwds.get("dialect") is not None:
    dialect = kwds["dialect"]
    if dialect in csv.list_dialects():
        dialect = csv.get_dialect(dialect)
    kwds = self._refresh_kwargs_based_on_dialect(kwds, dialect)
```

Contributor: Merge master, as some of this has been updated, and rename this to something like check_defaults_dialect (or similar).

Member Author: Done. I renamed the function to _merge_with_dialect_properties, as it actually merges the kwargs with the properties provided by the dialect (and resolves some conflicts between them).
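A typed, module-level version of this helper could look roughly like the following. It is a minimal sketch under stated assumptions: `merge_with_dialect_properties`, `DIALECT_ATTRS`, and `PARSER_DEFAULTS` are hypothetical stand-ins, the conflict rule (warn only when the caller explicitly passed a non-default value that disagrees with the dialect) is an assumption about the intended behavior, and a plain `UserWarning` stands in for whatever warning class pandas actually emits. The dialect comes first, the keyword defaults last, and the input dict is never mutated.

```python
import csv
import warnings
from typing import Any, Dict

# Hypothetical subset of dialect attributes relevant to a CSV reader.
DIALECT_ATTRS = ["delimiter", "doublequote", "escapechar",
                 "skipinitialspace", "quotechar", "quoting"]

# Hypothetical parser defaults for those attributes.
PARSER_DEFAULTS: Dict[str, Any] = {
    "delimiter": ",", "doublequote": True, "escapechar": None,
    "skipinitialspace": False, "quotechar": '"', "quoting": csv.QUOTE_MINIMAL,
}


def merge_with_dialect_properties(
    dialect: csv.Dialect, defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Return a copy of ``defaults`` in which every dialect attribute is set to
    the value the dialect specifies, warning about explicit conflicts.
    """
    kwds = defaults.copy()
    conflicts = []
    for param in DIALECT_ATTRS:
        dialect_val = getattr(dialect, param)
        parser_default = PARSER_DEFAULTS[param]
        provided = kwds.get(param, parser_default)
        # Only an explicit, non-default value that disagrees counts as a conflict.
        if provided != parser_default and provided != dialect_val:
            conflicts.append(
                f"Conflicting values for {param!r}: {provided!r} was provided, "
                f"but the dialect specifies {dialect_val!r}. "
                "Using the dialect-specified value."
            )
        kwds[param] = dialect_val
    if conflicts:
        warnings.warn("\n\n".join(conflicts), UserWarning, stacklevel=2)
    return kwds
```

For example, merging `csv.get_dialect("excel-tab")` with `{"delimiter": ","}` yields a tab delimiter without a warning, because `","` is the parser default; passing `{"delimiter": ";"}` yields the same delimiter but warns about the conflict.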

@ivanovmg (Member, Author) commented Oct 8, 2020

>> Extract some methods to make any further refactoring easier.

> Is there further refactoring you have in mind?

I am still trying to figure out what the code does. Making small refactoring/cleanup PRs helps me understand it better. I will probably have some suggestions for refactoring later.

@jreback added the labels "IO CSV" (read_csv, to_csv) and "Refactor" (Internal refactoring of code) on Oct 10, 2020
@@ -3791,3 +3735,93 @@ def _check_defaults_read(
```python
    kwds["engine_specified"] = False

    return kwds


def _merge_with_dialect_properties(
```

Contributor: Reverse these (always put kwds things last).

Contributor: Rename kwds -> defaults, so it is similar to _check_defaults_read. Also not averse to renaming _check_defaults_read to something better.

Member Author (@ivanovmg, Oct 11, 2020): I have done it.
I also found an issue with the ValueError message raised when the dialect is incorrect. Originally it used kwds['dialect'], but I decided to use dialect.__name__ for better readability, which mypy does not accept (it reports: "Dialect" has no attribute "__name__" [attr-defined]).
So now I use the string representation of the dialect instance. However, I could still pass one extra parameter (dialect_name) to _merge_with_dialect_properties and use it in the error message.
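The option of passing an explicit name could look roughly like the hypothetical helper below (`extract_dialect`, `MANDATORY_DIALECT_ATTRS`, and the message wording are assumptions, not pandas code). It keeps the registered name the user typed before replacing it with the dialect object, so the error message can show that name, and falls back to the object's repr otherwise; since it never touches `__name__`, mypy has nothing to object to.

```python
import csv
from typing import Any, Union

# Hypothetical attributes every usable dialect must define.
MANDATORY_DIALECT_ATTRS = (
    "delimiter", "doublequote", "escapechar",
    "skipinitialspace", "quotechar", "quoting",
)


def extract_dialect(dialect: Union[str, Any], dialect_name: str = "") -> Any:
    """
    Resolve ``dialect`` (a registered name or a dialect-like object) and check
    that it defines every mandatory attribute.

    ``dialect_name`` is the hypothetical extra parameter: when given, error
    messages show that name instead of the object's repr.
    """
    if isinstance(dialect, str):
        if dialect not in csv.list_dialects():
            raise ValueError(f"Invalid dialect {dialect!r} provided")
        # Keep the registered name for later messages before swapping in the object.
        dialect_name = dialect_name or dialect
        dialect = csv.get_dialect(dialect)

    # Prefer the human-readable name; fall back to the object's repr.
    label = repr(dialect_name) if dialect_name else repr(dialect)
    missing = [a for a in MANDATORY_DIALECT_ATTRS if not hasattr(dialect, a)]
    if missing:
        raise ValueError(
            f"Invalid dialect {label} provided (missing: {', '.join(missing)})"
        )
    return dialect
```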

@jreback (Contributor) left a comment:

LGTM. cc @gfyoung if any comments; please merge when ready.

@jreback added this to the 1.2 milestone on Oct 14, 2020
@jreback (Contributor) commented Oct 14, 2020

cc @twoertwein if any comments.

@twoertwein (Member):

LGTM. Outsourcing the dialect validation makes the constructor of TextFileReader easier to read.

@gfyoung merged commit 66f3917 into pandas-dev:master on Oct 14, 2020
@gfyoung (Member) commented Oct 14, 2020

Thanks @ivanovmg!

gfyoung added two commits to forking-repos/pandas that referenced this pull request on Oct 14, 2020
jreback pushed a commit that referenced this pull request on Oct 15, 2020
@ivanovmg deleted the refactor/parsers branch on October 22, 2020 at 07:31
@ivanovmg mentioned this pull request on Oct 22, 2020
JulianWgs pushed two commits to JulianWgs/pandas that referenced this pull request on Oct 26, 2020
kesmit13 pushed two commits to kesmit13/pandas that referenced this pull request on Nov 2, 2020
Labels: IO CSV (read_csv, to_csv), Refactor (Internal refactoring of code)
Projects: none
Linked issues: none
Participants: 5