Speed up profiling first pass #559

Merged
merged 4 commits into from
Aug 1, 2019

@Aylr Aylr commented Jul 26, 2019

  • disable and enable evaluation to speed up profiling
  • print out columns so users see something happening rather than waiting

This sped up a few tests by 30-40%.

Results for a 10000-row CSV with 80 columns:

before after
14 21
17 24
15 27
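The first bullet above can be illustrated with a minimal sketch. `ToyDataset`, its `interactive_evaluation` attribute, and `profile` are hypothetical names made up for this example; they are not the Great Expectations API and not the code in this PR. The idea, as assumed here, is that expectations declared during profiling are only recorded while evaluation is switched off, and are evaluated once at the end.

```python
class ToyDataset(object):
    """Hypothetical stand-in for a dataset; not the Great Expectations API."""

    def __init__(self, rows):
        self.rows = rows
        self.interactive_evaluation = True  # assumed flag name
        self.expectations = []
        self.evaluations = 0  # counts how often an expectation is evaluated

    def expect_column_to_exist(self, column):
        # Always record the expectation; evaluate immediately only if enabled.
        self.expectations.append(column)
        if self.interactive_evaluation:
            return self._evaluate(column)
        return None

    def _evaluate(self, column):
        self.evaluations += 1
        return all(column in row for row in self.rows)

    def validate(self):
        # Evaluate every recorded expectation exactly once.
        return dict((c, self._evaluate(c)) for c in self.expectations)


def profile(dataset, columns):
    """Declare expectations with evaluation off, then validate once."""
    previous = dataset.interactive_evaluation
    dataset.interactive_evaluation = False
    try:
        for column in columns:
            dataset.expect_column_to_exist(column)
    finally:
        # Restore the caller's setting even if profiling fails.
        dataset.interactive_evaluation = previous
    return dataset.validate()
```

In this toy version, leaving the flag on would evaluate each expectation twice (once when declared and again in `validate`), which is the kind of duplicated work the toggle is meant to avoid.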

coveralls commented Jul 26, 2019

Coverage increased (+0.05%) to 76.808% when pulling 6b4d7f4 on Aylr:aylr/profiling-progress-bar into 06a7966 on great-expectations:develop.

Aylr added 2 commits July 30, 2019 12:23
* print out columns so users see something happening rather than waiting

@jcampbell jcampbell left a comment

This looks fantastic, and I also observe a significant speedup. Will definitely be able to make it into this release; requested a minor change to preserve py2 compatibility and be consistent with the overall approach to output.

- for column in df.get_table_columns():
+ columns = df.get_table_columns()
+ number_of_columns = len(columns)
+ for i, column in enumerate(columns):
Member

Adding column-level info here really increases the verbosity of the overall output, but I do like having the names of the columns present.

Outside the CLI module, we route all such messages through the logger. So two changes requested here: change the print statement to a log output, and ensure you don't use f-string style formatting to maintain py2 compatibility.
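A minimal sketch of the two requested changes, assuming a module-level logger and a made-up message template (`PROGRESS_TEMPLATE` and `profile_columns` are hypothetical names, not the code that was merged). It uses only the standard `logging` module and %-style placeholders, so it runs on both Python 2 and Python 3.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical message wording; %-style placeholders instead of an f-string.
PROGRESS_TEMPLATE = "Profiling column %d of %d: %s"


def profile_columns(columns):
    """Log one progress line per column instead of printing it."""
    number_of_columns = len(columns)
    profiled = []
    for i, column in enumerate(columns):
        # Passing the arguments separately lets logging defer the formatting
        # until a handler actually emits the record.
        logger.info(PROGRESS_TEMPLATE, i + 1, number_of_columns, column)
        profiled.append(column)  # placeholder for the real per-column work
    return profiled
```

Whether the messages are shown is then up to the caller's logging configuration, for example `logging.basicConfig(level=logging.INFO)`.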

Contributor Author

It definitely increases the verbosity, which is IMO far better than indeterminate silence. Eventually I'd like to see a progress bar, so this is a small step in that direction.

Contributor Author

Fixed both recommendations!

@jcampbell jcampbell merged commit 8dd3f9e into great-expectations:develop Aug 1, 2019
@Aylr Aylr deleted the aylr/profiling-progress-bar branch August 1, 2019 21:12

3 participants