-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up profiling first pass #559
Speed up profiling first pass #559
Conversation
* print out columns so users see something happening rather than waiting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic, and I also observe a significant speedup. Will definitely be able to make it into this release; requested a minor change to preserve py2 compatibility and be consistent with the overall approach to output.
for column in df.get_table_columns(): | ||
columns = df.get_table_columns() | ||
number_of_columns = len(columns) | ||
for i, column in enumerate(columns): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding column-level info here really increases the verbosity of the overall output, but I do like having the names of the columns present.
Outside the CLI module, we route all such messages through the logger. So two changes requested here: change the print statement to a log output, and ensure you don't use f-string style formatting to maintain py2 compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely increases the verbosity which is IMO far batter than indeterminate silence. Eventually I'd like to see a progress bar so this is a small step in that direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed both recommendations!
* proper logger for output
This sped up a few tests by 30-40%.
Results for a 10000 row csv wtih 80 columns