-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve save_csv string formatting #948
Conversation
Using .format can take up to 2x as long as using %. Also add a test covering an additional line of code.
CodeSee Review Map:Review in an interactive map View more CodeSee Maps Legend |
Codecov Report
@@ Coverage Diff @@
## main #948 +/- ##
==========================================
+ Coverage 95.34% 95.36% +0.02%
==========================================
Files 64 64
Lines 9898 9898
==========================================
+ Hits 9437 9439 +2
+ Misses 461 459 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@@ -1004,16 +1004,16 @@ def save_csv( | |||
decimals = 0 | |||
dec_sep = 0 | |||
if sign == 1: | |||
fmt = "{: %dd}" % (pre_point_digits + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format strings are different depending on whether you use str.format() or string interpolation. String interpolation is significantly faster than str.format(). Therefore, this PR switches to string interpolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried f-strings? How do they compete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tried them. Seems a bit convoluted to create them. f-strings are intended to be in code as literals, whereas I dynamically generate the format string depending on desired precision etc. You can give it a try if you like. For me personally, it wasn't quite worth the effort for an output format that is far from ideal in terms of performance anyway. It may turn out that they don't even work in the intended way.
@@ -1033,11 +1033,11 @@ def save_csv( | |||
for i in range(data.lshape[0]): | |||
# if lshape is of the form (x,), then there will only be a single element per row | |||
if len(data.lshape) == 1: | |||
row = fmt.format(data.larray[i]) | |||
row = fmt % (data.larray[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format strings have to be used differently now.
@@ -195,6 +195,21 @@ def test_save_csv(self): | |||
if data.comm.rank == 0: | |||
os.unlink(filename) | |||
|
|||
data = ht.random.randint(0, 100, size=(150,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This captures a special case that had not been covered before.
@@ -1004,16 +1004,16 @@ def save_csv( | |||
decimals = 0 | |||
dec_sep = 0 | |||
if sign == 1: | |||
fmt = "{: %dd}" % (pre_point_digits + 1) | |||
fmt = "%%%-dd" % (pre_point_digits + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt = "%%%-dd" % (pre_point_digits + 1) | |
fmt = f"{pre_point_digits + 1}" |
else: | ||
fmt = "{:%dd}" % (pre_point_digits) | ||
fmt = "%%%dd" % (pre_point_digits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt = "%%%dd" % (pre_point_digits) | |
fmt = f"{pre_point_digits}" |
elif types.issubdtype(data.dtype, types.floating): | ||
if decimals == -1: | ||
decimals = 7 if data.dtype is types.float32 else 15 | ||
if sign == 1: | ||
fmt = "{: %d.%df}" % (pre_point_digits + decimals + 2, decimals) | ||
fmt = "%%%-d.%df" % (pre_point_digits + decimals + 2, decimals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt = "%%%-d.%df" % (pre_point_digits + decimals + 2, decimals) | |
fmt = f"{pre_point_digits + decimals + 2}.{decimals}f" |
else: | ||
fmt = "{:%d.%df}" % (pre_point_digits + decimals + 1, decimals) | ||
fmt = "%%%d.%df" % (pre_point_digits + decimals + 1, decimals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"{pre_point_digits + decimals + 1}.{decimals}f"
else: | ||
fmt = "{:%d.%df}" % (pre_point_digits + decimals + 1, decimals) | ||
fmt = "%%%d.%df" % (pre_point_digits + decimals + 1, decimals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmt = "%%%d.%df" % (pre_point_digits + decimals + 1, decimals) | |
fmt = f"{pre_point_digits + decimals + 1}.{decimals}f" |
@@ -1033,11 +1033,11 @@ def save_csv( | |||
for i in range(data.lshape[0]): | |||
# if lshape is of the form (x,), then there will only be a single element per row | |||
if len(data.lshape) == 1: | |||
row = fmt.format(data.larray[i]) | |||
row = fmt % (data.larray[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row = fmt % (data.larray[i]) | |
row = f"{data.larray[i]:{fmt}}" |
else: | ||
if data.lshape[1] == 0: | ||
break | ||
row = sep.join(fmt.format(item) for item in data.larray[i]) | ||
row = sep.join(fmt % (item) for item in data.larray[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row = sep.join(fmt % (item) for item in data.larray[i]) | |
row = sep.join(f"{item:{fmt}}" for item in data.larray[i]) |
run tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @bhagemeier !
Using .format can take up to 2x as long as using %.
Also add a test covering an additional line of code.
Description
The title says it all. Improve write performance.
Issue/s resolved: #947
Changes proposed:
Type of change
Memory requirements
Performance
Up to 2x as fast in overall runtime than using str.format(), e.g. cutting down from 139s to 66s for the same data. See performance comparison on #947 .
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no