-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed a bug in which totals doesn't work for column-indexed tables #79
Fixed a bug in which totals doesn't work for column-indexed tables #79
Conversation
|
||
querystring = str(query) | ||
logger.info("Executing query:\n----START----\n{query}\n-----END-----".format(query=querystring)) | ||
|
||
dataframe = database.fetch_dataframe(querystring) | ||
|
||
for dimension_key in rollup: | ||
dataframe[dimension_key].replace([np.nan], [Totals.key], inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky because there could be real null values in the query. I'm not sure off the top of my head how it works with rollup but I think they might get merged together with the totals. This fixes one problem with null values be labeled as totals when they're not, though.
Could you maybe look into using totals on a dimension with null values in the database? The currently solution is to expect the user to use Coaesce on the dimension, but would be nice if we could make this automatic, but that isn't ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twheys Rollup returns totals as none/NaN, so any null value will get merged as you mentioned. Apparently there is no way to address that in the server. One workaround in python would be to indirectly calculate the amount for null entries by creating columns which are equal to the max value, which will definitely be the total, minus all the other ids/categories. Either right before or after that replace above. By the way, It's worth mentioning that the previous implementation also had the same problem. It just tried to set the totals label in a different place while maintaining the keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's not a new issue, just bringing it up
for level in dataframe.index.levels[1]: | ||
metric_data = dict(self._recurse_dimensions(dataframe[:, level], dimensions[1:], metrics)) | ||
|
||
if not metric_data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check is necessary here, wouldn't it be necessary in the above loop on line 283?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twheys Good catch. I fixed that.
if value and not (isinstance(value, (float, int)) and np.isnan(value)) | ||
else 'Totals' | ||
for value in dataframe.index] | ||
return [display_options.get(value, value) for value in dataframe.index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much cleaner, thanks
@@ -31,7 +33,7 @@ def rollup(dataframe, levels): | |||
'd': 'D', | |||
'y': 'Y', | |||
'z': 'Z', | |||
np.nan: 'Total', | |||
'_total': 'Total', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please use the constant for this here and everywhere else in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twheys Done.
44d5622
to
b76e143
Compare
b76e143
to
909e8c4
Compare
for key, display in zip(*dataframe.index.levels[1:3])] | ||
levels = zip(*dataframe.index.levels[1:3]) | ||
format_key = lambda level: level[0] | ||
generate_dataframe = lambda level: dataframe[:, level[0], level[1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be called something like slice_metric_data ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twheys Done.
generate_dataframe = lambda level: dataframe[:, level[0], level[1]] | ||
else: | ||
levels = dataframe.index.levels[1] | ||
format_key = lambda level: str(level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This raises a codacy issue. You could just set format_key = str to avoid the lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twheys Done.
909e8c4
to
8465272
Compare
In a nutshell, this fixes fixes #49. While doing that I tried to make the design a bit simpler in terms of handling totals. For such, I got rid of most hardcoded Totals strings I found and passed a display label for Totals as part of display_options, so that the Total implementation have less corner cases and look more like an ordinary metric. That will also make future refactoring easier. Besides, the NaN in columns with rollups are being replaced with Totals.key right at the source (i.e. query_data), even before setting the indexes. Some tests were also fixed, especially test_rollup_cont_cat_cat_dim_multi_metric() in test_datatables, which had all values for Totals set to None and therefore was essentially wrong.