Fixed a bug in which totals doesn't work for column-indexed tables #79

Merged

@x8lucas8x x8lucas8x commented Dec 12, 2016

In a nutshell, this fixes #49. While doing that, I tried to simplify how totals are handled. To that end, I got rid of most of the hardcoded Totals strings I found and passed a display label for Totals as part of display_options, so that the Totals implementation has fewer corner cases and looks more like an ordinary metric. That will also make future refactoring easier. In addition, NaN values in columns with rollups are now replaced with Totals.key right at the source (i.e. query_data), even before the indexes are set. Some tests were also fixed, especially test_rollup_cont_cat_cat_dim_multi_metric() in test_datatables, which had all values for Totals set to None and was therefore essentially wrong.
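As a rough illustration of the "replace at the source" idea, the sketch below labels the rollup row before the index is set. The column names, the sample data, and the `TOTALS_KEY` constant (standing in for `Totals.key`) are assumptions made for this example, and it uses `fillna` rather than the `replace` call from the diff, so treat it as a sketch rather than the code in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Totals.key; the real constant lives in the project.
TOTALS_KEY = "_total"

# Made-up query result: a rollup typically returns the totals row with a
# null (NaN) value in the rolled-up dimension column.
dataframe = pd.DataFrame({
    "device": ["desktop", "mobile", np.nan],
    "clicks": [10, 5, 15],
})

# Label the totals row before the dimension becomes the index, so later
# code can look it up by key instead of special-casing NaN.
dataframe["device"] = dataframe["device"].fillna(TOTALS_KEY)
dataframe = dataframe.set_index("device")

index_labels = list(dataframe.index)
totals_clicks = int(dataframe.loc[TOTALS_KEY, "clicks"])
```

With the key in place, the totals row can be addressed like any other index value, which is what lets the display layer treat it as an ordinary entry.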

coveralls commented Dec 12, 2016

Coverage decreased (-0.05%) to 97.505% when pulling 44d5622 on x8lucas8x:fix-totals-for-column-indexed-tables into 96f1493 on kayak:master.


querystring = str(query)
logger.info("Executing query:\n----START----\n{query}\n-----END-----".format(query=querystring))

dataframe = database.fetch_dataframe(querystring)

for dimension_key in rollup:
    dataframe[dimension_key].replace([np.nan], [Totals.key], inplace=True)

Contributor

This is tricky because there could be real null values in the query. I'm not sure off the top of my head how it works with rollup, but I think they might get merged together with the totals. This does fix one problem, though: null values being labeled as totals when they're not.

Could you maybe look into using totals on a dimension with null values in the database? The current solution is to expect the user to use Coalesce on the dimension, but that isn't ideal. It would be nice if we could make this automatic.

Contributor Author

@x8lucas8x x8lucas8x Dec 12, 2016

@twheys Rollup returns totals as None/NaN, so any real null value will get merged with them, as you mentioned. Apparently there is no way to address that on the server. One workaround in Python would be to calculate the amount for null entries indirectly, by creating columns equal to the max value, which will definitely be the total, minus all the other ids/categories. That could be done either right before or right after the replace above. By the way, it's worth mentioning that the previous implementation had the same problem. It just tried to set the totals label in a different place while maintaining the keys.
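The indirect calculation described above might look roughly like the following plain-Python sketch. The function name, the `(category, value)` row shape, and the sample numbers are all hypothetical, and it relies on the same assumption as the comment: the metric is non-negative, so the largest unlabeled value is the total.

```python
def split_null_and_total(rows):
    """Separate the rollup total from the real-null amount.

    `rows` is a list of (category, value) pairs in which a category of
    None marks BOTH real null entries and the rollup total, since the
    two cannot be told apart in the query result.
    """
    named_sum = sum(value for key, value in rows if key is not None)
    unnamed = [value for key, value in rows if key is None]

    # Assumption: values are non-negative, so the total is the max.
    total = max(unnamed)

    # Whatever the named categories do not account for belongs to nulls.
    null_amount = total - named_sum
    return total, null_amount


rows = [("a", 10), ("b", 5), (None, 3), (None, 18)]
result = split_null_and_total(rows)  # (18, 3)
```

If a metric can be negative, the max is no longer guaranteed to be the total, so this approach would need a different way to identify the rollup row.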

Contributor

Right, it's not a new issue, just bringing it up

for level in dataframe.index.levels[1]:
    metric_data = dict(self._recurse_dimensions(dataframe[:, level], dimensions[1:], metrics))

    if not metric_data:

Contributor

If this check is necessary here, wouldn't it be necessary in the above loop on line 283?

Contributor Author

@twheys Good catch. I fixed that.

if value and not (isinstance(value, (float, int)) and np.isnan(value))
else 'Totals'
for value in dataframe.index]
return [display_options.get(value, value) for value in dataframe.index]

Contributor

This is much cleaner, thanks
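For readers skimming the thread, here is a minimal sketch of why the dictionary lookup is enough once the totals row carries a key instead of NaN. The `TOTALS_KEY` constant and the sample `display_options` mapping are assumptions for illustration; only the `display_options.get(value, value)` expression comes from the diff.

```python
# Hypothetical stand-in for Totals.key.
TOTALS_KEY = "_total"

# Display labels keyed by index value; the totals label is just another entry.
display_options = {"a": "Alpha", TOTALS_KEY: "Totals"}

# Index values as they might appear after NaN has been replaced at the source.
index_values = ["a", "b", TOTALS_KEY]

# Values without a display label fall back to the raw value.
labels = [display_options.get(value, value) for value in index_values]
# labels == ["Alpha", "b", "Totals"]
```

No NaN check is needed in the rendering code, because the totals row is resolved through the same mapping as every other value.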

@@ -31,7 +33,7 @@ def rollup(dataframe, levels):
'd': 'D',
'y': 'Y',
'z': 'Z',
np.nan: 'Total',
'_total': 'Total',

Contributor

could you please use the constant for this here and everywhere else in the tests?

Contributor Author

@twheys Done.

@x8lucas8x x8lucas8x force-pushed the fix-totals-for-column-indexed-tables branch from 44d5622 to b76e143 Compare December 12, 2016 15:21

coveralls commented Dec 12, 2016

Coverage decreased (-0.04%) to 97.511% when pulling b76e143 on x8lucas8x:fix-totals-for-column-indexed-tables into 96f1493 on kayak:master.

@x8lucas8x x8lucas8x force-pushed the fix-totals-for-column-indexed-tables branch from b76e143 to 909e8c4 Compare December 12, 2016 15:28

coveralls commented Dec 12, 2016

Coverage decreased (-0.04%) to 97.511% when pulling 909e8c4 on x8lucas8x:fix-totals-for-column-indexed-tables into 96f1493 on kayak:master.

for key, display in zip(*dataframe.index.levels[1:3])]
levels = zip(*dataframe.index.levels[1:3])
format_key = lambda level: level[0]
generate_dataframe = lambda level: dataframe[:, level[0], level[1]]

Contributor

Could this be called something like slice_metric_data ?

Contributor Author

@twheys Done.

    generate_dataframe = lambda level: dataframe[:, level[0], level[1]]
else:
    levels = dataframe.index.levels[1]
    format_key = lambda level: str(level)

Contributor

This raises a codacy issue. You could just set format_key = str to avoid the lambda.
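A small sketch of the suggestion: since `str` is already a callable that takes one argument, binding it directly behaves the same as the wrapping lambda. The sample `levels` values are made up for illustration.

```python
# Made-up index level values of mixed types.
levels = [1, 2.5, "a"]

# The form flagged by the linter: a lambda that only forwards to str.
format_key_lambda = lambda level: str(level)

# The suggested replacement: use the built-in directly.
format_key = str

formatted = [format_key(level) for level in levels]
formatted_with_lambda = [format_key_lambda(level) for level in levels]
# Both lists are ["1", "2.5", "a"].
```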

Contributor Author

@twheys Done.

@x8lucas8x x8lucas8x force-pushed the fix-totals-for-column-indexed-tables branch from 909e8c4 to 8465272 Compare December 14, 2016 13:10

coveralls commented Dec 14, 2016

Coverage decreased (-0.04%) to 97.52% when pulling 8465272 on x8lucas8x:fix-totals-for-column-indexed-tables into 2c3f528 on kayak:master.

Development

Successfully merging this pull request may close these issues.

Totals doesn't work for column-indexed tables
4 participants