Aggregation on annotated field fails if queryset ordered by other values + set default value #5

PJCampi · 2018-04-29T13:59:25Z

Hey Marts,

First of all thanks for the package, it simple but very useful :)

I have a couple of comments/questions:

1. Aggregation on annotated field
I have noticed that you do not reorder the QuerySet before aggregation, which causes problems when aggregating on an annotated field, if it is already ordered by another field.

For example, consider the model:

class Blog(models.Model):
    date = models.DateField()
    category = models.CharField(max_length=50)
    content = models.TextField(default='')
    
    class Meta:
        order_by = ['date']

If I pivot on an annotated field like:

qs = Blog.objects.annotate(period=Trunc('date', 'year'))
pivot(qs, 'period', 'category', 'content', Count)

Aggregation does not work because the 'date' field must be part of the query's group by for the ordering to work.

Reordering fixes the problem, so this will work:

qs = Blog.objects.annotate(period=Trunc('date', 'year')).order_by('period')
pivot(qs, 'period', 'category', 'content', Count)

I think it would make sense to implement this in the package.

2. Default value
The default column value is None atm. Why not using 0? Alternatively could one set the default value at module level? Something like:

def _get_annotations(column, column_values, data, aggregation):
    value = data if hasattr(data, 'resolve_expression') else F(data)
    return {
        display_value: aggregation(Case(When(Q(**{column: column_value}), then=value), default=DEFAULT_VALUE))
        for column_value, display_value in column_values
    }

3. Cumulative aggregation
It would be fairly easy to allow for cumulative aggregation as well, which can be useful. It can be done using the window function (+distinct):

window = Window(aggregator(data), order_by=row)

It does not work on sqlite though...
I'd be happy to have a look at it if you think it could be useful.

The text was updated successfully, but these errors were encountered:

martsberger · 2018-05-01T15:30:13Z

Thanks for your feedback, these are good comments!

#1 Aggregating annotated field and order_by

If it works to remove the ordering altogether, I could implement that, .e.g., if

qs = Blog.objects.annotate(period=Trunc('date', 'year')).order_by()
pivot(qs, 'period', 'category', 'content', Count)

fixes the issue, I could make the pivot function remove the ordering. I think this should work as the result of pivoting should be independent of order_by.

If a specific ordering is required in this case it's a bit trickier, because adding an ordering that isn't necessary in other cases would degrade performance in those cases.

#2 Default value
There is a difference between None and 0 that is meaningful. If I find a bunch of records to sum up and the sum is zero, I should return the value 0. If I look for records to sum up and find no records at all, this is different. If I report 0 in the second case I'm indicating I found records that sum to zero, while if I return None I'm indicating no records were found.

Granted, in some cases people prefer to have the number 0 even in the second case I described above. There are two available ways to achieve this

Convert None to 0 after you get the result from pivot
Use an aggregation that converts Null to 0 in the database (this might not be the greatest implementation):

from django.db.models import Count
class CountDefaultZero(Count):
    template = 'COALESCE({}, 0)'.format(Count.template)

pivot(qs, 'period', 'category', 'content', CountDefaultZero)

Possibly allowing a default passed into the pivot function would be nicer:

pivot(qs, 'period', 'category', 'content', Count, default=0)

or maybe it unnecessarily clutters the api. I'll give it some thought.

#3 Cumulative aggregation
Does MySQL support the window function or just postgres? I'm okay with having db specific extensions, just need to think about the right way to do it. Is it possible to pass in an aggregation that does this?

If you have an implementation in mind, I'd be interested in seeing it.

PJCampi · 2018-05-02T05:51:03Z

#1: order_by() works and it is better than using an arbitrary ordering.

#2: that's what I figured, and that's what Excel pivot does by default too (I think).

It's a valid argument for Sum, less so for e.g. Count. In practice, like in your shirt sales example, if a shirt of a particular style has not been sold, it will be represented with no row present in an inventory table. The total income for that style should ideally return 0eur.

3# I think that most SQL languages support the OVER (PARTITION BY x ORDER BY y) clause which is used by the Window function (see here for MySql). I have not tested it though, I use Postgres.

You cannot pass it as an aggregation because you need to call distinct() afterwards.
Basically instead of:

qs.annotate(**{column: aggregator(data)})

you need to call:

qs.annotate(**{column: Window(aggregator(data), order_by: F(row).asc())}).distinct()

It also requires some kind of support for passing ordering on the row field. Let me know if you want something more concrete.

martsberger · 2019-03-15T17:35:18Z

Opened #9 to address point 2 above.

martsberger · 2023-09-26T16:51:38Z

I opened #31 for cumulative aggregation. With that, I'm going to close this issue as all the other parts have been resolved.

martsberger mentioned this issue Mar 15, 2019

Allow default values if no records are found #9

Closed

This was referenced Sep 25, 2023

Queryset and pivot table ordering #28

Closed

Ordered querysets #29

Merged

Cumulative aggregation #31

Open

martsberger closed this as completed Sep 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregation on annotated field fails if queryset ordered by other values + set default value #5

Aggregation on annotated field fails if queryset ordered by other values + set default value #5

PJCampi commented Apr 29, 2018

martsberger commented May 1, 2018

PJCampi commented May 2, 2018

martsberger commented Mar 15, 2019

martsberger commented Sep 26, 2023

Aggregation on annotated field fails if queryset ordered by other values + set default value #5

Aggregation on annotated field fails if queryset ordered by other values + set default value #5

Comments

PJCampi commented Apr 29, 2018

martsberger commented May 1, 2018

PJCampi commented May 2, 2018

martsberger commented Mar 15, 2019

martsberger commented Sep 26, 2023