groupBy method should allow more control over method option #202

Open
adambom opened this Issue Jun 13, 2013 · 2 comments

2 participants

@adambom

I want to be able provide a function for each of the columns I specify. Let me provide an example.

The following code doesn't work because it just does sum on the year column.

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();

I end up with a view that looks like this:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 643200 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 643200 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 643200 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 643200 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 643200 } ]

Notice how the value for year is not 2010, which is what I want.

I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year'], {
    method: {
        year: _.identity
        population: _.sum // optional, but showing here to be explicit
    }
})
.columns(['year', 'race', 'population']).toJSON();

This would return:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 2010 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 2010 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 2010 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 2010 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 2010 } ]

Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL

SELECT year, race, sum(population) 
FROM mytable 
WHERE year=2010 
GROUP BY race;
@iros
The Miso Project member
@adambom

Yes, the year is being added up. It would be nice if there was a way to specify aggregators for all the columns, so this didin't happen.

The data looks like this:

year | race     | region | population
-------------------------------------
2010 | hispanic | south  | 100
2010 | hispanic | east   | 100
2010 | hispanic | west   | 100
2011 | hispanic | east   | 200
2011 | hispanic | south  | 200
2011 | hispanic | west   | 200
2012 | hispanic | west   | 300
2012 | hispanic | east   | 300
2012 | hispanic | south  | 300
2010 | white    | south  | 200
2010 | white    | east   | 200
2010 | white    | west   | 200
2011 | white    | east   | 300
2011 | white    | south  | 300
2011 | white    | west   | 300
2012 | white    | west   | 400
2012 | white    | east   | 400
2012 | white    | south  | 400
2010 | black    | south  | 150
2010 | black    | east   | 150
2010 | black    | west   | 150
2011 | black    | east   | 160
2011 | black    | south  | 160
2011 | black    | west   | 160
2012 | black    | west   | 170
2012 | black    | east   | 170
2012 | black    | south  | 170

What I'd like to get is data that looks like this:

year | race     | sum(population)
-----------------------------
2010 | hispanic | 300
2010 | white    | 600
2010 | black    | 450
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment