-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feedback on unexpected behavior in groupBy #34
Comments
Ah yes this is a bit of a strange situation. tidy has been designed to work primarily on objects and generally it is unexpected for group keys to disappear on them. For example: tidy([
{ group: 'group1', b: 2 },
{ group: 'group1', b: 3 },
{ group: 'group2', b: 4 },
],
groupBy(['group'], [
summarize({
n: n(),
}),
]),
);
// output:
[{"n": 2, "group": "group1"}, {"n": 1, "group": "group2"}] vs tidy([
{ group: 'group1', b: 2 },
{ group: 'group1', b: 3 },
{ group: 'group2', b: 4 },
],
groupBy(['group'], [
summarize({
n: n(),
}),
], { addGroupKeys: false }),
);
// output:
[{"n": 2}, {"n": 1}] Using on non-object inputs produces these strange results you're seeing, which can be circumvented (as you've discovered) by not adding group keys. I'm not sure there is a reasonable default behavior to do here. I could potentially inspect each object to see if it is an object before adding in the group keys. This may be a bit expensive (probably would need to verify it is not an array too and then what about Map or Set or something else, but maybe they don't matter as much). Might be worth it though! Another idea may be to not add group keys by default when using a function as the accessor (e.g. d => d.count === 0). Might be a good choice here. What do you think? Some alternative approaches to what you did that involve first converting to objects (for the most part): tidy(
Object.entries(states).map((d) => ({ id: d[0], ...d[1] })),
groupBy([(d) => d.count === 0], [], groupBy.entries())
);
// output:
[
[true, [{ id: '01', name: 'Alabama', count: 0, group_0: true }]],
[
false,
[
{ id: '02', name: 'Alaska', count: 1, group_0: false },
{ id: '04', name: 'Arizona', count: 2, group_0: false },
{ id: '05', name: 'Arkansas', count: 20, group_0: false },
],
],
]; tidy(
Object.entries(states).map((d) => ({ id: d[0], data: d[1] })),
groupBy(
[(d) => d.data.count === 0],
[],
groupBy.entries({ addGroupKeys: false })
)
);
// output:
[
[true, [{ id: '01', data: { name: 'Alabama', count: 0 } }]],
[
false,
[
{ id: '02', data: { name: 'Alaska', count: 1 } },
{ id: '04', data: { name: 'Arizona', count: 2 } },
{ id: '05', data: { name: 'Arkansas', count: 20 } },
],
],
]; tidy(
Object.entries(states),
groupBy(
[(d) => d[1].count === 0],
[],
groupBy.entries({
mapLeaves: (leaves) => Object.fromEntries(leaves),
})
)
);
// output:
[
[true, { '01': { name: 'Alabama', count: 0 } }],
[
false,
{
'02': { name: 'Alaska', count: 1 },
'04': { name: 'Arizona', count: 2 },
'05': { name: 'Arkansas', count: 20 },
},
],
]; tidy(
Object.entries(states),
groupBy(
[(d) => d[1].count === 0],
[],
groupBy.entries({
mapLeaves: (leaves) => Object.fromEntries(leaves),
})
)
);
// output:
[
{ '01': { name: 'Alabama', count: 0 } },
{
'02': { name: 'Alaska', count: 1 },
'04': { name: 'Arizona', count: 2 },
'05': { name: 'Arkansas', count: 20 },
},
]; |
The main problem really is that |
Thanks for the quick and thorough response! Could be just a little warning in the docs is the best solution for now such as:
That's a bit vague and can be improved. To understand the problem more generally, do you think that setting |
Interesting. That may be related to something I was seeing where the order of the keys was unexpected. I don't have a sanitized example at the moment but let me see if I can put something together. Edit: Actually, I see the default playground shows the workaround: // {T.*} - all Tidy.js functions are available directly and as T.*
// {input} - `input` variable matches mtcars or iris based on input selection
// {output} - put your output in the predefined output variable
output = tidy(input,
groupBy(['cyl', 'gear'], [
summarize({
n: n(),
mpg: mean('mpg'),
hp: mean('hp'),
wt: mean('wt'),
}),
]),
select(['cyl', 'gear', everything()]),
arrange([desc('n'), desc('mpg')])
); I thought it was surprising that the original columns |
Thanks yea at the very least I'll try to make the docs more clear. I do think setting is to false is an okay solution and it's what I've done a couple of times as proof of concept when working on flat arrays of values. Each function is completely independent, so I wouldn't expect you to have a cascade of problems... but each function is designed to work with objects. To get around this, you can often pass an accessor that is an identity function e.g. I tend to keep things as objects until I'm done wrangling them. And yes, right now the group keys are appended, which ensures they overwrite an identically named property on the object, but has the side effect of moving them to the end of the object's key list. Perhaps they should be pre-pended which would keep them at the front or in the original location if they were not removed from the object. FWIW if you're working with just simple values and not doing much wrangling, you may find it convenient enough to just use the d3-array groups or rollups commands. e.g. d3.groups(Object.entries(states), d => d[1].count === 0)
// output:
[
[true, [["01", {"name": "Alabama", "count": 0}]]],
[
false,
[
["02", {"name": "Alaska", "count": 1}],
["04", {"name": "Arizona", "count": 2}],
["05", {"name": "Arkansas", "count": 20}]
]
]
] |
oh yeah ... interesting. okay here's a thought: the concept of "adding group keys" is a little weird since the "adding" that generally happens is really just a side effect of the fact that summarize strips them away and they need to be reintroduced (as you point out @pbeshai). From a user's perspective, it looks like they never left. But accessors are different, and I think it's a good idea to treat them differently. While I would expect keys to be maintained, I would want to opt in to adding the result of an accessor as a new field. SO what if Related to the appending vs prepending question: I've wondered if something to assist with exporting for download could be a good addition. Something like: const csvData = tidy(input,
groupBy(['cyl', 'gear'], [
summarize({
n: n(),
mpg: mean('mpg'),
hp: mean('hp'),
wt: mean('wt'),
}),
]),
arrange([desc('n'), desc('mpg')]),
toCSV([
{ 'cyl': 'cylinders'},
'gear',
everything(),
{ n: 'total'}
])
); ^^ that array would both order and rename columns so you can have easy control over the output. |
Maybe this is my d3.nestbrain talking but I think the default behavior I would expect is not to add the group keys back. I feel like once I've nested the objects and started the flow, if I want to transform the leaf array into something fun and cool that loses the keys in the process, that's my business! Even when I "lose" the key by summarizing I don't really lose it, because I will typically export into my preferred shape with .entries(), .object(), etc. If I really just want an array of summary objects that keeps the keys, I do have options like adding To me adding the keys should feel more like a type of export you choose, i.e. This also avoids accessor vs. string problems, and makes it so summarize() produces the same summary object both inside and outside a groupBy flow. Also, in an example like: tidy(input,
groupBy(['cyl'], [
summarize({
n: n(),
mpg: mean('mpg'),
}),
], groupBy.object({ single: true })),
); I don't want the value objects in my dict to have redundant In a case where the group keys are going to get added back, I'm team prepend for two reasons:
|
Thanks all for your input on this. I think this is the course of action I will take: To add in now
For laterThese need more discussion and won't happen for a while, maybe never... but I'm thinking:
Noah kinda convinced me I shouldn't be adding them in by default, but it would be a big breaking change for them to disappear, and I'm not sure it will ever be worth causing that kind of havoc. |
- prepends group keys instead of appends when assigning them - doesn't assign group keys to non objects - doesn't ever assign function group keys - also doesn't break when a falsy fn is passed to a groupby flow
@mhkeller your svelte repl now... works? ;) v2.4.0 has the fixes described above. I'm going to close this issue. I'll open another one to handle discussion about switching to never adding group keys. And now you can just write: tidy(
Object.keys(states),
groupBy(d => states[d].count === 0, groupBy.entries())
); |
Hi, Thanks for making this library. I've wanted it to exist for years. I've been trying to incorporate into some of my projects and I came across something surprising today.
Here's a REPL reproduction: https://svelte.dev/repl/3d3126f8ea994d3d866427cab0642e3b?version=3.38.2
I wanted to group a list of states based on a value, in this case
count
. I was getting a really weird output, though. After poking around a bit, I discovered that I fixed the problem by setting{ addGroupKeys: false }
in the group export. The issue seems to be that because my input data is a list of strings, the default behavior of adding a key onto the element is converting it to an object.I'm not sure what the best solution is for it – maybe a change in the docs, a console warning or possibly a change in the default behavior so that it doesn't mutate the data by default. My expectation was definitely that it wouldn't mutate the original object, for what it's worth.
The text was updated successfully, but these errors were encountered: