Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group sorting hierarchy not working correctly #109

Closed
katush opened this issue Nov 26, 2018 · 23 comments
Closed

Group sorting hierarchy not working correctly #109

katush opened this issue Nov 26, 2018 · 23 comments
Assignees
Labels
lineup: v4 All issues related to LineUp v4 status: help wanted Extra attention is needed type: bug Something isn't working
Projects

Comments

@katush
Copy link

katush commented Nov 26, 2018

Sorting groups by numerical values (median, mean) is not behaving correctly if the there is something above it in the hierarchy (e.g., group name/categorical attributes)

Example:

  • group by Continent
  • group by Ppl knowing they have HIV (split into 4 bins)
  • sort groups by continent
  • sort groups by Ppl knowing they have HIV (descending, median)

The group with the highest median should be first, but is actually last.
image

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

@katush, please add the information with which deployed version you have tested this. Please also try with https://lineup.js.org/app_develop/.

@katush
Copy link
Author

katush commented Nov 26, 2018

Originally found in: LineUp.js v4.0.0-alpha.7 (20181115-103904)
Also present in current develop:
group by Curren League
group by Age (split into 3 bins)
sort groups by Curren League
sort groups by Age (descending, median)
The top 3 groups are clearly not sorted by median, even though the League name is the same
image

@sgratzl
Copy link
Member

sgratzl commented Nov 26, 2018

according to the current implementation of how categorical groups are compared, it behaves correctly:

https://github.com/datavisyn/lineupjs/blob/d06c42af30281f08aa461790655a7a87c03ef06e/src/model/ICategoricalColumn.ts#L112-L134

i.e find the most frequent category, compare them, if they are the same, sort by frequency of most frequent one

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

@sgratzl, OK, but is there a reason why it needs to use the frequency only for the comparison and not use whatever the user has specific in the 'sort group by' dialog?

@sgratzl
Copy link
Member

sgratzl commented Nov 26, 2018

it does use what is in the hierarchy. However, according to this implementation, the first sorting criteria is already enough to sort the groups thus the second one has no effect

@katush
Copy link
Author

katush commented Nov 26, 2018

To be clear 'sort groups by' on categorical column means 'sort groups by frequency of categories'? I didn't know that (and I don't think it's very clear), but I get what happend now.

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

Aha, yes. I think it would be useful to let the user decide what the grouping criteria should be. Now the default for categorical categories is obviously frequency (number of items). I can think of cases where I would like to sort alphabetically by the category name.

Loosely related: For the group sorting of numerical columns the user can select the statistical measures, which is great. Sorting by frequency (number of items) would also be useful, right?
image

@sgratzl
Copy link
Member

sgratzl commented Nov 26, 2018

. Sorting by frequency (number of items) would also be useful, right?

isn't that why there is the group column for? since why is the number of items in the field specific to a numerical column?

@sgratzl
Copy link
Member

sgratzl commented Nov 26, 2018

Now the default for categorical categories is obviously frequency (number of items).

no it is the name and frequency of the most frequent category in the group. e.g. if you sort the group by a categorical attribute that is not the grouping criteria.

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

. Sorting by frequency (number of items) would also be useful, right?

isn't that why there is the group column for? since why is the number of items in the field specific to a numerical column?

Oh yes, I forgot about that one because it is not added by default anymore. What was the reason to not add it by default? Too wide?

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

Now the default for categorical categories is obviously frequency (number of items).

no it is the name and frequency of the most frequent category in the group. e.g. if you sort the group by a categorical attribute that is not the grouping criteria.

sorry, without an example I cannot follow anymore...

@katush
Copy link
Author

katush commented Nov 26, 2018

The group column however doesn't contain the option to sort by number of items right now.

@lineupjs lineupjs deleted a comment from katush Nov 26, 2018
@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

(@katush, I deleted your last comment because it was a reply to my already deleted comment).

@mstreit
Copy link
Contributor

mstreit commented Nov 26, 2018

. Sorting by frequency (number of items) would also be useful, right?

isn't that why there is the group column for? since why is the number of items in the field specific to a numerical column?

@sgratzl, how can I sort the groups by frequency? If it works, I am not aware of that feature. The 'Group Name' column sorts the groups alphabetically.

@sgratzl sgratzl added this to Icebox in lineup Dec 4, 2018
@mstreit
Copy link
Contributor

mstreit commented Jan 3, 2019

@sgratzl, can you clarify my last question? I need to know this for the paper writing.

@sgratzl
Copy link
Member

sgratzl commented Jan 4, 2019

image

@mstreit
Copy link
Contributor

mstreit commented Jan 4, 2019

OK, nice.

Btw, because it came up during paper writing:
By what are categorical attribute groups sorting if they are NOT the stratification criteria?
image

@sgratzl
Copy link
Member

sgratzl commented Jan 4, 2019

the same as it would be the grouping criteria

  • name of the most frequent category in the group
  • frequency of the most frequent category in the group

when it is the grouping criteria the most frequent category is easy since there is only one in the group

@mstreit
Copy link
Contributor

mstreit commented Jan 4, 2019

OK

@thinkh thinkh added lineup: v4 All issues related to LineUp v4 type: bug Something isn't working labels Jan 14, 2020
@sgratzl
Copy link
Member

sgratzl commented Feb 19, 2020

@thinkh please specify the desired behavior since there was no consensus in the previous discussion.

@sgratzl sgratzl added the status: help wanted Extra attention is needed label Feb 19, 2020
@sgratzl sgratzl moved this from Icebox to Discussion / Waiting in lineup Mar 20, 2020
@thinkh
Copy link
Member

thinkh commented Mar 23, 2020

Coming from issue #161, where we came to the same conclusion, I'd would perfer the implementation that @sgratzl showed in this comment.

By what are categorical attribute groups sorting if they are NOT the stratification criteria?

I know it has been a long time, but I cannot imagine a use case where this would make sense. In my opinion, group sorting requires enabled grouping in the first place. I would even suggest to hide the Sort Groups by action and only show it when the column is grouped.

@sgratzl What is your opinion? Does this clarify your questions so that you can start implementing it?

@sgratzl
Copy link
Member

sgratzl commented Mar 25, 2020

I know it has been a long time, but I cannot imagine a use case where this would make sense. In my opinion, group sorting requires enabled grouping in the first place. I would even suggest to hide the Sort Groups by action and only show it when the column is grouped.

I'm not a fan of showing/hiding a button based on the state of the current column. And an use case in general is: group by categorical, sort groups by mean of a numerical.

@sgratzl
Copy link
Member

sgratzl commented Mar 25, 2020

closing in favor of #161

@sgratzl sgratzl closed this as completed Mar 25, 2020
lineup automation moved this from Discussion / Waiting to Done Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lineup: v4 All issues related to LineUp v4 status: help wanted Extra attention is needed type: bug Something isn't working
Projects
No open projects
lineup
  
Done
Development

No branches or pull requests

4 participants