Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial table flattening #4

Merged
merged 2 commits into from
Dec 20, 2016
Merged

Add initial table flattening #4

merged 2 commits into from
Dec 20, 2016

Conversation

snikch
Copy link
Owner

@snikch snikch commented Dec 19, 2016

This seems to be working ok, however has highlighted the lack of sorting in our initial result set implementation. Being a map, it's unable to retain ordering. This probably isn't an issue, as what we could be doing now is working out how we want to be able to sort and filter data (sort by mean, 10 results only) sorta thing.

Not sure where that would live as given a "table" you may want to sort on a total (i.e. the row). This is easy to do for max etc. however average wouldn't be possible without valueCount for example.

May need a think, but we can merge this in the meantime, just the tests are flaky because the input is non deterministic (maps don't order).

}

// lookupKeyDelimiter is used to flatten a string array to a single key.
const lookupKeyDelimiter = "🔑🗝😡🗝🔑"
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may actually be a really badly performant way of splitting this, but I wanted a delimiter that had the least chance of collision with row / column string values as possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! 😂

Copy link
Collaborator

@mattevans mattevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good man. Plugged it in and had a play 👍

A couple of questions:

  • Is there a specific reason why the deepest aggregation becomes the column titles? Lance would prefer the first aggregation to be column titles, and the rest row titles. This prob makes more sense in conjunction with the point below. Playing around locally, I've switched them around and doesn't seem to have an impact - but thought I'd check.

  • We need the ability to group the rows by headers within the aggregation (see example screeny below). In your opinion, should this be done on the backend, or should the FE handle grouping it as it needs? As an example, I've take some real data and bunged it into a table 'as is'. Then I've also replicated it in the group format that's preferred (all with some hacky JS 😂 ).

screen shot 2016-12-20 at 13 46 05

}

// lookupKeyDelimiter is used to flatten a string array to a single key.
const lookupKeyDelimiter = "🔑🗝😡🗝🔑"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! 😂

@snikch
Copy link
Owner Author

snikch commented Dec 20, 2016

With the examples you have, I don't think there's even a column aggregation to be honest - it's just rows right? An example with columns would be to then add a time component, by month for example, and I'm pretty sure you'd have the time as the deepest aggregate on the query but I may be wrong. Your example would be three different metrics for each cell, one for sum of each of the three different totals.

This for example would be a single metric report with a column aggregated by month.

Planned Jan Feb March April
Operations, Spaces 1 7 5 4
Operations, IT 7 5 3 8
People & Culture, Talent 8 4 2 1

I think that column vs row is literally irrelevant as you can swap them can't you? Trying to figure that one out in my head and I can't see a reason not to. Anyway, easy enough to go either way, but figured you'd generally be bucketing on the x axis, then narrowing on the y. Does that make sense?

As for the headers, I think that's best left to the front end, it's a view issue rather than a data issue. It will however mean we need to lock on to a sorting process to ensure that the grouping doesn't end up out of place.

@snikch
Copy link
Owner Author

snikch commented Dec 20, 2016

Actually I can see I'm wrong now, the status is the column and it's just a single metric. Ignore that part of my comments ;)

So you're saying Lance wants to aggregate by "status" first, then by "Department" then by "Team" (or whatever those are)?

@mattevans
Copy link
Collaborator

mattevans commented Dec 20, 2016

So you're saying Lance wants to aggregate by "status" first, then by "Department" then by "Team" (or whatever those are)?

Yeah, bang on, in the example used, he would want to aggregate by status first, then dept, then team. Essentially, follow the order they come through in the request payload.

Here’s an example request payload from the FE:

{
    "query_type": "job",
    "aggregation":{
        "type":"string",
        "field": "job_status.name",
        "aggregation":{
        	 "type":"string",
        	 "field": "department.name",
        	 "aggregation":{
        	 	"type":"string",
        	 	"field": "team.name"
        	 }
        }
    },
    "metrics":[{
        "type": "count",
        "field": "job.id"
    },{
        "type": "sum",
        "field": "job.weekly_hours"
   }]
}

@snikch
Copy link
Owner Author

snikch commented Dec 20, 2016

Ok, so my gut reaction is that that is wrong and the initial implementation is correct. Row first is pretty much everywhere, from matrix ordering, array ordering etc. in computer science. It would also mean we're being little endian over big endian which feels wrong.

How strongly does Lance feel about it? Feel free to bring him into the convo here.

@mattevans
Copy link
Collaborator

Cool. As discussed in slack, let's leave this as is!
It's working well and I'll chat with Lance around keeping order of aggregations (keeping row first).

@mattevans mattevans merged commit 3cedcf3 into master Dec 20, 2016
@mattevans mattevans deleted the result-tables branch December 20, 2016 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants