Create a datamation for binary outcomes #98
Comments
@jhofman, I created this. Its frames are:
I think we don't need two datamations, only one, like this. What do you think? simpson.mov |
Very cool on the quick turnaround for this @giorgi-ghviniashvili! I think it's a very good start. Small details:
As for the two vs. one datamation, I see what you mean. At the same time, I think it's nice to have them separately as well so you can compare them. That's what we did w/ the salary data and I think it was effective. Can you generate each separately so we could see them side-by-side? |
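Hi @jhofman,
- switched year to be x axis and player to be y axis
- fixed typo "Betting"
- open vs filled circle, I made this work.
About the 2 datamations side by side: to make this work, I must wrap the whole app.js in a closure function (or class); otherwise it is not possible to have two instances at the same time. When init is called a second time, it overwrites the old values because they are in global scope. So this: [image] must be changed to this: [image] (notice the function App() {} declaration, which encloses the code). |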
Great on the fixes. No worries about making it actually side by side. Would
be fine to just make two separate movies, we can view them in separate
browser tabs to compare.
…On Thu, Sep 30, 2021 at 9:50 AM Giorgi Ghviniashvili < ***@***.***> wrote:
Hi @jhofman <https://github.com/jhofman>,
- switched year to be x axis and player to be y axis
- fixed typo "Betting"
- open vs filled circle, I made this work.
About the 2 datamations side by side: to make this work, I must wrap the whole
app.js in a closure function (or class); otherwise it is not possible to
have two instances at the same time. When init is called a second time, it
overwrites old values because they are in global scope.
So this:
[image: image]
<https://user-images.githubusercontent.com/6615532/135466184-abc34eb3-9cdb-45c9-90c7-33e550a73514.png>
Must be changed to this: (notice function App() {} declaration, which
encloses the code).
[image: image]
<https://user-images.githubusercontent.com/6615532/135467701-80642092-6c32-489a-8fa0-fe4a4727bd64.png>
|
I just modified it to make side-by-side. @jhofman let me know what you think. side-by-side.mov |
@jhofman, having it wrapped by a function, in a private scope, is best practice. I hope @sharlagelfand can modify htmlwidgets to support this. Otherwise I can revert that change. For now it does not create any problems, because it is in a separate branch. |
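For readers following the closure discussion, here is a minimal sketch of why wrapping state in a function allows two independent instances. It is not the project's actual app.js: `createApp`, `init`, `next`, `getState`, and the frame names are hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical, stripped-down stand-in for app.js. All names here
// (createApp, init, next, getState) are illustrative assumptions.
function createApp(containerId) {
  // State lives inside this closure, so each call gets its own copy
  // instead of sharing (and overwriting) variables in global scope.
  let frames = [];
  let currentFrame = 0;

  function init(specs) {
    frames = specs.slice(); // copy so callers cannot mutate our state
    currentFrame = 0;
  }

  function next() {
    if (currentFrame < frames.length - 1) currentFrame += 1;
    return frames[currentFrame];
  }

  function getState() {
    return { containerId, frameCount: frames.length, currentFrame };
  }

  return { init, next, getState };
}

// Two instances on the same page: initializing one leaves the other alone.
const left = createApp("app-left");
const right = createApp("app-right");
left.init(["grid", "is_hit", "mean"]);
right.init(["grid", "mean"]);
left.next();
```

With global variables, the second `init` call would have replaced the first instance's frames; here each instance keeps its own.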
Gotcha. Is there a way to get better spacing on the circles so they don't overlap? |
@jhofman good point. I reduced circle radius: |
Smaller radius looks much better. Can you update colors to be consistent as well, so that Derek Jeter's points are all orange from the first frame to the last? Also, do you think it's worth changing the orientation of the initial frames so the players are side-by-side (1 row, 2 columns) instead of stacked on top of each other (1 column, 2 rows)? That should transition more naturally to the final frame where players' names are on the x axis, right? Once those are set can you render the full animation to see what it looks like? (I'm seeing just the key frames at the moment.) |
I can't make the colors consistent on the second frame, because I am adjusting fill and stroke colors based on hit. So if hit === yes, then fill blue, otherwise #fff. I am not able to add expression like that: About placing players side by side, I chose stacked because of space. But now I did this: datamations.mov |
This is a nice update! On the colors, what can we do as a workaround for this? I think it will be generally important to have this kind of functionality. Is it a limitation due to Gemini or Vega, or something about the stack you've built on top of them? |
I could not figure it out in Vega. I can still try out some workarounds. |
Ah that's too bad. I didn't realize this would be so difficult. Would using different shapes (instead of filled vs. empty circles) be any easier? |
Yes, using different shapes will be easier. Can I try triangle and circle? |
Sure, let’s see what it looks like! If it’s not good, circle and square
could be another option.
|
Ah yes, square and circle will be ok. I will try tomorrow. |
@jhofman I easily made it with shapes: |
Hi @jhofman @giorgi-ghviniashvili! Just catching up on this thread 👋🏻 I think we need to back up a bit and consider how we can generate these visualizations within the existing datamations framework and API - right now we aren't even able to handle non-numeric (i.e. binary or categorical) response variables, even though it was technically possible to generate these visualizations via custom specs as Giorgi has done. I'm going to put together some thoughts / adjustments on how we can start to handle this, hopefully over the next day or two. Just wanted to say my eyes are back on! 👀 |
I'll have to detect on my end whether the response variable is categorical (e.g. character / factor in R) or binary (0/1 or TRUE/FALSE) and generate another info grid at the "display the response variable" step, instead of the jittered scatter plot that we're currently generating at that step (which would still be used when the variable is numeric). If we want to map shape to categorical / binary variables, then the grid generation function needs to be able to take
I was thinking too, re: the comment about the spacing on the circles, that the number of rows in the grid generation could be dynamic. Right now it is fixed to 10, but we could base it on how much data there is instead, e.g. there are 551 points for Justice and 630 for Jeter, so with only 10 rows that means about 55 and 63 columns respectively. If we base it on the number of points to try to get a "square" grid for the biggest group, e.g.
Thanks! |
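The detection step described above could look roughly like the sketch below. It is only an illustration in JavaScript (the real check would happen on the R side), and `classifyResponse` along with its return labels is a hypothetical name, not part of the datamations API.

```javascript
// Hypothetical helper: decide how the response variable should be displayed.
// Returns "binary", "numeric", "categorical", or "unknown" (no usable values).
function classifyResponse(values) {
  const present = values.filter((v) => v !== null && v !== undefined);
  if (present.length === 0) return "unknown";

  // Binary: only TRUE/FALSE, or only the numbers 0 and 1.
  const allBoolean = present.every((v) => typeof v === "boolean");
  const allZeroOne = present.every((v) => v === 0 || v === 1);
  if (allBoolean || allZeroOne) return "binary";

  // Numeric: finite numbers, which keep the jittered scatter plot.
  const allNumeric = present.every(
    (v) => typeof v === "number" && Number.isFinite(v)
  );
  if (allNumeric) return "numeric";

  // Everything else (e.g. strings) is treated as categorical.
  return "categorical";
}
```

One design consequence worth noting: a numeric column that happens to contain only 0s and 1s is treated as binary under this rule, which may or may not be what the caller wants.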
@giorgi-ghviniashvili will work on hacking things to get filled vs. empty circles, and adjust grid to fill from top to bottom and left to right. +1 to the idea of making an adaptive spacing square grid that @sharlagelfand suggested. when not a perfect square let's err on the side of more columns than rows for a "wider" grid, so wrt @sharlagelfand's question about passing shape to @sharlagelfand will work on the first paragraph, detecting whether we have a binary outcome. this could include:
and then make the updates for the subsequent |
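The adaptive grid idea (roughly square for the biggest group, erring toward more columns than rows, filled top to bottom and then left to right) can be sketched as follows. The function names `gridDimensions`, `columnsNeeded`, and `cellPosition` are hypothetical, and the rounding rule is one reasonable reading of the discussion rather than the agreed implementation.

```javascript
// Rows and columns for the largest group: floor the square root for rows,
// then take as many columns as needed, so the grid is never taller than wide.
function gridDimensions(largestGroupSize) {
  if (!Number.isInteger(largestGroupSize) || largestGroupSize < 1) {
    throw new RangeError("largestGroupSize must be a positive integer");
  }
  const rows = Math.floor(Math.sqrt(largestGroupSize));
  const cols = Math.ceil(largestGroupSize / rows);
  return { rows, cols };
}

// Columns a group of n points occupies when the row count is fixed.
function columnsNeeded(n, rows) {
  return Math.ceil(n / rows);
}

// Position of the i-th point (0-based) when each column is filled
// top to bottom before moving one column to the right.
function cellPosition(i, rows) {
  return { row: i % rows, col: Math.floor(i / rows) };
}

const jeterGrid = gridDimensions(630); // { rows: 25, cols: 26 }
const justiceCols = columnsNeeded(551, jeterGrid.rows); // 23
```

For comparison, with the current fixed 10 rows the same groups need 63 and 56 columns, which is where the crowding comes from.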
@giorgi-ghviniashvili, just for reference, here's some ggplot2 code that creates the desired player + hit/no-hit split:
it looks like vegalite doesn't have a fillOpacity legend, unclear if vega does. vega spec below to play with, doesn't behave as expected. some related issues: vega/vega-lite#4982
|
@jhofman I was playing with the batting averages Then we need to have years as column facets in previous facets as well, otherwise animation looks ugly.
ugly-datamation.mov
This is better I think:
adjustment.mov
Lmk your thoughts. |
@jhofman hit yes/no legend achieved using shape encoding and legend symbolFillColor 😎 :
hit.legend.mov
P.S. code pushed to |
there is a gist for solution spec. |
@giorgi-ghviniashvili I think this solution only works when there is a variable mapped to color: Here is your example pared down: And how it looks with just the colour mapping removed: So unfortunately I'm not sure if this solution is generalizable to specs that don't have a variable mapped to color - do you have thoughts? |
@sharlagelfand yes, good catch. It seems that vega-lite tries to give a default color (blue) to the shape encoding and ignores all legend parameters. As long as we always have color, I think this solution will work. We just need to make sure that no shape is passed when color is missing. |
I don't think we can guarantee that color will be present @giorgi-ghviniashvili so we will need to find some solution for when it is missing |
@sharlagelfand if you don't have color, then use this approach with If color, then use shape, fillOpacity, stroke with color. Will that work? The problem we had with filled vs. non-filled was color; if there was no color, then we could achieve this with fill and stroke. That was my first solution. |
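The branching rule proposed in this comment can be written down as a small decision function. This is only a sketch of the rule as stated in the thread: `hitChannels` is a hypothetical helper, it returns channel names rather than a working Vega-Lite spec, and it makes no claim about how Vega-Lite renders the result.

```javascript
// Hypothetical helper: which encoding channels carry the hit / no-hit
// distinction, depending on whether the spec already maps a variable to color.
function hitChannels(encoding) {
  const hasColor = Boolean(encoding && encoding.color);
  // With color: shape + fillOpacity + stroke (stroke follows the color).
  // Without color: plain fill + stroke, and no shape channel at all.
  return hasColor ? ["shape", "fillOpacity", "stroke"] : ["fill", "stroke"];
}

const withColor = hitChannels({ color: { field: "player", type: "nominal" } });
const withoutColor = hitChannels({ x: { field: "year", type: "nominal" } });
```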
Thanks @giorgi-ghviniashvili! Just want to share where things are at with the Simpson's Paradox example now since I have made pretty good progress. Just a note that I have sampled the data (~30%) since we cannot support that many points (#51), so the actual numbers might not look as you expect @jhofman
grouping by player only
player.only.mov
grouping by player and year
group.by.player.and.year.mov
Things are a bit off here - in the frame with is_hit, the legend for it seems to appear twice - once overlapping the colour player legend
Also, the placement of the mean and errorbar are off in the mean / errorbar frames - definitely off in the X values, but I think off in Y too when you compare the second last and last frames - and the y axis values are not even showing up! @giorgi-ghviniashvili could you please help me figure out why? thanks! |
@giorgi-ghviniashvili is going to hide the faked legend, which should take care of that problem. sounded like there was a css fix for the x axis misalignment, and something that needed to be added to the spec to fix the y-axis annotations? |
To hide a faked legend, please include this css:
.vega-vis-wrapper .vega-for-axis .role-legend {
  display: none;
}
About the second issue of being Here is how it looks fixed: fixed-errorbars.mov |
thanks @giorgi-ghviniashvili, that works great now! Just want to confirm that we will never need to see the faked legend, and that we only use real ones? so using
will never hide something that we actually need to see. Here is how the datamations look now (cc @jhofman) - I think they look pretty good!!
group by player
one.mov
There is one slight issue with timing here where the y-axis values show up at the end of the animation between the "is_hit" frame and "mean is_hit" frames, if that's something that can be fixed. There doesn't seem to be the same issue in the second animation!
group by player and year
two.mov |
And just wanted to share how categorical values looks, with shape!
categorical.mov |
@sharlagelfand yes, I confirm that the css only hides the faked legend. About the axis issue, it seems to be a gemini issue: animating from A to B, where A does not have a y axis and B does, causes it. It seems that when we have facets this issue is gone, because the axis is drawn via the faked axis layer and not the actual axis. I tested the gemini recommendations and all of them are missing the y axis. |
@giorgi-ghviniashvili it looks like there's an issue with the grid generation - e.g. this data set:
I send this spec for showing the values of is_hit
but the real spec that the JS code produces has no values with is_hit = 1, they are all 0: |
@sharlagelfand there was a small bug with the index. Fixed it: |
Thanks @giorgi-ghviniashvili, that case is fixed. It does not seem to be generalizable though - here is a small variant, where the only change is "is_hit": 1 comes before "is_hit": 0.
Here are the specs it produces - all of the values of "is_hit" are 0 when there should be 1s. |
@sharlagelfand good point, fixed. |
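A quick way to guard against this kind of regression is to check that grid layout never changes the values it places, whatever order the records arrive in. The sketch below is not the actual fix in the JS code: `layoutGrid` and `countHits` are hypothetical helpers, and the two small data sets are made up to mirror the "1 before 0" and "0 before 1" cases discussed above.

```javascript
// Hypothetical layout step: attach a grid position to each record by index,
// copying the record so that is_hit (and any other field) is preserved.
function layoutGrid(records, rows) {
  return records.map((record, i) => ({
    ...record,
    gridRow: i % rows,
    gridCol: Math.floor(i / rows),
  }));
}

function countHits(records) {
  return records.filter((r) => r.is_hit === 1).length;
}

// Same data in two orders: zeros first, then ones first (made-up values).
const zerosFirst = [{ is_hit: 0 }, { is_hit: 0 }, { is_hit: 1 }, { is_hit: 1 }, { is_hit: 1 }];
const onesFirst = [{ is_hit: 1 }, { is_hit: 1 }, { is_hit: 1 }, { is_hit: 0 }, { is_hit: 0 }];

const hitsA = countHits(layoutGrid(zerosFirst, 2));
const hitsB = countHits(layoutGrid(onesFirst, 2));
```

Both counts should equal the number of hits in the input; a layout that returned all zeros, as in the reported bug, would fail this check for either ordering.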
This is a followup to #97, which is complicated because it deals with low base rates.
So to simplify things we'll start w/ visualizing Simpson's Paradox in batting averages instead, as explained by this example on Wikipedia. This example compares two players and shows that while one has a higher batting average than the other within each year, the trend reverses if you look across both years. This happens because of the uneven number of at-bats that each player has in each year.
Below is some R code to make the final plot versions, and the task here is to brainstorm what the datamation will look like.
Right now here's what we're thinking for the overall datamation (across both years):
The datamation that breaks out each year will be similar, but the grid will be a 2-by-2 (player + year).
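To make the reversal concrete, here is a small worked calculation. The per-year hit and at-bat counts are the figures from the Wikipedia example as best recalled, so treat them as an assumption; their totals do match the 630 and 551 points mentioned in the comments. The helper names `average` and `averageFor` are illustrative.

```javascript
// Assumed figures from the Wikipedia batting-average example.
const records = [
  { player: "Derek Jeter", year: 1995, hits: 12, atBats: 48 },
  { player: "Derek Jeter", year: 1996, hits: 183, atBats: 582 },
  { player: "David Justice", year: 1995, hits: 104, atBats: 411 },
  { player: "David Justice", year: 1996, hits: 45, atBats: 140 },
];

// Batting average = total hits / total at-bats over the given rows.
function average(rows) {
  const hits = rows.reduce((sum, r) => sum + r.hits, 0);
  const atBats = rows.reduce((sum, r) => sum + r.atBats, 0);
  return hits / atBats;
}

// Average for one player, optionally restricted to a single year.
function averageFor(player, year) {
  return average(
    records.filter(
      (r) => r.player === player && (year === undefined || r.year === year)
    )
  );
}

for (const year of [1995, 1996]) {
  console.log(
    year,
    "Jeter", averageFor("Derek Jeter", year).toFixed(3),
    "Justice", averageFor("David Justice", year).toFixed(3)
  );
}
console.log(
  "overall",
  "Jeter", averageFor("Derek Jeter").toFixed(3),
  "Justice", averageFor("David Justice").toFixed(3)
);
```

Justice comes out ahead in each year taken separately, while Jeter comes out ahead overall, because most of Jeter's at-bats fall in the year when both players hit better.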