Weighted distribution across multiple variables #4

soooh · 2017-04-13T15:49:05Z

Could be a useful addition to your library. As an example, I'm interested in getting stats on race and gender in a group over time. Something like:

data_by_year = data.groupby(['year'])
race_gender_demographics = calc.distribution(data_by_year, ['race', 'gender']).round(3)

jsvine · 2017-04-13T16:02:04Z

Hi @soooh! You should actually be able to do this with the current code. It'll depend on what, exactly, you're looking to calculate. But let's say you're looking for the weighted distribution of race, by gender and over time. In that case, this should work:

grouped = data.groupby([ "year", "gender" ])
dist = calc.distribution(grouped, "race").round(3)

Does that work? Are you aiming for something slightly different?

soooh · 2017-04-13T16:09:11Z

Ah, so what that does is give me the racial demographics of women and men separately. E.g., of all the women, 20% are white, 10% are black, and so on. What I want is something like, 10% of the group is white men, 8% white women, etc. Does that make sense?
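To make the distinction concrete, here is a minimal sketch in plain Python. The records and weights are made up for illustration, and `weighted_shares` is a hypothetical helper, not part of the library. It contrasts the conditional distribution (race among women only) with the joint distribution (each race and gender pair as a share of the whole group).

```python
from collections import defaultdict

# Hypothetical toy records: (race, gender, weight). Not real data.
records = [
    ("white", "man", 2.0),
    ("white", "woman", 1.0),
    ("black", "man", 1.0),
    ("black", "woman", 4.0),
    ("asian", "woman", 2.0),
]


def weighted_shares(rows, key):
    """Return each key's share of the total weight in `rows`."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    grand_total = sum(totals.values())
    return {k: v / grand_total for k, v in totals.items()}


# Conditional: race shares computed within women only.
women = [r for r in records if r[1] == "woman"]
race_among_women = weighted_shares(women, key=lambda r: r[0])

# Joint: each (race, gender) pair as a share of the whole group.
joint = weighted_shares(records, key=lambda r: (r[0], r[1]))

print(race_among_women)
print(joint)
```

The conditional shares sum to 1 within each gender, while the joint shares sum to 1 across every race and gender combination, which is the result being asked for here.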

jsvine · 2017-04-13T16:18:11Z

> Ah, so what that does is give me the racial demographics of women and men separately. E.g., of all the women, 20% are white, 10% are black, and so on.

Yep.

> What I want is something like, 10% of the group is white men, 8% white women, etc. Does that make sense?

Ah, sounds like I misunderstood the goal. In that case, the easiest way might be like so:

data["race_x_gender"] = data[[ "race", "gender" ]].apply(" x ".join, axis=1)
dist = calc.distribution(data.groupby("year"), "race_x_gender").round(3)

Does that achieve your goal? (It assumes that race and gender are strings.)
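For anyone who wants to sanity-check the result, here is a rough plain-pandas sketch of what the combined-column approach should produce. It is not the library's implementation. The toy data and the `weight` column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column name "weight" is an assumption.
data = pd.DataFrame({
    "year":   [2016, 2016, 2016, 2017, 2017],
    "race":   ["white", "white", "black", "white", "black"],
    "gender": ["man", "woman", "woman", "man", "man"],
    "weight": [1.0, 1.0, 2.0, 3.0, 1.0],
})

# Same combined-column idea; astype(str) guards against non-string values.
data["race_x_gender"] = (
    data["race"].astype(str) + " x " + data["gender"].astype(str)
)

# Sum the weights for each (year, category) pair,
# then divide by the total weight of that year.
weight_sums = data.groupby(["year", "race_x_gender"])["weight"].sum()
year_totals = weight_sums.groupby(level="year").transform("sum")

# One row per year, one column per race/gender combination.
joint_dist = (weight_sums / year_totals).unstack(fill_value=0).round(3)
print(joint_dist)
```

Each row of `joint_dist` sums to 1 (up to rounding), so every cell reads as "this share of the year's weighted group is in this race and gender combination."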

I'll also think about ways I could incorporate a generic feature like this into the library itself. Thanks for the suggestion!

soooh · 2017-04-13T16:20:18Z

Ah yes, that is actually what I am doing! 😄
I thought it could be a useful feature, though, which is why I suggested it.
