Inefficient Gini Coefficient calculation? #855

Closed
oliviaguest opened this Issue Jul 29, 2016 · 3 comments

@oliviaguest

When I run:

gini_coef = pysal.inequality.gini.Gini(a).g

where a is any large NumPy array, e.g., one with 300000 elements, it gets stuck trying to run this line of code (see here), irreparably crashing my whole PC:

d = np.abs(np.array([x - xi for xi in x]))

This does not happen with my own code, which avoids list comprehensions and instead calculates Gini like this.
I wrote that just before I discovered you had a Gini function too, so I do not fully understand your code, although I am willing to help. The two functions produce very similar output (they agree to about 6 decimal places).

Anyway, I thought I would mention that it might be worth considering using NumPy to calculate the differences.
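
For illustration, here is a minimal sketch of the two approaches being contrasted. The function names gini_pairwise and gini_sorted are hypothetical, and neither is PySAL's code nor the code linked above. The pairwise version builds the full n-by-n difference matrix, which for 300000 elements would hold 9 × 10^10 values (roughly 720 GB as float64); the sorted version uses a standard rank-weighted sum over the sorted values and needs only O(n) memory.

```python
import numpy as np


def gini_pairwise(x):
    """Gini from the full n-by-n matrix of absolute differences (O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = np.abs(x[:, None] - x[None, :])  # broadcasting replaces the list comprehension
    return d.sum() / (2.0 * n * n * x.mean())


def gini_sorted(x):
    """Gini from the sorted values (O(n log n) time, O(n) memory)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return ((2 * i - n - 1) * x).sum() / (n * x.sum())


rng = np.random.default_rng(0)

# On a small array both functions can run, and they should agree closely.
small = rng.random(500)
print(np.isclose(gini_pairwise(small), gini_sorted(small)))

# Only the sorted version is practical at the size mentioned in this issue.
big = rng.random(300_000)
print(gini_sorted(big))
```

Note that broadcasting alone removes the list comprehension but not the quadratic memory cost; it is the sorted formulation that avoids the n-by-n array.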

@sjsrey sjsrey self-assigned this Jul 29, 2016

@sjsrey

sjsrey Jul 30, 2016

Member

Thanks for raising this issue. I am just about to leave for a week of travel and will definitely be looking into this. At first glance, the efficient approach is based on a ranking of the values, which assumes no ties. I originally ruled that out because we had to handle ties (in an upstream application) and also needed to keep track of the geographical position of each observation. So the memory-inefficient implementation is what we used.

That said, once I have a little time after vacation, I think it might be possible to refactor things with an eye towards a more efficient approach (as you point to) that also addresses our upstream needs.
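
As a rough sketch of how the two requirements mentioned here (ties and observation positions) might be checked against a sorted-value approach: the data below are made up, the variable names are hypothetical, and this is not the implementation that was eventually merged. It uses np.argsort so that the original index of each sorted value stays available, and compares the result with the pairwise definition on data containing ties.

```python
import numpy as np

# Made-up values containing ties (two 1.0s and two 3.0s).
x = np.array([3.0, 1.0, 3.0, 2.0, 1.0, 5.0])
n = x.size

# argsort returns the original positions in sorted order, so each sorted
# value can still be traced back to its observation; "stable" keeps tied
# values in their original relative order.
order = np.argsort(x, kind="stable")
xs = x[order]

# Rank-weighted sum over the sorted values.
i = np.arange(1, n + 1)
g_sorted = ((2 * i - n - 1) * xs).sum() / (n * xs.sum())

# Pairwise definition, for comparison on this small array.
g_pairwise = np.abs(x[:, None] - x[None, :]).sum() / (2.0 * n * n * x.mean())

print(order)
print(g_sorted, g_pairwise)
```

In this form tied values contribute zero to the pairwise differences, so their relative order in the sort does not change the result; whether that covers the upstream use case is a separate question.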

@oliviaguest

oliviaguest Jul 30, 2016

Have a nice time during your travels! I'm curious to see how you solve it.
😄

@sjsrey sjsrey referenced this issue Aug 31, 2016

Merged

Memory efficient Gini and tests #862

0 of 1 task complete
@sjsrey

sjsrey Sep 21, 2016

Member

Closed with #862

@sjsrey sjsrey closed this Sep 21, 2016
