mapclassify.Natural_Break() does not return the specified k classes #16

Closed
mingchau opened this issue Oct 25, 2018 · 1 comment

mingchau commented Oct 25, 2018

Hi,

I use mapclassify.Natural_Breaks() to produce bins for my MapBox heatmap.

My code looks like this:
import pandas as pd
import mapclassify
df = pd.read_table('./files_output/customer_qty.txt', sep=',', header=None)
mapclassify.Natural_Breaks(df.iloc[:, 1], k=5)

I expected it to return 5 classes, but it returned only 3.
The output is:

               Natural_Breaks

 Lower             Upper              Count
=============================================
           x[i] <=   1.000            54428
   1.000 < x[i] <=  26.000             2475
  26.000 < x[i] <= 212.000               66

The customer_qty.txt data file is attached:
customer_qty.txt

Member

weikang9009 commented Oct 25, 2018

Thank you for opening the issue.

I think the problem lies in the kmeans function from scipy, which mapclassify.Natural_Breaks uses to cluster the input data. It is related to two questions (1 and 2) posted on Stack Overflow. K-means can fail in the sense that a cluster disappears when no data points are assigned to its center during the iterations. A smarter initial choice of cluster centers therefore matters, and one such initialization is implemented in sklearn (init='k-means++'). I think it makes sense to switch from scipy to sklearn so that the number of classes returned matches the number specified in the input. @sjsrey @ljwolf ?
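
As a rough illustration of the sklearn route, here is a minimal sketch that derives class upper bounds from `sklearn.cluster.KMeans` with `init="k-means++"`. The helper name `kmeans_breaks` and the synthetic data are assumptions made for this example: the data is only a skewed stand-in for customer_qty.txt, and this is not the code that mapclassify ships.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_breaks(values, k=5, seed=0):
    """Hypothetical helper: upper bound of each class found by KMeans (k-means++ init)."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)  # KMeans expects a 2-D array
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    labels = model.fit_predict(x)
    # The upper bound of a class is the largest value assigned to it.
    uppers = sorted(x[labels == c].max() for c in np.unique(labels))
    return np.array(uppers)


rng = np.random.default_rng(0)
# Synthetic, heavily skewed counts (an assumption, not the attached data file).
values = np.concatenate([
    np.ones(5000),
    rng.integers(2, 30, 300),
    rng.integers(30, 213, 20),
])

breaks = kmeans_breaks(values, k=5)
print(len(breaks), breaks)
```

On data like this the sketch should yield one upper bound per requested class, but it still needs at least k distinct values in the input; with fewer, sklearn warns and finds fewer distinct clusters.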

sjsrey added a commit to sjsrey/mapclassify that referenced this issue Oct 28, 2018
sjsrey added a commit to sjsrey/mapclassify that referenced this issue Oct 29, 2018
sjsrey added a commit to sjsrey/mapclassify that referenced this issue Oct 29, 2018
sjsrey added a commit to sjsrey/mapclassify that referenced this issue Oct 29, 2018
sjsrey added a commit to sjsrey/mapclassify that referenced this issue Apr 27, 2019
@sjsrey sjsrey mentioned this issue Apr 27, 2019
weikang9009 added a commit that referenced this issue May 13, 2019