K-Means Clustering chapters: zipcode library retired #3

@enzedonline

The zipcode library used in the exercises was retired a while back, which makes the example tricky to follow.

I managed to get most of it loaded using zipcodeR and amending the code as follows:

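# zipcodeR stands in for the retired zipcode package
install.packages("zipcodeR")
library(zipcodeR)
# load all New York ZIP codes
zipcode <- search_state('NY')
# upper-case copy of the city name
zipcode$city2 <- toupper(zipcode$major_city)
# left join: keep every row of ds, matched or not
ds <- merge(ds, zipcode, by.x = "Zip.Code", by.y = "zipcode", all.x = TRUE)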

Additionally, if you want to use kilometres instead of miles, you can adjust the code as follows:

library(geosphere)  # provides distVincentyEllipsoid
kilometres <- merge(data.cl, centers, by.x = "clust", by.y = "clust")
# preallocate the distance vector instead of growing it inside the loop
kms <- numeric(nrow(kilometres))
# for each row in the kilometres table, calculate the distance in km from the point to the node centre
for (i in seq_len(nrow(kilometres))) {
  # geosphere expects c(longitude, latitude) and returns metres, hence the division by 1000
  kms[i] <- round(as.numeric(distVincentyEllipsoid(c(kilometres$x.x[i], kilometres$y.x[i]),
                                                   c(kilometres$x.y[i], kilometres$y.y[i])) / 1000), 0)
}
# push the distance data into the kilometres data frame
kilometres$kilometres <- kms

# calculate max distance and total distance for 2 node model
mx.dist2 <- max(kilometres$kilometres, na.rm = TRUE)
tot.kms2 <- sum(kilometres$kilometres, na.rm = TRUE)

Something curious/spurious with the k-means results: both total distance and max distance peaked at 6 nodes, so something seems to be up with the algorithm there ...
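One possible explanation, offered as a guess rather than a diagnosis: `kmeans()` starts from randomly chosen centres, and with the default `nstart = 1` a single run can settle in a poor local optimum, so a fit with more nodes can come out worse than one with fewer. The sketch below uses made-up points (not the chapter's data) and base R only. It compares a single random start against 25 restarts per value of k. Note that `tot.withinss` is a squared Euclidean measure, not the geodesic total used above, so it is only a proxy for the same effect.

```r
# Synthetic stand-in for the longitude/latitude points (assumption: not the chapter's data)
set.seed(42)
pts <- cbind(x = runif(300, -79, -72), y = runif(300, 40.5, 45))

# One random start per k (the default): results depend on the luck of the draw
single <- sapply(1:10, function(k) {
  kmeans(pts, centers = k, iter.max = 50)$tot.withinss
})

# 25 random starts per k: kmeans() keeps the best of them
multi <- sapply(1:10, function(k) {
  kmeans(pts, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# Compare the two curves side by side; bumps in `single` that vanish in `multi`
# point to bad initialisations rather than a real feature of the data
print(data.frame(clusters = 1:10, single = round(single, 1), multi = round(multi, 1)))
```

If the peak at 6 nodes disappears after adding `nstart = 25` (and a fixed `set.seed()`) to the chapter's `kmeans()` calls, the random initialisation was the likely culprit.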

totals <- c(tot.kms1, tot.kms2, tot.kms3, tot.kms4, tot.kms5, tot.kms6, tot.kms7, tot.kms8, tot.kms9, tot.kms10)
max.kms <- c(mx.dist1, mx.dist2, mx.dist3, mx.dist4, mx.dist5, mx.dist6, mx.dist7, mx.dist8, mx.dist9, mx.dist10)
df.analysis <- data.frame(clusters = 1:10, totals, max.kms)
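As an alternative to keeping ten hand-named pairs of variables, the same table could be built in a loop. This is only a sketch under stated assumptions: the points are synthetic, `haversine_km` and `summarise_k` are hypothetical helper names, and the haversine formula (spherical Earth, base R) stands in for `distVincentyEllipsoid`, so the distances will differ slightly from the ellipsoidal ones.

```r
# Hypothetical helper: great-circle distance in km (haversine, spherical Earth).
# A simpler stand-in for geosphere::distVincentyEllipsoid.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Synthetic points standing in for the geocoded data (x = longitude, y = latitude)
set.seed(1)
pts <- data.frame(x = runif(200, -79, -72), y = runif(200, 40.5, 45))

# Hypothetical helper: fit k clusters, then return the total and the maximum
# distance from each point to its assigned cluster centre
summarise_k <- function(k) {
  fit <- kmeans(pts, centers = k, nstart = 25, iter.max = 50)
  ctr <- fit$centers[fit$cluster, , drop = FALSE]  # one centre row per point
  d <- haversine_km(pts$x, pts$y, ctr[, "x"], ctr[, "y"])
  c(totals = sum(d), max.kms = max(d))
}

# One row per cluster count, same column names as the hand-built data frame
res <- t(sapply(1:10, summarise_k))
df.analysis <- data.frame(clusters = 1:10, res)
print(df.analysis)
```

The resulting `df.analysis` has the same `clusters`, `totals` and `max.kms` columns as the hand-built version, so the plotting code works on it unchanged.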

library(ggplot2)
ggplot(df.analysis) +
  # bars: total distance in thousands of km (left axis)
  geom_col(mapping = aes(x = clusters, y = totals / 1000), fill = "black") +
  # line: max distance, scaled by 10 to share the left axis (linewidth needs ggplot2 >= 3.4.0)
  geom_line(mapping = aes(x = clusters, y = max.kms * 10), linewidth = 2, color = "blue") +
  scale_x_continuous(breaks = scales::breaks_width(1)) +
  # right axis undoes the x10 scaling so it reads in km
  scale_y_continuous(name = "Total distance ('000 km)",
                     sec.axis = sec_axis(~ . / 10, name = "Max distance (km)")) +
  theme(
    axis.title.y = element_text(color = "black"),
    axis.title.y.right = element_text(color = "blue"))

[Image: plot of total distance (bars) and max distance (line) against number of clusters]
