kbodwin/celery

celery

The goal of celery is to provide a tidy, unified interface to clustering models. The package is closely modeled after the parsnip package.

Installation

You can install the development version of celery from GitHub with:

# install.packages("devtools")
devtools::install_github("EmilHvitfeldt/celery")

Example

The first step is to create a cluster specification. This example creates a K-means model using the stats engine.

library(celery)

kmeans_spec <- k_means(k = 3) %>%
  set_engine_celery("stats") 

kmeans_spec
#> K Means Cluster Specification (partition)
#> 
#> Main Arguments:
#>   k = 3
#> 
#> Computational engine: stats

The specification can then be fit to data.

kmeans_spec_fit <- kmeans_spec %>%
  fit(~., data = mtcars)
kmeans_spec_fit
#> celery cluster object
#> 
#> K-means clustering with 3 clusters of sizes 7, 11, 14
#> 
#> Cluster means:
#>        mpg cyl     disp        hp     drat       wt     qsec        vs
#> 1 19.74286   6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
#> 2 26.66364   4 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909
#> 3 15.10000   8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
#>          am     gear     carb
#> 1 0.4285714 3.857143 3.428571
#> 2 0.7272727 4.090909 1.545455
#> 3 0.1428571 3.285714 3.500000
#> 
#> Clustering vector:
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>                   1                   1                   2                   1 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>                   3                   1                   3                   2 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>                   2                   1                   1                   3 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>                   3                   3                   3                   3 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>                   3                   2                   2                   2 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   2                   3                   3                   3 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>                   3                   2                   2                   2 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>                   3                   1                   3                   2 
#> 
#> Within cluster sum of squares by cluster:
#> [1] 13954.34 11848.37 93643.90
#>  (between_SS / total_SS =  80.8 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
#> [6] "betweenss"    "size"         "iter"         "ifault"
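The printed fit above has the same layout as the output of base R's `stats::kmeans()`, which suggests that the stats engine wraps that function. Under that assumption, the following sketch calls `kmeans()` directly (it does not use the celery API) to show how the `between_SS / total_SS` figure follows from the listed components. K-means starts from random centers, so cluster numbering and exact values can differ from the output shown above.

```r
# Assumption: the "stats" engine delegates to stats::kmeans(); this sketch
# calls kmeans() directly and does not use the celery API.
set.seed(1234)  # k-means starts from random centers, so fix the seed

km <- kmeans(mtcars, centers = 3)

# The total sum of squares splits into a within-cluster part and a
# between-cluster part
all.equal(km$totss, km$tot.withinss + km$betweenss)

# Share of the total variation explained by the clustering,
# the quantity printed as "between_SS / total_SS"
explained <- km$betweenss / km$totss
round(100 * explained, 1)
```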

Once you have a fitted celery object, there are several things you can do with it. predict() returns the cluster that each new observation belongs to:

predict(kmeans_spec_fit, mtcars[1:4, ])
#> # A tibble: 4 × 1
#>   .pred_cluster
#>   <fct>        
#> 1 1            
#> 2 1            
#> 3 2            
#> 4 1
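For intuition, a K-means prediction can be thought of as a nearest-centroid lookup. The sketch below is written in base R and assumes, without confirmation from the package source, that predict() assigns each row to the centroid with the smallest Euclidean distance. The names `new_obs`, `sq_dist`, and `nearest` are illustrative only.

```r
# Assumption: prediction means "pick the nearest centroid by Euclidean
# distance". This is plain base R, not the celery implementation.
set.seed(1234)
km <- kmeans(mtcars, centers = 3)

new_obs <- as.matrix(mtcars[1:4, ])

# Squared Euclidean distance from every new row to every centroid
# (rows = observations, columns = clusters)
sq_dist <- sapply(seq_len(nrow(km$centers)), function(j) {
  rowSums(sweep(new_obs, 2, km$centers[j, ])^2)
})

# Index of the closest centroid for each row
nearest <- apply(sq_dist, 1, which.min)
nearest
```

Because these four rows were also part of the training data, `nearest` should agree with `km$cluster[1:4]` once the algorithm has converged.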

extract_cluster_assignment() returns the cluster assignments of the training observations:

extract_cluster_assignment(kmeans_spec_fit)
#> # A tibble: 32 × 1
#>    .cluster
#>    <fct>   
#>  1 1       
#>  2 1       
#>  3 2       
#>  4 1       
#>  5 3       
#>  6 1       
#>  7 3       
#>  8 2       
#>  9 2       
#> 10 1       
#> # … with 22 more rows

extract_clusters() returns the locations of the clusters (their centroids):

extract_clusters(kmeans_spec_fit)
#> # A tibble: 3 × 12
#>   .cluster   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1         19.7     6  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
#> 2 2         26.7     4  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
#> 3 3         15.1     8  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5
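For K-means, the cluster locations are the per-cluster column means of the training data, which is why the table above matches the "Cluster means" in the printed fit. The following base R sketch checks this relationship, again assuming that the stats engine delegates to `stats::kmeans()`; it does not call any celery function.

```r
# Assumption: the "stats" engine delegates to stats::kmeans(); base R only.
set.seed(1234)
km <- kmeans(mtcars, centers = 3)

# Average every column within each cluster
group_means <- aggregate(mtcars, by = list(.cluster = km$cluster), FUN = mean)

# Compare with the centers stored in the fit (drop the grouping column)
all.equal(as.matrix(group_means[, -1]), km$centers, check.attributes = FALSE)

# Cluster sizes are the counts of the training assignments
table(km$cluster)
```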

License

MIT (see LICENSE.md and LICENSE)
