In [5]:
library(data.table)
library(caret)

dt <- fread("data/cars.csv")
dt[, Origin := as.factor(Origin)]

# stratified 80/20 train/test split (createDataPartition samples within each Origin class)
train.index <- createDataPartition(dt$Origin, p = 0.8, list = FALSE)
dt.train <- dt[train.index]
dt.test <- dt[-train.index]

# resampling setup: 10-fold cross-validation (10 is caret's default for method = "cv")
ctrl <- trainControl(method = "cv", number = 10)

# train a kNN classifier with Origin as the target and
# Weight, MPG, Cylinders and Horsepower as predictors, using dt.train only
# predictors are centered and scaled, since kNN relies on distances
# tuneGrid lists the candidate values of k; the best one is chosen by CV accuracy
model <- train(Origin ~ Weight + MPG + Cylinders + Horsepower, 
               data = dt.train, method = "knn",
               preProcess = c("center", "scale"),
               tuneGrid = data.frame(k = c(3, 6, 8, 10)),
               trControl = ctrl)
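
# A minimal sketch, assuming the fitted `model` above: caret stores the
# per-k resampling table in model$results and the selected value in
# model$bestTune, so both can be read programmatically instead of from the
# printed summary. `cv.results` and `best.k` are names introduced here for
# illustration only.
cv.results <- model$results[, c("k", "Accuracy", "Kappa")]
best.k <- model$bestTune$k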

# print the model
print(model)
print("---------")

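# predict on the test data and print the confusion matrix
# note: confusionMatrix() takes (data = predictions, reference = truth); the call
# below passes the true labels first, so in the printed table the "Prediction"
# rows are the true origins and the "Reference" columns are the model's predictions
pred.test <- predict(model, newdata = dt.test)
print(confusionMatrix(dt.test$Origin, pred.test))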
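
# A hedged cross-check, not part of the original analysis: overall accuracy
# is the share of test rows where the predicted and true labels agree, so it
# should equal the "Accuracy" value reported by confusionMatrix() regardless
# of the argument order. `acc.manual` is a hypothetical name introduced only
# for this sketch.
acc.manual <- mean(pred.test == dt.test$Origin)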

k-Nearest Neighbors 

327 samples
  4 predictor
  3 classes: 'Europe', 'Japan', 'US' 

Pre-processing: centered (4), scaled (4) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 293, 294, 295, 293, 294, 295, ... 
Resampling results across tuning parameters:

  k   Accuracy   Kappa    
   3  0.7371212  0.5086317
   6  0.7371324  0.5110265
   8  0.7311609  0.4965310
  10  0.7282197  0.4938256

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 6.
[1] "---------"
Confusion Matrix and Statistics

          Reference
Prediction Europe Japan US
    Europe      7     2  5
    Japan       0    12  3
    US          4     1 45

Overall Statistics
                                          
               Accuracy : 0.8101          
                 95% CI : (0.7062, 0.8897)
    No Information Rate : 0.6709          
    P-Value [Acc > NIR] : 0.004485        
                                          
                  