The textbook gives all instructions for R. I reproduce them in Python as closely as I can.

# Preface

From the textbook, p. 371:
> This problem involves the `OJ` data set which is part of the ISLR package.

In [1]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


sns.set()
%matplotlib inline

In [2]:
oj = pd.read_csv('https://raw.githubusercontent.com/dsnair/ISLR/master/data/csv/OJ.csv')
oj = pd.get_dummies(oj, drop_first=True)
oj.head(3)

Unnamed: 0,WeekofPurchase,StoreID,PriceCH,PriceMM,DiscCH,DiscMM,SpecialCH,SpecialMM,LoyalCH,SalePriceMM,SalePriceCH,PriceDiff,PctDiscMM,PctDiscCH,ListPriceDiff,STORE,Purchase_MM,Store7_Yes
0,237,1,1.75,1.99,0.0,0.0,0,0,0.5,1.99,1.75,0.24,0.0,0.0,0.24,1,0,0
1,239,1,1.75,1.99,0.0,0.3,0,1,0.6,1.69,1.75,-0.06,0.150754,0.0,0.24,1,0,0
2,245,1,1.86,2.09,0.17,0.0,0,0,0.68,2.09,1.69,0.4,0.0,0.091398,0.23,1,0,0


Columns:
* `Purchase` &mdash; a factor with levels CH and MM indicating whether the customer purchased Citrus Hill or Minute Maid Orange Juice;
* `WeekofPurchase` &mdash; week of purchase;
* `StoreID` &mdash; store ID;
* `PriceCH` &mdash; price charged for CH;
* `PriceMM` &mdash; price charged for MM;
* `DiscCH` &mdash; discount offered for CH;
* `DiscMM` &mdash; discount offered for MM;
* `SpecialCH` &mdash; indicator of special on CH;
* `SpecialMM` &mdash; indicator of special on MM;
* `LoyalCH` &mdash; customer brand loyalty for CH;
* `SalePriceMM` &mdash; sale price for MM;
* `SalePriceCH` &mdash; sale price for CH;
* `PriceDiff` &mdash; sale price of MM less sale price of CH;
* `Store7` &mdash; a factor with levels No and Yes indicating whether the sale is at Store 7;
* `PctDiscMM` &mdash; percentage discount for MM;
* `PctDiscCH` &mdash; percentage discount for CH;
* `ListPriceDiff` &mdash; list price of MM less list price of CH;
* `STORE` &mdash; which of 5 possible stores the sale occurred at.

# (a)

From the textbook, p. 372:
> Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

In [3]:
x = oj.drop('Purchase_MM', axis='columns')
x = StandardScaler().fit_transform(x)
y = oj.Purchase_MM
np.random.seed(2)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=800)
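
The cell above standardizes all rows before splitting, so the test rows contribute to the scaler's mean and variance. A minimal sketch of one alternative, fitting the scaler on the training rows only through a pipeline. It assumes synthetic data from `make_classification` as a stand-in for `oj`; the names `x_demo`, `y_demo`, and `pipe` are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the OJ predictors and response (assumption, not the real data).
x_demo, y_demo = make_classification(n_samples=1070, n_features=17, random_state=2)

# Split first, on the raw (unscaled) predictors.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=800, random_state=2
)

# The pipeline fits the scaler only on the data passed to fit(),
# i.e. the 800 training rows, and reuses those statistics at predict time.
pipe = make_pipeline(StandardScaler(), SVC(kernel='linear', C=0.01))
pipe.fit(x_tr, y_tr)

scaler = pipe.named_steps['standardscaler']
print(x_tr.shape, x_te.shape)
print(np.allclose(scaler.mean_, x_tr.mean(axis=0)))
```

With 800 of 1070 rows in the training set the difference in scaling statistics is likely small, so the error rates in this notebook would probably change little.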

# (b)

From the textbook, p. 372:
> Fit a support vector classifier to the training data using `cost=0.01`, with `Purchase` as the response and the other variables as predictors. Use the `summary()` function to produce summary statistics, and describe the results obtained.

In [4]:
def svm_fit_summary(x, y, **kwargs):
  svm_model = SVC(**kwargs)
  svm_model.fit(x, y)
  print(
    f'Parameters:\n'
    + f'   SVM-Type:  C-classification\n'
    + f' SVM-Kernel:  {kwargs["kernel"]}\n'
    + f'       cost:  {kwargs["C"]}\n\n'
    + f'Number of Support Vectors: {len(svm_model.support_)}\n'
    + f'{svm_model.support_}\n\n'
    + f'Number of Classes:  {len(y.unique())}\n\n'
    + f'Levels:\n' 
    + f'{y.unique()}'
  )
  return svm_model

svm_linear = svm_fit_summary(x_train, y_train, kernel='linear', C=0.01)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  linear
       cost:  0.01

Number of Support Vectors: 449
[  3   5   7   8   9  12  15  16  19  21  23  25  30  32  35  38  46  50
  53  57  65  67  73  76  81  85  89  90  91  93  97 100 102 104 106 107
 110 111 112 116 119 121 126 127 138 139 143 144 149 152 155 156 163 165
 166 168 186 187 188 189 198 204 208 212 217 219 221 222 226 227 231 238
 241 247 250 251 267 269 273 274 276 278 285 287 290 294 297 310 318 324
 325 326 328 330 331 335 338 340 349 350 362 364 367 369 371 375 376 385
 387 399 403 405 407 416 417 421 424 425 429 439 442 445 453 454 459 461
 463 464 466 474 479 483 484 489 490 491 492 493 496 497 499 503 505 511
 514 518 528 532 538 539 547 548 550 553 556 557 560 567 568 578 579 582
 583 584 589 590 595 599 603 606 607 608 611 618 620 628 633 636 637 643
 647 648 654 660 664 669 671 672 673 678 679 682 686 692 695 696 701 702
 705 706 710 712 713 719 720 722 723 727 731 733 736 745 746 754 758 759
 763 771 
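
R's `summary()` for an SVM also reports how many support vectors belong to each class. In scikit-learn the fitted `SVC` exposes this as `n_support_`. A small sketch on synthetic data (the names `x_demo`, `y_demo`, and `clf` are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data standing in for the training set (assumption).
x_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel='linear', C=0.01).fit(x_demo, y_demo)

# n_support_ holds the number of support vectors per class,
# in the same order as clf.classes_.
for label, count in zip(clf.classes_, clf.n_support_):
    print(f'class {label}: {count} support vectors')

# The per-class counts add up to the length of support_.
total = clf.n_support_.sum()
print(total == len(clf.support_))
```

The same attribute could be added to `svm_fit_summary` if the per-class breakdown is wanted in the printed summary.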

# (c)

From the textbook, p. 372:
> What are the training and test error rates?

In [5]:
print(f'Train error rate: {1 - svm_linear.score(x_train, y_train):.3f}\n'
      f'Test error rate: {1 - svm_linear.score(x_test, y_test):.3f}'
)

Train error rate: 0.179
Test error rate: 0.137


The test error rate is lower than the training error rate, which is unusual. With only 270 test observations the estimate is noisy, so this is most likely a lucky split.
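
One way to check whether a gap like this is within split-to-split noise is to repeat the split with several seeds and look at the spread of `test_err - train_err`. A rough sketch, assuming synthetic data in place of `oj` (the data, the noise level `flip_y=0.15`, and the 20 seeds are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Noisy synthetic stand-in with the same shape as the OJ predictors (assumption).
x_demo, y_demo = make_classification(
    n_samples=1070, n_features=17, flip_y=0.15, random_state=0
)
x_demo = StandardScaler().fit_transform(x_demo)

gaps = []
for seed in range(20):
    x_tr, x_te, y_tr, y_te = train_test_split(
        x_demo, y_demo, train_size=800, random_state=seed
    )
    clf = SVC(kernel='linear', C=0.01).fit(x_tr, y_tr)
    train_err = 1 - clf.score(x_tr, y_tr)
    test_err = 1 - clf.score(x_te, y_te)
    gaps.append(test_err - train_err)

gaps = np.array(gaps)
print(f'mean gap: {gaps.mean():.3f}, std: {gaps.std():.3f}')
print(f'splits with test error below training error: {(gaps < 0).sum()} of {len(gaps)}')
```

If a noticeable share of the splits show a negative gap, a single test error below the training error is not surprising.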

# (d)

From the textbook, p. 372:
> Use the `tune()` function to select an optimal cost. Consider values in the range 0.01 to 10.

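I am using `GridSearchCV` from `sklearn` in place of `tune()`. Note that `GridSearchCV` uses 5-fold cross-validation by default, while `tune()` uses 10-fold.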

In [6]:
parameters = {'C' : np.linspace(0.01, 10, 1000)}
model = SVC(kernel='linear')
cv_model_linear = GridSearchCV(model, parameters)
cv_model_linear.fit(x_train, y_train)
cv_model_linear.best_params_

{'C': 5.16}
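
`best_params_` reports only the winning value. The full cross-validation profile is stored in `cv_results_`, which shows how flat the error is across the grid. A sketch under stated assumptions: synthetic data, a coarse log-spaced grid of 7 values to keep it fast, and `cv=10` to mirror the 10-fold default of R's `tune()`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training data (assumption).
x_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

# Coarse grid over the same range as the exercise, 0.01 to 10.
grid = {'C': np.logspace(-2, 1, 7)}
search = GridSearchCV(SVC(kernel='linear'), grid, cv=10)
search.fit(x_demo, y_demo)

# Convert mean accuracy to mean error; keep the fold-to-fold standard deviation.
mean_err = 1 - search.cv_results_['mean_test_score']
std_err = search.cv_results_['std_test_score']
for c, m, s in zip(grid['C'], mean_err, std_err):
    print(f'C={c:7.3f}  cv error={m:.3f} (+/- {s:.3f})')
print(search.best_params_)
```

When neighbouring values of `C` differ by less than the fold-to-fold standard deviation, the exact optimum is not very meaningful.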

# (e)

From the textbook, p. 372:
> Compute the training and test error rates using this new value for cost.

In [7]:
print(
  f'Training error: {1 - cv_model_linear.score(x_train, y_train):.3f}\n'
  f'Test error : {1 - cv_model_linear.score(x_test, y_test):.3f}'
)

Training error: 0.176
Test error : 0.122


# (f)

From the textbook, p. 372:
> Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

The default value of gamma in sklearn's `SVC` is `gamma='scale'`, which evaluates to `1 / (n_features * X.var())`.
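
The formula can be checked by fitting one model with `gamma='scale'` and one with the value computed by hand, then comparing their decision functions. A minimal sketch on synthetic data (the names `x_demo`, `gamma_manual`, `clf_scale`, and `clf_manual` are hypothetical):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the training data (assumption).
x_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# gamma='scale' computed by hand: X.var() is the variance over all entries of X.
gamma_manual = 1.0 / (x_demo.shape[1] * x_demo.var())

clf_scale = SVC(kernel='rbf', gamma='scale').fit(x_demo, y_demo)
clf_manual = SVC(kernel='rbf', gamma=gamma_manual).fit(x_demo, y_demo)

# Both models should produce the same decision values.
same = np.allclose(
    clf_scale.decision_function(x_demo),
    clf_manual.decision_function(x_demo),
)
print(gamma_manual, same)
```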

In [8]:
# (b)
svm_rbf = svm_fit_summary(x_train, y_train, C=0.01, kernel='rbf')

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  rbf
       cost:  0.01

Number of Support Vectors: 633
[  3   5   6   7   8   9  10  12  14  15  16  19  21  23  25  27  30  32
  35  38  44  46  50  51  53  57  65  67  73  76  79  80  81  82  85  89
  90  91  93  94  97  98 100 102 104 106 107 110 111 112 114 116 118 119
 121 122 126 127 132 138 139 143 144 149 152 155 156 163 165 166 168 178
 184 186 187 188 189 197 198 204 208 211 212 213 217 219 221 222 226 227
 231 238 240 241 247 249 250 251 256 259 266 267 269 273 274 276 278 279
 285 287 289 290 294 297 298 303 310 312 318 322 324 325 326 328 329 330
 331 335 338 340 344 349 350 355 356 362 363 364 367 368 369 371 375 376
 379 381 384 385 386 387 399 403 405 407 408 412 416 417 419 421 424 425
 429 431 432 434 437 439 442 445 453 454 455 456 459 461 463 464 466 474
 479 483 484 485 486 489 490 491 492 493 496 497 499 503 505 511 514 515
 518 519 528 531 532 536 538 539 545 547 548 550 551 553 556 557 560 567
 568 569 578

In [9]:
# (c)
print(f'Train error rate: {1 - svm_rbf.score(x_train, y_train):.3f}\n'
      f'Test error rate: {1 - svm_rbf.score(x_test, y_test):.3f}'
)

Train error rate: 0.394
Test error rate: 0.378


In [10]:
# (d)
model = SVC(kernel='rbf')
cv_model_rbf = GridSearchCV(model, parameters)
cv_model_rbf.fit(x_train, y_train)
cv_model_rbf.best_params_

{'C': 0.25}

In [11]:
# (e)
print(
  f'Training error: {1 - cv_model_rbf.score(x_train, y_train):.3f}\n'
  f'Test error : {1 - cv_model_rbf.score(x_test, y_test):.3f}'
)

Training error: 0.159
Test error : 0.144


# (g)

From the textbook, p. 372:
> Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set `degree=2`.
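
For reference, sklearn's polynomial kernel is `(gamma * <x, z> + coef0) ** degree`, and `SVC` uses `coef0=0.0` by default. With `degree=2` and `coef0=0` the kernel contains only second-order terms and no linear ones, which may partly explain why it behaves differently from the linear kernel. A small sketch that checks the formula against `sklearn.metrics.pairwise.polynomial_kernel` on random data (the data and names are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(6, 3))  # six random points in three dimensions

gamma = 1.0 / (x_demo.shape[1] * x_demo.var())  # same formula as gamma='scale'
coef0 = 0.0                                     # SVC's default for coef0

# Kernel matrix computed by hand: (gamma * <x, z> + coef0) ** 2
k_manual = (gamma * x_demo @ x_demo.T + coef0) ** 2

# polynomial_kernel defaults to coef0=1, so coef0 is passed explicitly here.
k_sklearn = polynomial_kernel(x_demo, x_demo, degree=2, gamma=gamma, coef0=coef0)
print(np.allclose(k_manual, k_sklearn))
```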

In [12]:
# (b)
svm_poly = svm_fit_summary(x_train, y_train, C=0.01, kernel='poly', degree=2)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  poly
       cost:  0.01

Number of Support Vectors: 634
[  0   3   5   6   8   9  10  12  15  16  17  19  21  22  23  27  30  35
  36  38  39  44  46  47  49  51  53  54  57  58  59  65  67  68  73  76
  80  82  83  85  87  89  90  93  94  97 100 102 104 105 110 111 112 114
 118 122 126 127 132 137 138 141 143 144 149 154 155 156 161 163 165 166
 170 177 178 181 182 184 185 187 188 189 193 195 198 204 212 213 214 217
 219 221 222 227 231 236 241 247 249 250 251 253 256 273 274 276 278 279
 280 281 285 287 289 290 291 294 297 300 302 308 310 314 315 319 322 324
 329 330 331 335 338 339 340 343 344 349 350 354 359 360 361 362 367 368
 371 372 375 376 381 384 385 386 387 390 392 399 402 405 407 412 416 417
 419 424 425 429 430 432 434 437 439 445 451 453 454 455 458 459 463 466
 469 470 474 479 484 485 491 493 496 497 499 503 505 510 511 512 514 518
 519 526 527 528 529 531 532 536 538 539 543 544 545 548 550 551 552 553
 555 556 55

In [13]:
# (c)
print(f'Train error rate: {1 - svm_poly.score(x_train, y_train):.3f}\n'
      f'Test error rate: {1 - svm_poly.score(x_test, y_test):.3f}'
)

Train error rate: 0.381
Test error rate: 0.367


In [14]:
# (d)
model = SVC(kernel='poly', degree=2)
cv_model_poly = GridSearchCV(model, parameters)
cv_model_poly.fit(x_train, y_train)
cv_model_poly.best_params_

{'C': 7.33}

In [15]:
# (e)
print(
  f'Training error: {1 - cv_model_poly.score(x_train, y_train):.3f}\n'
  f'Test error : {1 - cv_model_poly.score(x_test, y_test):.3f}'
)

Training error: 0.199
Test error : 0.237


# (h)

From the textbook, p. 372:
> Overall, which approach seems to give the best results on this data?

In [18]:
# One entry per fitted model: row label, kernel, cost, fitted estimator.
models = [
  ('Linear, C=0.01', 'linear', 0.01, svm_linear),
  ('Linear, CV', 'linear', cv_model_linear.best_params_['C'], cv_model_linear),
  ('RBF, C=0.01', 'rbf', 0.01, svm_rbf),
  ('RBF, CV', 'rbf', cv_model_rbf.best_params_['C'], cv_model_rbf),
  ('Polynomial, C=0.01', 'poly', 0.01, svm_poly),
  ('Polynomial, CV', 'poly', cv_model_poly.best_params_['C'], cv_model_poly),
]

# Training error is scored on the training set, test error on the test set.
pd.DataFrame(
  [
    [kernel, c, 1 - m.score(x_train, y_train), 1 - m.score(x_test, y_test)]
    for _, kernel, c, m in models
  ],
  index=[label for label, *_ in models],
  columns=['kernel', 'C', 'train_err', 'test_err'],
).sort_values('test_err')

Unnamed: 0,kernel,C,train_err,test_err
"Linear, CV",linear,5.16,0.17625,0.122222
"Linear, C=0.01",linear,0.01,0.17875,0.137037
"RBF, CV",rbf,0.25,0.15875,0.144444
"Polynomial, CV",poly,7.33,0.19875,0.237037
"Polynomial, C=0.01",poly,0.01,0.38125,0.366667
"RBF, C=0.01",rbf,0.01,0.39375,0.377778


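The linear kernel with `C` chosen by cross-validation gives the lowest test error (0.122). The tuned radial kernel is close (0.144), and the tuned degree-2 polynomial kernel is clearly worse (0.237). Perhaps the true decision boundary between the `Purchase` classes is close to linear in the predictors.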