Annie Chaehong Lee edited this page Apr 2, 2018 · 9 revisions

Train logistic regression models on the red and white wine datasets and evaluate their predictions using 10-fold cross-validation.

Included files

  • MultinomialLogRegression.py: Multinomial Logistic Regression
    -> Multiclass classification using the built-in function with newton-cg as the solver

  • LogisticRegression.py: -> Binary classification using the built-in function on the wine datasets after their labels are converted to two classes
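
The multiclass setup described above can be sketched roughly as follows. This is a minimal illustration, not the contents of MultinomialLogRegression.py: it assumes the "built-in function" is scikit-learn's `LogisticRegression`, it uses synthetic data from `make_classification` in place of the wine files, and the 70/30 split, class count, and variable names are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the wine data (assumption): 11 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6,
    n_classes=3, random_state=0,
)

# Hold out a test set; the 70/30 ratio is an assumption for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# In recent scikit-learn releases, newton-cg fits a multinomial model
# when the target has more than two classes.
model = LogisticRegression(solver="newton-cg", max_iter=1000)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# 10-fold cross-validation on the training portion; one accuracy per fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=10)

print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
print("CV scores:", cv_scores)
```

For a classifier, `cross_val_score` reports accuracy by default, so each of the ten values is a per-fold accuracy rather than an error rate.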

Usage

>> python MultinomialLogRegression.py <RED WINE DATASET_PATH> <WHITE WINE DATASET_PATH>
>> python LogisticRegression.py <RED WINE DATASET_PATH> <WHITE WINE DATASET_PATH>

Example Output

-------------Red Wine Evaluation-------------
Multiclass Logistic regression Train Accuracy :: 0.567470956211
Multiclass Logistic regression Test Accuracy :: 0.5375
CV-prediction error rate :: [ 0.50617284  0.61490683  0.52795031  0.50625     0.5375      0.6375      0.5
  0.525       0.53797468  0.51592357]
Binary Logistic regression Train Accuracy :: 0.959785522788
Binary Logistic regression Test Accuracy :: 0.964583333333
Binary CV-prediction error rate :: [ 0.95652174  0.95652174  0.95652174  0.9625      0.9625      0.9625
  0.96226415  0.96855346  0.96226415  0.96226415]
-------------White Wine Evaluation-------------
Multiclass Logistic regression Train Accuracy :: 0.566577301162
Multiclass Logistic regression Test Accuracy :: 0.520833333333
CV-prediction error rate :: [ 0.50617284  0.61490683  0.52795031  0.50625     0.5375      0.6375      0.5
  0.525       0.53797468  0.51592357]
Binary Logistic regression Train Accuracy :: 0.95799821269
Binary Logistic regression Test Accuracy :: 0.96875
Binary CV-prediction error rate :: [ 0.95652174  0.95652174  0.95652174  0.9625      0.9625      0.9625
  0.96226415  0.96855346  0.96226415  0.96226415]

Resources used

Results

Multinomial Class
Average CV-values for Multinomial Logistic Regression:: 0.5409

The training and test accuracies (roughly 0.57 and 0.54 for red wine) show that multiclass logistic regression, a linear classifier, is not well suited to this multiclass dataset. Because the model fits this poorly, comparing the CV average with the accuracies above is not meaningful.
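
As a quick check of the reported average, the ten multiclass fold values from the example output can be averaged directly. The sketch below only copies those printed values; the variable names are illustrative.

```python
from statistics import mean

# Per-fold values copied from the multiclass lines of the example output.
multiclass_cv = [
    0.50617284, 0.61490683, 0.52795031, 0.50625, 0.5375,
    0.6375, 0.5, 0.525, 0.53797468, 0.51592357,
]

cv_average = mean(multiclass_cv)
cv_spread = max(multiclass_cv) - min(multiclass_cv)

print(f"average: {cv_average:.4f}")  # average: 0.5409
print(f"spread:  {cv_spread:.4f}")   # spread:  0.1375
```

The spread between the best and worst fold is large relative to the average, which is consistent with a model that fits this data poorly.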

Binary Class
Average of CV-values for Binary Logistic Regression:: 0.9612

With the classes transformed to binary labels, the linear classifier fits the modified dataset well. In the red wine output above, the training accuracy (0.9598) is slightly lower than the test accuracy (0.9646), and the average of the CV values (0.9612) falls between the two. Generally, the CV prediction underestimates the test accuracy; in this case, however, the per-fold CV values fluctuate, with most folds below the test value and one above it. Binary logistic regression gives much better accuracy than the multiclass model, so we conclude that the binary classes are classified much better by a linear classifier.
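
The relationship between the binary CV values and the train and test accuracies can be checked with a few lines of Python. The sketch below uses only the red wine numbers printed in the example output; the variable names are illustrative.

```python
from statistics import mean

# Values copied from the binary red wine lines of the example output.
binary_cv = [
    0.95652174, 0.95652174, 0.95652174, 0.9625, 0.9625,
    0.9625, 0.96226415, 0.96855346, 0.96226415, 0.96226415,
]
train_acc = 0.959785522788
test_acc = 0.964583333333

cv_average = mean(binary_cv)

# Count how many folds land above or below the held-out test accuracy.
folds_above_test = sum(score > test_acc for score in binary_cv)
folds_below_test = sum(score < test_acc for score in binary_cv)

print(f"CV average: {cv_average:.4f}")           # CV average: 0.9612
print(f"folds above test: {folds_above_test}")   # folds above test: 1
print(f"folds below test: {folds_below_test}")   # folds below test: 9
print(train_acc < cv_average < test_acc)         # True
```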

To Improve

Authors:

  • Andy Kim
  • Nigel Kim
  • Jay Ha