## Question 1 solution

Analysing this use case under the given conditions: if our model predicts that the electrical mini-substation is performing well when it is not, we get a **false positive**, which is the error we need to avoid.

Therefore, a **false positive** is the worse error for this use case and the one we need to minimize. Minimizing false positives means favouring precision over recall, which an F-beta score does when beta is close to 0. We will use a beta value of 0.05 for this problem.
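To illustrate why a small beta favours precision, here is a minimal sketch of the F-beta score in plain Python. The helper name `f_beta` and the precision/recall values are hypothetical and chosen only for illustration; they are not results from our models.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom


# Hypothetical model with high precision and low recall (illustrative values)
precision, recall = 0.9, 0.5
for beta in (0.05, 1.0, 2.0):
    print(f"beta={beta}: F_beta={f_beta(precision, recall, beta):.4f}")
```

With beta = 0.05 the score stays close to the precision, while larger beta values pull it towards the recall.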

## Question 2 solution

Load the file and split the data into training, validation, and test sets.

In [57]:
import random
import turicreate as tc
random.seed(0)

In [58]:
# Load data
data = tc.SFrame("0397198_data.csv")
data

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,float,float,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


Condition,Voltage,Current,Temperature
1,81.41279041006895,487.88071256964497,38.75571639527504
1,78.77194750939918,491.53038862715465,54.614373673647655
1,81.04558533958897,488.5644736992426,43.14597383349057
1,80.26080505341892,490.0072309797693,45.68251018097634
0,79.02597007921467,488.66305932612033,39.91792414431104
1,78.08260893290515,485.6496989266094,42.91341806598235
1,79.68938666725457,488.8009873401819,48.7747618992723
1,77.94349177287891,485.5885857757599,52.42668469573381
1,78.49353599367441,486.07878458193966,49.48901945164625
1,79.79376099483481,489.56493291299273,53.78902045504981


In [59]:
# split the data
training_set, test_set = data.random_split(0.8, seed = 0)
test_set, validation_set = test_set.random_split(0.5, seed = 0)
print(f"training_set = {training_set.shape}")
print(f"test_set = {test_set.shape}")
print(f"validation_set = {validation_set.shape}")

training_set = (797, 4)
test_set = (100, 4)
validation_set = (103, 4)
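The split sizes are close to, but not exactly, 800/100/100. One plausible reason is that each row is assigned to a part at random. The sketch below shows that idea in plain Python; the helper `random_split` here is a hypothetical stand-in and is not turicreate's implementation, so its sizes will differ from the ones above.

```python
import random


def random_split(rows, fraction, seed):
    """Send each row to the first part with probability `fraction`."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))  # stand-in for the 1000 rows of the data set
training, rest = random_split(rows, 0.8, seed=0)      # about 80% for training
test, validation = random_split(rest, 0.5, seed=0)    # remainder split in half
print(len(training), len(test), len(validation))
```

The two-stage split gives roughly 80% / 10% / 10%, and every row lands in exactly one of the three sets.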


## Question 3 solution

In [60]:
perceptron = tc.logistic_classifier.create(data, target='Condition')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [61]:
perceptron.coefficients

name,index,class,value,stderr
(intercept),,1,2.4031939572898007,13.02820513647118
Voltage,,1,0.0077464607053791,0.1468674056840951
Current,,1,-0.0007365265428067,0.0026257643893514
Temperature,,1,-0.0107527987550909,0.0094129919361256


### NOTE

The function used above has feature rescaling turned on by default, but the coefficients it reports are given in the original (unscaled) feature scale.
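Because the coefficients are in the original scale, they can be applied directly to a raw data row. The sketch below does this by hand for the first row of the data set, assuming the model uses the standard logistic (sigmoid) link; the helper `predict_probability` is hypothetical and the result has not been checked against the model's own predictions.

```python
import math

# Coefficients as reported by perceptron.coefficients (original feature scale)
intercept = 2.4031939572898007
weights = {
    "Voltage": 0.0077464607053791,
    "Current": -0.0007365265428067,
    "Temperature": -0.0107527987550909,
}


def predict_probability(row):
    """Sigmoid of the linear score; assumes a standard logistic model."""
    score = intercept + sum(weights[name] * row[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-score))


# First row of the data set shown in Question 2
first_row = {
    "Voltage": 81.41279041006895,
    "Current": 487.88071256964497,
    "Temperature": 38.75571639527504,
}
p = predict_probability(first_row)
print(f"P(Condition = 1) = {p:.3f}")
```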

## Question 4 solution

In [62]:
# Create the first perceptron (l2_penalty=0.1, l1_penalty=0.02)
perceptron_1 = tc.logistic_classifier.create(
    training_set,
    target='Condition',
    features=['Voltage', 'Current', 'Temperature'],
    l2_penalty=0.1, l1_penalty=0.02,
    step_size=5, max_iterations=120,
    validation_set=validation_set)

In [63]:
# Create the second perceptron with smaller penalties (l1_penalty=0.01, l2_penalty=0.05)
perceptron_2 = tc.logistic_classifier.create(
    training_set,
    target='Condition',
    features=['Voltage', 'Current', 'Temperature'],
    l1_penalty=0.01, l2_penalty=0.05,
    step_size=5, max_iterations=120,
    validation_set=validation_set)

## Question 5 solution

**Display the accuracy for perceptron 1 and 2**

In [64]:
# perceptron 1 accuracies
print("Training set accuracy for first perceptron: ", perceptron_1.evaluate(training_set)['accuracy'])
print("validation set accuracy for first perceptron: ", perceptron_1.evaluate(validation_set)['accuracy'])

Training set accuracy for first perceptron:  0.9058971141781681
validation set accuracy for first perceptron:  0.9029126213592233


In [65]:
# perceptron 2 accuracies
print("Training set accuracy for second perceptron: ", perceptron_2.evaluate(training_set)['accuracy'])
print("validation set accuracy for second perceptron: ", perceptron_2.evaluate(validation_set)['accuracy'])

Training set accuracy for second perceptron:  0.9058971141781681
validation set accuracy for second perceptron:  0.9029126213592233


**Display the confusion matrix for perceptron 1 and 2**

In [66]:
# perceptron 1 confusion matrix
print("\nValidation set confusion matrix for first perceptron: \n", perceptron_1.evaluate(validation_set)['confusion_matrix'])


Validation set confusion matrix for first perceptron: 
 +--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      1       |        1        |   46  |
|      0       |        1        |   5   |
|      1       |        0        |   5   |
|      0       |        0        |   47  |
+--------------+-----------------+-------+
[4 rows x 3 columns]



In [67]:
# perceptron 2 confusion matrix
print("\nValidation set confusion matrix for second perceptron: \n", perceptron_2.evaluate(validation_set)['confusion_matrix'])


Validation set confusion matrix for second perceptron: 
 +--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      1       |        1        |   46  |
|      0       |        1        |   5   |
|      1       |        0        |   5   |
|      0       |        0        |   47  |
+--------------+-----------------+-------+
[4 rows x 3 columns]
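The counts in the validation confusion matrices above are enough to recompute accuracy, precision, and recall by hand. The short sketch below does so, taking label 1 as the positive class; the variable names are our own.

```python
# Validation-set counts read from the confusion matrices (positive class = 1)
tp = 46  # target 1, predicted 1
fp = 5   # target 0, predicted 1
fn = 5   # target 1, predicted 0
tn = 47  # target 0, predicted 0

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("accuracy: ", accuracy)
print("precision:", precision)
print("recall:   ", recall)
```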



**Calculate and display recall, precision, and F_beta score**

In [68]:
# perceptron 1 recall, precision, and f_beta values (using beta = 0.05)
recall_1 = perceptron_1.evaluate(validation_set)['recall']
precision_1 = perceptron_1.evaluate(validation_set)['precision']
beta = 0.05
f_beta1 = (1 + beta**2) * (recall_1 * precision_1)/((beta**2 * precision_1) + recall_1)

print("validation set recall value for perceptron 1: ", recall_1)
print("validation set precision value for perceptron 1: ", precision_1)
print("validation set f_beta value for perceptron 1: ", f_beta1)

validation set recall value for perceptron 1:  0.9019607843137255
validation set precision value for perceptron 1:  0.9019607843137255
validation set f_beta value for perceptron 1:  0.9019607843137255


In [69]:
# perceptron 2 recall, precision, and f_beta values (using beta = 0.05)
recall_2 = perceptron_2.evaluate(validation_set)['recall']
precision_2 = perceptron_2.evaluate(validation_set)['precision']
beta = 0.05
f_beta2 = (1 + beta**2) * (recall_2 * precision_2)/((beta**2 * precision_2) + recall_2)

print("validation set recall value for perceptron 2: ", recall_2)
print("validation set precision value for perceptron 2: ", precision_2)
print("validation set f_beta value for perceptron 2: ", f_beta2)

validation set recall value for perceptron 2:  0.9019607843137255
validation set precision value for perceptron 2:  0.9019607843137255
validation set f_beta value for perceptron 2:  0.9019607843137255
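The F-beta value equals the precision and recall here, and this is expected rather than a coincidence. Writing $P$ for precision and $R$ for recall, the formula used in the cells above is

$$F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$$

and when $P = R$, as on the validation set, it reduces to

$$F_\beta = \frac{(1+\beta^2)\,P^2}{(\beta^2 + 1)\,P} = P$$

for any value of $\beta$.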


## Question 6 solution

Both models, **perceptron 1 and 2**, produce the same accuracy, recall, precision, and F-beta score on the validation set, and their confusion matrices are identical. Therefore, a tie is declared between the two models.

## Question 7 solution

**Using the test set to calculate and display accuracy, confusion matrix, recall, precision, and f_beta**

In [70]:
# perceptron 1 accuracy, confusion matrix, recall, precision, and f_beta on test set
recall_3 = perceptron_1.evaluate(test_set)['recall']
precision_3 = perceptron_1.evaluate(test_set)['precision']
beta = 0.05
f_beta3 = (1 + beta**2) * (recall_3 * precision_3)/((beta**2 * precision_3) + recall_3)

print("Test set recall value for perceptron 1: ", recall_3)
print("Test set precision value for perceptron 1: ", precision_3)
print("Test set f_beta value for perceptron 1: ", f_beta3)
print("Test set accuracy for first perceptron: ", perceptron_1.evaluate(test_set)['accuracy'])
print("Test set confusion matrix for first perceptron: \n", perceptron_1.evaluate(test_set)['confusion_matrix'])

Test set recall value for perceptron 1:  0.9411764705882353
Test set precision value for perceptron 1:  0.8888888888888888
Test set f_beta value for perceptron 1:  0.8890120548704447
Test set accuracy for first perceptron:  0.91
Test set confusion matrix for first perceptron: 
 +--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|      1       |        1        |   48  |
|      0       |        1        |   6   |
|      0       |        0        |   43  |
|      1       |        0        |   3   |
+--------------+-----------------+-------+
[4 rows x 3 columns]



### Contribution

Student names: Sodiq Akinbolaji and Simranpreet Singh.

Each student worked on all questions separately, then we met in school to compare our solutions.
Formatting was done by Sodiq, who shared ideas with Simranpreet during the meeting.

Finally, the submission will be made by Sodiq.