# INFO-4604/5604 HW5: Semi-Supervised Learning 
## Deadline: Friday, December 14, 6:00pm MT

### Solution by: *Ben Niu* (and list any partners)

In this assignment you will implement the self-training algorithm for semi-supervised learning.

### What to hand in

You will submit the assignment on Canvas. Submit a single Jupyter notebook named `hw5lastname.ipynb`, where lastname is replaced with your last name.

If you have any output that is not part of your notebook, you may submit that as a separate document, in a single PDF named `hw5lastname.pdf`. 

When writing code in this notebook, you are encouraged to create additional cells in whatever way makes the presentation more organized and easy to follow. You are allowed to import additional Python libraries.

### Submission policies

- **Collaboration:** You are allowed to work with up to 3 people besides yourself. You are still expected to write up your own solution. Each individual must turn in their own submission, and list your collaborators after your name.
- **Late submissions:** We allow each student to use up to 5 late days over the semester. You have late days, not late hours. This means that if your submission is late by any amount of time past the deadline, then this will use up a late day. If it is late by any amount beyond 24 hours past the deadline, then this will use a second late, and so on. Once you have used up all late days, late assignments will be given at most 80% credit after one day and 60% credit after two days.

## Dataset

You will use a variant of the heart disease dataset from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/heart+Disease). The data was taken from the [University of Granada's datasets for self-labeled learning](http://sci2s.ugr.es/SelfLabeled).

The first 13 columns are features (Age, Sex, ChestPainType, RestBloodPressure, SerumCholestoral, FastingBloodSugar, ResElectrocardiographic, MaxHeartRate, ExerciseInduced, Oldpeak, Slope, MajorVessels, Thal).

The last column is the class label, either $1$ or $2$ indicating the presence of heart disease in the patient if the label is known, or "unlabeled" otherwise. Only 10% of training instances are labeled in this dataset.

Run the code below to load the training data (separated as labeled or unlabeled) and test data.

In [1]:
import pandas as pd
import numpy 

df_train = pd.read_csv('http://cmci.colorado.edu/classes/INFO-4604/data/heart_train.csv', header=None)
df_test = pd.read_csv('http://cmci.colorado.edu/classes/INFO-4604/data/heart_test.csv', header=None)

df_labeled = df_train.loc[df_train[13] != 'unlabeled']
df_unlabeled = df_train.loc[df_train[13] == 'unlabeled']

Y_train_labeled = df_labeled.iloc[0:, -1].values.astype('int')
X_train_labeled = df_labeled.iloc[0:, :-1].values
X_train_unlabeled = df_unlabeled.iloc[0:, :-1].values

Y_test = df_test.iloc[0:, -1].values.astype('int')
X_test = df_test.iloc[0:, :-1].values

## Problem 1: Self-Training [16 points]

Implement the basic self-training algorithm introduced in lecture. Specifically: 

1. Train a classifier on the labeled training data only.
2. Apply the classifier to the unlabeled training data and treat the classifications as labels.
4. Combine the original labeled data with the classifications on the unlabeled dataset to create a new training set.
5. Train a second classifier on the new training set.

You will experiment with four types of classifiers: [`LogisticRegression`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html), [`DecisionTreeClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html), [`MLPClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html), [`GaussianNB`](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html). The first three you have used in previous assignments. `GaussianNB` is `sklearn`'s implementation of Naive Bayes which uses a Gaussian (normal) distribution to model the probabilities of continuous-valued features.

Construct the classifiers using the default parameters. Unlike in the previous two assignments, you won't use cross-validation and you won't perform any hyperparameter tuning. You should just build the self-trained classifier on the training data and then evaluate it on the test data. 

#### Deliverable 1.1: Train each of the four classifiers on the labeled data only. Calculate the test accuracy for each classifier. These will be your baseline accuracies.

[results go here]
LogisticRegression: Test accuracy: 0.888889
DecisionTreeClassifier:Test accuracy: 0.740741
MLPClassifier:Test accuracy: 0.888889
GaussianNB:Test accuracy: 0.888889




#### Deliverable 1.2: Implement the self-training algorithm as described, and calculate the test accuracies for each of the four classifiers. How do the results compare to the accuracies in 1.1?

[results go here]
LogisticRegression: Test accuracy: 0.925926
DecisionTreeClassifier:Test accuracy: 0.740741
MLPClassifier:Test accuracy: 0.888889
GaussianNB:Test accuracy: 0.888889


#### Deliverable 1.3: You should have found that the multilayer perceptron (MLP) had poor accuracy in 1.1, which drops even more after self-training in 1.2. (a) Why do you think MLP performs poorly on this dataset, when it performed the best in HW3? (b) Why do you think performance dropped after self-training with MLP?

[response goes here]
(a)Because MLP is the most complicated classifier among those classifiers, and MLP need more data to accurately train the classifier. However, the dataset for this assignment is a small dataset, that maybe the reason why it performs poorly on this dataset. 
(b)Because the self_training use the dataset from the previous trained dataset, if the first classifier is not accurately trained, the second MLP classifier used inaccurately predicted dataset to train itself, therefore performance dropped

## Probelm 1.1

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# code for 1.1 here


In [3]:
classifier = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial')
classifier.fit(X_train_labeled, Y_train_labeled)
print("Training accuracy: %0.6f" % accuracy_score(Y_train_labeled, classifier.predict(X_train_labeled)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))

Training accuracy: 1.000000
Test accuracy: 0.888889




In [4]:
Ylog = classifier.predict(X_train_unlabeled)
Ylog

array([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1,
       1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2,
       1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1,
       2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1,
       1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2,
       1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1,
       2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1,
       1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2,
       1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2])

In [5]:
#from sklearn import preprocessing
#le = preprocessing.LabelEncoder()
#X_train_labeled1= X_train_labeled.astype(int)
#Y_train_labeled1= Y_train_labeled.astype(int)

classifier = DecisionTreeClassifier()
classifier.fit(X_train_labeled, Y_train_labeled)
print("Training accuracy: %0.6f" % accuracy_score(Y_train_labeled, classifier.predict(X_train_labeled)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
Ytree = classifier.predict(X_train_unlabeled)
Ytree

Training accuracy: 1.000000
Test accuracy: 0.777778


array([1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1,
       2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
       2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1,
       2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2,
       1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1,
       2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2,
       2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2,
       2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1])

In [6]:
classifier = MLPClassifier(hidden_layer_sizes= (100,) , random_state=None)
classifier.fit(X_train_labeled, Y_train_labeled)
print("Training accuracy: %0.6f" % accuracy_score(Y_train_labeled, classifier.predict(X_train_labeled)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
Ymlp = classifier.predict(X_train_unlabeled)
Ymlp

Training accuracy: 1.000000
Test accuracy: 0.888889




array([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2,
       1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2,
       1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1,
       2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1,
       1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
       1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2,
       1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1,
       2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2,
       1, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 1, 2,
       1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1, 2])

In [7]:
classifier = GaussianNB()
classifier.fit(X_train_labeled, Y_train_labeled)
print("Training accuracy: %0.6f" % accuracy_score(Y_train_labeled, classifier.predict(X_train_labeled)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
YGa = classifier.predict(X_train_unlabeled)

Training accuracy: 0.875000
Test accuracy: 0.888889


## Problem 1.2

LogisticRegression

In [8]:
# code for 1.2 here
import numpy as np
Y_newlog = np.concatenate((Y_train_labeled,Ylog),axis=0)
X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)

classifier = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial')
classifier.fit(X_new,Y_newlog )
print("Training accuracy: %0.6f" % accuracy_score(Y_newlog, classifier.predict(X_new)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))

Training accuracy: 1.000000
Test accuracy: 0.925926




In [9]:
Y_newlog

array([2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1,
       1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
       1, 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1,
       1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2, 2,
       2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1,
       2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1,
       2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2,
       1, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1,
       1, 2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1,
       2])

Decision Tree

In [10]:
import numpy as np
Y_newtree = np.concatenate((Y_train_labeled,Ytree),axis=0)
X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
classifier = DecisionTreeClassifier()
classifier.fit(X_new,Y_newtree )
print("Training accuracy: %0.6f" % accuracy_score(Y_newtree, classifier.predict(X_new)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))


Training accuracy: 1.000000
Test accuracy: 0.777778


MLP

In [11]:
Y_newmlp = np.concatenate((Y_train_labeled,Ymlp),axis=0)
X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)

classifier = MLPClassifier(hidden_layer_sizes= (100,) , random_state=None)
classifier.fit(X_new, Y_newmlp)
print("Training accuracy: %0.6f" % accuracy_score(Y_newmlp, classifier.predict(X_new)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))


Training accuracy: 0.946502
Test accuracy: 0.740741




GaussianNB

In [12]:
Y_newGa = np.concatenate((Y_train_labeled,YGa),axis=0)
X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)

classifier = GaussianNB()
classifier.fit(X_new, Y_newGa)
print("Training accuracy: %0.6f" % accuracy_score(Y_newGa, classifier.predict(X_new)))
print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))


Training accuracy: 0.954733
Test accuracy: 0.888889


The classifier used for initially labeling the data (steps 1-2 of the algorithm above) does not necessarily need to be the same as the second classifier trained on the labels produced by the first classifier (step 4). For example, when label propagation was introduced in lecture, we learned that sometimes it is used for initially inferring labels, but then an additional classifier is trained on the inferred labels. Even though we are not using label propagation in this assignment, we will use the same idea of training a classifier that is not the same classifier as used to originally label the data.

For this next task, you should re-run your self-training algorithm where you try different combinations of classifiers to see how this affects test accuracy.

#### Deliverable 1.4: Repeat 1.2 for all $4 \times 3$ combinations of classifiers, such that the first classifier is different from the second classifier.

[results go here]

In [13]:
# code for 1.4 here
X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
Y = [Ymlp,Ytree,YGa]
actest = []
for d in Y:
    Y_newlog4 = np.concatenate((Y_train_labeled,d),axis=0)


    classifier = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial')
    classifier.fit(X_new,Y_newlog4 )
    #print("Training accuracy: %0.6f" % accuracy_score(Y_newlog4, classifier.predict(X_new)))
    #print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
    print()
    actest.append(accuracy_score(Y_test, classifier.predict(X_test)))
   
actest








[0.8148148148148148, 0.8518518518518519, 0.9259259259259259]

In [14]:
YMLP = [Ytree,YGa,Ylog]
actestMLP = []
for i in YMLP:
    Y_newmlp4 = np.concatenate((Y_train_labeled,i),axis=0)
    X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)

    classifier = MLPClassifier(hidden_layer_sizes= (100,) , random_state=None)
    classifier.fit(X_new, Y_newmlp4)
    #print("Training accuracy: %0.6f" % accuracy_score(Y_newmlp4, classifier.predict(X_new)))
    #print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
    actestMLP.append(accuracy_score(Y_test, classifier.predict(X_test)))
actestMLP



[0.9259259259259259, 0.8888888888888888, 0.9629629629629629]

In [15]:
YGA = [Ymlp,Ytree,Ylog]
actestGA = []
for g in YGA:

    Y_newGa4 = np.concatenate((Y_train_labeled,g),axis=0)
    X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)

    classifier = GaussianNB()
    classifier.fit(X_new, Y_newGa)
    #print("Training accuracy: %0.6f" % accuracy_score(Y_newGa4, classifier.predict(X_new)))
    #print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
    actestGA.append(accuracy_score(Y_test, classifier.predict(X_test)))
actestGA

[0.8888888888888888, 0.8888888888888888, 0.8888888888888888]

In [16]:
YTree = [Ymlp,Ytree,Ylog]
actestTree = []
for i in YTree:
    Y_newtree4 = np.concatenate((Y_train_labeled,i),axis=0)
    X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
    classifier = DecisionTreeClassifier()
    classifier.fit(X_new,Y_newtree4 )
    #print("Training accuracy: %0.6f" % accuracy_score(Y_newtree, classifier.predict(X_new)))
    #print("Test accuracy: %0.6f" % accuracy_score(Y_test, classifier.predict(X_test)))
    actestTree.append(accuracy_score(Y_test, classifier.predict(X_test)))
actestTree

[0.7037037037037037, 0.7777777777777778, 0.8518518518518519]

# (Optional) Problem 2: Thresholding [+6 EC points]

Modify your self-training algorithm using the high-confidence variant described in lecture. Specifically, when you add classifications to the training set, only include instances where the classifier had a probability above some threshold. Recall from HW2 that you can obtain classifier probabilities using the `predict_proba` function. This algorithm can be repeated for multiple iterations.


#### Deliverable 2.1: Implement this version of self-training using a threshold probability that can be specified. Repeat problem 1.2 using this new version on each of the four classifiers. Calculate the test accuracy after running this algorithm for 1, 2, and 3 iterations. Experiment with the following probability thresholds: $0.99, 0.9, 0.8, 0.6$.

[results go here]


In [17]:
# code for 2.1 here
def threshold(probs, tau):
    return np.where(probs[:,0] > tau,1,2)

tau= [0.6,0.8,0.9,0.99]
actest = []
for t in tau:
    
#Y_newlog = np.concatenate((Y_train_labeled,Ylog),axis=0)
#X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
    classifier = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial')
    classifier.fit(X_new,Y_newlog)
    actest.append(t)
    actest.append("Test accuracy for LogisticRegression Model: %0.6f" %accuracy_score(Y_test, threshold(classifier.predict_proba(X_test),t)))
    Y_newlog = threshold(classifier.predict_proba(X_new),t) 
actest



[0.6,
 'Test accuracy for LogisticRegression Model: 0.925926',
 0.8,
 'Test accuracy for LogisticRegression Model: 0.851852',
 0.9,
 'Test accuracy for LogisticRegression Model: 0.851852',
 0.99,
 'Test accuracy for LogisticRegression Model: 0.740741']

In [18]:
Y_newlog

array([2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1,
       1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1,
       2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1,
       1, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1,
       1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
       2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1,
       2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1,
       2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2,
       1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1,
       1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1,
       2])

In [19]:
tau= [0.6,0.8,0.9,0.99]
actest = []
for t in tau:
#Y_newlog = np.concatenate((Y_train_labeled,Ylog),axis=0)
#X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
    classifier = DecisionTreeClassifier()
    classifier.fit(X_new,Y_newtree)
    actest.append(t)
    actest.append("Test accuracy for DecisionTreeClassifier: %0.6f" %accuracy_score(Y_test, threshold(classifier.predict_proba(X_test),t)))
    Y_newtree = threshold(classifier.predict_proba(X_new),t)
actest

[0.6,
 'Test accuracy for DecisionTreeClassifier: 0.777778',
 0.8,
 'Test accuracy for DecisionTreeClassifier: 0.777778',
 0.9,
 'Test accuracy for DecisionTreeClassifier: 0.777778',
 0.99,
 'Test accuracy for DecisionTreeClassifier: 0.777778']

In [20]:
tau= [0.6,0.8,0.9,0.99]
actest = []
for t in tau:
#Y_newlog = np.concatenate((Y_train_labeled,Ylog),axis=0)
#X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
    classifier = MLPClassifier()
    classifier.fit(X_new,Y_newmlp)
    actest.append(t)
    actest.append("Test accuracy for MLPClassifier: %0.6f" %accuracy_score(Y_test, threshold(classifier.predict_proba(X_test),t)))
    Y_newmlp = threshold(classifier.predict_proba(X_new),t)
actest



[0.6,
 'Test accuracy for MLPClassifier: 0.777778',
 0.8,
 'Test accuracy for MLPClassifier: 0.777778',
 0.9,
 'Test accuracy for MLPClassifier: 0.703704',
 0.99,
 'Test accuracy for MLPClassifier: 0.629630']

In [21]:
tau= [0.6,0.8,0.9,0.99]
actest = []
for t in tau:
#Y_newlog = np.concatenate((Y_train_labeled,Ylog),axis=0)
#X_new = np.concatenate((X_train_labeled,X_train_unlabeled),axis=0)
    classifier = GaussianNB()
    classifier.fit(X_new,Y_newGa)
    actest.append(t)
    actest.append("Test accuracy for GaussianNB: %0.6f" %accuracy_score(Y_test, threshold(classifier.predict_proba(X_test),t)))
    Y_newGa = threshold(classifier.predict_proba(X_new),t)
actest

[0.6,
 'Test accuracy for GaussianNB: 0.888889',
 0.8,
 'Test accuracy for GaussianNB: 0.888889',
 0.9,
 'Test accuracy for GaussianNB: 0.925926',
 0.99,
 'Test accuracy for GaussianNB: 0.740741']