Collect data – small labeled set + large unlabeled set.

Learn features – train on all data with an unsupervised method (e.g., autoencoder).

Add classifier – attach a prediction head and train on labeled data.

Use unlabeled data – apply pseudo‑labels or consistency regularization to improve training.

Evaluate – test on clean labeled validation/test set.
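
The notebook that follows implements only the pseudo-label variant of the fourth step. For comparison, here is a minimal NumPy sketch of the consistency-regularization penalty: the model should give similar predictions for two randomly perturbed copies of the same unlabeled input. The names `consistency_loss` and `toy_predict`, the Gaussian-noise perturbation, and the squared-difference penalty are illustrative assumptions, not part of the original notebook. The sketch only computes the penalty and does not train anything.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(predict_fn, x, noise_std=0.1, rng=None):
    """Mean squared difference between predictions on two noisy views of x.

    Hypothetical helper: Gaussian noise stands in for whatever augmentation
    a real pipeline would use.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    view_a = x + rng.normal(0.0, noise_std, size=x.shape)
    view_b = x + rng.normal(0.0, noise_std, size=x.shape)
    p_a, p_b = predict_fn(view_a), predict_fn(view_b)
    return float(np.mean(np.sum((p_a - p_b) ** 2, axis=1)))

# Toy stand-in for a classifier: a random linear layer followed by softmax.
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))

def toy_predict(x):
    return softmax(x @ W)

x_batch = rng.random((32, 784))  # 32 fake flattened 28x28 images
loss_noisy = consistency_loss(toy_predict, x_batch, noise_std=0.1, rng=rng)
loss_clean = consistency_loss(toy_predict, x_batch, noise_std=0.0, rng=rng)
print(f"penalty with noise: {loss_noisy:.4f}, without noise: {loss_clean:.4f}")
```

In a real training loop this penalty would typically be added, with some weight, to the supervised loss on the labeled batch, so unlabeled data shapes the model without needing hard labels.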

In [1]:
import os, certifi
os.environ['SSL_CERT_FILE'] = certifi.where()

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

In [3]:
# Config
LABELED_SAMPLES   = 1000   # how many true labels you keep from training set
CONF_THRESH       = 0.95   # min confidence to accept a pseudo-label
# Higher threshold: more reliable but fewer pseudo-labels; lower threshold: more pseudo-labels but more noise
EPOCHS_BASE       = 5 # Number of epochs for initial supervised training
EPOCHS_FINETUNE   = 5 # Number of epochs for fine-tuning with pseudo-labels
BATCH_SIZE        = 128 
SEED              = 42
np.random.seed(SEED); tf.random.set_seed(SEED)

In [4]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data() # load data
# Normalize and add channel dimension
x_train = (x_train.astype("float32") / 255.0)[..., None]
x_test  = (x_test.astype("float32") / 255.0)[..., None]

# Split off a small labeled subset; the rest is treated as unlabeled
idx = np.random.permutation(len(x_train))
lab_idx, unlab_idx = idx[:LABELED_SAMPLES], idx[LABELED_SAMPLES:]
x_lab, y_lab = x_train[lab_idx], y_train[lab_idx]
x_unlab      = x_train[unlab_idx]

def make_cnn():
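    m = models.Sequential([
        # Explicit Input layer; passing input_shape to the first layer is discouraged in recent Keras
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation='relu'),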
        layers.MaxPool2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPool2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    m.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
    return m

# Step 1: Initial supervised training on small labeled set
base = make_cnn()
base.fit(x_lab, y_lab, epochs=EPOCHS_BASE, batch_size=BATCH_SIZE, 
         validation_data=(x_test, y_test))
base_test_acc = base.evaluate(x_test, y_test, verbose=0)[1]
print(f"Base test accuracy after supervised training on {LABELED_SAMPLES} samples: {base_test_acc:.4f}")

# Step 2: Generate pseudo-labels for unlabeled data
probs = base.predict(x_unlab, batch_size=BATCH_SIZE, verbose=0)
conf  = probs.max(axis=1)
y_pl  = probs.argmax(axis=1)
mask  = conf >= CONF_THRESH

x_pseudo, y_pseudo = x_unlab[mask], y_pl[mask]
print(f"Accepted pseudo-labeled samples: {len(x_pseudo)} / {len(x_unlab)} "
      f"({100*len(x_pseudo)/len(x_unlab):.1f}%)")

# Step 3: Fine-tune model on combined labeled + pseudo-labeled data
x_mix = np.concatenate([x_lab, x_pseudo], axis=0)
y_mix = np.concatenate([y_lab, y_pseudo], axis=0)

# Continue training the same model (finetune is an alias of base, not a copy)
finetune = base
finetune.fit(x_mix, y_mix, epochs=EPOCHS_FINETUNE, batch_size=BATCH_SIZE,
             validation_data=(x_test, y_test))
finetune_test_acc = finetune.evaluate(x_test, y_test, verbose=0)[1]
print(f"Test accuracy after fine-tuning with pseudo-labels: {finetune_test_acc:.4f}")


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Epoch 1/5
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 83ms/step - accuracy: 0.1612 - loss: 2.2301 - val_accuracy: 0.5536 - val_loss: 1.8815
Epoch 2/5
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 69ms/step - accuracy: 0.6048 - loss: 1.7044 - val_accuracy: 0.7124 - val_loss: 1.1215
Epoch 3/5
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step - accuracy: 0.7466 - loss: 0.9927 - val_accuracy: 0.7872 - val_loss: 0.6789
Epoch 4/5
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step - accuracy: 0.8013 - loss: 0.6303 - val_accuracy: 0.8537 - val_loss: 0.4697
Epoch 5/5
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step - accuracy: 0.8566 - loss: 0.4338 - val_accuracy: 0.8861 - val_loss: 0.3813
Base test accuracy after supervised training on 1000 samples: 0.8861
Accepted pseudo-labeled samples: 26425 / 59000 (44.8%)
Epoch 1/5
215/215 ━━━━━━━━━━━━━━━━━━━━ 4s 17ms/step - accuracy

Result:
Training on just 1,000 true labels gave a baseline test accuracy of 88.6%. After adding ~26,400 high-confidence pseudo-labels and fine-tuning, accuracy rose to 93.6%. This 5-point gain suggests that leveraging unlabeled data via pseudo-labeling can substantially improve performance when labels are scarce.

Trade-off: quantity vs. quality of pseudo-labels
We accepted 26,425 of the 59,000 unlabeled samples (44.8%) at a confidence threshold of 0.95. A high threshold makes it more likely that the accepted pseudo-labels are correct, but it limits how many are accepted. Lowering the threshold would increase the sample count at the risk of introducing noisy labels that may hurt rather than help.
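
One way to see this trade-off is to sweep the threshold and record, for each value, the fraction of samples accepted and how often the accepted pseudo-labels are correct. The sketch below is a minimal illustration on synthetic probabilities. The helper name `threshold_sweep` and the synthetic data are assumptions for demonstration, and the numbers it prints say nothing about the MNIST run above.

```python
import numpy as np

def threshold_sweep(probs, y_true, thresholds):
    """Return (threshold, accepted fraction, pseudo-label accuracy) per threshold.

    Hypothetical helper: probs has shape (n_samples, n_classes) and y_true holds
    reference labels, which are only available when they were deliberately hidden.
    """
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    rows = []
    for t in thresholds:
        mask = conf >= t
        kept = int(mask.sum())
        # Accuracy is undefined when nothing passes the threshold.
        acc = float((pred[mask] == y_true[mask]).mean()) if kept else float("nan")
        rows.append((float(t), kept / len(probs), acc))
    return rows

# Synthetic stand-in for model outputs: the true class gets a random logit boost.
rng = np.random.default_rng(42)
n, k = 5000, 10
y_true = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), y_true] += rng.uniform(0.0, 6.0, size=n)
exp_logits = np.exp(logits)
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

sweep = threshold_sweep(probs, y_true, [0.5, 0.8, 0.95, 0.99])
for t, frac, acc in sweep:
    print(f"threshold={t:.2f}  accepted={100 * frac:5.1f}%  pseudo-label acc={acc:.3f}")
```

In this notebook the "unlabeled" pool is really MNIST training data with its labels set aside, so a call such as `threshold_sweep(probs, y_train[unlab_idx], [0.8, 0.9, 0.95, 0.99])` could be used to check the pseudo-labels directly, provided `probs` still holds the predictions made before fine-tuning. With genuinely unlabeled data only the accepted fraction can be measured.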

Model confidence matters
The improved accuracy suggests that the model's high-confidence predictions are mostly correct. Early in training, however, some classes may be underrepresented among the accepted pseudo-labels; monitoring per-class acceptance can reveal such imbalances so that they can be corrected.
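
Per-class acceptance can be monitored with a couple of `np.bincount` calls. The sketch below assumes a probability matrix shaped like `probs` in the notebook; the helper name `per_class_acceptance` and the five-row toy input are illustrative and not part of the original code.

```python
import numpy as np

def per_class_acceptance(probs, thresh, num_classes=10):
    """Count accepted pseudo-labels per predicted class and the per-class acceptance rate.

    Hypothetical helper: a class with a low count or rate is underrepresented
    in the pseudo-labeled set.
    """
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    accepted = np.bincount(pred[conf >= thresh], minlength=num_classes)
    predicted = np.bincount(pred, minlength=num_classes)
    # Leave the rate at 0 for classes the model never predicted.
    rate = np.divide(accepted, predicted,
                     out=np.zeros(num_classes), where=predicted > 0)
    return accepted, rate

# Toy example with 3 classes and 5 samples.
toy_probs = np.array([
    [0.97, 0.02, 0.01],  # class 0, accepted
    [0.96, 0.03, 0.01],  # class 0, accepted
    [0.10, 0.85, 0.05],  # class 1, rejected (confidence below threshold)
    [0.02, 0.98, 0.00],  # class 1, accepted
    [0.30, 0.30, 0.40],  # class 2, rejected
])
accepted, rate = per_class_acceptance(toy_probs, thresh=0.95, num_classes=3)
print("accepted per class:", accepted.tolist())  # [2, 1, 0]
print("acceptance rate:   ", rate.tolist())      # [1.0, 0.5, 0.0]
```

In the toy output, class 2 contributes no pseudo-labels at all, which is the kind of imbalance worth catching before fine-tuning on the mixed set.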