# Label classifier (3dshapes): data collection

## Task 3

**Author**: Maleakhi A. Wijaya  
**Description**: This notebook contains the code used to collect experimental data. We compare the performance of the methods discussed in Rabanser et al. against our proposed CBSD method. The end-to-end task for Task 3 is to predict binary clusters of labels derived from the original orientation latent values.

In [1]:
# Load utility functions
%run ../../scripts/constants.py
%run ../../scripts/3dshapes_utils.py
%run ../../scripts/shift_applicator.py
%run ../../scripts/shift_dimensionality_reductor.py
%run ../../scripts/experiment_utils.py
%run ../../scripts/shift_statistical_test.py

In [2]:
## Random seed
SEED = 20
np.random.seed(SEED)
tf.random.set_seed(SEED)

## Load dataset

In [3]:
files_dir = "../../data/3dshapes.h5"
# class_index=5 selects the latent from which the Task 3 labels are derived (orientation)
X_train, X_test, y_train, y_test, c_train, c_test = train_test_split_3dshapes(files_dir, 70000, DatasetTask.Task3, 
                                                                              train_size=0.80, class_index=5)

Training samples: 56000
Testing samples: 14000


In [4]:
n_classes = len(np.unique(np.concatenate([y_train, y_test])))
concept_names = SHAPES3D_CONCEPT_NAMES
concept_values = [len(np.unique(np.concatenate([c_train[:, i], c_test[:, i]]))) for i in range(c_train.shape[1])]

# Split off a validation set from the training data
X_train, X_valid = X_train[:40000], X_train[40000:]
y_train, y_valid = y_train[:40000], y_train[40000:]
c_train, c_valid = c_train[:40000], c_train[40000:]

In [5]:
# Reshape to the input format expected by the shift functions.
# For efficiency, we represent the images as 2-dimensional arrays during
# preprocessing (number of instances/batch size x flattened size).
# To visualise an image again, we need to reshape it back to its original dimensions.
ORIGINAL_SHAPE = X_test.shape[1:] # constant holding the original image shape
X_test_flatten = deepcopy(X_test.reshape(X_test.shape[0], -1))
X_train_flatten = deepcopy(X_train.reshape(X_train.shape[0], -1))
X_valid_flatten = deepcopy(X_valid.reshape(X_valid.shape[0], -1))
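
The flatten-and-restore round trip described in the comments above can be sketched on toy data. The 64x64x3 image size is an assumption made for illustration; the notebook reads the actual shape from `X_test`.

```python
import numpy as np

# Toy stand-in for X_test: 4 RGB images of 64x64 pixels (assumed size).
X = np.arange(4 * 64 * 64 * 3, dtype=np.float32).reshape(4, 64, 64, 3)

original_shape = X.shape[1:]          # (64, 64, 3)
X_flat = X.reshape(X.shape[0], -1)    # one flattened row per image

# Restore the image layout, e.g. before plotting or applying an image shift.
X_restored = X_flat.reshape(-1, *original_shape)

print(X_flat.shape, X_restored.shape)
```

Keeping the original shape in a constant is what makes the second reshape possible, since the flattened rows alone do not record the height, width, and channel layout.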

## Dimensionality reduction

We implemented the following dimensionality reduction methods:
- End-to-end model (label classifier/BBSD)
- Concept bottleneck model (CBSD)
- Principal component analysis (PCA)
- Sparse random projection (SRP)
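
In Rabanser et al., the label classifier yields two representations: the softmax outputs (BBSDs) and the hard predicted classes (BBSDh). The sketch below shows the difference on toy probabilities. The function name `bbsd_representations` is hypothetical, and we assume the CBSDs/CBSDh variants used later treat the concept bottleneck model's outputs analogously.

```python
import numpy as np

def bbsd_representations(softmax_outputs):
    """Return the soft (probabilities) and hard (predicted class) representations."""
    soft = np.asarray(softmax_outputs, dtype=float)  # shape: (n_samples, n_classes)
    hard = soft.argmax(axis=1)                       # shape: (n_samples,)
    return soft, hard

# Toy softmax outputs for three samples and two classes.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
soft, hard = bbsd_representations(probs)
print(soft.shape, hard)
```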

### End-to-end model

In [7]:
path = "../../models/end_to_end_3dshapes_task3"
# For training and saving
histories, end_to_end_model = end_to_end_neural_network(n_classes, Dataset.SHAPES3D, 
                         X_train, y_train, X_valid, y_valid, path)

# For loading
end_to_end_model = tf.keras.models.load_model(path)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: ../../models/end_to_end_3dshapes_task3/assets


In [8]:
# Evaluate model
y_pred = end_to_end_model.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      6622
           1       1.00      1.00      1.00      7378

    accuracy                           1.00     14000
   macro avg       1.00      1.00      1.00     14000
weighted avg       1.00      1.00      1.00     14000



### Concept bottleneck model

**Input to Concept**

In [7]:
path = "../../models/multitask_3dshapes"
# For training and saving
histories, mt_model = multitask_model(Dataset.SHAPES3D,
                                            X_train, c_train,
                                            X_valid, c_valid, path, concept_values)

# For loading
mt_model = tf.keras.models.load_model(path)

In [8]:
# Evaluate model
for i, pred in enumerate(mt_model.predict(X_test)):
    print("*"*20, f"Model: {SHAPES3D_CONCEPT_NAMES[i]}", "*"*20)
    c_truth = c_test[:, i]
    c_pred = np.argmax(pred, axis=1)
    
    print(classification_report(c_truth, c_pred))
    print("\n\n")

******************** Model: color ********************
              precision    recall  f1-score   support

           0       0.99      1.00      1.00      2386
           1       1.00      1.00      1.00      2451
           2       1.00      1.00      1.00      2442
           3       1.00      1.00      1.00      2439
           4       1.00      1.00      1.00      2432

    accuracy                           1.00     12150
   macro avg       1.00      1.00      1.00     12150
weighted avg       1.00      1.00      1.00     12150




******************** Model: shape ********************
              precision    recall  f1-score   support

           0       0.88      0.92      0.90      1251
           1       0.92      0.87      0.90      1213
           2       0.92      0.92      0.92      1173
           3       0.90      0.92      0.91      1227
           4       0.90      0.89      0.90      1219
           5       0.93      0.90      0.92      1227
           6       

**Concept to Output**

In [11]:
# Build and train the model. For simplicity, we use logistic regression,
# although it can be replaced by another model.
com = LogisticRegression()
com.fit(c_train, y_train)

LogisticRegression()

In [12]:
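y_test_pred = com.predict(c_test)
# Note: predictions are passed first here, whereas scikit-learn expects (y_true, y_pred),
# so the "support" column and the confusion-matrix rows count predicted labels.
print(classification_report(y_test_pred, y_test))
print(confusion_matrix(y_test_pred, y_test))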

              precision    recall  f1-score   support

           0       0.56      0.54      0.55      6201
           1       0.54      0.56      0.55      5949

    accuracy                           0.55     12150
   macro avg       0.55      0.55      0.55     12150
weighted avg       0.55      0.55      0.55     12150

[[3329 2872]
 [2643 3306]]
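
As a worked check of the numbers above, the accuracy and the per-class precision and recall can be recomputed from the printed confusion matrix. This sketch assumes the scikit-learn convention that rows index the first argument; since the call passes the predictions first, rows here index predicted labels and columns index true labels.

```python
# Confusion matrix as printed above: rows = predicted class, columns = true class.
cm = [[3329, 2872],
      [2643, 3306]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

predicted_counts = [sum(row) for row in cm]      # row sums
true_counts = [sum(col) for col in zip(*cm)]     # column sums

# Precision and recall with respect to the true labels.
precision = [cm[k][k] / predicted_counts[k] for k in range(2)]
recall = [cm[k][k] / true_counts[k] for k in range(2)]

print(f"accuracy = {accuracy:.3f}")
for k in range(2):
    print(f"class {k}: precision = {precision[k]:.3f}, recall = {recall[k]:.3f}")
```

The row sums (6201 and 5949) match the "support" column of the report, which confirms that support counts predicted rather than true labels here. With respect to the true labels, the precision and recall columns of the report are therefore swapped, while the accuracy is unaffected by the argument order.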


### Principal component analysis

In [13]:
pca, n_components = principal_components_analysis(X_train_flatten)
print(f"The number of components to explain 80% of variance is {n_components}.")

The number of components to explain 80% of variance is 2.
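
The internals of `principal_components_analysis` are defined in the scripts and are not shown here. As a minimal sketch of the idea, assuming the helper picks the smallest number of components whose cumulative explained variance reaches 80%, the selection can be written with NumPy alone. The function name `n_components_for_variance` and the toy data are hypothetical.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.80):
    """Smallest number of principal components reaching the variance threshold."""
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative_ratio = np.cumsum(explained) / explained.sum()
    # First index whose cumulative ratio reaches the threshold, plus one.
    return int(np.searchsorted(cumulative_ratio, threshold) + 1)

# Toy data: four independent features with standard deviations 3, 2, 1, 0.1.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 4)) * np.array([3.0, 2.0, 1.0, 0.1])
n_toy = n_components_for_variance(X_toy)
print(n_toy)
```

In the toy data the first component carries roughly 64% of the variance and the first two roughly 93%, so two components are needed to pass the 80% threshold.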


### Sparse random projection

In [14]:
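srp, n_components = sparse_random_projection(X_train_flatten)
# Note: the message mirrors the PCA cell; SRP components are random projections
# and are not ranked by explained variance.
print(f"The number of components to explain 80% of variance is {n_components}.")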

The number of components to explain 80% of variance is 2.
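
For intuition, a sparse random projection multiplies the flattened data by a mostly-zero random matrix. The sketch below uses a common construction (entries in {-v, 0, +v} with a small density of non-zeros); whether `sparse_random_projection` in the scripts uses exactly this scheme is an assumption, and `sparse_projection_matrix` is a hypothetical name.

```python
import numpy as np

def sparse_projection_matrix(n_features, n_components, density, rng):
    """Random matrix with entries in {-v, 0, +v}, v = sqrt(1 / (density * n_components))."""
    value = np.sqrt(1.0 / (density * n_components))
    signs = rng.choice([-1.0, 1.0], size=(n_features, n_components))
    mask = rng.random((n_features, n_components)) < density  # keep ~density of the entries
    return value * signs * mask

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 400))
R = sparse_projection_matrix(400, 32, density=1 / np.sqrt(400), rng=rng)
X_reduced = X_toy @ R   # project 400 features down to 32
print(X_reduced.shape)
```

Unlike PCA, the projection matrix is drawn independently of the data, so the number of components has to be chosen by some other criterion than explained variance.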


## Data collection

This section runs the experiments to collect data. We consider the dimensionality reduction methods discussed in the paper and thesis.

### PCA

In [6]:
method = DimensionalityReductor.PCA
model = pca
method_str = "PCA"

#### Knockout shift

In [7]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"
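
In Rabanser et al., a knockout shift removes a fraction of the samples of one class, creating class imbalance. The sketch below illustrates that idea. The function `knockout_shift` and its `delta` parameter are hypothetical, the actual implementation lives in `shift_applicator.py`, and we assume `MAJORITY` selects the class to knock out.

```python
import numpy as np

def knockout_shift(X, y, cl, delta, rng):
    """Drop a fraction `delta` of the samples whose label equals `cl`."""
    class_idx = np.flatnonzero(y == cl)
    n_drop = int(round(delta * class_idx.size))
    drop_idx = rng.choice(class_idx, size=n_drop, replace=False)
    keep = np.ones(y.shape[0], dtype=bool)
    keep[drop_idx] = False
    return X[keep], y[keep]

rng = np.random.default_rng(0)
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = rng.normal(size=(100, 8))
X_shifted, y_shifted = knockout_shift(X_toy, y_toy, cl=0, delta=0.5, rng=rng)
print(np.bincount(y_shifted))
```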

In [8]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [9]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.


### SRP

In [36]:
method = DimensionalityReductor.SRP
model = srp
method_str = "SRP"

#### Knockout shift

In [7]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"

In [8]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [9]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.


### BBSDs

In [36]:
method = DimensionalityReductor.BBSDs
model = end_to_end_model
method_str = "BBSDs"

#### Knockout shift

In [6]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"

In [6]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [27]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.


#### Concept shifts

In [None]:
shift_type = ShiftType.Concept

list_shift_str = [
    "floor_task3",
    "wall_task3",
    "object_scale_task3",
]

# Entries pair with list_shift_str in order
list_shift_type_params = [
    {"cl": MAJORITY, "concept_idx": 0}, # floor
    {"cl": MAJORITY, "concept_idx": 1}, # wall
    # object and scale are shifted together (scale is index 2 in the concept names)
    [{"cl": MAJORITY, "concept_idx": 2}, {"cl": MAJORITY, "concept_idx": 3}],
]

In [None]:
for shift_str, shift_type_params in tqdm(zip(list_shift_str, list_shift_type_params), total=len(list_shift_str)):
    dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)
    
    # Save
    save_result(shift_str, method_str, dict_result, True, "3dshapes")

#### Image shifts

In [None]:
list_shift = [
    ShiftType.Rotation,
    ShiftType.Shear,
    ShiftType.Flip,
    ShiftType.All
]

list_shift_str = [
    "rotation_task3",
    "shear_task3",
    "flip_task3",
    "all_task3"
]

shift_type_param = {"orig_dims": ORIGINAL_SHAPE}
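
The image shifts receive `orig_dims` because the test images are stored as flattened rows, and a geometric transformation needs the image layout back. A minimal sketch for a horizontal flip is shown below. The function `flip_flattened` is hypothetical and assumes a (height, width, channels) layout; the actual shift code lives in `shift_applicator.py`.

```python
import numpy as np

def flip_flattened(X_flat, orig_dims):
    """Horizontally flip images stored as flattened rows."""
    images = X_flat.reshape(-1, *orig_dims)       # back to (n, height, width, channels)
    flipped = images[:, :, ::-1, :]               # reverse the width axis
    return flipped.reshape(X_flat.shape[0], -1)   # flatten again for the detectors

# Two tiny "images" of height 2, width 3, and 1 channel.
toy_dims = (2, 3, 1)
X_flat_toy = np.arange(12, dtype=float).reshape(2, 6)
X_flipped = flip_flattened(X_flat_toy, toy_dims)
print(X_flipped[0])
```

Flipping twice returns the original rows, which is a quick sanity check that the reshape steps preserve the pixel order.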

In [None]:
for shift_str, shift_type in tqdm(zip(list_shift_str, list_shift), total=len(list_shift_str)):
    dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_param, n_exp=50, n_std=2)
    
    # Save
    save_result(shift_str, method_str, dict_result, True, "3dshapes")

#### Gaussian shift

In [None]:
shift_type = ShiftType.Gaussian
shift_type_params = None
shift_str = "gaussian_task3"
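
A Gaussian shift perturbs the inputs with additive noise. The sketch below adds zero-mean Gaussian noise to a random fraction of the rows. The function `gaussian_shift` and its `sigma` and `fraction` parameters are hypothetical; how `main_experiment` chooses the noise level and the affected fraction is defined in the scripts and is not shown here.

```python
import numpy as np

def gaussian_shift(X, sigma, fraction, rng):
    """Add zero-mean Gaussian noise to a random `fraction` of the rows of X."""
    X_shifted = X.copy()
    n_affected = int(round(fraction * X.shape[0]))
    idx = rng.choice(X.shape[0], size=n_affected, replace=False)
    X_shifted[idx] += rng.normal(0.0, sigma, size=(n_affected, X.shape[1]))
    return X_shifted, idx

rng = np.random.default_rng(0)
X_toy = np.zeros((10, 4))
X_noisy, affected = gaussian_shift(X_toy, sigma=0.1, fraction=0.5, rng=rng)
changed = np.any(X_noisy != X_toy, axis=1)
print(changed.sum())
```

For image data, one would typically also clip the result to the valid pixel range; that step is omitted here for brevity.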

In [None]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [None]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

### BBSDh

In [36]:
method = DimensionalityReductor.BBSDh
model = end_to_end_model
method_str = "BBSDh"

#### Knockout shift

In [6]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"

In [6]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [27]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.


#### Concept shifts

In [None]:
shift_type = ShiftType.Concept

list_shift_str = [
    "floor_task3",
    "wall_task3",
    "object_scale_task3",
]

# Entries pair with list_shift_str in order
list_shift_type_params = [
    {"cl": MAJORITY, "concept_idx": 0}, # floor
    {"cl": MAJORITY, "concept_idx": 1}, # wall
    # object and scale are shifted together (scale is index 2 in the concept names)
    [{"cl": MAJORITY, "concept_idx": 2}, {"cl": MAJORITY, "concept_idx": 3}],
]

In [None]:
for shift_str, shift_type_params in tqdm(zip(list_shift_str, list_shift_type_params), total=len(list_shift_str)):
    dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)
    
    # Save
    save_result(shift_str, method_str, dict_result, True, "3dshapes")

#### Image shifts

In [None]:
list_shift = [
    ShiftType.Rotation,
    ShiftType.Shear,
    ShiftType.Flip,
    ShiftType.All
]

list_shift_str = [
    "rotation_task3",
    "shear_task3",
    "flip_task3",
    "all_task3"
]

shift_type_param = {"orig_dims": ORIGINAL_SHAPE}

In [None]:
for shift_str, shift_type in tqdm(zip(list_shift_str, list_shift), total=len(list_shift_str)):
    dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_param, n_exp=50, n_std=2)
    
    # Save
    save_result(shift_str, method_str, dict_result, True, "3dshapes")

#### Gaussian shift

In [None]:
shift_type = ShiftType.Gaussian
shift_type_params = None
shift_str = "gaussian_task3"

In [None]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [None]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

### CBSDs

In [36]:
method = DimensionalityReductor.CBSDs
model = mt_model
method_str = "CBSDs"

#### Knockout shift

In [7]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"

In [8]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [9]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.


### CBSDh

In [36]:
method = DimensionalityReductor.CBSDh
model = mt_model
method_str = "CBSDh"

#### Knockout shift

In [7]:
shift_type = ShiftType.Knockout
shift_type_params = {"cl": MAJORITY}
shift_str = "ko_task3"

In [8]:
dict_result = main_experiment(model, method, X_valid, y_valid,
                             c_valid, X_test_flatten, y_test, c_test,
                             shift_type, ORIGINAL_SHAPE, n_classes,
                             concept_names, concept_values, 
                             shift_type_params, n_exp=50, n_std=2)

In [9]:
# Save file
save_result(shift_str, method_str, dict_result, True, "3dshapes")

Saving successfully.
