This notebook was created at 21:42 16/10/20

Multi-class classification of liquid crystal phases using deep learning techniques

This notebook is a first attempt at classifying multiple liquid crystal phases using deep learning, specifically a CNN architecture.
The aim is to build on previous notebooks that performed various binary phase classifications from LC textures.

In [76]:
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Model
from keras.layers import Input, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
from keras.layers import AveragePooling2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D
from keras.layers.experimental.preprocessing import Rescaling
from keras.preprocessing import image_dataset_from_directory
from sklearn.metrics import confusion_matrix

Load in the data.

In [68]:
train_directory = "C:/Users/Jason/Documents/University/Year_4/MPhys_Project(s)/Liquid_crystals-machine_learning/LiquidCrystalMachineLearning/Images_categorized/Train"
test_directory = "C:/Users/Jason/Documents/University/Year_4/MPhys_Project(s)/Liquid_crystals-machine_learning/LiquidCrystalMachineLearning/Images_categorized/Test"
image_size = (368,640)

train_dataset = image_dataset_from_directory(train_directory,
                            labels="inferred",
                            label_mode="categorical",
                            color_mode="rgb",
                            batch_size=64,
                            image_size=image_size,
                            shuffle=True
                        )
val_dataset = image_dataset_from_directory(test_directory,
                            labels="inferred",
                            label_mode="categorical",
                            color_mode="rgb",
                            batch_size=64,
                            image_size=image_size,
                            shuffle=True
                        )

Found 462 files belonging to 5 classes.
Found 64 files belonging to 5 classes.


Let's see if the files imported as expected.

In [69]:
print(train_dataset.element_spec)
print(train_dataset.class_names)
for data, labels in train_dataset:
    print(data.shape)
    print(data.dtype)
    print(labels.shape)
    print(labels.dtype)

(TensorSpec(shape=(None, 368, 640, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 5), dtype=tf.float32, name=None))
['0.Isotropic', '1.Nematic', '2.SmA', '3.SmF', '4.Cholesteric']
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>
(14, 368, 640, 3)
<dtype: 'float32'>
(14, 5)
<dtype: 'float32'>
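
The shapes above confirm that the batches load, but they do not show how many images each phase contributes. A minimal sketch of counting images per class by summing the one-hot labels over all batches follows. `count_per_class` is a hypothetical helper, not part of the original notebook, and the small arrays are illustrative stand-ins for the `labels` tensors yielded by `train_dataset`.

```python
import numpy as np

def count_per_class(label_batches, num_classes):
    """Sum one-hot label batches into a per-class image count."""
    counts = np.zeros(num_classes, dtype=int)
    for labels in label_batches:
        # Each row is one-hot, so a column sum counts the images of that class.
        counts += np.asarray(labels).sum(axis=0).astype(int)
    return counts

# Stand-in batches of one-hot labels (illustrative values, not the real data).
demo_label_batches = [
    np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]], dtype="float32"),
    np.array([[0, 1, 0], [0, 0, 1]], dtype="float32"),
]
demo_class_counts = count_per_class(demo_label_batches, num_classes=3)
print(demo_class_counts)
```

With the notebook's data this could presumably be called as `count_per_class((labels for _, labels in train_dataset), 5)`, which would show directly how the five phases are distributed.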


In [70]:
print(val_dataset.element_spec)
print(val_dataset.class_names)
for data, labels in val_dataset:
    print(data.shape)
    print(data.dtype)
    print(labels.shape)
    print(labels.dtype)

(TensorSpec(shape=(None, 368, 640, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 5), dtype=tf.float32, name=None))
['0.Isotropic', '1.Nematic', '2.SmA', '3.SmF', '4.Cholesteric']
(64, 368, 640, 3)
<dtype: 'float32'>
(64, 5)
<dtype: 'float32'>


Let's define our pipeline.

In [71]:
image_shape = (image_size[0], image_size[1], 3)
X_inputs = Input(shape = image_shape)
# Rescale images to have values in range [0,1]
X = Rescaling(scale = 1/255)(X_inputs)

# Apply convolutional and pooling layers
X = Conv2D(filters=32, kernel_size=(3,3), activation="relu")(X)
X = MaxPooling2D(pool_size=(3,3))(X)
X = Conv2D(filters=32, kernel_size=(3,3), activation="relu")(X)
X = MaxPooling2D(pool_size=(3,3))(X)

# Apply fully connected layer
X = Flatten()(X)
X = Dense(units=128, activation="relu")(X)
X = Dense(units=64, activation="relu")(X)

# Output layer
num_classes = 5
X_outputs = Dense(units=num_classes, activation="softmax")(X)

model = Model(inputs = X_inputs, outputs = X_outputs)

Let's see what this model looks like.

In [72]:
model.summary()

Model: "functional_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 368, 640, 3)]     0         
_________________________________________________________________
rescaling_3 (Rescaling)      (None, 368, 640, 3)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 366, 638, 32)      896       
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 122, 212, 32)      0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 120, 210, 32)      9248      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 40, 70, 32)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 89600)            
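
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults used in the model cell: `Conv2D` with "valid" padding and stride 1, and `MaxPooling2D` with "valid" padding and a stride equal to the pool size. The helper names are hypothetical, and the parameter count for the first dense layer is computed here rather than read from the summary, which is cut off after the flatten layer.

```python
def conv_out(n, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return n - kernel + 1

def pool_out(n, pool):
    """Output length of 'valid' pooling with stride equal to the pool size."""
    return (n - pool) // pool + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

h, w = 368, 640
shapes = []
for _ in range(2):  # two Conv2D(3x3) + MaxPooling2D(3x3) stages
    h, w = conv_out(h, 3), conv_out(w, 3)
    shapes.append((h, w))
    h, w = pool_out(h, 3), pool_out(w, 3)
    shapes.append((h, w))

flat_units = h * w * 32                      # Flatten over 32 feature maps
dense_128_params = flat_units * 128 + 128    # weights + biases of Dense(128)

print(shapes, flat_units)
print(conv_params(3, 3, 32), conv_params(3, 32, 32), dense_128_params)
```

The flatten layer feeds 89,600 values into `Dense(units=128)`, so that single layer accounts for about 11.5 million parameters, far more than both convolutional layers combined.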

Now we need to compile, train and test the model.

In [73]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

In [74]:
model.fit(train_dataset, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x208ca736748>

Let's see how the model does on unseen data.

In [75]:
loss, acc = model.evaluate(val_dataset)



Let's see the confusion matrix on our predictions to see where our model is struggling.

In [80]:
predictions = model.predict(val_dataset)
y_pred = np.argmax(predictions, axis = 1)

In [88]:
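# Get true labels. Caution: val_dataset was built with shuffle=True, so this second pass
# over it may not return the samples in the same order that model.predict saw them.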
y_true = np.argmax(np.concatenate([labels for data, labels in val_dataset], axis=0), axis=1)
print(y_true)
print(y_pred)

print("Confusion matrix:")
print(confusion_matrix(y_true=y_true, y_pred=y_pred, normalize="true"))

[2 4 3 3 3 4 2 4 4 4 1 4 4 3 4 2 1 2 0 4 1 4 4 3 1 1 4 2 1 2 2 1 1 1 0 1 2
 2 3 3 2 0 0 2 4 3 2 1 0 4 2 4 3 2 4 2 4 2 1 0 0 4 4 2]
[4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 0 4 4 4 4 4 4 4
 4 0 4 4 4 4 4 4 4 4 4 0 4 4 4 4 4 4 0 0 4 4 1 4 4 4 4]
Confusion matrix:
[[0.14285714 0.14285714 0.         0.         0.71428571]
 [0.         0.         0.         0.         1.        ]
 [0.11764706 0.         0.         0.         0.88235294]
 [0.11111111 0.         0.         0.         0.88888889]
 [0.05263158 0.05263158 0.         0.         0.89473684]]
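
A confusion matrix is only meaningful if `y_true` and `y_pred` refer to the same samples in the same order. Since `val_dataset` was created with `shuffle=True`, iterating it once for predictions and again for labels may break that pairing. A minimal sketch of gathering both in a single pass is shown below; `collect_labels_and_predictions` is a hypothetical helper, and the arrays and the identity "model" are illustrative stand-ins, not the notebook's data.

```python
import numpy as np

def collect_labels_and_predictions(batches, predict_fn):
    """Return (y_true, y_pred) class indices gathered in a single pass."""
    y_true, y_pred = [], []
    for data, labels in batches:
        probs = predict_fn(data)  # predict on the same batch the labels came from
        y_true.append(np.argmax(np.asarray(labels), axis=1))
        y_pred.append(np.argmax(np.asarray(probs), axis=1))
    return np.concatenate(y_true), np.concatenate(y_pred)

# Stand-in batches: the fake "model" simply echoes its input as class scores.
demo_pairs = [
    (np.array([[0.9, 0.1, 0.0], [0.2, 0.1, 0.7]]), np.array([[1, 0, 0], [0, 0, 1]])),
    (np.array([[0.1, 0.8, 0.1]]), np.array([[0, 0, 1]])),
]
demo_true, demo_pred = collect_labels_and_predictions(demo_pairs, lambda x: x)
print(demo_true, demo_pred)
```

With the notebook's objects this might be called as `collect_labels_and_predictions(val_dataset, model.predict_on_batch)`. Building the validation dataset with `shuffle=False` would be an alternative way to keep the order fixed.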


An imbalanced training dataset (many more cholesteric images than images of the other phases) is one major flaw with the current model: it predicts cholesteric (class 4) for the large majority of the validation images.
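
One common way to counter an imbalance like this is to weight the loss so that rarer classes count for more. The sketch below computes inverse-frequency weights; `inverse_frequency_weights` is a hypothetical helper, the counts are made-up illustrative numbers rather than the real dataset's, and every class is assumed to have at least one image. This approach was not tried in the notebook.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rarer classes weigh more."""
    total = sum(counts)
    num_classes = len(counts)
    return {index: total / (num_classes * count) for index, count in enumerate(counts)}

# Illustrative per-class image counts (not the real dataset's numbers).
demo_counts = [10, 10, 20, 10, 50]
class_weight = inverse_frequency_weights(demo_counts)
print(class_weight)
```

A dictionary of this form could be passed to Keras as `model.fit(train_dataset, epochs=3, class_weight=class_weight)`. Whether that alone would fix the bias toward cholesteric here is untested.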