# Question 2

In this question we will:

- Compute GLCM and LBP texture descriptors for a training dataset of texture images.
- Compute GLCM and LBP texture descriptors for a test dataset of texture images.
- Train a K-nearest-neighbours (KNN) classifier using the texture descriptors extracted from training images.
- Classify the texture descriptors from the test dataset using the KNN classifier.


# Step 1: Compute the texture descriptions for the training images.

For each training image, calculate a vector of GLCM features.  Which GLCM features and which set of displacements you use are up to you (note that displacements for `skimage.feature.graycomatrix()` need to be specified by distances and angles in **radians**).  Experiment with different combinations of displacements and features to obtain the best possible classification rate.  Use conservative choices to begin with until everything is working, then come back and experiment.  As described in the Topic 10 lecture notes, use `skimage.feature.graycomatrix()` and `skimage.feature.graycoprops()` to calculate GLCM features.  You'll probably want to use `normed=True` with `graycomatrix()`.  Your GLCM features should be stored as a 120-row by m-column array (m will depend on how many different features and displacements you used and on whether you combine values for different displacements, e.g., by taking their mean).  

_Hint: Pay close attention to the format of the return values of  `graycomatrix()` and `graycoprops()`._

For each training image, calculate the rotationally invariant LBP features using `skimage.feature.local_binary_pattern()`.  You can experiment with parameters `P` and `R` to get a good classification rate, but probably `P=8` and `R=1` are good enough.   For the `method` parameter, use `'uniform'` which gives you the rotationally-invariant uniform LBP variant we talked about in class.   Remember that `skimage.feature.local_binary_pattern()` returns an "LBP Image", which is an image in which, when P=8, the pixel value is between 0 and 9, and corresponds to one of the ten possible pattern labels.  It's up to you to turn the "LBP Image" into a 10-bin histogram, which serves as the feature vector for that image (you can use `numpy.histogram()` for this but again remember to specify `bins` and `range` parameters, and that it returns two things, and you only need the first one). 
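
As a rough sketch of the step just described, the snippet below turns an "LBP Image" into a normalized 10-bin histogram for `P=8`, `R=1`. The random image is a hypothetical placeholder for a real texture patch, and normalizing the histogram is an optional choice, not something the assignment requires.

```python
import numpy as np
from skimage.feature import local_binary_pattern

# Hypothetical stand-in for a texture patch.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

P, R = 8, 1
lbp_image = local_binary_pattern(image, P=P, R=R, method="uniform")

# With method='uniform' the labels are the integers 0..P+1, so P+2 bins of
# width 1 over the range (0, P+2) give exactly one bin per label.
n_bins = P + 2
hist, _ = np.histogram(lbp_image, bins=n_bins, range=(0, n_bins))

# Optional: normalize so the histogram sums to 1.
hist = hist / hist.sum()
print(hist.shape)   # (10,)
```

Passing an integer `bins` together with `range` is what makes `range` take effect; if `bins` is given as an explicit array of edges, `numpy.histogram()` uses those edges and ignores `range`.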

Additionally, calculate the LBP variance feature, again using `skimage.feature.local_binary_pattern()` but with `method='var'` instead.  This is the VAR feature we saw in class.  Use the same P and R as before.  Build a 16-bin histogram of the resulting 'LBP-VAR' image; use `range=(0,7000)` with `numpy.histogram()` (this is not quite "correct", but it's good enough).  Concatenate these with the rotationally invariant LBP features so that you have a 26-element feature vector for each training image.   These should be stored as a 120-row, 26-column array (26 columns assuming P=8).
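
The following sketch shows one way the 26-element vector could be assembled for a single image. The random image is again a hypothetical placeholder. Filtering out non-finite values before building the VAR histogram is a defensive assumption on my part (perfectly flat neighbourhoods may not yield a usable variance value), not something the assignment asks for.

```python
import numpy as np
from skimage.feature import local_binary_pattern

# Hypothetical stand-in for a texture patch.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

P, R = 8, 1

# Rotation-invariant uniform LBP histogram (10 bins for P=8).
lbp_image = local_binary_pattern(image, P=P, R=R, method="uniform")
lbp_hist, _ = np.histogram(lbp_image, bins=P + 2, range=(0, P + 2))
lbp_hist = lbp_hist / lbp_hist.sum()

# LBP-VAR histogram (16 bins over the range suggested in the text).
var_image = local_binary_pattern(image, P=P, R=R, method="var")
finite_var = var_image[np.isfinite(var_image)]   # defensive, see lead-in
var_hist, _ = np.histogram(finite_var, bins=16, range=(0, 7000))
var_hist = var_hist / max(var_hist.sum(), 1)     # avoid dividing by zero

# Concatenate into one feature vector per image.
features = np.concatenate([lbp_hist, var_hist])
print(features.shape)   # (26,)
```

Stacking one such vector per training image with `np.array(...)` or `np.vstack(...)` would give the 120-row, 26-column array.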

You can do this all in one loop which builds both feature arrays.

In [357]:
# Write your code here.
import numpy as np
import skimage.io as io
import skimage.feature as feature
import os
import pandas as pd

# Read training images
brodatztraining_path = '/u1/cmpt487-819/data/asn5/brodatztraining/'
brodatztraining_files = pd.read_csv("/u1/cmpt487-819/data/asn5/brodatztraining.csv", header=None)
brodatztraining_file_list = brodatztraining_files[0].tolist()

glcm_training_feature_vector = []
lbp_uniform_training_feature_vector = []
lbp_variance_training_feature_vector = []
properties = ['contrast', 'homogeneity', 'energy', 'correlation']

distances = [1, 2, 5, 10]
angles = [0, np.pi/4, np.pi/2, 3*np.pi/4]

for i in range(len(brodatztraining_file_list)):
    image_path = os.path.join(brodatztraining_path, brodatztraining_file_list[i])   
    image = io.imread(image_path)
    
    glcm = feature.graycomatrix(image, distances=distances, levels=256, symmetric=True, angles=angles, normed=True)
    glcm_features = []
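    for prop in properties:
        # graycoprops returns a (n_distances, n_angles) array; averaging over axis 1
        # (the angles) leaves one value per distance: 4 properties x 4 distances = 16 features.
        glcm_features.extend(feature.graycoprops(glcm, prop).mean(axis=1)) 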
    glcm_training_feature_vector.append(glcm_features)
    
    

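    # NOTE: the assignment text suggests P=8, R=1; this run uses P=24, R=3, for which
    # method='uniform' produces labels 0..25. The explicit bin edges 0..10 keep only
    # labels 0..10 (10 bins, the last one being [9, 10]) and drop the rest.
    # np.histogram ignores `range` when `bins` is a sequence of edges, so it is omitted.
    lbp = feature.local_binary_pattern(image, P=24, R=3, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11))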
    lbp_hist = lbp_hist / lbp_hist.sum()
    lbp_uniform_training_feature_vector.append(lbp_hist)
    
    
    lbp_var = feature.local_binary_pattern(image, P=8, R=1, method='var')
    lbp_var_hist, _ = np.histogram(lbp_var, bins=16, range=(0, 7000)) 
    lbp_var_hist = lbp_var_hist / lbp_var_hist.sum()
    lbp_variance_training_feature_vector.append(lbp_var_hist)
    
    
    print(f"\rIteration {i + 1} of {len(brodatztraining_file_list)}", end="")

glcm_training_feature_vector = np.array(glcm_training_feature_vector)
lbp_uniform_training_feature_vector = np.array(lbp_uniform_training_feature_vector)
lbp_variance_training_feature_vector = np.array(lbp_variance_training_feature_vector)

print("\nFeature Vector Shape:", glcm_training_feature_vector.shape)
print("LBP Uniform Feature Vector Shape:", lbp_uniform_training_feature_vector.shape)
print("LBP Variance Feature Vector Shape:", lbp_variance_training_feature_vector.shape)

Iteration 120 of 120
Feature Vector Shape: (120, 16)
LBP Uniform Feature Vector Shape: (120, 10)
LBP Variance Feature Vector Shape: (120, 16)


# Step 2: Compute Test Image Features

Compute the exact same features as you did in step 1 for each of the test images.  Store them in the same way (these arrays will just have more rows, specifically 320 rows, one for each testing sample). For GLCM you'll probably have trouble beating 65% classification rate.  For LBP you should be able to get 95% or better.

In [358]:
# Write your code here.  
brodatztesting_path = '/u1/cmpt487-819/data/asn5/brodatztesting/'
brodatztesting_files = pd.read_csv("/u1/cmpt487-819/data/asn5/brodatztesting.csv", header=None)
brodatztesting_file_list = brodatztesting_files[0].tolist()

glcm_testing_feature_vector = []
lbp_uniform_testing_feature_vector = []
lbp_variance_testing_feature_vector = []
properties = ['contrast', 'homogeneity', 'energy', 'correlation']

distances = [1, 2, 5, 10]
angles = [0, np.pi/4, np.pi/2, 3*np.pi/4]

for i in range(len(brodatztesting_file_list)):
    image_path = os.path.join(brodatztesting_path, brodatztesting_file_list[i])
    image = io.imread(image_path)
    
    glcm = feature.graycomatrix(image, distances=distances, angles=angles, levels=256, symmetric=True, normed=True)

    glcm_features = []
    for prop in properties:
        # Same reduction as for training: mean over angles, one value per distance.
        glcm_features.extend(feature.graycoprops(glcm, prop).mean(axis=1)) 
    
    glcm_testing_feature_vector.append(glcm_features)
    
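    # Must match the training settings exactly (P=24, R=3, bin edges 0..10).
    # np.histogram ignores `range` when `bins` is a sequence of edges, so it is omitted.
    lbp = feature.local_binary_pattern(image, P=24, R=3, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11))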
    lbp_hist = lbp_hist / lbp_hist.sum()
    lbp_uniform_testing_feature_vector.append(lbp_hist)
    
    
    lbp_var = feature.local_binary_pattern(image, P=8, R=1, method='var')
    lbp_var_hist, _ = np.histogram(lbp_var, bins=16, range=(0, 7000)) 
    lbp_var_hist = lbp_var_hist / lbp_var_hist.sum()
    lbp_variance_testing_feature_vector.append(lbp_var_hist) 

    print(f"\rIteration {i + 1} of {len(brodatztesting_file_list)}", end="")

glcm_testing_feature_vector = np.array(glcm_testing_feature_vector)
lbp_uniform_testing_feature_vector = np.array(lbp_uniform_testing_feature_vector)
lbp_variance_testing_feature_vector = np.array(lbp_variance_testing_feature_vector)

print("\nFeature Vector Shape:", glcm_testing_feature_vector.shape)
print("LBP Uniform Feature Vector Shape:", lbp_uniform_testing_feature_vector.shape)
print("LBP Variance Feature Vector Shape:", lbp_variance_testing_feature_vector.shape)

Iteration 320 of 320
Feature Vector Shape: (320, 16)
LBP Uniform Feature Vector Shape: (320, 10)
LBP Variance Feature Vector Shape: (320, 16)


# Step 3: Generate Label Arrays for the Training and Testing Data

Use label 1 for the first class, label 2 for the second class, etc.   This should be easy to do since the filenames are ordered in blocks of 15 or 40 images of each class for training and testing respectively.
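
Because the files come in equal-sized blocks, the label arrays can be sketched with `numpy.repeat()`. This assumes 8 classes (120 / 15 = 320 / 40 = 8); the variable names are illustrative.

```python
import numpy as np

num_classes = 8        # assumed from 120 / 15 and 320 / 40
train_per_class = 15
test_per_class = 40

# np.repeat repeats each class label a fixed number of times, in order:
# [1, 1, ..., 1, 2, 2, ..., 2, ..., 8, 8, ..., 8]
class_ids = np.arange(1, num_classes + 1)
training_labels = np.repeat(class_ids, train_per_class)
testing_labels = np.repeat(class_ids, test_per_class)

print(training_labels.shape, testing_labels.shape)   # (120,) (320,)
```

The result is already an integer array, so no later cast with `astype(int)` is needed.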

In [359]:
# Write your code for step 3 here.  
training_labels = np.zeros(len(brodatztraining_file_list))
testing_labels = np.zeros(len(brodatztesting_file_list))

training_files_per_class = 15 
testing_files_per_class = 40 
num_classes = 8 


for class_label in range(1, num_classes + 1):
    start_idx = (class_label - 1) * training_files_per_class
    end_idx = class_label * training_files_per_class
    training_labels[start_idx:end_idx] = class_label

for class_label in range(1, num_classes + 1):
    start_idx = (class_label - 1) * testing_files_per_class
    end_idx = class_label * testing_files_per_class
    testing_labels[start_idx:end_idx] = class_label


# Step 4:  Train a KNN classifier.  

Train a KNN classifier using your GLCM features.  Train another one using your LBP features.
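
For orientation, here is a small sketch of the fit/predict pattern with `sklearn.neighbors.KNeighborsClassifier`. The two well-separated synthetic clusters are hypothetical stand-ins for real feature arrays, and `n_neighbors=5` is simply the scikit-learn default, not a tuned value.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical feature arrays: two classes, 15 samples each, 4 features,
# placed far apart so the outcome is unambiguous.
class1 = rng.normal(loc=0.0, scale=0.1, size=(15, 4))
class2 = rng.normal(loc=5.0, scale=0.1, size=(15, 4))
X_train = np.vstack([class1, class2])
y_train = np.repeat([1, 2], 15)

# One row per sample, one label per row.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Predict labels for new samples with the same number of columns.
X_test = np.array([[0.0, 0.0, 0.0, 0.0],
                   [5.0, 5.0, 5.0, 5.0]])
pred = clf.predict(X_test)
print(pred)   # [1 2]
```

The same pattern applies to each feature set: one classifier fitted on the GLCM array, another on the LBP array, each with the same training labels.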



In [360]:
import sklearn.neighbors as knn

# Write your code here. This should be quite short.

knn_glcm = knn.KNeighborsClassifier(n_neighbors = 5)
knn_glcm.fit(glcm_training_feature_vector,training_labels)


combined_lbp_training = np.hstack((lbp_uniform_training_feature_vector, lbp_variance_training_feature_vector))
combined_lbp_testing = np.hstack((lbp_uniform_testing_feature_vector, lbp_variance_testing_feature_vector))

knn_lbp = knn.KNeighborsClassifier(n_neighbors = 5)
knn_lbp.fit(combined_lbp_training,training_labels)



# Step 5:  Predict the classes of the test images

Predict the classes of the test images using both classifiers.

In [361]:
# Write your code here.  Again this should be quite short.

testing_labels = testing_labels.astype(int)


predicted_labels_glcm = knn_glcm.predict(glcm_testing_feature_vector).astype(int)

predicted_labels_lbp = knn_lbp.predict(combined_lbp_testing).astype(int)


# Step 6:  Display Results

Display results as in the final step of Question 1.  For each classifier display the image filenames that were incorrectly classified, the confusion matrix, and the classification rate.  





In [362]:
# Write your code here.
# Define a confusion matrix function
def confusion_matrix(y_true, y_pred, num_classes):
    conf_matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(y_true, y_pred):
        conf_matrix[true - 1][pred - 1] += 1
    return conf_matrix

# GLCM results
conf_matrix_glcm = confusion_matrix(testing_labels, predicted_labels_glcm, num_classes=8)
correct_classifications_glcm = np.trace(conf_matrix_glcm)
total_samples_glcm = np.sum(conf_matrix_glcm)
classification_rate_glcm = correct_classifications_glcm / total_samples_glcm * 100
misclassified_indices_glcm = np.where(testing_labels != predicted_labels_glcm)[0]
misclassified_files_glcm = [brodatztesting_file_list[i] for i in misclassified_indices_glcm]

# LBP Uniform results
conf_matrix_lbp = confusion_matrix(testing_labels, predicted_labels_lbp, num_classes=8)
correct_classifications_lbp = np.trace(conf_matrix_lbp)
total_samples_lbp = np.sum(conf_matrix_lbp)
classification_rate_lbp = correct_classifications_lbp / total_samples_lbp * 100
misclassified_indices_lbp = np.where(testing_labels != predicted_labels_lbp)[0]
misclassified_files_lbp = [brodatztesting_file_list[i] for i in misclassified_indices_lbp]

# Display GLCM results
print("GLCM Results:")
print("Confusion Matrix:")
print(conf_matrix_glcm)
print("\nClassification Rate: {:.2f}%".format(classification_rate_glcm))
print("\nMisclassified Images:")
for idx in misclassified_indices_glcm:
    print(f"{brodatztesting_file_list[idx]} | True: {testing_labels[idx]} | Predicted: {predicted_labels_glcm[idx]}")

# Display LBP Uniform results
print("\nLBP Uniform Results:")
print("Confusion Matrix:")
print(conf_matrix_lbp)
print("\nClassification Rate: {:.2f}%".format(classification_rate_lbp))
print("\nMisclassified Images:")
for idx in misclassified_indices_lbp:
    print(f"{brodatztesting_file_list[idx]} | True: {testing_labels[idx]} | Predicted: {predicted_labels_lbp[idx]}")

GLCM Results:
Confusion Matrix:
[[40  0  0  0  0  0  0  0]
 [33  7  0  0  0  0  0  0]
 [ 2  0 38  0  0  0  0  0]
 [ 1  0  0 32  0  0  7  0]
 [ 0  0  0  0 40  0  0  0]
 [ 0  0  0  0  0 40  0  0]
 [ 0  0  0  0  0  0 40  0]
 [ 0  0  0  0  0  0  0 40]]

Classification Rate: 86.56%

Misclassified Images:
patch-203221.png | True: 2 | Predicted: 1
patch-204094.png | True: 2 | Predicted: 1
patch-205575.png | True: 2 | Predicted: 1
patch-206787.png | True: 2 | Predicted: 1
patch-208216.png | True: 2 | Predicted: 1
patch-208437.png | True: 2 | Predicted: 1
patch-208618.png | True: 2 | Predicted: 1
patch-209739.png | True: 2 | Predicted: 1
patch-210674.png | True: 2 | Predicted: 1
patch-212706.png | True: 2 | Predicted: 1
patch-213095.png | True: 2 | Predicted: 1
patch-213305.png | True: 2 | Predicted: 1
patch-214023.png | True: 2 | Predicted: 1
patch-216523.png | True: 2 | Predicted: 1
patch-217835.png | True: 2 | Predicted: 1
patch-218050.png | True: 2 | Predicted: 1
patch-221458.png | True: 2 

# Step 7: Reflections

Answer the following questions right here in this block:

- Discuss the performance difference of the two different texture features.  Hypothesize reasons for observed differences.
	
	_Your answer:_
    
    
	1. GLCM features:
		- The GLCM features gave a classification rate of 86.56%, which is decent but well below the 99.69% obtained with the LBP features.
		- GLCM features are second-order statistics that capture the spatial relationships between pixel intensities at fixed displacements. This makes them sensitive to rotation or scaling of the texture pattern.
		- Since the test dataset contained rotated examples of the textures, GLCM struggled because it has no inherent rotation invariance. Even with multiple angles considered, the texture's spatial arrangement at each displacement changes significantly under rotation, which reduces its ability to generalize.
	2. LBP features:
		- The LBP features achieved a classification rate of 99.69%.
		- The uniform LBP variant used here is rotation invariant and unaffected by monotonic gray-level changes, making it more robust for datasets with varied orientations and intensities. This is particularly advantageous for a dataset like this one, which includes rotated textures.
		- Adding the LBP variance (VAR) histogram further improved discriminative power by capturing local contrast, which the LBP labels alone discard. This likely helped separate finer details that GLCM could not capture effectively.

- For each of your two classifiers, discuss the misclassified images.  Were there any classes that were particularly difficult to distinguish?  Do the misclassified images (over all classes) have anything in common that would cause them to be misclassified?  If so what do they have in common, and why do you think it is confusing the classifier?

	_Your answer:_
    
    
	1. GLCM misclassified images:
		- The confusion matrix shows that class 2 was the hardest: 33 of its 40 test images were predicted as class 1. There was also some confusion between classes 4 and 7 (7 class-4 images predicted as class 7), and a few images from classes 3 and 4 were predicted as class 1.
		- The misclassified images often had textures with subtle spatial differences that GLCM could not capture effectively. For instance:
			- Class 2 textures may exhibit patterns similar to class 1 but slightly rotated or scaled.
			- Class 4 textures may have co-occurrence properties similar to those of class 7 due to shared features like edges or periodicity.
		- Common causes of misclassification:
			- Lack of rotation invariance in GLCM features, since the spatial arrangement shifts under rotation.
			- Overlapping intensity patterns in different texture classes that cannot be distinguished by co-occurrence matrices alone.
	2. LBP misclassified images:
		- Only one image (patch-850990.png) was misclassified; its true label is class 8 and it was predicted as class 7.
		- Possible reasons:
			- The texture in this image likely shares similar edge orientation or contrast features with class 7, leading to confusion in the LBP histogram representation.
			- Variations in local texture density, or an uneven representation of patterns in the image, may have pushed its LBP histogram bins close to those of class 7.