The usual preliminaries...


In [13]:
pip install scikit-learn


Note: you may need to restart the kernel to use updated packages.




In [14]:
pip install datasets

Note: you may need to restart the kernel to use updated packages.




In [15]:
from datasets import load_dataset

imdb_dataset = load_dataset("imdb")['train']

In [16]:
train_data = []
train_data_labels = []
for item in imdb_dataset:
  train_data.append(item['text'])
  train_data_labels.append(item['label'])

In [17]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(analyzer='word',max_features=1000,lowercase=True,stop_words='english',ngram_range=(1,2))
features = vectorizer.fit_transform(train_data).toarray()

In [18]:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(features,train_data_labels,train_size=0.9,random_state=123)

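We will use three Decision Tree models, each configured with a different `criterion` setting. Note that in scikit-learn `log_loss` and `entropy` both measure Shannon information gain, so models 1 and 3 use the same splitting rule and differ mainly through the randomness in how ties between equally good splits are broken (no `random_state` is set).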
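As a rough illustration of what a splitting criterion measures, the sketch below computes Gini impurity (`criterion='gini'`) and Shannon entropy (`criterion='entropy'`) for a single node. The helper names `gini_impurity` and `entropy_impurity` and the class counts are made up for this example; they are not part of scikit-learn.

```python
import math

def gini_impurity(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy_impurity(counts):
    """Shannon entropy (in bits) of a node given its per-class sample counts."""
    total = sum(counts)
    # Empty classes contribute nothing, so skip them to avoid log2(0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical node holding 30 negative and 10 positive reviews.
node_counts = [30, 10]
print("Gini:   ", gini_impurity(node_counts))
print("Entropy:", entropy_impurity(node_counts))
```

Both measures are zero for a pure node and largest for an evenly mixed one, which helps explain why trees grown with either criterion often end up choosing similar splits.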

In [19]:

from sklearn.tree import DecisionTreeClassifier
model_dt1 = DecisionTreeClassifier(criterion='log_loss')
model_dt2 = DecisionTreeClassifier(criterion='gini')
model_dt3 = DecisionTreeClassifier(criterion='entropy')


Train the models.

In [20]:
model_dt1 = model_dt1.fit(X=X_train,y=y_train)
model_dt2 = model_dt2.fit(X=X_train,y=y_train)
model_dt3 = model_dt3.fit(X=X_train,y=y_train)

Test the models on the validation set.

In [21]:
y_pred_dt1 = model_dt1.predict(X_val)
y_pred_dt2 = model_dt2.predict(X_val)
y_pred_dt3 = model_dt3.predict(X_val)

Now let's calculate the accuracy of the models' predictions on the validation set.

In [22]:
from sklearn.metrics import accuracy_score, confusion_matrix
print("Decision Tree 1", accuracy_score(y_val,y_pred_dt1))
print(confusion_matrix(y_val,y_pred_dt1))
print()
print("Decision Tree 2", accuracy_score(y_val,y_pred_dt2))
print(confusion_matrix(y_val,y_pred_dt2))
print()
print("Decision Tree 3", accuracy_score(y_val,y_pred_dt3))
print(confusion_matrix(y_val,y_pred_dt3))
print()


Decision Tree 1 0.6884
[[861 380]
 [399 860]]

Decision Tree 2 0.704
[[872 369]
 [371 888]]

Decision Tree 3 0.7004
[[887 354]
 [395 864]]
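
The accuracy figures follow directly from the confusion matrices: scikit-learn puts the true labels on the rows and the predicted labels on the columns, so the diagonal holds the correct predictions. A small sketch, using the Decision Tree 1 matrix printed above and a hypothetical helper `accuracy_from_confusion`:

```python
def accuracy_from_confusion(cm):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Confusion matrix for Decision Tree 1, copied from the output above.
cm_dt1 = [[861, 380],
          [399, 860]]

print(round(accuracy_from_confusion(cm_dt1), 4))
```

Here (861 + 860) / 2500 matches the 0.6884 reported for Decision Tree 1.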



Now create the voting ensemble...

I built the ensemble by collecting the three models' binary predictions for each review in a list and taking the most common value in that list as the ensemble's prediction.
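
The same voting rule can also be written as a small reusable function. This is only a sketch under the assumption that every model outputs 0/1 labels; `majority_vote` is a hypothetical helper name and the toy prediction arrays are invented for the example.

```python
import numpy as np

def majority_vote(*predictions):
    """Return 1 where more than half of the models predict 1, else 0."""
    stacked = np.vstack(predictions)          # shape: (n_models, n_samples)
    positive_votes = stacked.sum(axis=0)      # number of models voting 1
    # Strict majority: with an even number of models, a tie falls back to 0.
    return (positive_votes > stacked.shape[0] / 2).astype(int)

# Toy predictions from three hypothetical models on four samples.
p1 = np.array([1, 0, 1, 0])
p2 = np.array([1, 1, 0, 0])
p3 = np.array([0, 1, 1, 0])
print(majority_vote(p1, p2, p3))
```

Keeping the rule in one function avoids repeating the voting loop each time a new test set is evaluated.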

In [23]:
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred_ensemble = []

for i in range(len(y_pred_dt1)):
    votes = [y_pred_dt1[i], y_pred_dt2[i], y_pred_dt3[i]]
    if votes.count(1) > votes.count(0):
        y_pred_ensemble.append(1)
    else:
        y_pred_ensemble.append(0)

accuracy = accuracy_score(y_val, y_pred_ensemble)
cm = confusion_matrix(y_val, y_pred_ensemble)

print("Ensemble Accuracy:", accuracy)
print("Confusion Matrix:\n", cm)



Ensemble Accuracy: 0.7088
Confusion Matrix:
 [[888 353]
 [375 884]]


The ensemble did improve accuracy on the IMDB validation set, but only slightly (0.7088 versus 0.704 for the best individual tree). This is likely because the individual decision trees are very similar and make the same mistakes. A more effective approach might be to combine more diverse individual models.
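
One way to try a more diverse ensemble is scikit-learn's `VotingClassifier`, which performs hard (majority) voting over different model families. The sketch below is not the approach used in this notebook: the choice of models is an assumption, and it runs on synthetic data from `make_classification` rather than the IMDB features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the IMDB bag-of-words features).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

# Three different model families, combined by majority vote.
diverse_ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
diverse_ensemble.fit(X_tr, y_tr)
print("Hold-out accuracy:", diverse_ensemble.score(X_te, y_te))
```

Whether this helps on the IMDB data would need to be checked; the point is only that models with different inductive biases are less likely to make identical errors.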

In [24]:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

easy_reviews = [
    "Wow, this was fantastic! As I was watching it, I asked myself, 'Is this the best animated movie I've ever seen?' I think the answer is 'yes.'",
    "Yeah, I must admit, I love this movie. Which is nothing to be ashamed of; great movie, great directing, great set, great scale, great canvas, great story.",
    "This is the finest movie I have ever seen of the drama kind; it has everything to make an excellent movie. All the actors play an outstanding role.",
    "I have no idea how anyone could like this dull, uninspiring movie. It was very, very predictable. The leading actress had no talent.",
    "This movie is probably one of the worst movies I have ever seen. Don't waste your time watching this. I almost turned this movie off watching it.",
    "If you want a quick and easy way to punish your kids, take them to see this film. This overlong and boring movie will put them to sleep."
]

X_test = vectorizer.transform(easy_reviews).toarray()

y_pred_dt1 = model_dt1.predict(X_test)
y_pred_dt2 = model_dt2.predict(X_test)
y_pred_dt3 = model_dt3.predict(X_test)

test_labels = [1, 1, 1, 0, 0, 0]

y_pred_ensemble = []
for i in range(len(y_pred_dt1)):
    votes = [y_pred_dt1[i], y_pred_dt2[i], y_pred_dt3[i]]
    y_pred_ensemble.append(1 if votes.count(1) > votes.count(0) else 0)

accuracy_dt1 = accuracy_score(test_labels, y_pred_dt1)
accuracy_dt2 = accuracy_score(test_labels, y_pred_dt2)
accuracy_dt3 = accuracy_score(test_labels, y_pred_dt3)
accuracy_ensemble = accuracy_score(test_labels, y_pred_ensemble)

print("Predictions for Easy Review:")
for i, review in enumerate(easy_reviews):
    correct_dt1 = 'Correct' if y_pred_dt1[i] == test_labels[i] else 'Incorrect'
    correct_dt2 = 'Correct' if y_pred_dt2[i] == test_labels[i] else 'Incorrect'
    correct_dt3 = 'Correct' if y_pred_dt3[i] == test_labels[i] else 'Incorrect'
    correct_ensemble = 'Correct' if y_pred_ensemble[i] == test_labels[i] else 'Incorrect'
    
    print(f"\nReview: {review}")
    print(f"Model 1 Prediction: {'Positive' if y_pred_dt1[i] == 1 else 'Negative'} ({correct_dt1})")
    print(f"Model 2 Prediction: {'Positive' if y_pred_dt2[i] == 1 else 'Negative'} ({correct_dt2})")
    print(f"Model 3 Prediction: {'Positive' if y_pred_dt3[i] == 1 else 'Negative'} ({correct_dt3})")
    print(f"Ensemble Prediction: {'Positive' if y_pred_ensemble[i] == 1 else 'Negative'} ({correct_ensemble})")

print("\nIndividual Accuracies:")
print(f"Model 1 Accuracy: {accuracy_dt1:.4f}")
print(f"Model 2 Accuracy: {accuracy_dt2:.4f}")
print(f"Model 3 Accuracy: {accuracy_dt3:.4f}")
print(f"Ensemble Accuracy: {accuracy_ensemble:.4f}")

cm = confusion_matrix(test_labels, y_pred_ensemble)
print("\nConfusion Matrix for Ensemble Predictions:\n", cm)



Predictions for Easy Review:

Review: Wow, this was fantastic! As I was watching it, I asked myself, 'Is this the best animated movie I've ever seen?' I think the answer is 'yes.'
Model 1 Prediction: Positive (Correct)
Model 2 Prediction: Positive (Correct)
Model 3 Prediction: Positive (Correct)
Ensemble Prediction: Positive (Correct)

Review: Yeah, I must admit, I love this movie. Which is nothing to be ashamed of; great movie, great directing, great set, great scale, great canvas, great story.
Model 1 Prediction: Positive (Correct)
Model 2 Prediction: Positive (Correct)
Model 3 Prediction: Positive (Correct)
Ensemble Prediction: Positive (Correct)

Review: This is the finest movie I have ever seen of the drama kind; it has everything to make an excellent movie. All the actors play an outstanding role.
Model 1 Prediction: Positive (Correct)
Model 2 Prediction: Positive (Correct)
Model 3 Prediction: Positive (Correct)
Ensemble Prediction: Positive (Correct)

Review: I have no idea how 

As expected, the models were all very successful at predicting the easy reviews. The individual models classified almost every review correctly, so the ensemble was also right.

In [26]:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

adversarial_reviews = [
    "For a horror film, it had me laughing out loud from start to finish—truly a unique experience for the genre.",
    "What a delightful waste of time! I love sitting through a three hour film that goes nowhere. I’m looking forward to the sequel already!]",
    "Almost everything about my day at the cinema was brilliant; the popcorn was lovely, the seats were fantastic and the tickets were great value. The film itself though, the less said the better.",
    "I was dreading watching this movie as I had been told the acting was awful and the plot was confusing. However, having watched it myself, I completely disagree. Those reviews were absolute nonsense.",
    "I wanted to turn this film off from the moment it began. It was a horrible and unsettling experience, exactly how a horror movie should be. Despite the awful fear I felt, I couldn’t look away.",
    "I usually hate the lead actor but his performance wasn’t a complete disaster this time around. I was shocked that he didn’t totally ruin the film and it was actually quite enjoyable."]
   
X_test = vectorizer.transform(adversarial_reviews).toarray()

y_pred_dt1 = model_dt1.predict(X_test)
y_pred_dt2 = model_dt2.predict(X_test)
y_pred_dt3 = model_dt3.predict(X_test)


test_labels = [0, 0, 0, 1, 1, 1]

y_pred_ensemble = []
for i in range(len(y_pred_dt1)):
    votes = [y_pred_dt1[i], y_pred_dt2[i], y_pred_dt3[i]]
    y_pred_ensemble.append(1 if votes.count(1) > votes.count(0) else 0)

accuracy_dt1 = accuracy_score(test_labels, y_pred_dt1)
accuracy_dt2 = accuracy_score(test_labels, y_pred_dt2)
accuracy_dt3 = accuracy_score(test_labels, y_pred_dt3)
accuracy_ensemble = accuracy_score(test_labels, y_pred_ensemble)

print("Predictions for Each Review:")
for i, review in enumerate(adversarial_reviews):
    print(f"\nReview: {review}")
    print(f"Model 1 Prediction: {'Positive' if y_pred_dt1[i] == 1 else 'Negative'}")
    print(f"Model 2 Prediction: {'Positive' if y_pred_dt2[i] == 1 else 'Negative'}")
    print(f"Model 3 Prediction: {'Positive' if y_pred_dt3[i] == 1 else 'Negative'}")
    print(f"Ensemble Prediction: {'Positive' if y_pred_ensemble[i] == 1 else 'Negative'}")

print("\nIndividual Accuracies:")
print(f"Model 1 Accuracy: {accuracy_dt1:.4f}")
print(f"Model 2 Accuracy: {accuracy_dt2:.4f}")
print(f"Model 3 Accuracy: {accuracy_dt3:.4f}")
print(f"Ensemble Accuracy: {accuracy_ensemble:.4f}")

cm = confusion_matrix(test_labels, y_pred_ensemble)
print("\nConfusion Matrix for Ensemble Predictions:\n", cm)


Predictions for Each Review:

Review: For a horror film, it had me laughing out loud from start to finish—truly a unique experience for the genre.
Model 1 Prediction: Positive
Model 2 Prediction: Positive
Model 3 Prediction: Positive
Ensemble Prediction: Positive

Review: What a delightful waste of time! I love sitting through a three hour film that goes nowhere. I’m looking forward to the sequel already!]
Model 1 Prediction: Negative
Model 2 Prediction: Negative
Model 3 Prediction: Negative
Ensemble Prediction: Negative

Review: Almost everything about my day at the cinema was brilliant; the popcorn was lovely, the seats were fantastic and the tickets were great value. The film itself though, the less said the better.
Model 1 Prediction: Positive
Model 2 Prediction: Positive
Model 3 Prediction: Positive
Ensemble Prediction: Positive

Review: I was dreading watching this movie as I had been told the acting was awful and the plot was confusing. However, having watched it myself, I compl

The models were far less successful at predicting the sentiment of the adversarial test set because of its more nuanced language. The ensemble was better than the Naive Bayes classifier that I used in Assignment 2, but after analysing the individual review predictions it was clear that the three contributing models were too similar. The three decision tree classifiers almost always voted for the same sentiment, regardless of whether they were right or wrong. The models were clearly identifying the same important features and making the same mistakes, which defeats the purpose of the ensemble.
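
How similar the models are can be quantified with a pairwise agreement rate, i.e. the fraction of reviews on which two models give the same label. A minimal sketch with invented predictions (`agreement_rate` and `toy_predictions` are hypothetical names, not values from this notebook):

```python
from itertools import combinations

def agreement_rate(preds_a, preds_b):
    """Fraction of samples on which two models predict the same label."""
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)

# Invented predictions for three models on six reviews.
toy_predictions = {
    "dt1": [1, 0, 1, 1, 0, 1],
    "dt2": [1, 0, 1, 1, 0, 0],
    "dt3": [1, 0, 1, 0, 0, 1],
}

for (name_a, preds_a), (name_b, preds_b) in combinations(toy_predictions.items(), 2):
    print(f"{name_a} vs {name_b}: {agreement_rate(preds_a, preds_b):.2f}")
```

Agreement rates close to 1.0 mean the votes are nearly redundant, so majority voting has little room to correct individual errors.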

In [28]:
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

combined_labels = [1, 1, 1, 0, 0, 0] + [0, 0, 0, 1, 1, 1]

combined_reviews = easy_reviews + adversarial_reviews
X_test_combined = vectorizer.transform(combined_reviews).toarray()

y_pred_dt1 = model_dt1.predict(X_test_combined)
y_pred_dt2 = model_dt2.predict(X_test_combined)
y_pred_dt3 = model_dt3.predict(X_test_combined)

y_pred_ensemble = []
for i in range(len(y_pred_dt1)):  # fixed: y_pred_dt1_combined was never defined
    votes = [y_pred_dt1[i], y_pred_dt2[i], y_pred_dt3[i]]
    y_pred_ensemble.append(1 if votes.count(1) > votes.count(0) else 0)

# Calculate accuracy scores for each model and the ensemble on the combined test set
accuracy_dt1_combined = accuracy_score(combined_labels, y_pred_dt1)
accuracy_dt2_combined = accuracy_score(combined_labels, y_pred_dt2)
accuracy_dt3_combined = accuracy_score(combined_labels, y_pred_dt3)
accuracy_ensemble_combined = accuracy_score(combined_labels, y_pred_ensemble)

cm_ensemble = confusion_matrix(combined_labels, y_pred_ensemble)

# Print overall accuracy and confusion matrices
print("\nOverall Accuracies (Combined Test Set):")
print(f"Model 1 Accuracy: {accuracy_dt1_combined:.4f}")
print(f"Model 2 Accuracy: {accuracy_dt2_combined:.4f}")
print(f"Model 3 Accuracy: {accuracy_dt3_combined:.4f}")
print(f"Ensemble Accuracy: {accuracy_ensemble_combined:.4f}")

print("\nConfusion Matrix for Ensemble (Combined Test Set):\n", cm_ensemble)



Overall Accuracies (Combined Test Set):
Model 1 Accuracy: 0.6667
Model 2 Accuracy: 0.6667
Model 3 Accuracy: 0.6667
Ensemble Accuracy: 0.6667

Confusion Matrix for Ensemble (Combined Test Set):
 [[4 2]
 [2 4]]


Overall, the ensemble performed similarly on my own test set (0.6667) and on the IMDB reviews (0.7088). This makes sense: between the easy and adversarial sets there was a good mix of easy and hard reviews to classify, which would also have been captured in the IMDB set due to its large sample size. This indicates that my own test set was a fair way to trial and analyse the ensemble model. My biggest learning from this was that ensembles can certainly be a more effective way to design a classification model. However, it's very important to choose a diverse and large set of individual models so that you can reduce bias and improve overall accuracy by capturing different patterns in the data.