# Steam Reviews Classification Notebook

This notebook compares two families of methods for classifying Steam reviews:

- **Traditional ML:** Random Forest, XGBoost, and Logistic Regression with TF-IDF (using n-grams).
- **Transformer-based:** DistilBERT (via a sentiment analysis pipeline) and Zero-Shot Classification with BART.

The target variable is the review sentiment (from the `voted_up` field, converted to binary labels: 1 for positive, 0 for negative).


In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report

# Traditional ML classifiers
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

# Transformer-based methods from Hugging Face
from transformers import pipeline

# Download NLTK resources
import nltk
nltk.download('punkt')
nltk.download('stopwords')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Happy\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Happy\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [2]:
# Load the cleaned data (make sure 'steam_reviews_cleaned.csv' is in your working directory)
df = pd.read_csv('steam_reviews_cleaned.csv')
df.head()

Unnamed: 0,app_id,review,voted_up,is_english,cleaned_review
0,620,I don’t remember signing up for this. Maybe I ...,True,True,’ remember signing . Maybe . idea long . Maybe...
1,620,"I want to say this, i dont think valve is mak...",True,True,"want say , dont think valve making portal 3 . ..."
2,620,"Ah, Portal 2. The portal gun is iconic. When p...",True,True,"Ah , Portal 2 . portal gun iconic . platformin..."
3,620,Portal 2 is one game I hesitated to get it but...,True,True,Portal 2 one game hesitated get believe review...
4,620,this gotta one of my favorite games to play in...,True,True,got ta one favorite games play free time fun g...


In [3]:
# Prepare the data for classification

# The target is the 'voted_up' column (convert it to integer: 1 for positive, 0 for negative)
X = df['cleaned_review']
y = df['voted_up'].astype(int)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print('Training set size:', len(X_train))
print('Test set size:', len(X_test))

Training set size: 3077
Test set size: 770
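
The classification reports later in this notebook show an imbalanced test set (140 negative vs. 630 positive reviews). As a minimal sketch, using a hypothetical toy DataFrame rather than the Steam data, the class balance can be checked with `value_counts`, and `stratify` can be passed to `train_test_split` so both splits keep the same proportions. Note that adding `stratify` to the split above would change the split and therefore the numbers reported below.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 positive and 20 negative reviews
toy = pd.DataFrame({
    "cleaned_review": [f"review {i}" for i in range(100)],
    "voted_up": [True] * 80 + [False] * 20,
})
y_toy = toy["voted_up"].astype(int)

# Share of each class in the full data
class_share = y_toy.value_counts(normalize=True)
print(class_share)

# stratify=y_toy keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy["cleaned_review"], y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print("Positive share (train):", y_tr.mean())
print("Positive share (test):", y_te.mean())
```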


In [4]:
# Vectorize the text using TF-IDF with n-grams (unigrams and bigrams)
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

print('TF-IDF features shape:', X_train_tfidf.shape)

TF-IDF features shape: (3077, 81912)


## Random Forest Classification

In [5]:
# Train a Random Forest classifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train_tfidf, y_train)

# Predict and evaluate on the test set
pred_rf = rf.predict(X_test_tfidf)
print('Random Forest Accuracy:', accuracy_score(y_test, pred_rf))
print(classification_report(y_test, pred_rf))

Random Forest Accuracy: 0.8480519480519481
              precision    recall  f1-score   support

           0       0.81      0.21      0.34       140
           1       0.85      0.99      0.91       630

    accuracy                           0.85       770
   macro avg       0.83      0.60      0.63       770
weighted avg       0.84      0.85      0.81       770
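
To put the accuracy in context, it helps to compare it with a trivial baseline. The sketch below rebuilds the test labels from the support column of the report above (140 negative, 630 positive) and scores a hypothetical classifier that predicts "positive" for every review.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Test-set class counts taken from the support column of the report above
n_negative, n_positive = 140, 630
y_true = np.array([0] * n_negative + [1] * n_positive)

# Baseline that predicts "positive" for every review
y_always_positive = np.ones_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_positive)
negative_recall = recall_score(y_true, y_always_positive, pos_label=0)
print(f"Always-positive accuracy: {baseline_accuracy:.3f}")
print(f"Always-positive recall on class 0: {negative_recall:.2f}")
```

The baseline reaches 630/770 ≈ 0.818 accuracy while finding no negative reviews, so per-class recall and macro-averaged F1 are more informative here than accuracy alone.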



## XGBoost Classification

In [6]:
# Train an XGBoost classifier
xgb = XGBClassifier(eval_metric='logloss', random_state=42)
xgb.fit(X_train_tfidf, y_train)

# Predict and evaluate on the test set
pred_xgb = xgb.predict(X_test_tfidf)
print('XGBoost Accuracy:', accuracy_score(y_test, pred_xgb))
print(classification_report(y_test, pred_xgb))

XGBoost Accuracy: 0.8714285714285714
              precision    recall  f1-score   support

           0       0.75      0.44      0.56       140
           1       0.89      0.97      0.92       630

    accuracy                           0.87       770
   macro avg       0.82      0.70      0.74       770
weighted avg       0.86      0.87      0.86       770



## Logistic Regression with n-grams

Here we use a Logistic Regression classifier with the same TF-IDF features (using n-grams) to see how it performs.

In [7]:
# Train a Logistic Regression classifier
lr = LogisticRegression(max_iter=1000, random_state=42)
lr.fit(X_train_tfidf, y_train)

# Predict and evaluate on the test set
pred_lr = lr.predict(X_test_tfidf)
print('Logistic Regression (n-grams) Accuracy:', accuracy_score(y_test, pred_lr))
print(classification_report(y_test, pred_lr))

Logistic Regression (n-grams) Accuracy: 0.8558441558441559
              precision    recall  f1-score   support

           0       0.94      0.22      0.36       140
           1       0.85      1.00      0.92       630

    accuracy                           0.86       770
   macro avg       0.90      0.61      0.64       770
weighted avg       0.87      0.86      0.82       770
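
Recall on the negative class is low here (0.22). One common option for imbalanced data is `class_weight="balanced"`, which reweights the loss inversely to class frequency and typically trades some precision for recall on the minority class. The sketch below compares the two settings on synthetic stand-in data (an assumption for illustration, not Steam reviews), so its numbers say nothing about how the change would affect the results above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 20% class 0), not Steam reviews
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=20, weights=[0.2, 0.8], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

recalls = {}
for name, class_weight in [("default", None), ("balanced", "balanced")]:
    model = LogisticRegression(max_iter=1000, class_weight=class_weight, random_state=42)
    model.fit(X_tr, y_tr)
    # Recall on the minority class (label 0)
    recalls[name] = recall_score(y_te, model.predict(X_te), pos_label=0)

print(recalls)
```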



## DistilBERT Classification

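Here we use a pre-built Hugging Face sentiment-analysis pipeline with DistilBERT. Note that the model (`distilbert-base-uncased-finetuned-sst-2-english`) is fine-tuned on the SST-2 dataset (sentences from movie reviews) and is applied here without any training on Steam data, so its predictions may not perfectly align with Steam reviews. Also note that the pipeline receives the `cleaned_review` text, which has stopwords removed (including negations such as "don't", as the data preview above shows), whereas the model was trained on natural sentences; the raw `review` column may be a better input, though that is not tested here.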

In [8]:
# Create a DistilBERT sentiment analysis pipeline
distilbert_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Function to obtain predictions from DistilBERT
def get_distilbert_predictions(reviews):
    preds = []
    for review in reviews:
        # Pass truncation=True so that texts longer than 512 tokens are truncated
        result = distilbert_classifier(review, truncation=True)[0]
        # Convert the label to a binary value: 1 for POSITIVE, 0 for NEGATIVE
        pred = 1 if result['label'] == "POSITIVE" else 0
        preds.append(pred)
    return preds

# Get predictions on the test set (this may take a while for a large dataset)
pred_distilbert = get_distilbert_predictions(X_test.tolist())

print('DistilBERT Accuracy:', accuracy_score(y_test, pred_distilbert))
print(classification_report(y_test, pred_distilbert))





All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


DistilBERT Accuracy: 0.8207792207792208
              precision    recall  f1-score   support

           0       0.50      0.86      0.63       140
           1       0.96      0.81      0.88       630

    accuracy                           0.82       770
   macro avg       0.73      0.83      0.76       770
weighted avg       0.88      0.82      0.84       770
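
The loop above calls the pipeline once per review. Hugging Face pipelines also accept a list of texts and return one result per text, so the reviews can be sent in chunks instead. The sketch below is one way to do that; `predict_in_chunks`, `labels_to_binary`, and `fake_classifier` are hypothetical names introduced here, and the stand-in classifier exists only so the example runs without downloading a model.

```python
from typing import Callable, Dict, List


def labels_to_binary(results: List[Dict]) -> List[int]:
    """Map pipeline outputs such as {'label': 'POSITIVE', 'score': 0.99} to 1/0."""
    return [1 if r["label"] == "POSITIVE" else 0 for r in results]


def predict_in_chunks(classifier: Callable, reviews: List[str], chunk_size: int = 64) -> List[int]:
    """Send reviews to the classifier in lists rather than one call per review."""
    preds: List[int] = []
    for start in range(0, len(reviews), chunk_size):
        chunk = reviews[start:start + chunk_size]
        # A list of texts goes in; one result dict per text comes back
        preds.extend(labels_to_binary(classifier(chunk, truncation=True)))
    return preds


# Stand-in classifier so the sketch runs without a model download (hypothetical)
def fake_classifier(texts, truncation=True):
    return [{"label": "POSITIVE" if "good" in t else "NEGATIVE", "score": 1.0} for t in texts]


demo_preds = predict_in_chunks(fake_classifier, ["good game", "bad port", "good fun"], chunk_size=2)
print(demo_preds)
```

With the real pipeline, the call would be `predict_in_chunks(distilbert_classifier, X_test.tolist())`. Whether this is noticeably faster depends on the hardware and on the pipeline's own batching settings.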



## Zero-Shot Classification with BART

BART is an encoder-decoder transformer rather than a decoder-only model. Here we demonstrate zero-shot classification with `facebook/bart-large-mnli`, a BART checkpoint fine-tuned on the MNLI natural language inference dataset. Zero-shot classification allows us to classify texts without task-specific fine-tuning, making it a flexible (albeit computationally heavy) option.

Here we define candidate labels for sentiment as `positive` and `negative`.

In [None]:
from transformers import pipeline
# Use "facebook/bart-large-mnli" for zero-shot classification.
# This is a public checkpoint, so an access token is normally not required;
# set `token` only if your environment needs one.
zero_shot_classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    token=None  # optional: your Hugging Face access token
)

# Define candidate labels for sentiment
candidate_labels = ["negative", "positive"]

# Function to obtain predictions using zero-shot classification
def get_bart_predictions(reviews):
    preds = []
    for review in reviews:
        result = zero_shot_classifier(review, candidate_labels)
        # The label with the highest score is used as the prediction
        pred_label = result['labels'][0]
        pred = 1 if pred_label.lower() == "positive" else 0
        preds.append(pred)
    return preds

# Get predictions on the test set (again, this may be slow on a large dataset)
pred_bart = get_bart_predictions(X_test.tolist())

print('BART Zero-Shot Accuracy:', accuracy_score(y_test, pred_bart))
print(classification_report(y_test, pred_bart))

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing TFBartForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBartForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBartForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBartForSequenceClassification for predictions without further training.
Device set to use 0


BART Zero-Shot Accuracy: 0.825974025974026
              precision    recall  f1-score   support

           0       0.51      0.92      0.66       140
           1       0.98      0.80      0.88       630

    accuracy                           0.83       770
   macro avg       0.75      0.86      0.77       770
weighted avg       0.89      0.83      0.84       770
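
The zero-shot pipeline works by recasting classification as natural language inference: each candidate label is inserted into a hypothesis sentence (the pipeline's default template is `"This example is {}."`), the review serves as the premise, and the label whose hypothesis is most strongly entailed is returned first. The sketch below illustrates only that framing; `build_hypotheses` and `pick_label` are hypothetical helpers, and the scores are made up for illustration.

```python
from typing import Dict, List


def build_hypotheses(labels: List[str], template: str = "This example is {}.") -> List[str]:
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def pick_label(entailment_scores: Dict[str, float]) -> str:
    """Return the label whose hypothesis received the highest score."""
    return max(entailment_scores, key=entailment_scores.get)


candidate_labels = ["negative", "positive"]
print(build_hypotheses(candidate_labels))
print(build_hypotheses(candidate_labels, template="The sentiment of this review is {}."))

# Hypothetical entailment scores for one review, for illustration only
print(pick_label({"negative": 0.12, "positive": 0.88}))
```

The real pipeline accepts a `hypothesis_template` argument when called, so a more domain-specific template could be tried; whether it helps on this dataset is not tested here.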



## Conclusion

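This notebook compared five approaches for classifying Steam reviews on a 770-review test set (140 negative, 630 positive). XGBoost reached the highest accuracy (0.87), followed by Logistic Regression (0.86) and Random Forest (0.85), but all three TF-IDF models recovered few negative reviews (recall of 0.21 to 0.44 on class 0). The two pretrained transformers, used without any fine-tuning on Steam data, scored slightly lower on accuracy (0.82 to 0.83) but found most negative reviews (recall of 0.86 for DistilBERT and 0.92 for BART), at the cost of lower precision on that class (about 0.50). Because the classes are imbalanced, accuracy alone is a weak basis for comparison; macro-averaged F1 and per-class recall give a clearer picture. Depending on your computational resources and the specifics of your dataset, you may choose to further fine-tune or extend these models for improved performance.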
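
For a side-by-side view, the sketch below collects metrics copied from the classification reports in this notebook into one table and ranks the models by macro-averaged F1. The row labels are informal names chosen here for readability.

```python
import pandas as pd

# Metrics copied from the classification reports in this notebook
reported = pd.DataFrame(
    {
        "accuracy": [0.848, 0.871, 0.856, 0.821, 0.826],
        "recall_class_0": [0.21, 0.44, 0.22, 0.86, 0.92],
        "macro_f1": [0.63, 0.74, 0.64, 0.76, 0.77],
    },
    index=["Random Forest", "XGBoost", "Logistic Regression", "DistilBERT", "BART zero-shot"],
)

# Rank by macro F1, which weights both classes equally
summary = reported.sort_values("macro_f1", ascending=False)
print(summary)
```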