# Random Forest Model

## Goal: predict whether a game's critic score will be high or low based on its features
### Target Variable and Features
- Target variable (y) = Critic_Score_Status (low/high)
- X = Genre, ESRB_Rating, Platform, Publisher, Developer_x, Country, Total_Sales

### Machine Learning Models
- rf_model = RandomForestClassifier
- brf_model = BalancedRandomForestClassifier
- eec_model = EasyEnsembleClassifier

### Steps
1. Preprocess the data and drop unnecessary columns.
2. Fill NaN values in the 'Critic_Score' column with 0 to keep the row count above 9,000. (If rows with NaN in 'Critic_Score' were dropped instead, the row count would fall to approx. 4,000.) Because 0 is below the threshold used in step 4, these filled rows are labeled low.
3. Drop the remaining rows that contain NaN.
4. Create an additional 'Critic_Score_Status' column that labels each game by its critic score: a score higher than or equal to 7 is labeled high, anything else is labeled low.
5. Bucket the categorical columns to reduce the number of categories: keep the top 10 values and bin the rest as 'other'.
6. Encode the categorical variables using OneHotEncoder.
7. Assign target variable and features:
    - Target variable (y) = Critic_Score_Status (low/high)
    - Features (X) = Genre, ESRB_Rating, Platform, Publisher, Developer_x, Country, Total_Sales
8. Split the dataset into training and testing sets and scale the data.
9. Create the ML models, then train and test them.
10. Evaluate the models with the accuracy score and classification report.


In [88]:
import pandas as pd
import numpy as np

from pathlib import Path
from collections import Counter

from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import confusion_matrix
from imblearn.metrics import classification_report_imbalanced

In [89]:
# Load the dataset
games_df = pd.read_csv('Cleaned_Data/all_columns_df.csv')
games_df

Unnamed: 0,Rank,Name,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,User_Score,Year,Country,Total_Sales
0,1,Wii Sports,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,,2006.0,Japan,82.86
1,2,Super Mario Bros.,Platform,,NES,Nintendo,Nintendo EAD,10.0,,1985.0,Japan,40.24
2,3,Mario Kart Wii,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,9.1,2008.0,Japan,37.14
3,4,PlayerUnknown's Battlegrounds,Shooter,,PC,PUBG Corporation,PUBG Corporation,,,2017.0,,36.60
4,5,Wii Sports Resort,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,8.8,2009.0,Japan,33.09
...,...,...,...,...,...,...,...,...,...,...,...,...
19857,19858,FirePower for Microsoft Combat Flight Simulator 3,Simulation,T,PC,GMX Media,Shockwave Productions,,,2004.0,,0.01
19858,19859,Tom Clancy's Splinter Cell,Shooter,T,PC,Ubisoft,Ubisoft,,,2003.0,Europe,0.01
19859,19860,Ashita no Joe 2: The Anime Super Remix,Fighting,,PS2,Capcom,Capcom,,,2002.0,Japan,0.01
19860,19861,Tokyo Yamanote Boys for V: Main Disc,Adventure,,PSV,Rejet,Rejet,,,2017.0,,0.01


In [90]:
# Check null values
games_df.isna().sum()

Rank                0
Name                0
Genre               0
ESRB_Rating      5937
Platform            0
Publisher           0
Developer_x         2
Critic_Score    15156
User_Score      19624
Year                3
Country          7985
Total_Sales         0
dtype: int64

In [91]:
# Check unique values
# games_df.nunique()

In [92]:
# Drop columns that won't be included in the analysis
games_df.drop(['Rank', 'Name', 'User_Score', 'Year'], axis=1, inplace=True)
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86
1,Platform,,NES,Nintendo,Nintendo EAD,10.0,Japan,40.24
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14
3,Shooter,,PC,PUBG Corporation,PUBG Corporation,,,36.60
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09
...,...,...,...,...,...,...,...,...
19857,Simulation,T,PC,GMX Media,Shockwave Productions,,,0.01
19858,Shooter,T,PC,Ubisoft,Ubisoft,,Europe,0.01
19859,Fighting,,PS2,Capcom,Capcom,,Japan,0.01
19860,Adventure,,PSV,Rejet,Rejet,,,0.01


In [93]:
# Replace NaN in 'Critic_Score' column with 0
games_df['Critic_Score'] = games_df['Critic_Score'].fillna(0)
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86
1,Platform,,NES,Nintendo,Nintendo EAD,10.0,Japan,40.24
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14
3,Shooter,,PC,PUBG Corporation,PUBG Corporation,0.0,,36.60
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09
...,...,...,...,...,...,...,...,...
19857,Simulation,T,PC,GMX Media,Shockwave Productions,0.0,,0.01
19858,Shooter,T,PC,Ubisoft,Ubisoft,0.0,Europe,0.01
19859,Fighting,,PS2,Capcom,Capcom,0.0,Japan,0.01
19860,Adventure,,PSV,Rejet,Rejet,0.0,,0.01


In [94]:
# Check the row count that remains if rows with NaN in any column are dropped
games_df.dropna().count()

Genre           9383
ESRB_Rating     9383
Platform        9383
Publisher       9383
Developer_x     9383
Critic_Score    9383
Country         9383
Total_Sales     9383
dtype: int64
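
The trade-off behind filling `Critic_Score` before dropping rows can be illustrated on a toy frame. This is only a minimal sketch: `toy_df` and its values are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four rows lack a critic score.
toy_df = pd.DataFrame({
    "Genre": ["Sports", "Racing", "Shooter", None],
    "Critic_Score": [7.7, np.nan, np.nan, 8.0],
})

# Dropping every row with a NaN also removes the unscored games.
rows_if_dropped = len(toy_df.dropna())

# Filling the score first keeps those rows; only rows with NaN elsewhere are lost.
filled_df = toy_df.assign(Critic_Score=toy_df["Critic_Score"].fillna(0))
rows_if_filled = len(filled_df.dropna())

print(rows_if_dropped, rows_if_filled)  # prints: 1 3
```

A filled score of 0 is below the threshold of 7, so rows kept this way end up in the low class once the labels are assigned.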

In [95]:
# Drop rows that contain NaN
games_df = games_df.dropna()
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09
5,Role-Playing,E,GB,Nintendo,Game Freak,9.4,Japan,31.38
6,Platform,E,DS,Nintendo,Nintendo EAD,9.1,Japan,30.80
...,...,...,...,...,...,...,...,...
19823,Adventure,E10,PC,Sierra Entertainment,Stormfront Studios,0.0,United States,0.01
19838,Strategy,T,PC,Sega,The Creative Assembly,0.0,United States,0.01
19850,Simulation,M,XOne,THQ Nordic,Weappy Studio,0.0,AustriaSweden,0.01
19856,Platform,E,3DS,Nintendo,Nintendo,0.0,Japan,0.01


## Categorize Critic_Score to low and high

In [96]:
# Count the games with a critic score higher than or equal to 7
games_df[games_df['Critic_Score'] >= 7]['Critic_Score'].count()

2435

In [97]:
# Add a column that labels each game as high/low based on its critic score
def label_score(row):
    if row['Critic_Score'] >= 7:
        return 'high'
    return 'low'

# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
games_df = games_df.copy()
games_df['Critic_Score_Status'] = games_df.apply(label_score, axis=1)
games_df


Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86,high
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14,high
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09,high
5,Role-Playing,E,GB,Nintendo,Game Freak,9.4,Japan,31.38,high
6,Platform,E,DS,Nintendo,Nintendo EAD,9.1,Japan,30.80,high
...,...,...,...,...,...,...,...,...,...
19823,Adventure,E10,PC,Sierra Entertainment,Stormfront Studios,0.0,United States,0.01,low
19838,Strategy,T,PC,Sega,The Creative Assembly,0.0,United States,0.01,low
19850,Simulation,M,XOne,THQ Nordic,Weappy Studio,0.0,AustriaSweden,0.01,low
19856,Platform,E,3DS,Nintendo,Nintendo,0.0,Japan,0.01,low
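
The same labeling can probably be done without a row-wise `apply`. The sketch below uses `np.where` on a hypothetical `scores_df` (invented values, including the boundary score 7) and takes an explicit copy before adding the column.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, including a filled-in 0 and the boundary value 7.
scores_df = pd.DataFrame({"Critic_Score": [7.7, 0.0, 7.0, 6.9]})

# Explicit copy so that adding a column never writes to a slice of another frame.
labeled_df = scores_df.copy()

# Vectorized labeling: 'high' for scores >= 7, otherwise 'low'.
labeled_df["Critic_Score_Status"] = np.where(
    labeled_df["Critic_Score"] >= 7, "high", "low"
)

print(labeled_df["Critic_Score_Status"].tolist())
# prints: ['high', 'low', 'high', 'low']
```

The vectorized form evaluates the comparison once for the whole column, which tends to be faster than calling a Python function per row.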


In [98]:
# Drop 'Critic_Score' column
games_df = games_df.drop('Critic_Score', axis = 1)
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,82.86,high
2,Racing,E,Wii,Nintendo,Nintendo EAD,Japan,37.14,high
4,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,33.09,high
5,Role-Playing,E,GB,Nintendo,Game Freak,Japan,31.38,high
6,Platform,E,DS,Nintendo,Nintendo EAD,Japan,30.80,high
...,...,...,...,...,...,...,...,...
19823,Adventure,E10,PC,Sierra Entertainment,Stormfront Studios,United States,0.01,low
19838,Strategy,T,PC,Sega,The Creative Assembly,United States,0.01,low
19850,Simulation,M,XOne,THQ Nordic,Weappy Studio,AustriaSweden,0.01,low
19856,Platform,E,3DS,Nintendo,Nintendo,Japan,0.01,low


In [99]:
# Check unique values
games_df.nunique()

Genre                    19
ESRB_Rating               6
Platform                 29
Publisher               100
Developer_x            1387
Country                  17
Total_Sales             591
Critic_Score_Status       2
dtype: int64

## Bucket data to top 10 and other bins

In [100]:
# Keep top 10 of Genre
top = games_df.Genre.value_counts().index[0:10]
games_df.Genre = np.where(games_df.Genre.isin(top), games_df.Genre, 'other')

In [101]:
# Keep top 10 of Platform
top = games_df.Platform.value_counts().index[0:10]
games_df.Platform = np.where(games_df.Platform.isin(top), games_df.Platform, 'other')

In [102]:
# Keep top 10 of Publisher
top = games_df.Publisher.value_counts().index[0:10]
games_df.Publisher = np.where(games_df.Publisher.isin(top), games_df.Publisher, 'other')

In [103]:
# Keep top 10 of Developer_x
top = games_df.Developer_x.value_counts().index[0:10]
games_df.Developer_x = np.where(games_df.Developer_x.isin(top), games_df.Developer_x, 'other')

In [104]:
# Keep top 10 of Country
top = games_df.Country.value_counts().index[0:10]
games_df.Country = np.where(games_df.Country.isin(top), games_df.Country, 'other')

In [105]:
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,other,Japan,82.86,high
2,Racing,E,Wii,Nintendo,other,Japan,37.14,high
4,Sports,E,Wii,Nintendo,other,Japan,33.09,high
5,Role-Playing,E,other,Nintendo,other,Japan,31.38,high
6,Platform,E,DS,Nintendo,other,Japan,30.80,high
...,...,...,...,...,...,...,...,...
19823,Adventure,E10,PC,other,other,United States,0.01,low
19838,other,T,PC,Sega,other,United States,0.01,low
19850,Simulation,M,other,other,other,AustriaSweden,0.01,low
19856,Platform,E,other,Nintendo,other,Japan,0.01,low


In [106]:
# Check unique values
games_df.nunique()

Genre                   11
ESRB_Rating              6
Platform                11
Publisher               11
Developer_x             11
Country                 11
Total_Sales            591
Critic_Score_Status      2
dtype: int64
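
The five bucketing cells repeat the same pattern, so they could be folded into one helper. The function `bucket_top_n` below is a hypothetical name introduced for illustration, shown on an invented toy column rather than the real data.

```python
import pandas as pd

def bucket_top_n(series: pd.Series, n: int = 10, other_label: str = "other") -> pd.Series:
    """Keep the n most frequent values of a series and relabel the rest."""
    top_values = series.value_counts().index[:n]
    # Series.where keeps values where the condition holds and replaces the others.
    return series.where(series.isin(top_values), other_label)

# Hypothetical toy column; with n=2 only the two most frequent genres survive.
genres = pd.Series(["Action", "Action", "Action", "Sports", "Sports", "Puzzle"])
bucketed = bucket_top_n(genres, n=2)

print(bucketed.tolist())
# prints: ['Action', 'Action', 'Action', 'Sports', 'Sports', 'other']
```

In the notebook this would be applied in a loop over `Genre`, `Platform`, `Publisher`, `Developer_x` and `Country`, assigning `games_df[col] = bucket_top_n(games_df[col])` for each column.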

In [107]:
# Check dtypes
games_df.dtypes

Genre                   object
ESRB_Rating             object
Platform                object
Publisher               object
Developer_x             object
Country                 object
Total_Sales            float64
Critic_Score_Status     object
dtype: object

## Encoding categorical variables

In [108]:
# Assign features
X = games_df.drop('Critic_Score_Status', axis = 1)
X

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales
0,Sports,E,Wii,Nintendo,other,Japan,82.86
2,Racing,E,Wii,Nintendo,other,Japan,37.14
4,Sports,E,Wii,Nintendo,other,Japan,33.09
5,Role-Playing,E,other,Nintendo,other,Japan,31.38
6,Platform,E,DS,Nintendo,other,Japan,30.80
...,...,...,...,...,...,...,...
19823,Adventure,E10,PC,other,other,United States,0.01
19838,other,T,PC,Sega,other,United States,0.01
19850,Simulation,M,other,other,other,AustriaSweden,0.01
19856,Platform,E,other,Nintendo,other,Japan,0.01


In [109]:
X.dtypes

Genre           object
ESRB_Rating     object
Platform        object
Publisher       object
Developer_x     object
Country         object
Total_Sales    float64
dtype: object

In [110]:
# Encoding object dtype columns
X_cat = X.select_dtypes(include='object')
X_cat = list(X_cat.columns)
X_cat

['Genre', 'ESRB_Rating', 'Platform', 'Publisher', 'Developer_x', 'Country']

In [111]:
from sklearn.preprocessing import OneHotEncoder

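# Create an instance of the one-hot encoder
# (scikit-learn >= 1.2 names this parameter `sparse_output`; older releases used `sparse`)
enc = OneHotEncoder(sparse_output=False)
# Fit and transform the OneHotEncoder using the categorical variable list
encode_df = pd.DataFrame(enc.fit_transform(X[X_cat]))

# Add the encoded variable names to the dataframe
# (`get_feature_names_out` replaces `get_feature_names`, which newer releases removed)
encode_df.columns = enc.get_feature_names_out(X_cat)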

encode_df



Unnamed: 0,Genre_Action,Genre_Adventure,Genre_Fighting,Genre_Misc,Genre_Platform,Genre_Racing,Genre_Role-Playing,Genre_Shooter,Genre_Simulation,Genre_Sports,...,Country_Europe,Country_France,Country_Germany,Country_Japan,Country_Poland,Country_Russia,Country_South Korea,Country_United Kingdom,Country_United States,Country_other
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9378,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
9379,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
9380,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9381,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [112]:
# Reset X dataframe index to merge with encode_df
X.reset_index(drop=True, inplace=True)
X

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales
0,Sports,E,Wii,Nintendo,other,Japan,82.86
1,Racing,E,Wii,Nintendo,other,Japan,37.14
2,Sports,E,Wii,Nintendo,other,Japan,33.09
3,Role-Playing,E,other,Nintendo,other,Japan,31.38
4,Platform,E,DS,Nintendo,other,Japan,30.80
...,...,...,...,...,...,...,...
9378,Adventure,E10,PC,other,other,United States,0.01
9379,other,T,PC,Sega,other,United States,0.01
9380,Simulation,M,other,other,other,AustriaSweden,0.01
9381,Platform,E,other,Nintendo,other,Japan,0.01


In [113]:
# Merge one-hot encoded features and drop the originals
X = X.merge(encode_df, left_index=True, right_index=True)
X = X.drop(columns=X_cat)
X


Unnamed: 0,Total_Sales,Genre_Action,Genre_Adventure,Genre_Fighting,Genre_Misc,Genre_Platform,Genre_Racing,Genre_Role-Playing,Genre_Shooter,Genre_Simulation,...,Country_Europe,Country_France,Country_Germany,Country_Japan,Country_Poland,Country_Russia,Country_South Korea,Country_United Kingdom,Country_United States,Country_other
0,82.86,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,37.14,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,33.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,31.38,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,30.80,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9378,0.01,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
9379,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
9380,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9381,0.01,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [114]:
# Assign the target
y = games_df['Critic_Score_Status']
y.value_counts()

low     6948
high    2435
Name: Critic_Score_Status, dtype: int64

In [115]:
X.shape

(9383, 62)

In [116]:
y.shape

(9383,)

## Splitting and scaling the data

In [117]:
# Split data to training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Check the balance of the target variables.
print(f"y_train: {Counter(y_train)}")
print(f"y_test: {Counter(y_test)}")

y_train: Counter({'low': 5200, 'high': 1837})
y_test: Counter({'low': 1748, 'high': 598})
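
The split above leaves the class ratio to chance. If the ratio should be held fixed in both subsets, `train_test_split` accepts a `stratify` argument. The sketch below uses invented labels (`y_toy`) rather than the real target.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 75 'low' and 25 'high'.
y_toy = ["low"] * 75 + ["high"] * 25
X_toy = [[i] for i in range(len(y_toy))]

# stratify=y_toy asks the splitter to preserve the class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=1
)

print(Counter(y_tr), Counter(y_te))
```

With these toy counts both subsets keep the 3:1 ratio of low to high.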


In [118]:
# Creating a StandardScaler instance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fitting the Standard Scaler with the training data.
X_scaler = scaler.fit(X_train)

# Scaling the data.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
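
`StandardScaler` as used above standardizes every column, including the 0/1 one-hot columns. If only `Total_Sales` should be scaled, a `ColumnTransformer` is one option. This is a sketch on invented frames (`train_toy`, `test_toy`), not the notebook's actual preprocessing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical training and testing frames: one numeric and one one-hot column.
train_toy = pd.DataFrame({"Total_Sales": [1.0, 2.0, 3.0], "Genre_Sports": [1.0, 0.0, 1.0]})
test_toy = pd.DataFrame({"Total_Sales": [2.0], "Genre_Sports": [0.0]})

# Scale only Total_Sales; remainder="passthrough" leaves the 0/1 columns untouched.
scale_numeric = ColumnTransformer(
    [("scale", StandardScaler(), ["Total_Sales"])], remainder="passthrough"
)

# Fit on the training data only, then apply the same transformation to the test data.
train_scaled = scale_numeric.fit_transform(train_toy)
test_scaled = scale_numeric.transform(test_toy)

print(test_scaled)
```

Tree-based models split on thresholds, so they are generally insensitive to this kind of rescaling. That is consistent with the later cells fitting `brf_model` and `eec_model` on the unscaled `X_train`.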

## Random Forest Classifier Model

In [119]:
# Create a random forest classifier.
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=128, random_state=78) 

In [120]:
# Fitting the model
rf_model = rf_model.fit(X_train_scaled, y_train)

In [121]:
# Making predictions using the testing data.
predictions = rf_model.predict(X_test_scaled)

In [125]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
cm = confusion_matrix(y_test, predictions)

# Create a DataFrame from the confusion matrix.
cm_df = pd.DataFrame(
    cm, index=["Actual high", "Actual low"], columns=["Predicted high", "Predicted low"])

cm_df

Unnamed: 0,Predicted high,Predicted low
Actual high,257,341
Actual low,199,1549


In [126]:
# Calculating the accuracy score.
acc_score = accuracy_score(y_test, predictions)

In [136]:
# Displaying results
print("Confusion Matrix")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("---------------------")
print("Classification Report")
print(classification_report(y_test, predictions))

Confusion Matrix


Unnamed: 0,Predicted high,Predicted low
Actual high,257,341
Actual low,199,1549


Accuracy Score : 0.7698209718670077
---------------------
Classification Report
              precision    recall  f1-score   support

        high       0.56      0.43      0.49       598
         low       0.82      0.89      0.85      1748

    accuracy                           0.77      2346
   macro avg       0.69      0.66      0.67      2346
weighted avg       0.75      0.77      0.76      2346
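
The figures in the report can be reproduced by hand from the confusion matrix. The sketch below copies the four counts from the matrix above; the variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix above (rows: actual, columns: predicted).
high_as_high, high_as_low = 257, 341
low_as_high, low_as_low = 199, 1549

total = high_as_high + high_as_low + low_as_high + low_as_low

# Accuracy: share of all games whose class was predicted correctly.
accuracy = (high_as_high + low_as_low) / total
# Precision for 'high': share of predicted-high games that are actually high.
precision_high = high_as_high / (high_as_high + low_as_high)
# Recall for 'high': share of actually-high games that were predicted high.
recall_high = high_as_high / (high_as_high + high_as_low)
f1_high = 2 * precision_high * recall_high / (precision_high + recall_high)

print(f"accuracy={accuracy:.2f} precision={precision_high:.2f} "
      f"recall={recall_high:.2f} f1={f1_high:.2f}")
# prints: accuracy=0.77 precision=0.56 recall=0.43 f1=0.49
```

The recall of 0.43 shows that the model misses more than half of the high-scoring games, even though the overall accuracy is 0.77.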



## Balanced Random Forest Classifier Model

In [128]:
# Resample the training data with the BalancedRandomForestClassifier
from imblearn.ensemble import BalancedRandomForestClassifier

brf_model = BalancedRandomForestClassifier(n_estimators=100, random_state = 1) 

# Fitting the model
brf_model.fit(X_train, y_train)

In [129]:
# Calculate the balanced accuracy score
y_pred_brf = brf_model.predict(X_test)

from sklearn.metrics import balanced_accuracy_score
balanced_accuracy_score(y_test, y_pred_brf)

0.7346857947544446

In [130]:
# Display the confusion matrix
from sklearn.metrics import confusion_matrix
pd.DataFrame(
    confusion_matrix(y_test, y_pred_brf),
    index=["Actual high", "Actual low"],
    columns=["Predicted high", "Predicted low"])

Unnamed: 0,Predicted high,Predicted low
Actual high,436,162
Actual low,454,1294


In [131]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred_brf))

                   pre       rec       spe        f1       geo       iba       sup

       high       0.49      0.73      0.74      0.59      0.73      0.54       598
        low       0.89      0.74      0.73      0.81      0.73      0.54      1748

avg / total       0.79      0.74      0.73      0.75      0.73      0.54      2346



## Easy Ensemble AdaBoost Classifier Model

In [132]:
# Train the EasyEnsembleClassifier
from imblearn.ensemble import EasyEnsembleClassifier 

eec_model = EasyEnsembleClassifier(n_estimators=100, random_state=1)

eec_model.fit(X_train, y_train)

In [133]:
# Calculate the balanced accuracy score
y_pred_eec = eec_model.predict(X_test)

balanced_accuracy_score(y_test, y_pred_eec)

0.7214178841753212

In [134]:
# Display the confusion matrix
pd.DataFrame(
    confusion_matrix(y_test, y_pred_eec),
    index=["Actual high", "Actual low"],
    columns=["Predicted high", "Predicted low"])

Unnamed: 0,Predicted high,Predicted low
Actual high,415,183
Actual low,439,1309


In [135]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred_eec))

                   pre       rec       spe        f1       geo       iba       sup

       high       0.49      0.69      0.75      0.57      0.72      0.52       598
        low       0.88      0.75      0.69      0.81      0.72      0.52      1748

avg / total       0.78      0.73      0.71      0.75      0.72      0.52      2346
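
The three models are reported with different headline metrics (plain accuracy for `rf_model`, balanced accuracy for the other two). To compare them on one scale, balanced accuracy can be recomputed from each confusion matrix. The sketch below copies the counts from the matrices in this notebook; the helper `balanced_accuracy` is written here for illustration and is not the scikit-learn function.

```python
# Per model: (actual high predicted high, actual high total,
#             actual low predicted low, actual low total).
# Counts copied from the three confusion matrices in this notebook.
confusion_counts = {
    "rf_model": (257, 598, 1549, 1748),
    "brf_model": (436, 598, 1294, 1748),
    "eec_model": (415, 598, 1309, 1748),
}

def balanced_accuracy(high_hits, high_total, low_hits, low_total):
    """Mean of the per-class recalls for a two-class problem."""
    return (high_hits / high_total + low_hits / low_total) / 2

scores = {name: balanced_accuracy(*counts) for name, counts in confusion_counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
# prints:
# rf_model: 0.6580
# brf_model: 0.7347
# eec_model: 0.7214
```

On this metric the two resampling ensembles score higher than the plain random forest, mainly because they recover far more of the high class.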

