# Random Forest Model

## To predict whether the critic score will be high or low based on different features
### Target Variable and Features
- Target variable (y) = Critic_Score_Status (low/high)
- X = Genre, ESRB_Rating, Platform, Publisher, Developer_x, Country, Total_Sales

### Machine Learning Models
- rf_model = RandomForestClassifier
- brf_model = BalancedRandomForestClassifier
- eec_model = EasyEnsembleClassifier

### Steps
1. Preprocess the data by dropping columns that are not needed.
2. Drop rows that contain NaN values.
3. Create an additional 'Critic_Score_Status' column that labels each game by its critic score: a score higher than or equal to 7 is labeled high, otherwise low.
4. Bucket the categorical columns to reduce the number of distinct values: keep the 10 most frequent values of each column and bin the rest as 'other'.
5. Encode the categorical variables using OneHotEncoder.
6. Assign target variable and features:
    - Target variable (y) = Critic_Score_Status (low/high)
    - Features (X) = Genre, ESRB_Rating, Platform, Publisher, Developer_x, Country, Total_Sales
7. Split the dataset into training and testing sets with a 75:25 ratio, and scale the data.
8. Create the ML models, then train and test each model.
9. Evaluate each model with the accuracy score and the classification report.


In [1]:
import pandas as pd
import numpy as np

from pathlib import Path
from collections import Counter

from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import confusion_matrix
from imblearn.metrics import classification_report_imbalanced

In [2]:
# Load the dataset from AWS S3 bucket
games_df = pd.read_csv('https://video-game-dataset-uot-boot-camp-2022-group-4.s3.us-east-2.amazonaws.com/all_columns_df.csv')
games_df

Unnamed: 0,Rank,Name,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,User_Score,Year,Country,Total_Sales
0,1,Wii Sports,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,,2006.0,Japan,82.86
1,2,Super Mario Bros.,Platform,,NES,Nintendo,Nintendo EAD,10.0,,1985.0,Japan,40.24
2,3,Mario Kart Wii,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,9.1,2008.0,Japan,37.14
3,4,PlayerUnknown's Battlegrounds,Shooter,,PC,PUBG Corporation,PUBG Corporation,,,2017.0,,36.60
4,5,Wii Sports Resort,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,8.8,2009.0,Japan,33.09
...,...,...,...,...,...,...,...,...,...,...,...,...
19857,19858,FirePower for Microsoft Combat Flight Simulator 3,Simulation,T,PC,GMX Media,Shockwave Productions,,,2004.0,,0.01
19858,19859,Tom Clancy's Splinter Cell,Shooter,T,PC,Ubisoft,Ubisoft,,,2003.0,Europe,0.01
19859,19860,Ashita no Joe 2: The Anime Super Remix,Fighting,,PS2,Capcom,Capcom,,,2002.0,Japan,0.01
19860,19861,Tokyo Yamanote Boys for V: Main Disc,Adventure,,PSV,Rejet,Rejet,,,2017.0,,0.01


In [3]:
# Check null values
games_df.isna().sum()

Rank                0
Name                0
Genre               0
ESRB_Rating      5937
Platform            0
Publisher           0
Developer_x         2
Critic_Score    15156
User_Score      19624
Year                3
Country          7985
Total_Sales         0
dtype: int64

In [4]:
# Drop columns that won't be included in the analysis
games_df.drop(['Rank', 'Name', 'User_Score', 'Year'], axis=1, inplace=True)
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86
1,Platform,,NES,Nintendo,Nintendo EAD,10.0,Japan,40.24
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14
3,Shooter,,PC,PUBG Corporation,PUBG Corporation,,,36.60
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09
...,...,...,...,...,...,...,...,...
19857,Simulation,T,PC,GMX Media,Shockwave Productions,,,0.01
19858,Shooter,T,PC,Ubisoft,Ubisoft,,Europe,0.01
19859,Fighting,,PS2,Capcom,Capcom,,Japan,0.01
19860,Adventure,,PSV,Rejet,Rejet,,,0.01


In [5]:
# Show the row count that would remain after dropping rows with NaN in any column
games_df.dropna().count()

Genre           3579
ESRB_Rating     3579
Platform        3579
Publisher       3579
Developer_x     3579
Critic_Score    3579
Country         3579
Total_Sales     3579
dtype: int64

In [6]:
# Drop rows with NaN values
games_df = games_df.dropna()
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09
5,Role-Playing,E,GB,Nintendo,Game Freak,9.4,Japan,31.38
6,Platform,E,DS,Nintendo,Nintendo EAD,9.1,Japan,30.80
...,...,...,...,...,...,...,...,...
19732,Sports,E,GBA,2K Sports,Indie Built,6.6,United States,0.01
19767,Action,M,PC,Ubisoft,Capcom,7.1,Europe,0.01
19792,Shooter,T,PC,Activision,Infinity Ward,7.0,United States,0.01
19794,Action,E,GBA,Atlus,Atlus Co.,6.0,Japan,0.01


## Categorize Critic_Score to low and high

In [7]:
# Count the games with a critic score higher than or equal to 7
games_df[games_df['Critic_Score'] >= 7]['Critic_Score'].count()

2435

In [8]:
# Add a 'Critic_Score_Status' column labelling each game 'high' (critic score >= 7) or 'low'
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
games_df = games_df.assign(
    Critic_Score_Status=np.where(games_df['Critic_Score'] >= 7, 'high', 'low')
)
games_df


Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Critic_Score,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,Nintendo EAD,7.7,Japan,82.86,high
2,Racing,E,Wii,Nintendo,Nintendo EAD,8.2,Japan,37.14,high
4,Sports,E,Wii,Nintendo,Nintendo EAD,8.0,Japan,33.09,high
5,Role-Playing,E,GB,Nintendo,Game Freak,9.4,Japan,31.38,high
6,Platform,E,DS,Nintendo,Nintendo EAD,9.1,Japan,30.80,high
...,...,...,...,...,...,...,...,...,...
19732,Sports,E,GBA,2K Sports,Indie Built,6.6,United States,0.01,low
19767,Action,M,PC,Ubisoft,Capcom,7.1,Europe,0.01,high
19792,Shooter,T,PC,Activision,Infinity Ward,7.0,United States,0.01,high
19794,Action,E,GBA,Atlus,Atlus Co.,6.0,Japan,0.01,low


In [9]:
# Drop 'Critic_Score' column
games_df = games_df.drop('Critic_Score', axis = 1)
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,82.86,high
2,Racing,E,Wii,Nintendo,Nintendo EAD,Japan,37.14,high
4,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,33.09,high
5,Role-Playing,E,GB,Nintendo,Game Freak,Japan,31.38,high
6,Platform,E,DS,Nintendo,Nintendo EAD,Japan,30.80,high
...,...,...,...,...,...,...,...,...
19732,Sports,E,GBA,2K Sports,Indie Built,United States,0.01,low
19767,Action,M,PC,Ubisoft,Capcom,Europe,0.01,high
19792,Shooter,T,PC,Activision,Infinity Ward,United States,0.01,high
19794,Action,E,GBA,Atlus,Atlus Co.,Japan,0.01,low


In [10]:
# Check unique values
games_df.nunique()

Genre                   18
ESRB_Rating              5
Platform                24
Publisher               62
Developer_x            774
Country                 11
Total_Sales            556
Critic_Score_Status      2
dtype: int64

## Bucket data to top 10 and other bins

In [11]:
# Keep top 10 of Genre
top = games_df.Genre.value_counts().index[0:10]
games_df.Genre = np.where(games_df.Genre.isin(top), games_df.Genre, 'other')

In [12]:
# Keep top 10 of Platform
top = games_df.Platform.value_counts().index[0:10]
games_df.Platform = np.where(games_df.Platform.isin(top), games_df.Platform,'other')

In [13]:
# Keep top 10 of Publisher
top = games_df.Publisher.value_counts().index[0:10]
games_df.Publisher = np.where(games_df.Publisher.isin(top), games_df.Publisher, 'other')

In [14]:
# Keep top 10 of Developer_x
top = games_df.Developer_x.value_counts().index[0:10]
games_df.Developer_x = np.where(games_df.Developer_x.isin(top), games_df.Developer_x,'other')

In [15]:
# Keep top 10 of Country
top = games_df.Country.value_counts().index[0:10]
games_df.Country = np.where(games_df.Country.isin(top), games_df.Country, 'other')
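
The five bucketing cells above repeat the same pattern, so they could be folded into one helper. The following is a minimal sketch, not the notebook's own code: `bucket_top_n` is a hypothetical name, and the toy frame is made up for illustration.

```python
import pandas as pd


def bucket_top_n(series: pd.Series, n: int = 10, other: str = "other") -> pd.Series:
    """Keep the n most frequent values of a column; replace the rest with `other`."""
    top = series.value_counts().index[:n]
    # Series.where keeps values where the condition holds and substitutes `other` elsewhere
    return series.where(series.isin(top), other)


# Toy data (made up for illustration, not taken from the games dataset)
toy = pd.DataFrame({"Genre": ["Action", "Action", "Sports", "Sports", "Puzzle", "Racing"]})
toy["Genre"] = bucket_top_n(toy["Genre"], n=2)
print(toy["Genre"].tolist())
```

In the notebook, the same helper could be applied in a loop over `['Genre', 'Platform', 'Publisher', 'Developer_x', 'Country']` with the default `n=10`.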

In [16]:
games_df

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales,Critic_Score_Status
0,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,82.86,high
2,Racing,E,Wii,Nintendo,Nintendo EAD,Japan,37.14,high
4,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,33.09,high
5,Role-Playing,E,other,Nintendo,other,Japan,31.38,high
6,Platform,E,DS,Nintendo,Nintendo EAD,Japan,30.80,high
...,...,...,...,...,...,...,...,...
19732,Sports,E,GBA,other,other,United States,0.01,low
19767,Action,M,PC,Ubisoft,Capcom,Europe,0.01,high
19792,Shooter,T,PC,Activision,other,United States,0.01,high
19794,Action,E,GBA,other,other,Japan,0.01,low


In [17]:
# Check unique values
games_df.nunique()

Genre                   11
ESRB_Rating              5
Platform                11
Publisher               11
Developer_x             11
Country                 11
Total_Sales            556
Critic_Score_Status      2
dtype: int64

In [18]:
# Check dtypes
games_df.dtypes

Genre                   object
ESRB_Rating             object
Platform                object
Publisher               object
Developer_x             object
Country                 object
Total_Sales            float64
Critic_Score_Status     object
dtype: object

## Encoding categorical variables

In [19]:
# Assign features
X = games_df.drop('Critic_Score_Status', axis = 1)
X

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,82.86
2,Racing,E,Wii,Nintendo,Nintendo EAD,Japan,37.14
4,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,33.09
5,Role-Playing,E,other,Nintendo,other,Japan,31.38
6,Platform,E,DS,Nintendo,Nintendo EAD,Japan,30.80
...,...,...,...,...,...,...,...
19732,Sports,E,GBA,other,other,United States,0.01
19767,Action,M,PC,Ubisoft,Capcom,Europe,0.01
19792,Shooter,T,PC,Activision,other,United States,0.01
19794,Action,E,GBA,other,other,Japan,0.01


In [20]:
X.dtypes

Genre           object
ESRB_Rating     object
Platform        object
Publisher       object
Developer_x     object
Country         object
Total_Sales    float64
dtype: object

In [21]:
# Encoding object dtype columns
X_cat = X.select_dtypes(include='object')
X_cat = list(X_cat.columns)
X_cat

['Genre', 'ESRB_Rating', 'Platform', 'Publisher', 'Developer_x', 'Country']

In [22]:
from sklearn.preprocessing import OneHotEncoder

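# Create a OneHotEncoder instance that returns a dense array
# (scikit-learn >= 1.2 uses `sparse_output`; older releases used `sparse`)
enc = OneHotEncoder(sparse_output=False)
# Fit and transform the OneHotEncoder using the categorical variable list
encode_df = pd.DataFrame(enc.fit_transform(X[X_cat]))

# Add the encoded variable names to the dataframe
# (`get_feature_names_out` replaces `get_feature_names`, which newer scikit-learn releases removed)
encode_df.columns = enc.get_feature_names_out(X_cat)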

encode_df



Unnamed: 0,Genre_Action,Genre_Adventure,Genre_Fighting,Genre_Misc,Genre_Platform,Genre_Racing,Genre_Role-Playing,Genre_Shooter,Genre_Sports,Genre_Strategy,...,Country_Europe,Country_Finland,Country_France,Country_Japan,Country_Norway,Country_Russia,Country_South Korea,Country_United Kingdom,Country_United States,Country_other
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3575,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3576,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3577,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
# Reset X dataframe index to merge with encode_df
X.reset_index(drop=True, inplace=True)
X

Unnamed: 0,Genre,ESRB_Rating,Platform,Publisher,Developer_x,Country,Total_Sales
0,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,82.86
1,Racing,E,Wii,Nintendo,Nintendo EAD,Japan,37.14
2,Sports,E,Wii,Nintendo,Nintendo EAD,Japan,33.09
3,Role-Playing,E,other,Nintendo,other,Japan,31.38
4,Platform,E,DS,Nintendo,Nintendo EAD,Japan,30.80
...,...,...,...,...,...,...,...
3574,Sports,E,GBA,other,other,United States,0.01
3575,Action,M,PC,Ubisoft,Capcom,Europe,0.01
3576,Shooter,T,PC,Activision,other,United States,0.01
3577,Action,E,GBA,other,other,Japan,0.01


In [24]:
# Merge one-hot encoded features and drop the originals
X = X.merge(encode_df, left_index=True, right_index=True)
X = X.drop(columns=X_cat)
X


Unnamed: 0,Total_Sales,Genre_Action,Genre_Adventure,Genre_Fighting,Genre_Misc,Genre_Platform,Genre_Racing,Genre_Role-Playing,Genre_Shooter,Genre_Sports,...,Country_Europe,Country_Finland,Country_France,Country_Japan,Country_Norway,Country_Russia,Country_South Korea,Country_United Kingdom,Country_United States,Country_other
0,82.86,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,37.14,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,33.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,31.38,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,30.80,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3574,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3575,0.01,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3576,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3577,0.01,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
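
The encode, reset-index, merge, and drop steps above can also be expressed as a single transformer. The sketch below is one possible alternative, not the notebook's method: it uses scikit-learn's `ColumnTransformer` on a made-up three-row frame, and `sparse_threshold=0` is set only so the result comes back as a dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy features (made up for illustration; same column names as the notebook)
toy_X = pd.DataFrame({
    "Genre": ["Sports", "Racing", "Sports"],
    "Platform": ["Wii", "Wii", "DS"],
    "Total_Sales": [82.86, 37.14, 30.80],
})
cat_cols = ["Genre", "Platform"]

# One-hot encode the categorical columns and pass the numeric column through unchanged
encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
encoded = encoder.fit_transform(toy_X)

# Encoded columns come first (Genre_Racing, Genre_Sports, Platform_DS, Platform_Wii),
# followed by the passed-through Total_Sales column
print(encoded.shape)
```

Because the transformer is fitted as one object, it can be fitted on the training split and reused on the test split, and `handle_unknown="ignore"` encodes categories unseen during fitting as all zeros.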


In [25]:
# Assign the target
y = games_df['Critic_Score_Status']
y.value_counts()

high    2435
low     1144
Name: Critic_Score_Status, dtype: int64

In [26]:
X.shape

(3579, 61)

In [27]:
y.shape

(3579,)

## Splitting and scaling the data

In [28]:
# Split data to training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Check the balance of the target variables.
print(f"y_train: {Counter(y_train)}")
print(f"y_test: {Counter(y_test)}")

y_train: Counter({'high': 1813, 'low': 871})
y_test: Counter({'high': 622, 'low': 273})
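
The split above leaves the 'low' share slightly different between the two sets (871 of 2684 in training, 273 of 895 in testing). If matching class proportions are wanted, `train_test_split` accepts a `stratify` argument. This is a minimal sketch on a made-up target with 80 'high' and 40 'low' labels, not a rerun of the notebook's split.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy imbalanced target (made-up counts) and a dummy single-column feature matrix
toy_y = ["high"] * 80 + ["low"] * 40
toy_X = [[i] for i in range(len(toy_y))]

# stratify=toy_y keeps the high/low ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=1
)

print(f"train: {Counter(y_tr)}")
print(f"test:  {Counter(y_te)}")
```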


In [29]:
# Creating a StandardScaler instance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fitting the Standard Scaler with the training data.
X_scaler = scaler.fit(X_train)

# Scaling the data.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
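
To make the scaling step concrete, the sketch below reproduces by hand what `StandardScaler` computes: the mean and standard deviation come from the training data only and are then applied to both sets. The arrays are made up for illustration. Note that tree-based models such as random forests are generally insensitive to this kind of per-feature rescaling, which is consistent with the later cells fitting some models on the unscaled `X_train`.

```python
import numpy as np

# Toy single-feature data (made up for illustration)
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

# Statistics are estimated on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# The same training statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.ravel(), test_scaled.ravel())
```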

## Random Forest Classifier Model

In [30]:
# Create a random forest classifier.
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=128, random_state=78) 

In [31]:
# Fitting the model
rf_model = rf_model.fit(X_train_scaled, y_train)

In [32]:
# Making predictions using the testing data.
predictions = rf_model.predict(X_test_scaled)

In [33]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
cm = confusion_matrix(y_test, predictions)

# Create a DataFrame from the confusion matrix.
cm_df = pd.DataFrame(
    cm, index=["Actual high", "Actual low"], columns=["Predicted high", "Predicted low"])

cm_df

Unnamed: 0,Predicted high,Predicted low
Actual high,523,99
Actual low,147,126


In [34]:
# Calculating the accuracy score.
acc_score = accuracy_score(y_test, predictions)

In [58]:
# Displaying results
print("Model: RandomForestClassifier")
print("Confusion Matrix:")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("---------------------")
print("Classification Report:")
print(classification_report(y_test, predictions))

Model: RandomForestClassifier
Confusion Matrix:


Unnamed: 0,Predicted high,Predicted low
Actual high,523,99
Actual low,147,126


Accuracy Score : 0.7251396648044692
---------------------
Classification Report:
              precision    recall  f1-score   support

        high       0.78      0.84      0.81       622
         low       0.56      0.46      0.51       273

    accuracy                           0.73       895
   macro avg       0.67      0.65      0.66       895
weighted avg       0.71      0.73      0.72       895
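
The per-class figures in the report follow directly from the confusion matrix. As a worked check, the sketch below recomputes precision, recall, and F1 from the four counts reported above; the helper name `prf` is hypothetical.

```python
# Counts from the RandomForestClassifier confusion matrix above
# (rows = actual class, columns = predicted class)
high_as_high, high_as_low = 523, 99
low_as_high, low_as_low = 147, 126


def prf(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, f1) for one class from its TP, FP, and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# For 'high': false positives are low games predicted high, false negatives are high games predicted low
p_high, r_high, f_high = prf(tp=high_as_high, fp=low_as_high, fn=high_as_low)
# For 'low' the roles of the off-diagonal cells swap
p_low, r_low, f_low = prf(tp=low_as_low, fp=high_as_low, fn=low_as_high)

print(f"high: precision={p_high:.2f} recall={r_high:.2f} f1={f_high:.2f}")
print(f"low:  precision={p_low:.2f} recall={r_low:.2f} f1={f_low:.2f}")
```

The 'low' recall of 0.46 means the model misses more than half of the low-scoring games, which the overall accuracy of 0.73 does not show on its own.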



## Balanced Random Forest Classifier Model

In [36]:
# Create a BalancedRandomForestClassifier, which undersamples the majority class for each tree
from imblearn.ensemble import BalancedRandomForestClassifier

brf_model = BalancedRandomForestClassifier(n_estimators=128, random_state = 78) 

# Fitting the model
brf_model.fit(X_train, y_train)

In [53]:
# Predict on the test set and calculate the balanced accuracy score
y_pred_brf = brf_model.predict(X_test)

from sklearn.metrics import balanced_accuracy_score
acc_score_brf = balanced_accuracy_score(y_test, y_pred_brf)

In [54]:
# Display the confusion matrix
from sklearn.metrics import confusion_matrix
cm_df_brf = pd.DataFrame(
    confusion_matrix(y_test, y_pred_brf),
    index=["Actual high", "Actual low"],
    columns=["Predicted high", "Predicted low"])

In [59]:
# Displaying results
print("Model: BalancedRandomForestClassifier")
print("Confusion Matrix:")
display(cm_df_brf)
print(f"Balanced Accuracy Score : {acc_score_brf}")
print("---------------------")
print("Classification Report:")
print(classification_report_imbalanced(y_test, y_pred_brf))

Model: BalancedRandomForestClassifier
Confusion Matrix:


Unnamed: 0,Predicted high,Predicted low
Actual high,439,183
Actual low,78,195


Balanced Accuracy Score : 0.7100367478180983
---------------------
Classification Report:
                   pre       rec       spe        f1       geo       iba       sup

       high       0.85      0.71      0.71      0.77      0.71      0.50       622
        low       0.52      0.71      0.71      0.60      0.71      0.50       273

avg / total       0.75      0.71      0.71      0.72      0.71      0.50       895



## Easy Ensemble AdaBoost Classifier Model

In [40]:
# Train the EasyEnsembleClassifier
from imblearn.ensemble import EasyEnsembleClassifier 

eec_model = EasyEnsembleClassifier(n_estimators=128, random_state=78)

eec_model.fit(X_train, y_train)

In [50]:
# Predict on the test set and calculate the balanced accuracy score
y_pred_eec = eec_model.predict(X_test)

acc_score_eec = balanced_accuracy_score(y_test, y_pred_eec)

In [51]:
# Display the confusion matrix
cm_df_eec = pd.DataFrame(
    confusion_matrix(y_test, y_pred_eec),
    index=["Actual high", "Actual low"],
    columns=["Predicted high", "Predicted low"])

In [60]:
# Displaying results
print("Model: EasyEnsembleClassifier")
print("Confusion Matrix:")
display(cm_df_eec)
print(f"Balanced Accuracy Score : {acc_score_eec}")
print("---------------------")
print("Classification Report:")
print(classification_report_imbalanced(y_test, y_pred_eec))

Model: EasyEnsembleClassifier
Confusion Matrix:


Unnamed: 0,Predicted high,Predicted low
Actual high,411,211
Actual low,72,201


Balanced Accuracy Score : 0.6985177202219003
---------------------
Classification Report:
                   pre       rec       spe        f1       geo       iba       sup

       high       0.85      0.66      0.74      0.74      0.70      0.48       622
        low       0.49      0.74      0.66      0.59      0.70      0.49       273

avg / total       0.74      0.68      0.71      0.70      0.70      0.49       895
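
The first model is scored with `accuracy_score` and the other two with `balanced_accuracy_score`, so the three headline numbers are not directly comparable. The sketch below recomputes both metrics for every model from the confusion matrices reported above; the helper names are hypothetical, and the counts are copied from the outputs rather than recomputed from the data.

```python
# Confusion matrices reported above: rows = actual (high, low), columns = predicted (high, low)
matrices = {
    "RandomForestClassifier": [[523, 99], [147, 126]],
    "BalancedRandomForestClassifier": [[439, 183], [78, 195]],
    "EasyEnsembleClassifier": [[411, 211], [72, 201]],
}


def accuracy(m):
    """Share of all test samples on the diagonal."""
    total = sum(sum(row) for row in m)
    return (m[0][0] + m[1][1]) / total


def balanced_accuracy(m):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = [m[i][i] / sum(m[i]) for i in range(len(m))]
    return sum(recalls) / len(recalls)


results = {name: (accuracy(m), balanced_accuracy(m)) for name, m in matrices.items()}
for name, (acc, bal) in results.items():
    print(f"{name}: accuracy={acc:.4f} balanced_accuracy={bal:.4f}")
```

On a common footing, the plain random forest has the highest accuracy but the lowest balanced accuracy of the three, because it recovers fewer of the minority 'low' games.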

