# Model Background

Before the main model is trained, the gradient boosted tree classifier performs feature selection. A selector classifier with learning_rate set to 0.01, max_features set to 'sqrt', and subsample set to 0.1 is fitted, and its feature importances are compared against a threshold. This threshold, which defaults to the mean feature importance, determines which features are retained: features whose importance is higher than the threshold are kept, while the rest are discarded.
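The selection step described above can be sketched with scikit-learn's `SelectFromModel`. This is a minimal sketch, not the project's actual code: it assumes `SelectFromModel` is the selection mechanism, and the synthetic data from `make_classification` is only a stand-in for the real review features. Note that `SelectFromModel` keeps features whose importance is at or above the threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the review features (assumption: 10 numeric inputs).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# Selector estimator with the hyperparameters named in the text.
selector_model = GradientBoostingClassifier(
    learning_rate=0.01, max_features="sqrt", subsample=0.1, random_state=0
)

# threshold="mean" compares each feature importance to the mean importance.
selector = SelectFromModel(selector_model, threshold="mean")
X_selected = selector.fit_transform(X, y)

importances = selector.estimator_.feature_importances_
kept = selector.get_support()  # boolean mask over the 10 input features
print("threshold:", selector.threshold_)
print("kept feature indices:", np.flatnonzero(kept))
```

Passing `threshold="median"` instead switches the cut-off to the median importance.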

Once feature selection is complete, the main classifier is trained with subsample set to 0.1. The hyperparameters of both the feature-selection model and the main model are chosen to limit overfitting. Grid search with 5-fold cross-validation is then run on the combined training and validation subsets to find the best hyperparameter combination. The best model is saved using the joblib library, and the results print the training classification report, the test classification report, the number of features kept in the best model, and the most important features ordered by importance value.
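A rough sketch of this workflow is shown below. It assumes the selector and the classifier are chained in a scikit-learn `Pipeline` and tuned with `GridSearchCV`; the grid values, the synthetic data, and the output file name `best_gbt_model.joblib` are illustrative assumptions, not the values used in the project.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data; the held-out 20% plays the role of the test subset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("select", SelectFromModel(GradientBoostingClassifier(
        learning_rate=0.01, max_features="sqrt", subsample=0.1, random_state=0))),
    ("clf", GradientBoostingClassifier(subsample=0.1, random_state=0)),
])

# Illustrative grid (assumption); the real grid is not given in the text.
param_grid = {
    "select__threshold": ["mean", "median"],
    "clf__learning_rate": [0.01, 0.1],
    "clf__n_estimators": [50],
    "clf__subsample": [0.1, 0.5],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_trainval, y_trainval)
best_model = search.best_estimator_

# Persist the best pipeline (hypothetical file name and location).
model_path = os.path.join(tempfile.gettempdir(), "best_gbt_model.joblib")
joblib.dump(best_model, model_path)

print(classification_report(y_trainval, best_model.predict(X_trainval)))
print(classification_report(y_test, best_model.predict(X_test)))

# Number of retained features and their ranking by importance.
kept_idx = np.flatnonzero(best_model.named_steps["select"].get_support())
clf_importances = best_model.named_steps["clf"].feature_importances_
ranking = sorted(zip(kept_idx, clf_importances), key=lambda p: p[1], reverse=True)
print("features kept:", len(kept_idx))
print("ranked (index, importance):", ranking)
```

Putting the selector inside the pipeline means it is refitted within each cross-validation fold, so the selection never sees the fold's held-out data.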

# Evaluation Results

The gradient-boosted tree classifier was trained on a dataset split 60:20:20 with stratification, so each subset contains equal numbers of real and fake reviews. Two models were trained: one using the hold-out method and the other using grid search with 5-fold cross-validation. Both models use feature selection.
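A stratified 60:20:20 split can be obtained with two calls to `train_test_split`, as in the sketch below. The dataset size of 260 and the placeholder feature matrix are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced dataset: label 0 = real review, label 1 = fake review.
X = np.arange(260).reshape(-1, 1)  # placeholder features
y = np.array([0, 1] * 130)

# First split: 60% train, 40% held back.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
# Second split: halve the held-back 40% into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(labels), np.bincount(labels))
```

Passing `stratify` preserves the class ratio of the input in each subset, which is why a balanced dataset yields balanced subsets.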

For the hold-out model, the classifier achieved 0.7115 accuracy and 0.7072 f1-score in training, 0.5769 accuracy and 0.5763 f1-score in validation, and 0.6731 accuracy and 0.67 f1-score in test. The gap of roughly 0.13 in accuracy between the training and validation subsets indicates overfitting, most likely caused by the small number of samples in the dataset and the lack of regularization hyperparameters.
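The size of the generalization gap can be read directly off the reported hold-out accuracies. The short calculation below uses only the figures quoted above; the variable names are illustrative.

```python
# Reported hold-out accuracies from the text.
holdout_accuracy = {"train": 0.7115, "validation": 0.5769, "test": 0.6731}

# Difference between training accuracy and each held-out subset.
train_val_gap = holdout_accuracy["train"] - holdout_accuracy["validation"]
train_test_gap = holdout_accuracy["train"] - holdout_accuracy["test"]

print(f"train - validation: {train_val_gap:.4f}")  # 0.1346
print(f"train - test:       {train_test_gap:.4f}")  # 0.0384
```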

With grid search and 5-fold cross-validation, the best model used learning_rate set to 0.1, max_depth set to 3, max_features set to "sqrt", min_samples_leaf set to 2, min_samples_split set to 10, n_estimators set to 50, and subsample set to 0.5. With the feature selection threshold set to the median feature importance value, 5 out of 10 input features were retained in the best model: word count, average word length, number of sentences, average sentence length, and number of adjectives, in descending order of importance.
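For reference, the reported best configuration could be rebuilt as follows. This is a sketch under assumptions: the `Pipeline` layout and the synthetic data are not from the original project, and only the hyperparameter values are taken from the text. With a median threshold and 10 distinct importance values, the selector keeps exactly the top half of the features, which matches the 5 out of 10 reported.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the 10 review features (assumption).
X, y = make_classification(
    n_samples=260, n_features=10, n_informative=4, random_state=0
)

# Best hyperparameters as reported for the main classifier.
best_params = {
    "learning_rate": 0.1,
    "max_depth": 3,
    "max_features": "sqrt",
    "min_samples_leaf": 2,
    "min_samples_split": 10,
    "n_estimators": 50,
    "subsample": 0.5,
}

model = Pipeline([
    # Median threshold: features at or above the median importance are kept.
    ("select", SelectFromModel(
        GradientBoostingClassifier(
            learning_rate=0.01, max_features="sqrt", subsample=0.1, random_state=0),
        threshold="median")),
    ("clf", GradientBoostingClassifier(random_state=0, **best_params)),
])
model.fit(X, y)

n_kept = int(model.named_steps["select"].get_support().sum())
print("features kept:", n_kept)
```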

As a result, the best model achieves 0.7933 accuracy and 0.7929 f1-score in training and 0.6923 accuracy and 0.6882 f1-score in test. Although 5-fold cross-validation and grid-search hyperparameter tuning slightly improved test accuracy (0.6923 versus 0.6731 for the hold-out model), the model still shows signs of overfitting despite the additional regularization hyperparameters max_depth, max_features, and min_samples_split. Possible mitigations are: early stopping via the validation_fraction and n_iter_no_change hyperparameters, so that training stops when validation performance stagnates or worsens; further regularization hyperparameters such as max_leaf_nodes; and a wider hyperparameter grid that covers more combinations.
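The early-stopping option mentioned above might look like the sketch below. The specific values (an upper bound of 500 estimators, a 20% internal validation fraction, a patience of 10 iterations, and 8 leaf nodes) are illustrative assumptions, as is the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=260, n_features=10, n_informative=4, random_state=0
)

clf = GradientBoostingClassifier(
    n_estimators=500,         # upper bound; early stopping may use fewer
    learning_rate=0.1,
    subsample=0.5,
    max_leaf_nodes=8,         # extra regularization on tree size
    validation_fraction=0.2,  # share of training data held out internally
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    tol=1e-4,
    random_state=0,
)
clf.fit(X, y)

# n_estimators_ is the number of boosting stages actually fitted.
print("stages fitted:", clf.n_estimators_)
```

`validation_fraction` only takes effect when `n_iter_no_change` is set, so the two parameters need to be configured together.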