# Modelling

### Contents

- [Code](#Code)
- [Results](#Results)

This notebook trains the models used for player prediction and squad generation, using PyCaret's classification module. The features `X` are the player stats and the targets `y` are the binary role recommendations, one column per role.

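We then split the dataset 80/20 into training and test sets. PyCaret's `setup` holds out a further share of the training portion (roughly 30% by default) for its own validation, which is why the setup summaries below report 63 training and 28 test rows out of 91.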

One model is trained and saved per player role (11 in total). The saved model files are then loaded into the Streamlit app for prediction and squad generation.

## Code

In [10]:
import pandas as pd
from pycaret.classification import setup, compare_models, predict_model, load_model, finalize_model, save_model, pull
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
import joblib

# Load the CSV file into a DataFrame
df = pd.read_csv('../data/players_df_sin_reco.csv')

# Features are the 14 player-stat columns at positions 6-19 (0-based);
# targets are the last 11 columns, one binary label per role
X = df.iloc[:, 6:20]
y = df.iloc[:, -11:]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of target column names
target_columns = y.columns

# Create an empty DataFrame to store results
results_df = pd.DataFrame(columns=['target', 'train_accuracy', 'test_accuracy', 'train_f1', 'test_f1', 'TT'])

for target in target_columns:
    X_target_train = X_train.copy()
    X_target_test = X_test.copy()
    y_target_train = y_train[target]
    y_target_test = y_test[target]
    df_target = pd.concat([X_target_train, y_target_train], axis=1)

    # Setup PyCaret environment
    clf = setup(data=df_target, target=target, session_id=42)

    # Compare different models
    best_model = compare_models()
    
    # Get the original results from compare_models, which includes 'TT'
    compare_results = pull()

    # Mapping from scikit-learn class names to PyCaret's model names
    model_name_mapping = {
        'LogisticRegression': 'Logistic Regression',
        'KNeighborsClassifier': 'K Neighbors Classifier',
        'DecisionTreeClassifier': 'Decision Tree Classifier',
        'RidgeClassifier': 'Ridge Classifier',
        'AdaBoostClassifier': 'Ada Boost Classifier',
        'GradientBoostingClassifier': 'Gradient Boosting Classifier',
        'LinearDiscriminantAnalysis': 'Linear Discriminant Analysis',
        'ExtraTreesClassifier': 'Extra Trees Classifier',
        'XGBClassifier': 'Extreme Gradient Boosting',
        'GaussianNB': 'Naive Bayes',
        'SGDClassifier': 'SVM - Linear Kernel',  # PyCaret's 'svm' wraps scikit-learn's SGDClassifier
        'RandomForestClassifier': 'Random Forest Classifier',
        'LGBMClassifier': 'Light Gradient Boosting Machine',
        'QuadraticDiscriminantAnalysis': 'Quadratic Discriminant Analysis',
        'DummyClassifier': 'Dummy Classifier'
    }

    # Extract the model name
    model_name = type(best_model).__name__
    pycaret_model_name = model_name_mapping.get(model_name)

    # Extract 'TT (Sec)' (PyCaret's training time in seconds, despite the variable name)
    # from the compare_models results by matching the PyCaret model name
    if pycaret_model_name:
        testing_time_row = compare_results[compare_results['Model'] == pycaret_model_name]
        if not testing_time_row.empty:
            testing_time = testing_time_row['TT (Sec)'].values[0]
        else:
            print(f"No match found for model {pycaret_model_name} in compare_results.")
            testing_time = None
    else:
        print(f"No mapping found for model {model_name}.")
        testing_time = None

    # Finalize the best model
    final_model = finalize_model(best_model)

    # Predict on the test set
    predictions_test = predict_model(final_model, data=X_target_test)
    predictions_train = predict_model(final_model, data=X_target_train)

    # Calculate test accuracy and F1 score
    y_pred_test = predictions_test['prediction_label']
    test_accuracy = accuracy_score(y_target_test, y_pred_test)
    test_f1 = f1_score(y_target_test, y_pred_test, average='weighted')

    # Calculate train accuracy and F1 score
    y_pred_train = predictions_train['prediction_label']
    train_accuracy = accuracy_score(y_target_train, y_pred_train)
    train_f1 = f1_score(y_target_train, y_pred_train, average='weighted')

    # Append results to DataFrame
    new_row = pd.DataFrame({'target': target, 
                            'train_accuracy': train_accuracy, 
                            'test_accuracy': test_accuracy, 
                            'train_f1': train_f1, 
                            'test_f1': test_f1, 
                            'TT': testing_time},index=[0])
    results_df = pd.concat([results_df, new_row], ignore_index=True)

    # Save the model
    save_model(final_model, f'model_{target}')

    # Load the model for future predictions
    loaded_model = load_model(f'model_{target}')

# Save the results to a PKL file using joblib
joblib.dump(results_df, 'finalmodel.pkl')


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Traditional Keeper |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| knn | K Neighbors Classifier | 0.9833 | 0.9 | 0.9 | 0.85 | 0.8667 |  | 0.8632 | 0.016 |
| dt | Decision Tree Classifier | 0.9833 | 0.89 | 0.9 | 0.85 | 0.8667 |  | 0.8632 | 0.012 |
| ridge | Ridge Classifier | 0.9833 | 0.88 | 0.9 | 0.85 | 0.8667 |  | 0.8632 | 0.012 |
| rf | Random Forest Classifier | 0.9833 | 0.9 | 0.8 | 0.8 | 0.8 |  | 0.8 | 0.064 |
| lda | Linear Discriminant Analysis | 0.9833 | 0.88 | 0.9 | 0.85 | 0.8667 |  | 0.8632 | 0.011 |
| ada | Ada Boost Classifier | 0.9667 | 0.89 | 0.8 | 0.75 | 0.7667 |  | 0.7632 | 0.042 |
| gbc | Gradient Boosting Classifier | 0.9667 | 0.89 | 0.8 | 0.75 | 0.7667 |  | 0.7632 | 0.035 |
| et | Extra Trees Classifier | 0.9667 | 0.9 | 0.8 | 0.75 | 0.7667 |  | 0.7632 | 0.054 |
| xgboost | Extreme Gradient Boosting | 0.9667 | 0.9 | 0.8 | 0.75 | 0.7667 |  | 0.7632 | 0.017 |
| lr | Logistic Regression | 0.9524 | 0.88 | 0.8 | 0.7 | 0.7333 |  | 0.7278 | 0.021 |


Index(['Model', 'Accuracy', 'AUC', 'Recall', 'Prec.', 'F1', 'Kappa', 'MCC',
       'TT (Sec)'],
      dtype='object')
Best model name: KNeighborsClassifier
Mapped PyCaret model name: K Neighbors Classifier


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Sweeper Keeper |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.9833 | 0.9875 | 0.95 | 1.0 | 0.9667 | 0.9571 | 0.9632 | 0.025 |
| et | Extra Trees Classifier | 0.9548 | 0.9875 | 0.85 | 1.0 | 0.9 | 0.8748 | 0.8923 | 0.062 |
| nb | Naive Bayes | 0.9238 | 0.96 | 0.75 | 1.0 | 0.8333 | 0.7908 | 0.8201 | 0.012 |
| rf | Random Forest Classifier | 0.9238 | 0.9875 | 0.75 | 1.0 | 0.8333 | 0.7908 | 0.8201 | 0.073 |
| knn | K Neighbors Classifier | 0.9214 | 0.9688 | 0.75 | 0.9 | 0.8 | 0.7748 | 0.7923 | 0.016 |
| ridge | Ridge Classifier | 0.9214 | 0.9475 | 0.8 | 0.95 | 0.85 | 0.7998 | 0.8173 | 0.012 |
| lda | Linear Discriminant Analysis | 0.9214 | 0.9475 | 0.8 | 0.95 | 0.85 | 0.7998 | 0.8173 | 0.012 |
| xgboost | Extreme Gradient Boosting | 0.9071 | 0.95 | 0.8 | 0.9167 | 0.8133 | 0.7574 | 0.7909 | 0.02 |
| svm | SVM - Linear Kernel | 0.8905 | 0.9675 | 0.85 | 0.75 | 0.7733 | 0.716 | 0.7399 | 0.014 |
| lightgbm | Light Gradient Boosting Machine | 0.8881 | 0.92 | 0.8 | 0.85 | 0.7667 | 0.7034 | 0.7453 | 0.029 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Ball Playing Defender |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| et | Extra Trees Classifier | 0.9833 | 1.0 | 1.0 | 0.9833 | 0.9909 | 0.9 | 0.9 | 0.102 |
| ridge | Ridge Classifier | 0.95 | 1.0 | 0.975 | 0.9667 | 0.9675 | 0.7667 | 0.7707 | 0.014 |
| lda | Linear Discriminant Analysis | 0.95 | 1.0 | 0.975 | 0.9667 | 0.9675 | 0.7667 | 0.7707 | 0.012 |
| knn | K Neighbors Classifier | 0.9357 | 0.985 | 0.98 | 0.9467 | 0.9596 | 0.7267 | 0.7363 | 0.016 |
| rf | Random Forest Classifier | 0.9357 | 1.0 | 0.98 | 0.95 | 0.9598 | 0.7696 | 0.773 | 0.073 |
| lightgbm | Light Gradient Boosting Machine | 0.9357 | 1.0 | 0.955 | 0.9633 | 0.9544 | 0.7934 | 0.807 | 0.034 |
| lr | Logistic Regression | 0.9333 | 1.0 | 0.975 | 0.95 | 0.9584 | 0.6667 | 0.6707 | 0.015 |
| nb | Naive Bayes | 0.9333 | 0.93 | 0.955 | 0.96 | 0.9546 | 0.8038 | 0.814 | 0.011 |
| ada | Ada Boost Classifier | 0.9333 | 0.99 | 0.975 | 0.9467 | 0.9564 | 0.7238 | 0.734 | 0.054 |
| xgboost | Extreme Gradient Boosting | 0.9214 | 1.0 | 0.955 | 0.9467 | 0.9453 | 0.7522 | 0.7715 | 0.02 |


Best model name: ExtraTreesClassifier
Mapped PyCaret model name: Extra Trees Classifier


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_No Nonsense Defender |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.969 | 1.0 | 0.98 | 0.98 | 0.9778 | 0.9267 | 0.9363 | 0.023 |
| et | Extra Trees Classifier | 0.9381 | 1.0 | 0.975 | 0.9433 | 0.9544 | 0.8451 | 0.866 | 0.067 |
| lightgbm | Light Gradient Boosting Machine | 0.9381 | 0.9575 | 0.975 | 0.9433 | 0.9544 | 0.8451 | 0.866 | 0.03 |
| ada | Ada Boost Classifier | 0.9238 | 0.975 | 0.955 | 0.9433 | 0.9433 | 0.8147 | 0.8391 | 0.039 |
| xgboost | Extreme Gradient Boosting | 0.9238 | 0.9792 | 0.975 | 0.9183 | 0.9437 | 0.8148 | 0.8327 | 0.019 |
| knn | K Neighbors Classifier | 0.9167 | 0.9875 | 0.975 | 0.92 | 0.9413 | 0.7952 | 0.8237 | 0.016 |
| ridge | Ridge Classifier | 0.9048 | 0.9875 | 0.955 | 0.9183 | 0.9326 | 0.7677 | 0.7891 | 0.011 |
| lda | Linear Discriminant Analysis | 0.9048 | 0.9875 | 0.955 | 0.9183 | 0.9326 | 0.7677 | 0.7891 | 0.011 |
| rf | Random Forest Classifier | 0.8905 | 0.99 | 0.93 | 0.9233 | 0.9179 | 0.7385 | 0.773 | 0.065 |
| nb | Naive Bayes | 0.8738 | 0.9238 | 0.905 | 0.925 | 0.9062 | 0.6921 | 0.7083 | 0.01 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Full Back |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.9214 | 0.9675 | 0.955 | 0.95 | 0.9455 | 0.7951 | 0.8083 | 0.036 |
| ridge | Ridge Classifier | 0.8881 | 0.955 | 0.955 | 0.9033 | 0.9211 | 0.7236 | 0.7613 | 0.011 |
| et | Extra Trees Classifier | 0.8738 | 0.95 | 0.955 | 0.8914 | 0.9135 | 0.6648 | 0.6967 | 0.056 |
| lda | Linear Discriminant Analysis | 0.8714 | 0.955 | 0.93 | 0.8983 | 0.9072 | 0.6915 | 0.723 | 0.011 |
| knn | K Neighbors Classifier | 0.869 | 0.9938 | 0.955 | 0.8933 | 0.9124 | 0.6505 | 0.6702 | 0.018 |
| rf | Random Forest Classifier | 0.8405 | 0.9438 | 0.955 | 0.8648 | 0.8957 | 0.5505 | 0.5702 | 0.073 |
| ada | Ada Boost Classifier | 0.8405 | 0.8875 | 0.93 | 0.865 | 0.8894 | 0.5948 | 0.6168 | 0.042 |
| nb | Naive Bayes | 0.8381 | 0.89 | 0.88 | 0.8767 | 0.8719 | 0.6427 | 0.6641 | 0.012 |
| xgboost | Extreme Gradient Boosting | 0.8381 | 0.9025 | 0.93 | 0.8717 | 0.8894 | 0.5772 | 0.5965 | 0.02 |
| lightgbm | Light Gradient Boosting Machine | 0.8071 | 0.9125 | 0.905 | 0.8514 | 0.864 | 0.4934 | 0.507 | 0.03 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_All Action Midfielder |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.9381 | 1.0 | 0.955 | 0.96 | 0.9524 | 0.8559 | 0.8745 | 0.024 |
| knn | K Neighbors Classifier | 0.9333 | 0.9375 | 1.0 | 0.9267 | 0.9578 | 0.8143 | 0.8265 | 0.016 |
| ridge | Ridge Classifier | 0.9048 | 0.9792 | 0.98 | 0.9 | 0.9333 | 0.7677 | 0.799 | 0.012 |
| gbc | Gradient Boosting Classifier | 0.9048 | 0.9317 | 0.905 | 0.95 | 0.9246 | 0.7916 | 0.798 | 0.038 |
| lda | Linear Discriminant Analysis | 0.9048 | 0.9792 | 0.98 | 0.9 | 0.9333 | 0.7677 | 0.799 | 0.023 |
| nb | Naive Bayes | 0.8905 | 0.9192 | 0.885 | 0.955 | 0.9103 | 0.767 | 0.7887 | 0.01 |
| dt | Decision Tree Classifier | 0.8881 | 0.8775 | 0.905 | 0.93 | 0.9135 | 0.7487 | 0.7613 | 0.01 |
| xgboost | Extreme Gradient Boosting | 0.8738 | 0.9292 | 0.93 | 0.89 | 0.9056 | 0.7077 | 0.7294 | 0.028 |
| rf | Random Forest Classifier | 0.8714 | 0.9667 | 0.95 | 0.87 | 0.9056 | 0.6952 | 0.7196 | 0.075 |
| lightgbm | Light Gradient Boosting Machine | 0.869 | 0.975 | 0.98 | 0.8733 | 0.9156 | 0.641 | 0.6628 | 0.029 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Midfield Playmaker |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.9571 | 0.98 | 0.9633 | 0.9833 | 0.9707 | 0.8872 | 0.9021 | 0.021 |
| et | Extra Trees Classifier | 0.9571 | 0.9733 | 0.9833 | 0.9714 | 0.9742 | 0.8588 | 0.8645 | 0.069 |
| xgboost | Extreme Gradient Boosting | 0.9429 | 0.9633 | 0.9633 | 0.9714 | 0.9631 | 0.8284 | 0.8376 | 0.023 |
| lightgbm | Light Gradient Boosting Machine | 0.9405 | 0.9833 | 0.9833 | 0.9548 | 0.9652 | 0.7588 | 0.7645 | 0.041 |
| ada | Ada Boost Classifier | 0.9381 | 0.9483 | 0.9633 | 0.9667 | 0.9616 | 0.7748 | 0.7923 | 0.044 |
| nb | Naive Bayes | 0.9262 | 0.9233 | 0.9433 | 0.9667 | 0.9525 | 0.7924 | 0.802 | 0.011 |
| lda | Linear Discriminant Analysis | 0.9262 | 0.99 | 0.9633 | 0.9467 | 0.9527 | 0.7476 | 0.7591 | 0.012 |
| knn | K Neighbors Classifier | 0.9238 | 0.9533 | 0.9633 | 0.95 | 0.954 | 0.6993 | 0.7111 | 0.016 |
| rf | Random Forest Classifier | 0.9238 | 0.9633 | 0.9633 | 0.9514 | 0.9542 | 0.7388 | 0.7445 | 0.073 |
| svm | SVM - Linear Kernel | 0.9095 | 0.9633 | 0.9833 | 0.9214 | 0.947 | 0.6176 | 0.6291 | 0.01 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Traditional Winger |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.9548 | 1.0 | 0.9667 | 0.9417 | 0.9457 | 0.9082 | 0.9187 | 0.031 |
| ridge | Ridge Classifier | 0.8738 | 0.9583 | 0.7833 | 0.9167 | 0.8133 | 0.7227 | 0.7544 | 0.01 |
| lda | Linear Discriminant Analysis | 0.8738 | 0.9583 | 0.7833 | 0.9167 | 0.8133 | 0.7227 | 0.7544 | 0.011 |
| xgboost | Extreme Gradient Boosting | 0.8714 | 0.9292 | 0.8667 | 0.8333 | 0.83 | 0.7305 | 0.7506 | 0.026 |
| ada | Ada Boost Classifier | 0.8524 | 0.9625 | 0.9 | 0.7983 | 0.8195 | 0.7057 | 0.7363 | 0.044 |
| gbc | Gradient Boosting Classifier | 0.8262 | 0.9375 | 0.7 | 0.8667 | 0.7333 | 0.6145 | 0.6535 | 0.05 |
| nb | Naive Bayes | 0.8238 | 0.8896 | 0.8667 | 0.7733 | 0.7852 | 0.6472 | 0.6843 | 0.01 |
| rf | Random Forest Classifier | 0.8119 | 0.9042 | 0.75 | 0.8233 | 0.7538 | 0.6104 | 0.6395 | 0.066 |
| et | Extra Trees Classifier | 0.8095 | 0.9146 | 0.7333 | 0.7817 | 0.7229 | 0.5871 | 0.6162 | 0.054 |
| dt | Decision Tree Classifier | 0.8048 | 0.7833 | 0.7167 | 0.7833 | 0.7133 | 0.5698 | 0.6021 | 0.01 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Inverted Winger |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.969 | 0.9889 | 0.9667 | 0.975 | 0.9657 | 0.9387 | 0.9457 | 0.024 |
| ridge | Ridge Classifier | 0.9524 | 0.9889 | 0.9667 | 0.9417 | 0.9524 | 0.9053 | 0.9083 | 0.011 |
| lda | Linear Discriminant Analysis | 0.9524 | 0.9889 | 0.9667 | 0.9417 | 0.9524 | 0.9053 | 0.9083 | 0.011 |
| et | Extra Trees Classifier | 0.95 | 0.9833 | 0.9 | 1.0 | 0.94 | 0.9 | 0.9121 | 0.059 |
| rf | Random Forest Classifier | 0.919 | 0.9444 | 0.9 | 0.95 | 0.9114 | 0.8387 | 0.8578 | 0.078 |
| gbc | Gradient Boosting Classifier | 0.919 | 0.9556 | 0.9 | 0.925 | 0.9057 | 0.8362 | 0.8437 | 0.048 |
| knn | K Neighbors Classifier | 0.9167 | 0.9556 | 0.8667 | 0.975 | 0.8957 | 0.8333 | 0.8569 | 0.017 |
| ada | Ada Boost Classifier | 0.9048 | 0.9778 | 0.9333 | 0.8917 | 0.9038 | 0.8082 | 0.8228 | 0.043 |
| xgboost | Extreme Gradient Boosting | 0.8762 | 0.95 | 0.9 | 0.8517 | 0.8674 | 0.7545 | 0.7671 | 0.026 |
| nb | Naive Bayes | 0.869 | 0.9 | 0.8333 | 0.8917 | 0.8524 | 0.7362 | 0.7478 | 0.011 |


Best model name: LogisticRegression
Mapped PyCaret model name: Logistic Regression


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Goal Poacher |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| knn | K Neighbors Classifier | 0.8905 | 0.9346 | 0.7833 | 0.8333 | 0.7767 | 0.7285 | 0.7531 | 0.026 |
| et | Extra Trees Classifier | 0.8905 | 0.9383 | 0.7333 | 0.8667 | 0.7733 | 0.7218 | 0.7446 | 0.057 |
| lr | Logistic Regression | 0.8571 | 0.9358 | 0.7667 | 0.8167 | 0.7767 | 0.6734 | 0.687 | 0.025 |
| ridge | Ridge Classifier | 0.8571 | 0.9167 | 0.7167 | 0.7833 | 0.73 | 0.6493 | 0.6652 | 0.013 |
| lda | Linear Discriminant Analysis | 0.8571 | 0.9167 | 0.7167 | 0.7833 | 0.73 | 0.6493 | 0.6652 | 0.013 |
| qda | Quadratic Discriminant Analysis | 0.8286 | 0.8333 | 0.5833 | 0.75 | 0.6371 | 0.5587 | 0.5807 | 0.01 |
| rf | Random Forest Classifier | 0.8071 | 0.9058 | 0.6167 | 0.7583 | 0.6557 | 0.5397 | 0.5641 | 0.069 |
| svm | SVM - Linear Kernel | 0.7952 | 0.8742 | 0.65 | 0.6817 | 0.6229 | 0.53 | 0.5684 | 0.013 |
| ada | Ada Boost Classifier | 0.7762 | 0.8667 | 0.7167 | 0.6083 | 0.6424 | 0.4949 | 0.5195 | 0.039 |
| xgboost | Extreme Gradient Boosting | 0.75 | 0.855 | 0.6333 | 0.625 | 0.609 | 0.4293 | 0.4466 | 0.02 |


Best model name: KNeighborsClassifier
Mapped PyCaret model name: K Neighbors Classifier


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


|   | Description | Value |
|---|---|---|
| 0 | Session id | 42 |
| 1 | Target | Class_Target Man |
| 2 | Target type | Binary |
| 3 | Original data shape | (91, 15) |
| 4 | Transformed data shape | (91, 15) |
| 5 | Transformed train set shape | (63, 15) |
| 6 | Transformed test set shape | (28, 15) |
| 7 | Numeric features | 14 |
| 8 | Preprocess | True |
| 9 | Imputation type | simple |


|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| rf | Random Forest Classifier | 0.969 | 1.0 | 1.0 | 0.955 | 0.9746 | 0.9362 | 0.9437 | 0.07 |
| xgboost | Extreme Gradient Boosting | 0.969 | 0.9889 | 1.0 | 0.955 | 0.9746 | 0.9362 | 0.9437 | 0.024 |
| et | Extra Trees Classifier | 0.9548 | 1.0 | 1.0 | 0.935 | 0.9635 | 0.9058 | 0.9168 | 0.062 |
| knn | K Neighbors Classifier | 0.9381 | 0.9681 | 1.0 | 0.91 | 0.9492 | 0.8725 | 0.8875 | 0.017 |
| ada | Ada Boost Classifier | 0.9214 | 0.9889 | 0.95 | 0.935 | 0.9349 | 0.832 | 0.8527 | 0.043 |
| lr | Logistic Regression | 0.919 | 0.9556 | 0.975 | 0.915 | 0.9353 | 0.8362 | 0.8592 | 0.016 |
| gbc | Gradient Boosting Classifier | 0.9048 | 0.9167 | 0.95 | 0.91 | 0.9206 | 0.8058 | 0.8289 | 0.044 |
| lightgbm | Light Gradient Boosting Machine | 0.8929 | 0.9722 | 0.9167 | 0.9017 | 0.9052 | 0.7808 | 0.7918 | 0.036 |
| ridge | Ridge Classifier | 0.8857 | 0.9556 | 0.9083 | 0.8967 | 0.8984 | 0.7625 | 0.7716 | 0.012 |
| lda | Linear Discriminant Analysis | 0.8857 | 0.9556 | 0.9083 | 0.8967 | 0.8984 | 0.7625 | 0.7716 | 0.01 |


Best model name: RandomForestClassifier
Mapped PyCaret model name: Random Forest Classifier


Transformation Pipeline and Model Successfully Saved
Transformation Pipeline and Model Successfully Loaded


['finalmodel.pkl']

## Results

The average accuracy of the best model across the 11 roles is 0.9485, which suggests the predictor assigns roles correctly in most cases.

| Average Model Accuracy  | 0.9485 |
|-------------------------|--------|

On the whole, train and test accuracy and F1 are high for the top model selected for each role.

In [11]:
#Display the train/test accuracy and F1, and time taken
results_df

|   | target | train_accuracy | test_accuracy | train_f1 | test_f1 | TT |
|---|---|---|---|---|---|---|
| 0 | Class_Traditional Keeper | 0.978022 | 1.0 | 0.978664 | 1.0 | 0.016 |
| 1 | Class_Sweeper Keeper | 1.0 | 1.0 | 1.0 | 1.0 | 0.025 |
| 2 | Class_Ball Playing Defender | 1.0 | 0.956522 | 1.0 | 0.960339 | 0.102 |
| 3 | Class_No Nonsense Defender | 1.0 | 0.956522 | 1.0 | 0.954694 | 0.023 |
| 4 | Class_Full Back | 1.0 | 0.956522 | 1.0 | 0.952704 | 0.036 |
| 5 | Class_All Action Midfielder | 1.0 | 0.869565 | 1.0 | 0.875049 | 0.024 |
| 6 | Class_Midfield Playmaker | 1.0 | 1.0 | 1.0 | 1.0 | 0.021 |
| 7 | Class_Traditional Winger | 1.0 | 1.0 | 1.0 | 1.0 | 0.031 |
| 8 | Class_Inverted Winger | 1.0 | 0.869565 | 1.0 | 0.870062 | 0.024 |
| 9 | Class_Goal Poacher | 0.923077 | 0.869565 | 0.92279 | 0.864803 | 0.026 |
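
The scores in the results table above move in coarse steps, which hints at how small the evaluation sets are. As a rough check, assuming each reported accuracy is simply correct predictions divided by set size, the standard-library `fractions` module can recover plausible underlying counts. The helper name `as_ratio` and the denominator cap of 100 are illustrative choices, not part of the notebook.

```python
from fractions import Fraction

def as_ratio(score: float, max_rows: int = 100) -> Fraction:
    """Closest fraction to `score` whose denominator is at most `max_rows`."""
    return Fraction(score).limit_denominator(max_rows)

# Accuracies copied from the results table above.
test_scores = [0.956522, 0.869565]
train_scores = [0.978022, 0.923077]

for score in test_scores + train_scores:
    ratio = as_ratio(score)
    print(f"{score:.6f} ~ {ratio.numerator}/{ratio.denominator}")
```

Under that assumption the two test scores reduce to 22/23 and 20/23, so a single misclassified player shifts test accuracy by about 4.3 percentage points (1/23). Differences between roles of that size should be read with this in mind.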


Average train accuracy and F1 are both about 0.99, while average test accuracy and F1 are both just below 0.93. Time taken (TT) averages about 0.036 s, which is very fast.

| train_accuracy | test_accuracy | train_f1 | test_f1 | TT     |
|----------------|---------------|----------|---------|--------|
| 0.9910         | 0.9289        | 0.9910   | 0.9275  | 0.0362 |
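
The F1 columns are computed with `average='weighted'` in the code cell above, meaning the per-class F1 scores are averaged with each class weighted by its number of true samples. The following is a minimal pure-Python sketch of that calculation on made-up labels, not the project data; `weighted_f1` is a hypothetical helper written for illustration, and the notebook itself relies on scikit-learn's `f1_score`.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)

# Toy labels, not taken from the dataset.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a role with few positive examples contributes little to this score, which is one reason weighted F1 can sit close to accuracy on imbalanced binary targets.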

The top three models for each role all reach high accuracy; the lowest value in the table is 0.8524.

| Role                        |   M1   |   M2   |   M3   |
|-----------------------------|:------:|:------:|:------:|
| Class_Traditional Keeper    | 0.9857 | 0.9857 | 0.9857 |
| Class_Sweeper Keeper        | 0.9381 | 0.9214 | 0.9214 |
| Class_Ball Playing Defender | 0.9714 | 0.9690 | 0.9690 |
| Class_No Nonsense Defender  | 0.9548 | 0.9524 | 0.9524 |
| Class_Full Back             | 0.9357 | 0.9048 | 0.9048 |
| Class_All Action Midfielder | 0.9333 | 0.9214 | 0.9214 |
| Class_Midfield Playmaker    | 0.9405 | 0.9405 | 0.9405 |
| Class_Traditional Winger    | 0.9690 | 0.8905 | 0.8905 |
| Class_Inverted Winger       | 0.9500 | 0.9381 | 0.9190 |
| Class_Goal Poacher          | 0.9381 | 0.8548 | 0.8524 |
| Class_Target Man            | 0.9167 | 0.9095 | 0.8857 |
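
The headline figure of 0.9485 appears to be the mean of the M1 column above. A short check with values copied from the table (the variable names are illustrative):

```python
from statistics import mean

# Best-model (M1) accuracy per role, copied from the table above.
m1_accuracy = {
    "Class_Traditional Keeper": 0.9857,
    "Class_Sweeper Keeper": 0.9381,
    "Class_Ball Playing Defender": 0.9714,
    "Class_No Nonsense Defender": 0.9548,
    "Class_Full Back": 0.9357,
    "Class_All Action Midfielder": 0.9333,
    "Class_Midfield Playmaker": 0.9405,
    "Class_Traditional Winger": 0.9690,
    "Class_Inverted Winger": 0.9500,
    "Class_Goal Poacher": 0.9381,
    "Class_Target Man": 0.9167,
}

average_m1 = round(mean(list(m1_accuracy.values())), 4)
print(average_m1)
```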

The table below names the top three models behind the accuracies above. Logistic Regression ranked first for 7 of the 11 roles, K Neighbors Classifier for 2, and Extra Trees Classifier and Random Forest Classifier for 1 each.

|             Role            |        1st Model       |        2nd Model       |          3rd Model         |
|:---------------------------:|:----------------------:|:----------------------:|:--------------------------:|
| Class_Traditional Keeper    | KNeighborsClassifier   | DecisionTreeClassifier | RidgeClassifier            |
| Class_Sweeper Keeper        | LogisticRegression     | ExtraTreesClassifier   | GaussianNB                 |
| Class_Ball Playing Defender | ExtraTreesClassifier   | RidgeClassifier        | LinearDiscriminantAnalysis |
| Class_No Nonsense Defender  | LogisticRegression     | ExtraTreesClassifier   | LGBMClassifier             |
| Class_Full Back             | LogisticRegression     | RidgeClassifier        | ExtraTreesClassifier       |
| Class_All Action Midfielder | LogisticRegression     | KNeighborsClassifier   | RidgeClassifier            |
| Class_Midfield Playmaker    | LogisticRegression     | ExtraTreesClassifier   | XGBClassifier              |
| Class_Traditional Winger    | LogisticRegression     | RidgeClassifier        | LinearDiscriminantAnalysis |
| Class_Inverted Winger       | LogisticRegression     | RidgeClassifier        | LinearDiscriminantAnalysis |
| Class_Goal Poacher          | KNeighborsClassifier   | ExtraTreesClassifier   | LogisticRegression         |
| Class_Target Man            | RandomForestClassifier | XGBClassifier          | ExtraTreesClassifier       |
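
The tally of first-ranked models can be reproduced from the "1st Model" column with `collections.Counter`. The dictionary below is copied by hand from the table, and its name is illustrative:

```python
from collections import Counter

# First-ranked model per role, copied from the table above.
top_model = {
    "Class_Traditional Keeper": "KNeighborsClassifier",
    "Class_Sweeper Keeper": "LogisticRegression",
    "Class_Ball Playing Defender": "ExtraTreesClassifier",
    "Class_No Nonsense Defender": "LogisticRegression",
    "Class_Full Back": "LogisticRegression",
    "Class_All Action Midfielder": "LogisticRegression",
    "Class_Midfield Playmaker": "LogisticRegression",
    "Class_Traditional Winger": "LogisticRegression",
    "Class_Inverted Winger": "LogisticRegression",
    "Class_Goal Poacher": "KNeighborsClassifier",
    "Class_Target Man": "RandomForestClassifier",
}

counts = Counter(top_model.values())
for name, n in counts.most_common():
    print(f"{name}: {n}")
```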

For a full breakdown of how every model performed on each role, see 02a-modellingappendix.