# Threshold Adjustment

👇 Load the `player_performances.csv` dataset to see what you will be working with.

In [12]:
import pandas as pd
df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_Player_performance.csv")

In [13]:
df.head()

| | games played | minutes played | points per game | field goals made | field goal attempts | field goal percent | 3 point made | 3 point attempt | 3 point % | free throw made | free throw attempts | free throw % | offensive rebounds | defensive rebounds | rebounds | assists | steals | blocks | turnovers | target_5y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 27.4 | 7.4 | 2.6 | 7.6 | 34.7 | 0.5 | 2.1 | 25.0 | 1.6 | 2.3 | 69.9 | 0.7 | 3.4 | 4.1 | 1.9 | 0.4 | 0.4 | 1.3 | 0 |
| 1 | 35 | 26.9 | 7.2 | 2.0 | 6.7 | 29.6 | 0.7 | 2.8 | 23.5 | 2.6 | 3.4 | 76.5 | 0.5 | 2.0 | 2.4 | 3.7 | 1.1 | 0.5 | 1.6 | 0 |
| 2 | 74 | 15.3 | 5.2 | 2.0 | 4.7 | 42.2 | 0.4 | 1.7 | 24.4 | 0.9 | 1.3 | 67.0 | 0.5 | 1.7 | 2.2 | 1.0 | 0.5 | 0.3 | 1.0 | 0 |
| 3 | 58 | 11.6 | 5.7 | 2.3 | 5.5 | 42.6 | 0.1 | 0.5 | 22.6 | 0.9 | 1.3 | 68.9 | 1.0 | 0.9 | 1.9 | 0.8 | 0.6 | 0.1 | 1.0 | 1 |
| 4 | 48 | 11.5 | 4.5 | 1.6 | 3.0 | 52.4 | 0.0 | 0.1 | 0.0 | 1.3 | 1.9 | 67.4 | 1.0 | 1.5 | 2.5 | 0.3 | 0.3 | 0.4 | 0.8 | 1 |


ℹ️ Each observation represents a player and each column a performance characteristic. The target `target_5y` indicates whether the player's professional career lasted less than 5 years [0] or 5 years or more [1].

# Preprocessing

👇 To avoid spending too much time on preprocessing, apply robust scaling (`RobustScaler`) to the entire feature set. This practice is not optimal, but it works for preliminary preprocessing and for getting models up and running quickly.

Save the scaled feature set as `X_scaled`.

In [16]:
# Features: every column except the target
X = df.drop("target_5y", axis=1)

In [17]:
# Target: 1 if the career lasted 5 years or more, else 0
y = df["target_5y"]

In [18]:
from sklearn.preprocessing import RobustScaler


scaler = RobustScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

In [21]:
pd.DataFrame(X_scaled).describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 | 1329.0 |
| mean | -0.086581 | 0.127928 | 0.239248 | 0.267381 | 0.262496 | 0.028623 | 0.374153 | 0.40469 | -0.095135 | 0.275258 | 0.233796 | -0.069754 | 0.2076 | 0.200903 | 0.211377 | 0.327851 | 0.240783 | 0.41535 | 0.245015 |
| std | 0.582703 | 0.688218 | 0.856014 | 0.843425 | 0.856862 | 0.805196 | 0.96152 | 0.886546 | 0.493013 | 0.899401 | 0.947611 | 0.819519 | 0.779178 | 0.85154 | 0.824943 | 1.05297 | 0.821066 | 1.071336 | 0.905587 |
| min | -1.733333 | -1.07438 | -0.960784 | -0.9 | -0.952381 | -2.644737 | -0.25 | -0.25 | -0.689231 | -0.909091 | -1.071429 | -5.570313 | -0.8 | -0.9375 | -0.88 | -0.785714 | -1.0 | -0.5 | -1.125 |
| 25% | -0.533333 | -0.438017 | -0.372549 | -0.35 | -0.357143 | -0.486842 | -0.25 | -0.25 | -0.689231 | -0.363636 | -0.428571 | -0.507813 | -0.4 | -0.4375 | -0.4 | -0.357143 | -0.4 | -0.25 | -0.375 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 0.466667 | 0.561983 | 0.627451 | 0.65 | 0.642857 | 0.513158 | 0.75 | 0.75 | 0.310769 | 0.636364 | 0.571429 | 0.492187 | 0.6 | 0.5625 | 0.6 | 0.642857 | 0.6 | 0.75 | 0.625 |
| max | 0.633333 | 2.049587 | 4.431373 | 4.05 | 3.571429 | 3.921053 | 5.5 | 5.166667 | 2.387692 | 6.090909 | 6.214286 | 2.242188 | 4.5 | 4.9375 | 4.56 | 6.785714 | 4.0 | 9.25 | 4.25 |
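
The `50%` row above is 0 and the gap between the `75%` and `25%` rows is 1 for every column. That is what robust scaling is expected to produce: each feature has its median subtracted and is then divided by its interquartile range. The sketch below reproduces that arithmetic in plain Python on made-up numbers. `robust_scale` is a hypothetical helper for illustration, not part of scikit-learn, and it assumes the default 25th to 75th percentile range with linear interpolation.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Hypothetical helper: centre on the median, divide by the interquartile range."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    centre = median(values)
    iqr = q3 - q1
    return [(v - centre) / iqr for v in values]

toy = [1, 2, 3, 4, 100]  # made-up data with one outlier
scaled = robust_scale(toy)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier stays far from the other values, but because the median and quartiles ignore it, the spacing of the remaining points is unaffected.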


### ☑️ Check your code

In [22]:
from nbresult import ChallengeResult

result = ChallengeResult('scaled_features',
                         scaled_features = X_scaled
)

result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jakob/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/jakob/code/jahlah/data_challenges_TA/data-challenges/05-ML/03-Performance-metrics/03-Threshold-Adjustments
plugins: anyio-3.4.0
collecting ... collected 1 item

tests/test_scaled_features.py::TestScaled_features::test_scaled_features PASSED [100%]

💯 You can commit your code:

git add tests/scaled_features.pickle

git commit -m 'Completed scaled_features step'

git push origin master


# Base modelling

🎯 The task is to detect players who will last at least 5 years as professionals, with a 90% guarantee: at least 90% of the players predicted to last 5 years should actually do so.

👇 Will a default Logistic Regression model satisfy the coach's requirement? Use cross-validation and save the score that supports your answer in the variable `base_score`.

In [24]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

cv_results = cross_validate(LogisticRegression(max_iter = 1000), X_scaled, y, cv = 10, scoring = ["precision"])

In [29]:
base_score = cv_results["test_precision"].mean()
base_score

0.7359367405495885
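
A mean precision of about 0.74 is below the 0.9 that the task asks for, so the default model with its default 0.5 threshold falls short. Precision is the relevant score here because it measures how many of the players predicted as class 1 really belong to class 1. A minimal sketch on made-up labels, where `precision` is a hypothetical helper rather than a scikit-learn function:

```python
def precision(y_true, y_pred):
    """Hypothetical helper: TP / (TP + FP), or 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Made-up labels: 5 players are predicted to last, 3 of them actually do
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

toy_precision = precision(y_true, y_pred)
print(toy_precision)  # 0.6
```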

### ☑️ Check your code

In [30]:
from nbresult import ChallengeResult

result = ChallengeResult('base_precision',
                         score = base_score
)

result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jakob/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/jakob/code/jahlah/data_challenges_TA/data-challenges/05-ML/03-Performance-metrics/03-Threshold-Adjustments
plugins: anyio-3.4.0
collecting ... collected 1 item

tests/test_base_precision.py::TestBase_precision::test_precision_score PASSED [100%]

💯 You can commit your code:

git add tests/base_precision.pickle

git commit -m 'Completed base_precision step'

git push origin master


# Threshold adjustment

👇 Find the decision threshold that guarantees 90% precision when predicting that a player will last 5 years or more as a professional. Save the threshold in the variable `new_threshold`.

<details>
<summary>💡 Hint</summary>

- Make cross validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>



In [32]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

In [37]:
# predict_proba returns one column per class; transpose to unpack P(class 0) and P(class 1)
y_pred_probas_0, y_pred_probas_1 = cross_val_predict(LogisticRegression(), X_scaled, y, method = "predict_proba").T

In [42]:
y_pred_probas_0.shape, y_pred_probas_1.shape, y.shape

((1329,), (1329,), (1329,))
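
With `method = "predict_proba"`, `cross_val_predict` returns one probability row per sample, and each row comes from a model fitted on folds that did not contain that sample. The sketch below imitates this in plain Python with a deliberately trivial "model" that predicts the positive rate of its training folds. The helpers `kfold_indices` and `out_of_fold_positive_rate` are hypothetical, and the contiguous, unshuffled folds are a simplification: for a classifier, scikit-learn uses stratified folds when `cv` is an integer or left at its default.

```python
def kfold_indices(n_samples, n_splits):
    """Hypothetical helper: yield (train_idx, test_idx) for contiguous folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

def out_of_fold_positive_rate(y, n_splits=5):
    """Each sample is scored by a 'model' that never saw it during fitting."""
    oof = [None] * len(y)
    for train_idx, test_idx in kfold_indices(len(y), n_splits):
        rate = sum(y[i] for i in train_idx) / len(train_idx)  # "fit" on training folds
        for i in test_idx:
            oof[i] = rate  # "predict" on the held-out fold
    return oof

y_toy = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # made-up labels
oof = out_of_fold_positive_rate(y_toy, n_splits=5)
print(len(oof), oof[0])  # 10 0.625
```

Every sample receives exactly one out-of-fold score, which is why the probability arrays above have the same length as `y`.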

In [59]:
precision, recall, thresholds = precision_recall_curve(y, y_pred_probas_1)

In [60]:
precision.shape, thresholds.shape

((1298,), (1297,))
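
`precision` has one more entry than `thresholds` because `precision_recall_curve` appends a final point (precision 1, recall 0) that has no matching threshold. The sketch below is a simplified imitation in plain Python, not scikit-learn's implementation: `precision_by_threshold` is a hypothetical helper that treats every distinct score as a candidate threshold and then appends the trailing 1.0.

```python
def precision_by_threshold(y_true, scores):
    """Hypothetical helper: precision at each distinct score, plus a trailing 1.0."""
    thresholds = sorted(set(scores))
    precisions = []
    for t in thresholds:
        # Labels of the samples predicted positive at this threshold (score >= t)
        predicted_pos = [label for label, s in zip(y_true, scores) if s >= t]
        precisions.append(sum(predicted_pos) / len(predicted_pos))
    precisions.append(1.0)  # final point with no corresponding threshold
    return precisions, thresholds

y_toy = [0, 0, 1, 1]                 # made-up labels
scores_toy = [0.1, 0.4, 0.35, 0.8]   # made-up class-1 probabilities
precisions, thresholds = precision_by_threshold(y_toy, scores_toy)
print(len(precisions), len(thresholds))  # 5 4
```

Dropping the last precision value, as in `precision[:-1]`, lines the two arrays up so each precision can be paired with its threshold.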

In [61]:
# precision has one extra trailing value with no threshold, so drop it to align both arrays
df_precision = pd.DataFrame({"precision": precision[:-1], "threshold": thresholds})

In [69]:
# Lowest threshold at which the cross-validated precision reaches 0.9
new_threshold = df_precision[df_precision["precision"] >= 0.9]["threshold"].min()
new_threshold

0.8626977113270222
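
The selection keeps every row whose precision reaches 0.9 and takes the smallest threshold among them, which preserves as much recall as possible. Precision is not guaranteed to rise steadily with the threshold, so a stricter variant could require that every higher threshold also meets the target. The sketch below shows both rules on made-up values. `lowest_threshold` and `lowest_stable_threshold` are hypothetical helpers, and the stricter rule is an alternative for comparison, not part of the original solution.

```python
def lowest_threshold(precisions, thresholds, target):
    """Smallest threshold whose precision reaches the target (the rule used above)."""
    candidates = [t for p, t in zip(precisions, thresholds) if p >= target]
    return min(candidates) if candidates else None

def lowest_stable_threshold(precisions, thresholds, target):
    """Smallest threshold such that it and every higher threshold reach the target."""
    best = None
    for t, p in sorted(zip(thresholds, precisions), reverse=True):
        if p < target:
            break  # a dip below the target ends the stable region
        best = t
    return best

# Made-up curve in which precision dips at threshold 0.4
toy_thresholds = [0.1, 0.35, 0.4, 0.8]
toy_precisions = [0.5, 0.67, 0.5, 1.0]

print(lowest_threshold(toy_precisions, toy_thresholds, 0.6))         # 0.35
print(lowest_stable_threshold(toy_precisions, toy_thresholds, 0.6))  # 0.8
```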

### ☑️ Check your code

In [70]:
from nbresult import ChallengeResult

result = ChallengeResult('decision_threshold',
                         threshold = new_threshold
)

result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jakob/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/jakob/code/jahlah/data_challenges_TA/data-challenges/05-ML/03-Performance-metrics/03-Threshold-Adjustments
plugins: anyio-3.4.0
collecting ... collected 1 item

tests/test_decision_threshold.py::TestDecision_threshold::test_new_threshold PASSED [100%]

💯 You can commit your code:

git add tests/decision_threshold.pickle

git commit -m 'Completed decision_threshold step'

git push origin master


# Using the new threshold

🎯 The coach has spotted a potentially interesting player, but wants your 90% guarantee that he would last 5 years minimum as a pro. Download the player's data [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv).

❓ Would you risk recommending the player to the coach? Save your answer as a string in the variable `recommendation`, either "recommend" or "not recommend".

In [71]:
new_player = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv")

In [72]:
new_player

| | games played | minutes played | points per game | field goals made | field goal attempts | field goal percent | 3 point made | 3 point attempt | 3 point % | free throw made | free throw attempts | free throw % | offensive rebounds | defensive rebounds | rebounds | assists | steals | blocks | turnovers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80 | 31.4 | 14.3 | 5.9 | 11.1 | 52.5 | 0.0 | 0.1 | 11.1 | 2.6 | 3.9 | 65.4 | 3.0 | 5.0 | 8.0 | 2.4 | 1.1 | 0.8 | 2.2 |


In [75]:
# Reuse the scaler fitted on the training features; do not refit it on the new player
new_player = scaler.transform(new_player)
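
Calling `scaler.transform` rather than fitting again means the new player is scaled with the median and interquartile range learned from the training features, so his values land on the same scale the model is trained on. A plain-Python sketch of that split between fitting and transforming, using hypothetical helpers `fit_robust` and `transform_robust` and made-up numbers:

```python
from statistics import median, quantiles

def fit_robust(train_values):
    """Hypothetical helper: learn the median and interquartile range from training data."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    return median(train_values), q3 - q1

def transform_robust(values, params):
    """Apply previously learned statistics to any data, including unseen rows."""
    centre, iqr = params
    return [(v - centre) / iqr for v in values]

train = [10, 20, 30, 40, 50]   # made-up training feature
params = fit_robust(train)     # median 30, interquartile range 20.0

# A single new observation is scaled with the training statistics
new_scaled = transform_robust([80], params)
print(new_scaled)  # [2.5]
```

Fitting a scaler on a single row would not work as a substitute: one observation has no spread, so there is no meaningful interquartile range to divide by.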

In [76]:
model = LogisticRegression()
model.fit(X_scaled, y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [84]:
recommendation = "recommend" if model.predict_proba(new_player)[0][1] > new_threshold else "not recommend"
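
`model.predict` would label the player using the default 0.5 cut-off, so the comparison is made on the class-1 probability instead. The sketch below isolates that decision rule. `decide` is a hypothetical helper, and the probability of 0.70 is made up for illustration; it is not the new player's actual score.

```python
def decide(proba_class_1, threshold):
    """Hypothetical helper: recommend only when the probability exceeds the threshold."""
    return "recommend" if proba_class_1 > threshold else "not recommend"

new_threshold = 0.8626977113270222  # threshold found in the previous section
made_up_proba = 0.70                # illustrative value only

print(decide(made_up_proba, 0.5))            # recommend
print(decide(made_up_proba, new_threshold))  # not recommend
```

A player the default model would accept can still be rejected once the stricter threshold is applied, which is the price of the higher precision.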

### ☑️ Check your code

In [85]:
from nbresult import ChallengeResult

result = ChallengeResult('recommendation',
                         recommendation = recommendation
)

result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jakob/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/jakob/code/jahlah/data_challenges_TA/data-challenges/05-ML/03-Performance-metrics/03-Threshold-Adjustments
plugins: anyio-3.4.0
collecting ... collected 1 item

tests/test_recommendation.py::TestRecommendation::test_recommendation PASSED [100%]

💯 You can commit your code:

git add tests/recommendation.pickle

git commit -m 'Completed recommendation step'

git push origin master


# 🏁