# Threshold Adjustment

👇 Load the `player_performances.csv` dataset to see what you will be working with.

In [1]:
import pandas as pd

data = pd.read_csv('data/player_performances.csv')

data.head()

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers,target_5y
0,36,27.4,7.4,2.6,7.6,34.7,0.5,2.1,25.0,1.6,2.3,69.9,0.7,3.4,4.1,1.9,0.4,0.4,1.3,0
1,35,26.9,7.2,2.0,6.7,29.6,0.7,2.8,23.5,2.6,3.4,76.5,0.5,2.0,2.4,3.7,1.1,0.5,1.6,0
2,74,15.3,5.2,2.0,4.7,42.2,0.4,1.7,24.4,0.9,1.3,67.0,0.5,1.7,2.2,1.0,0.5,0.3,1.0,0
3,58,11.6,5.7,2.3,5.5,42.6,0.1,0.5,22.6,0.9,1.3,68.9,1.0,0.9,1.9,0.8,0.6,0.1,1.0,1
4,48,11.5,4.5,1.6,3.0,52.4,0.0,0.1,0.0,1.3,1.9,67.4,1.0,1.5,2.5,0.3,0.3,0.4,0.8,1


ℹ️ Each observation represents a player and each column a performance characteristic. The target `target_5y` indicates whether the player's professional career lasted less than 5 years (`0`) or 5 years or more (`1`).
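
Before modeling, it can help to check how the two classes are balanced, because the share of class `1` is also the precision of a model that predicts "5 years or more" for every player. The sketch below uses a small made-up DataFrame (`toy` is a hypothetical stand-in, not the real dataset) to show the idea.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: only the target column matters here.
toy = pd.DataFrame({"target_5y": [1, 1, 1, 0, 0, 1, 0, 1]})

# Share of each class. The share of class 1 is the precision obtained by
# predicting class 1 for every row.
class_share = toy["target_5y"].value_counts(normalize=True)
print(class_share)
```

On the real data, the same call would be applied to `data["target_5y"]`.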

# Preprocessing

👇 To avoid spending too much time on preprocessing, apply robust scaling to the entire feature set. This practice is not optimal, but it can be used for preliminary preprocessing or to get models up and running quickly.

Save the scaled feature set as `X_scaled`.

In [2]:
X = data.drop(columns=['target_5y'])

In [3]:
y = data.target_5y

In [4]:
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()

X_scaled = scaler.fit_transform(X)
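
As a reminder of what `RobustScaler` does with its default settings, it subtracts the median of each column and divides by the interquartile range, which keeps outliers from dominating the scale. The following is a minimal sketch on a made-up single-column array (`X_toy` and the other names are illustrative only).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column with one extreme value (made up for illustration).
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

toy_scaler = RobustScaler()
X_toy_scaled = toy_scaler.fit_transform(X_toy)

# Manual version of the default behaviour: center on the median,
# scale by the interquartile range (75th minus 25th percentile).
median = np.median(X_toy, axis=0)
q25, q75 = np.percentile(X_toy, [25, 75], axis=0)
manual = (X_toy - median) / (q75 - q25)

print(np.allclose(X_toy_scaled, manual))
```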

### ☑️ Check your code

In [5]:
from nbresult import ChallengeResult

result = ChallengeResult('scaled_features',
                         scaled_features = X_scaled
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/bingobango/.pyenv/versions/tom/bin/python3
cachedir: .pytest_cache
rootdir: /Users/bingobango/code/lewagon/data-threshold-adjustments/tests
plugins: anyio-3.6.1, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_scaled_features.py::TestScaled_features::test_scaled_features PASSED [100%]



💯 You can commit your code:

git add tests/scaled_features.pickle

git commit -m 'Completed scaled_features step'

git push origin master



# Base modeling

🎯 The task is to detect players who will last at least 5 years as professionals, with a 90% guarantee (that is, a precision of at least 90% on class `1`).

👇 Is a default Logistic Regression model going to satisfy the coach's requirements? Use cross-validation and save the score that supports your answer under variable name `base_score`.

In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

In [7]:
log_model = LogisticRegression()

In [8]:
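# Note: with no `scoring` argument, cross_validate reports the estimator's
# default score (accuracy for LogisticRegression), not precision.
cv_results = cross_validate(log_model, X_scaled, y, cv=5)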

In [9]:
cv_results

{'fit_time': array([0.01823711, 0.01434708, 0.01132107, 0.01122212, 0.01061082]),
 'score_time': array([0.00052977, 0.00040603, 0.00035   , 0.00042105, 0.00032616]),
 'test_score': array([0.65789474, 0.73308271, 0.71052632, 0.69811321, 0.70566038])}

In [10]:
base_score = cv_results['test_score'].mean()
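
Since the coach's requirement is about precision, it may be useful to ask `cross_validate` for that metric explicitly through its `scoring` parameter. The sketch below requests accuracy and precision together on synthetic data (`X_demo` and `y_demo` are generated stand-ins, not the player dataset, and `max_iter=1000` is an arbitrary choice for the demo).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (an assumption; not the player dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Passing a list of scorer names yields one 'test_<name>' entry per metric.
cv_demo = cross_validate(
    LogisticRegression(max_iter=1000),
    X_demo,
    y_demo,
    cv=5,
    scoring=["accuracy", "precision"],
)

mean_accuracy = cv_demo["test_accuracy"].mean()
mean_precision = cv_demo["test_precision"].mean()
print(f"accuracy:  {mean_accuracy:.3f}")
print(f"precision: {mean_precision:.3f}")
```

With a single metric, `scoring="precision"` would place the scores under the `test_score` key instead.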

### ☑️ Check your code

In [11]:
from nbresult import ChallengeResult

result = ChallengeResult('base_precision',
                         score = base_score
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/bingobango/.pyenv/versions/tom/bin/python3
cachedir: .pytest_cache
rootdir: /Users/bingobango/code/lewagon/data-threshold-adjustments/tests
plugins: anyio-3.6.1, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_base_precision.py::TestBase_precision::test_precision_score PASSED [100%]



💯 You can commit your code:

git add tests/base_precision.pickle

git commit -m 'Completed base_precision step'

git push origin master



# Threshold adjustment

👇 Find the decision threshold that guarantees a precision of at least 90% when predicting that a player will last 5 years or more as a professional. Save the threshold under variable name `new_threshold`.

<details>
<summary>💡 Hint</summary>

- Make cross validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>



In [12]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

In [13]:
# Predict probabilities
y_pred_probas_0, y_pred_probas_1 = cross_val_predict(LogisticRegression(),
                                                     X_scaled, data['target_5y'],
                                                     method = "predict_proba").T

In [20]:
# Generate precision and thresholds (and recalls) using probabilities for class 1
precision, recall, thresholds = precision_recall_curve(data['target_5y'], y_pred_probas_1)

In [30]:
# Populate dataframe with precision and threshold
df_precision = pd.DataFrame({"precision" : precision[:-1], "threshold" : thresholds})
df_precision

Unnamed: 0,precision,threshold
0,0.621988,0.043222
1,0.622457,0.071775
2,0.622926,0.071827
3,0.623396,0.081373
4,0.623112,0.095488
...,...,...
1294,1.000000,0.987010
1295,1.000000,0.987417
1296,1.000000,0.987636
1297,1.000000,0.993303
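
The `precision[:-1]` slice is needed because `precision_recall_curve` returns one more precision and recall value than thresholds: the last pair is the end point of the curve (precision 1, recall 0) and has no threshold attached. A small sketch with made-up labels and scores illustrates the shapes.

```python
from sklearn.metrics import precision_recall_curve

# Toy labels and scores, made up for illustration.
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

precision_toy, recall_toy, thresholds_toy = precision_recall_curve(y_toy, scores_toy)

# One more precision/recall value than thresholds: the final pair is the
# (precision=1, recall=0) end point, which has no threshold attached.
print(len(precision_toy), len(recall_toy), len(thresholds_toy))
print(precision_toy[-1], recall_toy[-1])
```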


In [33]:
new_threshold = df_precision[df_precision['precision'] >= 0.9]['threshold'].min()
new_threshold

0.8666405182816879
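
One caveat worth keeping in mind: precision does not necessarily increase monotonically with the threshold, so the smallest threshold that reaches the target does not imply that every higher threshold reaches it too. The sketch below wraps the selection logic in a hypothetical helper, `threshold_for_precision` (not part of scikit-learn), and checks the result on made-up data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score


def threshold_for_precision(y_true, scores, target):
    """Return the smallest threshold whose precision is at least `target`.

    Hypothetical helper, not part of scikit-learn. Returns None if no
    threshold reaches the target.
    """
    precision, _, thresholds = precision_recall_curve(y_true, scores)
    reached = precision[:-1] >= target
    if not reached.any():
        return None
    return float(thresholds[reached].min())


# Toy labels and scores, made up for illustration.
y_toy = np.array([0, 0, 1, 0, 1, 1, 1])
scores_toy = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9])

toy_threshold = threshold_for_precision(y_toy, scores_toy, target=0.8)

# precision_recall_curve counts a sample as positive when score >= threshold.
toy_precision = precision_score(y_toy, scores_toy >= toy_threshold)
print(toy_threshold, toy_precision)

# A slightly higher cut-off can do worse than the selected one.
higher_cutoff_precision = precision_score(y_toy, scores_toy >= 0.4)
print(higher_cutoff_precision)
```

If precision at every higher threshold matters, one option is to inspect the whole `df_precision` table above the chosen value rather than relying on the minimum alone.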

### ☑️ Check your code

In [34]:
from nbresult import ChallengeResult

result = ChallengeResult('decision_threshold',
                         threshold = new_threshold
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/bingobango/.pyenv/versions/tom/bin/python3
cachedir: .pytest_cache
rootdir: /Users/bingobango/code/lewagon/data-threshold-adjustments/tests
plugins: anyio-3.6.1, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_decision_threshold.py::TestDecision_threshold::test_new_threshold PASSED [100%]



💯 You can commit your code:

git add tests/decision_threshold.pickle

git commit -m 'Completed decision_threshold step'

git push origin master



# Using the new threshold

🎯 The coach has spotted a potentially interesting player, but wants your 90% guarantee that he would last 5 years minimum as a pro. Download the player's data [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv).

In [36]:
new_player = pd.read_csv("data/ML_New_player.csv")

new_player

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers
0,80,31.4,14.3,5.9,11.1,52.5,0.0,0.1,11.1,2.6,3.9,65.4,3.0,5.0,8.0,2.4,1.1,0.8,2.2


❓ Would you risk recommending the player to the coach? Save your answer as a string under variable name `recommendation`, either "recommend" or "not recommend".

In [37]:
new_player_scaled = scaler.transform(new_player)

In [38]:
model = LogisticRegression()
model.fit(X_scaled, y)
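
An alternative to scaling the new player by hand is to bundle the scaler and the model in a single pipeline, so that raw rows can be passed straight to `predict_proba`. This is only a sketch on synthetic data (`X_raw`, `y_raw` and `new_row` are stand-ins, not the player dataset), shown to illustrate that both routes give the same probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in data (an assumption; not the player dataset).
X_raw, y_raw = make_classification(n_samples=300, n_features=8, random_state=0)

# The pipeline fits the scaler and the model together, so raw rows can be
# passed to predict_proba without a separate transform step.
pipe = make_pipeline(RobustScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_raw, y_raw)

new_row = X_raw[:1]  # stands in for a single new player
proba_pipe = pipe.predict_proba(new_row)[0, 1]

# Same result as scaling by hand with the fitted scaler, then predicting.
scaler_step = pipe.named_steps["robustscaler"]
model_step = pipe.named_steps["logisticregression"]
proba_manual = model_step.predict_proba(scaler_step.transform(new_row))[0, 1]
print(np.isclose(proba_pipe, proba_manual))
```

A pipeline also refits the scaler inside each cross-validation fold, which avoids the leakage that comes from scaling the whole feature set up front.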

In [40]:
probs = model.predict_proba(new_player_scaled)[0][1]
probs

0.9454837979678007

In [58]:
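def custom_predict(X, custom_threshold):
    # Probability of class 1 (career of 5 years or more) for each row
    probs = model.predict_proba(X)
    positive_probs = probs[:, 1]
    # Predict class 1 only when the probability reaches the custom threshold
    return positive_probs >= custom_threshold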

In [61]:
custom_prediction = custom_predict(new_player_scaled,new_threshold)[0]
custom_prediction

True
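
For comparison, `model.predict` on a binary logistic regression amounts to a 0.5 cut-off on the class 1 probability, so a stricter threshold can only keep a subset of the rows it labels positive. The sketch below shows this on synthetic data (all names and the 0.9 cut-off are illustrative assumptions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the player dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

proba_class_1 = demo_model.predict_proba(X_demo)[:, 1]

default_pred = demo_model.predict(X_demo)  # implicit 0.5 cut-off
strict_pred = proba_class_1 >= 0.9         # hypothetical stricter cut-off

print("positives at 0.5:", int(default_pred.sum()))
print("positives at 0.9:", int(strict_pred.sum()))
```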

In [62]:
recommendation = 'recommend'

### ☑️ Check your code

In [63]:
from nbresult import ChallengeResult

result = ChallengeResult('recommendation',
                         recommendation = recommendation
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/bingobango/.pyenv/versions/tom/bin/python3
cachedir: .pytest_cache
rootdir: /Users/bingobango/code/lewagon/data-threshold-adjustments/tests
plugins: anyio-3.6.1, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_recommendation.py::TestRecommendation::test_recommendation PASSED [100%]



💯 You can commit your code:

git add tests/recommendation.pickle

git commit -m 'Completed recommendation step'

git push origin master



# 🏁