# Threshold Adjustment

👇 Load the `player_performances.csv` dataset to see what you will be working with.

In [2]:
import pandas as pd

data = pd.read_csv('data/player_performances.csv')

data.head()

| Unnamed: 0 | games played | minutes played | points per game | field goals made | field goal attempts | field goal percent | 3 point made | 3 point attempt | 3 point % | free throw made | free throw attempts | free throw % | offensive rebounds | defensive rebounds | rebounds | assists | steals | blocks | turnovers | target_5y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 27.4 | 7.4 | 2.6 | 7.6 | 34.7 | 0.5 | 2.1 | 25.0 | 1.6 | 2.3 | 69.9 | 0.7 | 3.4 | 4.1 | 1.9 | 0.4 | 0.4 | 1.3 | 0 |
| 1 | 35 | 26.9 | 7.2 | 2.0 | 6.7 | 29.6 | 0.7 | 2.8 | 23.5 | 2.6 | 3.4 | 76.5 | 0.5 | 2.0 | 2.4 | 3.7 | 1.1 | 0.5 | 1.6 | 0 |
| 2 | 74 | 15.3 | 5.2 | 2.0 | 4.7 | 42.2 | 0.4 | 1.7 | 24.4 | 0.9 | 1.3 | 67.0 | 0.5 | 1.7 | 2.2 | 1.0 | 0.5 | 0.3 | 1.0 | 0 |
| 3 | 58 | 11.6 | 5.7 | 2.3 | 5.5 | 42.6 | 0.1 | 0.5 | 22.6 | 0.9 | 1.3 | 68.9 | 1.0 | 0.9 | 1.9 | 0.8 | 0.6 | 0.1 | 1.0 | 1 |
| 4 | 48 | 11.5 | 4.5 | 1.6 | 3.0 | 52.4 | 0.0 | 0.1 | 0.0 | 1.3 | 1.9 | 67.4 | 1.0 | 1.5 | 2.5 | 0.3 | 0.3 | 0.4 | 0.8 | 1 |


ℹ️ Each observation represents a player and each column a characteristic of performance. The target `target_5y` defines whether the player has had a professional career of less than 5 years [0] or 5 years or more [1].

# Preprocessing

👇 To avoid spending too much time on preprocessing, apply robust scaling (`RobustScaler`) to the entire feature set. This practice is not optimal, but it can be used for preliminary preprocessing and/or to get models up and running quickly.

Save the scaled feature set as `X_scaled`.
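
As a rough illustration of what robust scaling does, the sketch below applies `RobustScaler` with its default settings to a toy column containing one outlier. The values in `X_toy` are made up for the example and do not come from the dataset. With the defaults, each value is centered on the median and divided by the interquartile range, so the outlier barely affects how the other values are scaled.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy feature with one outlier (hypothetical values, not from the dataset)
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Defaults: center on the median, scale by the 25th-75th percentile range (IQR)
scaler = RobustScaler()
X_toy_scaled = scaler.fit_transform(X_toy)

# Same computation by hand, for comparison
median = np.median(X_toy)
iqr = np.percentile(X_toy, 75) - np.percentile(X_toy, 25)
manual = (X_toy - median) / iqr

print(X_toy_scaled.ravel())
print(manual.ravel())
```

Both printed arrays should match, which is a quick way to check your understanding of the scaler before applying it to the real features.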

In [4]:
# YOUR CODE HERE
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
rscaler = RobustScaler()

In [5]:
X_train, X_test, y_train, y_test = train_test_split(data.drop(columns='target_5y'), data['target_5y'])

In [6]:
rscaler.fit(X_train)

In [7]:
X_scaled = rscaler.transform(X_train)

In [14]:
X_test_scaled = rscaler.transform(X_test)

### ☑️ Check your code

In [8]:
from nbresult import ChallengeResult

result = ChallengeResult('scaled_features',
                         scaled_features = X_scaled
)

result.write()
print(result.check())


platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/nigel/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/nigel/code/jwnigel/data-threshold-adjustments/tests
plugins: anyio-3.6.1
collecting ... collected 1 item

test_scaled_features.py::TestScaled_features::test_scaled_features PASSED [100%]



💯 You can commit your code:

git add tests/scaled_features.pickle

git commit -m 'Completed scaled_features step'

git push origin master



# Base modeling

🎯 The task is to detect players who will last 5 years minimum as professionals, with a 90% guarantee: at least 90% of the players the model flags as lasting should actually last (a precision of 0.9).

👇 Is a default Logistic Regression model going to satisfy the coach's requirements? Use cross-validation and save the score that supports your answer under variable name `base_score`.
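
The metric matters here: for a classifier, `cross_val_score` reports accuracy unless told otherwise, while the coach's requirement is about precision. The sketch below shows the difference on synthetic data. `X_demo` and `y_demo` are stand-ins generated with `make_classification`, not the player dataset, so the printed numbers say nothing about the real answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the scaled features and target (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.4, 0.6], random_state=0
)

model = LogisticRegression(max_iter=1000)

# With no scoring argument, a classifier is scored on accuracy
cv_accuracy = cross_val_score(model, X_demo, y_demo, cv=5).mean()

# Precision has to be requested explicitly
cv_precision = cross_val_score(model, X_demo, y_demo, cv=5, scoring='precision').mean()

print(f"accuracy:  {cv_accuracy:.3f}")
print(f"precision: {cv_precision:.3f}")
```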

In [9]:
# YOUR CODE HERE
from sklearn.linear_model import LogisticRegression

In [10]:
logmodel = LogisticRegression()

In [12]:
logmodel.fit(X_scaled, y_train)

In [13]:
from sklearn.model_selection import cross_val_score

In [17]:
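# The requirement is about precision, so score on precision (the default is accuracy) using the training data
base_score = cross_val_score(logmodel, X_scaled, y_train, scoring='precision').mean()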

### ☑️ Check your code

In [18]:
from nbresult import ChallengeResult

result = ChallengeResult('base_precision',
                         score = base_score
)

result.write()
print(result.check())


platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/nigel/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/nigel/code/jwnigel/data-threshold-adjustments/tests
plugins: anyio-3.6.1
collecting ... collected 1 item

test_base_precision.py::TestBase_precision::test_precision_score PASSED [100%]



💯 You can commit your code:

git add tests/base_precision.pickle

git commit -m 'Completed base_precision step'

git push origin master



# Threshold adjustment

👇 Find the decision threshold that guarantees a 90% precision for a player to last 5 years or more as a professional. Save the threshold under variable name `new_threshold`.

<details>
<summary>💡 Hint</summary>

- Make cross validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>
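
The three steps in the hint can be sketched as follows. This is a minimal example on synthetic data, assuming a logistic regression as the model. The names `X_demo`, `y_demo`, `demo_threshold` and the helper `lowest_threshold_for_precision` are hypothetical and exist only for this illustration. Note that `precision_recall_curve` returns one more precision value than thresholds, so the last precision entry is dropped before comparing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the scaled features and target (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.4, 0.6], random_state=0
)

# Step 1: out-of-fold probabilities; columns follow classes_, so column 1 is P(class 1)
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5, method='predict_proba'
)
prob_pos = proba[:, 1]

# Step 2: precision and recall at each candidate threshold (true labels go first)
precision, recall, thresholds = precision_recall_curve(y_demo, prob_pos)


# Step 3: smallest threshold whose precision reaches the target, or None if none does
def lowest_threshold_for_precision(precision, thresholds, target=0.9):
    mask = precision[:-1] >= target  # precision has len(thresholds) + 1 entries
    return float(thresholds[mask].min()) if mask.any() else None


demo_threshold = lowest_threshold_for_precision(precision, thresholds)
print(demo_threshold)
```

Taking the smallest qualifying threshold keeps recall as high as possible while still meeting the precision target. Returning `None` makes the case where no threshold reaches the target explicit instead of raising on an empty selection.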



In [19]:
from sklearn.neighbors import KNeighborsClassifier

In [33]:
# YOUR CODE HERE
knnmodel = KNeighborsClassifier()

In [34]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

802     1
271     1
1049    1
640     0
1204    1
       ..
911     1
1168    1
664     1
1258    1
975     0
Name: target_5y, Length: 996, dtype: int64

In [39]:
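proba = cross_val_predict(knnmodel, X_scaled, y_train, method='predict_proba', cv=5)  # one column per class, ordered as classes_: [P(0), P(1)]
precision, recall, thresholds = precision_recall_curve(y_train, proba[:, 1])  # true labels first, then P(target_5y = 1)
# Smallest threshold whose cross-validated precision reaches 0.9 (.min() raises if no threshold does)
new_threshold = thresholds[precision[:-1] >= 0.9].min()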

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('decision_threshold',
                         threshold = new_threshold
)

result.write()
print(result.check())

# Using the new threshold

🎯 The coach has spotted a potentially interesting player, but wants your 90% guarantee that he would last 5 years minimum as a pro. Download the player's data [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv).

In [None]:
new_player = pd.read_csv("data/ML_New_player.csv")

new_player

❓ Would you risk recommending the player to the coach? Save your answer as a string under variable name `recommendation` as "recommend" or "not recommend".
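
Applying a custom threshold means comparing the predicted probability of the positive class to that threshold instead of calling `predict`, which cuts at 0.5 for a binary logistic regression. The sketch below shows the mechanics on synthetic data. `demo_threshold` is an arbitrary placeholder value, `new_row` stands in for the new player's features, and the model is an assumption. In the notebook, the new player's features would also need to go through the already fitted scaler first, assuming the columns match those it was fitted on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the scaled training data (assumption, not the real data)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

demo_threshold = 0.8   # hypothetical placeholder; use the threshold you found in practice
new_row = X_demo[:1]   # stand-in for one new, already scaled observation

# Probability of the positive class for the single new observation
prob_pos = model.predict_proba(new_row)[0, 1]

# Recommend only if the probability clears the custom threshold
demo_recommendation = "recommend" if prob_pos >= demo_threshold else "not recommend"
print(prob_pos, demo_recommendation)
```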

In [None]:
# YOUR CODE HERE

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('recommendation',
                         recommendation = recommendation
)

result.write()
print(result.check())

# 🏁