# Threshold Adjustment

In this exercise, you will adjust the **threshold** of a **Logistic Regression** model to enhance its **Precision**.

In [2]:
import pandas as pd

data = pd.read_csv('data/player_performances.csv')

data.head()

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers,target_5y
0,36,27.4,7.4,2.6,7.6,34.7,0.5,2.1,25.0,1.6,2.3,69.9,0.7,3.4,4.1,1.9,0.4,0.4,1.3,0
1,35,26.9,7.2,2.0,6.7,29.6,0.7,2.8,23.5,2.6,3.4,76.5,0.5,2.0,2.4,3.7,1.1,0.5,1.6,0
2,74,15.3,5.2,2.0,4.7,42.2,0.4,1.7,24.4,0.9,1.3,67.0,0.5,1.7,2.2,1.0,0.5,0.3,1.0,0
3,58,11.6,5.7,2.3,5.5,42.6,0.1,0.5,22.6,0.9,1.3,68.9,1.0,0.9,1.9,0.8,0.6,0.1,1.0,1
4,48,11.5,4.5,1.6,3.0,52.4,0.0,0.1,0.0,1.3,1.9,67.4,1.0,1.5,2.5,0.3,0.3,0.4,0.8,1


Each observation represents a player and each column a performance characteristic. The target `target_5y` indicates whether the player had a professional career of less than 5 years [0] or of 5 years or more [1].

The task is to build a model that is correct 90% of the time when it predicts that a player will last 5 or more years as a professional. In Machine Learning terms, the model needs a **precision** of 90% on class 1.

## Preprocessing

👇 Drop the rows that contain missing data
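
One possible way to approach this step is sketched below on a small made-up DataFrame. The frame `toy` and its values are assumptions for illustration only; on the real data the same calls would be applied to `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `data` (values are made up)
toy = pd.DataFrame({
    "games played": [36, 35, np.nan, 58],
    "points per game": [7.4, 7.2, 5.2, np.nan],
    "target_5y": [0, 0, 0, 1],
})

# Count missing values per column before deciding what to drop
print(toy.isnull().sum())

# Keep only the rows where every column is populated
toy_clean = toy.dropna()
print(toy_clean.shape)
```

`dropna()` returns a new DataFrame, so the result has to be assigned back (or to a new name) for the rows to actually be removed.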

👇 Scale the features
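
The exercise does not say which scaler to use. The sketch below assumes `StandardScaler` and a made-up toy frame; `MinMaxScaler` or `RobustScaler` would be used the same way. Only the features are scaled, never the target.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the cleaned `data` (values are made up)
toy = pd.DataFrame({
    "games played": [36.0, 35.0, 74.0, 58.0],
    "points per game": [7.4, 7.2, 5.2, 5.7],
    "target_5y": [0, 0, 0, 1],
})

# Separate the features from the target before scaling
X = toy.drop(columns=["target_5y"])
y = toy["target_5y"]

# Assumption: standardization (zero mean, unit variance per column)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Keeping the fitted `scaler` around matters later: any new observation has to be transformed with the same fitted object before it is passed to the model.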

## Logistic Regression

👇 Cross validate a `LogisticRegression` model and score its precision.
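
A minimal sketch of this step, assuming 5 folds and a synthetic dataset from `make_classification` in place of the scaled player features (`X_demo`, `y_demo` and `base_precision` are illustrative names, not part of the exercise):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the scaled features and the target (assumption)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_clusters_per_class=1, random_state=0
)

# scoring="precision" evaluates the precision of class 1 on each validation fold
cv_results = cross_validate(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5, scoring="precision"
)

# Average the per-fold scores to get one baseline figure
base_precision = cv_results["test_score"].mean()
print(round(base_precision, 3))
```

This baseline is the precision obtained with the default decision threshold of 0.5, which is the number the threshold adjustment tries to improve.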

## Threshold adjustment

👇 Find the decision threshold that guarantees a 90% precision for a player to last 5 years or more as a professional.

<details>
<summary>💡 Hint</summary>

- Make cross validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>
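
The steps in the hint could be sketched as follows. This is an illustration on synthetic data, not the solution for the player dataset: `X_demo`, `y_demo` and the 5-fold setting are assumptions, and the threshold printed here has no meaning for the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the scaled features and the target (assumption)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_clusters_per_class=1, random_state=0
)

# Out-of-fold probabilities of class 1 for every observation
probas = cross_val_predict(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5, method="predict_proba"
)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_demo, probas)

# `precision` has one more entry than `thresholds`, so drop its last element
reaches_target = precision[:-1] >= 0.9
if not reaches_target.any():
    raise ValueError("No threshold reaches a precision of 0.9 on this data")

# The lowest qualifying threshold keeps as much recall as possible
new_threshold = thresholds[reaches_target].min()
print(new_threshold)
```

Precision is not guaranteed to increase steadily with the threshold, so filtering on the whole curve is safer than stopping at the first value that crosses 0.9. The result is also an estimate from cross-validated predictions rather than a hard guarantee on unseen players.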



## Using the new threshold

In [3]:
new_player = pd.read_csv("data/new_player.csv")

new_player

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers
0,76.0,25.4,12.0,4.5,10.2,44.1,0.1,0.4,13.3,3.0,3.9,78.2,1.5,2.0,3.4,1.4,0.9,0.3,1.9


👇 Given the new threshold, can you give a 90% guarantee that the player above will last at least 5 years as a pro? Compute an answer.
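
A sketch of how a custom threshold might be applied to a single observation. Everything here is a stand-in: the data is synthetic, `new_obs` plays the role of `new_player`, and `new_threshold = 0.7` is a placeholder value, not the threshold for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training features and target (assumption)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_clusters_per_class=1, random_state=0
)

# Fit the scaler and the model on the full training set
scaler = StandardScaler().fit(X_demo)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_demo), y_demo)

new_threshold = 0.7   # placeholder, NOT the answer for the player data
new_obs = X_demo[:1]  # stand-in for the single-row `new_player`

# Scale the new observation with the already fitted scaler, then read P(class 1)
proba = model.predict_proba(scaler.transform(new_obs))[0, 1]

# Predict class 1 only when the probability clears the custom threshold
recommendation = bool(proba >= new_threshold)
print(proba, recommendation)
```

`model.predict` always uses a threshold of 0.5, which is why the comparison is done by hand on the output of `predict_proba`. On the real data, the new player needs the same feature columns, in the same order, as the features used for training.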

# 🏁