# Model Calibration

Model calibration is the process of adjusting the outputs of a model so that they better match the true probabilities. This is key when using an xG model, because we want each xG value to reflect the probability of a goal being scored from a given shot. For example, shots with an xG of 0.2 should be scored about 20% of the time, and shots with an xG of 0.8 about 80% of the time.
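One way to check this property is a reliability curve, which bins the predictions and compares the mean predicted probability in each bin with the observed goal rate. The sketch below is illustrative only: the `xg` and `goals` arrays are simulated (not real shot data) so that the predictions are calibrated by construction, and it uses scikit-learn's `calibration_curve`.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Simulated xG values and outcomes (an assumption for illustration):
# each shot is scored with probability equal to its xG, so the
# predictions are perfectly calibrated by construction.
xg = rng.uniform(0.0, 1.0, size=20000)
goals = (rng.uniform(size=xg.size) < xg).astype(int)

# calibration_curve returns (observed positive rate, mean prediction) per bin
observed_rate, mean_xg = calibration_curve(goals, xg, n_bins=10)

for pred, obs in zip(mean_xg, observed_rate):
    print(f"mean xG {pred:.2f} -> observed goal rate {obs:.2f}")
```

For a well-calibrated model the two columns track each other closely; large gaps between them suggest that the raw outputs need calibrating.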

Generally, when MLPs are trained, their outputs are not calibrated to the true probabilities. This is where Platt Scaling comes in.

## Platt Scaling

Platt Scaling is typically used for calibrating models for binary classification. It involves training a logistic regression model on the output from your original model using a separate calibration dataset.
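The idea can be sketched by hand in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the data is synthetic (`make_classification`), the base model is a `LinearSVC` chosen only because it produces uncalibrated scores, and the names `base`, `platt`, and `X_cal` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data (an assumption for illustration, not shot data)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Base model: its decision_function returns scores, not probabilities
base = LinearSVC(random_state=0).fit(X_train, y_train)

# Platt Scaling: fit a one-feature logistic regression on the scores
# of the held-out calibration set
scores_cal = base.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression().fit(scores_cal, y_cal)

# Calibrated probability of the positive class for each calibration sample
probs = platt.predict_proba(scores_cal)[:, 1]
print(probs[:5])
```

Keeping the calibration set separate from the training set matters here: scores on data the base model has already seen tend to be overconfident, which would bias the fitted mapping.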

The scikit-learn library provides tools for probability calibration, which improves the reliability of a classifier's predicted probabilities rather than its accuracy.

Its `CalibratedClassifierCV` class implements Platt Scaling through `method='sigmoid'`: a sigmoid (logistic) function is fitted to the classifier's outputs by maximum likelihood, giving a mapping from the original outputs to calibrated probabilities.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
import numpy as np
 
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
 
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Base classifier; CalibratedClassifierCV clones and refits it on each
# cross-validation fold, so it does not need to be fitted beforehand
clf = LogisticRegression(max_iter=1000)

# Create a calibrated classifier using Platt Scaling (method='sigmoid')
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv=5)
calibrated_clf.fit(X_train, y_train)

# Predict calibrated class probabilities on the testing set
y_proba = calibrated_clf.predict_proba(X_test)

# Print the calibrated probabilities and the predicted class labels
print("Predicted Probabilities:\n", y_proba)
print("Predicted Labels:\n", np.argmax(y_proba, axis=1))
```



```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader
```

```python
def platt_scaling(model, validation_loader, device):
    model.eval()
    logits_list = []
    labels_list = []

    # Collect logits and labels from the validation set
    with torch.no_grad():
        for inputs, labels in validation_loader:
            inputs = inputs.to(device)
            outputs = model(inputs)
            logits_list.append(outputs.cpu().numpy())
            labels_list.append(labels.cpu().numpy())

    # Stack the batches: logits as a 2D feature matrix, labels as a flat 0/1 vector
    logits = np.concatenate(logits_list)
    logits = logits.reshape(len(logits), -1)
    labels = np.concatenate(labels_list).ravel().astype(int)

    # Platt Scaling: fit a logistic regression that maps logits to probabilities
    calibrated_clf = LogisticRegression()
    calibrated_clf.fit(logits, labels)

    return calibrated_clf
```

```python
# `model` and `validation_loader` are assumed to be defined earlier in the notebook
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
calibrated_model = platt_scaling(model, validation_loader, device)
```
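
Once fitted, the calibrator can be applied to the logits of new shots to obtain calibrated xG values. The following is a sketch under assumptions: the helper `calibrated_probabilities` is hypothetical, the model is assumed to emit one logit per shot, and the logits are assumed to be already collected as a NumPy array. A stand-in calibrator fitted on synthetic logits replaces `calibrated_model` so that the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def calibrated_probabilities(calibrator, logits):
    """Map raw model logits to calibrated positive-class probabilities."""
    logits = np.asarray(logits, dtype=float).reshape(-1, 1)
    return calibrator.predict_proba(logits)[:, 1]


# Stand-in for `calibrated_model`: fitted on synthetic logits whose true
# goal probability is sigmoid(0.5 * logit), i.e. the raw logits are overconfident
rng = np.random.default_rng(0)
demo_logits = rng.normal(size=1000)
demo_labels = (
    rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-0.5 * demo_logits))
).astype(int)
demo_calibrator = LogisticRegression().fit(demo_logits.reshape(-1, 1), demo_labels)

# Calibrated probabilities for three new logits
xg = calibrated_probabilities(demo_calibrator, [-2.0, 0.0, 2.0])
print(xg)
```

In practice the same call would take `calibrated_model` and the logits produced by the network for unseen shots.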