## **Modeling process**

During the modeling phase, `balanced accuracy` is the metric used to evaluate each model's performance and to compare the models with one another.  

Balanced accuracy is the average of the recall (sensitivity) for each class, treating each class equally regardless of its frequency.   
Unlike standard accuracy, which on imbalanced datasets is biased towards the majority class and can give a false impression of the model's performance, balanced accuracy provides a fairer view of how well the model performs across all classes.

For a heavily imbalanced dataset, balanced accuracy will prevent the majority class from dominating the metric, allowing us to see if the model truly performs well on the minority classes, and not only on the majority class.
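
The difference between the two metrics can be illustrated with a small sketch. The labels below are made up for illustration (they are not taken from the project's dataset), and the "model" is a hypothetical one that always predicts the majority class:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical, heavily imbalanced labels: 8 majority-class records, 2 minority ones
y_true = ["No Precipitation"] * 8 + ["Rain", "Snowfall"]

# A degenerate model that always predicts the majority class
y_pred = ["No Precipitation"] * 10

acc = accuracy_score(y_true, y_pred)               # fraction of correct predictions
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of the per-class recalls

print(f"accuracy:          {acc:.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
```

Standard accuracy is 0.8 here even though the minority classes are never predicted, while balanced accuracy averages the recalls (1, 0 and 0) and drops to about 0.333.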

You can find detailed information on the evaluation metrics used for both classification and regression tasks in the **[official documentation](https://scikit-learn.org/1.5/modules/model_evaluation.html)**.

### Logistic Regression Essentials

Multinomial Logistic Regression is a classification algorithm used when the target variable is categorical and has more than two possible classes.   
Unlike binary logistic regression, which uses the sigmoid function, multinomial logistic regression uses the softmax function to compute the probability of each class.   

In multinomial logistic regression, we model the probability of a data point belonging to one of multiple possible classes. For example, we might predict weather conditions such as "No Precipitation," "Rain," or "Snowfall."   
In scikit-learn, multinomial logistic regression by default outputs a predicted class y for each data point, selecting the class with the highest probability. 

How the Algorithm Works:

**Input Features and Coefficients:**   
The model takes the input features (independent variables, X) and initializes a vector of coefficients (β) for each class.   
These coefficients are initially set to small random values or zeros and will be adjusted during the training process by the solver (the optimization algorithm). 

**Understanding Logits (Log Odds):**   
In logistic regression, a core concept is the logit, also known as the log odds, which is a way of expressing probabilities on an unbounded scale.   
For a given class k, the odds are defined as the ratio of the probability of that class occurring to the probability of it not occurring:   
    $$
\text{odds}_k = \frac{P(y = y_k \mid X)}{1 - P(y = y_k \mid X)}
    $$

Consequently, for a given class k, the logit is defined as the natural logarithm of the odds of that class occurring, $\text{logit}_k = \ln(\text{odds}_k)$.     
This transformation maps probabilities (which range from 0 to 1) to logits (which range from −∞ to +∞), making it possible to create a linear relationship between the input features and the predicted outcome y.
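
As a minimal sketch of this mapping, the snippet below converts a few probabilities to odds and logits and back again. The helper names `logit` and `inverse_logit` are chosen here for illustration and are not part of any library:

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a logit back to a probability (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.10, 0.50, 0.90):
    odds = p / (1.0 - p)
    z = logit(p)
    print(f"p={p:.2f}  odds={odds:.3f}  logit={z:+.3f}  back to p={inverse_logit(z):.2f}")
```

A probability of 0.5 corresponds to a logit of 0, and probabilities that are symmetric around 0.5 (such as 0.10 and 0.90) give logits of equal size and opposite sign.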

**Logit Calculation:**   
For each class k, the logit is computed as a linear combination of the input features and their corresponding coefficients:   
    $$
\text{logit}_k = \beta_{0,k} + \beta_{1,k} x_1 + \beta_{2,k} x_2 + \dots + \beta_{n,k} x_n
    $$

The model calculates one logit per class; this is the raw score before probabilities are computed. In our project, this yields 3 logits for every record in the dataset: one for the 'No Precipitation' class, one for the 'Rain' class, and one for the 'Snowfall' class. Strictly speaking, in the multinomial case these scores are unnormalized log-probabilities: the difference between two classes' logits equals the log of the ratio of their probabilities. 
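
The linear combination above can be sketched with NumPy as follows. The feature values, intercepts and coefficients are made-up numbers for a hypothetical two-feature record; they are not fitted values from the project:

```python
import numpy as np

classes = ["No Precipitation", "Rain", "Snowfall"]

# Hypothetical record with two features (x_1, x_2)
x = np.array([2.0, 0.5])

# Hypothetical parameters: one intercept and one coefficient row per class
beta_0 = np.array([0.3, 0.0, -0.2])   # beta_{0,k}
beta = np.array([
    [0.5, -1.0],                      # coefficients for "No Precipitation"
    [0.1, 0.8],                       # coefficients for "Rain"
    [-0.6, 0.4],                      # coefficients for "Snowfall"
])

# logit_k = beta_{0,k} + beta_{1,k} * x_1 + beta_{2,k} * x_2, for every class at once
logits = beta_0 + beta @ x

for name, z in zip(classes, logits):
    print(f"{name}: {z:+.2f}")
```

With these assumed numbers the three logits come out as 0.8, 0.6 and -1.2, one raw score per class for the single record.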

**Softmax Transformation:**   
The logits are then passed through the softmax function, which converts them into probabilities for each class.   
The softmax function also ensures that all probabilities are between 0 and 1, and that the total probability across all classes equals 1:   
    $$
   P(y = k \mid X) = \frac{\exp(\text{logit}_k)}{\sum_{j=1}^{K} \exp(\text{logit}_j)}
    $$
After the softmax transformation, every record in the dataset has been assigned a probability for each class.   
In our case the model predicts whether there will be 'No Precipitation', 'Rain', or 'Snowfall', so it calculates three logits (one for each class). These logits are transformed into probabilities by the softmax function, and the model assigns the class with the highest probability to the target variable y.   
For example, a record in our dataset could end up having the following probabilities:
   - No Precipitation: 0.60   
   - Rain: 0.30   
   - Snowfall: 0.10
   
Then the model would predict the class 'No Precipitation' for the target variable y, since it has the highest probability.
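
The softmax step and the final class selection can be sketched in plain Python. The logits below are chosen (as an assumption, purely for illustration) so that the resulting probabilities match the 0.60 / 0.30 / 0.10 example above:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["No Precipitation", "Rain", "Snowfall"]

# Illustrative logits: the logs of the target probabilities
logits = [math.log(0.60), math.log(0.30), math.log(0.10)]

probs = softmax(logits)
predicted = classes[probs.index(max(probs))]   # class with the highest probability

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")
print("Predicted class:", predicted)
```

Adding the same constant to every logit leaves the probabilities unchanged, which is why subtracting the maximum inside `softmax` is safe.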

**Loss/Error Function (Cross-Entropy):**   

The cross-entropy loss function measures the error between the predicted probabilities and the true class labels. It’s calculated as:   
    $$
   L = - \sum_{k=1}^{K} y_k \log \left( P(y = k \mid X) \right)
    $$   

This loss is used as feedback to the model. The closer the predicted probability is to the true label, the smaller the loss.   
To make this concrete, we reuse the example above, where the model predicted 'No Precipitation', and suppose that the true label was 'Rain'.   
In this example, since the true label is Rain, we have $y_{\text{rain}}=1$, and for the other classes, $y_{\text{no precipitation}}=0$ and $y_{\text{snowfall}}=0$.   

Applying the loss formula above, only the term for the correct class (Rain) survives, because the other terms are multiplied by 0: $L = -(1 \times \log(0.30)) - (0 \times \log(0.60)) - (0 \times \log(0.10))$, which leads to $L = -\log(0.30) \approx -(-1.204) = 1.204$, where $\log$ is the natural logarithm.   
This is the loss for this particular record (1.204).
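
The same calculation can be checked with a few lines of Python. This is a sketch for a single record using the example probabilities; `math.log` is the natural logarithm:

```python
import math

# Predicted probabilities for one record (from the example above)
probs = {"No Precipitation": 0.60, "Rain": 0.30, "Snowfall": 0.10}
true_label = "Rain"

# One-hot encoding of the true label: 1 for the correct class, 0 otherwise
y = {label: 1.0 if label == true_label else 0.0 for label in probs}

# Cross-entropy: L = -sum_k y_k * log(P(y = k | X))
loss = -sum(y[label] * math.log(probs[label]) for label in probs)

print(f"loss = {loss:.3f}")
```

Only the probability assigned to the true class contributes. Had the model assigned 0.60 to 'Rain' instead, the loss would have been $-\log(0.60) \approx 0.511$, which is smaller because the prediction is closer to the true label.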


At this point the solver takes over and adjusts the coefficients in the direction that reduces the loss.

**Solver and Coefficient Adjustment:**   
The solver is the optimization algorithm responsible for adjusting the β vectors (the coefficients). It iteratively updates the coefficients to reduce the loss, stopping when a solver-specific convergence measure (for example, the size of the gradient or the change in the loss between iterations) falls below a tolerance (`tol`) or when the maximum number of iterations (`max_iter`) is reached.
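
A minimal sketch of how these pieces come together in scikit-learn is shown below. It uses a synthetic three-class dataset from `make_classification` as a stand-in for the project's data, and the values chosen for `solver`, `max_iter` and `tol` are illustrative assumptions rather than the settings used in this project:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 300 records, 4 features, 3 classes
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# The solver iterates until its tolerance is met or max_iter is reached
model = LogisticRegression(solver="lbfgs", max_iter=200, tol=1e-4)
model.fit(X, y)

print("coefficient matrix shape:", model.coef_.shape)   # one row of betas per class
print("intercepts shape:", model.intercept_.shape)      # one beta_0 per class
print("iterations used:", model.n_iter_)

# Class probabilities and the predicted class for the first record
probs = model.predict_proba(X[:1])
pred = model.predict(X[:1])
print("probabilities:", probs.round(3), "-> predicted class:", pred[0])
```

The fitted `coef_` and `intercept_` hold the per-class coefficients described above, `predict_proba` returns the softmax probabilities, and `predict` returns the class with the highest probability.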

All of the above is implemented in the scikit-learn library. You can find further details in the library's official documentation **[here](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)**.