# **Cost Function**

A **cost function** quantifies the error or difference between the predicted outputs of a model and the actual target values. It provides a scalar value that the model seeks to minimize during training to improve its performance.

---

## **1. Regression Cost Functions**

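Regression cost functions are used when the target variable is continuous, aiming to measure the difference between predicted and actual values. In the formulas below, $n$ is the number of samples, $y_i$ is the actual value for sample $i$, $\hat{y}_i$ is the model's prediction for that sample, and $\theta$ denotes the model parameters.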

### **1.1 Mean Squared Error (MSE):**
- **Definition:** MSE calculates the average squared difference between the predicted and actual values. It penalizes larger errors more heavily due to squaring.
- **Formula:**  
  $$ J(\theta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
- **Characteristics:**
  - Sensitive to outliers.
  - Commonly used due to its simplicity and differentiability.
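
The MSE formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation; the function name `mean_squared_error` and the sample values are assumptions made for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals (y_i - y_hat_i)^2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return sum(squared_errors) / len(y_true)


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# Residuals are 1, 0, -2, -1, so MSE = (1 + 0 + 4 + 1) / 4 = 1.5
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Note that the residual of size 2 contributes 4 to the sum, four times as much as a residual of size 1, which is the squaring effect described in the definition.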

---

### **1.2 Root Mean Squared Error (RMSE):**
- **Definition:** RMSE is the square root of MSE, representing the error in the same units as the target variable.
- **Formula:**  
  $$ J(\theta) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$
- **Characteristics:**
  - Easier to interpret compared to MSE.
  - Still sensitive to outliers.
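
As a rough sketch of the relationship between the two metrics, RMSE can be computed by taking the square root of the MSE. The function name `root_mean_squared_error` and the sample values are assumptions made for this example.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared residual, in the units of the target."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# MSE is 1.5 here, so RMSE is sqrt(1.5), roughly 1.22 in the target's units.
rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)
```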

---

### **1.3 Mean Absolute Error (MAE):**
- **Definition:** MAE calculates the average of the absolute differences between predicted and actual values, so each error contributes in direct proportion to its magnitude.
- **Formula:**  
  $$ J(\theta) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
- **Characteristics:**
  - More robust to outliers compared to MSE.
  - The penalty grows linearly with the error, so large errors are not weighted disproportionately as they are in MSE.
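
The difference in outlier sensitivity can be seen with a small sketch that computes MAE and MSE on the same data, once without and once with a single large error. The function names and all values are assumptions made for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute residuals |y_i - y_hat_i|."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals, included only for comparison."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: every prediction is off by exactly 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred_clean = [2.0, 3.0, 4.0, 5.0]
# Same predictions, except the last one is off by 10 (an outlier).
y_pred_outlier = [2.0, 3.0, 4.0, 14.0]

mae_clean = mean_absolute_error(y_true, y_pred_clean)      # (1+1+1+1)/4 = 1.0
mse_clean = mean_squared_error(y_true, y_pred_clean)       # (1+1+1+1)/4 = 1.0
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)  # (1+1+1+10)/4 = 3.25
mse_outlier = mean_squared_error(y_true, y_pred_outlier)   # (1+1+1+100)/4 = 25.75

print(mae_clean, mae_outlier)
print(mse_clean, mse_outlier)
```

In this example a single outlier raises MAE from 1.0 to 3.25, while MSE jumps from 1.0 to 25.75.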

---

### **1.4 Coefficient of Determination ($R^2$ Score):**
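- **Definition:** $R^2$ measures the proportion of variance in the target variable explained by the model, where $\bar{y}$ is the mean of the actual values. Unlike the error measures above, it is a goodness-of-fit score: higher is better, and it is usually reported for evaluation rather than minimized during training.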
- **Formula:**  
  $$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
- **Characteristics:**
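  - At most 1, which corresponds to a perfect fit; a value of 0 means the model does no better than always predicting the mean of the target variable.
  - Can be negative: a negative $R^2$ indicates that the model performs worse than predicting the mean of the target variable.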
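
A minimal sketch of the $R^2$ formula, showing a good fit, a mean-only predictor, and a model that does worse than the mean. The function name `r2_score` and the data are assumptions made for this example.

```python
def r2_score(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all target values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical targets with mean 2.5 and total sum of squares 5.0.
y_true = [1.0, 2.0, 3.0, 4.0]

r2_good = r2_score(y_true, [1.0, 2.0, 3.0, 5.0])   # ss_res = 1  -> about 0.8
r2_mean = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])   # ss_res = 5  -> 0.0
r2_bad = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])    # ss_res = 20 -> -3.0

print(r2_good, r2_mean, r2_bad)
```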

---

## **2. Classification Cost Functions**

Classification cost functions are used when the target variable is categorical, focusing on evaluating predicted probabilities or class labels.

### **2.1 Binary Classification:**
#### **Log Loss (Binary Cross-Entropy):**
- **Definition:** Log Loss measures the performance of a classification model where the output is a probability value between 0 and 1. It penalizes predictions that are far from the actual class.
- **Formula:**  
  $$ J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] $$
- **Characteristics:**
  - Suitable for binary classification problems (e.g., 0 or 1).
  - The penalty grows without bound as the predicted probability of the true class approaches 0, so confident but wrong predictions are penalized most heavily.
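
The behaviour of Log Loss can be sketched by comparing confident correct predictions with confident wrong ones. The function name `binary_cross_entropy`, the clipping constant `eps` (a common practical guard against `log(0)`, not part of the formula above), and the sample values are all assumptions made for this example.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Assumption: clip p into [eps, 1 - eps] so log() never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 0]
p_confident_right = [0.9, 0.1, 0.9, 0.1]
p_confident_wrong = [0.1, 0.9, 0.1, 0.9]

loss_right = binary_cross_entropy(y_true, p_confident_right)  # equals -log(0.9)
loss_wrong = binary_cross_entropy(y_true, p_confident_wrong)  # equals -log(0.1)

print(loss_right, loss_wrong)
```

The wrong predictions produce a loss roughly twenty times larger than the right ones, even though both sets are equally confident.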

---

### **2.2 Multi-Class Classification:**
#### **Cross-Entropy Loss:**
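- **Definition:** Cross-Entropy Loss generalizes Log Loss to multi-class problems. It evaluates how well the predicted probability distribution aligns with the actual distribution. Here $k$ is the number of classes, $y_{ij}$ is 1 if sample $i$ belongs to class $j$ and 0 otherwise (typically a one-hot encoding), and $\hat{y}_{ij}$ is the predicted probability that sample $i$ belongs to class $j$.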
- **Formula:**  
  $$ J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log(\hat{y}_{ij}) $$
- **Characteristics:**
  - Suitable for multi-class classification problems.
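  - Assumes the predicted probabilities across all classes sum to 1; the loss itself does not enforce this, so the model's outputs are typically normalized first (for example with a softmax layer).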
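
A minimal sketch of the multi-class formula follows. It assumes one-hot labels and uses a softmax to turn raw scores into probabilities; the softmax step, the function names, the clipping constant `eps`, and the sample scores are assumptions made for this example rather than part of the formula above.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_onehot, probs, eps=1e-15):
    """Mean over samples of -sum_j y_ij * log(p_ij)."""
    total = 0.0
    for y_row, p_row in zip(y_onehot, probs):
        # Assumption: floor p at eps so log() never receives 0.
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_row, p_row))
    return -total / len(y_onehot)


# Hypothetical raw scores for two samples and three classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
y_onehot = [[1, 0, 0], [0, 1, 0]]

probs = [softmax(row) for row in logits]
loss = cross_entropy(y_onehot, probs)

# A uniform guess over k = 3 classes gives a loss of log(3).
uniform_loss = cross_entropy([[1, 0, 0]], [[1 / 3, 1 / 3, 1 / 3]])

print(loss, uniform_loss)
```

Because only the true class has $y_{ij} = 1$, each sample contributes just the negative log of the probability assigned to its correct class.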

---

## **Summary**
- **Regression Cost Functions:** 
  - Focus on minimizing numerical differences between predicted and actual values. Common choices include MSE, RMSE, and MAE; $R^2$ is a related goodness-of-fit score for which higher values are better.
- **Classification Cost Functions:** 
  - Focus on minimizing errors in predicted probabilities or class labels. Common metrics include Binary Log Loss and Multi-Class Cross-Entropy Loss.
