
## Transfer Learning
Transfer learning is a technique in machine learning where a pre-trained model, which has already learned useful features or representations from a source dataset, is adapted for a new but related target task or dataset. Instead of training a model from scratch, transfer learning leverages the knowledge gained from the source task to improve the learning process and performance on the target task, often resulting in faster training and better performance with less data.

In summary, transfer learning allows you to:

- Save time and computational resources by reusing an existing pre-trained model.
- Benefit from the knowledge the model has already gained from the source dataset.
- Achieve better performance on the target task, especially when the target dataset is smaller or has limited labeled data.

When you transfer the model, you remove the last layer (the output layer, including its weights and biases), replace it with a freshly initialized output layer, and then train the model on the new data while keeping all the other layers as they were. If you only have a small amount of data, retrain only the weights and biases of the new output layer and keep the earlier layers frozen. If you have a lot of data, you can retrain the entire network, using the pre-trained weights as the starting point.
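The layer swap described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not a recipe from any particular framework: the two-layer network, the parameter dictionary keys (`W1`, `b1`, `W2`, `b2`), and the helper names `replace_output_layer` and `train_output_layer` are all hypothetical, chosen only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Two-layer network: ReLU hidden layer, sigmoid output. X has shape (n_x, m)."""
    A1 = np.maximum(0.0, params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    return A1, A2

def replace_output_layer(pretrained, n_out, rng):
    """Keep a copy of the pre-trained hidden layer, attach a fresh output layer."""
    n_hidden = pretrained["W1"].shape[0]
    return {
        "W1": pretrained["W1"].copy(),
        "b1": pretrained["b1"].copy(),
        "W2": rng.standard_normal((n_out, n_hidden)) * 0.01,  # new weights
        "b2": np.zeros((n_out, 1)),                           # new biases
    }

def train_output_layer(X, Y, params, lr=0.1, steps=200):
    """Small-data case: gradient descent on W2 and b2 only; W1 and b1 stay frozen."""
    m = X.shape[1]
    for _ in range(steps):
        A1, A2 = forward(X, params)
        dZ2 = A2 - Y  # gradient of the logistic loss w.r.t. the output pre-activation
        params["W2"] -= lr * (dZ2 @ A1.T) / m
        params["b2"] -= lr * dZ2.sum(axis=1, keepdims=True) / m
    return params

def logistic_cost(A, Y, eps=1e-12):
    """Average logistic loss, with clipping to avoid log(0)."""
    A = np.clip(A, eps, 1.0 - eps)
    return float(-np.mean(Y * np.log(A) + (1.0 - Y) * np.log(1.0 - A)))
```

With plenty of target data, the same loop would also update `W1` and `b1` instead of leaving them frozen. In practice a deep learning framework would normally handle the freezing and the gradients for you.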

### When does transfer learning make sense?
- Task A and B have the same input x (for example, both take images as input)
- You have a lot more data for Task A than for Task B (Task A being the task you transfer from)
- Low-level features from A could be helpful for learning Task B

## What is multitask learning?
Multi-task learning is an approach in machine learning where a single model is trained to perform multiple related tasks simultaneously, sharing information across these tasks. The idea is that by learning from multiple tasks, the model can exploit the commonalities and differences among them, which can lead to improved performance and generalization compared to training separate models for each task.

Multi-task learning is particularly useful when:

- Tasks are related: If the tasks share some underlying structure or features, learning them together can help the model leverage this shared knowledge and improve performance.
- Data is limited: When there is limited data for individual tasks, multi-task learning can help by allowing the model to learn from the combined data of all tasks, effectively increasing the amount of training data and reducing overfitting.
- Computational resources are constrained: Training a single multi-task model can be more efficient than training separate models for each task, as the shared layers can be trained once and used for multiple tasks, reducing the overall training time and computational resources required.

In summary, multi-task learning is a technique in machine learning that involves training a single model to perform multiple related tasks simultaneously. This approach can lead to improved performance, generalization, and efficiency by exploiting the commonalities and differences among the tasks and sharing information across them.

Let's say you are building a network for a self-driving car and you need a computer vision (CV) system that can identify multiple types of objects such as stop signs, pedestrians, cars, traffic lights, and other items. Instead of training a separate network for each item, you would more likely use multi-task learning, where yhat becomes a vector of size (nitems, 1). Accordingly, the last layer (output layer) of the network has nitems nodes. In this sort of network with multiple output nodes, the cost function usually just sums the loss over each node.
- cost = (1/m) * Σ_i Σ_j loss(yhat_j(i), y_j(i)), where i runs over the m training examples and j runs over the nitems output nodes
- loss() is the usual logistic loss function
  - loss(yhat_j(i), y_j(i)) = -y_j(i) * log(yhat_j(i)) - (1 - y_j(i)) * log(1 - yhat_j(i))
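As a rough NumPy sketch of this cost, assuming predictions and labels are stored as arrays of shape (nitems, m) with one column per example (the function name `multitask_cost` and the small clipping constant `eps` are assumptions made for this example):

```python
import numpy as np

def multitask_cost(Y_hat, Y, eps=1e-12):
    """Sum the logistic loss over every output node, then average over m examples.

    Y_hat: predicted probabilities, shape (n_items, m)
    Y:     0/1 labels,              shape (n_items, m)
    """
    m = Y.shape[1]
    P = np.clip(Y_hat, eps, 1.0 - eps)  # avoid log(0)
    per_label = -(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P))
    return float(per_label.sum() / m)

# Two images, three output nodes (say stop sign, pedestrian, car).
Y = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Y_hat = np.full_like(Y, 0.5)  # an untrained model that outputs 0.5 everywhere
cost = multitask_cost(Y_hat, Y)
```

With every prediction at 0.5, each of the three nodes contributes log(2) per example, so the cost here is 3 * log(2).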

This method differs from softmax regression because more than one node can be positive at once (a stop sign, a pedestrian, and a car can all be present in the same image).
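A small sketch of that difference, using made-up scores for three output nodes (the logits below are hypothetical values chosen for illustration): independent sigmoids can flag several objects at once, while softmax forces the outputs to compete for a single total probability of 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical raw scores for: stop sign, pedestrian, car
logits = np.array([3.0, 2.5, -4.0])

independent = sigmoid(logits)  # one independent probability per node
exclusive = softmax(logits)    # probabilities that must sum to 1

multi_label_positives = int((independent > 0.5).sum())
softmax_positives = int((exclusive > 0.5).sum())
```

Here the sigmoid outputs mark both the stop sign and the pedestrian as present, whereas the softmax outputs can put more than half of the probability on only one class.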

Summing an independent logistic loss over each output node is what lets a single network handle multi-task learning.

An interesting property of this approach is that you can use training data in which not every image has every label. Some images may have only stop signs labeled, some only cars, and some several labels. The model will still be able to learn all classes as long as there is enough labeled data for each individual class. In that case, the loss for each example sums only over the labels that are present and ignores the unlabeled entries.
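One way to sketch this masking in NumPy is shown below. It assumes missing labels are stored as `np.nan`. That encoding, the function name `masked_multitask_cost`, and the choice to still divide by m are assumptions for this example, not something the approach prescribes.

```python
import numpy as np

def masked_multitask_cost(Y_hat, Y, eps=1e-12):
    """Multi-task logistic cost that skips entries whose label is missing.

    Y_hat: predicted probabilities, shape (n_items, m)
    Y:     labels in {0, 1}, with np.nan marking a missing label (assumed encoding)
    """
    m = Y.shape[1]
    mask = ~np.isnan(Y)                # True where a label is present
    Y_filled = np.where(mask, Y, 0.0)  # placeholder value, zeroed out by the mask below
    P = np.clip(Y_hat, eps, 1.0 - eps)
    per_label = -(Y_filled * np.log(P) + (1.0 - Y_filled) * np.log(1.0 - P))
    return float((per_label * mask).sum() / m)

# Image 1 has only stop sign and car labeled; image 2 has only pedestrian and car.
Y = np.array([[1.0, np.nan],
              [np.nan, 0.0],
              [0.0, 1.0]])
Y_hat = np.full(Y.shape, 0.5)
cost = masked_multitask_cost(Y_hat, Y)
```

Only the four labeled entries contribute, so with every prediction at 0.5 the cost is 4 * log(2) / 2. Changing the prediction for an unlabeled entry leaves the cost unchanged.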

### Logistic Loss Refresher
The logistic loss, also known as log loss or binary cross-entropy loss, is a common loss function used in binary classification tasks. Given a true label (y) and a predicted probability (p) for the positive class, the logistic loss for a single example is:

Logistic Loss = -[y * log(p) + (1 - y) * log(1 - p)]

Where:

- y is the true label (0 or 1)
- p is the predicted probability of the positive class (a value between 0 and 1)

The logistic loss for the entire dataset is computed by averaging the loss across all examples:

Logistic Loss (Dataset) = -(1/N) * Σ[y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]

Where:

- N is the number of examples in the dataset
- y_i is the true label for the i-th example
- p_i is the predicted probability of the positive class for the i-th example

The logistic loss penalizes incorrect predictions by assigning a higher loss to predictions that are further from the true labels. It is widely used in logistic regression and neural networks for binary classification tasks.
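The dataset-level formula can be written in a few lines of NumPy. This is a minimal sketch: the function name `logistic_loss` and the clipping constant `eps` (added so that log(0) never occurs) are assumptions for this example.

```python
import numpy as np

def logistic_loss(y, p, eps=1e-12):
    """Average logistic loss over N examples.

    y: true labels (0 or 1)
    p: predicted probabilities of the positive class
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # keep log() finite
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

confident_right = logistic_loss([1, 0], [0.9, 0.1])
confident_wrong = logistic_loss([1, 0], [0.1, 0.9])
```

Both confident-and-right predictions cost -log(0.9) each, while the confident-and-wrong ones cost -log(0.1) each, which shows how the penalty grows as predictions move away from the true labels.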

### When does multi-task learning make sense?
- Training on a set of tasks that could benefit from having shared lower-level features.
- Usually, the amount of data you have for each task is quite similar.
- Can be helpful when you don't have a huge amount of data for each individual task, as the network can learn valuable lower-level features that it can apply to each of the individual tasks.
- For an individual task to see a significant increase in performance from multi-task learning, there needs to be a large amount of data for the other tasks combined (e.g. 1,000 examples for task A out of 100,000 examples total).

**IMPORTANT:** Multi-task learning is less common than transfer learning, except in CV applications, where it is used much more often.


