# Transfer learning


## Hugging Face & PyTorch

#### **Hugging Face**
- A platform for **machine learning (ML)** and **Natural Language Processing (NLP)**.
- Known for its **Transformers Library**, offering pre-trained models like **BERT**, **GPT**, and **T5**.
- Focuses on **NLP tasks** such as text classification, sentiment analysis, and text generation.
- Often called the **"GitHub of Machine Learning"** due to its collaborative, community-driven approach.
- Ideal for quick, out-of-the-box NLP solutions.

#### **PyTorch**
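- An **open-source deep learning framework** originally developed by **Meta**.
- Offers a **Python**-first API on top of a C++ backend, making it intuitive and widely used in research and academia.
- Key feature: **dynamic computation graphs**, which are built as the code runs, so ordinary Python control flow can change the model's structure from one forward pass to the next.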
- Excels in **custom model building**, rapid prototyping, and handling large-scale computations with GPU acceleration.
- Best for creating tailored, research-focused AI solutions.
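
The dynamic-graph idea can be illustrated with a minimal sketch. The function name `dynamic_forward`, the threshold of 10, and the toy tensors are all assumptions made for illustration; the point is only that the number of operations recorded in the graph depends on the input data.

```python
import torch

def dynamic_forward(x, weight, max_steps=20):
    """Multiply x by weight until its norm reaches 10 (hypothetical rule)."""
    steps = 0
    # Plain Python control flow: the loop count is decided at run time,
    # so each call can record a different computation graph.
    while x.norm() < 10.0 and steps < max_steps:
        x = x @ weight
        steps += 1
    return x.sum(), steps

# Leaf tensor so that gradients are stored in weight.grad.
weight = (2.0 * torch.eye(2)).requires_grad_()

out_small, steps_small = dynamic_forward(torch.tensor([[1.0, 1.0]]), weight)
out_large, steps_large = dynamic_forward(torch.tensor([[4.0, 4.0]]), weight)

# Autograd follows whichever graph was actually built for this input.
out_small.backward()
```

Here the smaller input passes through the loop more often than the larger one, and `backward()` still works because the graph is recorded during execution rather than declared in advance.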


#### **Summary**
- **Hugging Face** simplifies NLP with pre-trained models and tools and suits quick implementations, while **PyTorch** provides the flexibility needed for research and custom model development.
- Together, they integrate well: Hugging Face Transformers models can run on PyTorch, which supports NLP applications such as sentiment analysis, language translation, and text summarization.

## Fine-tuning

* training LLMs from scratch requires significant time, infrastructure, data & investment; fine-tuning instead adapts a general pre-trained model to downstream tasks such as sentiment analysis or text generation, using domain-specific datasets

* benefits
    * time & resource efficiency
    * tailored responses
    * task-specific adaptation
* pitfalls
    * overfitting & underfitting
    * catastrophic forgetting (losing initial knowledge)
    * data leakage

* QA bot example
    * domain-specific dataset
    * novel cost function
    * reinforcement learning, direct preference optimization, encoder as evaluator
    * response evaluation is hard => adapt BERT to produce a continuous output score that serves as the reward signal (reward modeling)
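
One way to picture the "encoder as evaluator" idea is a BERT model with a single output unit, so that each response gets one continuous score. The sketch below assumes the Hugging Face `transformers` library is installed and uses a tiny, randomly initialised configuration with made-up sizes, token ids, and target ratings; a real reward model would start from pretrained weights and real annotated responses.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny random BERT (hypothetical sizes) so nothing has to be downloaded.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,  # one output unit -> one continuous score per input
)
reward_model = BertForSequenceClassification(config)

# Stand-in token ids for 3 (question, answer) texts of length 8.
input_ids = torch.randint(1, config.vocab_size, (3, 8))
# Hypothetical human quality ratings used as regression targets.
target_scores = torch.tensor([0.9, 0.2, 0.5])

outputs = reward_model(input_ids=input_ids, labels=target_scores)
scores = outputs.logits.squeeze(-1)  # shape (3,): one score per response
loss = outputs.loss                  # regression loss, since num_labels == 1
```

With `num_labels=1` the classification head is treated as a regression head, so the scores are unbounded real numbers that could be fed back as a reward.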

* fine-tuning strategies
    * self-supervised fine-tuning (masking & predicting words)
    * supervised fine-tuning (sentiment prediction)
    * reinforcement learning from human feedback
        * prompt -> model -> response 1, response 2, response 3 -> human ranking -> reward -> model
    * hybrid fine-tuning (combining multiple approaches)
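
The "masking & predicting words" step of self-supervised fine-tuning can be sketched without any ML library. The helper `mask_tokens`, the `[MASK]` string, and the masking probability are illustrative assumptions; real masked-language-model pipelines work on token ids and use more elaborate replacement rules.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder string

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide random tokens; return the masked input and the prediction targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)   # the model must recover this word
        else:
            masked.append(tok)
            labels.append(None)  # position is ignored by the loss
    return masked, labels

tokens = "the movie was surprisingly good".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=42)
all_masked, _ = mask_tokens(tokens, mask_prob=1.0)
none_masked, _ = mask_tokens(tokens, mask_prob=0.0)
```

No human labels are needed here: the targets come from the text itself, which is what makes the approach self-supervised.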

* direct preference optimization
    * optimizes language models directly based on human preferences
    * simple -> easier to implement than RLHF
    * human-centric -> aligns model outputs with human preference
    * no separate reward model training necessary
    * faster convergence
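
A minimal sketch of the DPO objective shows why no separate reward model is needed: the loss is computed directly from the log-probabilities that the model being trained (the policy) and a frozen reference model assign to a preferred and a rejected response. The function name `dpo_loss`, the value of `beta`, and the toy log-probabilities are assumptions for illustration.

```python
import math
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (policy margin - reference margin)), batch mean."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy sequence log-probabilities for two preference pairs (made-up numbers).
ref_chosen = torch.tensor([-5.0, -6.0])
ref_rejected = torch.tensor([-5.5, -5.0])

# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(ref_chosen, ref_rejected, ref_chosen, ref_rejected)

# Policy that favours the chosen responses more than the reference does.
improved = dpo_loss(ref_chosen + 1.0, ref_rejected - 1.0,
                    ref_chosen, ref_rejected)
```

The loss falls as the policy raises the preferred response relative to the rejected one, measured against the reference model, so human preference pairs drive the update directly.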

* supervised fine-tuning
    * full fine-tuning (all parameters are tuned)
    * parameter-efficient fine-tuning (most parameters are kept frozen; only a small number are trained)
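
The difference between the two variants can be seen by counting trainable parameters. The sketch below uses a small hypothetical network as a stand-in for a pretrained model and freezes everything except the final layer, which is only one simple way of keeping most parameters unchanged.

```python
import torch
from torch import nn

def count_trainable(model):
    """Number of parameters that the optimizer would update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical stand-in for a pretrained network: backbone plus task head.
model = nn.Sequential(
    nn.Linear(16, 32),  # backbone layer
    nn.ReLU(),
    nn.Linear(32, 32),  # backbone layer
    nn.ReLU(),
    nn.Linear(32, 2),   # task head
)

# Full fine-tuning: every parameter is trainable.
full_trainable = count_trainable(model)

# Parameter-efficient variant: freeze everything, then unfreeze the head.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

peft_trainable = count_trainable(model)

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

In this toy case only the head's 66 parameters are updated, compared with 1666 under full fine-tuning.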


