# Hugging Face: Basic

Hugging Face is a company and open-source community that provides tools to build, deploy, and share ML models.

Mission:
- To democratize good machine learning, one commit at a time.
- To support open-source AI by providing a one-stop shop of resources: models (30k+), datasets (5k+), ML demos (2k+), and libraries.

Key services:
- **Models**: A collection of thousands of pre-trained models for different tasks.
- **Datasets**: A library of datasets covering various domains and languages.
- **Spaces**: A platform for hosting and sharing interactive web applications and demos.
- **Inference API**: A fully managed service that allows you to deploy any Hugging Face model on scalable and secure infrastructure.
- **AutoTrain**: A no-code solution that automatically searches and trains the best model for your data and task.

## Topics:
- Transformer overview
- The `transformers` library
- `transformers.pipeline`

Natural Language Processing

- NLP is a field that combines linguistics and machine learning to understand human language. Common tasks include classifying sentences (for example, by sentiment or grammatical correctness), generating text, and extracting answers from a text. NLP techniques also extend to speech recognition and computer vision, for example transcribing an audio sample or describing an image.



## Transformer history

- June 2017: The Transformer architecture was introduced by Google researchers in the paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
- June 2018: GPT, the first pretrained Transformer model, was fine-tuned on various NLP tasks and obtained state-of-the-art results.
- October 2018: BERT, another large pretrained model, this one designed to produce better summaries of sentences.
- February 2019: GPT-2, an improved (and bigger) version of GPT that was not immediately publicly released due to ethical concerns.
- October 2019: DistilBERT, a distilled version of BERT that is 60% faster, 40% lighter in memory, and still retains 97% of BERT’s performance.
- October 2019: BART and T5, two large pretrained models using the same architecture as the original Transformer model.
- May 2020: GPT-3, an even bigger version of GPT-2 that is able to perform well on a variety of tasks without the need for fine-tuning (called zero-shot learning).

Broadly, these models can be grouped into three categories:

- GPT-like (also called auto-regressive Transformer models)
- BERT-like (also called auto-encoding Transformer models)
- BART/T5-like (also called sequence-to-sequence Transformer models)

![Transformers Chrono](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_chrono.svg)

![Model parameters](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/model_parameters.png)

**Large Language Model (LLM)**: A model trained on massive amounts of text data to learn the contextual relationships between words.

**Pretrained Models**: Pretrained models have already been trained on large corpora of text data. Because that data comes from diverse sources, these models are capable of generating coherent and contextually appropriate text.

**Fine-tuning** is the process of taking a pretrained model and further training it on a specific task or domain. Fine-tuned models are useful in various applications such as language translation, chatbots, question-answering systems, and more.


Model components:

**Encoder**: The encoder receives an input and builds its features.

**Decoder**: The decoder uses the features along with other inputs to generate a target sequence.

**Attention layers**: Attention layers are designed to mimic the human cognitive mechanism of selective focus. They let the model assign different weights (importance) to different parts of the input sequence.

**Model head**: The additional layers added on top of the base Transformer architecture to tailor the model to a specific task. The head transforms the learned representations into task-specific predictions or outputs.
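
To make the attention-layer description concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The vectors are made-up toy numbers and the function names (`softmax`, `attention`) are illustrative, not part of any Hugging Face API; real models compute this with learned projections over batches of tensors.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy 2-dimensional vectors for a three-token input (assumed values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

The first and third keys align with the query, so they receive larger weights than the second key. This is the "different importance for different parts of the input" behavior described above.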

The original Transformer architecture
![Transformer architecture](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers.svg)

## Model architecture:

### **Encoder-only models** (auto-encoding models)

- Encoder models use only the encoder of a Transformer model. At each stage, the attention layers can access all the words in the initial sentence.
- These models are usually pretrained by corrupting a sentence (for example, by masking random words) and training the model to reconstruct the original.
- Models: ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa
- Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.
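
The random-masking objective can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's actual preprocessing: the `mask_tokens` helper, the 15% default rate, and whitespace tokenization are assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide random tokens; return the corrupted input and the training labels."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)    # the model must recover this word
        else:
            masked.append(tok)
            labels.append(None)   # position is not scored
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(labels)
```

Because the encoder sees the words on both sides of each mask, it can use the full context to predict the hidden word.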

### **Decoder-only models** (auto-regressive models)

- Decoder models use only the decoder of a Transformer model. At each stage, for a given word the attention layers can only access the words positioned before it in the sentence.
- Decoder models are usually pretrained to predict the next word in the sentence.
- Models: CTRL, GPT, GPT-2, Transformer XL
- Good for generative tasks such as text generation.
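
The two ideas above, restricted attention and next-word prediction, can be sketched as follows. Both helper names are hypothetical. In this sketch each position may also attend to itself, as is usual in implementations, but it never sees later positions.

```python
def causal_mask(n):
    """Row i marks which positions token i may attend to (True = visible)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def next_word_examples(tokens):
    """Pair every prefix of the sentence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = ["my", "name", "is", "Sylvain"]

for row in causal_mask(len(tokens)):
    print(["x" if visible else "." for visible in row])

for context, target in next_word_examples(tokens):
    print(context, "->", target)
```

An encoder-only model would use an all-`True` mask instead, since every position can see the whole sentence.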

### **Encoder-decoder models** (sequence-to-sequence models)

- At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the input.
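- Pretraining can reuse the encoder or decoder objectives but usually involves something more complex. For example, T5 is pretrained by replacing random spans of text with a single special mask word; the objective is to predict the text that the mask replaced.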
- Models: BART, mBART, Marian, T5
- Good for generative tasks that require an input, such as summarization, translation, or generative question answering.
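
A rough sketch of the span-replacement objective: one contiguous span is swapped for a single mask token, the corrupted sentence goes to the encoder, and the decoder is trained to produce the removed span. The `corrupt_span` helper, the `<mask>` string, and the single fixed-length span are simplifying assumptions; actual T5 preprocessing corrupts several spans and uses distinct sentinel tokens.

```python
import random

def corrupt_span(tokens, span_len=2, seed=0, mask="<mask>"):
    """Replace one random span with a single mask token.

    Returns (source, target): the corrupted input and the removed span.
    Assumes len(tokens) >= span_len.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    start = rng.randrange(0, len(tokens) - span_len + 1)
    source = tokens[:start] + [mask] + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return source, target

tokens = "thank you for inviting me to your party".split()
source, target = corrupt_span(tokens, span_len=2)
print("encoder input :", source)
print("decoder target:", target)
```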

## Bias and limitations

- Pretrained models are trained on large amounts of data from the internet, good and bad alike.
- As a result, they can easily generate sexist, racist, or homophobic content.
- Fine-tuning the model on your data won’t make this intrinsic bias disappear.

## References:

- Hugging Face NLP Course
  - https://huggingface.co/learn/nlp-course/chapter1/1

- DataCamp: An Introduction to Using Transformers and Hugging Face
  - https://www.datacamp.com/tutorial/an-introduction-to-using-transformers-and-hugging-face