## 0. Installation

[guide reference](https://huggingface.co/docs/huggingface_hub/quick-start)


### 0.1 huggingface_hub
- ```pip install huggingface_hub```
- ```pip install -U "huggingface_hub[cli]"```

### 0.2 transformers
- ```pip install 'transformers[torch]'```  
- ```pip install 'transformers[tf-cpu]'```
- ```pip install 'transformers[flax]'```

### 0.3 others
- ```pip install datasets```
- ```pip install evaluate```
- ```pip install scikit-learn``` (required by some ```evaluate``` metrics)

## 1. Model

See the [model hub](https://huggingface.co/models) for the model inventory.

### 1.1 pipeline

A pipeline loads a model for a specified NLP task (e.g. zero-shot classification). The user can optionally specify a model; if none is specified, a default model for that task is used (e.g. GPT-2 for text generation). See [pipeline](https://huggingface.co/docs/transformers/en/pipeline_tutorial) for more examples. To initialize a pipeline:

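```python
from transformers import pipeline

# device=0 runs on the first CUDA GPU; omit it (or pass device=-1) to run on CPU.
# model_kwargs is forwarded to the model's from_pretrained(); the tokenizer still uses
# the default cache, so set the HF_HOME environment variable to relocate everything.
classifier = pipeline('zero-shot-classification',
                      model='roberta-large-mnli',
                      model_kwargs={'cache_dir': 'PATH/TO/CACHE/DIR'},
                      device=0)
```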

By default, the model is saved under ```~/.cache/huggingface```. To delete cached models, first install the CLI extra:

```pip install -U "huggingface_hub[cli]"```

Then run the following and select the revisions to delete in the interactive prompt:

```huggingface-cli delete-cache```
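
The cache can also be inspected and cleaned from Python. This is a minimal sketch, assuming a recent `huggingface_hub` that provides `scan_cache_dir()` and `HFCacheInfo.delete_revisions()`; the helper names `list_cached_repos` and `delete_cached_repo` are hypothetical, and deletion is a dry run unless `dry_run=False` is passed.

```python
from huggingface_hub import scan_cache_dir


def list_cached_repos(cache_dir=None):
    """Return (repo_id, repo_type, size_in_bytes) for every cached repo, largest first.

    cache_dir=None scans the default hub cache (under ~/.cache/huggingface unless overridden).
    """
    info = scan_cache_dir(cache_dir)
    rows = [(repo.repo_id, repo.repo_type, repo.size_on_disk) for repo in info.repos]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def delete_cached_repo(repo_id, cache_dir=None, dry_run=True):
    """Delete every cached revision of `repo_id`; return the bytes freed (or that would be freed)."""
    info = scan_cache_dir(cache_dir)
    hashes = [
        revision.commit_hash
        for repo in info.repos
        if repo.repo_id == repo_id
        for revision in repo.revisions
    ]
    strategy = info.delete_revisions(*hashes)
    if not dry_run:
        strategy.execute()  # actually removes the files from disk
    return strategy.expected_freed_size
```

For example, `delete_cached_repo('roberta-large-mnli')` reports how much space would be reclaimed without touching anything; note that `scan_cache_dir()` raises an error if the cache directory does not exist yet.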

### 1.2 AutoClass

An AutoClass automatically infers and loads the correct architecture from a given checkpoint; use it with the ```from_pretrained()``` method. For NLP, one usually calls **AutoTokenizer** together with an **AutoModelFor...** class (e.g. **AutoModelForSequenceClassification**). This ensures that the correct architecture is loaded every time [see ref](https://huggingface.co/docs/transformers/en/autoclass_tutorial#:~:text=instances%20of%20models.-,This%20will%20ensure%20you%20load%20the%20correct%20architecture%20every%20time.%20In%20the%20next%20tutorial%2C%20learn%20how%20to%20use%20your%20newly%20loaded%20tokenizer%2C%20image%20processor%2C%20feature%20extractor%20and%20processor%20to%20preprocess%20a%20dataset%20for%20fine%2Dtuning.,-TensorFlow).

For example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'distilbert/distilbert-base-uncased'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```


### 1.3 Fine-tuning

#### 1.3.1 trainer class

Fine-tune a pretrained transformer whose original head is replaced by a newly initialized task head (see [ref](https://huggingface.co/docs/transformers/v4.39.3/en/training#:~:text=You%20will%20see,model%20to%20it)). By default the ```Trainer``` updates all trainable weights, not only the head; freeze layers to restrict what is trained. The ```Trainer``` lets you specify hyperparameters such as gradient accumulation, mixed precision, learning-rate scheduler, regularization, etc. Another detailed example can be found [here](https://towardsdatascience.com/fine-tuning-pretrained-nlp-models-with-huggingfaces-trainer-6326a4456e7b).

To freeze some weights of a transformer (i.e. exclude them from gradient updates), set ```requires_grad``` to ```False``` on their parameters:

```python
for param in list_of_param_to_be_frozen:
    param.requires_grad = False
```

For example, to freeze the embedding layer and the first ```n``` encoder layers of a BERT model:

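```python
from transformers import AutoModel

bert = AutoModel.from_pretrained('google-bert/bert-base-uncased')  # a BertModel
# For a task model such as BertForSequenceClassification, use its backbone: model.bert
n = 5  # number of encoder layers to freeze; replace 5 by what you want

modules = [bert.embeddings, *bert.encoder.layer[:n]]
for module in modules:
    for param in module.parameters():
        param.requires_grad = False
```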

See [discussion](https://discuss.huggingface.co/t/how-to-freeze-some-layers-of-bertmodel/917#:~:text=modules%20%3D%20%5BL1bb.embeddings%2C%20*L1bb.encoder.layer%5B%3A5%5D%5D%20%23Replace%205%20by%20what%20you%20want%0Afor%20module%20in%20mdoules%3A%0A%20%20%20%20for%20param%20in%20module.parameters()%3A%0A%20%20%20%20%20%20%20%20param.requires_grad%20%3D%20False) for more detail.
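
To check that freezing worked, count trainable versus total parameters and confirm that frozen weights receive no gradient. The sketch below uses a tiny, hypothetical stand-in module (`TinyEncoder`) that mimics BERT's `embeddings` / `encoder.layer` layout so it runs without downloading a checkpoint; `freeze_bottom` and `count_parameters` are illustrative helper names, and the latter works unchanged on a real `BertModel`.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in with the same attribute layout as BertModel (embeddings, encoder.layer)."""

    def __init__(self, vocab_size=100, hidden=16, num_layers=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.Module()
        self.encoder.layer = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_layers)])

    def forward(self, input_ids):
        x = self.embeddings(input_ids)
        for layer in self.encoder.layer:
            x = torch.relu(layer(x))
        return x


def freeze_bottom(model, n):
    """Freeze the embeddings and the first n encoder layers."""
    for module in [model.embeddings, *model.encoder.layer[:n]]:
        for param in module.parameters():
            param.requires_grad = False


def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


model = TinyEncoder()
freeze_bottom(model, n=2)
print(count_parameters(model))  # (544, 2688)
```

Only the two unfrozen 16×16 linear layers (272 parameters each, including bias) remain trainable.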

#### 1.3.2 adapter

An adapter adds a small number of trainable weights to a transformer whose original weights are frozen. This has been shown to be very memory-efficient, with lower compute usage, while producing results comparable to a fully fine-tuned model. This falls under a more general class of training called *Parameter-Efficient Fine-Tuning* (PEFT). See [here](https://huggingface.co/docs/transformers/en/peft) for more detail.

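```python
# load_adapter() requires the PEFT library: pip install peft
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-350m"            # base model
peft_model_id = "ybelkada/opt-350m-lora"  # LoRA adapter trained on top of the base model

model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)         # adds the adapter weights to the base model
```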
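
To make the idea concrete, here is a from-scratch sketch of one popular adapter type, LoRA (low-rank adaptation), applied to a single `nn.Linear`: the pretrained weight W is frozen and only a low-rank update B·A of rank r is trained, so the layer computes W x + (alpha / r)·B A x. This illustrates the technique only; it is not how the `peft` library implements it internally, and `LoRALinear`, `r` and `alpha` are names chosen for this example.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear with a trainable low-rank update (illustrative LoRA sketch)."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # pretrained weights stay frozen
        # A projects down to rank r, B projects back up. B starts at zero, so the
        # wrapped layer initially behaves exactly like the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


base = nn.Linear(512, 512)
layer = LoRALinear(base, r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 4096 266752
```

Here only 4,096 of 266,752 parameters (about 1.5%) are trainable, which is where the memory savings come from: optimizer state is kept only for A and B.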

#### 1.3.3 [training with script](https://huggingface.co/docs/transformers/en/run_scripts)

This fine-tunes the model at a more granular (weight) level, since it involves updating the transformer model itself (rather than only changing hyperparameters or adding and training an adapter). The scripts [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects) can be modified to fit your use case, such as training on a custom dataset (see [here](https://huggingface.co/docs/transformers/en/run_scripts#:~:text=tmp/tst%2Dsummarization-,Use%20a%20custom%20dataset,-The%20summarization%20script) for more detail).

### 1.4 TrainingArguments: hyperparameters
**batch size**

**learning rate/scheduler**

**regularization**
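
In ```TrainingArguments``` these correspond to fields such as `per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `lr_scheduler_type`, `warmup_steps` and `weight_decay`. Two of them interact simply: the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps × number_of_devices`, and with `lr_scheduler_type="linear"` plus warmup the learning rate ramps up linearly for `warmup_steps` and then decays linearly to 0. The sketch below reproduces that schedule in plain PyTorch with `LambdaLR`, assuming it matches the shape of `get_linear_schedule_with_warmup` in transformers; the helper names are hypothetical.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch_size * grad_accum_steps * num_devices


def linear_warmup_decay(num_warmup_steps, num_training_steps):
    """LR multiplier: 0 -> 1 during warmup, then 1 -> 0 linearly until num_training_steps."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))
    return lr_lambda


model = torch.nn.Linear(4, 2)
# weight_decay is a common regularization knob; AdamW applies it decoupled from the gradient update
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = LambdaLR(optimizer, linear_warmup_decay(num_warmup_steps=10, num_training_steps=100))

lrs = []
for step in range(100):
    optimizer.step()  # a real loop would run forward/backward (and accumulate gradients) first
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
```

With 8 examples per device, 4 accumulation steps and 2 GPUs, each update sees 64 examples; the recorded learning rate peaks at 2e-5 after 10 steps and reaches 0 at step 100.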

## N. Misc

### N.1 Questions
- When running pipeline/specify a model, what do we download?
- Difference between ```pipeline``` and ```from_pretrained```?
- How to run inference using GPU?
- When the model is cached, is it stored on RAM? GPU?


### N.2 FAQ 

#### Theory
- What's the difference between **architecture** and **checkpoint**?
    - Architecture refers to the skeleton of the model and checkpoints are the weights for a given architecture. For example, BERT is an architecture, while google-bert/bert-base-uncased is a checkpoint. Model is a general term that can mean either architecture or checkpoint.
- What's the difference between **AutoModelForSeq2SeqLM** and **AutoModelForCausalLM**?
    - AutoModelForSeq2SeqLM is used for language models with encoder-decoder architecture, like T5 and BART. This architecture is typically used in generative tasks where the output heavily relies on the input (e.g. translation, summarization). AutoModelForCausalLM is used for auto-regressive language models (decoder-only) like all the GPT models. They are used for all other types of generative tasks [(Reference)](https://stackoverflow.com/questions/75549632/difference-between-automodelforseq2seqlm-and-automodelforcausallm). 
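- What is the difference between **base** and **instruct** models?
    - An instruct model is a checkpoint of a base model that has been further fine-tuned on an instruction-following corpus (prompt/response pairs). Instruct models are therefore better at following task instructions given in the prompt, whereas base models simply continue the input text.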
- LoRA does not offer much boost in computation time. Why is that?
    - TBA, but look into gradient computation and the forward and backward passes to see whether LoRA reduces those operations. Also take into account the overhead of adding the adapters. See [reference](https://github.com/huggingface/transformers/issues/25760) for the PEFT discussion and [reference](https://pytorch.org/docs/stable/notes/autograd.html#:~:text=During%20the%20forward%20pass%2C%20an%20operation%20is%20only%20recorded%20in%20the%20backward%20graph%20if%20at%20least%20one%20of%20its%20input%20tensors%20require%20grad.%20During%20the%20backward%20pass%20(.backward())%2C%20only%20leaf%20tensors%20with%20requires_grad%3DTrue%20will%20have%20gradients%20accumulated%20into%20their%20.grad%20fields.) for the mechanism of autograd and ```requires_grad```.
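
The autograd rule quoted in the second reference can be observed directly: a frozen layer accumulates no weight gradient, but if a trainable parameter sits *below* it, backward still has to propagate gradients through the frozen layer. Under that assumption, freezing most weights saves optimizer memory and the weight-gradient computations, but not the forward pass or the backward pass through frozen layers. A minimal PyTorch sketch with a hypothetical two-layer stack:

```python
import torch
from torch import nn

torch.manual_seed(0)
bottom = nn.Linear(8, 8)  # trainable (think: an adapter in an early layer)
top = nn.Linear(8, 8)     # frozen (think: a pretrained layer above it)
for param in top.parameters():
    param.requires_grad = False

x = torch.randn(4, 8)
loss = top(torch.relu(bottom(x))).sum()
loss.backward()

# Frozen leaf tensors accumulate no .grad ...
print(top.weight.grad)                 # None
# ... but the gradient still flowed through `top` to reach `bottom`.
print(bottom.weight.grad is not None)  # True
```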
    
#### Practical
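- I'm getting ```OSError: Can't load config for <model>. Make sure that: - <model> is a correct model identifier listed on 'https://huggingface.co/models'```.
    - Check that the model identifier is spelled correctly (and that you are logged in if the repository is private or gated). If it is, upgrade transformers: ```pip install --upgrade transformers```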
- I got ```AssertionError: Torch not compiled with CUDA enabled``` when trying to use the GPU (e.g. ```torch.cuda.is_available()```).
    - The installed PyTorch is a CPU-only build. Install a CUDA-enabled build that matches your setup, e.g. ```pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html``` [see ref](https://github.com/pytorch/pytorch/issues/50032).
- What's the difference between ```pipeline``` and ```from_pretrained```?
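    - ```pipeline``` loads a model (plus its tokenizer or preprocessor) and wraps it with the pre- and post-processing for a given task. ```from_pretrained``` from an **AutoClass** automatically infers and loads the correct architecture given a checkpoint, leaving pre- and post-processing to you.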
- When running pipeline/specify a model, what do we download?
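    - The model snapshot (the repository files needed to load it, e.g. config, weights and tokenizer files), which is downloaded by default to ```~/.cache/huggingface```.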
- Do I need to run **AutoTokenizer** before **AutoModelFor**? Why?
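    - No, the order does not matter. The recommendation is to use the **AutoTokenizer** and **AutoModelFor** classes (rather than architecture-specific classes) to load pretrained instances, which ensures the correct architecture is loaded every time. The tokenizer should come from the same checkpoint as the model.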
- ```torch.cuda.is_available()=False```
    - Make sure the installed PyTorch build targets a CUDA version that is compatible with your setup. E.g. torch 1.9.0 has builds for CUDA 11.1, while torch 1.13.1 has builds for CUDA 11.6 and 11.7. See [reference](https://saturncloud.io/blog/why-torchcudaisavailable-returns-false-even-after-installing-pytorch-with-cuda/#:~:text=is_available()%20might%20return%20False%20is%20that%20the%20installed%20version,is_available()%20will%20return%20False%20.)
- ```peft 0.3.0``` requires ```torch > 1.13.0```
    - TBA
- When the model is cached, is it stored on RAM? GPU?
    - TBA
- Where are HF datasets stored?
    - On disk by default (as memory-mapped Apache Arrow files), so loading a dataset does not inflate memory usage ([ref](https://huggingface.co/docs/transformers/en/training#:~:text=Remember%20that%20Hugging,the%20entire%20dataset)).
- How to run inference/train using GPU?
    - TBA
- How to use **accelerate** library?
    - TBA
- What are the alternatives to huggingface?
    - OpenAI, Cohere
- Cuda driver API vs runtime? Difference between ```nvidia-smi``` and ```nvcc --version```?
    - TBA
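
Several of the entries above (the CUDA assertion error, ```torch.cuda.is_available()=False```, running inference/training on GPU) come down to checking what PyTorch sees and putting the model and its inputs on the same device. A minimal sketch, assuming PyTorch is installed; `describe_torch_cuda` and `pick_device` are hypothetical helper names.

```python
import torch
from torch import nn


def describe_torch_cuda():
    """Report what this PyTorch build knows about CUDA."""
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,        # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False with a CPU-only build or no usable GPU/driver
        "device_count": torch.cuda.device_count(),
    }


def pick_device():
    """Use the first GPU when available, otherwise fall back to the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


device = pick_device()
model = nn.Linear(4, 2).to(device)         # move the parameters to the device
inputs = torch.randn(3, 4, device=device)  # inputs must live on the same device
with torch.no_grad():                      # inference: skip autograd bookkeeping
    outputs = model(inputs)
print(describe_torch_cuda(), outputs.device)
```

For a ```pipeline```, the equivalent is passing ```device=0``` (first GPU); for models loaded with ```from_pretrained```, call ```.to(device)``` on the model and on the tokenized inputs as well.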