# 07. LLMs

1. Introduction
2. Prompting
3. PEFT
4. Alignment
5. LLMs for Code
6. Exercise

# 1. Introduction

Transformers ([lena-voita.github.io](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html)):

![](https://lena-voita.github.io/resources/lectures/seq2seq/transformer/model-min.png)

#### SCALING LAWS

[Kaplan et al - Scaling Laws for Neural Language Models 2020](https://arxiv.org/abs/2001.08361):

Model performance depends most strongly on scale, which consists of three factors: the amount of compute used for training, the size of the dataset, and the number of model parameters (excluding embeddings). Within reasonable limits, performance depends very weakly on other architectural hyperparameters such as depth vs. width.

![](res/05_scaling_laws.png)
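As a rough illustration of the power-law form reported by Kaplan et al., the loss as a function of the non-embedding parameter count N (when data and compute are not the bottleneck) can be sketched as L(N) = (N_c / N)^alpha_N. The constants below are approximate values attributed to the paper and should be treated as assumptions to verify against it; the function name `loss_from_params` is hypothetical.

```python
# Approximate fit constants for the parameter-count law
# (assumed values; check them against Kaplan et al. before relying on them).
ALPHA_N = 0.076   # power-law exponent
N_C = 8.8e13      # scale constant, in non-embedding parameters


def loss_from_params(n_params: float) -> float:
    """Predicted test loss (nats per token) for a model with n_params
    non-embedding parameters, assuming data and compute are not limiting."""
    return (N_C / n_params) ** ALPHA_N


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"N = {n:.0e}: predicted loss = {loss_from_params(n):.3f}")
```

Under these assumed constants, a tenfold increase in parameters multiplies the predicted loss by about 0.84 (that is, 10^-0.076), which is why large gains require scaling by orders of magnitude.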

#### LARGE MODELS

![](./res/07_model_size_in_tokens.png)


[Source: [scale.com/guides/large-language-models](https://scale.com/guides/large-language-models#model-size-and-performance)]

#### CONTEXT SIZE

![](./res/07_gpt-4_context_length.png)

[Source: [lifearchitect.ai/gpt-4](https://lifearchitect.ai/gpt-4/)]

Claude 3.5 has a context window of 200K tokens, which is roughly:
- about 150K words
- hundreds of pages of text
- a couple of books (The Great Gatsby is about 72K tokens)
- text that would take about 10 hours to read
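The estimates above can be reproduced with simple arithmetic. This is a minimal sketch: the token counts come from the list above, while the words-per-token ratio, reading speed, and page density are assumed rules of thumb, not measured values.

```python
CONTEXT_TOKENS = 200_000   # from the list above
GATSBY_TOKENS = 72_000     # from the list above

# Assumed rules of thumb (not from the source):
WORDS_PER_TOKEN = 0.75     # typical for English text
WORDS_PER_MINUTE = 250     # average adult reading speed
WORDS_PER_PAGE = 500       # dense single-spaced page


def context_budget(tokens: int) -> dict:
    """Convert a token budget into words, pages, reading hours, and Gatsby-sized books."""
    words = tokens * WORDS_PER_TOKEN
    return {
        "words": words,
        "pages": words / WORDS_PER_PAGE,
        "reading_hours": words / WORDS_PER_MINUTE / 60,
        "gatsby_books": tokens / GATSBY_TOKENS,
    }


if __name__ == "__main__":
    for name, value in context_budget(CONTEXT_TOKENS).items():
        print(f"{name}: {value:.1f}")
```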

#### HARD TASKS FOR LLMS

Some LLMs solve certain complex problems well.

![](res/07_hard_task_gpt4.png)

Yet LLMs can also fail on problems that look simple.

![](./res/07_llm_hard_gpt4o_calculator.png)

![](./res/07_llm_hard_gpt4o_alice.png)

#### MORE
- [Dziri et al - Faith and Fate: Limits of Transformers on Compositionality](https://arxiv.org/abs/2305.18654)
- [Bubeck et al - Sparks of Artificial General Intelligence: Early experiments with GPT-4](https://arxiv.org/abs/2303.12712)
- [Nezhurina et al - Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models](https://arxiv.org/abs/2406.02061)

# 2. Prompting

#### IN-CONTEXT LEARNING

In-context learning (ICL) is a technique where task demonstrations are integrated into the prompt in a natural language format.

![](http://ai.stanford.edu/blog/assets/img/posts/2022-08-01-understanding-incontext/images/image13.gif)

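- 0-shot: the prompt contains only the task description, with no demonstrations
- 1-shot: the prompt includes one demonstration
- few-shot: the prompt includes several demonstrations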
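The three settings differ only in how many demonstrations are placed in the prompt. A minimal sketch, assuming a simple `Input:`/`Output:` layout and a toy sentiment task (both are illustrative choices, and `build_prompt` is a hypothetical helper):

```python
def build_prompt(instruction, demonstrations, query):
    """Assemble a k-shot prompt: the instruction, k solved examples,
    and finally the query with an empty output slot for the model to fill."""
    parts = [instruction]
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


INSTRUCTION = "Classify the sentiment of the input as positive or negative."
DEMOS = [
    ("I loved this film.", "positive"),
    ("The plot was boring.", "negative"),
    ("A wonderful surprise.", "positive"),
]
QUERY = "The acting was terrible."

zero_shot = build_prompt(INSTRUCTION, [], QUERY)
one_shot = build_prompt(INSTRUCTION, DEMOS[:1], QUERY)
few_shot = build_prompt(INSTRUCTION, DEMOS, QUERY)

if __name__ == "__main__":
    print(few_shot)
```

In all three cases the model's weights stay fixed; only the prompt text changes.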

#### INSTRUCTIONS

- A language model (LM) simply predicts the next token given the previous tokens
- A core capability of large language models (LLMs) is following natural language instructions

#### CHAIN-OF-THOUGHT

![](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcot.1933d9fe.png&w=1920&q=75)
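A chain-of-thought prompt differs from a standard few-shot prompt in that the demonstration spells out the intermediate reasoning before the final answer. The sketch below uses the tennis-ball demonstration from Wei et al.; the helper names are hypothetical, and the completion is a hard-coded illustrative string, not real model output.

```python
import re

# Demonstration whose answer includes the intermediate reasoning steps.
COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11."
)


def build_cot_prompt(question):
    """Prepend the worked demonstration and leave the answer slot open."""
    return f"{COT_DEMO}\n\nQ: {question}\nA:"


def extract_answer(completion):
    """Pull the final integer out of a 'The answer is N' style completion."""
    match = re.search(r"The answer is (-?\d+)", completion)
    return int(match.group(1)) if match else None


QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many apples do they have?"
)
prompt = build_cot_prompt(QUESTION)

# Hypothetical completion, written by hand for illustration only.
hypothetical_completion = (
    " The cafeteria had 23 apples originally. They used 20 to make lunch, "
    "so they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. "
    "The answer is 9."
)

if __name__ == "__main__":
    print(prompt)
    print(extract_answer(hypothetical_completion))
```

Ending the demonstration with a fixed phrase such as "The answer is" makes the final answer easy to extract automatically.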

#### MORE

- [Kaplan et al - Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361v1)
- [Wei et al - Emergent Abilities of Large Language Models](https://arxiv.org/abs/2206.07682)
- [Wei et al - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers](https://arxiv.org/abs/2212.10559)
- [Microsoft: The power of prompting](https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/)
- [Yandex: Notebook of cheat prompts](https://ya.ru/project/cheat-prompts/index)
- [Anthropic: Prompt engineering](https://docs.anthropic.com/claude/docs/prompt-engineering)

# 3. PEFT

#### TAXONOMY

![](res/07_peft_taxonomy.png)

[Source: [Lialin et al 2023](https://arxiv.org/abs/2303.15647)]

#### LoRA

![](./res/07_lora.png)

[Source: [sebastianraschka.com](https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms)]
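The idea in the figure can be sketched as follows: the pretrained weight matrix W stays frozen, and only a low-rank update B·A is trained, so the layer computes h = W x + (alpha / r) · B (A x). This is a minimal pure-Python sketch with tiny dimensions, not the PEFT library implementation; all helper names are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """h = W x + (alpha / r) * B (A x), where W is frozen and A, B are trainable."""
    r = len(A)                          # rank = number of rows of A
    base = matvec(W, x)                 # frozen pretrained path
    update = matvec(B, matvec(A, x))    # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


random.seed(0)
d_in, d_out, r = 4, 3, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]   # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]

if __name__ == "__main__":
    print(lora_forward(W, A, B, x, alpha=16))
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, r=8)
    print(f"trainable fraction for a 4096x4096 layer at r=8: {lora / full:.4%}")
```

Because B is initialized to zero, the adapted layer initially reproduces the frozen layer exactly; for a 4096×4096 layer at rank 8, the adapter trains 65,536 parameters instead of about 16.8 million.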

#### MORE

- [Lialin et al - Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.15647)
- [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora)
- [Practical tips](https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms)

# 4. Alignment

![](res/07_alignment.png)

#### RLHF

![](2022-2023/res_nlp/rlhf_overview.png)

More:
- [Stiennon et al - Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)
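The reward model in an RLHF pipeline is typically trained on human comparisons: given a preferred and a rejected response, it should score the preferred one higher. A minimal sketch of the pairwise loss used for this, -log sigmoid(r_chosen - r_rejected), is below; the scalar rewards are made-up numbers for illustration, and the function name is hypothetical.

```python
import math


def pairwise_reward_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(margin)), with margin = reward_chosen - reward_rejected.

    Uses the identity -log(sigmoid(m)) = log(1 + exp(-m)) and branches
    on the sign of m to avoid overflow in exp().
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative reward pairs (chosen, rejected); not real model scores.
    for chosen, rejected in [(2.0, 0.0), (0.0, 0.0), (0.0, 2.0)]:
        loss = pairwise_reward_loss(chosen, rejected)
        print(f"chosen={chosen}, rejected={rejected}: loss={loss:.4f}")
```

The loss is small when the preferred response already scores well above the rejected one, equals log 2 when the scores tie, and grows roughly linearly when the ranking is inverted.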


#### MORE

- [Zhou et al - LIMA: Less Is More for Alignment](https://arxiv.org/abs/2305.11206): supervised fine-tuning (SFT) on 1,000 carefully selected examples, without RL
- [Rafailov et al - Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
- [Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback)
- [Understanding Reinforcement Learning from Human Feedback (RLHF): Part 1](https://wandb.ai/ayush-thakur/RLHF/reports/Understanding-Reinforcement-Learning-from-Human-Feedback-RLHF-Part-1--VmlldzoyODk5MTIx)
- [Illustrating Reinforcement Learning from Human Feedback (RLHF)](https://huggingface.co/blog/rlhf)
- [opendilab/awesome-RLHF](https://github.com/opendilab/awesome-RLHF)
- [Berkeley Deep Reinforcement Learning course (CS 285)](https://rail.eecs.berkeley.edu/deeprlcourse/)

# 5. LLMs for Code

| **model** | **team** | **year** | **context** | **terms** | **reference** |
|-----------|----------|----------|-------------|-----------|---------------|
| GPT-4 | OpenAI | 2023 | 32K | [pricing](https://openai.com/pricing) | [gpt-4](https://openai.com/gpt-4) |
| codellama-7b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
| codellama-13b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) |
| codellama-34b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-34b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) |
| codellama-70b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-70b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-70b-Instruct-hf) |
| deepseek-coder-33b-instruct | DeepSeek | 2024 | 16K | [terms](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL) | [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct) |
| codegemma-2b | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b) |
| codegemma-7b | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-7b](https://huggingface.co/google/codegemma-7b) |
| codegemma-7b-it | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-7b-it](https://huggingface.co/google/codegemma-7b-it) |

#### MORE

- [huybery/Awesome-Code-LLM](https://github.com/huybery/Awesome-Code-LLM)
- [Code Llama: Open Foundation Models for Code](https://arxiv.org/abs/2308.12950)
- [DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence](https://arxiv.org/abs/2401.14196)
- [CodeGemma: Open Code Models Based on Gemma](https://goo.gle/codegemma)


# 6. Exercise

Conduct a mini-study on how well LLMs handle code clone detection.

Possible plan:
1. Choose any open model (codellama, codegemma, etc.).
2. Choose a clone detection dataset (or a subset of one), or write a small number of examples yourself.
3. Design a prompt.
4. Collect the model's responses.
5. Convert the responses into labels (in any way: a classifier, regular expressions, manually...).
6. Compute metrics and draw conclusions.
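Steps 3, 5, and 6 of the plan can be sketched as follows. This is only one possible setup: the prompt wording, the yes/no parsing rule, and the decision to count unparseable responses as "not a clone" are all assumptions, and the responses are hard-coded stand-ins for real model output (step 4 is model-specific and is left out).

```python
import re

PROMPT_TEMPLATE = (
    "Are the following two code snippets clones, i.e. do they implement "
    "the same functionality? Answer yes or no.\n\n"
    "Snippet 1:\n{code_a}\n\nSnippet 2:\n{code_b}\n\nAnswer:"
)


def build_prompt(code_a, code_b):
    """Step 3: fill the prompt template with a pair of snippets."""
    return PROMPT_TEMPLATE.format(code_a=code_a, code_b=code_b)


def response_to_label(response):
    """Step 5: map a free-text response to 1 (clone), 0 (not a clone), or None."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0


def precision_recall_f1(y_true, y_pred):
    """Step 6: binary metrics with 'clone' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical responses and gold labels, for illustration only.
responses = [
    "Yes, both functions compute the sum.",
    "No.",
    "yes",
    "I am not sure.",
    "No, they differ.",
]
gold = [1, 0, 0, 1, 0]

labels = [response_to_label(r) for r in responses]
# Assumption: an unparseable response is counted as "not a clone".
predictions = [0 if label is None else label for label in labels]

if __name__ == "__main__":
    print(build_prompt("def f(a, b): return a + b", "def g(x, y): return x + y"))
    print(labels)
    print(precision_recall_f1(gold, predictions))
```

It is worth reporting the share of unparseable responses separately, since it depends heavily on the prompt and the model.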