# 07. LLMs

1. Introduction
2. Prompting
3. PEFT
4. Alignment
5. LLMs for Code
6. Exercise
7. References

# 1. Introduction

Transformers ([lena-voita.github.io](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html)):

![](https://lena-voita.github.io/resources/lectures/seq2seq/transformer/model-min.png)

#### SCALING LAWS

![](res/05_scaling_laws.png)

#### LARGE MODELS

![](https://cdn.builder.io/api/v1/image/assets%2Fe0438815ba51486bbb6a202747122d4b%2F894dade70d724952bfad3956c599865a)

#### CONTEXT SIZE

![](https://i.redd.it/b3h0mebrssna1.png)

Claude 2.1: 200K tokens of context
- roughly 150K words
- hundreds of pages of text
- a couple of books ("The Great Gatsby" is about 72K tokens)
- text that would take about 10 hours to read
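
The figures above can be reproduced with back-of-the-envelope arithmetic. The sketch below is only an illustration: the words-per-token ratio and the reading speed are assumed averages (not taken from the source), and real values depend on the tokenizer and the reader.

```python
# Rough conversion of a context window into words, books, and reading time.
# WORDS_PER_TOKEN and READING_SPEED_WPM are assumptions, not measured values.
CONTEXT_TOKENS = 200_000     # Claude 2.1 context size (from the list above)
GATSBY_TOKENS = 72_000       # "The Great Gatsby" (from the list above)
WORDS_PER_TOKEN = 0.75       # assumed average for English text
READING_SPEED_WPM = 250      # assumed reading speed, words per minute


def context_summary(tokens: int) -> dict:
    """Translate a token budget into more intuitive units."""
    words = tokens * WORDS_PER_TOKEN
    return {
        "words": round(words),
        "gatsby_copies": round(tokens / GATSBY_TOKENS, 1),
        "reading_hours": round(words / READING_SPEED_WPM / 60, 1),
    }


summary = context_summary(CONTEXT_TOKENS)
print(summary)
```

Under these assumptions, 200K tokens correspond to 150K words and 600 minutes of reading, which matches the "about 10 hours" estimate.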

#### HARD TASKS FOR LLMS

There are examples of hard tasks that some LLMs solve well.

![](res/07_hard_task_gpt4.png)

There are examples of simple tasks that LLMs solve poorly.

![](res/07_letters_1_false.png)

![](res/07_letters_2.png)

![](res/07_letters_3_false.png)

![](res/07_math_1_false.png)

![](res/07_math_1_true.png)

![](res/07_math_2.png)

![](res/07_math_3.png)

#### MORE
- [Dziri et al - Faith and Fate: Limits of Transformers on Compositionality](https://arxiv.org/abs/2305.18654)
- [Bubeck et al - Sparks of Artificial General Intelligence: Early experiments with GPT-4](https://arxiv.org/abs/2303.12712)

# 2. Prompting

#### IN-CONTEXT LEARNING

In-context learning (ICL) is a technique in which task demonstrations are placed directly in the prompt in natural-language form; the model's weights are not updated.

![](http://ai.stanford.edu/blog/assets/img/posts/2022-08-01-understanding-incontext/images/image13.gif)

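- 0-shot: the prompt contains only the task description, no demonstrations
- 1-shot: the prompt contains one demonstration
- few-shot: the prompt contains several demonstrations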
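
The difference between the three settings is just the number of demonstrations placed before the query. A minimal sketch, assuming a hypothetical sentiment task and an `Input:` / `Output:` layout (both are illustrative choices, not a fixed standard):

```python
from typing import List, Tuple


def build_prompt(instruction: str, demos: List[Tuple[str, str]], query: str) -> str:
    """Build a k-shot prompt: instruction, k demonstrations, then the query."""
    parts = [instruction]
    for text, label in demos:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block is left open so the model completes the label.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical demonstrations, written by hand for illustration.
demos = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
    ("Best purchase this year.", "positive"),
]
instruction = "Classify the sentiment of the input as positive or negative."
query = "The plot made no sense."

zero_shot = build_prompt(instruction, [], query)
one_shot = build_prompt(instruction, demos[:1], query)
few_shot = build_prompt(instruction, demos, query)
print(few_shot)
```

The same `build_prompt` call covers all three cases; only the length of `demos` changes.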

#### INSTRUCTIONS

- A language model (LM) only predicts the next token given the previous tokens
- A core capability of large language models (LLMs) is following natural-language instructions

#### CHAIN-OF-THOUGHT

![](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcot.1933d9fe.png&w=1920&q=75)
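
In chain-of-thought prompting, the demonstration includes the intermediate reasoning and not only the final answer. The sketch below builds a standard prompt and a chain-of-thought prompt from the arithmetic example in Wei et al.; the `extract_answer` helper and the "The answer is N" convention are assumptions made for illustration, and `example_completion` is a hand-written stand-in, not real model output.

```python
import re
from typing import Optional

# Demonstration and test question follow the example in Wei et al.
DEMO_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
DEMO_REASONING = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11."
)
DEMO_ANSWER = 11
TEST_QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?"
)


def standard_prompt() -> str:
    """One demonstration that shows only the final answer."""
    return f"Q: {DEMO_QUESTION}\nA: The answer is {DEMO_ANSWER}.\n\nQ: {TEST_QUESTION}\nA:"


def cot_prompt() -> str:
    """One demonstration that shows the reasoning steps before the answer."""
    return (
        f"Q: {DEMO_QUESTION}\nA: {DEMO_REASONING} The answer is {DEMO_ANSWER}.\n\n"
        f"Q: {TEST_QUESTION}\nA:"
    )


def extract_answer(completion: str) -> Optional[int]:
    """Return the last integer that follows 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is (-?\d+)", completion)
    return int(matches[-1]) if matches else None


# Hand-written stand-in for a model completion, used only to exercise the parser.
example_completion = (
    "The cafeteria had 23 apples originally. They used 20 to make lunch, so they had "
    "23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9."
)
print(cot_prompt())
print(extract_answer(example_completion))
```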

#### MORE

- [Kaplan et al - Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361v1)
- [Wei et al - Emergent Abilities of Large Language Models](https://arxiv.org/abs/2206.07682)
- [Wei et al - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers](https://arxiv.org/abs/2212.10559)
- [Microsoft: The power of prompting](https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/)
- [Yandex: Тетрадь с чит-промптами](https://ya.ru/project/cheat-prompts/index)
- [Anthropic: Prompt engineering](https://docs.anthropic.com/claude/docs/prompt-engineering)

# 3. PEFT

#### TAXONOMY

![](res/07_peft_taxonomy.png)

#### LoRA

![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfbd169-eb7e-41e1-a050-556ccd6fb679_1600x672.png)
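
The idea behind LoRA is to keep the pretrained weight matrix `W` frozen and learn a low-rank update `B @ A`, so the layer computes `W x + (alpha / r) * B (A x)`. The following dependency-free sketch illustrates only the forward pass and the parameter count; the class name `LoRALinear`, the helper functions, and the chosen sizes are hypothetical, and a real setup would use a tensor library and a PEFT implementation instead of Python lists.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, std: float = 0.02) -> Matrix:
    """Matrix with independent Gaussian entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                         # frozen, d_out x d_in
        self.A = random_matrix(r, d_in, rng)         # trainable, Gaussian init
        self.B = [[0.0] * r for _ in range(d_out)]   # trainable, zero init
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))   # rank-r path: d_in -> r -> d_out
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self) -> int:
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)

    def num_frozen(self) -> int:
        return sum(len(row) for row in self.weight)


rng = random.Random(0)
d_in, d_out, rank = 64, 64, 4
W = random_matrix(d_out, d_in, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
layer = LoRALinear(W, r=rank, alpha=8.0)
print(layer.num_trainable(), "trainable vs", layer.num_frozen(), "frozen")
```

Because `B` starts at zero, the adapted layer initially produces exactly the same output as the frozen layer. With these sizes the adapter trains 512 parameters next to 4096 frozen ones, and the gap widens as the layer grows.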

#### MORE

- [Lialin et al - Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.15647)
- [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora)
- [Practical tips](https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms)

# 4. Alignment

![](res/07_alignment.png)

#### RLHF

![](2022-2023/res_nlp/rlhf_overview.png)

More details:
- [Stiennon et al - Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325)

#### WITHOUT RL

- LIMA: proposes SFT on a small, carefully curated set of examples (1,000), with no RL stage
- DPO: proposes an optimization objective that trains directly on preference pairs, without an RL loop or a separate reward model; the authors report results on par with or better than PPO-based RLHF
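
To make the DPO idea concrete, here is a sketch of the loss for a single preference pair, following the objective in Rafailov et al.: the loss is `-log sigmoid(beta * margin)`, where the margin compares how much the policy has raised the log-probability of the chosen response relative to the reference model versus the rejected one. The function name, argument names, and numeric values are illustrative assumptions; in practice the log-probabilities are summed over response tokens and the loss is averaged over a batch.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit rewards: scaled log-ratio between the policy and the frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
loss_at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy prefers the chosen response more than the reference does: loss goes down.
loss_improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(loss_at_init, loss_improved)
```

No reward model or sampling loop appears anywhere: the only inputs are log-probabilities from the policy and from a frozen reference model.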

#### MORE
- [Zhou et al - LIMA: Less Is More for Alignment](https://arxiv.org/abs/2305.11206)
- [Rafailov et al - Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
- https://wandb.ai/ayush-thakur/RLHF/reports/Understanding-Reinforcement-Learning-from-Human-Feedback-RLHF-Part-1--VmlldzoyODk5MTIx
- [Illustrating Reinforcement Learning from Human Feedback (RLHF)](https://huggingface.co/blog/rlhf)
- https://github.com/opendilab/awesome-RLHF
- https://rail.eecs.berkeley.edu/deeprlcourse/
- [lecture](https://www.youtube.com/watch?v=f5JFXvX7FLE&list=PL6Wui14DvQPzMqtOOnfL00ZQ61JcN8V87&index=7) from GPT Week
- [+ seminar](https://www.youtube.com/watch?v=dcN0BFa0OAI&list=PL6Wui14DvQPzMqtOOnfL00ZQ61JcN8V87&index=8&pp=iAQB) on alignment
- [HF blog post on alignment](https://huggingface.co/blog/pref-tuning): DPO, IPO, KTO
- [recent HF blog post](https://huggingface.co/blog/constitutional_ai) on Constitutional AI, also about alignment
- [New methods for language model alignment, Boris Shaposhnikov, Tinkoff](https://www.youtube.com/watch?v=IwA1ZgM5RFA)
- [Anthropic paper that introduces the Helpful, Honest, Harmless agent terminology and describes the overall process of training a preference model](https://arxiv.org/abs/2112.00861)
- [C-RLFT, alignment for OpenChat](https://arxiv.org/pdf/2309.11235.pdf)

# 5. LLMs for Code

|**model**     | **team** |**year** | **context** | **terms**   | **reference**|
|--------------|----------|---------|-------------|-------------|--------------|
| GPT-4 | OpenAI | 2023 | 32K | [pricing](https://openai.com/pricing) | [gpt-4](https://openai.com/gpt-4) |
| codellama-7b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
| codellama-13b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) |
| codellama-34b-instruct-hf | Meta | 2023 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-34b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) |
| codellama-70b-instruct-hf | Meta | 2024 | 16K | [terms](https://github.com/facebookresearch/llama/blob/main/LICENSE) | [codellama/CodeLlama-70b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-70b-Instruct-hf) |
| deepseek-coder-33b-instruct | DeepSeek | 2024 | 16K | [terms](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL) | [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct) |
| codegemma-2b | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b) |
| codegemma-7b | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-7b](https://huggingface.co/google/codegemma-7b) |
| codegemma-7b-it | Google | 2024 | 8K | [terms](https://ai.google.dev/gemma/terms) | [google/codegemma-7b-it](https://huggingface.co/google/codegemma-7b-it) |

#### MORE

- https://github.com/huybery/Awesome-Code-LLM
- [Code Llama: Open Foundation Models for Code](https://arxiv.org/abs/2308.12950)
- [DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence](https://arxiv.org/abs/2401.14196)
- [CodeGemma: Open Code Models Based on Gemma](https://goo.gle/codegemma)


# 6. Exercise

Conduct a mini-study on the applicability of large language models to the task of code clone detection.

A possible plan:
1. choose any open model (codellama, codegemma, etc.)
2. choose any clone detection dataset, or a part of one, or write a small number of examples yourself
3. design a prompt
4. collect the model's responses
5. convert the responses into labels (by any means: classification, regular expressions, manually...)
6. compute metrics and draw conclusions
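
Steps 3, 5, and 6 of the plan can be sketched without any model at all. In the snippet below the prompt template, the yes/no answer format, and the sample responses are all hypothetical and written by hand for illustration; treating unparseable responses as "not a clone" is likewise an assumption that should be stated in the write-up.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical prompt template for step 3.
PROMPT_TEMPLATE = (
    "Do the two code snippets below implement the same functionality? "
    "Answer with a single word: yes or no.\n\n"
    "Snippet 1:\n{code_a}\n\nSnippet 2:\n{code_b}\n\nAnswer:"
)


def parse_label(response: str) -> Optional[int]:
    """Step 5: map a free-form response to 1 (clone), 0 (not a clone), or None."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Step 6: metrics for the positive (clone) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


prompt = PROMPT_TEMPLATE.format(code_a="return a + b", code_b="return sum([a, b])")

# Hand-written stand-ins for model responses (step 4) and their gold labels.
responses = ["Yes, both compute the sum.", "No.", "yes", "I am not sure.", "No, they differ."]
gold = [1, 0, 0, 1, 0]

parsed = [parse_label(r) for r in responses]
pred = [0 if label is None else label for label in parsed]  # unparsed counts as 0
metrics = precision_recall_f1(gold, pred)
print(parsed, metrics)
```

Reporting how many responses failed to parse, alongside the metrics, helps separate prompt-format problems from genuine model errors.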

# 7. References

- [Self-Play fIne-tuNing](https://arxiv.org/abs/2401.01335)
- [Comprehensive Language Model Fine Tuning](https://www.ntentional.com/nlp/datasets/tokenization/processing/2020/10/09/comprehensive-datasets.html)