<a target="_blank" href="https://colab.research.google.com/github/mcks2000/llm_notebooks/blob/main/notebooks/Code_Llama_2_7B.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Generating Code with Code Llama

In this quick tutorial, you will learn:
- How to run Code Llama in a free Colab environment
- How to generate code with Code Llama

*This notebook is based on the official Hugging Face blog post, [Code Llama: Llama 2 learns to code](https://huggingface.co/blog/codellama).*


Links to other resources:
- [Code Llama documentation](https://huggingface.co/docs/transformers/main/model_doc/code_llama)
- [Code Llama model on the Hugging Face Hub](https://huggingface.co/codellama/CodeLlama-7b-hf)
- [codellama-13b-chat](https://huggingface.co/spaces/codellama/codellama-13b-chat)
- [Code Llama in Hugging Chat](https://huggingface.co/chat/conversation/6583dedbf6ea513f3a1624b5)

*Note: make sure to run this notebook on a GPU-enabled runtime: `Runtime` -> `Change Runtime Type` -> `T4`.*

Let's get started!

### Install the Hugging Face libraries

In [None]:
!pip install transformers accelerate
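
Once the packages are installed, it can be worth confirming that the runtime really has a GPU attached, as the note above asks. The sketch below is one way to check; the helper name `describe_accelerator` is made up for this example, and the function only reports what PyTorch can see.

```python
import importlib.util


def describe_accelerator() -> str:
    """Return a short description of the GPU that PyTorch can see, if any."""
    # Avoid an ImportError when torch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; switch the runtime type to T4"


print(describe_accelerator())
```

If the message reports that no GPU was detected, change the runtime type before loading the model.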

### Set the model ID and load the tokenizer

In [None]:
from transformers import AutoTokenizer
import transformers
import torch


model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)


### Prepare the pipeline

In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
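
The `torch_dtype=torch.float16` argument matters on a T4, which has 16 GB of memory. A rough back-of-the-envelope estimate of the weight footprint is sketched below. The parameter count of about 6.7 billion is an approximation assumed for this example, and the estimate ignores activations and the generation cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 6.7 billion parameters for the 7B checkpoint.
NUM_PARAMS = 6.7e9

fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16 uses 2 bytes per parameter

print(f"float32: ~{fp32_gb:.1f} GB, float16: ~{fp16_gb:.1f} GB")
```

Under these assumptions the half-precision weights fit within the T4's memory, while full precision would not.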

### Generate code

In [None]:
def generate_code(prompt):
    sequences = pipeline(
        prompt,
        do_sample=True,
        temperature=0.1,
        top_p=0.9,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=128,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")
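
To give a feel for what `do_sample=True`, `temperature=0.1`, and `top_p=0.9` do, here is a toy illustration on a made-up distribution over four candidate tokens. It is a simplified sketch of temperature scaling and nucleus (top-p) filtering, not the library's actual implementation, and the logits are invented for the example.

```python
import math


def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for temperature in (1.0, 0.1):
    candidates = top_p_filter(apply_temperature(logits, temperature), top_p=0.9)
    print(f"temperature={temperature}: {len(candidates)} candidate token(s) remain")
```

A low temperature such as 0.1 concentrates almost all probability on the top token, so sampling behaves nearly greedily, which is usually what you want for code.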

### Test several prompts

In [None]:
generate_code("def fibonacci(")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)


def fibonacci_recursive(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)




In [None]:
generate_code("def factorial(")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


def main():
    print(factorial(5))
    print(fibonacci(5))


if


In [None]:
generate_code("def remove_last_word(")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def remove_last_word(self, text):
        """
        Remove the last word from the text.

        :param text: The text to remove the last word from.
        :return: The text without the last word.
        """
        return text.rsplit(' ', 1)[0]

    def remove_last_word_from_list(self, text_list):
        """
        Remove the last word from each text in the list.

        :param text_list: The list of texts to remove the last word from.
        :


In [None]:
generate_code("def remove_non_ascii(s: str) -> str:")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def remove_non_ascii(s: str) -> str:
    """
    Remove non-ASCII characters from a string.

    :param s: The string to remove non-ASCII characters from.
    :return: The string with non-ASCII characters removed.
    """
    return "".join(i for i in s if ord(i) < 128)


def get_file_extension(file_path: str) -> str:
    """
    Get the file extension of a file.

    :param file_
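

The outputs above keep going after the requested function and stop mid-line, because `max_length=128` caps the total number of tokens (prompt plus completion) rather than stopping at a function boundary. One possible post-processing step is sketched below: it keeps only the first top-level function. The helper `keep_first_function` is hypothetical and only recognizes unindented `def` lines, so it would not trim the indented method in the `remove_last_word` output.

```python
def keep_first_function(generated: str) -> str:
    """Return the text up to, but not including, the second top-level `def`."""
    kept = []
    seen_def = False
    for line in generated.splitlines():
        if line.startswith("def "):
            if seen_def:
                break  # a second top-level function starts here
            seen_def = True
        kept.append(line)
    # Drop the blank lines left between the two functions.
    return "\n".join(kept).rstrip()


sample = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        return n * factorial(n - 1)\n"
    "\n"
    "\n"
    "def fibonacci(n):\n"
    "    if n == 0:\n"
    "        return 0\n"
)
print(keep_first_function(sample))
```

Passing `max_new_tokens` instead of `max_length` is another option when you want the budget to apply to the completion only.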


### Code infilling

For future work: the cells below try out Code Llama's infilling mode.

In [None]:
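from transformers import pipeline as hf_pipeline  # alias so the `pipeline` object created earlier is not overwritten
import torch

# Note: this loads a second copy of the 7B weights, which may not fit on a T4 alongside the first.
generator = hf_pipeline(
    "text-generation", model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16, device_map="auto",
)
# generator('def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result', max_new_tokens = 128, return_type = 1)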

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "codellama/CodeLlama-7b-hf"
tokenizer2 = AutoTokenizer.from_pretrained(model_id)
model2 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16
).to("cuda")




In [None]:
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

input_ids = tokenizer2(prompt, return_tensors="pt")["input_ids"].to("cuda")
output = model2.generate(
    input_ids,
    max_new_tokens=200,
)
output = output[0].to("cpu")

# Decode only the newly generated tokens, using the same tokenizer that encoded the prompt.
filling = tokenizer2.decode(output[input_ids.shape[1]:], skip_special_tokens=True)


In [None]:
print(prompt.replace("<FILL_ME>", filling))
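
Conceptually, the tokenizer splits the prompt at `<FILL_ME>` into a prefix and a suffix, and the model generates the text that belongs between them. The plain-Python sketch below mirrors only that split-and-reassemble step; the helper names and the example filling are made up for illustration, and no model is called.

```python
FILL_TOKEN = "<FILL_ME>"


def split_infill_prompt(prompt: str) -> tuple[str, str]:
    """Split a prompt into the text before and after the fill marker."""
    prefix, marker, suffix = prompt.partition(FILL_TOKEN)
    if not marker:
        raise ValueError(f"prompt does not contain {FILL_TOKEN}")
    return prefix, suffix


def insert_filling(prompt: str, filling: str) -> str:
    """Place the generated filling between the prefix and the suffix."""
    prefix, suffix = split_infill_prompt(prompt)
    return prefix + filling + suffix


example_prompt = (
    "def remove_non_ascii(s: str) -> str:\n"
    '    """ <FILL_ME>\n'
    "    return result\n"
)
# Hypothetical filling, standing in for what the model might generate.
example_filling = (
    "Remove non-ASCII characters.\n"
    '    """\n'
    '    result = "".join(c for c in s if ord(c) < 128)'
)
print(insert_filling(example_prompt, example_filling))
```

This is equivalent to the `prompt.replace("<FILL_ME>", filling)` call above, with an explicit check that the marker is present.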