<a href="https://colab.research.google.com/github/ftnext/practice-dl-nlp/blob/master/llmjp/fine_tuning/Beluuuuuuga_japanese_instruction_linux_command.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

ref: https://github.com/Beluuuuuuga/nikkei-linux-2023-9-llm/blob/11dfe7d2f3ae53c5c0e56498ce72d1a16b3e0839/notebook/japanese_instruction_linux_command.ipynb

## Environment setup

In [1]:
! pip install transformers[sentencepiece] accelerate datasets peft bitsandbytes -qq


## Using the LLM as-is

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [3]:
model_name = "rinna/japanese-gpt-neox-3.6b"

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

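# "指示" = instruction, "回答" = answer.
# Instruction (Japanese): "Please tell me the Linux command to exit the shell."
text = """### 指示: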
Linuxでシェルを終了するコマンドを教えてください。

### 回答:
"""

model0 = AutoModelForCausalLM.from_pretrained(model_name)
if torch.cuda.is_available():
    model0 = model0.to("cuda")


You are using the legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565



In [5]:
token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    output_ids = model0.generate(
        token_ids.to(model0.device),
        max_new_tokens=30,
        min_new_tokens=30,
        do_sample=True,
        temperature=0.1,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)

### 指示: Linuxでシェルを終了するコマンドを教えてください。  ### 回答: Linuxでシェルを終了するコマンドは、 shutdownコマンドです。  ### 回答: Linuxでシェルを終了するコマンドは


## Fine-tuning

### Loading the fine-tuning dataset

In [6]:
from datasets import load_dataset

In [7]:
VAL_SET_SIZE = 34

dataset = load_dataset("Beluuuuuuga/Japanese-Instruction-Linux-Command-169")
splitted = dataset["train"].train_test_split(
    test_size=VAL_SET_SIZE, shuffle=True, seed=42
)
train_data = splitted["train"]
val_data = splitted["test"]


In [8]:
CUTOFF_LEN = 256


def tokenize(prompt, tokenizer):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN,
        padding=False
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"]
    }

In [9]:
def generate_prompt(data_point):
    prompt = f"""### 指示:
{data_point["instruction"]}

### 回答:
{data_point["output"]}"""
    return prompt  # keep the newlines as-is so the format stays consistent
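The training prompt and the inference prompt need to share the same prefix, so that at generation time the model continues exactly where the answer would start. The following is a minimal sketch of that relationship, assuming a single hypothetical helper `build_prompt` (not part of the notebook) and a made-up record that is not taken from the dataset.

```python
def build_prompt(instruction: str, output: str = "") -> str:
    """Build the "### 指示: / ### 回答:" template used in this notebook.

    With `output` filled in, this is the training text; with `output`
    left empty, it is the prefix fed to the model at inference time.
    """
    return f"### 指示:\n{instruction}\n\n### 回答:\n{output}"


# Hypothetical record for illustration only (not an actual dataset row).
example = {"instruction": "lsコマンドは何のために使用されますか？", "output": "ls"}

train_text = build_prompt(example["instruction"], example["output"])
infer_text = build_prompt(example["instruction"])

# The inference prompt is a strict prefix of the training text.
print(train_text.startswith(infer_text))
```

Keeping one template function for both cases avoids subtle mismatches (for example a missing trailing newline) between training and generation.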

In [10]:
# Fix the shuffle seed for reproducibility
train_data = train_data.shuffle(seed=42).map(lambda x: tokenize(generate_prompt(x), tokenizer))
val_data = val_data.shuffle(seed=42).map(lambda x: tokenize(generate_prompt(x), tokenizer))


In [11]:
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_8bit=True, device_map="auto"
)

### LoRA

In [12]:
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

In [13]:
lora_config = LoraConfig(
    r=10,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = prepare_model_for_int8_training(model)  # renamed prepare_model_for_kbit_training in newer peft releases
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()



trainable params: 4,055,040 || all params: 3,611,300,352 || trainable%: 0.1122875309375541
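The trainable parameter count reported above can be reproduced by hand. This is a small sketch, assuming LoRA adds two matrices of shape `(r, in_features)` and `(out_features, r)` to each targeted `query_key_value` layer, with the layer shape (2816 in, 8448 out) and layer count (36) read from the model printout later in this notebook.

```python
# Values taken from this notebook's LoraConfig and model printout.
r = 10
in_features, out_features = 2816, 8448  # query_key_value Linear shape
num_layers = 36                         # GPTNeoXLayer count
all_params = 3_611_300_352              # "all params" reported by peft

# LoRA adds A (r x in_features) and B (out_features x r) per targeted layer.
# bias="none" means no bias terms are trained.
per_layer = r * in_features + out_features * r
trainable = per_layer * num_layers

print(trainable)                     # 4055040
print(100 * trainable / all_params)  # percentage of trainable parameters
```

The result matches the `trainable params: 4,055,040` line, which confirms that only the low-rank adapters on the attention projections are being updated.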


In [14]:
from transformers import DataCollatorForLanguageModeling, TrainingArguments, Trainer

In [15]:
arguments = TrainingArguments(
    num_train_epochs=4,
    learning_rate=1e-4,
    logging_steps=10,
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=10,
    save_steps=10,
    output_dir="lora-japanese-gpt-neox-3.6b-result",
    report_to="none",
    save_total_limit=6,
    push_to_hub=False,
    auto_find_batch_size=True,
)

trainer = Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=arguments,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

In [16]:
model.config.use_cache = False
trainer.train()
model.config.use_cache = True

peft_name = "lora-japanese-gpt-neox-3.6b"
trainer.model.save_pretrained(peft_name)



| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 10 | 2.9325 | 2.385284 |
| 20 | 1.9688 | 1.559895 |
| 30 | 1.313 | 1.009713 |
| 40 | 0.9417 | 0.833838 |
| 50 | 0.787 | 0.778101 |
| 60 | 0.7717 | 0.762381 |



## Using the fine-tuned model

In [17]:
from peft import PeftModel, PeftConfig

In [18]:
model2 = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
lora_model = PeftModel.from_pretrained(model2, peft_name, device_map="auto")
lora_model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPTNeoXForCausalLM(
      (gpt_neox): GPTNeoXModel(
        (embed_in): Embedding(32000, 2816)
        (emb_dropout): Dropout(p=0.0, inplace=False)
        (layers): ModuleList(
          (0-35): 36 x GPTNeoXLayer(
            (input_layernorm): LayerNorm((2816,), eps=1e-05, elementwise_affine=True)
            (post_attention_layernorm): LayerNorm((2816,), eps=1e-05, elementwise_affine=True)
            (post_attention_dropout): Dropout(p=0.0, inplace=False)
            (post_mlp_dropout): Dropout(p=0.0, inplace=False)
            (attention): GPTNeoXAttention(
              (rotary_emb): GPTNeoXRotaryEmbedding()
              (query_key_value): Linear8bitLt(
                in_features=2816, out_features=8448, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_feature

In [19]:
def generate_prediction_prompt(instruction: str) -> str:
    prompt = f"""### 指示:
{instruction}

### 回答:
"""
    return prompt

In [22]:
def generate(instruction):
    prompt = generate_prediction_prompt(instruction)

    input_ids = tokenizer(
        prompt, return_tensors="pt", truncation=True, add_special_tokens=False
    ).input_ids.cuda()

    outputs = lora_model.generate(
        input_ids=input_ids,
        max_new_tokens=30,
        # min_new_tokens=30,  # min was not specified here, so try matching the earlier run
        do_sample=True,
        temperature=0.1,  # match the temperature too
        top_p=0.75,  # not matched with the earlier run; what does this do? (homework)
        top_k=40,
        no_repeat_ngram_size=2,
    )

    outputs = outputs[0].tolist()
    return tokenizer.decode(outputs)
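The `top_k=40` and `top_p=0.75` arguments passed to `generate` above restrict which tokens can be sampled at each step. The following is a simplified illustration on a toy distribution, not the library's implementation: it keeps the `top_k` most likely tokens, then keeps the smallest leading group whose renormalized cumulative probability reaches `top_p`. The function name and the toy probabilities are hypothetical.

```python
def top_k_top_p_filter(probs: dict, top_k: int, top_p: float) -> dict:
    """Return renormalized probabilities after top-k, then top-p filtering."""
    # Step 1: keep only the top_k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    mass_k = sum(p for _, p in ranked)

    # Step 2: walk down the ranking until the renormalized cumulative
    # probability reaches top_p (the "nucleus").
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / mass_k
        if cumulative >= top_p:
            break

    # Step 3: renormalize the survivors so they sum to 1.
    mass = sum(p for _, p in kept)
    return {token: p / mass for token, p in kept}


# Toy next-token distribution (made up for illustration).
toy = {"exit": 0.5, "logout": 0.3, "shutdown": 0.15, "ls": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.75)
print(filtered)
```

With a low temperature such as 0.1 the distribution is already sharply peaked before these filters apply, so sampling behaves close to greedy decoding and the filters rarely change the outcome.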

In [23]:
generate("Linuxでシェルを終了するコマンドを教えてください")  # "Please tell me the Linux command to exit the shell"

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:3 for open-end generation.


'### 指示: Linuxでシェルを終了するコマンドを教えてください  ### 回答: shutdown</s>'
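The decoded string contains the prompt, the answer, and the `</s>` end-of-sequence marker. If only the answer is needed, it can be cut out with plain string handling. This is a minimal sketch, assuming a hypothetical helper `extract_answer` (not part of the notebook) and the `### 回答:` marker used in the prompt template.

```python
def extract_answer(decoded: str, marker: str = "### 回答:", eos: str = "</s>") -> str:
    """Return only the first answer section from a decoded generation."""
    _, found, tail = decoded.partition(marker)
    if not found:
        # No answer marker present: return the text unchanged (trimmed).
        return decoded.strip()
    tail = tail.split(eos, 1)[0]    # drop the end-of-sequence marker
    tail = tail.split("###", 1)[0]  # drop any repeated section that follows
    return tail.strip()


decoded = "### 指示: Linuxでシェルを終了するコマンドを教えてください  ### 回答: shutdown</s>"
print(extract_answer(decoded))  # shutdown
```

The same helper also trims the repeated `### 回答:` section that the base model produced earlier when `min_new_tokens` forced it to keep generating.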

In [24]:
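print(generate("Linuxコマンドを教えてください"))  # "Please tell me a Linux command"
print(generate("lsコマンドは何のために使用されますか？"))  # "What is the ls command used for?"
print(generate("wgetコマンドの主な用途と使用方法について説明していただけますか？"))  # "Could you explain the main uses of wget and how to use it?"
print(generate("catとmore、lessコマンドの違いは何ですか？"))  # "What is the difference between cat, more, and less?"
print(generate("pwdコマンドの機能とは何ですか？？"))  # "What does the pwd command do?"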



### 指示: Linuxコマンドを教えてください  ### 回答: ls</s>




### 指示: lsコマンドは何のために使用されますか?  ### 回答: ファイルシステムの情報を表示する</s>




### 指示: wgetコマンドの主な用途と使用方法について説明していただけますか?  ### 回答: Linuxでファイルをダウンロードするコマンドです。  </s>




### 指示: catとmore、lessコマンドの違いは何ですか?  ### 回答: more コマンドは、ファイルの内容を表示します。 less はファイルを1行ずつ表示し、コマンドを入力するのに使用
### 指示: pwdコマンドの機能とは何ですか??  ### 回答: コマンドの出力を表示する</s>
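Each call above printed a warning that the attention mask and pad token id were not set. One way this could be avoided is to pass both to `generate` explicitly. The sketch below assumes a hypothetical helper `generate_with_mask` (not part of the notebook), that the tokenizer returns both `input_ids` and `attention_mask`, and that the model exposes a `device` attribute, as `model0.device` does earlier in this notebook.

```python
def generate_with_mask(model, tokenizer, prompt, **gen_kwargs):
    """Generate text while passing attention_mask and pad_token_id explicitly."""
    encoded = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    input_ids = encoded["input_ids"].to(model.device)
    attention_mask = encoded["attention_mask"].to(model.device)

    # Fall back to the EOS id when the tokenizer defines no pad token,
    # which is what the library does implicitly when it emits the warning.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        pad_token_id=pad_id,
        **gen_kwargs,
    )
    return tokenizer.decode(outputs[0].tolist())
```

A call might look like `generate_with_mask(lora_model, tokenizer, generate_prediction_prompt("..."), max_new_tokens=30, do_sample=True, temperature=0.1)`. For a single unpadded prompt the generated text should not change; the explicit arguments mainly silence the warning and make batched, padded inputs safe.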
