# KoAlpaca

- github: [Beomi/KoAlpaca](https://github.com/Beomi/KoAlpaca)
- huggingface: [beomi/KoAlpaca](https://huggingface.co/beomi/KoAlpaca)

## Download Model

```py
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("beomi/KoAlpaca")
>>> model = AutoModelForCausalLM.from_pretrained("beomi/KoAlpaca")
Downloading (…)l-00001-of-00003.bin: 100%|████████████████| 9.88G/9.88G [02:49<00:00, 58.2MB/s]
Downloading (…)l-00002-of-00003.bin: 100%|████████████████| 9.89G/9.89G [02:55<00:00, 56.2MB/s]
Downloading (…)l-00003-of-00003.bin: 100%|████████████████| 7.18G/7.18G [03:01<00:00, 39.6MB/s]
Downloading shards: 100%|█████████████████████████████████| 3/3 [08:47<00:00, 175.99s/it]
Loading checkpoint shards: 100%|██████████████████████████| 3/3 [00:09<00:00,  3.18s/it]
Downloading (…)neration_config.json: 100%|████████████████| 137/137 [00:00<00:00, 150kB/s]
```


## CUDA

```bash
nvidia-smi
```

```bash
Fri Apr 14 23:21:36 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2B:00.0  On |                  N/A |
|  0%   50C    P3    56W / 270W |   1243MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1640      G   /usr/lib/xorg/Xorg                701MiB |
|    0   N/A  N/A      1849      G   /usr/bin/gnome-shell              141MiB |
|    0   N/A  N/A      7145      G   ...tExperimentalSharedMemory      323MiB |
|    0   N/A  N/A      7962      G   ...local/kitty.app/bin/kitty        9MiB |
+-----------------------------------------------------------------------------+
```

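```py
import gc

import torch

# Release unreferenced Python objects, then free PyTorch's cached GPU memory.
gc.collect()
torch.cuda.empty_cache()

# torch.cuda.memory_summary(device=None, abbreviated=False)

# Use the first GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```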

## Model

```py
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("beomi/KoAlpaca")

# The weights stay on the CPU: `.to(device)` is left commented out.
model = LlamaForCausalLM.from_pretrained("beomi/KoAlpaca")  # .to(device)
```

```text
Loading checkpoint shards: 100%|██████████████████████████| 3/3 [00:09<00:00,  3.16s/it]
```


```py
model
```

```text
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32001, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNo
```
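
The layer shapes in the module tree above are enough for a rough estimate of how much memory the weights need, which may explain why `.to(device)` is left commented out on a GPU that reports 8192 MiB. The sketch below is only an estimate under two assumptions that the printout does not confirm: each `LlamaRMSNorm` holds a single weight vector of length 4096, and the truncated part of the printout contains an untied output projection with the same shape as `embed_tokens`. The function names are hypothetical.

```python
# Layer sizes read off the printed module tree.
VOCAB_SIZE = 32001
HIDDEN_SIZE = 4096
INTERMEDIATE_SIZE = 11008
NUM_LAYERS = 32


def count_parameters(include_lm_head=True):
    """Count the weights of the embeddings, bias-free linear layers, and norms."""
    attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE    # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE    # gate_proj, down_proj, up_proj
    norms = 2 * HIDDEN_SIZE                      # assumption: one weight vector per RMSNorm
    per_layer = attention + mlp + norms

    total = VOCAB_SIZE * HIDDEN_SIZE             # embed_tokens
    total += NUM_LAYERS * per_layer
    total += HIDDEN_SIZE                         # final norm
    if include_lm_head:
        # Assumption: an output projection shaped like embed_tokens (not visible
        # in the truncated printout).
        total += VOCAB_SIZE * HIDDEN_SIZE
    return total


def weight_bytes(num_parameters, bytes_per_parameter):
    """Memory needed for the weights alone, ignoring activations and caches."""
    return num_parameters * bytes_per_parameter


num_parameters = count_parameters()
gpu_bytes = 8192 * 1024**2  # total memory reported by nvidia-smi

for dtype_name, size in (("float32", 4), ("float16", 2)):
    needed = weight_bytes(num_parameters, size)
    print(f"{dtype_name}: {needed / 1e9:.2f} GB, fits on GPU: {needed <= gpu_bytes}")
# float32: 26.95 GB, fits on GPU: False
# float16: 13.48 GB, fits on GPU: False
```

The float32 figure lines up with the three downloaded shards (9.88 + 9.89 + 7.18 ≈ 26.95 GB), which suggests the checkpoint is stored in 32-bit precision.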

```py
model.generate(**tokenizer('안녕하세요?', return_tensors='pt'))
```

```text
tensor([[    2, 29871, 31734,   238,   136,   152, 30944, 31578, 31527, 29973,
         29871,   239,   163,   131, 31081, 29871,   239,   184,   159,   237]])
```
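
The small ids in this tensor (238, 136, 152 and so on) look like byte-fallback tokens, where a character missing from the vocabulary is emitted as its raw UTF-8 bytes. The sketch below decodes those runs by hand. It assumes the LLaMA SentencePiece layout in which ids 3 through 258 stand for the bytes `0x00` through `0xFF`; that offset is not stated in the source, and `decode_byte_tokens` is a hypothetical helper rather than part of `transformers`.

```python
BYTE_TOKEN_OFFSET = 3  # assumption: id 3 is <0x00>, id 258 is <0xFF>


def decode_byte_tokens(token_ids):
    """Turn a run of byte-fallback token ids into the text they spell."""
    raw = bytes(token_id - BYTE_TOKEN_OFFSET for token_id in token_ids)
    return raw.decode("utf-8")


# Three runs of byte tokens taken from the tensor printed above.
print(decode_byte_tokens([238, 136, 152]))  # 녕
print(decode_byte_tokens([239, 163, 131]))  # 저
print(decode_byte_tokens([239, 184, 159]))  # 최
```

Under that assumption the first run is the second syllable of the input 안녕하세요?, and the trailing id 237 is the first byte of a character that was never completed. The tensor holds exactly 20 ids, which is consistent with `generate` falling back to a default maximum length of 20 when no length argument is passed.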

```py
PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context.\n"
        "아래는 작업을 설명하는 명령어와 추가적 맥락을 제공하는 입력이 짝을 이루는 예제입니다.\n\n"
        "Write a response that appropriately completes the request.\n요청을 적절히 완료하는 응답을 작성하세요.\n\n"
        "### Instruction(명령어):\n{instruction}\n\n### Input(입력):\n{input}\n\n### Response(응답):"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task.\n"
        "아래는 작업을 설명하는 명령어입니다.\n\n"
        "Write a response that appropriately completes the request.\n명령어에 따른 요청을 적절히 완료하는 응답을 작성하세요.\n\n"
        "### Instruction(명령어):\n{instruction}\n\n### Response(응답):"
    ),
}
```

```py
def gen(prompt, user_input=None, max_new_tokens=128, temperature=0.5):
    """Generate a response to an instruction, with optional extra input."""
    # Pick the template that matches whether extra context was supplied.
    if user_input:
        x = PROMPT_DICT["prompt_input"].format(instruction=prompt, input=user_input)
    else:
        x = PROMPT_DICT["prompt_no_input"].format(instruction=prompt)

    input_ids = tokenizer.encode(x, return_tensors="pt")  # .to('cuda:0')
    gen_tokens = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        num_return_sequences=1,
        temperature=temperature,
        no_repeat_ngram_size=6,
        do_sample=True,
    )
    gen_text = tokenizer.decode(gen_tokens[0], skip_special_tokens=True)

    # Drop the echoed prompt so only the model's answer is returned.
    return gen_text.replace(x, "")
```
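
`gen` removes the prompt with a string `replace`, which only works when the decoded text reproduces the prompt character for character. One alternative is to cut the prompt off at the token level before decoding, since a causal language model's `generate` returns the prompt ids followed by the new ids. The sketch below shows the idea with plain lists standing in for tensors; `split_new_tokens` and the ids are hypothetical.

```python
def split_new_tokens(generated_ids, prompt_length):
    """Return only the ids produced after the prompt."""
    return generated_ids[prompt_length:]


# Hypothetical ids: the generated sequence starts with the prompt itself.
prompt_ids = [2, 101, 102, 103]
generated_ids = prompt_ids + [201, 202]

print(split_new_tokens(generated_ids, len(prompt_ids)))  # [201, 202]
```

With the tensors used in `gen`, the equivalent slice would be `gen_tokens[0][input_ids.shape[-1]:]`, passed to `tokenizer.decode` in place of `gen_tokens[0]`.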

```py
# Example usage. The prompt means "Code to find uptime in Python".
prompt = "Python으로 uptime을 찾는 코드"
generated_text = gen(prompt)
print(generated_text)
```

Model output, reproduced verbatim:

```py
import time
print(time.uptime())
```
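
The generated snippet calls `time.uptime()`, which does not exist in Python's standard `time` module, so it fails with an `AttributeError`. For comparison, here is a minimal working sketch for Linux, which is what the `nvidia-smi` output above suggests this machine runs. It assumes `/proc/uptime` is readable; the function names are hypothetical.

```python
def parse_proc_uptime(text):
    """Return the uptime in seconds from the contents of /proc/uptime."""
    # The file holds two numbers: seconds since boot and summed idle seconds.
    return float(text.split()[0])


def read_uptime_seconds(path="/proc/uptime"):
    """Read the system uptime in seconds (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return parse_proc_uptime(handle.read())


# Parsing a sample line, so the example also runs on systems without /proc.
print(parse_proc_uptime("35001.25 120004.50\n"))  # 35001.25
```

On a Linux host, `read_uptime_seconds()` returns the live value.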


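```py
# Example usage. The prompt means "Tell me my fortune for today...".
prompt = "내 오늘의 운세를 알려줘..."
# The input means "Today's date is Friday, April 14, 2023".
user_input = "오늘 날짜는 2023년 4월 14일 금요일"
generated_text = gen(prompt, user_input, max_new_tokens=300, temperature=0.8)
print(generated_text)
```

Model output, reproduced verbatim:

```text
"내 오늝의 운생은 어떤 일이 일어날지 말 알 수 없습니다. 예를 들어, 오늘은 휴가일 또는 새 취미 생활 계획을 세워보는 것입니다."
```

Approximate English rendering (the Korean output is itself partly garbled): "As for my fortune today, there is no way to know what will happen. For example, today is a day off or a day to make plans for a new hobby."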
