### How to Download an LLM
1. Meta website
2. Hugging Face CLI
3. transformers
4. snapshot download
5. Ollama -> see the `test_llama3_ollama.ipynb` file

### 2. Hugging Face CLI Downloads

We also provide downloads on [Hugging Face](https://huggingface.co/meta-llama), in both transformers and native `llama3` formats. To download the weights from Hugging Face, please follow these steps:
  
- Visit one of the repos, for example [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
- Read and accept the license. Once your request is approved, you'll be granted access to all the Llama 3 models. Note that requests used to take up to one hour to get processed.
- To download the original native weights to use with this repo, click on the "Files and versions" tab and download the contents of the `original` folder. You can also download them from the command line if you install `huggingface-hub` and log in first:
  ```
  pip install huggingface-hub
  huggingface-cli login
  huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
  ```
- After downloading the original Llama 3 checkpoint as above, you can run inference with the simple pipeline approach.
  - The [pipeline](https://huggingface.co/docs/transformers/en/main_classes/pipelines) snippet will download and cache the weights.
- The pipeline is intuitive and very easy to use, but it leaves much less room for customization. If you want to experiment, load the model with `AutoModelForCausalLM.from_pretrained()` instead.
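The `huggingface-cli download` command above can also be driven from a script. The following is a minimal sketch, assuming the `huggingface-cli` executable is on the `PATH` and that you have already logged in. `build_download_command` and `run_download` are hypothetical helper names, not part of any library; they only assemble the same command and hand it to `subprocess`.

```python
import subprocess


def build_download_command(repo_id, include, local_dir):
    """Assemble the same `huggingface-cli download` call shown above."""
    return [
        "huggingface-cli", "download", repo_id,
        "--include", include,
        "--local-dir", local_dir,
    ]


def run_download(repo_id, include="original/*", local_dir=None):
    """Run the download; raises CalledProcessError if the CLI exits with an error."""
    cmd = build_download_command(repo_id, include, local_dir or repo_id)
    subprocess.run(cmd, check=True)


# Hypothetical usage (needs network access and an accepted license):
# run_download("meta-llama/Meta-Llama-3-8B-Instruct")
print(build_download_command(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "original/*",
    "meta-llama/Meta-Llama-3-8B-Instruct",
))
```

Passing the arguments as a list rather than a single shell string keeps the `original/*` pattern from being expanded by the shell before the CLI sees it.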

Pipeline

In [None]:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Placeholder chat prompt; replace with your own messages.
messages = [
    {"role": "user", "content": "Water, please"},
]

# Stop on either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)


In [None]:
print(outputs[0]["generated_text"][-1])
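When the pipeline receives a list of chat messages, recent versions of transformers return the whole conversation in `generated_text` as a list of role/content dictionaries, so `[-1]` above is the assistant's message dictionary rather than a plain string. Below is a minimal sketch of pulling out only the reply text, using a hand-written stand-in for `outputs`. The sample reply and the helper name `last_reply_text` are illustrative assumptions.

```python
def last_reply_text(outputs):
    """Return the text of the final message in the first generated conversation."""
    conversation = outputs[0]["generated_text"]
    last_message = conversation[-1]
    return last_message["content"]


# Stand-in with the same shape as the pipeline result for chat input.
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Water, please"},
            {"role": "assistant", "content": "Moving to the kitchen."},
        ]
    }
]

print(last_reply_text(sample_outputs))
```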

### 3. Downloads Using the Transformers Library

- `pip install accelerate` (needed for `device_map="auto"`)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Save the model
save_dir = "./llama_3_8b_instruct"
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)
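Loading in `torch.bfloat16` matters for memory, because each parameter then takes 2 bytes instead of the 4 bytes of `float32`. Here is a rough back-of-the-envelope sketch, assuming about 8 billion parameters (a round figure suggested by the model name, not an exact count) and counting the weights only:

```python
def weight_size_gb(n_params, bytes_per_param):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 8e9  # assumed round figure for an "8B" model

bf16_gb = weight_size_gb(N_PARAMS, 2)  # bfloat16: 2 bytes per parameter
fp32_gb = weight_size_gb(N_PARAMS, 4)  # float32: 4 bytes per parameter

print(f"bfloat16: ~{bf16_gb:.0f} GB, float32: ~{fp32_gb:.0f} GB")
```

Activations and the generation cache come on top of this, so treat the numbers as a lower bound when checking whether the model fits on a given device.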

In [None]:
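from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Reload from the local directory saved in the previous cell (`save_dir`).
tokenizer = AutoTokenizer.from_pretrained(save_dir)
model = AutoModelForCausalLM.from_pretrained(
    save_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)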


robot_command = """The robot is equipped with wheels and a manipulator arm.
When given the command "Water, please" to the robot:

1. What locations does the robot need to move to in order to execute the command? Please provide at least three common locations where that is typically found.
2. What actions does the robot need to perform?""" 

messages = [
    {"role": "user", "content": robot_command}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
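`apply_chat_template` turns `messages` into the Llama 3 prompt format, in which every turn ends with `<|eot_id|>`. That is why `<|eot_id|>` is passed as an extra terminator alongside the regular EOS token. The sketch below is a hand-written approximation of that format, for illustration only. The function name `render_llama3_prompt` is hypothetical, and the template shipped with the tokenizer remains the authoritative version.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 chat layout as plain text."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = render_llama3_prompt([{"role": "user", "content": "Water, please"}])
print(example)
```

The model closes its own turn with `<|eot_id|>` as well, so generation stops there, and the slice `outputs[0][input_ids.shape[-1]:]` drops the prompt tokens so that only the new reply is decoded.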

### 4. Snapshot Download
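- The `snapshot_download` function downloads a snapshot of a specific model or dataset from the Hugging Face Hub into a local directory. It is especially useful for fetching an entire repository, including model checkpoints, configuration files, and tokenizer files.
- This method is used here to download the model for use with llama.cpp.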

In [1]:
from huggingface_hub import snapshot_download

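# `local_dir_use_symlinks` is deprecated and ignored by recent huggingface_hub
# releases, so it is omitted here.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    local_dir="./models/llama_3_8b_instruct",
    ignore_patterns=["original/*"],  # skip the native (non-transformers) weights
)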


'/home/jetson/llamaR/Llama3-Playground/models/llama_3_8b_instruct'
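
The `ignore_patterns` argument takes glob-style patterns, so `"original/*"` skips the native checkpoint folder and keeps only the transformers-format files. Below is a minimal sketch of that filtering idea with the standard library's `fnmatch`. The file list is a made-up example rather than the actual repository listing, and `keep_files` is a hypothetical helper, not the function `huggingface_hub` uses internally.

```python
from fnmatch import fnmatch


def keep_files(paths, ignore_patterns):
    """Drop every path that matches at least one ignore pattern."""
    return [
        path for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


# Illustrative file names only.
repo_files = [
    "config.json",
    "tokenizer.json",
    "model-00001-of-00004.safetensors",
    "original/params.json",
    "original/consolidated.00.pth",
]

kept = keep_files(repo_files, ["original/*"])
print(kept)
```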