## Loading the model and tokenizer


Reading the official documentation is strongly recommended: [HuggingFace AutoClass documentation](https://huggingface.co/docs/transformers/model_doc/auto#auto-classes)

### The tokenizer is usually loaded with `AutoTokenizer.from_pretrained`

![Screenshot](../static/屏幕截图%202024-03-03%20161245.png)

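### The model is usually loaded with `AutoModel` or `AutoModelForCausalLM`, using `from_pretrained` or `from_config`

`from_pretrained` loads trained weights, whereas `from_config` only builds the architecture and leaves the weights randomly initialized.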

![Screenshot](../static/屏幕截图%202024-03-03%20161437.png)
![Screenshot](../static/屏幕截图%202024-03-03%20161411.png)
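
To make the difference between the two loaders concrete, here is a minimal sketch that builds a tiny OPT-style model with `from_config`, without reading any checkpoint. The configuration values are toy sizes invented for this illustration (an assumption, not the real `opt-125m` settings).

```python
from transformers import AutoModelForCausalLM, OPTConfig

# Toy sizes chosen only for illustration (hypothetical, not opt-125m's values).
config = OPTConfig(
    vocab_size=100,
    hidden_size=16,
    num_hidden_layers=1,
    ffn_dim=32,
    num_attention_heads=2,
    max_position_embeddings=32,
)

# from_config builds the architecture only; no checkpoint is read,
# so the weights are randomly initialized.
model = AutoModelForCausalLM.from_config(config)

print(type(model).__name__)
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
```

A model created this way is useful for inspecting the architecture or training from scratch; to get usable text generation, load trained weights with `from_pretrained` instead.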


### Move the model to the GPU with `model.cuda()` or `model.to(device)`
```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(r'E:\model\language\opt-125m', trust_remote_code=True).cuda()  # raw string keeps the backslashes literal
```
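
Windows paths such as the one above contain backslashes, and in a regular Python string literal a backslash starts an escape sequence, so a path containing `\n` or `\t` would be silently corrupted. A minimal sketch of two safer spellings follows; `risky` is a hypothetical path invented for this illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical path: "\n" and "\t" are interpreted as newline and tab here.
risky = 'E:\new_models\tiny'
print('\n' in risky, '\t' in risky)

# A raw string keeps every backslash literal.
raw = r'E:\model\language\opt-125m'
# Forward slashes are also accepted for Windows paths.
forward = 'E:/model/language/opt-125m'

print(PureWindowsPath(raw).parts)
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```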

```python
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # moves the module's parameters in place
```
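
One detail worth knowing when using `.to(device)`: calling it on a module moves the parameters in place, whereas calling it on a tensor returns a new tensor that has to be assigned. The sketch below uses a small `torch.nn.Linear` layer as a stand-in for the real model (an assumption made so it runs without downloading weights).

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the real model: a tiny linear layer.
layer = torch.nn.Linear(4, 2)
layer.to(device)  # modules are moved in place; reassignment is optional

# Tensors are not moved in place: .to() returns the moved tensor.
x = torch.ones(1, 4)
x = x.to(device)

y = layer(x)
print(y.device.type == device.type)
```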

In [1]:
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [2]:
model = AutoModelForCausalLM.from_pretrained(r'E:\model\language\opt-125m', trust_remote_code=True).to(device)

In [3]:
model

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 768, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 768)
      (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-11): 12 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=True)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (final_layer_norm): LayerNorm((768,), ep... (output truncated)

In [5]:
n=0
for name, param in model.named_parameters():
    n=n+1
    print(f"Parameter {name} data type: {param.dtype}")
print(n)

Parameter model.decoder.embed_tokens.weight data type: torch.float32
Parameter model.decoder.embed_positions.weight data type: torch.float32
Parameter model.decoder.final_layer_norm.weight data type: torch.float32
Parameter model.decoder.final_layer_norm.bias data type: torch.float32
Parameter model.decoder.layers.0.self_attn.k_proj.weight data type: torch.float32
Parameter model.decoder.layers.0.self_attn.k_proj.bias data type: torch.float32
Parameter model.decoder.layers.0.self_attn.v_proj.weight data type: torch.float32
Parameter model.decoder.layers.0.self_attn.v_proj.bias data type: torch.float32
Parameter model.decoder.layers.0.self_attn.q_proj.weight data type: torch.float32
Parameter model.decoder.layers.0.self_attn.q_proj.bias data type: torch.float32
Parameter model.decoder.layers.0.self_attn.out_proj.weight data type: torch.float32
Parameter model.decoder.layers.0.self_attn.out_proj.bias data type: torch.float32
Parameter model.decoder.layers.0.self_attn_layer_norm.weight da... (output truncated)
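
Printing one line per parameter gets long for larger models. As a compact alternative, the dtypes can be tallied with `collections.Counter`. This is a minimal sketch; the small `torch.nn.Sequential` network is a hypothetical stand-in for the loaded model.

```python
from collections import Counter

import torch

# Hypothetical stand-in for the loaded model.
net = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)

# Count how many parameter tensors use each dtype.
dtype_counts = Counter(param.dtype for _, param in net.named_parameters())
for dtype, count in dtype_counts.items():
    print(f"{dtype}: {count} parameter tensors")
```

With the real model, replacing `net` by `model` gives the same kind of summary in one or two lines of output.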

In [7]:
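# Memory occupied by the model's parameters and buffers (any gap versus the GPU usage actually observed is memory reserved by PyTorch)
memory_footprint_bytes = model.get_memory_footprint()
memory_footprint_mib = memory_footprint_bytes / (1024 ** 2)  # convert to MiB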

print(f"{memory_footprint_mib:.2f}MiB")

477.75MiB
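
The reported figure can be cross-checked by hand from the layer shapes printed earlier. The sketch below makes two assumptions that the truncated printout does not show: each decoder layer's `final_layer_norm` has 768 features like the other layer norms, and the output head shares its weights with `embed_tokens`, so it adds no parameters.

```python
# Shapes taken from the printed OPTForCausalLM structure.
hidden, ffn, layers = 768, 3072, 12

def linear(n_in, n_out):
    """Parameters of a Linear layer with bias."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # weight + bias

embeddings = 50272 * hidden + 2050 * hidden  # token + positional embeddings
per_layer = (
    4 * linear(hidden, hidden)   # k_proj, v_proj, q_proj, out_proj
    + linear(hidden, ffn)        # fc1
    + linear(ffn, hidden)        # fc2
    + 2 * layer_norm             # self_attn_layer_norm + final_layer_norm (assumed)
)
# The trailing layer_norm is the decoder-level final_layer_norm.
total_params = embeddings + layers * per_layer + layer_norm

bytes_fp32 = total_params * 4  # float32 uses 4 bytes per value
print(total_params)
print(f"{bytes_fp32 / 1024 ** 2:.2f}MiB")
```

Under these assumptions the result is 125,239,296 parameters and 477.75MiB, which agrees with the value returned by `get_memory_footprint()`.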


In [13]:
tokenizer = AutoTokenizer.from_pretrained(r'E:\model\language\opt-125m')

### Move the `inputs` variable onto the GPU, again in two ways
```python
inputs = inputs.to(device)
```
```python
inputs = {key: value.cuda() for key, value in inputs.items()}
```
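
The first form relies on the tokenizer's return value having its own `.to()` method; a plain Python dict does not, which is where the dict comprehension is useful. Below is a minimal sketch with a hand-built dict. The token ids are copied from the `hello` example in this document, and `.to(device)` is used instead of `.cuda()` so that the sketch also runs on a CPU-only machine.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Plain dict of tensors, shaped like a tokenizer output.
inputs = {
    "input_ids": torch.tensor([[2, 42891]]),
    "attention_mask": torch.tensor([[1, 1]]),
}

# A plain dict has no .to() method, so move each tensor individually.
inputs = {key: value.to(device) for key, value in inputs.items()}

for key, value in inputs.items():
    print(key, value.device.type)
```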

In [14]:
def chat(text):
    # Tokenize the prompt and move the resulting tensors to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(device)
    print("inputs:{}".format(inputs))
    # max_length counts the prompt tokens plus the generated tokens
    output = model.generate(**inputs, max_length=20)
    print("output:{}".format(output))
    res = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Answer: {}".format(res))

In [15]:
chat('hello')

inputs:{'input_ids': tensor([[    2, 42891]], device='cuda:0'), 'attention_mask': tensor([[1, 1]], device='cuda:0')}
output:tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]],
       device='cuda:0')
Answer: hello, I'm a new player and I'm looking for a good team to play with.


In [16]:
chat('中国')  # "China" in Chinese

inputs:{'input_ids': tensor([[    2, 47643, 47516, 10809]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}
output:tensor([[    2, 47643, 47516, 10809, 47973, 48570,  3602, 48549, 47341, 36714,
         15389, 15264, 47516, 10809, 47973, 48570,  3602, 48538, 36714, 15389]],
       device='cuda:0')
Answer: 中国人民は、米国人民が�
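
The trailing `�` in the answer is most likely not a display glitch. OPT's tokenizer works on UTF-8 bytes, so one Chinese or Japanese character can span several tokens, and `max_length=20` can stop generation partway through a character. The pure-Python sketch below reproduces the effect by cutting a UTF-8 byte string mid-character; it illustrates the decoding behaviour only, not the tokenizer itself.

```python
text = "中国"
data = text.encode("utf-8")
print(len(data))  # each of these characters takes 3 bytes

# Drop the last byte, leaving the second character incomplete.
truncated = data[:-1]
decoded = truncated.decode("utf-8", errors="replace")
print(decoded)
print(decoded.endswith("\ufffd"))
```

The incomplete bytes are replaced by U+FFFD, the same replacement character that appears at the end of the generated text.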


In [17]:
model=model.half()
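
`model.half()` converts floating-point parameters to float16, which halves the bytes per value. Here is a minimal sketch of the effect, using a small `torch.nn.Linear` layer as a hypothetical stand-in for the model and a hand-written `footprint` helper (not a library function):

```python
import torch

def footprint(module):
    """Bytes used by a module's parameters (hand-written helper)."""
    return sum(p.numel() * p.element_size() for p in module.parameters())

layer = torch.nn.Linear(768, 768)  # stand-in for the real model
before = footprint(layer)

layer = layer.half()  # converts floating-point parameters to float16
after = footprint(layer)

print(next(layer.parameters()).dtype)
print(before, after)
```

For opt-125m, the same halving would be expected to take the 477.75MiB reported earlier down to roughly 239MiB.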

In [18]:
def chat(text):
    # Same helper as before, now running against the float16 model
    inputs = tokenizer(text, return_tensors="pt").to(device)
    print("inputs:{}".format(inputs))
    # max_length counts the prompt tokens plus the generated tokens
    output = model.generate(**inputs, max_length=20)
    print("output:{}".format(output))
    res = tokenizer.decode(output[0], skip_special_tokens=True)
    print("Answer: {}".format(res))

In [19]:
chat('hello')

inputs:{'input_ids': tensor([[    2, 42891]], device='cuda:0'), 'attention_mask': tensor([[1, 1]], device='cuda:0')}
output:tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]],
       device='cuda:0')
Answer: hello, I'm a new player and I'm looking for a good team to play with.


In [20]:
chat('中国')  # "China" in Chinese

inputs:{'input_ids': tensor([[    2, 47643, 47516, 10809]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}
output:tensor([[    2, 47643, 47516, 10809, 47973, 48570,  3602, 48549, 47341, 36714,
         15389, 15264, 47516, 10809, 47973, 48570,  3602, 48538, 36714, 15389]],
       device='cuda:0')
Answer: 中国人民は、米国人民が�
