# Hugging Face

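The weights downloaded from Meta are stored as bfloat16; they need to be converted to fp16 format before they can be used with the Hugging Face framework.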
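To make the difference between the two 16-bit formats concrete, here is a small standard-library sketch (no PyTorch needed). bfloat16 keeps the 8 exponent bits of float32 but only 7 mantissa bits, while fp16 has 5 exponent bits and 10 mantissa bits. The helper names `roundtrip_bfloat16` and `roundtrip_float16` are made up for this illustration, and the bfloat16 helper truncates instead of rounding to nearest, which is a simplification.

```python
import struct


def roundtrip_bfloat16(x: float) -> float:
    """Approximate bfloat16 storage by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Simplification: truncate the low 16 bits instead of rounding to nearest.
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def roundtrip_float16(x: float) -> float:
    """Store x as IEEE 754 half precision (fp16); values outside the fp16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


# Columns: original value, bfloat16 round trip, fp16 round trip.
for value in (0.1, 1000.0, 70000.0):
    print(value, roundtrip_bfloat16(value), roundtrip_float16(value))
```

Small values come back more accurately from fp16, while large values such as 70000.0 survive only in bfloat16. That is the trade-off to keep in mind when casting a checkpoint from one format to the other.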

In [1]:
import platform

if platform.system() == "Windows":
    MODEL_PATH = "E:/THUDM/llama2.hf/llama-2-7b-chat"
else:
    MODEL_PATH = "/opt/Data/THUDM/llama2.hf/llama-2-7b-chat-hf"

# 1 LlamaForCausalLM

## 1.1 CPU inference

In [2]:
# https://huggingface.co/docs/transformers/main_classes/model
# torch_dtype: torch.float16, torch.bfloat16 or torch.float loads the model in that dtype,
# ignoring the model's config.torch_dtype if one exists. If torch_dtype is not specified,
# the model is loaded in torch.float (fp32).

import torch

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(MODEL_PATH)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
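Because no `torch_dtype` is passed in the cell above, the weights are loaded as fp32. A rough back-of-the-envelope estimate of what that means for memory is sketched below. The parameter count of 7e9 is a round-number assumption for a "7B" model, `weight_memory_gib` is a hypothetical helper, and the estimate covers the weights only (no activations, no KV cache).

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model, not the exact count

for name, size in (("float32", 4), ("float16 / bfloat16", 2)):
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, size):.1f} GiB")
```

Halving the bytes per parameter halves the footprint, which is the main reason for requesting a 16-bit dtype when the model has to fit on a single GPU.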

In [5]:
prompt = "Hey, are you conscious? Can you talk to me?"

inputs = tokenizer(prompt, return_tensors="pt")
generate_ids = model.generate(inputs.input_ids, max_length=4096, pad_token_id=tokenizer.eos_token_id)
result = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(result[0])

Hey, are you conscious? Can you talk to me?

I'm just an AI, I don't have consciousness or the ability to talk like a human. I'm here to help answer your questions and provide information to the best of my ability. Is there something specific you would like to know or discuss?


## 1.2 GPU inference

In [2]:
import torch

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16).to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [5]:
prompt = "Hey, are you conscious? Can you talk to me?"

# Generate
inputs = tokenizer(prompt, return_tensors="pt")
generate_ids = model.generate(inputs.input_ids.cuda(), max_length=4096, pad_token_id=tokenizer.eos_token_id)
result = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(result[0])

Hey, are you conscious? Can you talk to me?

I'm just an AI, I don't have consciousness or the ability to talk like a human. I'm here to help answer your questions and provide information to the best of my ability. Is there something specific you would like to know or discuss?


## 1.3 TextStreamer

Define a TextStreamer object

In [16]:
import torch
from transformers import TextStreamer

# Extra keyword arguments are forwarded to tokenizer.decode(); pad_token_id is not one of them.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. 
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""
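The template above hard-codes the system prompt and leaves a single `{}` slot for the user message. A minimal sketch of the same `[INST]` / `<<SYS>>` layout as a function, so the system prompt can be swapped per call, might look like this. `build_llama2_prompt` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, and the default system prompt is a shortened placeholder, not the full text used above.

```python
# Assumption: shortened placeholder, not the full system prompt from the notebook.
DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."


def build_llama2_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap one user message in the [INST] / <<SYS>> layout used by the template."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"


print(build_llama2_prompt("What is 10 * 4?"))
```

The returned string can be passed to the tokenizer in the same way as the result of `instruction.format(...)`.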

A. First call

In [15]:
prompt = instruction.format("what's the result of 10*4?")
inputs = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generate_ids = model.generate(inputs, max_new_tokens=4096, pad_token_id=tokenizer.eos_token_id, streamer=streamer)

Thank you for asking! The result of 10 x 4 is 40. I'm here to help and provide accurate information, while being safe and respectful. Please let me know if you have any other questions!
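Only the answer is printed, without the prompt, because the streamer was created with `skip_prompt=True`. The toy class below sketches that behaviour in plain Python. It is an illustration only, not the `transformers` implementation: `ToyStreamer` and `fake_generate` are made-up names, tokens are plain strings, and it assumes the generation loop hands the prompt to `put()` first and then each new token, finishing with `end()`.

```python
class ToyStreamer:
    """Toy stand-in for a streamer: receives token chunks via put() and is closed with end()."""

    def __init__(self, skip_prompt: bool = True):
        self.skip_prompt = skip_prompt
        self.next_is_prompt = True  # assumption: the first put() call carries the prompt
        self.chunks = []

    def put(self, tokens):
        if self.skip_prompt and self.next_is_prompt:
            self.next_is_prompt = False
            return  # drop the prompt instead of echoing it
        self.next_is_prompt = False
        text = " ".join(tokens)
        self.chunks.append(text)
        print(text, end=" ", flush=True)

    def end(self):
        print()
        self.next_is_prompt = True  # ready for the next generation call


def fake_generate(prompt_tokens, answer_tokens, streamer):
    """Mimic a generation loop: push the prompt once, then one token at a time."""
    streamer.put(prompt_tokens)
    for token in answer_tokens:
        streamer.put([token])
    streamer.end()


fake_generate(["what", "is", "10*4?"], ["The", "result", "is", "40."], ToyStreamer())
```

Resetting `next_is_prompt` in `end()` is what lets one streamer object be reused across several calls, as the notebook does with its two `generate` calls.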


B. Second call

In [18]:
# https://github.com/LinkSoul-AI/Chinese-Llama-2-7b

prompt = instruction.format("When is the best time to visit Beijing, and do you have any suggestions for me?")
inputs = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generate_ids = model.generate(inputs, max_new_tokens=4096, pad_token_id=tokenizer.eos_token_id, streamer=streamer)

Thank you for your kind and respectful question! I'm happy to help you plan your trip to Beijing.
The best time to visit Beijing depends on your preferences and what you want to experience. Beijing has a continental climate with four distinct seasons, and each season offers something unique. Here are some general guidelines to help you decide:
Spring (March to May): This is a great time to visit Beijing if you want to avoid the crowds and enjoy mild weather. The temperatures are comfortable, with average highs around 15°C (59°F), and lows around 5°C (41°F). You can enjoy the city's blooming flowers, visit the famous Cherry Blossom Festival, and explore the city's many parks and gardens.
Summer (June to August): Summer in Beijing can be hot and humid, with average highs around 28°C (82°F) and lows around 22°C (72°F). While it can be challenging to visit during this time, it's a great opportunity to experience the city's vibrant nightlife and cultural events. Be sure to stay hydrated and

# 2 pipeline

Define the pipeline object

In [2]:
import transformers
import torch

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
pipeline = transformers.pipeline(
    "text-generation",
    model=MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="cuda"
)

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Use the pipeline object

In [4]:
sequences = pipeline(
    'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    max_length=4096,
    truncation=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Result: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

Answer: Yes, I do! Based on your preferences, here are a few TV shows you might enjoy:

1. "The Sopranos" - This HBO series is about a New Jersey mob boss, Tony Soprano, and his family. Like "Breaking Bad," it explores themes of loyalty, family, and the blurred lines between right and wrong.
2. "Narcos" - This Netflix series tells the true story of drug kingpin Pablo Escobar and the DEA agents tasked with bringing him down. If you enjoyed the gripping storytelling and morally complex characters of "Breaking Bad," you'll likely enjoy "Narcos" too.
3. "The Wire" - This HBO series explores the drug trade in Baltimore from multiple perspectives, including law enforcement, drug dealers, and politicians. Like "Breaking Bad," it's known for its nuanced characters and complex storytelling.
4. "True Detective" - This anthology series follows a different cast of characters and sto
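The pipeline call above enables sampling with `do_sample=True` and `top_k=10`, so at each step the next token is drawn from the 10 highest-scoring candidates only. A minimal pure-Python sketch of that idea follows. It is not the `transformers` implementation: `sample_top_k` is a hypothetical helper, and the logits are made-up scores for a five-token vocabulary.

```python
import math
import random


def sample_top_k(logits, k, rng=random):
    """Sample an index from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, 1.5, -1.0, 0.0]  # made-up scores, one per vocabulary entry
rng = random.Random(0)  # seeded so repeated runs draw the same samples
print([sample_top_k(toy_logits, k=2, rng=rng) for _ in range(10)])
```

With `k=2` only indices 0 and 2 can ever be returned, and with `k=1` the function reduces to greedy decoding.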