### **Downloading, Saving, and Loading Model Weights with Transformers Pipelines**

In [1]:
import os

import dotenv
from huggingface_hub import login

# Read HUGGINGFACE_API_KEY from a local .env file and authenticate with the Hugging Face Hub.
dotenv.load_dotenv()
HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_API_KEY")
login(token=HUGGINGFACE_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.




Token is valid (permission: read).
Your token has been saved to /home/dpaul/.cache/huggingface/token
Login successful
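If `HUGGINGFACE_API_KEY` is missing from the `.env` file, `os.getenv` returns `None` and `login` is called without the intended token. A small guard, sketched below, makes that problem explicit before any Hub call; `require_env` is a hypothetical helper, not part of `huggingface_hub` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error (hypothetical helper)."""
    value = os.getenv(name)
    if not value:  # treats both "unset" and "set to an empty string" as missing
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file "
            "before calling huggingface_hub.login()."
        )
    return value


# Possible use in the login cell:
# HUGGINGFACE_TOKEN = require_env("HUGGINGFACE_API_KEY")
# login(token=HUGGINGFACE_TOKEN)
```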


In [2]:
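import torch
from transformers import AutoTokenizer

# Hub ID of the instruction-tuned Gemma 2B checkpoint (gated: accept the license on the Hub first).
model = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model)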



In [3]:
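from transformers import pipeline

# Note: this assignment shadows the `pipeline` factory function, so it has to be
# re-imported before building another pipeline (as the loading section does).
pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.bfloat16},  # bfloat16 weights use half the memory of float32
    device="cuda",
)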


Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.38s/it]


In [4]:
# Wrap the question in Gemma's chat format and append the marker that opens the model's turn.
messages = [
    {"role": "user", "content": "What is 2+2?"},
]
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
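With `tokenize=False`, `apply_chat_template` returns the conversation rendered as a single string. As a rough illustration, the function below re-creates the turn markers that Gemma's template is assumed to produce (`<bos>`, `<start_of_turn>`, `<end_of_turn>`, and the role name `model` for the assistant). `render_gemma_chat` is a hypothetical stand-in; the authoritative template is the one stored with the tokenizer, so printing `prompt` from the cell above is the reliable check.

```python
def render_gemma_chat(messages, add_generation_prompt=True, bos="<bos>"):
    """Approximate Gemma's chat template in plain Python (illustration only, hypothetical helper)."""
    parts = [bos]
    for message in messages:
        # Gemma's turn markers call the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant's reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is 2+2?"}]
print(repr(render_gemma_chat(messages)))
# prints '<bos><start_of_turn>user\nWhat is 2+2?<end_of_turn>\n<start_of_turn>model\n'
```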


In [5]:
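outputs = pipeline(
    prompt,
    max_new_tokens=256,       # upper bound on the number of generated tokens
    add_special_tokens=True,  # the chat template already inserts <bos>; False may avoid a duplicate BOS
    do_sample=True,           # sample from the distribution instead of greedy decoding
    temperature=0.7,          # values below 1.0 sharpen the distribution
    top_k=50,                 # keep only the 50 most likely tokens...
    top_p=0.95,               # ...then the smallest set covering 95% of the probability mass
)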
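The `do_sample`, `temperature`, `top_k`, and `top_p` arguments together decide how each next token is drawn. The sketch below is a simplified, pure-Python model of that filtering over one vector of logits, applying temperature, then top-k, then top-p, which is assumed to mirror the order of the Transformers logits warpers. `sampling_distribution` and `sample_token` are hypothetical helpers, not library functions, and edge cases such as ties or a minimum number of kept tokens are ignored.

```python
import math
import random


def sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: rescale logits, then softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    ranked = sorted(((w, i) for i, w in enumerate(weights)), reverse=True)

    # 2. Top-k: keep the k highest-weight tokens, then renormalize them.
    ranked = ranked[:top_k]
    total = sum(w for w, _ in ranked)
    ranked = [(w / total, i) for w, i in ranked]

    # 3. Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # 4. Renormalize the survivors into a proper distribution.
    mass = sum(p for p, _ in kept)
    return {i: p / mass for p, i in kept}


def sample_token(logits, rng, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = sampling_distribution(logits, **kwargs)
    tokens, probs = zip(*dist.items())
    return rng.choices(tokens, weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy 4-token vocabulary
print(sampling_distribution(logits))
print(sample_token(logits, random.Random(0)))
```

Lowering `temperature` concentrates mass on the top token before any filtering happens, while `top_k` and `top_p` only remove tokens; with a real vocabulary, `top_k=50` discards nearly every token before `top_p` is applied.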


In [6]:
# `generated_text` starts with the prompt, so slice it off to show only the reply.
print(outputs[0]["generated_text"][len(prompt):])

Sure. 2+2 is 4. 

It is a simple addition problem that can be solved by adding the two numbers together.
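By default the text-generation pipeline echoes the prompt at the start of `generated_text`, which is why the cell slices off `len(prompt)` characters. A slightly more defensive version of that slice is sketched below; `completion_only` is a hypothetical helper. Passing `return_full_text=False` in the pipeline call is another way to get only the newly generated text.

```python
def completion_only(generated_text: str, prompt: str) -> str:
    """Return only the model's continuation from a pipeline's `generated_text` (hypothetical helper)."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt was not echoed verbatim.
    return generated_text


# Example record in the pipeline's output shape: a list of dicts with a "generated_text" key.
outputs = [{"generated_text": "What is 136+254?\n\n136 + 254 = 390."}]
print(completion_only(outputs[0]["generated_text"], "What is 136+254?").strip())
```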


In [7]:
# Write the model (config and weights) and its tokenizer files to ./gemma_model_saved.
pipeline.save_pretrained("gemma_model_saved")
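`save_pretrained` writes the model's config, weight files, and tokenizer files into the target directory. The check below is a heuristic sketch for confirming the directory looks usable before reloading it. `looks_like_saved_model` is hypothetical, and the file names it looks for (`config.json`, `*.safetensors` or `*.bin` weights, `tokenizer_config.json`) are the usual Transformers outputs rather than a guaranteed contract.

```python
from pathlib import Path


def looks_like_saved_model(directory) -> bool:
    """Heuristically check that a directory holds a saved model plus tokenizer (hypothetical helper)."""
    path = Path(directory)
    has_config = (path / "config.json").is_file()
    # Weights may be a single file or several shards, in safetensors or PyTorch .bin format.
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    has_tokenizer = (path / "tokenizer_config.json").is_file()
    return has_config and has_weights and has_tokenizer


print(looks_like_saved_model("gemma_model_saved"))
```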

### Loading from Disk

In [8]:
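from transformers import AutoTokenizer
from transformers import pipeline  # re-import: the earlier cell rebound `pipeline` to a Pipeline object
import torch

# Point at the local directory written by save_pretrained instead of the Hub ID.
model = "gemma_model_saved"
tokenizer = AutoTokenizer.from_pretrained(model)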

pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.69s/it]


In [9]:
# Optional: wrap a message in the chat template (disabled in this cell)
# messages = [
#     {"role": "user", "content": "Write a poem about the sea."},
# ]
# prompt = pipeline.tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True)

# Here the raw question is sent without the chat template.
prompt = "What is 136+254?"

# Generate a response
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    add_special_tokens=True,  # the raw prompt has no <bos>, so let the tokenizer prepend it
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

In [10]:
print(outputs[0]["generated_text"][len(prompt):])



136 + 254 = 390.


In [11]:
outputs

[{'generated_text': 'What is 136+254?\n\n136 + 254 = 390.'}]