<a href="https://colab.research.google.com/github/shake/colab-Llama-2-ipynb/blob/main/02-quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Quick Start Notebook

This notebook shows how to train a Llama 2 model on a single GPU (e.g. A10 with 24GB) using int8 quantization and LoRA.
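
To see why quantization matters on a 24GB card, a rough estimate of weight memory helps. The sketch below uses the parameter count that this notebook prints in Step 4 (6,742,609,920). The helper name `weight_memory_gb` is ours, and the estimate deliberately ignores activations, optimizer state, and quantization metadata.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count reported by print_trainable_parameters() in Step 4.
LLAMA_2_7B_PARAMS = 6_742_609_920

for label, bits in [("float16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(LLAMA_2_7B_PARAMS, bits):.2f} GB")
```

Under these assumptions, int8 halves the float16 footprint, which leaves headroom on a 24GB GPU for activations and the LoRA optimizer state.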

### Step 0: Install pre-requirements and convert checkpoint

The example uses the Hugging Face trainer and model, which means the checkpoint has to be converted from its original format into the Hugging Face format.
The conversion is done by running the `convert_llama_weights_to_hf.py` script that ships with the `transformers` package.
Assuming the original checkpoint resides under `models/7B`, we can install all requirements and convert the checkpoint with:

In [None]:
# %%bash
# pip install llama-recipes transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
# TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B
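
The shell one-liner above locates the conversion script inside the installed `transformers` package. A minimal Python sketch of the same lookup follows; the function name `find_conversion_script` is hypothetical, and the relative path is taken from the command above. It returns `None` when `transformers` is not installed.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_conversion_script() -> Optional[Path]:
    """Return the expected path of the Llama conversion script, if transformers is installed."""
    spec = importlib.util.find_spec("transformers")
    if spec is None or spec.origin is None:
        return None  # transformers is not importable in this environment
    package_dir = Path(spec.origin).parent
    return package_dir / "models" / "llama" / "convert_llama_weights_to_hf.py"


script = find_conversion_script()
print(script if script is not None else "transformers is not installed")
```

Whether the file actually exists at that path depends on the installed `transformers` version, so checking `script.exists()` before running it is a reasonable precaution.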

In [1]:
# Install the required packages
!pip -q install gradio huggingface_hub

# import
import os
import shutil
import huggingface_hub as hh
import pandas as pd

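# Download Llama 2. A Hugging Face token with approved access is required
# (replace the token below with your own); for ungated models this step can be skipped.
# Configure git to store the credential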
! git config --global credential.helper store
!huggingface-cli login --token hf_FqOyPDAURgkbG --add-to-git-credential

# Set the Hugging Face repo_id of the model to download. To use a different model,
# only repo_id needs to change; nothing else has to be adjusted.
repo_id = "meta-llama/Llama-2-7b"
repo_name = repo_id.replace("/","---")

# Helpers for displaying file sizes and downloading files

def format_size(bytes, precision=2):
	"""
	Convert a file size in bytes to a human-readable format like KB, MB, GB, etc.
	Hugging Face uses base 1000, not 1024.
	"""
	units = ["B", "KB", "MB", "GB", "TB", "PB"]
	size = float(bytes)
	index = 0

	while size >= 1000 and index < len(units) - 1:
		index += 1
		size /= 1000

	return f"{size:.{precision}f} {units[index]}"


def list_repo_files_info(repo_id,token=None):
	data_ls = []
	for file in list(hh.list_files_info(repo_id)):
		data_ls.append([file.path,format_size(file.size)])
	files = [file[0] for file in data_ls]
	data = pd.DataFrame(data_ls, columns=['File name', 'Size'])
	return data, files

# Download the model into the "./download" directory under the current directory
def download_file(repo_id,filenames):
	print(filenames)
	repo_name = repo_id.replace("/","---")

	for filename in filenames:
		print(filename)
		out = hh.hf_hub_download(repo_id=repo_id,filename=filename,local_dir=f"./download/{repo_name}",local_dir_use_symlinks=False,force_download =True)
	out_path = f"./download/{repo_name}"
	return out_path

# List the files in the model repo
data, filenames = list_repo_files_info(repo_id)
filenames

# Start downloading the model
out_path = download_file(repo_id,filenames)

.gitattributes: 1.58k
LICENSE.txt: 7.02k
README.md: 10.4k
Responsible-Use-Guide.pdf: 1.25M
USE_POLICY.md: 4.77k
checklist.chk: 100
consolidated.00.pth: 13.5G
params.json: 102
tokenizer.model: 500k
tokenizer_checklist.chk: 50.0
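
The `format_size` helper above divides by 1000 rather than 1024, so its output matches the decimal units shown on the Hugging Face Hub. The following standalone sketch makes the difference visible; the `base` parameter is our addition for illustration and is not part of the notebook's helper.

```python
def format_size(num_bytes: float, precision: int = 2, base: int = 1000) -> str:
    """Convert a byte count to a human-readable string using the given unit base."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value fits or units run out.
    while size >= base and index < len(units) - 1:
        index += 1
        size /= base
    return f"{size:.{precision}f} {units[index]}"


print(format_size(1_250_000))             # decimal units, as in the notebook
print(format_size(1_250_000, base=1024))  # binary units, for comparison
```

The same file therefore looks slightly smaller when reported by tools that use binary units, such as `du -h`.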

In [2]:
!wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 huggingface_hub sentencepiece

--2023-12-28 05:43:33--  https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13283 (13K) [text/plain]
Saving to: ‘convert_llama_weights_to_hf.py’


2023-12-28 05:43:33 (93.9 MB/s) - ‘convert_llama_weights_to_hf.py’ saved [13283/13283]

In [3]:
!mv /content/download/meta-llama---Llama-2-7b /content/Llama-2-7b
!mkdir /content/Llama-2-7b/7B
!cp /content/Llama-2-7b/params.json /content/Llama-2-7b/7B/params.json

In [4]:
ls /content/Llama-2-7b/7B/

params.json


In [5]:
from transformers.utils.hub import move_cache

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

In [6]:
# Convert the original Llama checkpoint to the Hugging Face format
!python3 convert_llama_weights_to_hf.py \
    --input_dir ./Llama-2-7b  --model_size 7B --output_dir ./Llama-2-7b-hf

2023-12-28 05:45:04.584900: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-28 05:45:04.585029: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-28 05:45:04.586943: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Fetching all parameters from the checkpoint at ./L

In [7]:
# Work around the Colab character-encoding error
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

In [8]:
!pip3 install -q gekko pandas
!git clone https://github.com/PanQiWei/AutoGPTQ

Cloning into 'AutoGPTQ'...
remote: Enumerating objects: 3965, done.
remote: Counting objects: 100% (117/117), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 3965 (delta 71), reused 78 (delta 42), pack-reused 3848
Receiving objects: 100% (3965/3965), 7.93 MiB | 15.47 MiB/s, done.
Resolving deltas: 100% (2538/2538), done.


In [9]:
%cd AutoGPTQ
!pwd


/content/AutoGPTQ
/content/AutoGPTQ


In [10]:
!pip3 install .

Processing /content/AutoGPTQ
  Preparing metadata (setup.py) ... done
Collecting accelerate>=0.22.0 (from auto-gptq==0.7.0.dev0+cu1222)
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
Collecting rouge (from auto-gptq==0.7.0.dev0+cu1222)
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Collecting peft>=0.5.0 (from auto-gptq==0.7.0.dev0+cu1222)
  Downloading peft-0.7.1-py3-none-any.whl (168 kB)
Building wheels for collected packages: auto-gptq
  Building wheel for auto-gptq (setup.py) ... done
  Created wheel for auto-gptq: filename=auto_gptq-0.7.0.dev0+cu1222-cp310-cp310-linux_x86_64.whl size=14389692 sha256=e6d9fc6afa99adfded25ad1e6cf132d4e536c8efb128dbc21e87a8beb8ac6068
  Stored in directory: /tmp/pip-ephem-w

In [11]:
%cd /content/
!pwd

/content
/content


In [None]:
!wget https://gist.githubusercontent.com/TheBloke/b47c50a70dd4fe653f64a12928286682/raw/ebcee019d90a178ee2e6a8107fdd7602c8f1192a/quant_autogptq.py

In [13]:
!ls


AutoGPTQ			download    Llama-2-7b-hf      sample_data
convert_llama_weights_to_hf.py	Llama-2-7b  quant_autogptq.py


In [14]:
!python3 quant_autogptq.py ./Llama-2-7b-hf ./llama-2-7b-hf-gptq \
wikitext --bits 4 --group_size 128 --desc_act 0 --damp 0.1 \
--dtype float16 --seqlen 4096 --num_samples 128 --use_fast

2023-12-28 06:00:07 INFO [__main__] Loading tokenizer
Downloading data: 100% 733k/733k [00:00<00:00, 960kB/s]
Downloading data: 100% 6.36M/6.36M [00:03<00:00, 1.80MB/s]
Downloading data: 100% 657k/657k [00:00<00:00, 2.40MB/s]
Generating test split: 100% 4358/4358 [00:00<00:00, 102212.00 examples/s]
Generating train split: 100% 36718/36718 [00:00<00:00, 733182.52 examples/s]
Generating validation split: 100% 3760/3760 [00:00<00:00, 700696.81 examples/s]
2023-12-28 06:00:19 INFO [__main__] Tokenising wikitext2
2023-12-28 06:02:40 INFO [__main__] Quantising with bits=4 group_size=128 desc_act=False damp=0.1 to ./llama-2-7b-hf-gptq
2023-12-28 06:02:42 INFO [__main__] Loading model from ./Llama-2-7b-hf with trust_remote_code=False and dtype=torch.float16
Loading checkpoint shards: 100% 2/2 [00:10<00:00,  5.48s/it]
2023-12-28 06:02:53 INFO [__main__] Starting quantization to ./llama-2-7b-hf-gptq with use_triton=False
2023-12-28 06:02:54 INFO [auto_gptq.modeling._base] Start quantizing layer 

In [21]:
!ls

AutoGPTQ			download    Llama-2-7b-hf	quant_autogptq.py
convert_llama_weights_to_hf.py	Llama-2-7b  llama-2-7b-hf-gptq	sample_data


In [20]:
!du -sh ./llama-2-7b-hf-gptq

3.7G	./llama-2-7b-hf-gptq
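
The 3.7G reported above can be sanity-checked with a rough estimate. The sketch below is a back-of-the-envelope calculation, not a description of the AutoGPTQ file format: it assumes the embedding and output matrices stay in float16, every other parameter is stored in 4 bits, and each group of 128 weights carries one float16 scale and one 4-bit zero point. The vocabulary size (32,000) and hidden size (4,096) are the Llama 2 7B values.

```python
NUM_PARAMS = 6_742_609_920      # "all params" printed in Step 4
VOCAB, HIDDEN = 32_000, 4_096   # Llama 2 7B vocabulary and hidden size
BITS, GROUP_SIZE = 4, 128       # values passed to quant_autogptq.py

# Assumption: token embeddings and lm_head are left unquantized in float16.
fp16_params = 2 * VOCAB * HIDDEN
# Assumption: everything else is treated as 4-bit (norm weights are negligible).
quant_params = NUM_PARAMS - fp16_params
groups = quant_params / GROUP_SIZE

total_bytes = (
    fp16_params * 2               # 2 bytes per float16 value
    + quant_params * BITS / 8     # packed 4-bit weights
    + groups * (2 + BITS / 8)     # float16 scale + 4-bit zero point per group
)
print(f"estimated size: {total_bytes / 2**30:.2f} GiB")
```

The estimate lands slightly below the `du -sh` figure, which is plausible given that the directory also holds tokenizer and config files and that `du` rounds up.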


### Step 1: Load the model

Point `model_id` to the folder that holds the converted model weights:

In [22]:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "./Llama-2-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(model_id)

model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)

You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
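
Passing `load_in_8bit=True` delegates quantization to bitsandbytes. Its actual scheme is more involved (for example, it treats outlier features separately), so the following is only a simplified illustration of the underlying idea: absmax quantization, where a block of weights shares one scale and each weight is stored as an 8-bit integer. The function names are hypothetical.

```python
def quantize_absmax(values):
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.03]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error = {max_error:.4f}")
```

The rounding error per weight is bounded by half the scale, which is why a single large value in a block reduces the precision available to all the others.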

### Step 2: Load the preprocessed dataset

We load and preprocess the samsum dataset, which consists of curated pairs of dialogs and their summaries:

In [24]:
!pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes


Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/test/cu118
Collecting llama-recipes
  Downloading llama_recipes-0.0.1-py3-none-any.whl (43 kB)
Collecting black (from llama-recipes)
  Downloading black-23.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting fire (from llama-recipes)
  Downloading fire-0.5.0.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Collecting loralib (from llama-recipes)
  Downloading loralib-0.1.2-py3-none-any.whl (10 kB)
Collecting optimum (from llama-recipes)
  Downloading optimum-1.16.1-py3-none-any.whl (403 kB)

In [25]:
from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
from llama_recipes.configs.datasets import samsum_dataset

train_dataset = get_preprocessed_dataset(tokenizer, samsum_dataset, 'train')

Downloading data:   0%|          | 0.00/6.06M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/347k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/335k [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/14732 [00:00<?, ? examples/s]

Map:   0%|          | 0/14732 [00:00<?, ? examples/s]

Map:   0%|          | 0/14732 [00:00<?, ? examples/s]
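
Preprocessing turns each dialog and summary pair into a single training text. The sketch below illustrates the idea using the prompt layout that appears in Step 3 of this notebook; the function names are hypothetical, and the exact template and tokenization inside `llama_recipes` may differ.

```python
def build_prompt(dialog: str) -> str:
    """Format a dialog the way the evaluation prompt in Step 3 is laid out."""
    return f"Summarize this dialog:\n{dialog}\n---\nSummary:\n"


def build_training_text(dialog: str, summary: str, eos_token: str = "</s>") -> str:
    """Append the reference summary and an end-of-sequence marker to the prompt."""
    return build_prompt(dialog) + summary + eos_token


example = build_training_text(
    dialog="A: Are you free tomorrow?\nB: Yes, what's up?",
    summary="A asks B whether B is free tomorrow.",
)
print(example)
```

Ending each example with the end-of-sequence token is what teaches the model to stop after the summary instead of continuing the dialog.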

### Step 3: Check base model

Run the base model on an example input:

In [26]:
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a pu

We can see that the base model only repeats the conversation.
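
Note that `generate` returns the prompt tokens followed by the newly generated ones, so the printed text always starts with the prompt. To inspect only the continuation, the prompt tokens can be sliced off before decoding. The sketch below shows this with plain Python lists and made-up token ids; `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(output_ids, prompt_length):
    """Drop the leading prompt tokens and keep only the generated continuation."""
    return output_ids[prompt_length:]


prompt_ids = [1, 100, 200, 300]          # made-up ids standing in for the prompt
output_ids = prompt_ids + [42, 43, 2]    # generate() echoes the prompt first
new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)

# With the objects from this notebook, the equivalent slice would be:
#   output = model.generate(**model_input, max_new_tokens=100)[0]
#   new_tokens = output[model_input["input_ids"].shape[1]:]
```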

### Step 4: Prepare model for PEFT

Let's prepare the model for Parameter-Efficient Fine-Tuning (PEFT):

In [27]:
model.train()

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules = ["q_proj", "v_proj"]
    )

    # prepare int-8 model for training
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)



trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
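
The trainable parameter count above can be reproduced by hand. LoRA adds two small matrices to each targeted projection: `A` with shape `r x in_features` and `B` with shape `out_features x r`. The sketch below assumes the Llama 2 7B dimensions (hidden size 4096, 32 decoder layers, square `q_proj` and `v_proj`); the variable names are ours.

```python
HIDDEN_SIZE = 4096            # Llama 2 7B hidden dimension
NUM_LAYERS = 32               # Llama 2 7B decoder layers
RANK = 8                      # r in LoraConfig
TARGETS = ["q_proj", "v_proj"]
ALL_PARAMS = 6_742_609_920    # "all params" from the output above

# Each adapted 4096 x 4096 projection gains A (r x in) and B (out x r).
params_per_module = RANK * HIDDEN_SIZE + HIDDEN_SIZE * RANK
trainable = params_per_module * len(TARGETS) * NUM_LAYERS
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,} ({trainable_pct:.4f}%)")
```

Doubling `r` or adding more target modules scales the trainable count linearly, which makes the memory cost of a LoRA configuration easy to predict.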




### Step 5: Define an optional profiler

In [28]:
from transformers import TrainerCallback
from contextlib import nullcontext
enable_profiler = False
output_dir = "tmp/llama-output"

config = {
    'lora_config': lora_config,
    'learning_rate': 1e-4,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 2,
    'gradient_checkpointing': False,
}

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)

    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler

        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()
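
To make the `wait`, `warmup`, `active`, and `repeat` values more concrete, here is a plain-Python sketch of which steps would be profiled. It mirrors our understanding of `torch.profiler.schedule` in simplified form (no `skip_first`, and no distinction between recording and saving), so treat it as an illustration rather than the library's implementation.

```python
def profiler_action(step, wait=1, warmup=1, active=2, repeat=1):
    """Return the phase a given training step falls into under the schedule."""
    cycle = wait + warmup + active
    if repeat > 0 and step // cycle >= repeat:
        return "none"      # all requested cycles are finished
    position = step % cycle
    if position < wait:
        return "none"      # idle steps before profiling starts
    if position < wait + warmup:
        return "warmup"    # profiler runs, results are discarded
    return "record"        # results are kept


wait, warmup, active, repeat = 1, 1, 2, 1
total_steps = (wait + warmup + active) * (1 + repeat)
for step in range(total_steps):
    print(step, profiler_action(step, wait, warmup, active, repeat))
```

With the values used in this notebook, only two steps are recorded, and `total_steps` caps training at a handful of steps so that a profiling run finishes quickly.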

### Step 6: Fine tune the model

Here, we fine-tune the model for a single epoch, which takes a bit more than an hour on an A100.

In [29]:
from transformers import default_data_collator, Trainer, TrainingArguments



# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    bf16=True,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,
        callbacks=[profiler_callback] if enable_profiler else [],
    )

    # Start training
    trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


| Step | Training Loss |
|------|---------------|
| 10 | 1.9424 |
| 20 | 1.8251 |
| 30 | 1.7823 |
| 40 | 1.7504 |
| 50 | 1.7215 |
| 60 | 1.6919 |
| 70 | 1.6968 |
| 80 | 1.7011 |
| 90 | 1.6792 |
| 100 | 1.6898 |
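
Each logged step corresponds to one optimizer update, not one batch. With `per_device_train_batch_size=2` and `gradient_accumulation_steps=2`, every update sees 4 samples on a single GPU. The sketch below approximates how many updates one epoch takes; the helper name and the sample count of 1000 are hypothetical, and the exact rounding inside `Trainer` can vary between `transformers` versions.

```python
import math


def optimizer_steps_per_epoch(num_samples, per_device_batch_size,
                              grad_accum_steps, num_devices=1):
    """Approximate the number of optimizer updates in one epoch."""
    batches = math.ceil(num_samples / (per_device_batch_size * num_devices))
    # Gradients from several batches are accumulated before each update.
    return max(batches // grad_accum_steps, 1)


effective_batch_size = 2 * 2  # per-device batch size x accumulation steps
steps = optimizer_steps_per_epoch(num_samples=1000,  # hypothetical dataset size
                                  per_device_batch_size=2,
                                  grad_accum_steps=2)
print(effective_batch_size, steps)
```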


### Step 7: Save model checkpoint

In [30]:
model.save_pretrained(output_dir)
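
Because `model` is wrapped by PEFT, `save_pretrained` writes the LoRA adapter weights and their config rather than a full copy of the base model. A minimal sketch of reloading the result in a later session follows. The function name `load_finetuned_model` is hypothetical, the default paths are the ones used in this notebook, and the imports are placed inside the function so the definition itself does not require a GPU environment.

```python
def load_finetuned_model(base_dir="./Llama-2-7b-hf", adapter_dir="tmp/llama-output"):
    """Load the int8 base model and attach the saved LoRA adapter."""
    import torch
    from peft import PeftModel
    from transformers import LlamaForCausalLM

    base_model = LlamaForCausalLM.from_pretrained(
        base_dir,
        load_in_8bit=True,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    # Attach the adapter weights saved by model.save_pretrained(output_dir).
    return PeftModel.from_pretrained(base_model, adapter_dir)
```

The base model directory must still be available when reloading, since the adapter alone does not contain the original weights.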

### Step 8: Check the fine-tuned model

Try the fine-tuned model on the same example again to see the learning progress:

In [31]:
model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))



Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A wants to get a puppy for his son. He took him to the animal shelter last Monday. He showed him one that he really liked. He wants to name it after his dead hamster - Lemmy.
