## Instruct Lab demo

Let's try out the InstructLab demo using our sno-llama environment - [you can watch the YouTube video here](https://www.youtube.com/watch?v=pgK-70iLz_o).

First we need to build an instructlab notebook image.

Take a look at the [README.md](instructlab/README.md), follow the instructions, and build a custom notebook image for this environment.

Tag the image so we can use it to launch an ilab notebook.

We can check that ilab is deployed using:

In [1]:
!ilab

Usage: ilab [OPTIONS] COMMAND [ARGS]...

  CLI for interacting with InstructLab.

  If this is your first time running InstructLab, it's best to start with
  `ilab init` to create the environment.

Options:
  --config PATH  Path to a configuration file.  [default: config.yaml]
  --version      Show the version and exit.
  --help         Show this message and exit.

Commands:
  chat      Run a chat using the modified model
  check     (Deprecated) Check that taxonomy is valid
  convert   Converts model to GGUF
  diff      Lists taxonomy files that have changed since <taxonomy-base>...
  download  Download the model(s) to train
  generate  Generates synthetic data to enhance your example data
  init      Initializes environment for InstructLab
  list      (Deprecated) Lists taxonomy files that have changed since
            <taxonomy-base>.
  serve     Start a local server
  test      Runs basic test to ensure model correctness
  train     Takes synthetic data generated locally with `ila
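If you would rather check for the CLI from Python than eyeball the help text, a minimal sketch could look like the following. The helper name `cli_version` is my own; it only relies on the `--version` option listed in the help output above.

```python
import shutil
import subprocess
from typing import Optional


def cli_version(name: str = "ilab") -> Optional[str]:
    """Return the output of `<name> --version`, or None if the CLI is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some CLIs print the version to stderr, so fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()


print(cli_version() or "ilab not found on PATH")
```

A `None` result usually means the notebook was not launched from the custom ilab image.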

Create a directory called `instructlab`. We will use it to train a model with InstructLab. Most of this work can be done in a terminal, so you may find it easier to just `oc rsh jupyter-nb-admin-0` into your running notebook and run the instructions from there.

In [7]:
!mkdir -p $HOME/instructlab/models && cd $HOME/instructlab

Now we need to initialize InstructLab. Ask it to initialize a blank taxonomy directory.

```bash
(app-root) ilab init
Welcome to InstructLab CLI. This guide will help you to setup your environment.
Please provide the following values to initiate the environment [press Enter for defaults]:
Path to taxonomy repo [taxonomy]: 
`taxonomy` seems to not exist or is empty. Should I clone https://github.com/instructlab/taxonomy.git for you? [y/N]: y
Cloning https://github.com/instructlab/taxonomy.git...
Generating `config.yaml` in the current directory...
Initialization completed successfully, you're ready to start using `ilab`. Enjoy!
(app-root) ls
config.yaml models taxonomy
(app-root)
```

The default configuration uses a `merlinite-7b-lab-Q4_K_M` based model - let's swap it out for the granite model we used in the sno-granite notebook demo.

In [8]:
!wget -O $HOME/instructlab/models/granite-7b-lab-Q4_K_M.gguf https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true

--2024-05-12 01:50:57--  https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf?download=true
Resolving huggingface.co (huggingface.co)... 3.160.5.109, 3.160.5.102, 3.160.5.25, ...
Connecting to huggingface.co (huggingface.co)|3.160.5.109|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs-us-1.huggingface.co/repos/b4/aa/b4aa486646dd15418b4143b2bd1fe324e82dcd41f9d35edd4c178194d9b57df9/6adeaad8c048b35ea54562c55e454cc32c63118a32c7b8152cf706b290611487?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27granite-7b-lab-Q4_K_M.gguf%3B+filename%3D%22granite-7b-lab-Q4_K_M.gguf%22%3B&Expires=1715737858&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNTczNzg1OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2I0L2FhL2I0YWE0ODY2NDZkZDE1NDE4YjQxNDNiMmJkMWZlMzI0ZTgyZGNkNDFmOWQzNWVkZDRjMTc4MTk0ZDliNTdkZjkvNmFkZWFhZDhjMDQ4YjM1ZWE1NDU2MmM
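A multi-gigabyte download can fail quietly (for example by saving an HTML error page instead of the model), so it can be worth a quick sanity check before pointing `ilab` at the file. GGUF files start with the four ASCII bytes `GGUF`; the sketch below only tests for that header and does not validate the rest of the file. The function name is hypothetical.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path: Path) -> bool:
    """True if the file can be read and starts with the GGUF magic bytes."""
    try:
        with open(path, "rb") as fh:
            return fh.read(4) == GGUF_MAGIC
    except OSError:
        return False


model = Path.home() / "instructlab" / "models" / "granite-7b-lab-Q4_K_M.gguf"
status = "GGUF header found" if looks_like_gguf(model) else "missing or not a GGUF file"
print(model, "->", status)
```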

Now modify `config.yaml` to point to our granite model.

In [14]:
%%bash
cat <<'EOF' > $HOME/instructlab/config.yaml
chat:
  context: default
  greedy_mode: false
  logs_dir: data/chatlogs
  model: granite-7b-lab-Q4_K_M
  session: null
  vi_mode: false
  visible_overflow: true
general:
  log_level: INFO
generate:
  chunk_word_count: 1000
  model: granite-7b-lab-Q4_K_M
  num_cpus: 10
  num_instructions: 100
  output_dir: generated
  prompt_file: prompt.txt
  seed_file: seed_tasks.json
  taxonomy_base: origin/main
  taxonomy_path: taxonomy
serve:
  gpu_layers: -1
  host_port: 127.0.0.1:8000
  max_ctx_size: 4096
  model_path: models/granite-7b-lab-Q4_K_M.gguf
EOF
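A typo in `model_path` only shows up when `ilab serve` starts, so here is a small standard-library sketch that pulls the value back out of the file and checks that it exists. It is a line-based reader written for the flat layout shown above, not a real YAML parser, and `serve_model_path` is a name I made up.

```python
import re
from pathlib import Path
from typing import Optional


def serve_model_path(config_text: str) -> Optional[str]:
    """Return the `model_path` value from the `serve:` section, if present."""
    in_serve = False
    for line in config_text.splitlines():
        if re.match(r"^\S", line):  # an unindented line starts a new section
            in_serve = line.strip() == "serve:"
            continue
        if in_serve:
            match = re.match(r"^\s+model_path:\s*(\S+)\s*$", line)
            if match:
                return match.group(1)
    return None


config = Path.home() / "instructlab" / "config.yaml"
if config.exists():
    rel = serve_model_path(config.read_text())
    # model_path is relative to the directory ilab is run from
    found = bool(rel) and (config.parent / rel).exists()
    print("serve.model_path:", rel, "| file exists:", found)
```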

Open 3 terminals.

  (1) In the first we will run `ilab serve` to serve the model

```bash
(app-root) ilab serve
INFO 2024-05-12 01:56:50,024 lab.py:313 Using model 'models/granite-7b-lab-Q4_K_M.gguf' with -1 gpu-layers and 4096 max context size.
INFO 2024-05-12 01:56:51,387 server.py:196 Starting server process, press CTRL+C to shutdown server...
INFO 2024-05-12 01:56:51,387 server.py:197 After application startup complete see http://127.0.0.1:8000/docs for API.
```

  (2) In the second we will run the `nvtop` utility built into the ilab image.

```bash
(app-root) nvtop
```

![ilab-nvtop.png](images/ilab-nvtop.png)

  (3) In the third we will run `ilab chat` to chat to the model interactively.
    
```bash
(app-root) ilab chat
╭──────────────────────────────────────────────────────────────────────────────── system ────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB-Q4_K_M (type /h for help)                                |
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>>
```
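The served model can also be queried without the interactive chat. The sketch below assumes the server exposes an OpenAI-style `/v1/chat/completions` endpoint on the `host_port` from `config.yaml`; check `http://127.0.0.1:8000/docs` on your own server to confirm the route and payload before relying on it. The helper names are mine.

```python
import json
import urllib.request

# host_port comes from config.yaml; the /v1 prefix and route are assumptions.
BASE_URL = "http://127.0.0.1:8000/v1"


def build_chat_request(
    question: str, model: str = "granite-7b-lab-Q4_K_M"
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local server."""
    payload = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> str:
    """Send the question to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request("where is kenmore hills ?")
print(req.get_method(), req.full_url)
```

With `ilab serve` running in the first terminal, `print(ask("where is kenmore hills ?"))` should return the same kind of answer as the chat window.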

I am going to ask it about where I live, a place called Kenmore Hills in Brisbane, Queensland, Australia.

```bash
>>> is kenmore hills in sydney ?                                                                   [S][default]
╭────────────────────────────────────────── granite-7b-lab-Q4_K_M ───────────────────────────────────────────╮
│ Kenmore Hills is not a separate suburb or neighborhood within Sydney, Australia. Instead, it is an         │
│ affluent suburb located within the City of Bayside, on the eastern fringe of the city. To reach Kenmore    │
│ Hills, you can take the Eastern Freeway (Eastlink) or the M5 East, and then exit at the Kenmore Terrace    │
│ exit. The suburb is well-connected to public transportation, with several bus routes serving the area. If  │
│ you have any questions about Kenmore Hills or other parts of Sydney, feel free to ask!                     │
╰──────────────────────────────────────────────────────────────────────────────────── elapsed 3.012 seconds ─╯
>>> where is knemore hills ?                                                                      [S][default]
╭────────────────────────────────────────── granite-7b-lab-Q4_K_M ───────────────────────────────────────────╮
│ Kenmore Hills is located on the eastern fringe of Sydney, within the City of Bayside. It is situated near  │
│ the Parramatta River, approximately 10 kilometers (6 miles) east of the Sydney central business district.  │
│ The suburb is surrounded by other well-known Sydney suburbs, including Hunters Hill, Wentworthville, and   │
│ Strathfield South. If you need more specific directions or information about Kenmore Hills or its          │
│ surroundings, please let me know!                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────── elapsed 2.605 seconds ─╯
>>>
```

Oh dear .. clearly the model does not know much about Kenmore Hills. We are going to add knowledge to our own knowledge base and retrain the model so it can give better answers. You can read about how to do this [with taxonomies here](https://github.com/instructlab/taxonomy) - it's as simple as creating a couple of text files in a git repo.

For now, you can use my repo as an example (it has some other knowledge graph examples in there as well, but for now we will focus on Kenmore Hills).

In [17]:
!git clone https://github.com/eformat/my_knowledge.git $HOME/instructlab/my_knowledge

Cloning into '/opt/app-root/src/instructlab/my_knowledge'...
remote: Enumerating objects: 54, done.
remote: Counting objects: 100% (54/54), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 54 (delta 7), reused 52 (delta 5), pack-reused 0
Receiving objects: 100% (54/54), 7.33 KiB | 7.33 MiB/s, done.
Resolving deltas: 100% (7/7), done.


Exit from the chat and serve terminals (CTRL-c and CTRL-d) and take a look at the knowledge file for Kenmore Hills. I have created some question/answer pairs and formatted them according to the [taxonomy](https://github.com/instructlab/taxonomy) instructions from InstructLab.

In [18]:
!cat $HOME/instructlab/my_knowledge/geography/locations/brisbane/qna.yaml

task_description: "Teach the model about Kenmore Hills"
domain: geography
created_by: eformat
seed_examples:
  - question: Where is Kenmore Hills?
    answer: |
      Kenmore Hills is a suburb in the City of Brisbane, Queensland, Australia. In the 2016 census, Kenmore Hills had a population of 2,402 people.
  - question: Who lives in Kenmore Hills?
    answer: |
      In the 2011 census, Kenmore Hills had a population of 2,577 people, 53.2% female and 46.8% male. The median age of the Kenmore Hills population was 44 years of age, 7 years above the Australian median. 60.5% of people living in Kenmore Hills were born in Australia, compared to the national average of 69.8%; the next most common countries of birth were England 7.5%, South Africa 4.2%, New Zealand 3%, India 2.4%, Scotland 1.3%. 81.3% of people spoke only English at home; the next most common languages were 1.9% Mandarin, 1.1% German, 1% Cantonese, 0.9% Telugu, 0.8% Afrikaans.
  - question: What schools are in Kenmore Hills?
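Before handing a new `qna.yaml` to `ilab`, a quick structural check can catch missing fields. The sketch below works on the file after it has been parsed into a dict (for example with PyYAML, which is not imported here). The list of required keys is my assumption based on the file shown above, not the official schema, so `ilab diff` remains the real check.

```python
from typing import Any, Dict, List

# Assumed from the example file above; the official taxonomy schema may differ.
REQUIRED_TOP_LEVEL = ("task_description", "created_by", "seed_examples")


def qna_problems(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if not doc.get(key):
            problems.append(f"missing or empty top-level key: {key}")
    for i, example in enumerate(doc.get("seed_examples") or []):
        for field in ("question", "answer"):
            value = example.get(field) if isinstance(example, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"seed_examples[{i}] has no {field}")
    return problems


doc = {
    "task_description": "Teach the model about Kenmore Hills",
    "domain": "geography",
    "created_by": "eformat",
    "seed_examples": [
        {
            "question": "Where is Kenmore Hills?",
            "answer": "Kenmore Hills is a suburb in the City of Brisbane, Queensland, Australia.",
        }
    ],
}
print(qna_problems(doc) or "no problems found")
```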

OK .. so the first step is to check the taxonomy structure is valid.

In [22]:
!cd $HOME/instructlab && ilab diff

Taxonomy in /taxonomy/ is valid :)


Good. Now we need to copy this into the real folder structure.

In [23]:
!cp -Ra $HOME/instructlab/my_knowledge/* $HOME/instructlab/taxonomy/knowledge/
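For reference, roughly the same copy can be done from Python. Like the shell glob `*`, this sketch skips hidden top-level entries such as `.git`, and it merges into folders that already exist under `taxonomy/knowledge/`. The function name is hypothetical and `dirs_exist_ok` needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path
from typing import List


def copy_visible_tree(src: Path, dst: Path) -> List[str]:
    """Copy every non-hidden top-level entry of src into dst, merging folders."""
    copied = []
    dst.mkdir(parents=True, exist_ok=True)
    for entry in sorted(src.iterdir()):
        if entry.name.startswith("."):
            continue  # mirrors the shell glob, which does not match dotfiles
        target = dst / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        copied.append(entry.name)
    return copied


base = Path.home() / "instructlab"
if (base / "my_knowledge").exists():
    print(copy_visible_tree(base / "my_knowledge", base / "taxonomy" / "knowledge"))
```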

Now we need to create some synthetic Q&A, based on this knowledge, to train the model with. Ideally you should check the output for valid answers and adjust as desired. The default is 100 instructions, so let's up this a little bit. This is a time and GPU intensive task as it uses the LLM to generate the synthetic Q&A.

In [None]:
!cd $HOME/instructlab && time ilab generate --num-instructions 200

I have cleared the output from the generate cell because it is long, but it should look like this:

```bash
Generating synthetic data using 'granite-7b-lab-Q4_K_M' model, taxonomy:'taxonomy' against http://127.0.0.1:57250/v1 server
INFO 2024-05-12 02:19:31,007 rouge_scorer.py:83 Using default tokenizer.
  0%|                                                   | 0/200 [00:00<?, ?it/s]Cannot find prompt.txt. Using default prompt depending on model-family.
Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
INFO 2024-05-12 02:19:31,009 generate_data.py:468 Selected taxonomy path knowledge->tech_industry->redhat
...

 99%|████████████████████████████████████████▌| 198/200 [26:12<00:15,  7.92s/it]INFO 2024-05-12 02:45:43,588 generate_data.py:468 Selected taxonomy path knowledge->tech_industry->redhat
INFO 2024-05-12 02:45:54,027 generate_data.py:468 Selected taxonomy path knowledge->geography->locations->brisbane
Q> How many kilometers is Kenmore Hills from the Brisbane Airport?
I> <no input>
A> Kenmore Hills is approximately 14 kilometers from the Brisbane Airport, which takes about 25-30 minutes to drive, depending on traffic and the route taken.

100%|████████████████████████████████████████▊| 199/200 [26:31<00:10, 10.23s/it]INFO 2024-05-12 02:46:02,058 generate_data.py:468 Selected taxonomy path knowledge->sports->rubgy
Q> What is the current Rugby World Cup standings?
I> 
A> The current Rugby World Cup standings are: 1. England, 2. South Africa, 3. New Zealand, 4. Ireland, 5. France

100%|█████████████████████████████████████████| 200/200 [26:55<00:00, 13.55s/it]INFO 2024-05-12 02:46:26,238 generate_data.py:562 200 instructions generated, 48 discarded due to format (see generated/discarded_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.log), 63 discarded due to rouge score
INFO 2024-05-12 02:46:26,238 generate_data.py:566 Generation took 1616.17s
100%|█████████████████████████████████████████| 200/200 [26:55<00:00,  8.08s/it]

real	27m2.519s
user	69m1.862s
sys	22m21.813s
```
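The two summary lines at the end of that output are handy for comparing runs. As a rough sketch, the snippet below pulls the counts and the elapsed time out of the log text with regular expressions written against the exact wording shown above (a different `ilab` version may phrase these lines differently) and works out the average time per instruction.

```python
import re
from typing import Dict

SUMMARY = re.compile(
    r"(?P<generated>\d+) instructions generated, "
    r"(?P<bad_format>\d+) discarded due to format.*?, "
    r"(?P<rouge>\d+) discarded due to rouge score"
)
TOOK = re.compile(r"Generation took (?P<seconds>[\d.]+)s")


def summarize_generate_log(log_text: str) -> Dict[str, float]:
    """Extract counts and timing from the tail of an `ilab generate` log."""
    summary = SUMMARY.search(log_text)
    took = TOOK.search(log_text)
    if summary is None or took is None:
        raise ValueError("summary lines not found in log text")
    stats: Dict[str, float] = {k: int(v) for k, v in summary.groupdict().items()}
    stats["seconds"] = float(took.group("seconds"))
    stats["seconds_per_instruction"] = round(stats["seconds"] / stats["generated"], 2)
    return stats


log_tail = (
    "INFO 2024-05-12 02:46:26,238 generate_data.py:562 200 instructions generated, "
    "48 discarded due to format (see generated/discarded_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.log), "
    "63 discarded due to rouge score\n"
    "INFO 2024-05-12 02:46:26,238 generate_data.py:566 Generation took 1616.17s\n"
)
print(summarize_generate_log(log_tail))
```

For this run the average works out to 8.08 seconds per instruction, which matches the `8.08s/it` on the final progress bar line.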

You can see with `nvtop` that the GPU is busy.

![ilab-nvtop-generate.png](images/ilab-nvtop-generate.png)

This creates a `generated` folder.

In [25]:
!ls $HOME/instructlab/generated

discarded_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.log
generated_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.json
test_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.jsonl
train_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.jsonl
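The `.jsonl` files hold one JSON record per line, so it is easy to peek at what was generated before training on it. The sketch below counts the records and lists which top-level keys they use. It makes no assumption about the field names; it just reports what it finds.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple


def summarize_jsonl(path: Path) -> Tuple[int, Counter]:
    """Return (number of records, how often each top-level key appears)."""
    keys: Counter = Counter()
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            keys.update(record.keys())
    return count, keys


generated = Path.home() / "instructlab" / "generated"
files = sorted(generated.glob("train_*.jsonl")) if generated.is_dir() else []
for train_file in files:
    total, key_counts = summarize_jsonl(train_file)
    print(train_file.name, total, dict(key_counts))
```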


We can now train our model. The training step downloads and caches the original model from Hugging Face (up to now we have been using the quantized version of the model). We give it the model name [as it appears on Hugging Face](https://huggingface.co/instructlab/granite-7b-lab). You can check that it downloads and caches the model with `ls ~/.cache/huggingface/hub/`. We also bump up the iterations from the default of 100.

If this command causes your ipython terminal to OOM, you can `oc rsh jupyter-nb-admin-0` to the notebook pod and run it from the CLI - this should work OK with the memory settings of `small` when you started the notebook.

In [None]:
!cd $HOME/instructlab && time ilab train --device=cuda --model-name instructlab/granite-7b-lab --iters 200

The output should look like this (abbreviated):

```bash
$ oc rsh jupyter-nb-admin-0
Defaulted container "jupyter-nb-admin" out of: jupyter-nb-admin, oauth-proxy
(app-root) sh-5.1$ cd instructlab/
(app-root) sh-5.1$ time ilab train --device=cuda --model-name instructlab/granite-7b-lab --iters 200
INFO 2024-05-12 05:37:35,876 config.py:58 PyTorch version 2.3.0 available.
LINUX_TRAIN.PY: NUM EPOCHS IS:  1
LINUX_TRAIN.PY: TRAIN FILE IS:  generated/train_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.jsonl
LINUX_TRAIN.PY: TEST FILE IS:  generated/test_granite-7b-lab-Q4_K_M_2024-05-12T02_19_31.jsonl
LINUX_TRAIN.PY: Using device 'cuda:0'
  NVidia CUDA version: 12.1
  AMD ROCm HIP version: n/a
  cuda:0 is 'NVIDIA L4' (21.8 GiB of 22.0 GiB free, capability: 8.9)
LINUX_TRAIN.PY: LOADING DATASETS
/opt/app-root/lib64/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
LINUX_TRAIN.PY: NOT USING 4-bit quantization
LINUX_TRAIN.PY: LOADING THE BASE MODEL
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 3/3 [00:09<00:00,  3.07s/it]
LINUX_TRAIN.PY: Model device cuda:0
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  12888 MiB |  12888 MiB |  12888 MiB |      0 B   |
|       from large pool |  12856 MiB |  12856 MiB |  12856 MiB |      0 B   |
|       from small pool |     32 MiB |     32 MiB |     32 MiB |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |  12888 MiB |  12888 MiB |  12888 MiB |      0 B   |
|       from large pool |  12856 MiB |  12856 MiB |  12856 MiB |      0 B   |
|       from small pool |     32 MiB |     32 MiB |     32 MiB |      0 B   |
|---------------------------------------------------------------------------|
| Requested memory      |  12888 MiB |  12888 MiB |  12888 MiB |      0 B   |
|       from large pool |  12856 MiB |  12856 MiB |  12856 MiB |      0 B   |
|       from small pool |     32 MiB |     32 MiB |     32 MiB |      0 B   |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  12910 MiB |  12910 MiB |  12910 MiB |      0 B   |
|       from large pool |  12876 MiB |  12876 MiB |  12876 MiB |      0 B   |
|       from small pool |     34 MiB |     34 MiB |     34 MiB |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |  21864 KiB |  21864 KiB |  46975 KiB |  25111 KiB |
|       from large pool |  20352 KiB |  20352 KiB |  20352 KiB |      0 KiB |
|       from small pool |   1512 KiB |   2047 KiB |  26623 KiB |  25111 KiB |
|---------------------------------------------------------------------------|
| Allocations           |     388    |     388    |     388    |       0    |
|       from large pool |     227    |     227    |     227    |       0    |
|       from small pool |     161    |     161    |     161    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |     388    |     388    |     388    |       0    |
|       from large pool |     227    |     227    |     227    |       0    |
|       from small pool |     161    |     161    |     161    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |     244    |     244    |     244    |       0    |
|       from large pool |     227    |     227    |     227    |       0    |
|       from small pool |      17    |      17    |      17    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       4    |       4    |      20    |      16    |
|       from large pool |       3    |       3    |       3    |       0    |
|       from small pool |       1    |       2    |      17    |      16    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|

LINUX_TRAIN.PY: SANITY CHECKING THE BASE MODEL
LINUX_TRAIN.PY: GETTING THE ATTENTION LAYERS
LINUX_TRAIN.PY: CONFIGURING LoRA
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/opt/app-root/lib64/python3.9/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
LINUX_TRAIN.PY: TRAINING
100%|███████████████████████████████████████████████████████████████████████| 200/200 [00:42<00:00,  4.48it/s]Checkpoint destination directory ./training_results/checkpoint-200 already exists and is non-empty. Saving will proceed but saved results may be invalid.
/opt/app-root/lib64/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
{'train_runtime': 42.557, 'train_samples_per_second': 4.7, 'train_steps_per_second': 4.7, 'train_loss': 0.6415711975097657, 'epoch': 1.0}
100%|███████████████████████████████████████████████████████████████████████| 200/200 [00:42<00:00,  4.70it/s]
LINUX_TRAIN.PY: RUNNING INFERENCE ON THE OUTPUT MODEL

===
test 0
===


===
user
===

Where is Kenmore Hills?

===
assistant_old
===

Kenmore Hills is a residential neighborhood located in the southwestern part of Perth, the capital city of Western Australia. It is situated approximately 10 kilometers (6.2 miles) south of Perth's central business district, and is bordered by several major roads, including Leach Highway, Reid Highway, and the South Western Highway.

Kenmore Hills is a relatively new suburb, with most of its development taking place in the late 20th century. It is known for its picturesque parklands, including Kenmore Park, and offers a range of housing options, from low-rise apartments to large family homes. The suburb is also home to several schools, shops, and recreational facilities, making it a popular choice for families and professionals alike.

If you're planning a visit to Kenmore Hills, you'll find it easy to access the suburb from various parts of Perth. The Freeway, a major highway in Western Australia, provides easy access to the suburb from the city and other major destinations. Additionally, there are several bus services that run through Kenmore Hills, connecting it to the wider Perth metropolitan area

===
assistant_new
===

Kenmore Hills is a suburb located in the Brisbane district of Queensland, Australia. It is situated approximately 7 kilometers west of the Brisbane CBD, offering a convenient location for residents to access city amenities while still enjoying a more suburban lifestyle. The suburb is well-connected by public transportation, with several bus routes serving the area and the nearby Kenmore railway station providing easy access to other parts of Brisbane. Kenmore Hills boasts a variety of housing options, ranging from family homes to apartments, making it an attractive choice for people of all ages and stages of life. The suburb also features several parks, reserves, and recreational facilities, ensuring there's always something to do or explore. If you have any specific questions about Kenmore Hills or its surrounding areas, please let me know, and I'd be happy to help!

===
assistant_expected
===

Kenmore Hills is a suburb in the City of Brisbane, Queensland, Australia. In the 2016 census, Kenmore Hills had a population of 2,402 people.

...

LINUX_TRAIN.PY: MERGING ADAPTERS
LINUX_TRAIN.PY: FINISHED
Copied  ./training_results/checkpoint-200/added_tokens.json to  ./training_results/final
Copied  ./training_results/checkpoint-200/special_tokens_map.json to  ./training_results/final
Copied  ./training_results/checkpoint-200/tokenizer.json to  ./training_results/final
Copied  ./training_results/checkpoint-200/tokenizer.model to  ./training_results/final
Copied  ./training_results/checkpoint-200/tokenizer_config.json to  ./training_results/final
Copied  ./training_results/merged_model/config.json to  ./training_results/final
Copied  ./training_results/merged_model/generation_config.json to  ./training_results/final
Copied  ./training_results/merged_model/model-00001-of-00003.safetensors to  ./training_results/final
Copied  ./training_results/merged_model/model-00002-of-00003.safetensors to  ./training_results/final
Copied  ./training_results/merged_model/model-00003-of-00003.safetensors to  ./training_results/final
Loading model file training_results/final/model-00001-of-00003.safetensors
Loading model file training_results/final/model-00001-of-00003.safetensors
Loading model file training_results/final/model-00002-of-00003.safetensors
Loading model file training_results/final/model-00003-of-00003.safetensors
params = Params(n_vocab=32008, n_embd=4096, n_layer=32, n_ctx=2048, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('training_results/final'))
Found vocab files: {'spm': PosixPath('training_results/final/tokenizer.model'), 'bpe': None, 'hfft': PosixPath('training_results/final/tokenizer.json')}
Loading vocab file PosixPath('training_results/final/tokenizer.model'), type 'spm'
Vocab info: <SentencePieceVocab with 32000 base tokens and 5 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 32000, 'unk': 0, 'pad': 32000}, add special tokens {'bos': False, 'eos': False}>
Permuting layer 0
Permuting layer 1
...
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight                        -> token_embd.weight                        | BF16   | [32008, 4096]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | BF16   | [4096]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | BF16   | [4096, 11008]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | BF16   | [11008, 4096]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | BF16   | [11008, 4096]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | BF16   | [4096]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | BF16   | [4096, 4096]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | BF16   | [4096, 4096]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | BF16   | [4096, 4096]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | BF16   | [4096, 4096]
model.layers.1.input_layernorm.weight            -> blk.1.attn_norm.weight                   | BF16   | [4096]
...

Writing training_results/final/ggml-model-f16.gguf, format 1
Padding vocab with 3 token(s) - <dummy00001> through <dummy00003>
gguf: This GGUF file is for Little Endian only
gguf: Setting special token type bos to 1
gguf: Setting special token type eos to 32000
gguf: Setting special token type unk to 0
gguf: Setting special token type pad to 32000
gguf: Setting add_bos_token to False
gguf: Setting add_eos_token to False
gguf: Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>'+ '
' + message['content'] + '
'}}{% elif message['role'] == 'user' %}{{'<|user|>' + '
' + message['content'] + '
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>' + '
' + message['content'] + '<|endoftext|>' + ('' if loop.last else '
')}}{% endif %}{% endfor %}
[  1/291] Writing tensor token_embd.weight                      | size  32008 x   4096  | type F16  | T+   1
[  2/291] Writing tensor blk.0.attn_norm.weight                 | size   4096           | type F32  | T+   1
...

[291/291] Writing tensor output_norm.weight                     | size   4096           | type F32  | T+ 127
Wrote training_results/final/ggml-model-f16.gguf

real	15m2.636s
user	7m44.474s
sys	1m10.844s
(app-root) sh-5.1$ 
```
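The numbers in that log hang together nicely. As a back-of-the-envelope sketch, the dimensions from the `params = Params(...)` line can be used to estimate the parameter count and the memory the BF16 weights need. This assumes a Llama-style layout with the nine tensors per layer listed in the log, plus a token embedding, a final norm, and a separate (untied) output head. The last two are my assumption, supported by the 291 tensor total.

```python
# Dimensions copied from the `params = Params(...)` line in the training log.
N_VOCAB, N_EMBD, N_LAYER, N_FF = 32008, 4096, 32, 11008
BYTES_PER_PARAM = 2  # BF16, as shown in the tensor listing


def per_layer_params(n_embd: int, n_ff: int) -> int:
    attention = 4 * n_embd * n_embd  # q, k, v and output projections
    mlp = 3 * n_embd * n_ff          # gate, up and down projections
    norms = 2 * n_embd               # attention norm and ffn norm
    return attention + mlp + norms


def total_params(n_vocab: int, n_embd: int, n_layer: int, n_ff: int) -> int:
    embedding = n_vocab * n_embd
    output_head = n_vocab * n_embd   # assumed untied from the embedding
    final_norm = n_embd
    return n_layer * per_layer_params(n_embd, n_ff) + embedding + output_head + final_norm


tensor_count = N_LAYER * 9 + 3       # 9 tensors per layer + embedding, norm, head
params = total_params(N_VOCAB, N_EMBD, N_LAYER, N_FF)
weights_mib = params * BYTES_PER_PARAM / 2**20

print(f"tensors: {tensor_count}")
print(f"parameters: {params:,}")
print(f"BF16 weights: {weights_mib:.0f} MiB")
```

This gives 291 tensors, matching the `[291/291]` counter, and about 12,853 MiB of weights, which is within a few MiB of the 12856 MiB large pool in the CUDA memory summary. That leaves roughly 9 GiB of the L4's 22 GiB for LoRA adapters, activations and optimizer state.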

Now we are good to try out our serve and chat again on the new model. Let's see if it does any better on the subject of Kenmore Hills !

The new model will be saved under the `models/` folder.

(1) serve the model

ilab serve --model-path $HOME/instructlab/models/ggml-model-f16.gguf

(2) chat with the new model

```bash
(app-root) ilab chat
╭────────────────────────────────────────────────── system ───────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB-Q4_K_M (type /h for help)                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> is kenmore hills in sydney ?                                                                                                                              [S][default]
╭────────────────────────────────────────────────── granite-7b-lab-Q4_K_M ────────────────────────────────────╮
│ Yes, Kenmore Hills is indeed located in the city of Brisbane, which is the capital city of the Australian   |
| state of Queensland, not Sydney. Brisbane is approximately 725 kilometers (450 miles) north of Sydney, so it|
| is not located in Sydney. Kenmore Hills is a suburb of Brisbane, known for its picturesque views of the city|
| skyline and the Brisbane River. It is a popular residential area with a growing population, offering        |
| convenient access to various amenities, including schools, parks, shopping centers, and public              |
| transportation.                                                                                             │
╰───────────────────────────────────────────────────────────────────────────────────── elapsed 7.687 seconds ─╯
>>> where is kenmore hills ?                                                                                                                                  [S][default]
╭────────────────────────────────────────────────── granite-7b-lab-Q4_K_M ────────────────────────────────────╮
│ Kenmore Hills is a suburb of Brisbane, Queensland, Australia, situated in the City of Brisbane local        |
| government area. It is approximately 12 kilometers (7.5 miles) west of the Brisbane CBD (Central Business   |
| District) and is well-connected to various parts of the city through the Western Freeway and the Logan      |
| Motorway. The suburb is nestled among the Brisbane Ranges, offering picturesque views of the city skyline   |
| and the Brisbane River. Kenmore Hills boasts several parks, reserves, and schools, making it an attractive  |
| location for families and professionals alike.                                                              │
╰───────────────────────────────────────────────────────────────────────────────────── elapsed 8.644 seconds ─╯
>>>
```

🥳🥳🥳 WHOOP, success - the model appears to have learnt a little more about Kenmore Hills!
