# Finetune Your Chatbot on a Single-Node Xeon SPR

NeuralChat is a customizable chat framework designed to help users create their own chatbot within a few minutes on multiple architectures. This notebook shows how to finetune your chatbot on customized data on a single-node Xeon SPR (Sapphire Rapids) machine.

## Prepare Environment

Install Intel Extension for Transformers:

In [1]:
!pip install intel-extension-for-transformers

Defaulting to user installation because normal site-packages is not writeable


Install Requirements:

In [3]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

fatal: destination path 'intel-extension-for-transformers' already exists and is not an empty directory.


In [2]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../

/home/u41a5ebdbca214faa0191cc237aed1b9/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/docs/notebooks/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat


Defaulting to user installation because normal site-packages is not writeable
/home/u41a5ebdbca214faa0191cc237aed1b9/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/docs/notebooks


In [3]:
!huggingface-cli login

/bin/bash: line 1: huggingface-cli: command not found


In [5]:
!pip install -q diffusers transformers accelerate peft

In [6]:
from huggingface_hub import notebook_login
notebook_login()

In [1]:
import torch
print(torch.cuda.is_available())

False
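
Since `torch.cuda.is_available()` prints `False`, the finetuning in this notebook runs on the CPU. The training cells pass `bf16=True`, which on Xeon SPR is typically backed by the AMX and AVX-512 BF16 instructions. The following is a minimal sketch for checking whether the host advertises them, assuming a Linux machine that exposes CPU flags in `/proc/cpuinfo`. The helper names `bf16_cpu_flags` and `read_cpu_flags` are hypothetical and not part of NeuralChat.

```python
# Hypothetical helpers: report which BF16-related CPU flags are present.
BF16_FLAGS = ("amx_bf16", "amx_tile", "avx512_bf16")


def bf16_cpu_flags(cpuinfo_text):
    """Return the BF16-related flags found in the text of /proc/cpuinfo."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line such as "flags : fpu vme ... amx_bf16 ..."
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            found.update(set(values.split()) & set(BF16_FLAGS))
    return sorted(found)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the running machine (Linux only); empty list on failure."""
    try:
        with open(path, encoding="utf-8") as handle:
            return bf16_cpu_flags(handle.read())
    except OSError:
        return []
```

Calling `print(read_cpu_flags())` in a notebook cell lists the detected flags. An empty list suggests that bf16 training may be slow or unsupported on that host, although older kernels may simply not report the AMX flags.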


## Prepare the Dataset
We select three kinds of datasets to finetune the model for different tasks.

1. Text Generation (general-domain instructions): We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general-domain dataset to finetune the model. This dataset is provided as a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). To build Alpaca, researchers manually crafted 175 seed tasks and used them to guide `text-davinci-003` in generating 52K instruction-following examples for diverse tasks.

2. Summarization: We use [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail), an English-language dataset containing just over 300k unique news articles written by journalists at CNN and the Daily Mail.

3. Code Generation: To improve the code generation performance of large language models (LLMs), we use the [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) dataset.
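
The training logs in this notebook report between a few hundred and a few thousand examples per task, so the local files passed as `train_file` appear to be subsets of the full datasets. How those subsets were produced is not shown. A minimal sketch of one way to subsample an Alpaca-format file follows; the function names, the sample size, and the seed are assumptions for illustration, and the record fields (`instruction`, `input`, `output`) are those of the Alpaca JSON format.

```python
import json
import random


def subsample_alpaca(records, n, seed=42):
    """Return up to n randomly chosen records that have the Alpaca fields."""
    required = {"instruction", "input", "output"}
    valid = [r for r in records if required <= set(r)]
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(valid, min(n, len(valid)))


def write_subset(src_path, dst_path, n, seed=42):
    """Read an Alpaca-format JSON file and write a smaller one."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    subset = subsample_alpaca(records, n, seed)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(subset, dst, ensure_ascii=False, indent=2)
    return len(subset)
```

For example, `write_subset("alpaca_data.json", "alpacadata.json", 3000)` would write 3,000 records (a hypothetical size) to the file name used in the text generation cell.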



## Finetune Your Chatbot

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.
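
LoRA freezes the pretrained weights and trains two small low-rank matrices per adapted layer, so a weight of shape `d_out x d_in` contributes only `r * (d_in + d_out)` trainable parameters. The sketch below works through that arithmetic for Llama-2-7B (hidden size 4096, 32 layers, as in the model config printed in this notebook). The rank `r = 8` and the choice of `q_proj` and `v_proj` as target modules are assumptions: they are not stated in this notebook, but they reproduce the trainable-parameter count reported in the training logs.

```python
def lora_trainable_params(d_in, d_out, rank, modules_per_layer, num_layers):
    """Count LoRA parameters: A is (rank x d_in), B is (d_out x rank)."""
    per_module = rank * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


# Assumed LoRA settings (not shown in the notebook): rank 8 on q_proj and v_proj.
trainable = lora_trainable_params(
    d_in=4096, d_out=4096, rank=8, modules_per_layer=2, num_layers=32
)
all_params = 6_742_609_920  # "all params" value from the training log

print(f"trainable params: {trainable:,}")  # trainable params: 4,194,304
print(f"trainable%: {100 * trainable / all_params:.4f}")  # trainable%: 0.0622
```

Only about 0.06% of the parameters receive gradients, which is what makes finetuning a 7B model on a single CPU node practical.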

Finetune the model on an Alpaca-format dataset for the text generation task:

In [1]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model

model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
data_args = DataArguments(train_file="alpacadata.json", validation_split_percentage=1)
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True,
)

finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)

finetune_model(finetune_cfg)


2024-07-05 22:32:36.490624: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-05 22:32:37.127035: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-05 22:32:37.289934: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-05 22:32:37.290020: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-05 22:32:37.453875: I tensorflow/core/platform/cpu_feature_gua

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4280] 2024-07-05 22:33:26,204 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-05 22:33:26,206 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:917] 2024-07-05 22:33:26,313 >> loading configuration file generation_config.json from cache at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-05 22:33:26,314 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9

trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199


[INFO|trainer.py:641] 2024-07-05 22:33:26,938 >> Using cpu_amp half precision backend
[INFO|trainer.py:2078] 2024-07-05 22:33:27,186 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-07-05 22:33:27,187 >>   Num examples = 2,940
[INFO|trainer.py:2080] 2024-07-05 22:33:27,188 >>   Num Epochs = 3
[INFO|trainer.py:2081] 2024-07-05 22:33:27,190 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2084] 2024-07-05 22:33:27,191 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2085] 2024-07-05 22:33:27,192 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2086] 2024-07-05 22:33:27,193 >>   Total optimization steps = 1,104
[INFO|trainer.py:2087] 2024-07-05 22:33:27,196 >>   Number of trainable parameters = 4,194,304


| Step | Training Loss |
|------|---------------|
| 500  | 1.1725        |
| 1000 | 1.1175        |


[INFO|trainer.py:2329] 2024-07-06 02:08:26,885 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3410] 2024-07-06 02:08:26,894 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2513] 2024-07-06 02:08:26,921 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-06 02:08:26,925 >> Special tokens file saved in ./tmp/special_tokens_map.json
2024-07-06 02:08:26,948 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - *** Evaluate After Training***
[INFO|trainer.py:3719] 2024-07-06 02:08:26,955 >> ***** Running Evaluation *****
[INFO|trainer.py:3721] 2024-07-06 02:08:26,956 >>   Num examples = 30
[INFO|trainer.py:3724] 2024-07-06 02:08:26,957 >>   Batch size = 8


***** eval metrics *****
  epoch                   =        3.0
  eval_loss               =     1.1063
  eval_ppl                =     3.0231
  eval_runtime            = 0:00:13.99
  eval_samples            =         30
  eval_samples_per_second =      2.144
  eval_steps_per_second   =      0.286
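
The `Total optimization steps` value in the log above follows from the dataset size and the batch settings. A small sketch of that arithmetic for single-device training is shown below. The function name is hypothetical, and the handling of gradient accumulation reflects our understanding of the Hugging Face `Trainer` rather than a guaranteed formula; with `gradient_accumulation_steps=1`, as used here, it reduces to `ceil(num_examples / batch_size) * num_epochs`.

```python
import math


def total_optimization_steps(num_examples, batch_size, num_epochs, grad_accum_steps=1):
    """Estimate the number of optimizer steps for single-device training."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # With accumulation, several batches are combined into one optimizer step.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


# 2,940 examples, batch size 8, 3 epochs, as logged above.
print(total_optimization_steps(2940, 8, 3))  # 1104
```

The last batch of each epoch is only half full (2,940 is not a multiple of 8), which is why the result is 368 steps per epoch rather than 367.5.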


Finetune the model on the summarization task:

In [1]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
data_args = DataArguments(train_file="cnndailymail.csv", validation_split_percentage=1)
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments(task='summarization')
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)
finetune_model(finetune_cfg)

2024-07-08 13:19:06.916606: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-08 13:19:08.323295: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-08 13:19:08.640094: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-08 13:19:08.640254: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-08 13:19:08.932555: I tensorflow/core/platform/cpu_feature_gua

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4280] 2024-07-08 13:20:40,188 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-08 13:20:40,190 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:917] 2024-07-08 13:20:40,292 >> loading configuration file generation_config.json from cache at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-08 13:20:40,294 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9

trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199


[INFO|trainer.py:804] 2024-07-08 13:20:41,298 >> The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: decoder_labels, decoder_input_ids, decoder_attention_mask. If decoder_labels, decoder_input_ids, decoder_attention_mask are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
[INFO|trainer.py:2078] 2024-07-08 13:20:41,337 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-07-08 13:20:41,338 >>   Num examples = 545
[INFO|trainer.py:2080] 2024-07-08 13:20:41,338 >>   Num Epochs = 3
[INFO|trainer.py:2081] 2024-07-08 13:20:41,339 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2084] 2024-07-08 13:20:41,339 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2085] 2024-07-08 13:20:41,339 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2086] 2024-07-08 13:20:41,340 >>   Total optimization steps = 207
[INFO|trai

Step,Training Loss


[INFO|trainer.py:2329] 2024-07-08 16:30:13,397 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3410] 2024-07-08 16:30:13,403 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2513] 2024-07-08 16:30:13,460 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-08 16:30:13,477 >> Special tokens file saved in ./tmp/special_tokens_map.json
2024-07-08 16:30:13,505 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - *** Evaluate After Training***
[INFO|trainer.py:804] 2024-07-08 16:30:13,507 >> The following columns in the evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: decoder_labels, decoder_input_ids, decoder_attention_mask. If decoder_labels, decoder_input_ids, decoder_attention_mask are not expected by `PeftModelForCausalLM.forward`,  you can safely ign

***** eval metrics *****
  epoch                   =        3.0
  eval_loss               =      1.113
  eval_ppl                =     3.0435
  eval_runtime            = 0:00:12.09
  eval_samples            =          6
  eval_samples_per_second =      0.496
  eval_steps_per_second   =      0.083


Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

2024-07-08 16:33:12,750 - chatbot.py - intel_extension_for_transformers.neural_chat.chatbot - ERROR - Exception: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/home/u41a5ebdbca214faa0191cc237aed1b9/nltk_data'
    - '/home/u41a5ebdbca214faa0191cc237aed1b9/.conda/envs/itrex-l/nltk_data'
    - '/home/u41a5ebdbca214faa0191cc237aed1b9/.conda/envs/itrex-l/share/nltk_data'
    - '/home/u41a5ebdbca214faa0191cc237aed1b9/.conda/envs/itrex-l/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

2024-07-08 16:33:
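
The exception above indicates that a step after the summarization evaluation needs the NLTK `punkt` tokenizer data, which was not installed in this environment. Following the hint in the error message, a minimal sketch that downloads the resource only when it is missing could look like this. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical; `nltk.data.find` and `nltk.download` are the NLTK calls it relies on by default.

```python
def ensure_nltk_resource(resource_path="tokenizers/punkt", package="punkt",
                         find=None, download=None):
    """Download an NLTK package only if its resource lookup fails.

    Returns True if a download was triggered, False if the resource was present.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return False  # resource already available
    except LookupError:
        download(package)
        return True
```

Running `ensure_nltk_resource()` in a cell before `finetune_model(finetune_cfg)` should avoid the `Resource punkt not found` error on a machine with network access.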

Finetune the model on the code generation task:

In [8]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
data_args = DataArguments(train_file="evolcodealpaca.csv", validation_split_percentage=1)
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments(task='code-generation')
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)
finetune_model(finetune_cfg)

[INFO|training_args.py:2052] 2024-07-07 23:30:43,899 >> PyTorch: setting up devices
[INFO|training_args.py:1727] 2024-07-07 23:30:43,901 >> The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args.py:2052] 2024-07-07 23:30:43,913 >> PyTorch: setting up devices
distributed training: True, 16-bits training: True
2024-07-07 23:30:43,916 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - Training/evaluation parameters TrainingArguments(
_n_gpu=0,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_siz

Generating train split: 0 examples [00:00, ? examples/s]

Unable to verify splits sizes.
2024-07-07 23:30:44,622 - info_utils.py - datasets.utils.info_utils - INFO - Unable to verify splits sizes.
Dataset csv downloaded and prepared to /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52. Subsequent calls will reuse this data.
2024-07-07 23:30:44,627 - builder.py - datasets.builder - INFO - Dataset csv downloaded and prepared to /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52. Subsequent calls will reuse this data.
Using custom data configuration default-c98437dc1a94c717
2024-07-07 23:30:44,878 - builder.py - datasets.builder - INFO - Using custom data configuration default-c98437dc1a94c717
Loading Dataset Infos from /home/u41a5ebdbca214faa0191cc237aed1b9/.conda/envs/itrex-l/lib/python3.10/site-packages/datasets/p

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4280] 2024-07-07 23:30:49,133 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-07 23:30:49,134 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:917] 2024-07-07 23:30:49,240 >> loading configuration file generation_config.json from cache at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-07 23:30:49,242 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9

Map:   0%|          | 0/394 [00:00<?, ? examples/s]

Caching processed dataset at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52/cache-7ead664883c9cfe2.arrow
2024-07-07 23:30:52,607 - arrow_dataset.py - datasets.arrow_dataset - INFO - Caching processed dataset at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52/cache-7ead664883c9cfe2.arrow


Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Caching processed dataset at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52/cache-7b8dda294857f4df.arrow
2024-07-07 23:30:52,817 - arrow_dataset.py - datasets.arrow_dataset - INFO - Caching processed dataset at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/datasets/csv/default-c98437dc1a94c717/0.0.0/9ea1179385ff7ad1e756d327ffccaa3b801175702a2d91528226ba2c66873f52/cache-7b8dda294857f4df.arrow
2024-07-07 23:30:52,835 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - Using data collator of type DataCollatorForSeq2Seq
[INFO|trainer.py:641] 2024-07-07 23:30:53,007 >> Using cpu_amp half precision backend


trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199


[INFO|trainer.py:2078] 2024-07-07 23:30:53,544 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-07-07 23:30:53,544 >>   Num examples = 394
[INFO|trainer.py:2080] 2024-07-07 23:30:53,545 >>   Num Epochs = 3
[INFO|trainer.py:2081] 2024-07-07 23:30:53,545 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2084] 2024-07-07 23:30:53,546 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2085] 2024-07-07 23:30:53,546 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2086] 2024-07-07 23:30:53,546 >>   Total optimization steps = 150
[INFO|trainer.py:2087] 2024-07-07 23:30:53,550 >>   Number of trainable parameters = 4,194,304


Step,Training Loss


[INFO|trainer.py:2329] 2024-07-08 01:32:14,301 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3410] 2024-07-08 01:32:14,309 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2513] 2024-07-08 01:32:14,380 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-08 01:32:14,394 >> Special tokens file saved in ./tmp/special_tokens_map.json
2024-07-08 01:32:14,421 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - *** Evaluate After Training***
[INFO|trainer.py:3719] 2024-07-08 01:32:14,434 >> ***** Running Evaluation *****
[INFO|trainer.py:3721] 2024-07-08 01:32:14,435 >>   Num examples = 4
[INFO|trainer.py:3724] 2024-07-08 01:32:14,436 >>   Batch size = 8


***** eval metrics *****
  epoch                   =        3.0
  eval_loss               =     1.0583
  eval_ppl                =     2.8814
  eval_runtime            = 0:00:11.66
  eval_samples            =          4
  eval_samples_per_second =      0.343
  eval_steps_per_second   =      0.086
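
The `eval_ppl` values reported for the three runs are the exponential of the corresponding `eval_loss` (the mean cross-entropy per token, in nats). The sketch below checks this against the logged numbers. Small differences in the last digit are expected because the logged losses are rounded.

```python
import math


def perplexity(eval_loss):
    """Perplexity is the exponential of the mean token-level cross-entropy."""
    return math.exp(eval_loss)


# (eval_loss, eval_ppl) pairs copied from the three eval metric blocks above.
logged = {
    "text generation": (1.1063, 3.0231),
    "summarization": (1.113, 3.0435),
    "code generation": (1.0583, 2.8814),
}

for task, (loss, ppl) in logged.items():
    print(f"{task}: exp({loss}) = {perplexity(loss):.4f} (logged {ppl})")
```

Because perplexity is a monotonic function of the loss, either metric can be used to compare runs on the same evaluation set. The three runs here use different datasets, so their values are not directly comparable.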


In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base checkpoint and its tokenizer.
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def interact_with_chatbot(query):
    """Generate a single response for the given query."""
    # Tokenize the query into a tensor of input IDs.
    inputs = tokenizer.encode(query, return_tensors="pt")
    # max_length counts the prompt tokens plus the generated tokens.
    response = model.generate(inputs, max_length=500, num_return_sequences=1, no_repeat_ngram_size=2)
    # The decoded text contains the prompt followed by the continuation.
    return tokenizer.decode(response[0], skip_special_tokens=True)

query = "The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. "
response = interact_with_chatbot(query)
print("Chatbot response:", response)


[INFO|configuration_utils.py:733] 2024-07-08 16:39:46,432 >> loading configuration file config.json from cache at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/config.json
[INFO|configuration_utils.py:796] 2024-07-08 16:39:46,435 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "tran

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4280] 2024-07-08 16:40:00,769 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-08 16:40:00,771 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:917] 2024-07-08 16:40:00,876 >> loading configuration file generation_config.json from cache at /home/u41a5ebdbca214faa0191cc237aed1b9/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-08 16:40:00,878 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9

Chatbot response: The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists.  (10 points)

Here is the current script:
```
def print_list(list):
  for element in list:  # Bug is in this line
    print(element) 
  
list = [1, 2,  3,4,5]
print_List( list )
 ```
Please modify it to fix the bug and use a 'whiles' loops instead.
Thank you!
(15 points total)
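
Note that the cell above loads the original `meta-llama/Llama-2-7b-chat-hf` checkpoint, so the response does not reflect the finetuning. All three runs write to `./tmp`, so that directory holds the output of the most recent run. Assuming it contains a LoRA adapter saved in the standard PEFT format, a minimal sketch for attaching it to the base model could look like the following. The function name is hypothetical, and whether NeuralChat provides its own loader for this is not covered here.

```python
def load_finetuned_chatbot(base_model_name="meta-llama/Llama-2-7b-chat-hf",
                           adapter_dir="./tmp"):
    """Load the base model and attach the LoRA adapter stored in adapter_dir."""
    # Imported lazily because these libraries are heavy to import.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
    # Assumption: adapter_dir holds adapter_config.json and the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Assigning the result with `model, tokenizer = load_finetuned_chatbot()` lets the generation code above be reused. Depending on the PEFT version, `generate` may need the prompt passed as a keyword argument (`input_ids=inputs`).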
