<a href="https://www.kaggle.com/code/schock/training-tinyllama-for-tool-calling?scriptVersionId=199135162" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Training TinyLlama for Tool-calling

In [1]:
# Install a pinned package without dependencies first, then again so pip can pull in any that are missing
def safe_install(package):
    !pip install --no-deps -q {package}
    !pip install -q {package}

# List of packages to install
packages = [
    "accelerate==0.34.2",
    "bitsandbytes==0.44.1",
    "datasets==2.16.0", # https://github.com/huggingface/datasets/issues/6753
    "evaluate==0.4.3",
    "fsspec==2023.10.0", # https://github.com/huggingface/datasets/issues/6753
    "gcsfs==2023.10.0", # https://github.com/huggingface/datasets/issues/6753
    # "ipykernel==6.29.5",
    "ipywidgets==8.1.5",
    "jupyter==1.0.0",
    "mlflow==2.16.2",
    "openai==1.50.2",
    "peft==0.13.0",
    "scipy==1.14.1",
    "torch==2.4.1",
    "transformers==4.45.1",
    "wheel==0.44.0",
    "dill==0.3.7",
    "packaging==23.1",
    "pyarrow==14.0.1"
]

# Install packages
for package in packages:
    safe_install(package)

# Verify installations
#!pip freeze

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.8.3 requires cubinlinker, which is not installed.
cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.8.3 requires ptxcompiler, which is not installed.
cuml 24.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.8.3 requires cupy-cuda11x>=12.0.0, which is not installed.
apache-beam 2.46.0 requires cloudpickle~=2.2.1, but you have cloudpickle 3.0.0 which is incompatible.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.7 which is incompatible.
apache-beam 2.46.0 requires numpy<1.25.0,>=1.14.3, but you have numpy 1.26.4 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 16.1.0 which is incompatible.
bigframes 0.22.0 requires google-cloud-bigquery[bqstorage,pandas]>=3.10.0, but you hav

In [2]:
import gc
import os
import pandas as pd
import warnings

import torch
from datasets import Dataset, DatasetDict, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

In [3]:
HF_TOKEN = os.environ.get("HF_TOKEN", None)
SEED = 42

In [4]:
model_name_input = "TinyLlama-1.1B-Chat-v1.0"
model_name_output = f"{model_name_input}-ft"
model_org_input = "TinyLlama"
model_org_output = "mjschock"
pretrained_model_name_or_path = f"{model_org_input}/{model_name_input}"

In [5]:
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
# os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
# os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True,max_split_size_mb:128'

warnings.filterwarnings('ignore')

def fix_torch_seed(seed=SEED):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

fix_torch_seed()

In [6]:
# https://docs.anthropic.com/en/docs/build-with-claude/tool-use
# https://github.com/abetlen/llama-cpp-python/blob/c032fc65b0873337ed39e5d63e15468a5d797646/llama_cpp/llama_chat_format.py#L3387
# https://github.com/Mozilla-Ocho/llamafile/blob/66a84d8aea2990895fc4f64786406fea64e79197/llama.cpp/server/server.cpp#L480 (need <|im_start|> b/c Mozilla)
# https://github.com/openai/openai-python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md
# https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#prompt
# https://huggingface.co/blog/unified-tool-use
chat_template = """{%- set system_message_present = messages | selectattr('role', 'equalto', 'system') | list -%}
{%- if not system_message_present -%}
    {%- set messages = [{ "content": "You are an AI agent acting as a human assistant.", "role": "system" }] + messages -%}
{%- endif -%}

{% for message in messages %}
<|im_start|>{{ message.role }}
{% if message.role == 'system' %}
{{ message.content }}

You are aware of the following tools in your environment:
{
  "tools": [
  {% for tool in tools %}
    {
      "function": {
        "description": "{{ tool.function.description }}",
        "name": "{{ tool.function.name }}",
        "parameters": {{ tool.function.parameters | tojson }}
      },
      "type": "{{ tool.type }}"
    }{% if not loop.last %},{% endif %}

  {% endfor %}
  ]
}

You must respond with a single JSON object in this exact format:
{
  "content": "Optional text goes here and only here.",
  "tool_calls": [
    {
      "arguments": {"param1": "value1", "param2": "value2"},
      "id": "call_0",
      "name": "tool_a_name"
    },
    {
      "arguments": {"param1": "value1", "param2": "value2"},
      "id": "call_1",
      "name": "tool_b_name"
    }
  ]
}
<|im_end|>
{% elif message.role == 'user' %}
{{ message.content }}
<|im_end|>
{% elif message.role == 'assistant' %}
  {%- if message.weights | default(1) > 0 -%}
    {% generation %}
    {% if message.content %}{{ message.content }}{% endif %}
    {% if message.tool_calls %}
{
  "tool_calls": [
    {% for tool_call in message.tool_calls %}
    {
      "arguments": {{ tool_call.function.arguments | tojson }},
      "id": "{{ tool_call.id }}",
      "name": "{{ tool_call.function.name }}"
    }{% if not loop.last %},{% endif %}

    {% endfor %}
  ]
}
    {% endif %}
    {% endgeneration %}
  {%- else -%}
    {% if message.content %}{{ message.content }}{% endif %}
    {% if message.tool_calls %}
{
  "tool_calls": [
    {% for tool_call in message.tool_calls %}
    {
      "arguments": {{ tool_call.function.arguments | tojson }},
      "id": "{{ tool_call.id }}",
      "name": "{{ tool_call.function.name }}"
    }{% if not loop.last %},{% endif %}

    {% endfor %}
  ]
}
    {% endif %}
  {%- endif -%}
<|im_end|>
{% elif message.role == 'tool' %}
{
  "content": "{{ message.content }}",
  "name": "{{ message.name }}",
  "tool_call_id": "{{ message.tool_call_id }}"
}
<|im_end|>
{% endif %}
{{ eos_token }}
{% endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant' }}
{%- endif %}
"""

dspy_inspired_chat_template = """Given the fields `messages`, `tools`, produce the fields `response`.

---

Follow the following format.

<|im_start|>

Messages: ${messages}

Tools: ${tools}

Reasoning: Let's think step by step in order to produce the response.

Response: ${response}

<|im_end|>{{ eos_token }}

---

Here's an example:

<|im_start|>

Messages: [{"content": "You are an AI that controls a device. Given a command from the user, call a function to complete the request. If it cannot be completed, call the reject_request function.", "name": null, "role": "system", "tool_call_id": null, "tool_calls": null}, {"content": "Can you turn off the light?", "name": null, "role": "user", "tool_call_id": null, "tool_calls": null}]

Tools: [{"function": {"name": "set_light", "parameters": {"properties": {"status": {"enum": ["on", "off"], "type": "string"}}, "required": ["status"], "type": "object"}}, "type": "function"}, {"function": {"name": "reject_request", "parameters": {"type": "object"}}, "type": "function"}]

Reasoning: Let's think step by step in order to produce the response. The user asked to turn off the light. The tool 'set_light' allows us to set the light status to either 'on' or 'off'. Since the user wants it off, we'll call 'set_light' with 'status' set to 'off'.

Response: {"content": null, "name": null, "role": "assistant", "tool_call_id": null, "tool_calls": [{"function": {"arguments": "{\"status\": \"off\"}", "name": "set_light"}, "id": "call_id", "type": "function"}]}

<|im_end|>{{ eos_token }}

---

{% if add_generation_prompt %}

<|im_start|>

Messages: [{% for message in messages %}{% if message.role == 'assistant' %}{% generation %}{{ message | tojson }}{% endgeneration %}{% else %}{{ message | tojson }}{% endif %}{% endfor %}]

Tools: [{% for tool in tools %}{{ tool | tojson }}{% endfor %}]

Reasoning: Let's think step by step in order to produce the response. {% endif %}"""

def load_model_and_tokenizer(pretrained_model_name_or_path):
    """
    Load the model and tokenizer with 4-bit quantization.

    Args:
        pretrained_model_name_or_path (str): The model name or path to load.

    Returns:
        tuple: Loaded model and tokenizer.
    """
    # Configure 4-bit quantization
    quantization_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        # bnb_4bit_use_double_quant=False,
        load_in_4bit=True,
    )

    # Load model with 4-bit quantization
    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path,
        # attn_implementation="flash_attention_2",
        device_map="auto",
        quantization_config=quantization_config,
        torch_dtype=torch.bfloat16,
    )

    # model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path, device_map="auto", offload_buffers=True)
    # model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path, device_map="auto", offload_buffers=False)

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path,
    )

    # Set pad token if not defined
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token

    tokenizer.chat_template = chat_template
    #tokenizer.chat_template = dspy_inspired_chat_template

    return model, tokenizer
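
One detail worth checking in the template above: OpenAI-style tool calls, like the `Response` example in `dspy_inspired_chat_template`, carry `arguments` as a JSON-encoded string, while the system prompt shows `arguments` as a JSON object. Passing a string through `tojson` quotes it a second time. The sketch below uses `json.dumps` as a stand-in for the template's `tojson` filter (an assumption; the filter's exact escaping may differ) and a hypothetical helper, `render_arguments`, that decodes the string first.

```python
import json

# OpenAI-style tool call: `arguments` is a JSON-encoded string, not a dict.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "set_light", "arguments": '{"status": "off"}'},
}

# Serializing the string directly wraps it in a second layer of quoting.
double_encoded = json.dumps(tool_call["function"]["arguments"])


def render_arguments(arguments):
    """Hypothetical helper: return `arguments` serialized as a JSON object."""
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # decode the string form first
    return json.dumps(arguments)


print(double_encoded)  # a quoted, escaped string
print(render_arguments(tool_call["function"]["arguments"]))  # a plain JSON object
```

If the rendered assistant turns are meant to match the object form shown in the system prompt, decoding the arguments before templating is one way to get there.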

In [7]:
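def load_and_preprocess_data(dataset, tokenizer):
    """
    Tokenize each chat thread and build labels that cover only assistant tokens.

    Args:
        dataset (datasets.DatasetDict): Splits with `messages` and `tools` columns.
        tokenizer: Tokenizer whose chat template marks assistant spans with `{% generation %}`.

    Returns:
        datasets.DatasetDict: Splits with `attention_mask`, `input_ids`, and `labels` columns.
    """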
    def preprocess_function(examples):
        # Extract the messages from the example
        conversation = examples['messages']
        documents = examples.get('documents', [])
        tools = examples.get('tools', [])

        # Apply chat template to generate tokenized input and assistant mask
        tokenized_output = tokenizer.apply_chat_template(
            add_generation_prompt=False,
            conversation=conversation,
            documents=documents,
            max_length=4096,
            # max_length=2048,
            # padding="max_length",
            padding="longest",
            return_assistant_tokens_mask=True,
            return_dict=True,
            return_tensors="pt",
            tokenize=True,
            tools=tools,
            truncation=True,
        )

        # Extract the input IDs and assistant tokens mask
        input_ids = tokenized_output['input_ids'][0]
        assistant_masks = torch.tensor(tokenized_output['assistant_masks'])
        attention_mask = tokenized_output['attention_mask'][0]

        # Use the assistant mask to create labels
        labels = torch.where(assistant_masks == 1, input_ids, torch.tensor(-100))

        return {
            'attention_mask': attention_mask,
            'input_ids': input_ids,
            'labels': labels
        }

    # Preprocess the dataset
    # return dataset.map(preprocess_function, batched=False, remove_columns=dataset.column_names)
    return dataset.map(preprocess_function, batched=False, num_proc=1, remove_columns=dataset["train"].column_names) # TODO: use batched=True and figure out how to pass tools
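
The labelling rule in `preprocess_function` can be seen on a toy sequence. The sketch below uses plain Python lists and made-up token IDs rather than real tokenizer output; `IGNORE_INDEX = -100` is the value PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def mask_non_assistant(input_ids, assistant_mask):
    """Keep token IDs where the mask is 1; replace the rest with IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Made-up IDs: the first three tokens are the prompt, the last two the assistant reply.
input_ids = [101, 102, 103, 201, 202]
assistant_mask = [0, 0, 0, 1, 1]

labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)
```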

In [8]:
chat_threads = [
    {
        "documents": [],
        "messages": [
            {"role": "user", "content": "What's the weather like in San Francisco and New York?"},
            {"role": "assistant", "tool_calls": [
                {"id": "call_sf", "type": "function", "function": {"name": "get_current_weather", "arguments": '{"location": "San Francisco, USA", "format": "celsius"}'}},
                {"id": "call_ny", "type": "function", "function": {"name": "get_current_weather", "arguments": '{"location": "New York, USA", "format": "celsius"}'}}
            ]},
            {"role": "tool", "name": "get_current_weather", "tool_call_id": "call_sf", "content": "21.0"},
            {"role": "tool", "name": "get_current_weather", "tool_call_id": "call_ny", "content": "18.5"},
            {"role": "assistant", "content": "The current temperature in San Francisco is 21°C (70°F), while in New York it's 18.5°C (65°F)."}
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Get the current weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and country, eg. San Francisco, USA"
                            },
                            "format": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                        },
                        "required": ["location", "format"]
                    }
                }
            }
        ]
    },
]

drone_training_dataset_df = pd.read_json(
    "https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/drone_training.jsonl",
    lines=True,
)[['messages', 'tools']]

chat_threads_df = pd.concat(
    [pd.DataFrame(chat_threads), drone_training_dataset_df], # TODO: https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/data/README.md
    ignore_index=True
)

# TODO: validate schema using the OpenAI library

train_test_ds = Dataset.from_pandas(chat_threads_df).train_test_split(seed=SEED, test_size=0.2)
test_validation_ds = train_test_ds['test'].train_test_split(seed=SEED, test_size=0.5)

chat_threads_ds = DatasetDict({
    'train': train_test_ds['train'],
    'test': test_validation_ds['test'],
    'validation': test_validation_ds['train']
})

# TODO: add references to where the examples in the dataset came from to the Hub README
try:
    chat_threads_ds.push_to_hub(f"{model_org_output}/chat_threads", token=HF_TOKEN)

except Exception as e:
    print(e)

401 Client Error: Unauthorized for url: https://huggingface.co/api/repos/create (Request ID: Root=1-66fcc9f9-6acc339a0acc2dc725a622bf;7d9d9827-1f46-4e9f-9d5c-c94e9e3f8baf)

Invalid username or password.
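
The two `train_test_split` calls give roughly an 80/10/10 split. The sketch below reproduces the arithmetic in plain Python, assuming the test side is rounded up and the train side takes the remainder. That rounding rule is an assumption about the library, though it agrees with the 83/11/10 split sizes in the training cell's output. The total of 104 examples is inferred from those sizes.

```python
import math


def split_sizes(n_examples, test_size):
    """Return (n_train, n_test), rounding the test side up (assumed rule)."""
    n_test = math.ceil(n_examples * test_size)
    return n_examples - n_test, n_test


n_total = 104  # inferred: 83 + 11 + 10 from the reported split sizes
n_train, n_holdout = split_sizes(n_total, test_size=0.2)
# The held-out part is split again; its "train" side is used as validation.
n_validation, n_test = split_sizes(n_holdout, test_size=0.5)

print(n_train, n_test, n_validation)
```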


In [9]:
def get_formatted_chat(chat_thread, tokenizer):
    formatted_chat = tokenizer.apply_chat_template(
        add_generation_prompt=True,
        conversation=chat_thread.get("messages"),
        documents=chat_thread.get("documents"),
        tokenize=False,
        tools=chat_thread.get("tools"),
    )

    return formatted_chat

In [10]:
def generate_formatted_response(formatted_chat, model, tokenizer):
    inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
    inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

    outputs = model.generate(**inputs, max_new_tokens=512)

    decoded_output = tokenizer.decode(outputs[0][inputs["input_ids"].size(1):], skip_special_tokens=True)

    return decoded_output

In [11]:
# Load model and tokenizer
model, tokenizer = load_model_and_tokenizer(pretrained_model_name_or_path)

config.json:   0%|          | 0.00/608 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.20G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

In [12]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear4bit(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((2048,), eps=1e-05

In [13]:
def prepare_model_for_training(model):
    """
    Prepare the model for k-bit training with LoRA.
    
    Args:
        model: The model to prepare.
    
    Returns:
        peft.PeftModel: The prepared model.
    """
    # Prepare the model for k-bit training
    model = prepare_model_for_kbit_training(model)

    # Configure LoRA
    peft_config = LoraConfig(
        bias="none",
        # init_lora_weights="gaussian",
        lora_alpha=16,
        # lora_alpha=8,
        lora_dropout=0.1,
        # modules_to_save=["lm_head"],
        r=8,
        # target_modules=["q_proj", "v_proj"],
        # target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
        # target_modules="all-linear",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        # target_modules=["gate_proj", "up_proj", "down_proj"],
        task_type="CAUSAL_LM",
        # use_dora=True, # optional DoRA
    )

    # Apply LoRA to the model
    model = get_peft_model(model, peft_config)

    # last_layer_name = "model.layers.21"
    # last_layer_start_i = None

    # for param_i, (param_name, param) in enumerate(model.named_parameters()):
    #     if last_layer_name in param_name:
    #         last_layer_start_i = param_i

    #     if last_layer_start_i is not None and param_i >= last_layer_start_i:
    #         param.requires_grad = True

    #     else:
    #         param.requires_grad = False

    return model
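
The size of the adapter follows from the configuration above: a rank-`r` LoRA pair on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below applies that to the four attention projections and 22 decoder layers shown in the model printout. It reproduces the 2,252,800 trainable parameters that `print_trainable_parameters()` reports.

```python
# LoRA adds A (r x in_features) and B (out_features x r) to each targeted layer.
R = 8
NUM_LAYERS = 22

# (in_features, out_features) per targeted projection, from the model printout.
TARGET_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
}


def lora_param_count(shapes, r, num_layers):
    """Count adapter weights: r * (in + out) per target, repeated per layer."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
all_params = 1_102_301_184  # total reported by print_trainable_parameters()

print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / all_params:.4f}")
```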

In [14]:
chat_thread = {
    "documents": [],
    "messages": [
        {"role": "user", "content": "What's the weather like in Oakland and Atlanta?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and country, eg. San Francisco, USA"
                        },
                        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location", "format"]
                }
            }
        }
    ]
}

# chat_thread = chat_threads[0]

In [15]:
formatted_chat = get_formatted_chat(chat_thread, tokenizer)
print(formatted_chat)

<|im_start|>system
You are an AI agent acting as a human assistant.

You are aware of the following tools in your environment:
{
  "tools": [
    {
      "function": {
        "description": "Get the current weather",
        "name": "get_current_weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and country, eg. San Francisco, USA"}, "format": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["location", "format"]}
      },
      "type": "function"
    }
  ]
}

You must respond with a single JSON object in this exact format:
{
  "content": "Optional text goes here and only here."
  "tool_calls": [
    {
      "arguments": {"param1": "value1", "param2": "value2"},
      "id": "call_0",
      "name": "tool_a_name"
    },
    {
      "arguments": {"param1": "value1", "param2": "value2"},
      "id": "call_1",
      "name": "tool_b_name"
    }
  ]
}
<|im_end|>
</s>
<|im_start|>user
What's the weather

In [16]:
print(generate_formatted_response(formatted_chat, model, tokenizer))

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



I do not have access to real-time weather data. However, according to the given text, the weather in oakland and atlanta is mentioned as "optional text goes here and only here."
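
The reply above is prose rather than the JSON object the system prompt asks for, so downstream code should not assume the output parses. A minimal sketch of a tolerant parser follows. `parse_assistant_reply` is a hypothetical helper, and the accepted shape (an object with optional `content` and an optional `tool_calls` list) is taken from the prompt's format description.

```python
import json


def parse_assistant_reply(text):
    """Hypothetical helper: return {"content", "tool_calls"} or None if the
    text does not contain a JSON object in the expected shape."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        reply = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(reply, dict):
        return None
    tool_calls = reply.get("tool_calls") or []
    if not isinstance(tool_calls, list):
        return None
    return {"content": reply.get("content"), "tool_calls": tool_calls}


prose = "I do not have access to real-time weather data."
structured = """
{
  "tool_calls": [
    {"arguments": {"location": "Oakland, USA", "format": "celsius"},
     "id": "call_0", "name": "get_current_weather"}
  ]
}
"""

print(parse_assistant_reply(prose))  # None
print(parse_assistant_reply(structured)["tool_calls"][0]["name"])
```

Returning `None` instead of raising lets the caller decide whether to retry, fall back to plain text, or count the reply as a format failure during evaluation.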


In [17]:
# Load and preprocess data
dataset = load_dataset("mjschock/chat_threads")
tokenized_dataset = load_and_preprocess_data(dataset, tokenizer)

# Prepare model for training
model = prepare_model_for_training(model)

# Print trainable parameters
model.print_trainable_parameters()

# Define training arguments
training_args = TrainingArguments(
    adam_beta2=0.999,
    auto_find_batch_size=True,
    bf16=True,
    dataloader_pin_memory=False,
    eval_on_start=True,
    eval_steps=200,
    # eval_strategy="epoch",
    eval_strategy="steps",
    # fp16=True,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    # learning_rate=1e-4,
    learning_rate=2e-5,
    logging_dir="./logs",
    logging_steps=10,
    # logging_steps=250,
    load_best_model_at_end=True,
    num_train_epochs=10,
    optim="adamw_hf",
#     optim="adamw_torch",
    output_dir="./checkpoints",
    per_device_eval_batch_size=1,
    per_device_train_batch_size=1,
    push_to_hub=False,
    remove_unused_columns=False,
    report_to=["mlflow"],
    run_name=f"{model_org_output}/{model_name_output}",
    save_steps=200,
    save_strategy="steps",
    # save_strategy="no",
    # save_strategy="epoch",
    save_total_limit=1,
    warmup_steps=100,
    # warmup_steps=2,
    weight_decay=0.01,
    # weight_decay=1e-6,
)

# Data collator for causal language modeling (mlm=False).
# Note: with mlm=False this collator rebuilds `labels` as a copy of `input_ids`
# (pad tokens set to -100), which replaces the assistant-only labels created in
# preprocess_function.
data_collator = DataCollatorForLanguageModeling(mlm=False, tokenizer=tokenizer)

# Initialize Trainer
trainer = Trainer(
    args=training_args,
    data_collator=data_collator,
    eval_dataset=tokenized_dataset["validation"],
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset["train"],
)

# Fine-tune the model
trainer.train(
    # resume_from_checkpoint=True,
)

# Save the fine-tuned LoRA adapter weights and the tokenizer
model.save_pretrained(f"{model_org_output}/{model_name_output}")
tokenizer.save_pretrained(f"{model_org_output}/{model_name_output}")

Downloading readme:   0%|          | 0.00/3.22k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/31.2k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/27.2k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/83 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/11 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/10 [00:00<?, ? examples/s]

Map:   0%|          | 0/83 [00:00<?, ? examples/s]

Map:   0%|          | 0/11 [00:00<?, ? examples/s]

Map:   0%|          | 0/10 [00:00<?, ? examples/s]

trainable params: 2,252,800 || all params: 1,102,301,184 || trainable%: 0.2044


Step,Training Loss,Validation Loss
0,No log,1.919198
200,0.633400,0.635177
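
The two rows in the table follow from the step arithmetic. The sketch below assumes the `Trainer` counts one optimizer step per `gradient_accumulation_steps` batches and drops the incomplete group at the end of each epoch. That is an assumption about its rounding, consistent with evaluation landing on step 200. It also assumes the per-device batch size stays at 1. With 83 training examples this gives 200 optimizer steps, of which the first 100 are warmup.

```python
import math


def total_optimizer_steps(n_examples, batch_size, grad_accum, epochs):
    """Estimate optimizer steps, flooring per-epoch updates (assumed rule)."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


steps = total_optimizer_steps(n_examples=83, batch_size=1, grad_accum=4, epochs=10)
warmup_steps = 100  # from TrainingArguments above

print(steps, warmup_steps / steps)
```

Under these assumptions, `eval_steps=200` and `save_steps=200` fire only once, at the very end of training.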


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


('mjschock/TinyLlama-1.1B-Chat-v1.0-ft/tokenizer_config.json',
 'mjschock/TinyLlama-1.1B-Chat-v1.0-ft/special_tokens_map.json',
 'mjschock/TinyLlama-1.1B-Chat-v1.0-ft/tokenizer.model',
 'mjschock/TinyLlama-1.1B-Chat-v1.0-ft/added_tokens.json',
 'mjschock/TinyLlama-1.1B-Chat-v1.0-ft/tokenizer.json')
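
One caveat about the collator used above: with `mlm=False`, `DataCollatorForLanguageModeling` rebuilds `labels` from `input_ids`, so the assistant-only labels computed during preprocessing do not reach the loss. A collator that pads the existing labels with -100 keeps them. The sketch below shows that padding logic on plain Python lists. `pad_batch` is a hypothetical helper and the token IDs are made up. In `transformers`, `DataCollatorForSeq2Seq` pads labels this way.

```python
IGNORE_INDEX = -100


def pad_batch(features, pad_token_id):
    """Hypothetical collator: right-pad every field to the longest sequence,
    keeping the precomputed labels instead of rebuilding them from input_ids."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(list(f["input_ids"]) + [pad_token_id] * pad)
        batch["attention_mask"].append(list(f["attention_mask"]) + [0] * pad)
        batch["labels"].append(list(f["labels"]) + [IGNORE_INDEX] * pad)
    return batch


features = [
    {"input_ids": [5, 6, 7, 8], "attention_mask": [1, 1, 1, 1], "labels": [-100, -100, 7, 8]},
    {"input_ids": [5, 9], "attention_mask": [1, 1], "labels": [-100, 9]},
]
batch = pad_batch(features, pad_token_id=2)
print(batch["labels"])
```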

In [18]:
print(generate_formatted_response(formatted_chat, model, tokenizer))


I do not have access to real-time weather data. However, according to the given text, the weather in oakland and atlanta is mentioned as "optional text goes here and only here."
