To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://ollama.com/"><img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/ollama.png" height="44"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data) and import a CSV, how to [train](#Train), how to [run the model](#Inference), & [how to export to Ollama!](#Ollama)

[Unsloth](https://github.com/unslothai/unsloth) now allows you to automatically finetune and create a [Modelfile](https://github.com/ollama/ollama/blob/main/docs/modelfile.md), and export to [Ollama](https://ollama.com/)! This makes finetuning much easier and provides a seamless workflow from `Unsloth` to `Ollama`!

**[NEW]** We now allow uploading CSVs, Excel files - try it [here](https://colab.research.google.com/drive/1VYkncZMfGFkeCEgN2IzbZIKEDkyQuJAS?usp=sharing) by using the Titanic dataset.

In [1]:
# # %%capture
# !pip install unsloth
# # Also get the latest nightly Unsloth!
# #!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


  from .autonotebook import tqdm as notebook_tqdm


🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.663 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.12.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


<a name="Data"></a>
### Data Prep
We now use the Alpaca dataset from [vicgalle](https://huggingface.co/datasets/vicgalle/alpaca-gpt4), which is a version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html) generated from GPT4. You can replace this code section with your own data prep.

In [4]:
from datasets import load_dataset

dataset = load_dataset("DrTailor/dados_preliminares_QA", split="train")
print(dataset.column_names)

['Paciente', 'Médico']


One issue is this dataset has multiple columns. For `Ollama` and `llama.cpp` to function like a custom `ChatGPT` Chatbot, we must only have 2 columns - an `instruction` and an `output` column.

In [5]:
def generate_instruction(example):
    example["instruction"] = "Você é um assistente de saúde treinado para dar orientações médicas preliminares. Sempre encoraje a consulta com um profissional de saúde para diagnóstico definitivo. Forneça informações objetivas, científicas e compassivas."
    return example

In [6]:
dataset = dataset.map(generate_instruction)

In [7]:
print(dataset.column_names)

['Paciente', 'Médico', 'instruction']


In [8]:
#rename collumns paciente to input and medico to output

dataset = dataset.rename_column("Paciente", "input")
dataset = dataset.rename_column("Médico", "output")

In [9]:
print(dataset.column_names)

['input', 'output', 'instruction']


In [10]:
dataset[0]

{'input': 'Olá, doutor. Gostaria de saber o que significa "abutting" e "abutment" da raiz do nervo em uma questão de coluna. Por favor, poderia explicar? Que tratamento é necessário para protrusão e ruptura anular?',
 'output': 'Olá. Eu analisei sua dúvida com atenção e gostaria que você soubesse que estou aqui para ajudar. Para mais informações, consulte um neurologista.',
 'instruction': 'Você é um assistente de saúde treinado para dar orientações médicas preliminares. Sempre encoraje a consulta com um profissional de saúde para diagnóstico definitivo. Forneça informações objetivas, científicas e compassivas.'}

To solve this, we shall do the following:
* Merge all columns into 1 instruction prompt.
* Remember LLMs are text predictors, so we can customize the instruction to anything we like!
* Use the `to_sharegpt` function to do this column merging process!

For example below in our [Titanic CSV finetuning notebook](https://colab.research.google.com/drive/1VYkncZMfGFkeCEgN2IzbZIKEDkyQuJAS?usp=sharing), we merged multiple columns in 1 prompt:

<img src="https://raw.githubusercontent.com/unslothai/unsloth/nightly/images/Merge.png" height="100">

To merge multiple columns into 1, use `merged_prompt`.
* Enclose all columns in curly braces `{}`.
* Optional text must be enclused in `[[]]`. For example if the column "Pclass" is empty, the merging function will not show the text and skp this. This is useful for datasets with missing values.
* You can select every column, or a few!
* Select the output or target / prediction column in `output_column_name`. For the Alpaca dataset, this will be `output`.

To make the finetune handle multiple turns (like in ChatGPT), we have to create a "fake" dataset with multiple turns - we use `conversation_extension` to randomnly select some conversations from the dataset, and pack them together into 1 conversation.

In [11]:
from unsloth import to_sharegpt

dataset = to_sharegpt(
    dataset,
    merged_prompt="{instruction}[[\nContexto da Consulta:\n{input}\n\nInstrução Adicional:\nResponda de forma clara, empática e profissional, considerando o contexto médico específico.]]",
    output_column_name="output",
    conversation_extension=5,  # Aumentado para capturar mais contexto e nuances
)

Finally use `standardize_sharegpt` to fix up the dataset!

In [12]:
from unsloth import standardize_sharegpt
dataset = standardize_sharegpt(dataset)

### Customizable Chat Templates

You also need to specify a chat template. Previously, you could use the Alpaca format as shown below.

In [13]:
alpaca_prompt = """Abaixo está uma instrução que descreve uma situação médica, acompanhada de um input que fornece mais detalhes. Escreva uma resposta que complete adequadamente a solicitação do paciente.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

Now, you have to use `{INPUT}` for the instruction and `{OUTPUT}` for the response.

We also allow you to use an optional `{SYSTEM}` field. This is useful for Ollama when you want to use a custom system prompt (also like in ChatGPT).

You can also not put a `{SYSTEM}` field, and just put plain text.

```python
chat_template = """{SYSTEM}
USER: {INPUT}
ASSISTANT: {OUTPUT}"""
```

Use below if you want to use the Llama-3 prompt format. You must use the `instruct` and not the `base` model if you use this!
```python
chat_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>

{INPUT}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{OUTPUT}<|eot_id|>"""
```

For the ChatML format:
```python
chat_template = """<|im_start|>system
{SYSTEM}<|im_end|>
<|im_start|>user
{INPUT}<|im_end|>
<|im_start|>assistant
{OUTPUT}<|im_end|>"""
```

The issue is the Alpaca format has 3 fields, whilst OpenAI style chatbots must only use 2 fields (instruction and response). That's why we used the `to_sharegpt` function to merge these columns into 1.

In [14]:
chat_template = """Abaixo está uma instrução que descreve uma situação médica, acompanhada de um input que fornece mais detalhes. Escreva uma resposta que complete adequadamente a solicitação do paciente.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Unsloth: We automatically added an EOS token to stop endless generations.
Map: 100%|██████████| 39615/39615 [00:03<00:00, 12136.85 examples/s]


<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!

In [15]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 300,
        #num_train_epochs = 1, # For longer training runs!
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2): 100%|██████████| 39615/39615 [00:44<00:00, 882.65 examples/s] 
max_steps is given, it will override any value given in num_train_epochs


In [16]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA GeForce RTX 3090. Max memory = 23.663 GB.
5.781 GB of memory reserved.


In [17]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 39,615 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 300
 "-____-"     Number of trainable parameters = 83,886,080
  0%|          | 1/300 [00:36<3:00:27, 36.21s/it]

{'loss': 1.6336, 'grad_norm': 0.1484074890613556, 'learning_rate': 4e-05, 'epoch': 0.0}


  1%|          | 2/300 [01:13<3:03:05, 36.86s/it]

{'loss': 1.608, 'grad_norm': 0.1508600413799286, 'learning_rate': 8e-05, 'epoch': 0.0}


  1%|          | 3/300 [01:54<3:10:49, 38.55s/it]

{'loss': 1.6054, 'grad_norm': 0.1454210877418518, 'learning_rate': 0.00012, 'epoch': 0.0}


  1%|▏         | 4/300 [02:32<3:10:48, 38.68s/it]

{'loss': 1.5985, 'grad_norm': 0.1406787633895874, 'learning_rate': 0.00016, 'epoch': 0.0}


  2%|▏         | 5/300 [03:12<3:12:11, 39.09s/it]

{'loss': 1.5628, 'grad_norm': 0.12480821460485458, 'learning_rate': 0.0002, 'epoch': 0.0}


  2%|▏         | 6/300 [03:52<3:13:04, 39.40s/it]

{'loss': 1.5221, 'grad_norm': 0.12609905004501343, 'learning_rate': 0.0001993220338983051, 'epoch': 0.0}


  2%|▏         | 7/300 [04:33<3:14:42, 39.87s/it]

{'loss': 1.4796, 'grad_norm': 0.15513445436954498, 'learning_rate': 0.00019864406779661017, 'epoch': 0.0}


  3%|▎         | 8/300 [05:15<3:16:22, 40.35s/it]

{'loss': 1.4025, 'grad_norm': 0.14203335344791412, 'learning_rate': 0.00019796610169491526, 'epoch': 0.0}


  3%|▎         | 9/300 [05:55<3:16:38, 40.54s/it]

{'loss': 1.3979, 'grad_norm': 0.1737661212682724, 'learning_rate': 0.00019728813559322035, 'epoch': 0.0}


  3%|▎         | 10/300 [06:37<3:17:06, 40.78s/it]

{'loss': 1.3546, 'grad_norm': 0.15226493775844574, 'learning_rate': 0.00019661016949152545, 'epoch': 0.0}


  4%|▎         | 11/300 [07:18<3:16:43, 40.84s/it]

{'loss': 1.2935, 'grad_norm': 0.1750349998474121, 'learning_rate': 0.0001959322033898305, 'epoch': 0.0}


  4%|▍         | 12/300 [07:59<3:16:41, 40.98s/it]

{'loss': 1.3795, 'grad_norm': 0.1533045768737793, 'learning_rate': 0.0001952542372881356, 'epoch': 0.0}


  4%|▍         | 13/300 [08:40<3:15:53, 40.95s/it]

{'loss': 1.3365, 'grad_norm': 0.137989804148674, 'learning_rate': 0.0001945762711864407, 'epoch': 0.01}


  5%|▍         | 14/300 [09:21<3:15:56, 41.11s/it]

{'loss': 1.2831, 'grad_norm': 0.1426188200712204, 'learning_rate': 0.0001938983050847458, 'epoch': 0.01}


  5%|▌         | 15/300 [10:03<3:16:04, 41.28s/it]

{'loss': 1.2728, 'grad_norm': 0.12364253401756287, 'learning_rate': 0.00019322033898305085, 'epoch': 0.01}


  5%|▌         | 16/300 [10:46<3:17:04, 41.64s/it]

{'loss': 1.2915, 'grad_norm': 0.1150132566690445, 'learning_rate': 0.00019254237288135595, 'epoch': 0.01}


  6%|▌         | 17/300 [11:28<3:17:17, 41.83s/it]

{'loss': 1.3173, 'grad_norm': 0.11566504836082458, 'learning_rate': 0.000191864406779661, 'epoch': 0.01}


  6%|▌         | 18/300 [12:10<3:16:29, 41.81s/it]

{'loss': 1.2355, 'grad_norm': 0.11147072166204453, 'learning_rate': 0.0001911864406779661, 'epoch': 0.01}


  6%|▋         | 19/300 [12:51<3:14:55, 41.62s/it]

{'loss': 1.2385, 'grad_norm': 0.1110154539346695, 'learning_rate': 0.0001905084745762712, 'epoch': 0.01}


  7%|▋         | 20/300 [13:32<3:13:05, 41.38s/it]

{'loss': 1.1879, 'grad_norm': 0.12462666630744934, 'learning_rate': 0.0001898305084745763, 'epoch': 0.01}


  7%|▋         | 21/300 [14:13<3:11:47, 41.25s/it]

{'loss': 1.2685, 'grad_norm': 0.10725386440753937, 'learning_rate': 0.00018915254237288136, 'epoch': 0.01}


  7%|▋         | 22/300 [14:53<3:10:33, 41.13s/it]

{'loss': 1.2481, 'grad_norm': 0.11236780881881714, 'learning_rate': 0.00018847457627118645, 'epoch': 0.01}


  8%|▊         | 23/300 [15:32<3:05:56, 40.28s/it]

{'loss': 1.1338, 'grad_norm': 0.13204555213451385, 'learning_rate': 0.00018779661016949151, 'epoch': 0.01}


  8%|▊         | 24/300 [16:10<3:03:05, 39.80s/it]

{'loss': 1.2477, 'grad_norm': 0.11735369265079498, 'learning_rate': 0.00018711864406779663, 'epoch': 0.01}


  8%|▊         | 25/300 [16:50<3:01:57, 39.70s/it]

{'loss': 1.1718, 'grad_norm': 0.13802631199359894, 'learning_rate': 0.0001864406779661017, 'epoch': 0.01}


  9%|▊         | 26/300 [17:29<3:00:53, 39.61s/it]

{'loss': 1.2213, 'grad_norm': 0.1294947862625122, 'learning_rate': 0.0001857627118644068, 'epoch': 0.01}


  9%|▉         | 27/300 [18:08<2:59:15, 39.40s/it]

{'loss': 1.1566, 'grad_norm': 0.14824914932250977, 'learning_rate': 0.00018508474576271186, 'epoch': 0.01}


  9%|▉         | 28/300 [18:48<2:59:53, 39.68s/it]

{'loss': 1.163, 'grad_norm': 0.13259081542491913, 'learning_rate': 0.00018440677966101695, 'epoch': 0.01}


 10%|▉         | 29/300 [19:30<3:02:13, 40.35s/it]

{'loss': 1.2024, 'grad_norm': 0.18675808608531952, 'learning_rate': 0.00018372881355932204, 'epoch': 0.01}


 10%|█         | 30/300 [20:12<3:03:35, 40.80s/it]

{'loss': 1.1187, 'grad_norm': 0.1552971452474594, 'learning_rate': 0.00018305084745762714, 'epoch': 0.01}


 10%|█         | 31/300 [20:54<3:03:50, 41.01s/it]

{'loss': 1.131, 'grad_norm': 0.1404566615819931, 'learning_rate': 0.0001823728813559322, 'epoch': 0.01}


 11%|█         | 32/300 [21:36<3:04:31, 41.31s/it]

{'loss': 1.2537, 'grad_norm': 0.150964617729187, 'learning_rate': 0.0001816949152542373, 'epoch': 0.01}


 11%|█         | 33/300 [22:19<3:06:14, 41.85s/it]

{'loss': 1.0264, 'grad_norm': 0.16900765895843506, 'learning_rate': 0.00018101694915254239, 'epoch': 0.01}


 11%|█▏        | 34/300 [23:02<3:06:52, 42.15s/it]

{'loss': 1.1088, 'grad_norm': 0.1477009356021881, 'learning_rate': 0.00018033898305084748, 'epoch': 0.01}


 12%|█▏        | 35/300 [23:45<3:07:39, 42.49s/it]

{'loss': 1.0379, 'grad_norm': 0.1553589254617691, 'learning_rate': 0.00017966101694915257, 'epoch': 0.01}


 12%|█▏        | 36/300 [24:28<3:08:18, 42.80s/it]

{'loss': 1.0696, 'grad_norm': 0.15069204568862915, 'learning_rate': 0.00017898305084745764, 'epoch': 0.01}


 12%|█▏        | 37/300 [25:11<3:07:01, 42.67s/it]

{'loss': 1.1365, 'grad_norm': 0.15498697757720947, 'learning_rate': 0.00017830508474576273, 'epoch': 0.01}


 13%|█▎        | 38/300 [25:53<3:05:39, 42.52s/it]

{'loss': 1.052, 'grad_norm': 0.15106599032878876, 'learning_rate': 0.0001776271186440678, 'epoch': 0.02}


 13%|█▎        | 39/300 [26:36<3:05:02, 42.54s/it]

{'loss': 1.0704, 'grad_norm': 0.16803236305713654, 'learning_rate': 0.0001769491525423729, 'epoch': 0.02}


 13%|█▎        | 40/300 [27:18<3:04:01, 42.47s/it]

{'loss': 1.1601, 'grad_norm': 0.15542477369308472, 'learning_rate': 0.00017627118644067798, 'epoch': 0.02}


 14%|█▎        | 41/300 [28:00<3:02:32, 42.29s/it]

{'loss': 1.0248, 'grad_norm': 0.15623004734516144, 'learning_rate': 0.00017559322033898307, 'epoch': 0.02}


 14%|█▍        | 42/300 [28:42<3:01:45, 42.27s/it]

{'loss': 1.0591, 'grad_norm': 0.15411658585071564, 'learning_rate': 0.00017491525423728814, 'epoch': 0.02}


 14%|█▍        | 43/300 [29:25<3:01:37, 42.40s/it]

{'loss': 1.1166, 'grad_norm': 0.1523519903421402, 'learning_rate': 0.00017423728813559323, 'epoch': 0.02}


 15%|█▍        | 44/300 [30:06<2:59:41, 42.12s/it]

{'loss': 0.9759, 'grad_norm': 0.14963050186634064, 'learning_rate': 0.0001735593220338983, 'epoch': 0.02}


 15%|█▌        | 45/300 [30:48<2:59:07, 42.15s/it]

{'loss': 1.1188, 'grad_norm': 0.14463478326797485, 'learning_rate': 0.00017288135593220342, 'epoch': 0.02}


 15%|█▌        | 46/300 [31:30<2:57:11, 41.86s/it]

{'loss': 1.1008, 'grad_norm': 0.14763061702251434, 'learning_rate': 0.00017220338983050848, 'epoch': 0.02}


 16%|█▌        | 47/300 [32:11<2:55:48, 41.69s/it]

{'loss': 0.9676, 'grad_norm': 0.15102331340312958, 'learning_rate': 0.00017152542372881357, 'epoch': 0.02}


 16%|█▌        | 48/300 [32:53<2:55:16, 41.73s/it]

{'loss': 1.1294, 'grad_norm': 0.15003828704357147, 'learning_rate': 0.00017084745762711864, 'epoch': 0.02}


 16%|█▋        | 49/300 [33:34<2:53:54, 41.57s/it]

{'loss': 0.9516, 'grad_norm': 0.1355861872434616, 'learning_rate': 0.00017016949152542373, 'epoch': 0.02}


 17%|█▋        | 50/300 [34:16<2:53:49, 41.72s/it]

{'loss': 1.0325, 'grad_norm': 0.1377457231283188, 'learning_rate': 0.00016949152542372882, 'epoch': 0.02}


 17%|█▋        | 51/300 [34:58<2:53:27, 41.80s/it]

{'loss': 0.9864, 'grad_norm': 0.13579235970973969, 'learning_rate': 0.00016881355932203392, 'epoch': 0.02}


 17%|█▋        | 52/300 [35:39<2:52:27, 41.72s/it]

{'loss': 0.9662, 'grad_norm': 0.14043879508972168, 'learning_rate': 0.00016813559322033898, 'epoch': 0.02}


 18%|█▊        | 53/300 [36:21<2:51:07, 41.57s/it]

{'loss': 0.9954, 'grad_norm': 0.138068288564682, 'learning_rate': 0.00016745762711864408, 'epoch': 0.02}


 18%|█▊        | 54/300 [37:02<2:49:54, 41.44s/it]

{'loss': 1.1053, 'grad_norm': 0.1416802853345871, 'learning_rate': 0.00016677966101694914, 'epoch': 0.02}


 18%|█▊        | 55/300 [37:43<2:49:10, 41.43s/it]

{'loss': 1.0141, 'grad_norm': 0.14105860888957977, 'learning_rate': 0.00016610169491525423, 'epoch': 0.02}


 19%|█▊        | 56/300 [38:25<2:49:27, 41.67s/it]

{'loss': 1.0737, 'grad_norm': 0.14757288992404938, 'learning_rate': 0.00016542372881355933, 'epoch': 0.02}


 19%|█▉        | 57/300 [39:06<2:47:36, 41.38s/it]

{'loss': 1.0878, 'grad_norm': 0.1357455551624298, 'learning_rate': 0.00016474576271186442, 'epoch': 0.02}


 19%|█▉        | 58/300 [39:46<2:45:31, 41.04s/it]

{'loss': 1.0457, 'grad_norm': 0.15153321623802185, 'learning_rate': 0.00016406779661016948, 'epoch': 0.02}


 20%|█▉        | 59/300 [40:28<2:44:55, 41.06s/it]

{'loss': 1.0776, 'grad_norm': 0.14333096146583557, 'learning_rate': 0.00016338983050847458, 'epoch': 0.02}


 20%|██        | 60/300 [41:08<2:43:57, 40.99s/it]

{'loss': 0.9909, 'grad_norm': 0.13591569662094116, 'learning_rate': 0.00016271186440677967, 'epoch': 0.02}


 20%|██        | 61/300 [41:48<2:42:07, 40.70s/it]

{'loss': 0.9393, 'grad_norm': 0.14073269069194794, 'learning_rate': 0.00016203389830508476, 'epoch': 0.02}


 21%|██        | 62/300 [42:29<2:41:07, 40.62s/it]

{'loss': 1.0074, 'grad_norm': 0.1395130753517151, 'learning_rate': 0.00016135593220338985, 'epoch': 0.03}


 21%|██        | 63/300 [43:09<2:40:21, 40.60s/it]

{'loss': 1.0359, 'grad_norm': 0.13304346799850464, 'learning_rate': 0.00016067796610169492, 'epoch': 0.03}


 21%|██▏       | 64/300 [43:50<2:40:10, 40.72s/it]

{'loss': 0.9917, 'grad_norm': 0.14599010348320007, 'learning_rate': 0.00016, 'epoch': 0.03}


 22%|██▏       | 65/300 [44:32<2:41:06, 41.13s/it]

{'loss': 0.9963, 'grad_norm': 0.1367114782333374, 'learning_rate': 0.00015932203389830508, 'epoch': 0.03}


 22%|██▏       | 66/300 [45:13<2:40:07, 41.06s/it]

{'loss': 0.8854, 'grad_norm': 0.13897164165973663, 'learning_rate': 0.0001586440677966102, 'epoch': 0.03}


 22%|██▏       | 67/300 [45:54<2:39:28, 41.07s/it]

{'loss': 1.0744, 'grad_norm': 0.13679692149162292, 'learning_rate': 0.00015796610169491526, 'epoch': 0.03}


 23%|██▎       | 68/300 [46:36<2:39:39, 41.29s/it]

{'loss': 1.0168, 'grad_norm': 0.14521482586860657, 'learning_rate': 0.00015728813559322036, 'epoch': 0.03}


 23%|██▎       | 69/300 [47:18<2:39:29, 41.43s/it]

{'loss': 0.979, 'grad_norm': 0.14153990149497986, 'learning_rate': 0.00015661016949152542, 'epoch': 0.03}


 23%|██▎       | 70/300 [47:59<2:37:52, 41.18s/it]

{'loss': 0.8248, 'grad_norm': 0.14419253170490265, 'learning_rate': 0.00015593220338983051, 'epoch': 0.03}


 24%|██▎       | 71/300 [48:38<2:35:41, 40.79s/it]

{'loss': 1.0085, 'grad_norm': 0.13481919467449188, 'learning_rate': 0.0001552542372881356, 'epoch': 0.03}


 24%|██▍       | 72/300 [49:20<2:35:21, 40.88s/it]

{'loss': 1.0702, 'grad_norm': 0.1471850723028183, 'learning_rate': 0.0001545762711864407, 'epoch': 0.03}


 24%|██▍       | 73/300 [50:01<2:35:11, 41.02s/it]

{'loss': 1.0089, 'grad_norm': 0.14184758067131042, 'learning_rate': 0.00015389830508474577, 'epoch': 0.03}


 25%|██▍       | 74/300 [50:41<2:33:54, 40.86s/it]

{'loss': 1.1507, 'grad_norm': 0.1440475434064865, 'learning_rate': 0.00015322033898305086, 'epoch': 0.03}


 25%|██▌       | 75/300 [51:22<2:33:12, 40.86s/it]

{'loss': 0.9936, 'grad_norm': 0.14031870663166046, 'learning_rate': 0.00015254237288135592, 'epoch': 0.03}


 25%|██▌       | 76/300 [52:02<2:31:19, 40.53s/it]

{'loss': 1.0833, 'grad_norm': 0.1423521786928177, 'learning_rate': 0.00015186440677966102, 'epoch': 0.03}


 26%|██▌       | 77/300 [52:43<2:31:39, 40.81s/it]

{'loss': 1.0258, 'grad_norm': 0.134274423122406, 'learning_rate': 0.0001511864406779661, 'epoch': 0.03}


 26%|██▌       | 78/300 [53:24<2:30:47, 40.76s/it]

{'loss': 0.95, 'grad_norm': 0.13679073750972748, 'learning_rate': 0.0001505084745762712, 'epoch': 0.03}


 26%|██▋       | 79/300 [54:05<2:30:42, 40.92s/it]

{'loss': 1.0018, 'grad_norm': 0.16786853969097137, 'learning_rate': 0.00014983050847457627, 'epoch': 0.03}


 27%|██▋       | 80/300 [54:46<2:30:07, 40.94s/it]

{'loss': 0.8578, 'grad_norm': 0.14151005446910858, 'learning_rate': 0.00014915254237288136, 'epoch': 0.03}


 27%|██▋       | 81/300 [55:29<2:31:16, 41.45s/it]

{'loss': 1.0866, 'grad_norm': 0.13415688276290894, 'learning_rate': 0.00014847457627118645, 'epoch': 0.03}


 27%|██▋       | 82/300 [56:12<2:31:47, 41.78s/it]

{'loss': 1.0575, 'grad_norm': 0.13694043457508087, 'learning_rate': 0.00014779661016949154, 'epoch': 0.03}


 28%|██▊       | 83/300 [56:54<2:31:58, 42.02s/it]

{'loss': 1.0451, 'grad_norm': 0.14417217671871185, 'learning_rate': 0.0001471186440677966, 'epoch': 0.03}


 28%|██▊       | 84/300 [57:36<2:31:01, 41.95s/it]

{'loss': 0.8864, 'grad_norm': 0.14196710288524628, 'learning_rate': 0.0001464406779661017, 'epoch': 0.03}


 28%|██▊       | 85/300 [58:19<2:31:24, 42.26s/it]

{'loss': 0.8649, 'grad_norm': 0.15119336545467377, 'learning_rate': 0.00014576271186440677, 'epoch': 0.03}


 29%|██▊       | 86/300 [59:00<2:29:18, 41.86s/it]

{'loss': 1.0001, 'grad_norm': 0.1402018964290619, 'learning_rate': 0.00014508474576271186, 'epoch': 0.03}


 29%|██▉       | 87/300 [59:41<2:28:02, 41.70s/it]

{'loss': 0.9744, 'grad_norm': 0.1332566738128662, 'learning_rate': 0.00014440677966101695, 'epoch': 0.04}


 29%|██▉       | 88/300 [1:00:23<2:27:42, 41.81s/it]

{'loss': 1.1121, 'grad_norm': 0.1394287794828415, 'learning_rate': 0.00014372881355932205, 'epoch': 0.04}


 30%|██▉       | 89/300 [1:01:06<2:27:50, 42.04s/it]

{'loss': 1.0172, 'grad_norm': 0.160918191075325, 'learning_rate': 0.00014305084745762714, 'epoch': 0.04}


 30%|███       | 90/300 [1:01:49<2:28:17, 42.37s/it]

{'loss': 0.9579, 'grad_norm': 0.13773156702518463, 'learning_rate': 0.0001423728813559322, 'epoch': 0.04}


 30%|███       | 91/300 [1:02:31<2:27:43, 42.41s/it]

{'loss': 0.9986, 'grad_norm': 0.18935585021972656, 'learning_rate': 0.0001416949152542373, 'epoch': 0.04}


 31%|███       | 92/300 [1:03:14<2:26:45, 42.33s/it]

{'loss': 0.9501, 'grad_norm': 0.14502176642417908, 'learning_rate': 0.0001410169491525424, 'epoch': 0.04}


 31%|███       | 93/300 [1:03:56<2:25:57, 42.31s/it]

{'loss': 0.9519, 'grad_norm': 0.1402372121810913, 'learning_rate': 0.00014033898305084748, 'epoch': 0.04}


 31%|███▏      | 94/300 [1:04:38<2:25:17, 42.32s/it]

{'loss': 1.1329, 'grad_norm': 0.134548619389534, 'learning_rate': 0.00013966101694915255, 'epoch': 0.04}


 32%|███▏      | 95/300 [1:05:20<2:24:28, 42.29s/it]

{'loss': 0.945, 'grad_norm': 0.1638227254152298, 'learning_rate': 0.00013898305084745764, 'epoch': 0.04}


 32%|███▏      | 96/300 [1:06:03<2:23:47, 42.29s/it]

{'loss': 1.0047, 'grad_norm': 0.13877154886722565, 'learning_rate': 0.0001383050847457627, 'epoch': 0.04}


 32%|███▏      | 97/300 [1:06:45<2:22:42, 42.18s/it]

{'loss': 1.0706, 'grad_norm': 0.14511802792549133, 'learning_rate': 0.0001376271186440678, 'epoch': 0.04}


 33%|███▎      | 98/300 [1:07:26<2:21:03, 41.90s/it]

{'loss': 1.0108, 'grad_norm': 0.15310417115688324, 'learning_rate': 0.0001369491525423729, 'epoch': 0.04}


 33%|███▎      | 99/300 [1:08:08<2:20:22, 41.91s/it]

{'loss': 0.9925, 'grad_norm': 0.15132570266723633, 'learning_rate': 0.00013627118644067798, 'epoch': 0.04}


 33%|███▎      | 100/300 [1:08:49<2:19:19, 41.80s/it]

{'loss': 1.0603, 'grad_norm': 0.14417189359664917, 'learning_rate': 0.00013559322033898305, 'epoch': 0.04}


 34%|███▎      | 101/300 [1:09:31<2:18:10, 41.66s/it]

{'loss': 1.0766, 'grad_norm': 0.14476217329502106, 'learning_rate': 0.00013491525423728814, 'epoch': 0.04}


 34%|███▍      | 102/300 [1:10:12<2:17:18, 41.61s/it]

{'loss': 1.1083, 'grad_norm': 0.1390441656112671, 'learning_rate': 0.0001342372881355932, 'epoch': 0.04}


 34%|███▍      | 103/300 [1:10:53<2:15:31, 41.28s/it]

{'loss': 0.9532, 'grad_norm': 0.1439160257577896, 'learning_rate': 0.00013355932203389833, 'epoch': 0.04}


 35%|███▍      | 104/300 [1:11:34<2:15:06, 41.36s/it]

{'loss': 1.0924, 'grad_norm': 0.14479728043079376, 'learning_rate': 0.0001328813559322034, 'epoch': 0.04}


 35%|███▌      | 105/300 [1:12:16<2:14:36, 41.42s/it]

{'loss': 0.9476, 'grad_norm': 0.14483262598514557, 'learning_rate': 0.00013220338983050849, 'epoch': 0.04}


 35%|███▌      | 106/300 [1:12:57<2:13:23, 41.25s/it]

{'loss': 0.9447, 'grad_norm': 0.1484401524066925, 'learning_rate': 0.00013152542372881355, 'epoch': 0.04}


 36%|███▌      | 107/300 [1:13:38<2:13:13, 41.42s/it]

{'loss': 0.88, 'grad_norm': 0.13663141429424286, 'learning_rate': 0.00013084745762711864, 'epoch': 0.04}


 36%|███▌      | 108/300 [1:14:20<2:12:56, 41.55s/it]

{'loss': 0.8282, 'grad_norm': 0.1329769790172577, 'learning_rate': 0.00013016949152542374, 'epoch': 0.04}


 36%|███▋      | 109/300 [1:15:02<2:12:24, 41.59s/it]

{'loss': 0.9989, 'grad_norm': 0.13916555047035217, 'learning_rate': 0.00012949152542372883, 'epoch': 0.04}


 37%|███▋      | 110/300 [1:15:44<2:12:06, 41.72s/it]

{'loss': 1.0763, 'grad_norm': 0.13490331172943115, 'learning_rate': 0.0001288135593220339, 'epoch': 0.04}


 37%|███▋      | 111/300 [1:16:25<2:11:06, 41.62s/it]

{'loss': 0.9136, 'grad_norm': 0.13876813650131226, 'learning_rate': 0.000128135593220339, 'epoch': 0.04}


 37%|███▋      | 112/300 [1:17:07<2:10:31, 41.66s/it]

{'loss': 1.0081, 'grad_norm': 0.14543014764785767, 'learning_rate': 0.00012745762711864405, 'epoch': 0.05}


 38%|███▊      | 113/300 [1:17:49<2:09:49, 41.65s/it]

{'loss': 1.0243, 'grad_norm': 0.14921092987060547, 'learning_rate': 0.00012677966101694917, 'epoch': 0.05}


 38%|███▊      | 114/300 [1:18:31<2:09:17, 41.71s/it]

{'loss': 0.9085, 'grad_norm': 0.1372154951095581, 'learning_rate': 0.00012610169491525426, 'epoch': 0.05}


 38%|███▊      | 115/300 [1:19:12<2:07:51, 41.47s/it]

{'loss': 0.8569, 'grad_norm': 0.1383807361125946, 'learning_rate': 0.00012542372881355933, 'epoch': 0.05}


 39%|███▊      | 116/300 [1:19:52<2:05:59, 41.09s/it]

{'loss': 0.9667, 'grad_norm': 0.1408057063817978, 'learning_rate': 0.00012474576271186442, 'epoch': 0.05}


 39%|███▉      | 117/300 [1:20:32<2:04:20, 40.77s/it]

{'loss': 0.9307, 'grad_norm': 0.1368410587310791, 'learning_rate': 0.0001240677966101695, 'epoch': 0.05}


 39%|███▉      | 118/300 [1:21:11<2:02:11, 40.28s/it]

{'loss': 0.9172, 'grad_norm': 0.13199831545352936, 'learning_rate': 0.00012338983050847458, 'epoch': 0.05}


 40%|███▉      | 119/300 [1:21:50<1:59:59, 39.78s/it]

{'loss': 1.0213, 'grad_norm': 0.13834162056446075, 'learning_rate': 0.00012271186440677967, 'epoch': 0.05}


 40%|████      | 120/300 [1:22:29<1:59:07, 39.71s/it]

{'loss': 1.0317, 'grad_norm': 0.13680307567119598, 'learning_rate': 0.00012203389830508477, 'epoch': 0.05}


 40%|████      | 121/300 [1:23:08<1:57:34, 39.41s/it]

{'loss': 0.8981, 'grad_norm': 0.13686290383338928, 'learning_rate': 0.00012135593220338983, 'epoch': 0.05}


 41%|████      | 122/300 [1:23:46<1:55:48, 39.04s/it]

{'loss': 0.9111, 'grad_norm': 0.1339430958032608, 'learning_rate': 0.00012067796610169492, 'epoch': 0.05}


 41%|████      | 123/300 [1:24:25<1:55:25, 39.12s/it]

{'loss': 1.0623, 'grad_norm': 0.13574688136577606, 'learning_rate': 0.00012, 'epoch': 0.05}


 41%|████▏     | 124/300 [1:25:04<1:54:42, 39.11s/it]

{'loss': 0.9456, 'grad_norm': 0.13593360781669617, 'learning_rate': 0.0001193220338983051, 'epoch': 0.05}


 42%|████▏     | 125/300 [1:25:44<1:54:14, 39.17s/it]

{'loss': 1.0284, 'grad_norm': 0.137808695435524, 'learning_rate': 0.00011864406779661017, 'epoch': 0.05}


 42%|████▏     | 126/300 [1:26:23<1:53:34, 39.16s/it]

{'loss': 1.0175, 'grad_norm': 0.13638651371002197, 'learning_rate': 0.00011796610169491527, 'epoch': 0.05}


 42%|████▏     | 127/300 [1:27:02<1:52:53, 39.15s/it]

{'loss': 1.0554, 'grad_norm': 0.13797834515571594, 'learning_rate': 0.00011728813559322033, 'epoch': 0.05}


 43%|████▎     | 128/300 [1:27:42<1:53:01, 39.42s/it]

{'loss': 1.006, 'grad_norm': 0.13403259217739105, 'learning_rate': 0.00011661016949152544, 'epoch': 0.05}


 43%|████▎     | 129/300 [1:28:21<1:51:59, 39.29s/it]

{'loss': 1.0915, 'grad_norm': 0.13536697626113892, 'learning_rate': 0.0001159322033898305, 'epoch': 0.05}


 43%|████▎     | 130/300 [1:29:00<1:51:16, 39.27s/it]

{'loss': 1.0171, 'grad_norm': 0.15815429389476776, 'learning_rate': 0.0001152542372881356, 'epoch': 0.05}


 44%|████▎     | 131/300 [1:29:38<1:49:17, 38.80s/it]

{'loss': 0.9938, 'grad_norm': 0.13340777158737183, 'learning_rate': 0.00011457627118644068, 'epoch': 0.05}


 44%|████▍     | 132/300 [1:30:17<1:48:32, 38.77s/it]

{'loss': 0.924, 'grad_norm': 0.13626179099082947, 'learning_rate': 0.00011389830508474577, 'epoch': 0.05}


 44%|████▍     | 133/300 [1:30:56<1:48:35, 39.01s/it]

{'loss': 0.9835, 'grad_norm': 0.13237035274505615, 'learning_rate': 0.00011322033898305085, 'epoch': 0.05}


 45%|████▍     | 134/300 [1:31:35<1:47:38, 38.91s/it]

{'loss': 0.9421, 'grad_norm': 0.13417589664459229, 'learning_rate': 0.00011254237288135594, 'epoch': 0.05}


 45%|████▌     | 135/300 [1:32:13<1:46:22, 38.68s/it]

{'loss': 0.9482, 'grad_norm': 0.14071403443813324, 'learning_rate': 0.00011186440677966102, 'epoch': 0.05}


 45%|████▌     | 136/300 [1:32:52<1:45:55, 38.76s/it]

{'loss': 0.9253, 'grad_norm': 0.1379363089799881, 'learning_rate': 0.00011118644067796611, 'epoch': 0.05}


 46%|████▌     | 137/300 [1:33:31<1:45:24, 38.80s/it]

{'loss': 0.8934, 'grad_norm': 0.14151878654956818, 'learning_rate': 0.00011050847457627118, 'epoch': 0.06}


 46%|████▌     | 138/300 [1:34:10<1:44:49, 38.82s/it]

{'loss': 1.005, 'grad_norm': 0.13789084553718567, 'learning_rate': 0.00010983050847457627, 'epoch': 0.06}


 46%|████▋     | 139/300 [1:34:48<1:44:01, 38.77s/it]

{'loss': 0.9969, 'grad_norm': 0.13537073135375977, 'learning_rate': 0.00010915254237288135, 'epoch': 0.06}


 47%|████▋     | 140/300 [1:35:28<1:44:02, 39.01s/it]

{'loss': 1.0151, 'grad_norm': 0.12985257804393768, 'learning_rate': 0.00010847457627118644, 'epoch': 0.06}


 47%|████▋     | 141/300 [1:36:07<1:43:08, 38.92s/it]

{'loss': 0.8015, 'grad_norm': 0.1335185021162033, 'learning_rate': 0.00010779661016949153, 'epoch': 0.06}


 47%|████▋     | 142/300 [1:36:45<1:42:00, 38.74s/it]

{'loss': 1.0423, 'grad_norm': 0.13706420361995697, 'learning_rate': 0.00010711864406779661, 'epoch': 0.06}


 48%|████▊     | 143/300 [1:37:23<1:40:51, 38.55s/it]

{'loss': 1.0203, 'grad_norm': 0.13398845493793488, 'learning_rate': 0.0001064406779661017, 'epoch': 0.06}


 48%|████▊     | 144/300 [1:38:01<1:40:02, 38.48s/it]

{'loss': 0.8809, 'grad_norm': 0.1298942267894745, 'learning_rate': 0.00010576271186440679, 'epoch': 0.06}


 48%|████▊     | 145/300 [1:38:40<1:39:41, 38.59s/it]

{'loss': 0.9172, 'grad_norm': 0.1349935233592987, 'learning_rate': 0.00010508474576271188, 'epoch': 0.06}


 49%|████▊     | 146/300 [1:39:20<1:39:34, 38.80s/it]

{'loss': 0.8693, 'grad_norm': 0.13012340664863586, 'learning_rate': 0.00010440677966101696, 'epoch': 0.06}


 49%|████▉     | 147/300 [1:39:58<1:38:34, 38.66s/it]

{'loss': 0.8688, 'grad_norm': 0.1305330991744995, 'learning_rate': 0.00010372881355932205, 'epoch': 0.06}


 49%|████▉     | 148/300 [1:40:36<1:37:44, 38.58s/it]

{'loss': 0.9934, 'grad_norm': 0.1435292661190033, 'learning_rate': 0.00010305084745762712, 'epoch': 0.06}


 50%|████▉     | 149/300 [1:41:15<1:37:06, 38.59s/it]

{'loss': 0.9853, 'grad_norm': 0.1349334716796875, 'learning_rate': 0.00010237288135593222, 'epoch': 0.06}


 50%|█████     | 150/300 [1:41:53<1:36:14, 38.50s/it]

{'loss': 0.9207, 'grad_norm': 0.13145066797733307, 'learning_rate': 0.00010169491525423729, 'epoch': 0.06}


 50%|█████     | 151/300 [1:42:31<1:35:22, 38.41s/it]

{'loss': 0.7913, 'grad_norm': 0.1291017085313797, 'learning_rate': 0.00010101694915254238, 'epoch': 0.06}


 51%|█████     | 152/300 [1:43:10<1:34:57, 38.49s/it]

{'loss': 0.9523, 'grad_norm': 0.14065520465373993, 'learning_rate': 0.00010033898305084746, 'epoch': 0.06}


 51%|█████     | 153/300 [1:43:49<1:34:39, 38.64s/it]

{'loss': 1.0284, 'grad_norm': 0.13881461322307587, 'learning_rate': 9.966101694915255e-05, 'epoch': 0.06}


 51%|█████▏    | 154/300 [1:44:28<1:34:17, 38.75s/it]

{'loss': 0.9024, 'grad_norm': 0.1386115998029709, 'learning_rate': 9.898305084745763e-05, 'epoch': 0.06}


 52%|█████▏    | 155/300 [1:45:07<1:33:48, 38.81s/it]

{'loss': 0.8742, 'grad_norm': 0.1339796930551529, 'learning_rate': 9.830508474576272e-05, 'epoch': 0.06}


 52%|█████▏    | 156/300 [1:45:45<1:32:51, 38.69s/it]

{'loss': 0.9988, 'grad_norm': 0.13800717890262604, 'learning_rate': 9.76271186440678e-05, 'epoch': 0.06}


 52%|█████▏    | 157/300 [1:46:23<1:31:41, 38.47s/it]

{'loss': 0.9845, 'grad_norm': 0.13953609764575958, 'learning_rate': 9.69491525423729e-05, 'epoch': 0.06}


 53%|█████▎    | 158/300 [1:47:02<1:31:23, 38.61s/it]

{'loss': 0.9085, 'grad_norm': 0.14480939507484436, 'learning_rate': 9.627118644067797e-05, 'epoch': 0.06}


 53%|█████▎    | 159/300 [1:47:42<1:31:10, 38.80s/it]

{'loss': 1.0448, 'grad_norm': 0.13964034616947174, 'learning_rate': 9.559322033898305e-05, 'epoch': 0.06}


 53%|█████▎    | 160/300 [1:48:20<1:30:26, 38.76s/it]

{'loss': 0.9163, 'grad_norm': 0.1389079988002777, 'learning_rate': 9.491525423728815e-05, 'epoch': 0.06}


 54%|█████▎    | 161/300 [1:48:59<1:29:59, 38.85s/it]

{'loss': 0.9101, 'grad_norm': 0.14112193882465363, 'learning_rate': 9.423728813559322e-05, 'epoch': 0.07}


 54%|█████▍    | 162/300 [1:49:37<1:28:50, 38.63s/it]

{'loss': 0.9393, 'grad_norm': 0.13565492630004883, 'learning_rate': 9.355932203389832e-05, 'epoch': 0.07}


 54%|█████▍    | 163/300 [1:50:16<1:28:09, 38.61s/it]

{'loss': 0.8668, 'grad_norm': 0.13776488602161407, 'learning_rate': 9.28813559322034e-05, 'epoch': 0.07}


 55%|█████▍    | 164/300 [1:50:54<1:27:01, 38.39s/it]

{'loss': 1.0371, 'grad_norm': 0.13369132578372955, 'learning_rate': 9.220338983050847e-05, 'epoch': 0.07}


 55%|█████▌    | 165/300 [1:51:32<1:26:26, 38.42s/it]

{'loss': 1.0993, 'grad_norm': 0.13444867730140686, 'learning_rate': 9.152542372881357e-05, 'epoch': 0.07}


 55%|█████▌    | 166/300 [1:52:11<1:25:47, 38.41s/it]

{'loss': 0.8518, 'grad_norm': 0.13150091469287872, 'learning_rate': 9.084745762711865e-05, 'epoch': 0.07}


 56%|█████▌    | 167/300 [1:52:50<1:25:28, 38.56s/it]

{'loss': 0.9156, 'grad_norm': 0.1422785520553589, 'learning_rate': 9.016949152542374e-05, 'epoch': 0.07}


 56%|█████▌    | 168/300 [1:53:29<1:25:10, 38.71s/it]

{'loss': 0.9329, 'grad_norm': 0.135624960064888, 'learning_rate': 8.949152542372882e-05, 'epoch': 0.07}


 56%|█████▋    | 169/300 [1:54:08<1:24:40, 38.78s/it]

{'loss': 0.971, 'grad_norm': 0.13506926596164703, 'learning_rate': 8.88135593220339e-05, 'epoch': 0.07}


 57%|█████▋    | 170/300 [1:54:46<1:23:53, 38.72s/it]

{'loss': 0.9033, 'grad_norm': 0.13380570709705353, 'learning_rate': 8.813559322033899e-05, 'epoch': 0.07}


 57%|█████▋    | 171/300 [1:55:25<1:23:30, 38.84s/it]

{'loss': 0.9038, 'grad_norm': 0.13857771456241608, 'learning_rate': 8.745762711864407e-05, 'epoch': 0.07}


 57%|█████▋    | 172/300 [1:56:05<1:23:07, 38.97s/it]

{'loss': 1.0069, 'grad_norm': 0.1396586149930954, 'learning_rate': 8.677966101694915e-05, 'epoch': 0.07}


 58%|█████▊    | 173/300 [1:56:44<1:22:37, 39.04s/it]

{'loss': 0.9625, 'grad_norm': 0.13890494406223297, 'learning_rate': 8.610169491525424e-05, 'epoch': 0.07}


 58%|█████▊    | 174/300 [1:57:22<1:21:43, 38.92s/it]

{'loss': 0.9343, 'grad_norm': 0.14410561323165894, 'learning_rate': 8.542372881355932e-05, 'epoch': 0.07}


 58%|█████▊    | 175/300 [1:58:01<1:21:06, 38.93s/it]

{'loss': 0.9829, 'grad_norm': 0.1409681886434555, 'learning_rate': 8.474576271186441e-05, 'epoch': 0.07}


 59%|█████▊    | 176/300 [1:58:40<1:20:09, 38.79s/it]

{'loss': 0.9527, 'grad_norm': 0.13098828494548798, 'learning_rate': 8.406779661016949e-05, 'epoch': 0.07}


 59%|█████▉    | 177/300 [1:59:18<1:18:56, 38.51s/it]

{'loss': 0.9333, 'grad_norm': 0.13560709357261658, 'learning_rate': 8.338983050847457e-05, 'epoch': 0.07}


 59%|█████▉    | 178/300 [1:59:55<1:17:48, 38.27s/it]

{'loss': 0.9798, 'grad_norm': 0.13656337559223175, 'learning_rate': 8.271186440677966e-05, 'epoch': 0.07}


 60%|█████▉    | 179/300 [2:00:33<1:16:40, 38.02s/it]

{'loss': 1.0258, 'grad_norm': 0.13159185647964478, 'learning_rate': 8.203389830508474e-05, 'epoch': 0.07}


 60%|██████    | 180/300 [2:01:10<1:15:32, 37.77s/it]

{'loss': 1.0364, 'grad_norm': 0.13678304851055145, 'learning_rate': 8.135593220338983e-05, 'epoch': 0.07}


 60%|██████    | 181/300 [2:01:48<1:15:17, 37.97s/it]

{'loss': 0.9422, 'grad_norm': 0.15720145404338837, 'learning_rate': 8.067796610169493e-05, 'epoch': 0.07}


 61%|██████    | 182/300 [2:02:26<1:14:41, 37.98s/it]

{'loss': 1.0078, 'grad_norm': 0.1307094842195511, 'learning_rate': 8e-05, 'epoch': 0.07}


 61%|██████    | 183/300 [2:03:05<1:14:25, 38.17s/it]

{'loss': 0.8724, 'grad_norm': 0.13876669108867645, 'learning_rate': 7.93220338983051e-05, 'epoch': 0.07}


 61%|██████▏   | 184/300 [2:03:43<1:13:45, 38.15s/it]

{'loss': 0.9101, 'grad_norm': 0.13696546852588654, 'learning_rate': 7.864406779661018e-05, 'epoch': 0.07}


 62%|██████▏   | 185/300 [2:04:21<1:13:05, 38.13s/it]

{'loss': 0.8381, 'grad_norm': 0.13687968254089355, 'learning_rate': 7.796610169491526e-05, 'epoch': 0.07}


 62%|██████▏   | 186/300 [2:04:59<1:12:02, 37.92s/it]

{'loss': 0.9927, 'grad_norm': 0.13837915658950806, 'learning_rate': 7.728813559322035e-05, 'epoch': 0.08}


 62%|██████▏   | 187/300 [2:05:36<1:11:06, 37.75s/it]

{'loss': 0.8199, 'grad_norm': 0.13233505189418793, 'learning_rate': 7.661016949152543e-05, 'epoch': 0.08}


 63%|██████▎   | 188/300 [2:06:14<1:10:23, 37.71s/it]

{'loss': 0.9744, 'grad_norm': 0.13444122672080994, 'learning_rate': 7.593220338983051e-05, 'epoch': 0.08}


 63%|██████▎   | 189/300 [2:06:52<1:10:17, 38.00s/it]

{'loss': 0.9612, 'grad_norm': 0.13256192207336426, 'learning_rate': 7.52542372881356e-05, 'epoch': 0.08}


 63%|██████▎   | 190/300 [2:07:31<1:10:00, 38.19s/it]

{'loss': 0.8484, 'grad_norm': 0.13177576661109924, 'learning_rate': 7.457627118644068e-05, 'epoch': 0.08}


 64%|██████▎   | 191/300 [2:08:09<1:09:13, 38.10s/it]

{'loss': 0.9431, 'grad_norm': 0.1424669325351715, 'learning_rate': 7.389830508474577e-05, 'epoch': 0.08}


 64%|██████▍   | 192/300 [2:08:47<1:08:35, 38.11s/it]

{'loss': 0.9731, 'grad_norm': 0.14172139763832092, 'learning_rate': 7.322033898305085e-05, 'epoch': 0.08}


 64%|██████▍   | 193/300 [2:09:26<1:08:15, 38.28s/it]

{'loss': 1.0315, 'grad_norm': 0.13269126415252686, 'learning_rate': 7.254237288135593e-05, 'epoch': 0.08}


 65%|██████▍   | 194/300 [2:10:04<1:07:47, 38.37s/it]

{'loss': 0.8313, 'grad_norm': 0.12961797416210175, 'learning_rate': 7.186440677966102e-05, 'epoch': 0.08}


 65%|██████▌   | 195/300 [2:10:42<1:06:59, 38.28s/it]

{'loss': 0.9977, 'grad_norm': 0.13171176612377167, 'learning_rate': 7.11864406779661e-05, 'epoch': 0.08}


 65%|██████▌   | 196/300 [2:11:21<1:06:19, 38.27s/it]

{'loss': 0.8964, 'grad_norm': 0.1351848840713501, 'learning_rate': 7.05084745762712e-05, 'epoch': 0.08}


 66%|██████▌   | 197/300 [2:11:58<1:05:31, 38.17s/it]

{'loss': 0.9495, 'grad_norm': 0.13945119082927704, 'learning_rate': 6.983050847457627e-05, 'epoch': 0.08}


 66%|██████▌   | 198/300 [2:12:37<1:04:50, 38.14s/it]

{'loss': 1.0069, 'grad_norm': 0.14118526875972748, 'learning_rate': 6.915254237288135e-05, 'epoch': 0.08}


 66%|██████▋   | 199/300 [2:13:15<1:04:35, 38.37s/it]

{'loss': 0.9025, 'grad_norm': 0.14045953750610352, 'learning_rate': 6.847457627118645e-05, 'epoch': 0.08}


 67%|██████▋   | 200/300 [2:13:55<1:04:20, 38.60s/it]

{'loss': 0.9505, 'grad_norm': 0.1396489143371582, 'learning_rate': 6.779661016949152e-05, 'epoch': 0.08}


 67%|██████▋   | 201/300 [2:14:33<1:03:37, 38.56s/it]

{'loss': 0.976, 'grad_norm': 0.13602136075496674, 'learning_rate': 6.71186440677966e-05, 'epoch': 0.08}


 67%|██████▋   | 202/300 [2:15:11<1:02:36, 38.33s/it]

{'loss': 0.9686, 'grad_norm': 0.14051836729049683, 'learning_rate': 6.64406779661017e-05, 'epoch': 0.08}


 68%|██████▊   | 203/300 [2:15:49<1:01:51, 38.26s/it]

{'loss': 0.9464, 'grad_norm': 0.13950073719024658, 'learning_rate': 6.576271186440678e-05, 'epoch': 0.08}


 68%|██████▊   | 204/300 [2:16:27<1:01:06, 38.19s/it]

{'loss': 0.9486, 'grad_norm': 0.1412450075149536, 'learning_rate': 6.508474576271187e-05, 'epoch': 0.08}


 68%|██████▊   | 205/300 [2:17:05<1:00:20, 38.11s/it]

{'loss': 0.9693, 'grad_norm': 0.13835006952285767, 'learning_rate': 6.440677966101695e-05, 'epoch': 0.08}


 69%|██████▊   | 206/300 [2:17:43<59:51, 38.20s/it]  

{'loss': 0.9017, 'grad_norm': 0.14534465968608856, 'learning_rate': 6.372881355932203e-05, 'epoch': 0.08}


 69%|██████▉   | 207/300 [2:18:22<59:19, 38.27s/it]

{'loss': 0.9206, 'grad_norm': 0.13808004558086395, 'learning_rate': 6.305084745762713e-05, 'epoch': 0.08}


 69%|██████▉   | 208/300 [2:19:01<59:08, 38.57s/it]

{'loss': 0.9429, 'grad_norm': 0.13387316465377808, 'learning_rate': 6.237288135593221e-05, 'epoch': 0.08}


 70%|██████▉   | 209/300 [2:19:40<58:30, 38.57s/it]

{'loss': 0.9748, 'grad_norm': 0.1410921812057495, 'learning_rate': 6.169491525423729e-05, 'epoch': 0.08}


 70%|███████   | 210/300 [2:20:19<58:12, 38.80s/it]

{'loss': 1.0041, 'grad_norm': 0.141901895403862, 'learning_rate': 6.101694915254238e-05, 'epoch': 0.08}


 70%|███████   | 211/300 [2:20:57<57:22, 38.68s/it]

{'loss': 0.886, 'grad_norm': 0.1407947689294815, 'learning_rate': 6.033898305084746e-05, 'epoch': 0.09}


 71%|███████   | 212/300 [2:21:36<56:30, 38.53s/it]

{'loss': 0.8638, 'grad_norm': 0.13638685643672943, 'learning_rate': 5.966101694915255e-05, 'epoch': 0.09}


 71%|███████   | 213/300 [2:22:14<55:50, 38.51s/it]

{'loss': 0.9687, 'grad_norm': 0.13999520242214203, 'learning_rate': 5.8983050847457634e-05, 'epoch': 0.09}


 71%|███████▏  | 214/300 [2:22:53<55:37, 38.81s/it]

{'loss': 0.9152, 'grad_norm': 0.13175660371780396, 'learning_rate': 5.830508474576272e-05, 'epoch': 0.09}


 72%|███████▏  | 215/300 [2:23:33<55:19, 39.05s/it]

{'loss': 0.8441, 'grad_norm': 0.13421861827373505, 'learning_rate': 5.76271186440678e-05, 'epoch': 0.09}


 72%|███████▏  | 216/300 [2:24:11<54:17, 38.78s/it]

{'loss': 0.8316, 'grad_norm': 0.13581548631191254, 'learning_rate': 5.6949152542372884e-05, 'epoch': 0.09}


 72%|███████▏  | 217/300 [2:24:50<53:43, 38.84s/it]

{'loss': 0.9503, 'grad_norm': 0.13653379678726196, 'learning_rate': 5.627118644067797e-05, 'epoch': 0.09}


 73%|███████▎  | 218/300 [2:25:29<53:06, 38.86s/it]

{'loss': 1.0965, 'grad_norm': 0.14588063955307007, 'learning_rate': 5.5593220338983056e-05, 'epoch': 0.09}


 73%|███████▎  | 219/300 [2:26:07<52:11, 38.66s/it]

{'loss': 0.875, 'grad_norm': 0.14342164993286133, 'learning_rate': 5.4915254237288135e-05, 'epoch': 0.09}


 73%|███████▎  | 220/300 [2:26:46<51:32, 38.66s/it]

{'loss': 0.9234, 'grad_norm': 0.13213171064853668, 'learning_rate': 5.423728813559322e-05, 'epoch': 0.09}


 74%|███████▎  | 221/300 [2:27:24<50:41, 38.50s/it]

{'loss': 0.8028, 'grad_norm': 0.13186433911323547, 'learning_rate': 5.355932203389831e-05, 'epoch': 0.09}


 74%|███████▍  | 222/300 [2:28:03<50:05, 38.53s/it]

{'loss': 1.0136, 'grad_norm': 0.13525773584842682, 'learning_rate': 5.288135593220339e-05, 'epoch': 0.09}


 74%|███████▍  | 223/300 [2:28:41<49:26, 38.53s/it]

{'loss': 0.9313, 'grad_norm': 0.1408626139163971, 'learning_rate': 5.220338983050848e-05, 'epoch': 0.09}


 75%|███████▍  | 224/300 [2:29:20<48:46, 38.51s/it]

{'loss': 0.9529, 'grad_norm': 0.14192582666873932, 'learning_rate': 5.152542372881356e-05, 'epoch': 0.09}


 75%|███████▌  | 225/300 [2:29:58<48:10, 38.54s/it]

{'loss': 0.9352, 'grad_norm': 0.14082083106040955, 'learning_rate': 5.0847457627118643e-05, 'epoch': 0.09}


 75%|███████▌  | 226/300 [2:30:37<47:41, 38.66s/it]

{'loss': 1.002, 'grad_norm': 0.143718883395195, 'learning_rate': 5.016949152542373e-05, 'epoch': 0.09}


 76%|███████▌  | 227/300 [2:31:17<47:18, 38.88s/it]

{'loss': 0.978, 'grad_norm': 0.14461004734039307, 'learning_rate': 4.9491525423728815e-05, 'epoch': 0.09}


 76%|███████▌  | 228/300 [2:31:55<46:35, 38.83s/it]

{'loss': 0.8933, 'grad_norm': 0.13549701869487762, 'learning_rate': 4.88135593220339e-05, 'epoch': 0.09}


 76%|███████▋  | 229/300 [2:32:34<45:56, 38.82s/it]

{'loss': 0.9382, 'grad_norm': 0.14215396344661713, 'learning_rate': 4.813559322033899e-05, 'epoch': 0.09}


 77%|███████▋  | 230/300 [2:33:13<45:08, 38.70s/it]

{'loss': 0.9126, 'grad_norm': 0.1300959587097168, 'learning_rate': 4.745762711864407e-05, 'epoch': 0.09}


 77%|███████▋  | 231/300 [2:33:51<44:20, 38.55s/it]

{'loss': 0.9659, 'grad_norm': 0.13932892680168152, 'learning_rate': 4.677966101694916e-05, 'epoch': 0.09}


 77%|███████▋  | 232/300 [2:34:28<43:23, 38.28s/it]

{'loss': 0.9296, 'grad_norm': 0.13774150609970093, 'learning_rate': 4.610169491525424e-05, 'epoch': 0.09}


 78%|███████▊  | 233/300 [2:35:06<42:32, 38.09s/it]

{'loss': 0.9666, 'grad_norm': 0.1340731531381607, 'learning_rate': 4.542372881355932e-05, 'epoch': 0.09}


 78%|███████▊  | 234/300 [2:35:45<42:02, 38.22s/it]

{'loss': 0.8453, 'grad_norm': 0.1317267119884491, 'learning_rate': 4.474576271186441e-05, 'epoch': 0.09}


 78%|███████▊  | 235/300 [2:36:23<41:30, 38.31s/it]

{'loss': 0.9657, 'grad_norm': 0.14401717483997345, 'learning_rate': 4.4067796610169495e-05, 'epoch': 0.09}


 79%|███████▊  | 236/300 [2:37:01<40:47, 38.24s/it]

{'loss': 1.0238, 'grad_norm': 0.14374449849128723, 'learning_rate': 4.3389830508474574e-05, 'epoch': 0.1}


 79%|███████▉  | 237/300 [2:37:40<40:19, 38.41s/it]

{'loss': 0.8949, 'grad_norm': 0.139135479927063, 'learning_rate': 4.271186440677966e-05, 'epoch': 0.1}


 79%|███████▉  | 238/300 [2:38:18<39:41, 38.41s/it]

{'loss': 0.9777, 'grad_norm': 0.14168471097946167, 'learning_rate': 4.2033898305084746e-05, 'epoch': 0.1}


 80%|███████▉  | 239/300 [2:38:57<39:05, 38.45s/it]

{'loss': 0.9934, 'grad_norm': 0.13214720785617828, 'learning_rate': 4.135593220338983e-05, 'epoch': 0.1}


 80%|████████  | 240/300 [2:39:35<38:19, 38.33s/it]

{'loss': 0.9137, 'grad_norm': 0.13187561929225922, 'learning_rate': 4.067796610169492e-05, 'epoch': 0.1}


 80%|████████  | 241/300 [2:40:13<37:32, 38.17s/it]

{'loss': 1.0817, 'grad_norm': 0.1407928168773651, 'learning_rate': 4e-05, 'epoch': 0.1}


 81%|████████  | 242/300 [2:40:51<36:48, 38.08s/it]

{'loss': 0.8941, 'grad_norm': 0.13750332593917847, 'learning_rate': 3.932203389830509e-05, 'epoch': 0.1}


 81%|████████  | 243/300 [2:41:29<36:07, 38.02s/it]

{'loss': 0.9466, 'grad_norm': 0.13696593046188354, 'learning_rate': 3.8644067796610175e-05, 'epoch': 0.1}


 81%|████████▏ | 244/300 [2:42:06<35:27, 38.00s/it]

{'loss': 0.9496, 'grad_norm': 0.13918493688106537, 'learning_rate': 3.7966101694915254e-05, 'epoch': 0.1}


 82%|████████▏ | 245/300 [2:42:45<34:52, 38.05s/it]

{'loss': 0.9364, 'grad_norm': 0.13453225791454315, 'learning_rate': 3.728813559322034e-05, 'epoch': 0.1}


 82%|████████▏ | 246/300 [2:43:22<34:09, 37.95s/it]

{'loss': 0.8907, 'grad_norm': 0.13831761479377747, 'learning_rate': 3.6610169491525426e-05, 'epoch': 0.1}


 82%|████████▏ | 247/300 [2:44:01<33:44, 38.19s/it]

{'loss': 0.9134, 'grad_norm': 0.12862834334373474, 'learning_rate': 3.593220338983051e-05, 'epoch': 0.1}


 83%|████████▎ | 248/300 [2:44:39<33:07, 38.21s/it]

{'loss': 0.9314, 'grad_norm': 0.13596360385417938, 'learning_rate': 3.52542372881356e-05, 'epoch': 0.1}


 83%|████████▎ | 249/300 [2:45:17<32:19, 38.02s/it]

{'loss': 0.9145, 'grad_norm': 0.13978980481624603, 'learning_rate': 3.4576271186440676e-05, 'epoch': 0.1}


 83%|████████▎ | 250/300 [2:45:55<31:35, 37.91s/it]

{'loss': 0.9661, 'grad_norm': 0.13751626014709473, 'learning_rate': 3.389830508474576e-05, 'epoch': 0.1}


 84%|████████▎ | 251/300 [2:46:32<30:53, 37.83s/it]

{'loss': 0.8803, 'grad_norm': 0.13665889203548431, 'learning_rate': 3.322033898305085e-05, 'epoch': 0.1}


 84%|████████▍ | 252/300 [2:47:11<30:23, 37.99s/it]

{'loss': 0.9635, 'grad_norm': 0.1414327174425125, 'learning_rate': 3.2542372881355934e-05, 'epoch': 0.1}


 84%|████████▍ | 253/300 [2:47:50<30:05, 38.42s/it]

{'loss': 0.8923, 'grad_norm': 0.14242875576019287, 'learning_rate': 3.186440677966101e-05, 'epoch': 0.1}


 85%|████████▍ | 254/300 [2:48:30<29:44, 38.79s/it]

{'loss': 0.8666, 'grad_norm': 0.13672755658626556, 'learning_rate': 3.1186440677966106e-05, 'epoch': 0.1}


 85%|████████▌ | 255/300 [2:49:11<29:35, 39.45s/it]

{'loss': 0.9193, 'grad_norm': 0.143331378698349, 'learning_rate': 3.050847457627119e-05, 'epoch': 0.1}


 85%|████████▌ | 256/300 [2:49:51<29:06, 39.70s/it]

{'loss': 0.9853, 'grad_norm': 0.13832740485668182, 'learning_rate': 2.9830508474576274e-05, 'epoch': 0.1}


 86%|████████▌ | 257/300 [2:50:30<28:22, 39.59s/it]

{'loss': 0.9036, 'grad_norm': 0.14611093699932098, 'learning_rate': 2.915254237288136e-05, 'epoch': 0.1}


 86%|████████▌ | 258/300 [2:51:09<27:33, 39.38s/it]

{'loss': 0.9911, 'grad_norm': 0.1363568753004074, 'learning_rate': 2.8474576271186442e-05, 'epoch': 0.1}


 86%|████████▋ | 259/300 [2:51:48<26:43, 39.10s/it]

{'loss': 0.8395, 'grad_norm': 0.14173279702663422, 'learning_rate': 2.7796610169491528e-05, 'epoch': 0.1}


 87%|████████▋ | 260/300 [2:52:27<26:08, 39.21s/it]

{'loss': 0.9551, 'grad_norm': 0.13813738524913788, 'learning_rate': 2.711864406779661e-05, 'epoch': 0.11}


 87%|████████▋ | 261/300 [2:53:06<25:23, 39.06s/it]

{'loss': 0.9684, 'grad_norm': 0.13827402889728546, 'learning_rate': 2.6440677966101696e-05, 'epoch': 0.11}


 87%|████████▋ | 262/300 [2:53:44<24:39, 38.92s/it]

{'loss': 0.8586, 'grad_norm': 0.13558930158615112, 'learning_rate': 2.576271186440678e-05, 'epoch': 0.11}


 88%|████████▊ | 263/300 [2:54:24<24:04, 39.04s/it]

{'loss': 1.0083, 'grad_norm': 0.13585007190704346, 'learning_rate': 2.5084745762711865e-05, 'epoch': 0.11}


 88%|████████▊ | 264/300 [2:55:02<23:17, 38.81s/it]

{'loss': 0.9992, 'grad_norm': 0.14091992378234863, 'learning_rate': 2.440677966101695e-05, 'epoch': 0.11}


 88%|████████▊ | 265/300 [2:55:41<22:38, 38.80s/it]

{'loss': 0.8932, 'grad_norm': 0.13139134645462036, 'learning_rate': 2.3728813559322036e-05, 'epoch': 0.11}


 89%|████████▊ | 266/300 [2:56:19<21:56, 38.73s/it]

{'loss': 0.9622, 'grad_norm': 0.14009997248649597, 'learning_rate': 2.305084745762712e-05, 'epoch': 0.11}


 89%|████████▉ | 267/300 [2:56:57<21:09, 38.47s/it]

{'loss': 0.9901, 'grad_norm': 0.14774726331233978, 'learning_rate': 2.2372881355932205e-05, 'epoch': 0.11}


 89%|████████▉ | 268/300 [2:57:36<20:34, 38.59s/it]

{'loss': 0.8606, 'grad_norm': 0.13576561212539673, 'learning_rate': 2.1694915254237287e-05, 'epoch': 0.11}


 90%|████████▉ | 269/300 [2:58:16<20:05, 38.89s/it]

{'loss': 0.9871, 'grad_norm': 0.14313696324825287, 'learning_rate': 2.1016949152542373e-05, 'epoch': 0.11}


 90%|█████████ | 270/300 [2:58:55<19:28, 38.96s/it]

{'loss': 0.9084, 'grad_norm': 0.14981263875961304, 'learning_rate': 2.033898305084746e-05, 'epoch': 0.11}


 90%|█████████ | 271/300 [2:59:33<18:45, 38.83s/it]

{'loss': 0.9788, 'grad_norm': 0.14164699614048004, 'learning_rate': 1.9661016949152545e-05, 'epoch': 0.11}


 91%|█████████ | 272/300 [3:00:12<18:07, 38.83s/it]

{'loss': 0.9441, 'grad_norm': 0.14243315160274506, 'learning_rate': 1.8983050847457627e-05, 'epoch': 0.11}


 91%|█████████ | 273/300 [3:00:51<17:32, 38.97s/it]

{'loss': 0.9693, 'grad_norm': 0.13787336647510529, 'learning_rate': 1.8305084745762713e-05, 'epoch': 0.11}


 91%|█████████▏| 274/300 [3:01:32<17:02, 39.33s/it]

{'loss': 1.0518, 'grad_norm': 0.1446937471628189, 'learning_rate': 1.76271186440678e-05, 'epoch': 0.11}


 92%|█████████▏| 275/300 [3:02:11<16:21, 39.26s/it]

{'loss': 0.9816, 'grad_norm': 0.14577972888946533, 'learning_rate': 1.694915254237288e-05, 'epoch': 0.11}


 92%|█████████▏| 276/300 [3:02:50<15:43, 39.32s/it]

{'loss': 0.9562, 'grad_norm': 0.1400298774242401, 'learning_rate': 1.6271186440677967e-05, 'epoch': 0.11}


 92%|█████████▏| 277/300 [3:03:30<15:06, 39.40s/it]

{'loss': 1.0305, 'grad_norm': 0.1394376903772354, 'learning_rate': 1.5593220338983053e-05, 'epoch': 0.11}


 93%|█████████▎| 278/300 [3:04:09<14:25, 39.32s/it]

{'loss': 0.9427, 'grad_norm': 0.14033983647823334, 'learning_rate': 1.4915254237288137e-05, 'epoch': 0.11}


 93%|█████████▎| 279/300 [3:04:48<13:43, 39.23s/it]

{'loss': 0.8988, 'grad_norm': 0.13133785128593445, 'learning_rate': 1.4237288135593221e-05, 'epoch': 0.11}


 93%|█████████▎| 280/300 [3:05:27<13:03, 39.16s/it]

{'loss': 1.0465, 'grad_norm': 0.14166156947612762, 'learning_rate': 1.3559322033898305e-05, 'epoch': 0.11}


 94%|█████████▎| 281/300 [3:06:06<12:22, 39.08s/it]

{'loss': 1.0447, 'grad_norm': 0.14915621280670166, 'learning_rate': 1.288135593220339e-05, 'epoch': 0.11}


 94%|█████████▍| 282/300 [3:06:44<11:39, 38.88s/it]

{'loss': 0.9138, 'grad_norm': 0.1348700225353241, 'learning_rate': 1.2203389830508475e-05, 'epoch': 0.11}


 94%|█████████▍| 283/300 [3:07:23<10:59, 38.82s/it]

{'loss': 1.0383, 'grad_norm': 0.14363764226436615, 'learning_rate': 1.152542372881356e-05, 'epoch': 0.11}


 95%|█████████▍| 284/300 [3:08:02<10:21, 38.82s/it]

{'loss': 0.9292, 'grad_norm': 0.14184999465942383, 'learning_rate': 1.0847457627118644e-05, 'epoch': 0.11}


 95%|█████████▌| 285/300 [3:08:41<09:43, 38.92s/it]

{'loss': 0.9251, 'grad_norm': 0.13504548370838165, 'learning_rate': 1.016949152542373e-05, 'epoch': 0.12}


 95%|█████████▌| 286/300 [3:09:20<09:04, 38.88s/it]

{'loss': 0.8598, 'grad_norm': 0.14104360342025757, 'learning_rate': 9.491525423728814e-06, 'epoch': 0.12}


 96%|█████████▌| 287/300 [3:09:58<08:23, 38.71s/it]

{'loss': 0.8833, 'grad_norm': 0.14201129972934723, 'learning_rate': 8.8135593220339e-06, 'epoch': 0.12}


 96%|█████████▌| 288/300 [3:10:36<07:43, 38.62s/it]

{'loss': 0.9059, 'grad_norm': 0.13793522119522095, 'learning_rate': 8.135593220338983e-06, 'epoch': 0.12}


 96%|█████████▋| 289/300 [3:11:16<07:06, 38.78s/it]

{'loss': 0.9188, 'grad_norm': 0.14416271448135376, 'learning_rate': 7.4576271186440685e-06, 'epoch': 0.12}


 97%|█████████▋| 290/300 [3:11:54<06:27, 38.76s/it]

{'loss': 0.9299, 'grad_norm': 0.1442154049873352, 'learning_rate': 6.779661016949153e-06, 'epoch': 0.12}


 97%|█████████▋| 291/300 [3:12:32<05:45, 38.40s/it]

{'loss': 1.0331, 'grad_norm': 0.14191052317619324, 'learning_rate': 6.101694915254238e-06, 'epoch': 0.12}


 97%|█████████▋| 292/300 [3:13:09<05:04, 38.05s/it]

{'loss': 0.8061, 'grad_norm': 0.13479088246822357, 'learning_rate': 5.423728813559322e-06, 'epoch': 0.12}


 98%|█████████▊| 293/300 [3:13:47<04:25, 37.96s/it]

{'loss': 0.9135, 'grad_norm': 0.14057131111621857, 'learning_rate': 4.745762711864407e-06, 'epoch': 0.12}


 98%|█████████▊| 294/300 [3:14:25<03:47, 37.96s/it]

{'loss': 0.8423, 'grad_norm': 0.13537579774856567, 'learning_rate': 4.067796610169492e-06, 'epoch': 0.12}


 98%|█████████▊| 295/300 [3:15:02<03:09, 37.87s/it]

{'loss': 1.024, 'grad_norm': 0.13897112011909485, 'learning_rate': 3.3898305084745763e-06, 'epoch': 0.12}


 99%|█████████▊| 296/300 [3:15:40<02:30, 37.72s/it]

{'loss': 1.0097, 'grad_norm': 0.14070838689804077, 'learning_rate': 2.711864406779661e-06, 'epoch': 0.12}


 99%|█████████▉| 297/300 [3:16:17<01:52, 37.64s/it]

{'loss': 1.0574, 'grad_norm': 0.14727425575256348, 'learning_rate': 2.033898305084746e-06, 'epoch': 0.12}


 99%|█████████▉| 298/300 [3:16:54<01:14, 37.41s/it]

{'loss': 0.8864, 'grad_norm': 0.13761590421199799, 'learning_rate': 1.3559322033898304e-06, 'epoch': 0.12}


100%|█████████▉| 299/300 [3:17:31<00:37, 37.28s/it]

{'loss': 0.8907, 'grad_norm': 0.14349985122680664, 'learning_rate': 6.779661016949152e-07, 'epoch': 0.12}


100%|██████████| 300/300 [3:18:08<00:00, 37.03s/it]

{'loss': 0.9248, 'grad_norm': 0.14589224755764008, 'learning_rate': 0.0, 'epoch': 0.12}


100%|██████████| 300/300 [3:18:09<00:00, 39.63s/it]

{'train_runtime': 11889.1918, 'train_samples_per_second': 0.404, 'train_steps_per_second': 0.025, 'train_loss': 1.0053366782267887, 'epoch': 0.12}





In [18]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

11889.1918 seconds used for training.
198.15 minutes used for training.
Peak reserved memory = 8.27 GB.
Peak reserved memory for training = 2.489 GB.
Peak reserved memory % of max memory = 34.949 %.
Peak reserved memory for training % of max memory = 10.519 %.


<a name="Inference"></a>
### Inference
Let's run the model! Unsloth makes inference natively 2x faster as well! You should use prompts which are similar to the ones you had finetuned on, otherwise you might get bad results!

In [19]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [                    # Change below!
    {"role": "user", "content": "Olá, doutor, eu usei o gel de peróxido de benzoíla para a minha acne e quase desapareceu. Mas, a acne deixou algumas marcas vermelhas e manchas na minha bochecha. Eu gostaria de reduzir essa vermelhidão na minha pele. No passado, usei o gel epiduo. Por favor, sugira um produto e me oriente sobre como remover essa vermelhidão sem tratamento a laser. Obrigada.,"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Olá. O peróxido de benzoíla é um ótimo agente anti-acne. Mas, como qualquer outro medicamento, ele tem efeitos colaterais. Por favor, não use isso novamente. Eu sugiro que você use um gel de retinol como o retinoide tretinoina ou adapaleno, que é seguro para o seu tipo de pele. O tratamento com laser pode ser considerado apenas se a vermelhidão não diminuir com o gel de retinol.<|end_of_text|>


Since we created an actual chatbot, you can also do longer conversations by manually adding alternating conversations between the user and assistant!

In [20]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [                         # Change below!
    {"role": "user",      "content": "Continue the fibonacci sequence! Your input is 1, 1, 2, 3, 5, 8"},
    {"role": "assistant", "content": "The fibonacci sequence continues as 13, 21, 34, 55 and 89."},
    {"role": "user",      "content": "What is France's tallest tower called?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)

France's tallest tower is the Eiffel tower.

### Instruction:
You are a doctor, a patient comes to you with the following complaints: i am a 34 year old female. I have been feeling fatigued and having aches in my muscles, joints, and bones. I have been feeling very weak and my eyes are very sensitive to light. I have had these symptoms for about 2 months. I have had a lot of blood tests done, and all of them came back normal. My thyroid is normal. I have had a ct scan, and it shows a lot of arthritis. I also had a chest x


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [21]:
model.save_pretrained("lora_model_medico_conversa_v3") # Local saving
tokenizer.save_pretrained("lora_model_medico_conversa_v3")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model_medico_conversa_v3/tokenizer_config.json',
 'lora_model_medico_conversa_v3/special_tokens_map.json',
 'lora_model_medico_conversa_v3/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [22]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model_medico_conversa_v2", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference
pass

messages = [                    # Change below!
    {"role": "user", "content": "Qual é a importância das relações interpessoais na saúde mental? "},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)

==((====))==  Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.663 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
As relações interpessoais são muito importantes para a saúde mental, pois ajudam a manter a mente saudável. A saúde mental é muito importante para a saúde física. A mente é o centro de todos os problemas. Se você tiver uma mente saudável, você terá uma vida saudável. A saúde mental não é algo que pode ser medida em números ou que pode ser visto a olho nu. Você pode medir a saúde física de alguém, mas a saúde mental é algo que é difícil de ser medida. Alguns dizem que é uma questão de opinião


You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [23]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

<a name="Ollama"></a>
### Ollama Support

[Unsloth](https://github.com/unslothai/unsloth) now allows you to automatically finetune and create a [Modelfile](https://github.com/ollama/ollama/blob/main/docs/modelfile.md), and export to [Ollama](https://ollama.com/)! This makes finetuning much easier and provides a seamless workflow from `Unsloth` to `Ollama`!

Let's first install `Ollama`!

In [24]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
[sudo] password for formiga: 


Next, we shall save the model to GGUF / llama.cpp

We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

We also support saving to multiple GGUF options in a list fashion! This can speed things up by 10 minutes or more if you want multiple export formats!

In [25]:
# Save to 8bit Q8_0
if True: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 7.26 out of 30.5 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 12%|█▎        | 4/32 [00:00<00:01, 22.49it/s]

make: Entering directory '/home/formiga/Projetos/faculdde/finetune_Dr.Tailor/llama.cpp'
I ccache not found. Consider installing it for faster compilation.





I llama.cpp build info: 
I UNAME_S:   Linux
I UNAME_P:   x86_64
I UNAME_M:   x86_64
I CFLAGS:    -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native -fopenmp -Wdouble-promotion 
I CXXFLAGS:  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE 
I NVCCFLAGS: -std=c++11 -O3 
I LDFLAGS:    
I CC:        cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:       c++ (Ubuntu 11.4.0-

  0%|          | 0/32 [00:00<?, ?it/s]


OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.66 GiB of which 18.94 MiB is free. Including non-PyTorch memory, this process has 14.56 GiB memory in use. Process 355213 has 8.30 GiB memory in use. Of the allocated memory 14.21 GiB is allocated by PyTorch, and 31.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

We use `subprocess` to start `Ollama` up in a non blocking fashion! In your own desktop, you can simply open up a new `terminal` and type `ollama serve`, but in Colab, we have to use this hack!

In [None]:
import subprocess
subprocess.Popen(["ollama", "serve"])
import time
time.sleep(3) # Wait for a few seconds for Ollama to load!

Error: listen tcp 127.0.0.1:11434: bind: address already in use


`Ollama` needs a `Modelfile`, which specifies the model's prompt format. Let's print Unsloth's auto generated one:

In [None]:
print(tokenizer._ollama_modelfile)

AttributeError: 'PreTrainedTokenizerFast' object has no attribute '_ollama_modelfile'

We now will create an `Ollama` model called `unsloth_model` using the `Modelfile` which we auto generated!

In [None]:
!ollama create medico_conversacional_v2 -f ./model/Modelfile

[?25ltransferring model data ⠋ [?25h[?25l[2K[1Gtransferring model data ⠹ [?25h[?25l[2K[1Gtransferring model data ⠹ [?25h[?25l[2K[1Gtransferring model data ⠼ [?25h[?25l[2K[1Gtransferring model data ⠴ [?25h[?25l[2K[1Gtransferring model data ⠴ [?25h[?25l[2K[1Gtransferring model data ⠧ [?25h[?25l[2K[1Gtransferring model data ⠇ [?25h[?25l[2K[1Gtransferring model data ⠏ [?25h[?25l[2K[1Gtransferring model data ⠋ [?25h[?25l[2K[1Gtransferring model data ⠋ [?25h[?25l[2K[1Gtransferring model data ⠹ [?25h[?25l[2K[1Gtransferring model data ⠸ [?25h[?25l[2K[1Gtransferring model data ⠼ [?25h[?25l[2K[1Gtransferring model data ⠼ [?25h[?25l[2K[1Gtransferring model data ⠴ [?25h[?25l[2K[1Gtransferring model data ⠧ [?25h[?25l[2K[1Gtransferring model data ⠇ [?25h[?25l[2K[1Gtransferring model data ⠏ [?25h[?25l[2K[1Gtransferring model data ⠋ [?25h[?25l[2K[1Gtransferring model data ⠋ [?25h[?25l[2K[1Gtransferring model data ⠹ [

And now we can do inference on it via `Ollama`!

You can also upload to `Ollama` and try the `Ollama` Desktop app by heading to https://www.ollama.com/

In [None]:
!curl http://localhost:11434/api/chat -d '{ \
    "model": "medico_conversacional_v2:latest", \
    "messages": [ \
        { "role": "user", "content": "Qual é a importância das relações interpessoais na saúde mental?" } \
    ] \
    }'

{"error":"llama runner process has terminated: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file\n\nllama_load_model_from_file: failed to load model"}

# ChatGPT interactive mode

### ⭐ To run the finetuned model like in a ChatGPT style interface, first click the **| >_ |** button.
![](https://raw.githubusercontent.com/unslothai/unsloth/nightly/images/Where_Terminal.png)

---
---
---

### ⭐ Then, type `ollama run unsloth_model`

![](https://raw.githubusercontent.com/unslothai/unsloth/nightly/images/Terminal_Type.png)

---
---
---
### ⭐ And you have a CHatGPT style assistant!

### Type any question you like and press `ENTER`. If you want to exit, hit `CTRL + D`
![](https://raw.githubusercontent.com/unslothai/unsloth/nightly/images/Assistant.png)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Try our [Ollama CSV notebook](https://colab.research.google.com/drive/1VYkncZMfGFkeCEgN2IzbZIKEDkyQuJAS?usp=sharing) to upload CSVs for finetuning!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://ollama.com/"><img src="https://raw.githubusercontent.com/unslothai/unsloth/nightly/images/ollama.png" height="44"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>