To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth?tab=readme-ov-file#-installation-instructions).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for llama.cpp).

[NEW] Llama-3.1 8b, 70b & 405b are trained on a crazy 15 trillion tokens with 128K long context lengths!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [None]:
%%capture
!pip install unsloth "xformers==0.0.28.post2"
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

In [None]:
%%capture
!pip install -U bitsandbytes transformers

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
* [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

# Use Mistral 7B Model

In [None]:
%%time
from unsloth import FastLanguageModel
import torch
max_seq_length = 500 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.11.10: Fast Mistral patching. Transformers:4.46.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
CPU times: user 10.3 s, sys: 1.91 s, total: 12.2 s
Wall time: 28.3 s


In [None]:
model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32768, 4096, padding_idx=770)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): M

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.11.10 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
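
As a rough check on how many parameters LoRA actually trains, the adapter size can be computed by hand. The sketch below assumes the standard LoRA layout (one `r x in_features` matrix and one `out_features x r` matrix per adapted layer) and takes the layer shapes from the model summary printed earlier. The variable names are illustrative only.

```python
# LoRA adds two small matrices per adapted Linear layer:
# A with shape (r, in_features) and B with shape (out_features, r),
# so each layer contributes r * (in_features + out_features) trainable weights.
r = 16
num_layers = 32

# (in_features, out_features) as shown in the printed model summary
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * num_layers
print(f"{per_layer:,} per decoder layer, {total:,} in total")
```

With `r = 16` this gives 41,943,040, which agrees with the trainable parameter count Unsloth reports when training starts.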


<a name="Data"></a>
### Data Prep
The original Unsloth template uses the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of the 52K-example original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). This notebook replaces that data prep with its own data: a CSV of actual and predicted impressions with factual correctness scores.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!
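
A minimal sketch of what appending the EOS token looks like, using the same three-slot prompt layout this notebook uses later. The literal `"</s>"` is an assumption standing in for `tokenizer.eos_token`, and the example impressions are made-up placeholder strings, so the snippet runs without loading a model.

```python
# Assumed stand-in: in the notebook this value comes from tokenizer.eos_token.
EOS_TOKEN = "</s>"

prompt_template = """

### Input:
{}

### Input2:
{}

### Response:
{}"""

def build_training_text(reference, prediction, score):
    """Fill the template and terminate the example with the EOS token."""
    return prompt_template.format(reference, prediction, score) + EOS_TOKEN

# Placeholder strings for illustration, not real data.
text = build_training_text("No acute findings.", "No acute abnormality.", "5.0")
print(repr(text[-20:]))
```

Because every training example ends with the EOS token right after the score, the model learns to stop once it has produced the score.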

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%%time
import pandas as pd
from datasets import Dataset

# Load the dataset
MRIdf = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/sample_results_with_details_FactualCorrectness.csv')

# Remove rows with missing values in the relevant columns

MRIdf = MRIdf.dropna(subset=['actual_impression', 'predicted_impression', 'Factual_Correctness'])
MRIdf.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 22 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   findings                 100 non-null    object 
 1   clinical_information     100 non-null    object 
 2   actual_impression        100 non-null    object 
 3   predicted_impression     100 non-null    object 
 4   Factual_Correctness      100 non-null    float64
 5   ClinicalBERT Similarity  100 non-null    float64
 6   RadBERT Similarity       100 non-null    float64
 7   Rogue L Similarity       100 non-null    float64
 8   BERTScore Precision      100 non-null    float64
 9   BERTScore Recall         100 non-null    float64
 10  BERTScore F1             100 non-null    float64
 11  Rouge1 Precision         100 non-null    float64
 12  Rouge1 Recall            100 non-null    float64
 13  Rouge1 F1                100 non-null    float64
 14  Rouge2 Precision         10

# Goal: fine-tune the model to generate factual correctness scores

**100 samples of original and generated impressions were manually given factual correctness scores.**

In [None]:
%%time
import pandas as pd
from datasets import Dataset


# Load only the required columns and drop any rows with missing values in those columns
MRIdf_subset = MRIdf[['actual_impression', 'predicted_impression', 'Factual_Correctness']].dropna().drop_duplicates()

# Ensure that each column is of type string
MRIdf_subset['actual_impression'] = MRIdf_subset['actual_impression'].astype(str)
MRIdf_subset['predicted_impression'] = MRIdf_subset['predicted_impression'].astype(str)
MRIdf_subset['Factual_Correctness'] = MRIdf_subset['Factual_Correctness'].astype(str)

# Convert the subset DataFrame to a Hugging Face Dataset
MRIdf_dataset = Dataset.from_pandas(MRIdf_subset)

# Define the Alpaca-style prompt template
alpaca_prompt = """

### Input:
{}

### Input2:
{}

### Response:
{}"""

# EOS token for end-of-sequence in generated text
EOS_TOKEN = tokenizer.eos_token

# Formatting function to structure input-output pairs
def formatting_prompts_func(examples):
    references = examples["actual_impression"]
    predictions = examples["predicted_impression"]
    scores = examples["Factual_Correctness"]
    texts = []
    for reference, prediction, score in zip(references, predictions, scores):
        # Fill the prompt template and append the EOS token so generation learns to stop
        text = alpaca_prompt.format(reference, prediction, score) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Apply formatting function to dataset with batched processing
formatted_data = MRIdf_dataset.map(formatting_prompts_func, batched=True)

# Show the first few formatted examples
formatted_data[:2]


Map:   0%|          | 0/100 [00:00<?, ? examples/s]

CPU times: user 60.6 ms, sys: 583 µs, total: 61.2 ms
Wall time: 200 ms


{'actual_impression': ['1.No evidence of an acute or new lesions or any detectable abnormal enhancement.2.Revisualization of stable few subcortical and periventricular flair hyperintensity without enhancement since prior exam.3.Previously reported flair hyperintensity in bilateral middle cerebellar peduncles are significantly less conspicuous and demonstrate no enhancement.',
  'Large knee effusion with associated fragmentation and destruction of the posterior horn of the medial meniscus. See description and the associated overlying adjacent cartilaginous defect.'],
 'predicted_impression': ['1.Stable few subcortical and periventricular foci of flair hyperintensity in bilateral cerebral hemispheres.2.No evidence of any new lesions or any detectable abnormal enhancement.3.Previously noted lesions within bilateral middle cerebellar peduncles are significantly less conspicuous on current the study and there is no evidence of any new or enhancing lesions in the posterior fossa.',
  '1. Mar

In [None]:
%%time
from sklearn.model_selection import train_test_split

# Convert the formatted dataset to a DataFrame for train-test split, and reset index to avoid duplicate index errors
formatted_df = formatted_data.to_pandas().reset_index(drop=True)

# First split into train and temp (80-20)
train_data, temp_data = train_test_split(formatted_df, test_size=0.2, random_state=42)

# Split temp into validation and test (50-50, i.e. 10% each of the original data)
val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)

# Convert back to Hugging Face Datasets without additional indices
train_dataset = Dataset.from_pandas(train_data.reset_index(drop=True))
val_dataset = Dataset.from_pandas(val_data.reset_index(drop=True))
test_dataset = Dataset.from_pandas(test_data.reset_index(drop=True))

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")
print(f"Test samples: {len(test_dataset)}")

Training samples: 21987
Validation samples: 2748
Test samples: 2749
CPU times: user 310 ms, sys: 269 ms, total: 579 ms
Wall time: 605 ms
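
To see what sizes the two-stage 80/10/10 split produces, here is a small pure-Python sketch. It assumes the rounding rule that, to my understanding, `train_test_split` applies to a fractional `test_size`: the test side is rounded up and the train side takes the remainder. The helper names are hypothetical.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test side is rounded up, the train side gets the rest.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def three_way_sizes(n_samples):
    """Sizes of (train, validation, test) for the 80/20 then 50/50 split."""
    n_train, n_temp = split_sizes(n_samples, 0.2)
    n_val, n_test = split_sizes(n_temp, 0.5)
    return n_train, n_val, n_test

print(three_way_sizes(100))    # the 100-row DataFrame loaded above
print(three_way_sizes(27484))  # 21987 + 2748 + 2749, the counts printed above
```

Under that assumption a 100-row dataset splits into 80/10/10, while the printed counts correspond to a 27,484-row input.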


In [None]:
train_dataset[1]

{'findings': 'There is a heterogeneously enhancing, low T1 and T2 signal intensity left temporal region calvarium lesion that measures up to 5 cm in length with epidural extension that measures up to 7 mm in thickness. There is mild mass effect upon the underlying brain parenchyma. There also appears to be mild subgaleal extension of tumor. There is no evidence of intracranial hemorrhage or acute infarct. There is mild nonspecific periventricular cerebral white matter T2 hyperintensity, but no evidence of intraparenchymal enhancing lesions. The ventricles and basal cisterns are normal in size and configuration. There is no midline shift or herniation. The major cerebral flow voids are intact. The orbits are grossly unremarkable.',
 'clinical_information': 'Metastatic breast cancer.',
 'impression': 'Left temporal calvarium region metastasis with epidural extension measuring up to 7 mm in width. ',
 '__index_level_0__': 16548,
 'text': '\n\n### Input:\nMetastatic breast cancer.\n\n### I

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and remove the `max_steps` argument. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = formatted_data, # Full formatted dataset (all rows), not the train split
    # eval_dataset = val_dataset, # Optional validation dataset
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 16,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 10, # Set this instead of max_steps for a full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2):   0%|          | 0/100 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
11.582 GB of memory reserved.


In [None]:
%%time
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 60
O^O/ \_/ \    Batch size per device = 16 | Gradient Accumulation steps = 4
\        /    Total batch size = 64 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.1299
2,2.7794
3,2.3557
4,2.1111
5,2.3406
6,1.1297
7,1.749
8,1.2153
9,1.5566
10,1.3187


CPU times: user 26min 23s, sys: 16min 36s, total: 42min 59s
Wall time: 46min 35s
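
The batch size and epoch count in the training banner above can be reproduced with a little arithmetic. This is a sketch under an assumption: that the trainer counts optimizer steps per epoch by integer division of batches by `gradient_accumulation_steps`, with a minimum of one. That is my reading of the Transformers version logged in this notebook, and other versions may count differently.

```python
import math

num_examples = 100      # rows in the formatted dataset
per_device_batch = 16   # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
max_steps = 60

# Examples consumed per optimizer step on a single GPU
effective_batch = per_device_batch * grad_accum

# Batches the dataloader yields in one pass over the data
batches_per_epoch = math.ceil(num_examples / per_device_batch)

# Assumed rule: whole accumulation groups per epoch, never fewer than one step
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)

# Epochs needed to reach max_steps optimizer steps
epochs_needed = math.ceil(max_steps / updates_per_epoch)

print(effective_batch, batches_per_epoch, updates_per_epoch, epochs_needed)
```

With only 100 examples, 60 steps therefore means many passes over the same data, so a smaller `max_steps` or a held-out validation set may be worth considering.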


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
%%time
# Save the trained model and tokenizer
model.save_pretrained('/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsmisrral7bFC')
tokenizer.save_pretrained('/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsmisrral7bFC')

#Save training arguments
#torch.save(training_params, '/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRILlama8b60steps')

CPU times: user 256 ms, sys: 137 ms, total: 393 ms
Wall time: 6.03 s


('/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRIMistralb60stepsclinc/tokenizer_config.json',
 '/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRIMistralb60stepsclinc/special_tokens_map.json',
 '/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRIMistralb60stepsclinc/tokenizer.model',
 '/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRIMistralb60stepsclinc/added_tokens.json',
 '/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsMRIMistralb60stepsclinc/tokenizer.json')

In [None]:
%%time
# Reload the saved model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

modelsaved = AutoModelForCausalLM.from_pretrained('/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsmisrral7bFC')
tokenizersaved = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Colab Notebooks/Capstone2/classicsmisrral7bFC')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.


CPU times: user 17.7 s, sys: 5.96 s, total: 23.7 s
Wall time: 1min 42s


<a name="Inference"></a>
### Inference
Let's run the model! You can change the two impressions passed to the prompt; leave the response blank!


In [None]:
# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the inputs for the model
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "1.No evidence of an acute or new lesions or any detectable abnormal enhancement.2.Revisualization of stable few subcortical and periventricular flair hyperintensity without enhancement since prior exam.3.Previously reported flair hyperintensity in bilateral middle cerebellar peduncles are significantly less conspicuous and demonstrate no enhancement.",
            "1.Stable few subcortical and periventricular foci of flair hyperintensity in bilateral cerebral hemispheres.2.No evidence of any new lesions or any detectable abnormal enhancement.3.Previously noted lesions within bilateral middle cerebellar peduncles are significantly less conspicuous on current the study and there is no evidence of any new or enhancing lesions in the posterior fossa.",  # input
            ""  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt"
).to("cuda")

# Generate the outputs
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)

# Decode the outputs
generated_texts = tokenizer.batch_decode(outputs)

# Print the generated text
for text in generated_texts:
    print(text)

<s> 

### Input:
1.No evidence of an acute or new lesions or any detectable abnormal enhancement.2.Revisualization of stable few subcortical and periventricular flair hyperintensity without enhancement since prior exam.3.Previously reported flair hyperintensity in bilateral middle cerebellar peduncles are significantly less conspicuous and demonstrate no enhancement.

### Input2:
1.Stable few subcortical and periventricular foci of flair hyperintensity in bilateral cerebral hemispheres.2.No evidence of any new lesions or any detectable abnormal enhancement.3.Previously noted lesions within bilateral middle cerebellar peduncles are significantly less conspicuous on current the study and there is no evidence of any new or enhancing lesions in the posterior fossa.

### Response:
5.0</s>
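
Since the decoded output contains the whole prompt plus special tokens, the score has to be pulled out of the text. The helper below is a hypothetical sketch, not part of Unsloth or Transformers. It takes whatever follows the last `### Response:` marker, strips the `<s>` / `</s>` tokens seen in the output above, and returns `None` when the remainder is not a number.

```python
def parse_score(decoded, marker="### Response:\n"):
    """Return the float after the last response marker, or None if it is not a number."""
    tail = decoded.split(marker)[-1]
    # Remove the special tokens that appear when decoding without skip_special_tokens
    for token in ("</s>", "<s>"):
        tail = tail.replace(token, "")
    try:
        return float(tail.strip())
    except ValueError:
        return None

print(parse_score("### Input:\nA\n\n### Input2:\nB\n\n### Response:\n5.0</s>"))
print(parse_score("### Response:\nnot a number"))
```

Returning `None` rather than raising makes it easier to count how many generations failed to produce a usable score.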


# Generate Factual Correctness for top 5 model results

In [None]:
%%time
import json
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
#tokenizer = AutoTokenizer.from_pretrained('your-model-path')
#model = AutoModelForCausalLM.from_pretrained('your-model-path')

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the prompt template
alpaca_prompt = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

# Load the JSON file
with open('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/results_mistral 22b.json', 'r') as f:
    test_dataset = json.load(f)

# List to store the results
results = []

# Iterate over the test_dataset with tqdm for progress bar
for entry in tqdm(test_dataset, desc="Processing Entries", unit="entry"):
    actual_impression = entry['actual_impression']
    predicted_impression = entry['predicted_impression']


    # Prepare the inputs for the model
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                actual_impression,
                predicted_impression,
                ""  # unused: this two-slot template ends at the response marker
            )
        ],
        return_tensors="pt"
    ).to("cuda")

    # Generate the outputs
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)

    # Decode the outputs
    generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract the factual correction
    factual_correction = generated_texts[0].split("### Response:\n")[-1].strip()

    # Store the results
    results.append({
        'clinical_information': entry['clinical_information'],
        'findings': entry['findings'],
        'predicted_impression': entry['predicted_impression'],
        'actual_impression': entry['actual_impression'],
        'factual_correction': factual_correction
    })

# Save the results to a JSON file
with open('slothMistral_22b_factual_correctness.json', 'w') as f:
    json.dump(results, f, indent=4)

print("Results saved to slothMistral_22b_factual_correctness.json")


Processing Entries: 100%|██████████| 2749/2749 [29:06<00:00,  1.57entry/s]

Results saved to slothMistral_22b_factual_correctness.json
CPU times: user 26min 27s, sys: 1min 30s, total: 27min 57s
Wall time: 29min 8s





In [None]:
%%time
import json
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
#tokenizer = AutoTokenizer.from_pretrained('your-model-path')
#model = AutoModelForCausalLM.from_pretrained('your-model-path')

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the prompt template
alpaca_prompt = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

# Load the JSON file
with open('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/slothMistral_7b_2ndattempt.json', 'r') as f:
    test_dataset = json.load(f)

# List to store the results
results = []

# Iterate over the test_dataset with tqdm for progress bar
for entry in tqdm(test_dataset, desc="Processing Entries", unit="entry"):
    actual_impression = entry['actual_impression']
    predicted_impression = entry['predicted_impression']


    # Prepare the inputs for the model
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                actual_impression,
                predicted_impression,
                ""  # unused: this two-slot template ends at the response marker
            )
        ],
        return_tensors="pt"
    ).to("cuda")

    # Generate the outputs
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)

    # Decode the outputs
    generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract the factual correction
    factual_correction = generated_texts[0].split("### Response:\n")[-1].strip()

    # Store the results
    results.append({
        'clinical_information': entry['clinical_information'],
        'findings': entry['findings'],
        'predicted_impression': entry['predicted_impression'],
        'actual_impression': entry['actual_impression'],
        'factual_correction': factual_correction
    })

# Save the results to a JSON file
with open('slothMistral_7b_factual_correctness.json', 'w') as f:
    json.dump(results, f, indent=4)

print("Results saved to slothMistral_7b_factual_correctness.json")


Processing Entries: 100%|██████████| 2749/2749 [29:06<00:00,  1.57entry/s]

Results saved to slothMistral_7b_factual_correctness.json
CPU times: user 26min 5s, sys: 1min 28s, total: 27min 34s
Wall time: 29min 8s





In [None]:
%%time
import json
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
#tokenizer = AutoTokenizer.from_pretrained('your-model-path')
#model = AutoModelForCausalLM.from_pretrained('your-model-path')

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the prompt template
alpaca_prompt = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

# Load the JSON file
with open('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/llama8b_2ndattempt_results.json', 'r') as f:
    test_dataset = json.load(f)

# List to store the results
results = []

# Iterate over the test_dataset with tqdm for progress bar
for entry in tqdm(test_dataset, desc="Processing Entries", unit="entry"):
    actual_impression = entry['actual_impression']
    predicted_impression = entry['predicted_impression']


    # Prepare the inputs for the model
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                actual_impression,
                predicted_impression,
                ""  # unused: this two-slot template ends at the response marker
            )
        ],
        return_tensors="pt"
    ).to("cuda")

    # Generate the outputs
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)

    # Decode the outputs
    generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract the factual correction
    factual_correction = generated_texts[0].split("### Response:\n")[-1].strip()

    # Store the results
    results.append({
        'clinical_information': entry['clinical_information'],
        'findings': entry['findings'],
        'predicted_impression': entry['predicted_impression'],
        'actual_impression': entry['actual_impression'],
        'factual_correction': factual_correction
    })

# Save the results to a JSON file
with open('slothLlamal_8b_factual_correctness.json', 'w') as f:
    json.dump(results, f, indent=4)

print("Results saved to slothLlamal_8b_factual_correctness.json")


Processing Entries: 100%|██████████| 2749/2749 [28:27<00:00,  1.61entry/s]

Results saved to slothLlamal_8b_factual_correctness.json
CPU times: user 26min 3s, sys: 1min 37s, total: 27min 41s
Wall time: 28min 27s





In [None]:
%%time
import json
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
#tokenizer = AutoTokenizer.from_pretrained('your-model-path')
#model = AutoModelForCausalLM.from_pretrained('your-model-path')

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the prompt template
alpaca_prompt = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

# Load the JSON file
with open('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/slothllama_1b_2ndattempt.json', 'r') as f:
    test_dataset = json.load(f)

# List to store the results
results = []

# Iterate over the test_dataset with tqdm for progress bar
for entry in tqdm(test_dataset, desc="Processing Entries", unit="entry"):
    actual_impression = entry['actual_impression']
    predicted_impression = entry['predicted_impression']


    # Prepare the inputs for the model
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                actual_impression,
                predicted_impression,
                ""  # unused: this two-slot template ends at the response marker
            )
        ],
        return_tensors="pt"
    ).to("cuda")

    # Generate the outputs
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)

    # Decode the outputs
    generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract the factual correction
    factual_correction = generated_texts[0].split("### Response:\n")[-1].strip()

    # Store the results
    results.append({
        'clinical_information': entry['clinical_information'],
        'findings': entry['findings'],
        'predicted_impression': entry['predicted_impression'],
        'actual_impression': entry['actual_impression'],
        'factual_correction': factual_correction
    })

# Save the results to a JSON file
with open('slothLlamal_1b_factual_correctness.json', 'w') as f:
    json.dump(results, f, indent=4)

print("Results saved to slothLlamal_1b_factual_correctness.json")


Processing Entries: 100%|██████████| 2749/2749 [28:18<00:00,  1.62entry/s]

Results saved to slothLlamal_1b_factual_correctness.json
CPU times: user 26min 9s, sys: 1min 33s, total: 27min 43s
Wall time: 28min 19s





In [None]:
%%time
import json
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
#tokenizer = AutoTokenizer.from_pretrained('your-model-path')
#model = AutoModelForCausalLM.from_pretrained('your-model-path')

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the prompt template
alpaca_prompt = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

# Load the JSON file
with open('/content/drive/MyDrive/Colab Notebooks/Capstone2/FinalResults/slothllama_3b_2ndattempt.json', 'r') as f:
    test_dataset = json.load(f)

# List to store the results
results = []

# Iterate over the test_dataset with tqdm for progress bar
for entry in tqdm(test_dataset, desc="Processing Entries", unit="entry"):
    actual_impression = entry['actual_impression']
    predicted_impression = entry['predicted_impression']


    # Prepare the inputs for the model
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                actual_impression,
                predicted_impression,
                ""  # unused: this two-slot template ends at the response marker
            )
        ],
        return_tensors="pt"
    ).to("cuda")

    # Generate the outputs
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)

    # Decode the outputs
    generated_texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract the factual correction
    factual_correction = generated_texts[0].split("### Response:\n")[-1].strip()

    # Store the results
    results.append({
        'clinical_information': entry['clinical_information'],
        'findings': entry['findings'],
        'predicted_impression': entry['predicted_impression'],
        'actual_impression': entry['actual_impression'],
        'factual_correction': factual_correction
    })

# Save the results to a JSON file
with open('slothLlamal_3b_factual_correctness.json', 'w') as f:
    json.dump(results, f, indent=4)

print("Results saved to slothLlamal_3b_factual_correctness.json")


Processing Entries: 100%|██████████| 2749/2749 [28:33<00:00,  1.60entry/s]

Results saved to slothLlamal_3b_factual_correctness.json
CPU times: user 26min 16s, sys: 1min 33s, total: 27min 50s
Wall time: 28min 34s
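
The five cells in this section differ only in their input and output file names, so the loop could be factored into one helper. The sketch below is one possible shape, not the notebook's actual code. `score_entries`, `score_file`, and `generate_fn` are hypothetical names, and `generate_fn` is assumed to take a prompt string and return the decoded model output. A stub generator is used so the example runs without a GPU.

```python
import json

PROMPT = "\n\n### Input:\n{}\n\n### Input2:\n{}\n\n### Response:\n"

def score_entries(entries, generate_fn):
    """Score each entry; generate_fn maps a prompt string to decoded model text."""
    results = []
    for entry in entries:
        prompt = PROMPT.format(entry["actual_impression"], entry["predicted_impression"])
        decoded = generate_fn(prompt)
        results.append({
            "clinical_information": entry["clinical_information"],
            "findings": entry["findings"],
            "predicted_impression": entry["predicted_impression"],
            "actual_impression": entry["actual_impression"],
            # Keep only the text after the response marker
            "factual_correction": decoded.split("### Response:\n")[-1].strip(),
        })
    return results

def score_file(in_path, out_path, generate_fn):
    """Read entries from a JSON file, score them, and write the results as JSON."""
    with open(in_path, "r") as f:
        entries = json.load(f)
    results = score_entries(entries, generate_fn)
    with open(out_path, "w") as f:
        json.dump(results, f, indent=4)
    return len(results)

# Stub for illustration only: echoes the prompt and appends a fixed score.
def fake_generate(prompt):
    return prompt + "4.0"

demo = score_entries(
    [{"clinical_information": "c", "findings": "f",
      "predicted_impression": "p", "actual_impression": "a"}],
    fake_generate,
)
print(demo[0]["factual_correction"])
```

In the notebook, `generate_fn` would wrap the tokenizer call, `model.generate`, and `tokenizer.batch_decode` used in the cells above, and `score_file` would be called once per results file.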



