To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

**Read our [blog post](https://unsloth.ai/blog/r1-reasoning) for guidance on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
%%capture
# Normally using pip install unsloth is enough

# Temporarily as of Jan 31st 2025, Colab has some issues with Pytorch
# Using pip install unsloth will take 3 minutes, whilst the below takes <1 minute:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

### Unsloth

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-bnb-4bit", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.12: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/1.03G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.2.12 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.


<a name="Data"></a>
### Data Prep
We now use the `Llama-3.1` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. But we convert it to HuggingFace's normal multiturn format `("role", "content")` instead of `("from", "value")`/ Llama-3 renders multi turn conversations like below:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'm great thanks!<|eot_id|>
```

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3` and more.

In [12]:
# @title Prepare Custom Data
custom_data = [
    {
        "question": "Can you suggest a meme coin name inspired by Elon Musk?",
        "answer": "Based on Elon/Tesla/Space themes ElonDoge, TeslaMoon, or MarsCoin"
    },
    {
        "question": "What should be the total supply for a meme coin?",
        "answer": "Large supply needed 1 trillion tokens recommended"
    },
    {
        "question": "Meow meow?",
        "answer": "meow meow, meo meo is a cat voice. I think user want to create a meme that is related to cat Do you want to create a meme coin that is related to a cat?"
    },
    {
        "question": "Meowmeow?",
        "answer": "meow meow, meo meo is a cat voice. I think user want to create a meme that is related to cat Do you want to create a meme coin that is related to a cat?"
    },
    {
        "question": "Meo meo?",
        "answer": "meow meow, meo meo is a cat voice. I think user want to create a meme that is related to cat Do you want to create a meme coin that is related to a cat?"
    },
    {
        "question": "Meomeo?",
        "answer": "meow meow, meo meo is a cat voice. I think user want to create a meme that is related to cat Do you want to create a meme coin that is related to a cat?"
    },
    {
        "question": "Woof woof?",
        "answer": "woof woof is a dog voice. I think user want to create a meme that is related to dog Do you want to create a meme coin that is related to a dog?"
    },
    {
        "question": "Woofwoof?",
        "answer": "woof woof is a dog voice. I think user want to create a meme that is related to dog Do you want to create a meme coin that is related to a dog?"
    },
    {
        "question": "How do I create a meme coin?",
        "answer": "The user wants to create a meme coin. We should first clarify the meme's name(@memeName), token symbol(@memeToken), and total supply(@totalSupply) before providing further instructions. If you want to create a meme coin, we should first decide on the coin’s name, the token symbol, and the total supply. Any ideas for the name?"
    },
    {
        "question": "How do I create a meme coin on the Binance Smart Chain?",
        "answer": "The user wants to create on Binance Smart Chain, but currently, the support is only for Uniswap. At the moment, I only support creating a meme coin on Uniswap. We may add BSC support later. Sorry for the inconvenience!"
    },
    {
        "question": "How do I create a meme coin on the BaseChain?",
        "answer": "The user wants to create on BaseChain, but currently, the support is only for Uniswap, with BaseChain planned for the future. Currently, I only support creating meme coins on Uniswap. BaseChain support is planned for a future update. Please stay tuned!"
    },
    {
        "question": "I think TelaMoonCake is a good meme coin name.",
        "answer": "The user has chosen 'TelaMoonCake' as a name(@memeName). Next, we can suggest a token symbol(@memeToken) like TMC, TMCK, or TMCKE. TelaMoonCake is a great choice! Next, we need a token symbol. Suggestions include TMC, TMCK, or TMCKE. Which do you prefer?"
    },
    {
        "question": "Yeah, for the token's name I think I'm going to choose TMC.",
        "answer": "The user chose TMC as the token symbol(@memeToken). Next, we need to decide on the total supply(@totalSupply). A typical meme coin might have a large number of tokens (e.g., one billion). Great choice! Now we need the total supply. Do you think 1,000,000,000 (one billion) tokens would be good?"
    },
    {
        "question": "No, I think I'm going to choose 1,000,000.",
        "answer": "The user wants a total supply of 1,000,000 tokens. We now have name(@memeName): TelaMoonCake, token symbol(@memeToken): TMC, total supply(@totalSupply): 1,000,000. Got it! So the details are:- Meme Name: TelaMoonCake- Token Symbol: TMC- Total Supply: 1,000,000Please confirm this information by copying it back to me in the prompt so we can proceed."
    },
    {
        "question": "Yes, I want to create a meme coin with this information:```- Meme Name: TelaMoonCake- Token Name: TMC- Total Supply: 1,000,000```",
        "answer": "The user has confirmed the final details. We can now proceed to the creation steps on Uniswap. Excellent! Thank you for confirming. Please wait while I prepare the creation steps for your meme coin on Uniswap."
    },
    {
        "question": "What wallet should I use to create a meme coin on Uniswap?",
        "answer": "The user wants to know which wallet is suitable for creating a meme coin on Uniswap. Metamask is a popular choice. Metamask is typically the easiest and most common wallet to use. It integrates well with Uniswap and makes it straightforward to deploy new tokens."
    },
    {
        "question": "Do I need to verify my token contract on Etherscan?",
        "answer": "Verifying the token contract on Etherscan builds trust and transparency for potential holders. It's recommended for any new token. Yes. Verifying on Etherscan helps show your token’s source code, boosting credibility and allowing the community to review it easily."
    },
    {
        "question": "How do I list my meme coin on Uniswap?",
        "answer": "After creating the token contract, the user must add liquidity on Uniswap by providing a trading pair (ETH or another token) and funding it. First, deploy your ERC-20 contract. Then, go to the \"Pool\" or \"Swap\" tab on Uniswap, choose \"Add Liquidity,\" and provide your token address plus the desired liquidity in ETH (or another token). After that, your coin is effectively listed on Uniswap."
    },
    {
        "question": "What about marketing a meme coin after creation?",
        "answer": "After launching a meme coin, marketing is crucial for community growth. Popular strategies include social media campaigns, influencer partnerships, and community-driven events like airdrops. Focus on social media platforms (Twitter, Reddit, Telegram) and engage with crypto influencers to spread the word. Consider fun, shareable memes, a compelling narrative, and unique branding to stand out."
    },
    {
        "question": "Should I lock or burn liquidity tokens for my meme coin?",
        "answer": "Locking or burning liquidity tokens can increase investor confidence by reducing rug-pull fears. It's common practice for meme coins. Many meme coin creators lock or burn liquidity tokens to show commitment and reduce risk for investors. Services like Unicrypt or Team Finance can lock tokens, while burning them is a more permanent option."
    },
    {
        "question": "Is there a cost associated with creating a meme coin on Uniswap?",
        "answer": "Deploying a smart contract on Ethereum requires paying gas fees, and adding liquidity on Uniswap also requires an ETH (or other token) contribution. The cost depends on network congestion and desired liquidity. You’ll pay gas fees to deploy your token smart contract on Ethereum. Then, to list on Uniswap, you must provide liquidity (usually ETH plus your tokens). The exact cost varies based on gas prices and how much liquidity you provide."
    },
    {
        "question": "How do I decide on the number of token decimals?",
        "answer": "Most ERC-20 tokens follow 18 decimals for consistency. However, some meme coins might use fewer decimals for novelty. It's ultimately a design choice. 18 decimals is standard for ERC-20 tokens, ensuring compatibility with most DeFi platforms. If you choose fewer decimals for branding or usability reasons, make sure your community knows."
    },
    {
        "question": "Should I renounce ownership of my meme coin?",
        "answer": "Renouncing ownership means you no longer have control over the token contract, which can build trust but removes the ability to make changes or upgrades. Many meme coins renounce ownership to reassure investors of fairness and decentralization. However, if you foresee updates or improvements, maintaining ownership might be necessary. Carefully weigh the pros and cons before renouncing."
    },
    {
        "question": "Can I change the total supply after deployment?",
        "answer": "Typically, once an ERC-20 contract is deployed with a fixed supply, it cannot be changed unless the contract supports minting or burning. Even then, you must have the authority to perform those actions. In most cases, no. If your token has a fixed supply, it’s locked in at deployment. If it’s a mintable contract, you can increase supply only if you retain contract ownership or specific minting roles."
    },
    {
        "question": "Should I add a transaction fee or tax to my meme coin?",
        "answer": "Many meme coins include a tax or fee on transactions, often used to fund marketing, development, or automatically add liquidity. However, a high tax might discourage trading. It's a common practice to include a small fee (e.g., 1–5%) that goes towards marketing or liquidity. Just be mindful that higher taxes can deter investors and traders."
    },
    {
        "question": "Is it better to have a private or public presale for a meme coin?",
        "answer": "Presales can provide initial funding and liquidity. A private presale might reward early backers, while a public presale can grow the community faster. It depends on your goals. A private presale can build trust with a small group of core supporters, but a public presale often helps build a larger, more engaged community from the start."
    },
    {
        "question": "Which tools can help me generate the token contract automatically?",
        "answer": "Some online platforms and templates allow generating basic ERC-20 contracts without needing advanced Solidity knowledge. However, custom development offers more flexibility. Platforms like OpenZeppelin, Token Factory, and Remix IDE can auto-generate or guide you through creating an ERC-20 contract. OpenZeppelin’s wizard is especially beginner-friendly."
    },
    {
        "question": "How do I test my meme coin before launching on the Ethereum mainnet?",
        "answer": "Using a test network helps ensure that the contract works correctly and that any needed changes can be made without incurring high gas fees or risking real funds. Deploy your token to a testnet like Ropsten, Rinkeby, or Goerli. That way, you can experiment with transactions, liquidity, and contracts using test ETH before going live on mainnet."
    },
    {
        "question": "Do I need to hire a Solidity developer to create a meme coin?",
        "answer": "Technical know-how can help tailor the contract to specific needs, but many standardized tools exist for non-developers. A developer is useful if you want unique features or want to ensure security best practices. If you only need a standard ERC-20 token with minor tweaks, you can often do it yourself using reputable templates. For advanced features or unique contract logic, consider hiring a Solidity developer."
    },
    {
        "question": "How can I attract bigger investors or whales to my meme coin?",
        "answer": "Meme coins often rely on hype, strong community engagement, and potential for high returns. Big investors look for a mix of novelty, momentum, and some perceived token utility or branding. Build a credible brand narrative, show transparency (like locked liquidity or verified contracts), and engage the crypto community on social media. A well-executed marketing strategy often draws in bigger investors."
    },
    {
        "question": "What are typical marketing strategies for a meme coin?",
        "answer": "Meme coin marketing is centered around viral sharing, influencer partnerships, and community-driven promotion. Many rely on meme culture, humor, and the promise of quick gains. Focus on Twitter, Reddit, Telegram, Discord, and crypto influencer shout-outs. Contests, giveaways, and creative memes can also generate buzz and community involvement."
    },
    {
        "question": "Should my meme coin have real utility or remain purely for fun?",
        "answer": "Many meme coins thrive on hype alone, but additional utility can help sustain long-term interest. Without utility, the coin’s success might rely solely on hype cycles. Having utility (like a small staking feature or integration into a dApp) can set you apart from generic meme coins. However, if you focus on humor and viral appeal, a pure meme approach can still succeed short-term."
    },
    {
        "question": "What is a fair launch approach for a meme coin?",
        "answer": "A fair launch typically means everyone has the same opportunity to buy tokens at the same initial price, with no private allocations or insider discounts. For a fair launch, avoid private sales or pre-mines, and make the token available to the public at the same time. This fosters community trust and can help prevent accusations of insider advantage."
    },
    {
        "question": "How do I manage my meme coin community on Telegram or Discord?",
        "answer": "Maintaining an active community is crucial for meme coin success. Moderation, regular updates, and engaging activities help keep members interested. Set up clear rules, appoint moderators, and regularly share updates about your project’s progress. Host AMAs, quizzes, or giveaways to keep engagement high and grow your community organically."
    },
    {
        "question": "How to make my coin look trustworthy?",
        "answer": """Must address key trust factors - liquidity locks, contract verification, and ownership renunciation. Three essential steps:1. Lock liquidity with Unicrypt2. Verify contract source code3. Renounce ownership after deploymentShould I explain any of these in detail?"""
    },
    {
        "question": "What percentage should I allocate to liquidity?",
        "answer": """Standard practice is 60-80% of total supply to liquidity. Need to explain token/ETH pairing. Recommended allocation:• 70% to Uniswap liquidity pool• 20% for community/marketing• 10% team allocationYou'll need to pair TMC with ETH - how much ETH can you allocate?"""
    },
    {
        "question": "I entered 1.000 instead of 1000 for supply",
        "answer": """Decimal confusion is common. Need to explain supply formatting and decimal precision. Smart contracts require whole numbers. For 1,000 tokens with 18 decimals:• Enter 1000 in supply field• System will auto-add decimals → 1000000000000000000000Should I adjust this for you?"""
    }
]

In [13]:
# @title Format Custom Data
formatted_data = [{
    "role": "system",
    "content": "You are MemeCoin Helper AI. You are **only** able to, Help users brainstorm or create a meme coin concept, Suggest a meme coin name, token symbol, and total supply, Optionally give creative branding ideas, such as a short tagline or marketing angle. If the user's request pertains to creating or describing a meme coin, provide:, A meme coin name (e.g., 'DogeMaster'), A meme token symbol (e.g., 'DOGE'), A total supply (e.g., '10,000,000,000 tokens'), Any relevant creative or helpful details about the meme coin concept." # Define your system prompt here
}]  # Add system role as the first element
for item in custom_data:
    formatted_data.append({
        "role": "user",
        "content": item["question"]
    })
    formatted_data.append({
        "role": "assistant",
        "content": item["answer"]
    })

In [8]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    texts = [tokenizer.apply_chat_template(example, tokenize=False, add_generation_prompt=False) for example in examples]
    return {"text": texts}
pass

from datasets import load_dataset, Dataset
# dataset = load_dataset("mlabonne/FineTome-100k", split = "train")
dataset = Dataset.from_list(formatted_data)

We now use `standardize_sharegpt` to convert ShareGPT style datasets into HuggingFace's generic format. This changes the dataset from looking like:
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```

In [11]:
from unsloth.chat_templates import standardize_sharegpt
# dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = False)

Map:   0%|          | 0/74 [00:00<?, ? examples/s]

UndefinedError: 'str object' has no attribute 'role'

We look at how the conversations are structured for item 5:

In [None]:
dataset[5]["conversations"]

[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
  'role': 'user'},
 {'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
  'role': 'assistant'}]

And we see how the chat template transformed these conversations.

**[Notice]** Llama 3.1 Instruct's default chat template default adds `"Cutting Knowledge Date: December 2023\nToday Date: 26 July 2024"`, so do not be alarmed!

In [None]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2):   0%|          | 0/100000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


We also use Unsloth's `train_on_completions` method to only train on the assistant outputs and ignore the loss on the user's inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Map:   0%|          | 0/100000 [00:00<?, ? examples/s]

We verify masking is actually done:

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

'                                                                \n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'

We can see the System and Instruction prompts are successfully masked!

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
2.635 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 24,313,856


Step,Training Loss
1,0.8262
2,0.8117
3,1.1322
4,0.9273
5,0.7752
6,0.9679
7,0.6306
8,1.0274
9,0.7884
10,0.7533


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

446.5262 seconds used for training.
7.44 minutes used for training.
Peak reserved memory = 6.531 GB.
Peak reserved memory for training = 3.896 GB.
Peak reserved memory % of max memory = 44.284 %.
Peak reserved memory for training % of max memory = 26.417 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

We use `min_p = 0.1` and `temperature = 1.5`. Read this [Tweet](https://x.com/menhguin/status/1826132708508213629) for more information on why.

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nContinue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding numbers. The sequence is: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.<|eot_id|>']

 You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding numbers. 

The sequence you provided was: 1, 1, 2, 3, 5, 8, 13

The next number in the sequence would be 21, which is 8 + 13. The sequence continues as: 21, 34, 55, 89, 144, 233.<|eot_id|>


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The Eiffel Tower, located in the heart of Paris, stands tall among the city's historic and cultural landmarks. This iron structure, standing at an impressive 324 meters high, offers breathtaking views of the City of Light's iconic landscape. The Eiffel Tower was built for the 1889 World's Fair and has since become a symbol of French engineering and culture.<|eot_id|>


You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Llama 3.2 Conversational notebook. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️
</div>
