# Getting Started with Fine-Tuning Moshi 7B

This notebook shows a simple example of how to fine-tune Moshi 7B with LoRA. You can run it in Google Colab on an A100 GPU.

<a target="_blank" href="https://colab.research.google.com/github/kyutai-labs/moshi-finetune/blob/main/tutorials/moshi_finetune.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Check out the `moshi-finetune` GitHub repo to learn more: https://github.com/kyutai-labs/moshi-finetune/


## Installation

Clone the `moshi-finetune` repo:


In [1]:
%cd /content/
!git clone https://github.com/kyutai-labs/moshi-finetune.git

/content
Cloning into 'moshi-finetune'...
remote: Enumerating objects: 245, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 245 (delta 36), reused 35 (delta 29), pack-reused 190 (from 1)
Receiving objects: 100% (245/245), 638.97 KiB | 14.52 MiB/s, done.
Resolving deltas: 100% (135/135), done.


Install all required dependencies:


In [None]:
%pip install -e /content/moshi-finetune

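To confirm the install worked before moving on, a quick check like the one below can help. It is a minimal sketch that only looks up module specs with the standard library; the module names `moshi`, `torch`, and `yaml` are assumptions based on the `moshi.server` command, the training stack, and the `import yaml` used later in this notebook.

```python
import importlib.util

def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level modules this notebook relies on later (assumed names).
missing = missing_modules(["moshi", "torch", "yaml"])
if missing:
    print("Missing modules, re-run the install cell:", missing)
else:
    print("All required modules are importable.")
```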
## Prepare dataset


In [None]:
# Alternative: download the full dataset (large). In that case, set train_data
# in the training config to /content/data/daily-talk-contiguous/dailytalk.jsonl.
# from pathlib import Path
# from huggingface_hub import snapshot_download
#
# Path("/content/data/daily-talk-contiguous").mkdir(parents=True, exist_ok=True)
# local_dir = snapshot_download(
#     "kyutai/DailyTalkContiguous",
#     repo_type="dataset",
#     local_dir="/content/data/daily-talk-contiguous",
# )

# Used here: download only a small subset for a quick test.
from tqdm.auto import tqdm

# Delete any old directories (sample_data is Colab's default example folder).
!rm -rf /content/sample_data /content/dailytalk_dataset

# Create new directories.
!mkdir -p /content/dailytalk_dataset/data_stereo

# Download the JSONL manifest.
!wget -q https://huggingface.co/datasets/kyutai/DailyTalkContiguous/resolve/main/dailytalk.jsonl -O /content/dailytalk_dataset/dailytalk.jsonl

# Download files 1-9 for now. Use /resolve/ (raw file) URLs: /blob/ URLs return an HTML page.
base_url = "https://huggingface.co/datasets/kyutai/DailyTalkContiguous/resolve/main/data_stereo/"
for i in tqdm(range(1, 10)):
    !wget -q {base_url}{i}.wav -O /content/dailytalk_dataset/data_stereo/{i}.wav
    !wget -q {base_url}{i}.json -O /content/dailytalk_dataset/data_stereo/{i}.json

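The loop above fetches only a handful of `.wav`/`.json` pairs, while `dailytalk.jsonl` still lists every conversation in the dataset, so the data loader may try to open files that were never downloaded. One way around this is to write a smaller manifest that keeps only the entries whose audio is present locally. This is a minimal sketch, assuming each JSONL line is an object with a `path` key relative to the manifest's directory (the format described in the `moshi-finetune` README); `filter_manifest` and the output name `dailytalk_subset.jsonl` are hypothetical. The subset is written next to the original so the relative paths still resolve, and `train_data` in the config would then need to point to it.

```python
import json
from pathlib import Path

def filter_manifest(manifest_path, output_path):
    """Copy only entries whose audio file exists (hypothetical helper).

    Assumes each line is a JSON object with a "path" key that is relative
    to the manifest's directory. Returns the number of entries kept.
    """
    manifest_path = Path(manifest_path)
    root = manifest_path.parent
    kept = 0
    with open(manifest_path) as src, open(output_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            if (root / entry["path"]).exists():
                dst.write(json.dumps(entry) + "\n")
                kept += 1
    return kept

manifest = Path("/content/dailytalk_dataset/dailytalk.jsonl")
if manifest.exists():
    n = filter_manifest(manifest, manifest.with_name("dailytalk_subset.jsonl"))
    print(f"kept {n} entries with local audio")
```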
## Start training


In [None]:
# Select the GPU used for training; shell commands run with `!` (such as torchrun below) inherit these variables
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
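
To check that a GPU is actually visible before launching training, the standard PyTorch calls `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_name` are enough. This is a minimal sketch; it imports torch in the notebook kernel only for the check, and falls back to a message if torch or a GPU is missing.

```python
import os

# Same settings as the cell above; setdefault keeps any value already present.
os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

def describe_gpus():
    """Return a short description of the CUDA devices PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return ", ".join(names)

print(describe_gpus())
```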

In [None]:
# define training configuration
# for your own use cases, you might want to change the data paths, model path, run_dir, and other hyperparameters
import yaml

config = """
# data
data:
  train_data: '/content/dailytalk_dataset/dailytalk.jsonl' # Fill
  eval_data: '' # Optionally Fill
  shuffle: true

# model
moshi_paths:
  hf_repo_id: "kyutai/moshiko-pytorch-bf16"


full_finetuning: false # when not doing full fine-tuning, set lora.enable: true
lora:
  enable: true
  rank: 128
  scaling: 2.
  ft_embed: false

# training hyperparameters
first_codebook_weight_multiplier: 100.
text_padding_weight: .5


# audio per training step (seconds) = batch_size x num_GPUs x duration_sec
# we recommend a sequence duration of 300 seconds
# if you run out of GPU memory, try reducing duration_sec
duration_sec: 100
batch_size: 1
max_steps: 300

gradient_checkpointing: true # Activate checkpointing of layers

# optim
optim:
  lr: 2.e-6
  weight_decay: 0.1
  pct_start: 0.05

# other
seed: 0
log_freq: 10
eval_freq: 1
do_eval: False
ckpt_freq: 10

save_adapters: True

run_dir: "/content/test"  # Fill
"""

# Parse the YAML (catches syntax errors early) and write it to /content/example.yaml
with open("/content/example.yaml", "w") as file:
    yaml.dump(yaml.safe_load(config), file)
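
The `lora` block freezes the pretrained weights and trains a low-rank update instead. The numpy sketch below shows the standard LoRA formulation for a single linear layer: a frozen weight `W` plus `scaling * B @ A`, where `A` and `B` share the inner dimension `rank`. That `moshi-finetune` applies `scaling` to the update in exactly this way is an assumption based on the usual LoRA setup, and the layer sizes are illustrative rather than Moshi's real dimensions.

```python
import numpy as np

def lora_forward(x, W, A, B, scaling):
    """y = W x + scaling * B (A x); W is frozen, only A and B are trained."""
    return x @ W.T + scaling * (x @ A.T) @ B.T

d_in, d_out, rank, scaling = 64, 32, 8, 2.0   # illustrative sizes
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))         # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, rank))                    # trainable, zero init (as in the LoRA paper)
x = rng.standard_normal((4, d_in))             # batch of 4 inputs

# With B at zero, the adapted layer starts out identical to the base layer.
print(np.allclose(lora_forward(x, W, A, B, scaling), x @ W.T))

# Trainable parameters per adapted layer: rank * (d_in + d_out) instead of d_in * d_out.
print(rank * (d_in + d_out), "vs", d_in * d_out)
```

For a hypothetical 4096 x 4096 projection with `rank: 128`, the same formula gives 128 x (4096 + 4096) = 1,048,576 trainable parameters instead of 16,777,216.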

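The config sizes each step with `batch_size`, `duration_sec`, and the number of GPUs. The snippet below works out the resulting training budget for this config, with values copied from it rather than read from `/content/example.yaml`. Two quantities rest on assumptions: frames per step assume the Mimi codec's 12.5 Hz frame rate, and the warm-up length assumes `pct_start` is the fraction of `max_steps` spent ramping up the learning rate, as in a one-cycle schedule.

```python
# Values copied from the training config above.
batch_size = 1
num_gpus = 1              # --nproc-per-node passed to torchrun
duration_sec = 100
max_steps = 300
pct_start = 0.05
ckpt_freq = 10
MIMI_FRAME_RATE_HZ = 12.5  # assumption: Mimi codec frame rate

audio_sec_per_step = batch_size * num_gpus * duration_sec
frames_per_step = int(audio_sec_per_step * MIMI_FRAME_RATE_HZ)
total_audio_hours = audio_sec_per_step * max_steps / 3600
warmup_steps = round(pct_start * max_steps)  # assumption: one-cycle style warm-up
num_checkpoints = max_steps // ckpt_freq     # one checkpoint every ckpt_freq steps

print(f"audio per step:   {audio_sec_per_step} s ({frames_per_step} frames)")
print(f"audio processed:  {total_audio_hours:.2f} h")
print(f"warm-up steps:    {warmup_steps}")
print(f"checkpoints:      {num_checkpoints}")
```

The processed-audio total counts repeats: with only a few files downloaded, the same conversations are seen many times.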
In [None]:
# Make sure run_dir does not already exist.
# Only run this if a previous torchrun call already created /content/test.
# ! rm -r /content/test

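Instead of deleting the directory by hand, a small guard can check for a leftover `run_dir` before training starts. The helper below is hypothetical (not part of `moshi-finetune`); it refuses to delete anything unless `overwrite=True` is passed, so an earlier run's checkpoints are not removed by accident.

```python
import shutil
from pathlib import Path

def prepare_run_dir(run_dir, overwrite=False):
    """Make sure `run_dir` is free before training (hypothetical helper).

    Returns True when the directory does not exist (or was removed).
    An existing directory is deleted only if overwrite=True; otherwise
    FileExistsError is raised.
    """
    path = Path(run_dir)
    if not path.exists():
        return True
    if not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to delete it")
    shutil.rmtree(path)
    return True

# Same run_dir as in the config above; use overwrite=True only to discard the old run.
prepare_run_dir("/content/test", overwrite=False)
```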
In [None]:
# start training

!cd /content/moshi-finetune && torchrun --nproc-per-node 1 -m train /content/example.yaml

## Inference

Once the model has been trained, inference can also run on the Colab GPU, and Gradio can be used to tunnel the audio between a local client and the notebook.

More details on how to set this up can be found in the [moshi readme](https://github.com/kyutai-labs/moshi?tab=readme-ov-file#python-pytorch).
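
The server command below hardcodes `checkpoint_000300`, which matches `max_steps: 300`; if you change `max_steps` or `run_dir`, these paths change too. The sketch below builds them from the config values and checks that the LoRA weights and config exist before the server is started. It assumes the `checkpoints/checkpoint_{step:06d}/consolidated` layout seen in that command; `consolidated_dir` and `server_command` are hypothetical helpers.

```python
import shlex
from pathlib import Path

def consolidated_dir(run_dir, step):
    # Layout taken from the server command used in this notebook (e.g. checkpoint_000300).
    return Path(run_dir) / "checkpoints" / f"checkpoint_{step:06d}" / "consolidated"

def server_command(run_dir, step):
    """Return (missing_files, command) for launching moshi.server with a LoRA checkpoint."""
    ckpt = consolidated_dir(run_dir, step)
    lora, cfg = ckpt / "lora.safetensors", ckpt / "config.json"
    missing = [str(p) for p in (lora, cfg) if not p.exists()]
    cmd = (
        "python -m moshi.server --gradio-tunnel "
        f"--lora-weight={shlex.quote(str(lora))} --config-path={shlex.quote(str(cfg))}"
    )
    return missing, cmd

missing, cmd = server_command("/content/test", 300)
print("missing files:", missing or "none")
print(cmd)
```

In the notebook, IPython can run the resulting string directly with `!{cmd}`.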


In [None]:
%pip install gradio

In [None]:
!python -m moshi.server --gradio-tunnel --lora-weight=/content/test/checkpoints/checkpoint_000300/consolidated/lora.safetensors --config-path=/content/test/checkpoints/checkpoint_000300/consolidated/config.json