<a href="https://colab.research.google.com/github/hululuzhu/chinese-ai-writing-share/blob/main/further_finetune_example/gemma_lora_finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gemma + LoRA = Fine-tune on a Consumer-level GPU
- branched from my own [llama finetune repo](https://github.com/hululuzhu/llama-lora-chinese-couplet)
- Last update: 03/02/2024
- Contact: hululu.zhu@gmail.com
- 03/02: Fixed the prompt format for the instruction-tuned (IT) Gemma model: use Gemma's turn tokens (start/end of the user and model turns), and keep the final newline, which also matters
  ```
  GEMMA_IT_FORMAT = """<start_of_turn>user
{user_question}<end_of_turn>
<start_of_turn>model
"""
  ```


Zero-shot Examples
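- After about 10 minutes of training (roughly 500+ examples), with capped max tokens and greedy decoding, the results already look promising
- Post-processing can trim the output to match the number of Chinese characters in the first line
- Ideally, a well-trained model learns to emit the end-of-sequence (eos) token by itself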
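The post-processing idea above can be sketched in plain Python. `match_couplet_length` is a hypothetical helper, not part of the original notebook; it assumes the decoded output may contain Gemma's `<eos>` or `<end_of_turn>` strings and that the second line should have exactly as many characters as the first.

```python
# Hypothetical helper, not part of the original notebook.
# Assumption: decoded outputs may contain Gemma's "<eos>" or "<end_of_turn>".
STOP_STRINGS = ("<eos>", "<end_of_turn>")


def match_couplet_length(first_line: str, generated: str) -> str:
    """Cut `generated` at the earliest stop string, then keep len(first_line) chars."""
    text = generated
    for stop in STOP_STRINGS:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]  # drop the stop string and everything after it
    # Python counts code points, so each Chinese character has length 1
    return text.strip()[: len(first_line)]


print(match_couplet_length("日落晚霞临古寺", "月明清风送客船<eos>"))
print(match_couplet_length("鱼书千里梦", "鸟歌百载歌，春风十里<eos>"))
```

Truncation only hides overlong outputs; it cannot fix an output that is too short, which is why learning to emit eos at the right place still matters.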



## Prerequisites
- An NVIDIA GPU; check that at least 10 GB of GPU memory is free
- Install the required software with pip

In [1]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-784f4b93-126f-b9c0-5c89-f1f6683e910d)


In [2]:
!pip install -q nvidia-ml-py3
import nvidia_smi
nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
# Card id 0 is hardcoded here; nvmlDeviceGetCount() returns the number of cards if you want to iterate over all of them
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
nvidia_smi.nvmlShutdown()

print("Total memory:", info.total)
print("Free memory:", info.free)
print("Used memory:", info.used)

assert info.free > 1e10, (
    "Your GPU seems busy or has less than 10 GB of free memory")

  Preparing metadata (setup.py) ... done
  Building wheel for nvidia-ml-py3 (setup.py) ... done
Total memory: 16106127360
Free memory: 15835529216
Used memory: 270598144
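NVML reports raw byte counts, and the `1e10` threshold in the assertion is in bytes, which is about 9.31 GiB. A small hypothetical helper (not from the original notebook) that converts the numbers printed above into GiB and applies the same check:

```python
# Hypothetical helper: convert raw NVML byte counts to GiB and apply the
# same "more than 1e10 bytes free" check used in the cell above.
GIB = 1024 ** 3


def describe_gpu_memory(total: int, free: int, used: int, min_free: float = 1e10) -> dict:
    """Return memory figures in GiB plus whether the free-memory check passes."""
    return {
        "total_gib": round(total / GIB, 2),
        "free_gib": round(free / GIB, 2),
        "used_gib": round(used / GIB, 2),
        "enough_free": free > min_free,
    }


# Numbers printed by the cell above (Tesla T4)
report = describe_gpu_memory(16106127360, 15835529216, 270598144)
print(report)
```

Here the T4 reports 15.0 GiB in total, and `total == free + used` holds exactly for the printed values.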


In [3]:
# As of 02/21/2024, the latest transformers is needed to pick up the Gemma tokenizer
# Note: you might need to restart the runtime to pick up the changes
!pip install -q git+https://github.com/huggingface/transformers > /dev/null
!pip install -q bitsandbytes > /dev/null
!pip install -q datasets loralib sentencepiece > /dev/null
!pip install -q peft > /dev/null

## All the Imports

In [4]:
# disable warnings unless needed
import warnings
warnings.filterwarnings('ignore')

In [5]:
from datasets import Dataset, load_dataset
import numpy as np
import os
import pandas as pd
import pathlib
from peft import PeftModel, get_peft_config, get_peft_model, LoraConfig, TaskType, prepare_model_for_int8_training
import pickle
import sys
import torch
import transformers
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig, AutoModelForSeq2SeqLM, DataCollatorForLanguageModeling

## Define top-level configs

In [None]:
# Gemma's multi-turn chat format, for reference:
# <start_of_turn>user
# Knock knock.<end_of_turn>
# <start_of_turn>model
# Who's there?<end_of_turn>
# <start_of_turn>user
# Gemma.<end_of_turn>
# <start_of_turn>model
# Gemma who?<end_of_turn>
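The turn format can be assembled programmatically. `build_gemma_prompt` below is a hypothetical helper (not a transformers API) that joins `(role, text)` turns with Gemma's turn markers and, by default, opens a model turn ending in the newline that the 03/02 note says matters. It leaves out `<bos>` because the tokenizer adds it itself (token id 2 in the outputs further below). In recent transformers versions, `tokenizer.apply_chat_template` can produce this format as well.

```python
def build_gemma_prompt(turns, add_generation_prompt=True):
    """Build a Gemma IT prompt from (role, text) pairs; role is "user" or "model".

    Hypothetical helper. <bos> is not included because the tokenizer adds it.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    if add_generation_prompt:
        # Open the model's turn; keep the trailing newline
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


single = build_gemma_prompt([("user", "Hello world?")])
multi = build_gemma_prompt([
    ("user", "Knock knock."),
    ("model", "Who's there?"),
    ("user", "Gemma."),
])
print(multi)
```

For a single user turn this yields the same string as the `GEMMA_IT_FORMAT` template used throughout the notebook.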


In [6]:
# Select your model
# MODEL = "llama-1-7b"  #@param ["llama-1-7b", "llama-2-7b", "llama-2-7b-chat"]
MODEL = "google/gemma-7b-it"  # same checkpoint as loaded below
model_name_or_path = MODEL
tokenizer_name_or_path = MODEL

# Max number of tokens (prompt + output). Depending on the tokenizer, a Chinese
# character can take more than one token
CUTOFF_LEN = 96
# Also compute the loss on the prompt tokens, as Alpaca-LoRA does, which may help
# quality. Set to False to exclude the prompt from the loss (labels = -100).
TRAIN_ON_INPUT = True

In [7]:
# Logging in is required to download the gated Gemma checkpoints
!huggingface-cli login



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store

In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Gemma-7B takes noticeably more memory than Llama-2-7B (about 8.5B parameters,
# including a 256K-token vocabulary), so load it in 4-bit
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it",
                                            #  device_map="auto",
                                            #  load_in_8bit=True, # 8bit seems to take more than 9G memory...
                                            quantization_config=quantization_config,)



`low_cpu_mem_usage` was None, now set to True since model is quantized.



In [35]:
GEMMA_IT_FORMAT = """<start_of_turn>user
{user_q}<end_of_turn>
<start_of_turn>model
"""

print(GEMMA_IT_FORMAT.format(user_q='Hello world?'))
tokenizer.encode(GEMMA_IT_FORMAT.format(user_q='Hello world?'))

<start_of_turn>user
Hello world?<end_of_turn>
<start_of_turn>model



[2, 106, 1645, 108, 4521, 2134, 235336, 107, 108, 106, 2516, 108]

In [75]:
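# Tip: use ChatGPT (or any translator) to check the meaning of the Chinese characters
def quick_inference(my_model, tokenizer, user_q):
  print(user_q)
  # Wrap the question in the Gemma IT format and move it to the model's device
  batch = tokenizer(
      GEMMA_IT_FORMAT.format(user_q=user_q),
      return_tensors='pt',
  ).to(my_model.device)
  q_len = batch['input_ids'].shape[1]
  with torch.cuda.amp.autocast():  # mixed-precision inference
    # No sampling arguments, so the default (greedy) decoding is used;
    # new tokens are capped at the prompt length
    output_tokens = my_model.generate(
        **batch, max_new_tokens=q_len)
  # Decode only the generated part (everything after the prompt tokens)
  out = tokenizer.decode(output_tokens[0][q_len:], skip_special_tokens=False)
  print(out)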
In [59]:
quick_inference(model, tokenizer, user_q="How does the brain work?")

The brain is a complex organ that is responsible for controlling all of the functions


In [60]:
quick_inference(model, tokenizer, user_q="对联：闪电")

闪电在天空舞动，
光华洒在山头


## Load Training data

In [62]:
# Reuse my T5 couplet data code https://github.com/hululuzhu/chinese-ai-writing-share/blob/main/training/t5_finetune/Mengzi_T5_Finetune_Chinese_Couplet_V1.ipynb
working_dir = "/tmp/working_dir"
!mkdir -p {working_dir}
!wget https://github.com/wb14123/couplet-dataset/releases/download/1.0/couplet.tar.gz -P {working_dir}
!ls -l {working_dir}
!mkdir -p {working_dir}/couplet_files
!tar -xf {working_dir}/couplet.tar.gz -C {working_dir}/couplet_files
!head -1 {working_dir}/couplet_files/couplet/train/in.txt {working_dir}/couplet_files/couplet/train/out.txt

COUPLET_PATH = f'{working_dir}/couplet_files/couplet'
MAX_SEQ_LEN = 32  # At most 32 Chinese characters per line, including punctuation

train_df, test_df = None, None
for t in ['train', 'test']:
  ins, outs = [], []
  for i in ['in', 'out']:
    with open(f"{COUPLET_PATH}/{t}/{i}.txt", "r") as f:
      for line in f:
        clean_line = line.strip().replace(' ', '').replace('\n', '').replace('\r', '')[:MAX_SEQ_LEN]
        if i=='in':
          ins.append(clean_line)
        else:
          outs.append(clean_line)
  # The column names to match simpleT5
  data_dict = {
      'source_text': ins,
      'target_text': outs,
  }
  if t == 'train':
    train_df = pd.DataFrame(data_dict)
  else:
    test_df = pd.DataFrame(data_dict)

def get_gemma_prompts(df):
  raw_prompts = df['source_text'].tolist()
  gemma_prompts = []
  for p in raw_prompts:
    gemma_prompts.append(GEMMA_IT_FORMAT.format(user_q=p))
  return gemma_prompts

train_df['source_text'] = get_gemma_prompts(train_df)
test_df['source_text'] = get_gemma_prompts(test_df)

# COUPLET_PROMPOT = '对联：'
# COUPLET_SUFFIX = '\n下联：'
# train_df['source_text'] = COUPLET_PROMPOT + train_df['source_text'] + COUPLET_SUFFIX
# test_df['source_text'] = COUPLET_PROMPOT + test_df['source_text'] + COUPLET_SUFFIX

--2024-03-02 19:43:09--  https://github.com/wb14123/couplet-dataset/releases/download/1.0/couplet.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
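The loading loop above reads `in.txt` and `out.txt` into two lists that must stay aligned line by line. A compact variant reads both files in lockstep with `zip`. This is a sketch, not the notebook's code: `clean` and `load_pairs` are hypothetical names, and it assumes (as the cleanup above implies) that characters in the raw files are separated by spaces. The demo uses made-up files in the same layout.

```python
import os
import tempfile

MAX_SEQ_LEN = 32  # same cap as above


def clean(line: str, max_len: int = MAX_SEQ_LEN) -> str:
    # Characters in the raw files are space-separated; drop spaces and cap length
    return line.strip().replace(" ", "")[:max_len]


def load_pairs(in_path: str, out_path: str) -> list:
    """Read in.txt and out.txt in lockstep and return (first_line, second_line) pairs."""
    with open(in_path, encoding="utf-8") as f_in, open(out_path, encoding="utf-8") as f_out:
        return [(clean(a), clean(b)) for a, b in zip(f_in, f_out)]


# Tiny demo with two made-up files in the same layout
with tempfile.TemporaryDirectory() as d:
    in_path, out_path = os.path.join(d, "in.txt"), os.path.join(d, "out.txt")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("燕 栖 柳 绿 三 分 暖 \n")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("蝶 弄 桃 红 一 段 新 \n")
    pairs = load_pairs(in_path, out_path)
print(pairs)
```

Opening the files with an explicit UTF-8 encoding avoids depending on the system locale. Note that `zip` silently stops at the shorter file; on Python 3.10+, `zip(..., strict=True)` raises an error instead if the two files have different line counts.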

In [79]:
# Append Gemma's eos token as a literal string; the tokenizer maps "<eos>" to id 1
train_df['target_text'] += '<eos>'
test_df['target_text'] += '<eos>'

In [80]:
# Sample 5k
train_df_sample = train_df[['source_text', 'target_text']].sample(5000)
train_df_sample.sample(3)

|        | source_text | target_text |
|--------|-------------|-------------|
| 531154 | `<start_of_turn>user\n君子之交如水淡<end_of_turn>\n<st...` | `鸿鹄所向与天高<eos>` |
| 520618 | `<start_of_turn>user\n殚心力以报所知，一代长才出甘陇<end_of_tu...` | `处腊膏而不自润，千秋遗爱满邗江<eos>` |
| 644218 | `<start_of_turn>user\n落花声里春成冢<end_of_turn>\n<st...` | `离别梦中泪满襟<eos>` |


## Convert Data to a Training-friendly Dataset

In [81]:
# Copied from Alpaca-LoRA. input_ids, attention_mask, and labels are the column
# names that Hugging Face causal LM models (and Trainer) expect by default
def tokenize(tokenizer, prompt, cutoff_len, add_eos_token=False):
  # there's probably a way to do this with the tokenizer settings
  # but again, gotta move fast
  result = tokenizer(
      prompt,
      truncation=True,
      max_length=cutoff_len,
      padding=False,
      return_tensors=None,
  )
  if (
      result["input_ids"][-1] != tokenizer.eos_token_id
      and len(result["input_ids"]) < cutoff_len
      and add_eos_token
  ):
    result["input_ids"].append(tokenizer.eos_token_id)
    result["attention_mask"].append(1)

  # result["labels"] = copy.deepcopy(result["input_ids"])
  result["labels"] = result["input_ids"].copy()
  return result


# Branched from Alpaca-LoRA
def tokenize_fn(data_point):
  prompt_in, prompt_out = data_point['source_text'], data_point['target_text']
  full_prompt = prompt_in + prompt_out
  tokenized_full_prompt = tokenize(tokenizer, full_prompt, CUTOFF_LEN)
  if not TRAIN_ON_INPUT:
    user_prompt = prompt_in
    tokenized_user_prompt = tokenize(tokenizer, user_prompt, CUTOFF_LEN, add_eos_token=False)
    user_prompt_len = len(tokenized_user_prompt["input_ids"])
    tokenized_full_prompt["labels"] = [
        -100  # ignored by the loss (PyTorch cross-entropy's default ignore_index)
    ] * user_prompt_len + tokenized_full_prompt["labels"][user_prompt_len:]
  return tokenized_full_prompt


train_ds = Dataset.from_pandas(train_df_sample)
train_ds = train_ds.flatten()
tokenized_train_ds = train_ds.map(
    tokenize_fn,
    remove_columns=['source_text', 'target_text', '__index_level_0__'],
)

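With `TRAIN_ON_INPUT = False`, `tokenize_fn` replaces the label of every prompt token with `-100`, so only the answer tokens contribute to the loss. A standalone sketch of that step, using the token ids of the "燕栖柳绿三分暖" example printed in the next cell; `mask_prompt_labels` is a hypothetical name, and the prompt length of 15 is counted from those printed ids (everything up to and including `<start_of_turn>model\n`).

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips targets with this value by default


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids as labels, masking the first prompt_len positions."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids)[prompt_len:]


ids = [2, 106, 1645, 108, 238596, 241247, 238110, 237631, 144157, 237649,
       107, 108, 106, 2516, 108, 238807, 238613, 237692, 236479, 64438, 235630, 1]
prompt_len = 15  # <bos> ... "<start_of_turn>model\n" for this example
labels = mask_prompt_labels(ids, prompt_len)
print(labels)
```

The label sequence keeps the same length as `input_ids`, which is what the collator and the loss expect; `DataCollatorForSeq2Seq` also pads labels with `-100` by default.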

In [82]:
# Optionally check a few examples by decoding the inputs
for i in range(100, 103):
  print("token length", len(tokenized_train_ds['input_ids'][i]))
  print(tokenizer.decode(tokenized_train_ds['input_ids'][i]))
  print("Label ids", tokenized_train_ds['labels'][i])
  print()

token length 35
<bos><start_of_turn>user
书法、美术、诗词，廉信古今秀<end_of_turn>
<start_of_turn>model
园林、馆舍、亭台，景观方圆稀<eos>
Label ids [2, 106, 1645, 108, 197290, 235394, 150823, 235394, 238143, 237379, 235365, 239751, 235851, 236194, 235811, 237213, 107, 108, 106, 2516, 108, 237156, 236234, 235394, 237463, 238564, 235394, 238806, 235945, 235365, 189844, 235576, 237869, 239396, 1]

token length 22
<bos><start_of_turn>user
燕栖柳绿三分暖<end_of_turn>
<start_of_turn>model
蝶弄桃红一段新<eos>
Label ids [2, 106, 1645, 108, 238596, 241247, 238110, 237631, 144157, 237649, 107, 108, 106, 2516, 108, 238807, 238613, 237692, 236479, 64438, 235630, 1]

token length 34
<bos><start_of_turn>user
苦累为消防，练成绝技能擒虎<end_of_turn>
<start_of_turn>model
光荣因奉献，捧出忠心可感天<eos>
Label ids [2, 106, 1645, 108, 236954, 238236, 235640, 99096, 235365, 237535, 235636, 237127, 41846, 242888, 237655, 107, 108, 106, 2516, 108, 235973, 238203, 235933, 238745, 237225, 235365, 240158, 235531, 238284, 235675, 235553, 235746, 235654, 1]
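The printed ids also show how much of `CUTOFF_LEN` the chat template uses. Based on the decoded output, the template ids appear to be 2=`<bos>`, 106=`<start_of_turn>`, 107=`<end_of_turn>`, 108=newline, 1645=`user`, 2516=`model`, and 1=`<eos>`; that mapping is inferred from the output above, not looked up. A quick budget check under that assumption:

```python
# Token ids copied from the "燕栖柳绿三分暖" example printed above
ids = [2, 106, 1645, 108, 238596, 241247, 238110, 237631, 144157, 237649,
       107, 108, 106, 2516, 108, 238807, 238613, 237692, 236479, 64438, 235630, 1]

# Template/special ids as inferred from the decoded examples (assumption)
TEMPLATE_IDS = {2, 106, 107, 108, 1645, 2516, 1}

overhead = sum(1 for t in ids if t in TEMPLATE_IDS)
content = len(ids) - overhead
print("template tokens:", overhead, "couplet tokens:", content)

CUTOFF_LEN = 96   # same values as in the notebook
MAX_SEQ_LEN = 32
# Worst case for two 32-character lines if each character is at most one token
print(overhead + 2 * MAX_SEQ_LEN <= CUTOFF_LEN)
```

The 14 characters of this pair take only 12 tokens because some character pairs are merged into a single token. Rare characters could still need more than one token each, so the one-token-per-character worst case is an estimate, not a guarantee.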



## LoRA setup
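- Check out the LoRA paper (Hu et al., 2021, "LoRA: Low-Rank Adaptation of Large Language Models")
- Why `q_proj` and `v_proj`? In the paper's experiments, adapting both the query and value projections worked best under a fixed parameter budget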
  - ![](https://lh3.googleusercontent.com/pG1o98-ZuTdvGaIuf3r0_GQ2wqZv1eAjaM13ki_AoipSm4Vo0v3JCynmU26PjE_6qKvyLdiDlZfQP8mGpvy0hG6TMDM-ROpup35WYmH3lBiGVC67tQL2kZnNIVzz0gviU88lq7yP126N1DCaKxkvXd1vE5TBwasBTH2waI_QbcyT324snp5iOCJXDrMa9bPokbM4w8PwJL61lqfGXmOvbP2Yqo5gOC7kA73aZPMOG3CnWzFujZpbQ5so7ZnHNOvmBWSjKUHvqI8UbvJvAy43SXL2UePcFg-KcWAA9gCUscNxKOOtai0_6ShZgZLCXCMLvLYpbqK6IqtYTS7-dwQMYQrRJ80IyPxMYwfSLaYn2UVd0I04ETFCqOH-pDtsToZ3eCGqQi-zxLdcDUpqhcXxj60PjxpMOyFK_wCK1tKEx7hu7nUDR4GYIIRFtNZS_jMhFaKhqcZf6d3Vora-2v0Sv_CVhDTy5cabVpSDEqBWpGMiCcj5IvnBIRAkPY5D_Mr5elWSCuanOXMp9riwK2-WobJoNvW7qATFAr3aiTA5MCQPqwvkOXhpj9YF7QudshxaplDzpBiLxJbdvzE-froAlxAup2yDEhEOb_xuvRBLetvL366GOEivlq577Y8MTusVcz_b9ex6TP77_XjRHAp4lQ7Bs7tR2tjY-n29bC1MhGB_t7Ta82MdLivR-T5lG4hvhGJ-rTsqMkUm0KY-Vqup-04eZHBMkY1RHjj7oNc8vDXHbTiFskLfne5Trr0_3MCZamyRZuwPeZXzFlzbif1lSBwXpSk0ckzPMGRFhiDZ0sa3QUrLeyvGA5UzHIhqHL0Ve-f03V0z48o_YoHSdWrhN8xZJb6ga-eGu0MM9f5VxE7Y9znQ4qE9_5neS6GBHvA0-YXjzZ7INP9KVgKpX_FTuAuegL7ARB1gG4lbXKWVKQS38g=w1577-h337-s-no?authuser=0)
- Why r=16 (>= 8)? The paper shows that a small rank is often enough
  - ![](https://lh3.googleusercontent.com/sxLGQpoBbmjnZwK853wFOcgEgzvJIa7wOpaH72v1eNw9gI9VaMvhpWGhzPCPowSuG44wzO53ENrXGMrdoXXhTjPmy1jRVvAMqbFYiwcCU4sZ0jqOe2vP1I9hEw-syKqpPW1-Nr5TM10Qm8MYXuigatFPNl76FSxYXBRHNcZRjeluGPxMjz78SXzBa07j6YomCGQJCyx5QTRVfhWw7iy4dbb1rybldeodUvY1xI8XzTzQeclYhE8kLI6yN7J02LKpkhHmzMgFY0Qr73gvoINGmZyguJItJ0ZcaR3zBJNfIIaBSaYe3amB4qL7zWu6sYOxfdBk8v_lWLCCTys1_ThFoiUhLDrHRK1LX5QELQTf_MAlFVk5qisF2dZo3GFEt5bKga2CNwSH-I5FjL9Z6jnopRDHCs1JGVmYn6sgdLhrG7fbj83hAb5NLwSfebi5pjYRASAVVC18hNo1ZkG_TCTPJv2__KveWNjalDkSWWEVzMZO6ZlnoLMtwEA_KfqaDNEdVTs2wa_-dsbXijPDkF0bSdiDqtiAe6Nk_sL1iEoNMswMvCGOD5orD4oojigkteh-xdyaQ3W0mEqC3HXEaPAAKHQOp2V5XBIMi-wIo_M-bjK304SA68jvmizpyYTI-yTiAb_B8lUR0PeAMp1avZux4NdXVZZu1wWlzgpr3HC7pU7ZmqgO_xJrU-ICMrkeqy8eYOy-jy1NrK03Y9NsT7-JTxg5HHGBptMKkJORT7IIQRl_eCgw0WMu9Bc9ueSIbSCLQgZ_WdaMe3wSjLkj8NmRgQ83HW686Ww54xfwscMq8l97MgaJobKqRvagOzx2KG_cMthGWfkIqVFCdfTgBv3c8Mf7lBDwduxsyfNbLuPSxW_NI4UmxN7Tkx5xN_qBBI5prltAYX_jYhFtq6JNd2bnWgqDBfDDkHp96Wj3kMRF43A8sw=w1804-h507-s-no?authuser=0)
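For a sense of scale, here is a back-of-the-envelope count of the parameters this LoRA setup trains. The layer shapes are assumptions taken from Gemma-7B's published config (hidden size 3072, 16 query and 16 key/value heads of dimension 256, 28 decoder layers). For each adapted `d_out x d_in` weight, LoRA trains `A` (`r x d_in`) and `B` (`d_out x r`), and the update `B A` is scaled by `lora_alpha / r`.

```python
# Assumed Gemma-7B shapes (from its published config)
HIDDEN = 3072
NUM_HEADS = 16
NUM_KV_HEADS = 16
HEAD_DIM = 256
NUM_LAYERS = 28

R = 16           # r in the LoraConfig below
LORA_ALPHA = 32  # lora_alpha in the LoraConfig below


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # A is r x d_in, B is d_out x r; the original d_out x d_in weight stays frozen
    return r * d_in + d_out * r


q_proj = lora_params(HIDDEN, NUM_HEADS * HEAD_DIM, R)
v_proj = lora_params(HIDDEN, NUM_KV_HEADS * HEAD_DIM, R)
total = NUM_LAYERS * (q_proj + v_proj)

print("trainable LoRA params:", total)
print("approx. size in fp32 (MB):", total * 4 / 1e6)
print("scaling lora_alpha / r:", LORA_ALPHA / R)
```

That is roughly 6.4M trainable parameters, under 0.1% of the model. In PEFT, `lora_model.print_trainable_parameters()` reports the actual count, which is the easiest way to check these assumed shapes.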

In [68]:
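# The model is loaded in 4-bit, so use PEFT's k-bit helper
# (prepare_model_for_int8_training is a deprecated alias for it)
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)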

config = LoraConfig(
    r=16,
    lora_alpha=32,  # LoRA updates are scaled by lora_alpha / r (= 2 here); value reused from Alpaca-LoRA
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

lora_model = get_peft_model(model, config)
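The LoRA update itself is easy to sketch with NumPy on toy sizes (not Gemma's). The frozen weight `W` is left untouched and a low-rank product `B A`, scaled by `alpha / r`, is added to its output. `B` starts at zero, as in PEFT's default initialization, so the adapted layer initially behaves exactly like the base layer. Dropout and quantization are left out of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 6, 2, 4  # toy sizes, not Gemma's

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, random init
B = np.zeros((d_out, r))                    # trainable, zero init


def lora_forward(x, W, A, B, alpha, r):
    # h = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=(d_in,))
h0 = lora_forward(x, W, A, B, alpha, r)
print(np.allclose(h0, W @ x))  # zero-initialized B: the adapter starts as a no-op

# After training, B is non-zero and the update can be merged into W
B = rng.normal(size=(d_out, r))
W_merged = W + (alpha / r) * (B @ A)
print(np.allclose(lora_forward(x, W, A, B, alpha, r), W_merged @ x))
```

Because the update can be folded into `W`, a merged LoRA model runs at the base model's inference cost; PEFT exposes this as `merge_and_unload()`.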

## Training
- Compare the outputs before and after training!

In [83]:
quick_inference(lora_model, tokenizer, user_q="How does the brain work?")

How does the brain work?
The brain is a complex organ that is responsible for controlling all of the functions


In [84]:
# Tip: use ChatGPT (or any translator) to check the meaning of the Chinese characters
def eval_model(my_model, examples=["春风得意花铺路",
                                   "美丽中国魅力北京",
                                   "鱼书千里梦",
                                   "日落晚霞临古寺",]):
  for p_in in examples:
    quick_inference(my_model, tokenizer, p_in)

In [85]:
# Outputs may differ slightly across GPUs because of small numerical precision differences
print("Before training")
eval_model(lora_model)

Before training
春风得意花铺路
春风得意花铺路，春暖花开，美不胜
美丽中国魅力北京
**美丽中国魅力北京，令人惊叹的现代与传统
鱼书千里梦
鱼书千里梦，是一个关于爱慕和成长的小说
日落晚霞临古寺
夕暮暮色，日落缓缓西下，将天空渲染成 Varies


In [87]:
torch.cuda.empty_cache()

In [88]:
# For brevity, this demo doesn't even use an eval_dataset :(
trainer = transformers.Trainer(
    model=lora_model,
    train_dataset=tokenized_train_ds,
    args=transformers.TrainingArguments(
        # Larger batches significantly increase GPU memory use. Rough VRAM observed:
        # batch 4: ~8.3-8.8 GB; batch 16: 9.5 GB+; batch 32: 11 GB+; batch 64: 14 GB+
        # Decrease to 4 if you have less than 16 GB of VRAM
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,
        warmup_steps=8,
        num_train_epochs=2,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=20,
        output_dir='outputs',
        remove_unused_columns=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True,
    ),
)
lora_model.config.use_cache = False  # Disable the KV cache for training, as Alpaca-LoRA does (it is incompatible with gradient checkpointing)
trainer.train()

| Step | Training Loss |
|-----:|--------------:|
| 20 | 33.5093 |
| 40 | 14.7649 |
| 60 | 6.0399 |
| 80 | 5.2018 |
| 100 | 4.9142 |
| 120 | 4.7877 |
| 140 | 4.7380 |


KeyboardInterrupt: 
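The training arguments above roughly determine how long a full run is. A quick calculation, assuming a single GPU and that the Trainer counts one optimizer step (the "Step" column in the log) per `gradient_accumulation_steps` batches:

```python
import math

NUM_EXAMPLES = 5000   # size of train_df_sample
PER_DEVICE_BATCH = 4  # per_device_train_batch_size
GRAD_ACCUM = 2        # gradient_accumulation_steps
EPOCHS = 2            # num_train_epochs
NUM_GPUS = 1          # assumption: a single T4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
batches_per_epoch = math.ceil(NUM_EXAMPLES / (PER_DEVICE_BATCH * NUM_GPUS))
steps_per_epoch = batches_per_epoch // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS

print("effective batch:", effective_batch)
print("optimizer steps per epoch:", steps_per_epoch, "total:", total_steps)
# Roughly how many examples had been seen by the last logged step (140)
print("examples seen by step 140:", 140 * effective_batch)
```

So the interrupted run above covered only about a tenth of the planned 1,250 steps.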

In [89]:
# Quick empirical tests gave "somehow OK" results once the training loss fell below
# some threshold; here training was interrupted at a loss of about 4.7
print("After training")
eval_model(lora_model)

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


After training
春风得意花铺路
秋雨流派诗作韵<eos>
美丽中国魅力北京
和谐社会和谐社会<eos>
鱼书千里梦
鸟歌百载歌<eos>
日落晚霞临古寺
月明清风送客船<eos>
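One cheap sanity check for couplets is whether each output has the same number of characters as its input. `length_match_rate` below is a hypothetical, deliberately crude metric (not from the original notebook), applied to the outputs copied from the "Before training" and "After training" cells.

```python
def length_match_rate(pairs, eos="<eos>"):
    """Fraction of (first_line, output) pairs whose character counts match, ignoring eos."""
    if not pairs:
        return 0.0
    hits = 0
    for first, second in pairs:
        second = second.split(eos)[0]  # keep only the text before the eos marker
        hits += len(second) == len(first)
    return hits / len(pairs)


# Copied from the "Before training" and "After training" cells
before = [
    ("春风得意花铺路", "春风得意花铺路，春暖花开，美不胜"),
    ("美丽中国魅力北京", "**美丽中国魅力北京，令人惊叹的现代与传统"),
    ("鱼书千里梦", "鱼书千里梦，是一个关于爱慕和成长的小说"),
    ("日落晚霞临古寺", "夕暮暮色，日落缓缓西下，将天空渲染成 Varies"),
]
after = [
    ("春风得意花铺路", "秋雨流派诗作韵<eos>"),
    ("美丽中国魅力北京", "和谐社会和谐社会<eos>"),
    ("鱼书千里梦", "鸟歌百载歌<eos>"),
    ("日落晚霞临古寺", "月明清风送客船<eos>"),
]
print(length_match_rate(before), length_match_rate(after))
```

The metric ignores tone, parallelism, and repetition: "和谐社会和谐社会" passes even though it just repeats itself, so it only complements reading the outputs.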


## Suggested additional reading
- [Decoding algorithm by HF](https://huggingface.co/blog/how-to-generate)
- So far, this notebook only uses greedy search (at each position, pick the token with the highest probability, without looking ahead)
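Greedy search can miss the most likely sequence overall. Below is a toy illustration in plain Python, in the spirit of the HF blog post. The next-token table is made up, each token is a single character, and, unlike a real model, the table conditions only on the previous token rather than the whole prefix.

```python
# Made-up next-token probabilities, keyed by the previous token
TOY_MODEL = {
    "<start>": {"月": 0.5, "风": 0.4, "<eos>": 0.1},
    "月": {"明": 0.4, "落": 0.3, "<eos>": 0.3},
    "风": {"清": 0.9, "<eos>": 0.1},
    "明": {"<eos>": 1.0},
    "落": {"<eos>": 1.0},
    "清": {"<eos>": 1.0},
}


def greedy_decode(model, start="<start>", eos="<eos>", max_new_tokens=10):
    """Pick the highest-probability next token at every step."""
    out, token = [], start
    for _ in range(max_new_tokens):
        probs = model[token]
        token = max(probs, key=probs.get)
        if token == eos:
            break
        out.append(token)
    return "".join(out)


def sequence_prob(model, tokens, start="<start>", eos="<eos>"):
    """Probability of emitting `tokens` followed by eos."""
    prob, prev = 1.0, start
    for t in list(tokens) + [eos]:
        prob *= model[prev].get(t, 0.0)
        prev = t
    return prob


greedy = greedy_decode(TOY_MODEL)
print(greedy, sequence_prob(TOY_MODEL, greedy))  # greedy result
print("风清", sequence_prob(TOY_MODEL, "风清"))  # a more likely sequence
```

Greedy commits to "月" because it is locally best, ending with probability 0.2, while "风清" has probability 0.36. Beam search, covered in the blog post, keeps several candidates per step and would find it here with a beam width of 2.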

## Optional: Upload to HuggingFace and share with the world!
- And you should!

In [90]:
# from huggingface_hub import notebook_login
# notebook_login()
# YOUR_HF_ID = "hululuzhu"
# lora_model.push_to_hub(f"{YOUR_HF_ID}/chinese-couplet-gemma-lora-test-v0.1",
#                        use_auth_token=True,
#                        create_pr=True)
# Go to huggingface and merge the PR to share with the world!


CommitInfo(commit_url='https://huggingface.co/hululuzhu/chinese-couplet-gemma-lora-test-v0.1/commit/5707739eed53dacba7925baf9ce196b3ef1c0954', commit_message='Upload model', commit_description='', oid='5707739eed53dacba7925baf9ce196b3ef1c0954', pr_url='https://huggingface.co/hululuzhu/chinese-couplet-gemma-lora-test-v0.1/discussions/2', pr_revision='refs/pr/2', pr_num=2)