In [1]:
!nvidia-smi

Mon Aug 19 03:22:34 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10G                    On  |   00000000:00:1B.0 Off |                    0 |
|  0%   22C    P8             22W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A10G                    On  |   00

In [2]:
# %%capture
!pip install transformers datasets accelerate peft huggingface_hub hf_transfer flash-attn trl wandb -qU


In [3]:
import os
from huggingface_hub import login

# Read the token from the environment instead of hard-coding a secret in the notebook.
login(token=os.environ["HF_TOKEN"])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /opt/app-root/src/.cache/huggingface/token
Login successful


In [4]:
import torch
from IPython.display import Markdown
from transformers import AutoTokenizer, AutoModelForCausalLM, EarlyStoppingCallback
from peft import LoraConfig, get_peft_model
from datasets import load_dataset 
from transformers import TrainingArguments
from trl import SFTTrainer
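Later cells print the `huggingface/tokenizers: The current process just got forked...` warning many times during dataset preparation and training. As the warning itself suggests, setting the `TOKENIZERS_PARALLELISM` environment variable explicitly (here to `"false"`) avoids it. A minimal sketch, assuming it runs right after the imports and before any tokenizer is loaded:

```python
import os

# Set before the first tokenizer is created. "false" turns off the tokenizers
# library's internal parallelism, so there is nothing to disable (and warn about)
# when the process later forks.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```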

In [5]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"bfloat16 supported: {torch.cuda.is_bf16_supported()}")

bfloat16 supported: True


### 1. Load model and tokenizer

In [6]:
model_name = "Deci/DeciLM-7B"

#### 1.1 Load model

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True
)

#### 1.2 Load tokenizer

In [8]:
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=model_name,
    padding_side="left",
    trust_remote_code=True
)

In [9]:
print(f"Vocabulary size of DeciLM-7B: {len(tokenizer.get_vocab()):,}")

Vocabulary size of DeciLM-7B: 32,000


In [10]:
tokenizer.special_tokens_map

{'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}

In [11]:
tokenizer.pad_token = tokenizer.unk_token
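DeciLM-7B is a decoder-only model, so the tokenizer is loaded with `padding_side="left"`, and because its tokenizer defines no pad token (see `special_tokens_map` above) the notebook reuses `<unk>`. The sketch below (a hypothetical `left_pad` helper on toy token ids; the real tokenizer does this internally) shows why left padding matters when prompts of different lengths are batched: every prompt then ends at the final position, which is where generation continues.

```python
def left_pad(batch, pad_id):
    """Left-pad lists of token ids to equal length and build attention masks.

    Hypothetical helper for illustration; Hugging Face tokenizers do this
    internally when padding_side="left".
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Pad tokens go on the left so every prompt ends at the last position.
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding the model should ignore, 1 marks real tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Toy ids: 0 stands in for the pad id (here <unk>).
ids, mask = left_pad([[1, 17, 42], [1, 99]], pad_id=0)
print(ids)   # [[1, 17, 42], [0, 1, 99]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```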

#### 1.3 Inference test

In [12]:
generation_config = {
    "max_new_tokens": 100,
    "do_sample": True,
    "temperature": 1,
    "top_k": 100,
    "top_p":0.90,
    "pad_token_id": tokenizer.eos_token_id
}
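With `do_sample=True`, each new token is drawn at random after the configured filters are applied: `temperature=1` leaves the distribution unchanged, `top_k=100` keeps only the 100 most likely tokens, and `top_p=0.90` then keeps the smallest set of those whose probabilities sum to at least 0.9. The pure-Python sketch below (hypothetical helper `top_k_top_p_filter`, toy probabilities, not the library's actual code) shows the two filters in sequence.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix covering top_p.

    Simplified sketch of the idea; not the exact transformers implementation.
    Returns {token_index: renormalized_probability}.
    """
    # Rank token indices by probability (highest first) and keep the top_k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / total
        # Nucleus rule: stop once the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution over 4 tokens
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.8)
print(sorted(filtered))  # [0, 1]
# Sample the next token from what survives the filters.
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```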

In [13]:
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(text=input_text, return_tensors="pt").to(device)
outputs = model.generate(**input_ids, **generation_config)
Markdown(tokenizer.decode(token_ids=outputs[0], skip_special_tokens=True))

Write me a poem about Machine Learning. [/INST]
Answer: In the garden of technology, a seed was born
That would grow and flower, a budding Machine Learning torn
From petals of AI and logic trees
To data flows, code and memories

With algorithms and models, it spread its wings
To process images, speech and all manner of things
To analyze patterns, predict outcomes
To make decisions, with no human frowns

It learned to play chess and Go with ease

### 2. Training data

#### 2.1 Load data

In [14]:
dataset = load_dataset("b-mc2/sql-create-context", split="train")

#### 2.2 Split into train and validation sets

In [15]:
train_test_split = dataset.train_test_split(test_size=100, seed=1399, shuffle=True)
train_data = train_test_split["train"].shuffle()
val_data = train_test_split["test"].shuffle()
print(len(train_data), len(val_data))

78477 100


In [16]:
torch.manual_seed(42)
sample = train_data[torch.randint(low=0, high=len(train_data), size=(1,)).item()]

#### 2.3 Test baseline inference

In [17]:
template = "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\n\n" + \
"You must output the SQL query that answers the question.\n\n" + \
"### Input:\n" + \
"```{question}```\n\n" + \
"### Context:\n" + \
"```{context}```\n\n"
# "### Response:\n" + \
# "```{response}```"

In [18]:
Markdown(template.format(question=sample["question"], context=sample["context"]))

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```What country has a medical school established in 1969 with both an IMED and avicenna?```

### Context:
```CREATE TABLE table_name_15 (country_territory VARCHAR, imed_avicenna_listed VARCHAR, established VARCHAR)```



In [19]:
prompt = template.format(context=sample["context"], question=sample["question"])
input_ids = tokenizer(text=prompt, return_tensors="pt").to(device)
outputs = model.generate(**input_ids, **generation_config)

In [20]:
display(Markdown("#### Completion:"))
display(Markdown(tokenizer.decode(token_ids=outputs[0], skip_special_tokens=True).replace(prompt, "")))
display(Markdown("#### Answer:"))
Markdown(sample["answer"])

#### Completion:

```COPY table_name_15(country_territory,imed_avicenna_listed,established) FROM stdin CSV HEADER```

```cambridge,United Kingdom,true,true,1969```

```cambridge,United States,true,true,1951```

```madrid,Spain,true,true,19

#### Answer:

SELECT country_territory FROM table_name_15 WHERE imed_avicenna_listed = "both" AND established = 1969

#### 2.4 Create the formatting function

In [21]:
def formatting_func(example):
    template = "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\n\n" + \
    "You must output the SQL query that answers the question.\n\n" + \
    "### Input:\n" + \
    "```{question}```\n\n" + \
    "### Context:\n" + \
    "```{context}```\n\n" + \
    "### Response:\n" + \
    "```{answer};```"

    text = template.format(context=example["context"], question=example["question"], answer=example["answer"])
    return text

In [22]:
Markdown(formatting_func(train_data[1]))

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```Name the opponent for game 5```

### Context:
```CREATE TABLE table_name_25 (opponent VARCHAR, game VARCHAR)```

### Response:
```SELECT opponent FROM table_name_25 WHERE game = 5;```
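The inference `template` (cell 17) stops right after the `### Context:` block, while `formatting_func` appends `### Response:` and the fenced answer. Because the inference prompt is an exact prefix of every training example, the fine-tuned model learns to continue it with a `### Response:` block, which is what it emits in section 4. A minimal stdlib check of that prefix property, using shortened stand-ins for both templates (`INFER_TEMPLATE`, `TRAIN_SUFFIX`, `build_prompt` and `build_training_text` are names made up for this sketch):

```python
TICKS = "`" * 3  # literal triple backtick used around question, context and answer

# Shortened stand-ins for the notebook's inference `template` (system sentence
# omitted) and for the response block that `formatting_func` appends.
INFER_TEMPLATE = (
    "You must output the SQL query that answers the question.\n\n"
    "### Input:\n" + TICKS + "{question}" + TICKS + "\n\n"
    "### Context:\n" + TICKS + "{context}" + TICKS + "\n\n"
)
TRAIN_SUFFIX = "### Response:\n" + TICKS + "{answer};" + TICKS


def build_prompt(example):
    return INFER_TEMPLATE.format(question=example["question"], context=example["context"])


def build_training_text(example):
    # Training text = inference prompt + target response.
    return build_prompt(example) + TRAIN_SUFFIX.format(answer=example["answer"])


example = {
    "question": "Name the opponent for game 5",
    "context": "CREATE TABLE table_name_25 (opponent VARCHAR, game VARCHAR)",
    "answer": "SELECT opponent FROM table_name_25 WHERE game = 5",
}
text = build_training_text(example)
assert text.startswith(build_prompt(example))
print(text[len(build_prompt(example)):])
# ### Response:
# ```SELECT opponent FROM table_name_25 WHERE game = 5;```
```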

### 3. Parameter Efficient Fine-Tuning (PEFT) - LoRA

In [23]:
# print(model)

#### 3.1 Prepare LoRA Fine-Tuning

In [24]:
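model.gradient_checkpointing_enable()
# The KV cache is incompatible with gradient checkpointing; the flag lives on the
# model config (setting `model.use_cache` has no effect).
if model.config.use_cache:
    model.config.use_cache = False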

In [25]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
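LoRA freezes each targeted weight matrix W (shape d_out × d_in) and learns a low-rank update B·A, with A of shape r × d_in and B of shape d_out × r, scaled by `lora_alpha / r` (here 32 / 16 = 2). Each adapted linear layer therefore adds r·(d_in + d_out) trainable parameters. The helpers below are a hypothetical illustration of that count; the 4096 × 4096 shape in the example is a round number chosen for illustration, not a claim about DeciLM-7B's exact layer dimensions.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    a = r * d_in   # A: r x d_in (down-projection)
    b = d_out * r  # B: d_out x r (up-projection, initialized to zero by PEFT)
    return a + b


def lora_scaling(lora_alpha, r):
    """Factor applied to the B @ A update before it is added to the frozen output."""
    return lora_alpha / r


r = 16
print(lora_params(4096, 4096, r))  # 131072 for a square 4096 x 4096 projection
print(lora_scaling(32, r))         # 2.0

# Share of trainable weights reported by print_trainable_parameters below:
print(round(100 * 41_168_896 / 7_084_720_128, 4))  # 0.5811
```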

In [26]:
peft_model = get_peft_model(model=model, peft_config=peft_config)

#### 3.2 Check trainable parameters

In [27]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [28]:
print_trainable_parameters(peft_model)

trainable params: 41168896 || all params: 7084720128 || trainable%: 0.5810941752983809


### 4. Train the model

In [29]:
args_definition = dict(
    output_dir="./deci7bit-lora-sql",
    overwrite_output_dir=True,
    evaluation_strategy="steps",  # named `eval_strategy` in newer transformers releases
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    max_steps=500,
    lr_scheduler_type="cosine",
    max_grad_norm=0.3,
    warmup_steps=100,
    logging_steps=20,
    save_steps=20,
    logging_first_step=True,
    seed=1399,
    bf16=True,
    report_to="wandb",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    load_best_model_at_end=True
)
args = TrainingArguments(**args_definition)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
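With `lr_scheduler_type="cosine"`, `warmup_steps=100` and `max_steps=500`, the learning rate ramps linearly from 0 to 3e-4 over the first 100 steps and then follows half a cosine down to 0 at step 500. The function below (hypothetical `cosine_lr`) is a pure-Python re-derivation of that shape, modeled on the cosine-with-warmup formula in transformers with its default half cycle; it is handy for checking where on the curve training actually stopped.

```python
import math


def cosine_lr(step, peak_lr=3e-4, warmup_steps=100, max_steps=500):
    """Learning rate at `step` for linear warmup followed by cosine decay to 0."""
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, 100, 220, 300, 500):
    print(step, f"{cosine_lr(step):.2e}")
```

At step 220, where early stopping ended the run (see the `TrainOutput` below), this gives about 2.38e-4, roughly 79% of the peak, so training stopped well before the learning rate had decayed.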


In [30]:
trainer = SFTTrainer(
    model=peft_model,
    args=args,
    train_dataset=train_data,
    eval_dataset=val_data,
    tokenizer=tokenizer,
    peft_config=peft_config,
    formatting_func=formatting_func,
    max_seq_length=1024,
    packing=True,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
max_steps is given, it will override any value given in num_train_epochs
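In TRL versions from around the time of this notebook, `packing=True` with a `formatting_func` roughly works like this: formatted examples are tokenized, joined into one stream with the EOS token as separator, and sliced into chunks of exactly `max_seq_length` tokens, so no padding is needed. The sketch below is a simplified, hypothetical `pack` on toy token ids (TRL's own implementation also buffers and shuffles). It also computes the token budget per optimizer step, assuming a single training process as with `device_map="auto"` here: 8 sequences × 4 accumulation steps × 1,024 tokens.

```python
def pack(tokenized_examples, eos_id, seq_len):
    """Concatenate examples with an EOS separator and slice into fixed-length chunks.

    Simplified sketch of constant-length packing; the trailing partial chunk is dropped.
    """
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids + [eos_id])
    return [
        stream[i : i + seq_len]
        for i in range(0, len(stream), seq_len)
        if len(stream[i : i + seq_len]) == seq_len
    ]


# Toy ids; 2 stands in for the EOS id.
chunks = pack([[5, 6, 7], [8, 9], [10, 11, 12, 13]], eos_id=2, seq_len=4)
print(chunks)  # [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]

# Tokens seen per optimizer step with the training arguments above.
per_device_batch, grad_accum, max_seq_length = 8, 4, 1024
print(per_device_batch * grad_accum * max_seq_length)  # 32768
```

Note that an example can straddle two chunks (the `[8, 9, 2, 10]` chunk above), which is the usual trade-off of packing for throughput.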


In [31]:
trainer.train()

wandb: Currently logged in as: liuxiangwin (liuxiangwin-free). Use `wandb login --relogin` to force relogin


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


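| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 20 | 0.7762 | 0.507999 |
| 40 | 0.4829 | 0.427779 |
| 60 | 0.4283 | 0.401915 |
| 80 | 0.4102 | 0.390339 |
| 100 | 0.4 | 0.38562 |
| 120 | 0.3919 | 0.380029 |
| 140 | 0.3876 | 0.377312 |
| 160 | 0.3845 | 0.374146 |
| 180 | 0.3801 | 0.36652 |
| 200 | 0.3733 | 0.361285 |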

TrainOutput(global_step=220, training_loss=0.4361744365908883, metrics={'train_runtime': 8858.361, 'train_samples_per_second': 1.806, 'train_steps_per_second': 0.056, 'total_flos': 3.007714272529613e+17, 'train_loss': 0.4361744365908883, 'epoch': 0.621030345800988})
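Training stopped at step 220 of the planned 500 because of `EarlyStoppingCallback(early_stopping_patience=1)`. With `metric_for_best_model="eval_loss"` and `greater_is_better=False`, the callback counts consecutive evaluations that fail to lower the best validation loss and ends training once that count reaches the patience. Evaluations run every 20 steps here (`eval_steps` falls back to `logging_steps`), so the evaluation at step 220, which is not shown in the table above, evidently failed to beat step 200's 0.361285. The function below is a simplified re-implementation of that rule (hypothetical `stopping_step`; it ignores the callback's `early_stopping_threshold` option).

```python
def stopping_step(eval_losses, steps, patience=1):
    """Return the step at which early stopping triggers, or None.

    Simplified sketch: lower eval loss is better; stop after `patience`
    consecutive evaluations that do not improve on the best loss so far.
    """
    best, bad_evals = None, 0
    for step, loss in zip(steps, eval_losses):
        if best is None or loss < best:
            best, bad_evals = loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step
    return None


# Validation losses logged every 20 steps (steps 20-200) in the table above.
logged = [0.507999, 0.427779, 0.401915, 0.390339, 0.38562,
          0.380029, 0.377312, 0.374146, 0.36652, 0.361285]
print(stopping_step(logged, range(20, 201, 20)))  # None: every logged eval improved

# Toy sequence: the third evaluation is worse than the second, so patience=1 stops there.
print(stopping_step([0.50, 0.45, 0.47], [20, 40, 60]))  # 60
```

Because `load_best_model_at_end=True` and checkpoints are saved every 20 steps, the trainer then reloads the checkpoint with the lowest `eval_loss` (step 200 by this reasoning) before `merge_and_unload()` is called.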

#### 4.1 Compare outputs

In [32]:
fine_tuned_model = peft_model.merge_and_unload()

In [33]:
prompt = template.format(context=sample["context"], question=sample["question"])
input_ids = tokenizer(text=prompt, return_tensors="pt").to(device)
outputs = fine_tuned_model.generate(**input_ids, **generation_config)

In [34]:
display(Markdown("#### Completion:"))
display(Markdown(tokenizer.decode(token_ids=outputs[0], skip_special_tokens=True)))
display(Markdown("#### Answer:"))
Markdown(sample["answer"])

#### Completion:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```What country has a medical school established in 1969 with both an IMED and avicenna?```

### Context:
```CREATE TABLE table_name_15 (country_territory VARCHAR, imed_avicenna_listed VARCHAR, established VARCHAR)```

### Response:
```SELECT country_territory FROM table_name_15 WHERE imed_avicenna_listed = "yes" AND established = "1969";```

#### Answer:

SELECT country_territory FROM table_name_15 WHERE imed_avicenna_listed = "both" AND established = 1969

#### 4.2 Compare base and fine-tuned outputs on the validation set

In [35]:
not_tuned_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True
)

fine_tuned_model.config.use_cache = True  # re-enable the KV cache for faster generation


In [36]:
def generate_responses(example, ft_model, og_model):
    prompt = template.format(context=example["context"], question=example["question"])
    input_ids = tokenizer(text=prompt, return_tensors="pt").to(device)
    ft_outputs = ft_model.generate(**input_ids, **generation_config)
    og_outputs = og_model.generate(**input_ids, **generation_config)

    display(Markdown("#### Prompt:"))
    display(Markdown(prompt))
    display(Markdown("#### Original Completion:"))
    display(Markdown(tokenizer.decode(token_ids=og_outputs[0], skip_special_tokens=True) \
           .replace(prompt, "")))
    display(Markdown("#### Fine-tuned Completion:"))
    display(Markdown(tokenizer.decode(token_ids=ft_outputs[0], skip_special_tokens=True) \
           .replace(prompt, "")))
    display(Markdown("#### Expected Answer:"))
    display(Markdown("`{answer}`".format(answer=example["answer"])))
    display(Markdown("-----------------------------"))
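`generate_responses` strips the prompt with `str.replace`, which only works if decoding reproduces the prompt character for character. A slightly more robust alternative, sketched below as a hypothetical helper `extract_sql`, pulls the first fenced snippet after the `### Response:` marker (which the inference prompt itself never contains) and returns `None` when the model never emits one, as the base model does in the examples below.

```python
import re

TICKS = "`" * 3  # literal triple backtick, the delimiter the prompt format uses
RESPONSE_RE = re.compile(r"### Response:\s*" + TICKS + r"(.*?)" + TICKS, re.DOTALL)


def extract_sql(generated_text):
    """Return the SQL inside the first fenced block after '### Response:', or None."""
    match = RESPONSE_RE.search(generated_text)
    if match is None:
        return None
    # Drop surrounding whitespace and the trailing semicolon the training format adds.
    return match.group(1).strip().rstrip(";").strip()


completion = "...\n### Response:\n" + TICKS + 'SELECT tournament FROM table_name_36 WHERE date = "oct 23";' + TICKS
print(extract_sql(completion))  # SELECT tournament FROM table_name_36 WHERE date = "oct 23"
print(extract_sql("| construction | registration |"))  # None
```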

In [37]:
for i in range(5):
    generate_responses(val_data[i], ft_model=fine_tuned_model, og_model=not_tuned_model)

#### Prompt:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```What is the name of the Tournament on oct 23?```

### Context:
```CREATE TABLE table_name_36 (tournament VARCHAR, date VARCHAR)```



#### Original Completion:

```Tournament```: ``(tournament VARCHAR(100), date VARCHAR(100), PRIMARY KEY (tournament, date))```

```date```: ``(date VARCHAR(100), PRIMARY KEY (tournament, date))```

```date```: ``(date VARCHAR(100), PRIMARY KEY (tournament, date

#### Fine-tuned Completion:

### Response:
```SELECT tournament FROM table_name_36 WHERE date = "oct 23";```

#### Expected Answer:

`SELECT tournament FROM table_name_36 WHERE date = "oct 23"`

-----------------------------

#### Prompt:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```List the name of rooms with king or queen bed.```

### Context:
```CREATE TABLE Rooms (roomName VARCHAR, bedType VARCHAR)```



#### Original Completion:

```INSERT INTO Rooms VALUES ('104', 'king')```

```INSERT INTO Rooms VALUES ('111', 'queen')```
```

### Output:
```SELECT roomName FROM Rooms WHERE bedType IN ('king', 'queen')``` 

#### Fine-tuned Completion:

### Response:
```SELECT roomName FROM Rooms WHERE bedType = 'King' OR bedType = 'Queen';```

#### Expected Answer:

`SELECT roomName FROM Rooms WHERE bedType = "King" OR bedType = "Queen"`

-----------------------------

#### Prompt:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```Which kickoff had an attendance of 58,120?```

### Context:
```CREATE TABLE table_name_9 (kickoff_ VARCHAR, a_ VARCHAR, attendance VARCHAR)```



#### Original Completion:

```INSERT INTO table_name_9 VALUES ('Sunday, January 11, 2009', 'Minnesota Vikings', '65,321')```

```INSERT INTO table_name_9 VALUES ('Sunday, January 11, 2009', 'Dallas Cowboys', '65,321')```

```INSERT INTO table_

#### Fine-tuned Completion:

### Response:
```SELECT kickoff_[a_] FROM table_name_9 WHERE attendance = "58,120";```

#### Expected Answer:

`SELECT kickoff_[a_] FROM table_name_9 WHERE attendance = "58,120"`

-----------------------------

#### Prompt:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```Who is the March playmate with a November playmate Cara Wakelin?```

### Context:
```CREATE TABLE table_name_3 (march VARCHAR, november VARCHAR)```



#### Original Completion:

```march```
```November```

```SARA WAKELIN```
```march playmate```
```november playmate```

### Output:
```SELECT * FROM table_name_3 WHERE March='march playmate' AND November='november playmate'``` 

#### Fine-tuned Completion:

### Response:
```SELECT march FROM table_name_3 WHERE november = "caral wakelin";```

#### Expected Answer:

`SELECT march FROM table_name_3 WHERE november = "cara wakelin"`

-----------------------------

#### Prompt:

You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.

### Input:
```What is every construction date for the registration of HB-HOS?```

### Context:
```CREATE TABLE table_22180353_1 (construction VARCHAR, registration VARCHAR)```



#### Original Completion:

```
| construction | registration |
|---|---|
| 06.11.1945 | HB-HOS |
| 10.01.1951 | HB-HOS |
| 12.09.1953 | HB-HOS |
| 26.01.1955 | HB-HOS |
| 06.11

#### Fine-tuned Completion:

### Response:
```SELECT construction FROM table_22180353_1 WHERE registration = "HB-HOS";```

#### Expected Answer:

`SELECT construction AS date FROM table_22180353_1 WHERE registration = "HB-HOS"`

-----------------------------
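The five comparisons above are a qualitative spot check. To put a number on the 100-example validation split, one simple option (not something this notebook computes) is normalized exact match: ignore case, whitespace runs, quote style and a trailing semicolon. The sketch below (hypothetical `normalize_sql` and `exact_match`) scores the fine-tuned completions shown above against their expected answers.

```python
import re


def normalize_sql(sql):
    """Lenient normalization: case, whitespace, quote style and a trailing ';'."""
    sql = sql.strip().rstrip(";").strip()
    sql = sql.replace("'", '"')     # treat 'x' and "x" as the same literal
    sql = re.sub(r"\s+", " ", sql)  # collapse runs of whitespace
    return sql.lower()              # note: also lowercases string literals


def exact_match(predictions, references):
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Fine-tuned completions and expected answers copied from the five examples above.
preds = [
    'SELECT tournament FROM table_name_36 WHERE date = "oct 23";',
    "SELECT roomName FROM Rooms WHERE bedType = 'King' OR bedType = 'Queen';",
    'SELECT kickoff_[a_] FROM table_name_9 WHERE attendance = "58,120";',
    'SELECT march FROM table_name_3 WHERE november = "caral wakelin";',
    'SELECT construction FROM table_22180353_1 WHERE registration = "HB-HOS";',
]
refs = [
    'SELECT tournament FROM table_name_36 WHERE date = "oct 23"',
    'SELECT roomName FROM Rooms WHERE bedType = "King" OR bedType = "Queen"',
    'SELECT kickoff_[a_] FROM table_name_9 WHERE attendance = "58,120"',
    'SELECT march FROM table_name_3 WHERE november = "cara wakelin"',
    'SELECT construction AS date FROM table_22180353_1 WHERE registration = "HB-HOS"',
]
print(exact_match(preds, refs))  # 0.6
```

The two misses differ in kind: `caral` vs `cara` is a genuine error, while `construction` vs `construction AS date` returns the same values under a different column name, so exact match understates quality.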

### 5. Save model

In [38]:
model_save_name = "deci7b-ft-lora-sql-v2"

In [39]:
# Save model & tokenizer
fine_tuned_model.push_to_hub(model_save_name)
tokenizer.push_to_hub(model_save_name)

CommitInfo(commit_url='https://huggingface.co/Liu-Xiang/deci7b-ft-lora-sql-v2/commit/cdfa7a8de1260bd190e6604cea385deeac53e5e2', commit_message='Upload tokenizer', commit_description='', oid='cdfa7a8de1260bd190e6604cea385deeac53e5e2', pr_url=None, pr_revision=None, pr_num=None)

In [40]:
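# Save adapters. Note: the first argument is the commit message (the repo comes from `output_dir`),
# and this must run before merge_and_unload(), which folds the LoRA weights into the base model.
trainer.push_to_hub(commit_message=model_save_name + "adapters")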

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/40.0 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.43k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Liu-Xiang/deci7bit-lora-sql/commit/f42398f42c107f48a1cfeb9f7d1a5475fffda021', commit_message='deci7b-ft-lora-sql-v2adapters', commit_description='', oid='f42398f42c107f48a1cfeb9f7d1a5475fffda021', pr_url=None, pr_revision=None, pr_num=None)