# SFT Training for Convex Optimization Exercises

This notebook trains a language model on 340 convex optimization proof problems using Supervised Fine-Tuning (SFT).

**Dataset**: Boyd & Vandenberghe's "Convex Optimization" exercises (`exercises.jsonl`)

**Approach**: 
- Train directly on optimization exercises (proof-based problems)
- Use reasoning tags: `<start_working_out>...<end_working_out><SOLUTION>...</SOLUTION>`
- Multiple epochs for small dataset
- Based on Unsloth's Qwen GRPO notebook structure

## 1. Setup and Model Loading

In [1]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Can increase for longer proofs
lora_rank = 32  # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-1.5B",  # Smaller model for faster training
    max_seq_length = max_seq_length,
    load_in_4bit = False,  # False for LoRA 16bit
    fast_inference = True,
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.9,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank * 2,
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

print("‚úÖ Model loaded successfully")
print(f"üî• CUDA available: {torch.cuda.is_available()}")

ü¶• Unsloth: Will patch your computer to enable 2x faster free finetuning.


  from .autonotebook import tqdm as notebook_tqdm


ü¶• Unsloth Zoo will now patch everything to make training faster!
INFO 12-02 19:31:08 [vllm_utils.py:700] Unsloth: Patching vLLM v1 graph capture
==((====))==  Unsloth 2025.11.3: Fast Qwen2 patching. Transformers: 4.57.1. vLLM: 0.11.2.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 8. Max memory: 39.494 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: vLLM loading unsloth/Qwen2.5-1.5B with actual GPU utilization = 88.97%
Unsloth: Your GPU has CUDA compute capability 8.0 with VRAM = 39.49 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 2048. Num Sequences = 320.
Unsloth: vLLM's KV Cache can use up to 32.19 GB. Also swap space = 6 GB.
Unsloth: FAILED getting compilation_config with error

2025-12-02 19:31:19,526	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.


INFO 12-02 19:31:19 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 12-02 19:31:21 [core.py:93] Initializing a V1 LLM engine (v0.11.2) with config: model='unsloth/Qwen2.5-1.5B', speculative_config=None, tokenizer='unsloth/Qwen2.5-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_de

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.85it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.85it/s]


INFO 12-02 19:31:26 [default_loader.py:314] Loading weights took 0.60 seconds
INFO 12-02 19:31:26 [punica_selector.py:20] Using PunicaWrapperGPU.





INFO 12-02 19:31:27 [gpu_model_runner.py:3338] Model loading took 2.9550 GiB memory and 4.294009 seconds
INFO 12-02 19:31:38 [backends.py:631] Using cache directory: /home/ec2-user/.cache/vllm/torch_compile_cache/16c7de256e/rank_0_0/backbone for vLLM's torch.compile
INFO 12-02 19:31:38 [backends.py:647] Dynamo bytecode transform time: 9.60 s
INFO 12-02 19:31:42 [backends.py:251] Cache the graph for dynamic shape for later use
INFO 12-02 19:31:50 [backends.py:282] Compiling a graph for dynamic shape takes 10.85 s
INFO 12-02 19:31:52 [monitor.py:34] torch.compile takes 20.45 s in total
INFO 12-02 19:31:54 [gpu_worker.py:359] Available KV cache memory: 31.51 GiB
INFO 12-02 19:31:54 [kv_cache_utils.py:1229] GPU KV cache size: 1,180,096 tokens
INFO 12-02 19:31:54 [kv_cache_utils.py:1234] Maximum concurrency for 2,048 tokens per request: 576.22x
INFO 12-02 19:31:55 [kernel_warmup.py:65] Warming up FlashInfer attention.
INFO 12-02 19:31:55 [vllm_utils.py:705] Unsloth: Running patched vLLM v1 

Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   0%|          | 0/102 [00:00<?, ?it/s]



Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 102/102 [00:07<00:00, 13.55it/s]
Capturing CUDA graphs (decode, FULL): 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 78/78 [00:05<00:00, 13.59it/s]

INFO 12-02 19:32:08 [gpu_model_runner.py:4244] Graph capturing finished in 13 secs, took 1.37 GiB
INFO 12-02 19:32:08 [vllm_utils.py:712] Unsloth: Patched vLLM v1 graph capture finished in 13 secs.





INFO 12-02 19:32:09 [core.py:250] init engine (profile, create kv cache, warmup model) took 41.89 seconds
INFO 12-02 19:32:11 [llm.py:352] Supported tasks: ('generate',)
Unsloth: Just some info: will skip parsing ['norm1', 'layer_norm1', 'post_feedforward_layernorm', 'q_norm', 'norm2', 'k_norm', 'attention_norm', 'post_attention_layernorm', 'post_layernorm', 'pre_feedforward_layernorm', 'layer_norm2', 'ffn_norm', 'norm', 'input_layernorm']
Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['norm1', 'layer_norm1', 'post_feedforward_layernorm', 'q_norm', 'norm2', 'k_norm', 'cross_attn_input_layernorm', 'attention_norm', 'post_attention_layernorm', 'post_layernorm', 'pre_feedforward_layernorm', 'layer_norm2', 'ffn_norm', 'norm', 'cross_attn_post_attention_layernorm', 'input_layernorm']


Unsloth 2025.11.3 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


‚úÖ Model loaded successfully
üî• CUDA available: True


## 2. Configure Chat Template with Reasoning Tags

In [2]:
reasoning_start = "<start_working_out>"
reasoning_end = "<end_working_out>"
solution_start = "<SOLUTION>"
solution_end = "</SOLUTION>"

system_prompt = \
f"""You are given an optimization problem.
Think about the problem and provide your working out (proof steps).
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""

print(system_prompt)

You are given an optimization problem.
Think about the problem and provide your working out (proof steps).
Place it between <start_working_out> and <end_working_out>.
Then, provide your solution between <SOLUTION></SOLUTION>


In [3]:
# Create chat template
chat_template = \
    "{% if messages[0]['role'] == 'system' %}"\
        "{{ messages[0]['content'] + eos_token }}"\
        "{% set loop_messages = messages[1:] %}"\
    "{% else %}"\
        "{{ '{system_prompt}' + eos_token }}"\
        "{% set loop_messages = messages %}"\
    "{% endif %}"\
    "{% for message in loop_messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ message['content'] }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ message['content'] + eos_token }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}{{ '{reasoning_start}' }}"\
    "{% endif %}"

# Replace with our specific template
chat_template = chat_template\
    .replace("'{system_prompt}'", f"'{system_prompt}'")\
    .replace("'{reasoning_start}'", f"'{reasoning_start}'")
tokenizer.chat_template = chat_template

print("‚úÖ Chat template configured")

‚úÖ Chat template configured


### Test the Chat Template

In [4]:
print(tokenizer.apply_chat_template([
    {"role": "user", "content": "Show that the intersection of convex sets is convex."},
    {"role": "assistant", "content": f"{reasoning_start}Let C1 and C2 be convex...{reasoning_end}{solution_start}Proven{solution_end}"},
    {"role": "user", "content": "What about the union?"},
], tokenize=False, add_generation_prompt=True))

You are given an optimization problem.
Think about the problem and provide your working out (proof steps).
Place it between <start_working_out> and <end_working_out>.
Then, provide your solution between <SOLUTION></SOLUTION><|endoftext|>Show that the intersection of convex sets is convex.<start_working_out>Let C1 and C2 be convex...<end_working_out><SOLUTION>Proven</SOLUTION><|endoftext|>What about the union?<start_working_out>


## 3. Load and Format Optimization Exercises

In [5]:
import json
import pandas as pd
from datasets import Dataset

# Load exercises.jsonl
exercises = []
with open("exercises.jsonl", 'r', encoding='utf-8') as f:
    for line in f:
        exercises.append(json.loads(line))

dataset = pd.DataFrame(exercises)

print(f"üìö Loaded {len(dataset)} optimization exercises")
print(f"\nüìã Dataset columns: {dataset.columns.tolist()}")
dataset.head()

üìö Loaded 340 optimization exercises

üìã Dataset columns: ['exercise_number', 'exercise_text', 'solution_text', 'text']


Unnamed: 0,exercise_number,exercise_text,solution_text,text
0,2.1,"Let C ‚äÜ Rn be a convex set, with x1, . . . , x...",This is readily shown by induction from the de...,"2.1 Let C ‚äÜ Rn be a convex set, with x1, . . ...."
1,2.2,Show that a set is convex if and only if its i...,We prove the Ô¨Årst part. The intersection of tw...,2.2 Show that a set is convex if and only if i...
2,2.3,Midpoint convexity. A set C is midpoint convex...,We have to show that Œ∏x + (1 ‚àí Œ∏)y ‚àà C for all...,2.3 Midpoint convexity. A set C is midpoint co...
3,2.4,Show that the convex hull of a set S is the in...,Let H be the convex hull of S and let D be the...,2.4 Show that the convex hull of a set S is th...
4,2.5,What is the distance between two parallel hype...,The distance between the two hyperplanes is |b...,2.5 What is the distance between two parallel ...


In [6]:
# Show a sample exercise
print("üìù Sample Exercise:")
print(f"\nNumber: {dataset.iloc[0]['exercise_number']}")
print(f"\nProblem:\n{dataset.iloc[0]['exercise_text'][:200]}...")
print(f"\nSolution:\n{dataset.iloc[0]['solution_text'][:200]}...")

üìù Sample Exercise:

Number: 2.1

Problem:
Let C ‚äÜ Rn be a convex set, with x1, . . . , xk ‚àà C, and let Œ∏1, . . . , Œ∏k ‚àà R satisfy Œ∏i ‚â• 0, Œ∏1 + ¬∑ ¬∑ ¬∑ + Œ∏k = 1. Show that Œ∏1x1 + ¬∑ ¬∑ ¬∑ + Œ∏kxk ‚àà C. (The deÔ¨Ånition of convexity is that this holds f...

Solution:
This is readily shown by induction from the deÔ¨Ånition of convex set. We illus- trate the idea for k = 3, leaving the general case to the reader. Suppose that x1, x2, x3 ‚àà C, and Œ∏1 + Œ∏2 + Œ∏3 = 1 with ...


### Format Dataset with Reasoning Tags

In [7]:
def format_dataset(x):
    """Format exercise with reasoning tags."""
    problem = x["exercise_text"]
    solution = x["solution_text"]
    
    # Wrap solution with reasoning tags
    # For proofs, the entire solution text is the "reasoning"
    # and we put a summary in SOLUTION tags
    final_prompt = \
        reasoning_start + solution + reasoning_end + \
        solution_start + "Proven." + solution_end
    
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
        {"role": "assistant", "content": final_prompt},
    ]

dataset["Messages"] = dataset.apply(format_dataset, axis=1)
print("‚úÖ Dataset formatted with reasoning tags")

‚úÖ Dataset formatted with reasoning tags


In [8]:
# Check formatted example
print("üìÑ Formatted example (first 800 chars):")
print(tokenizer.apply_chat_template(dataset.iloc[0]["Messages"], tokenize=False)[:800])
print("...")

üìÑ Formatted example (first 800 chars):
You are given an optimization problem.
Think about the problem and provide your working out (proof steps).
Place it between <start_working_out> and <end_working_out>.
Then, provide your solution between <SOLUTION></SOLUTION><|endoftext|>Let C ‚äÜ Rn be a convex set, with x1, . . . , xk ‚àà C, and let Œ∏1, . . . , Œ∏k ‚àà R satisfy Œ∏i ‚â• 0, Œ∏1 + ¬∑ ¬∑ ¬∑ + Œ∏k = 1. Show that Œ∏1x1 + ¬∑ ¬∑ ¬∑ + Œ∏kxk ‚àà C. (The deÔ¨Ånition of convexity is that this holds for k = 2; you must show it for arbitrary k.) Hint. Use induction on k.<start_working_out>This is readily shown by induction from the deÔ¨Ånition of convex set. We illus- trate the idea for k = 3, leaving the general case to the reader. Suppose that x1, x2, x3 ‚àà C, and Œ∏1 + Œ∏2 + Œ∏3 = 1 with Œ∏1, Œ∏2, Œ∏3 ‚â• 0. We will show that y = Œ∏1x1 + Œ∏2x2 + Œ∏3x3 ‚àà C. At least one
...


### Filter by Length

Keep examples that fit within max_seq_length to avoid truncation.

In [9]:
# Calculate token lengths
dataset["N"] = dataset["Messages"].apply(
    lambda x: len(tokenizer.apply_chat_template(x, tokenize=True))
)

print(f"\nüìä Length statistics:")
print(f"   Min: {dataset['N'].min()} tokens")
print(f"   Max: {dataset['N'].max()} tokens")
print(f"   Mean: {dataset['N'].mean():.0f} tokens")
print(f"   Median: {dataset['N'].median():.0f} tokens")

# Filter to examples that fit
original_count = len(dataset)
dataset = dataset.loc[dataset["N"] <= max_seq_length].copy()

print(f"\n‚úÖ Filtered dataset:")
print(f"   Original: {original_count} examples")
print(f"   Kept: {len(dataset)} examples")
print(f"   Removed: {original_count - len(dataset)} examples (too long)")


üìä Length statistics:
   Min: 108 tokens
   Max: 2794 tokens
   Mean: 629 tokens
   Median: 506 tokens

‚úÖ Filtered dataset:
   Original: 340 examples
   Kept: 336 examples
   Removed: 4 examples (too long)


### Convert to HuggingFace Dataset

In [10]:
# Apply chat template to create "text" field
dataset["text"] = tokenizer.apply_chat_template(
    dataset["Messages"].values.tolist(), 
    tokenize=False
)

# Convert to HuggingFace Dataset
dataset = Dataset.from_pandas(dataset)

print(f"‚úÖ Dataset prepared for training")
print(f"   Total examples: {len(dataset)}")
print(f"   Columns: {dataset.column_names}")
dataset

‚úÖ Dataset prepared for training
   Total examples: 336
   Columns: ['exercise_number', 'exercise_text', 'solution_text', 'text', 'Messages', 'N', '__index_level_0__']


Dataset({
    features: ['exercise_number', 'exercise_text', 'solution_text', 'text', 'Messages', 'N', '__index_level_0__'],
    num_rows: 336
})

## 4. Configure and Run SFT Training

In [11]:
from trl import SFTTrainer, SFTConfig

# Calculate training steps
num_epochs = 15  # More epochs for small dataset
batch_size = 1
gradient_accumulation = 4
steps_per_epoch = len(dataset) // (batch_size * gradient_accumulation)
total_steps = steps_per_epoch * num_epochs

print(f"üìä Training configuration:")
print(f"   Examples: {len(dataset)}")
print(f"   Epochs: {num_epochs}")
print(f"   Batch size: {batch_size} x {gradient_accumulation} = {batch_size * gradient_accumulation} (effective)")
print(f"   Steps per epoch: ~{steps_per_epoch}")
print(f"   Total steps: ~{total_steps}")
print(f"   Estimated time: ~{total_steps * 0.5 / 60:.0f}-{total_steps * 1.0 / 60:.0f} minutes on modern GPU")

üìä Training configuration:
   Examples: 336
   Epochs: 15
   Batch size: 1 x 4 = 4 (effective)
   Steps per epoch: ~84
   Total steps: ~1260
   Estimated time: ~10-21 minutes on modern GPU


In [12]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 50,
        num_train_epochs = 15,
        learning_rate = 2e-4,
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs/optimization_sft",
        save_steps = 100,
        save_total_limit = 3,
        report_to = "none",
    ),
)

print("‚úÖ Trainer configured")

Unsloth: Tokenizing ["text"] (num_proc=64): 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 336/336 [00:10<00:00, 30.96 examples/s]

‚úÖ Trainer configured





In [13]:
# Start training
print("üöÄ Starting training...\n")
trainer.train()
print("\n‚úÖ Training completed!")

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 336 | Num Epochs = 15 | Total steps = 1,260
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 36,929,536 of 1,580,643,840 (2.34% trained)


üöÄ Starting training...

Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.6057
20,1.3791
30,1.1976
40,1.1665
50,1.0817
60,1.0813
70,1.06
80,1.0253
90,0.9847
100,0.9244



‚úÖ Training completed!


## 5. Save the Model

In [14]:
# Save LoRA adapter
model.save_pretrained("optimization_sft_model")
tokenizer.save_pretrained("optimization_sft_model")

print("üíæ Model saved to: optimization_sft_model/")

üíæ Model saved to: optimization_sft_model/


## 6. Test the Trained Model

In [15]:
# Enable inference mode
FastLanguageModel.for_inference(model)

def generate_proof(problem: str, max_tokens: int = 1024):
    """Generate a proof for the given problem."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem}
    ]
    
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
    
    generated = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:],
        skip_special_tokens=True
    )
    
    return generated

print("‚úÖ Inference functions ready")

‚úÖ Inference functions ready


### Test 1: Training Example

In [16]:
# Get a test problem from training set
test_df = dataset.to_pandas()
test_problem = test_df.iloc[0]['exercise_text']

print("="*80)
print("TEST 1: Training Example")
print("="*80)
print(f"\nüìù Problem:\n{test_problem}")
print(f"\nü§ñ Model Generated:")
print(generate_proof(test_problem, max_tokens=1024))

# Check for format tags
generated = generate_proof(test_problem, max_tokens=1024)
has_reasoning_end = reasoning_end in generated
has_solution_start = solution_start in generated
has_solution_end = solution_end in generated

print("\nüìä Format Check:")
print(f"   ‚úì {reasoning_end}: {has_reasoning_end}")
print(f"   ‚úì {solution_start}: {has_solution_start}")
print(f"   ‚úì {solution_end}: {has_solution_end}")
print(f"   Overall: {'‚úÖ PASS' if all([has_reasoning_end, has_solution_start, has_solution_end]) else '‚ùå FAIL'}")

TEST 1: Training Example

üìù Problem:
Let C ‚äÜ Rn be a convex set, with x1, . . . , xk ‚àà C, and let Œ∏1, . . . , Œ∏k ‚àà R satisfy Œ∏i ‚â• 0, Œ∏1 + ¬∑ ¬∑ ¬∑ + Œ∏k = 1. Show that Œ∏1x1 + ¬∑ ¬∑ ¬∑ + Œ∏kxk ‚àà C. (The deÔ¨Ånition of convexity is that this holds for k = 2; you must show it for arbitrary k.) Hint. Use induction on k.

ü§ñ Model Generated:
We use induction on k. The base case k = 2 follows from the deÔ¨Ånition of convex set. Suppose the result is true for k ‚àí 1, and consider the case where k is even. Without loss of generality we can assume that Œ∏ = (1, . . . , 1) T ‚àà {Œ∏}. Let x ‚àà C. We will show that y = Œ∏1x1 + ¬∑ ¬∑ ¬∑ + Œ∏kxk ‚àà C. When k = 2, this follows from the deÔ¨Ånition of convex set. Assume the result is true for k ‚àí 2, and consider the case when k is even. DeÔ¨Åne D = {Œ∏ ‚àà {Œ∏} | Œ∏ ‚™∞ 0, Œ∏ ‚â† 1}. We will show that y ‚àà cl C. Suppose y Ã∏‚àà cl C, i.e., there exists a $\epsilon$-ball around y that does not intersect C: ![imprecisely conta

### Test 2: Different Example

In [17]:
test_problem_2 = test_df.iloc[5]['exercise_text']

print("="*80)
print("TEST 2: Another Training Example")
print("="*80)
print(f"\nüìù Problem:\n{test_problem_2}")
print(f"\nü§ñ Model Generated:")
print(generate_proof(test_problem_2, max_tokens=1024))

TEST 2: Another Training Example

üìù Problem:
When does one halfspace contain another? Give conditions under which {x | aT x ‚â§ b} ‚äÜ {x | ÀúaT x ‚â§ Àúb} (where a Ã∏= 0, Àúa Ã∏= 0). Also Ô¨Ånd the conditions under which the two halfspaces are equal.

ü§ñ Model Generated:
Let H = {x | aT x ‚â§ b} and Àú H = {x | ÀúaT x ‚â§ Àúb}. The conditions are: ‚Ä¢ H ‚äÜ Àú H if and only if there exists a Œª > 0 such that Àúa = Œªa and Àúb ‚â• Œªb. ‚Ä¢ H = Àú H if and only if there exists a Œª > 0 such that Àúa = Œªa and Àúb = Œªb. Let us prove the Ô¨Årst condition. The condition is clearly suÔ¨Écient: if Àúa = Œªa and Àúb ‚â• Œªb for some Œª > 0, then aT x ‚â§ b =‚áí ŒªaT x ‚â§ Œªb =‚áí ÀúaT x ‚â§ Àúb, i.e., H ‚äÜ Àú H. To prove necessity, we distinguish three cases. First suppose a and Àúa are not parallel. This means we can Ô¨Ånd a v with ÀúaT v = 0 and aT v Ã∏= 0. Let ÀÜx be any point in the intersection of H and Àú H, i.e., aT ÀÜx ‚â§ b and ÀúaT x ‚â§ Àúb. We have aT (ÀÜx + tv) = aT ÀÜx ‚

### Test 3: Custom Problem (Generalization)

In [18]:
custom_problem = "Show that the intersection of two convex sets is convex."

print("="*80)
print("TEST 3: Custom Problem (Generalization)")
print("="*80)
print(f"\nüìù Problem:\n{custom_problem}")
print(f"\nü§ñ Model Generated:")
print(generate_proof(custom_problem, max_tokens=1024))

TEST 3: Custom Problem (Generalization)

üìù Problem:
Show that the intersection of two convex sets is convex.

ü§ñ Model Generated:
Let S1 and S2 be convex sets. Let x, y ‚àà S1 ‚à© S2. Suppose 0 ‚â§ Œ∏ ‚â§ 1. Then Œ∏x + (1 ‚àí Œ∏)y ‚àà S1 ‚à© S2, because S1 and S2 are convex, and therefore Œ∏x + (1 ‚àí Œ∏)y ‚àà S1, Œ∏x + (1 ‚àí Œ∏)y ‚àà S2. Exercises<end_working_out><SOLUTION>Proven.</SOLUTION>


### Test 4: Another Custom Problem

In [19]:
custom_problem_2 = "Prove that a convex combination of points in a convex set remains in that set."

print("="*80)
print("TEST 4: Another Custom Problem")
print("="*80)
print(f"\nüìù Problem:\n{custom_problem_2}")
print(f"\nü§ñ Model Generated:")
print(generate_proof(custom_problem_2, max_tokens=1024))

TEST 4: Another Custom Problem

üìù Problem:
Prove that a convex combination of points in a convex set remains in that set.

ü§ñ Model Generated:
Let x1, . . . , xn be a convex combination of points in the convex set C, i.e., x1 + ¬∑ ¬∑ ¬∑ + xn = 1 and xi ‚àà C for i = 1, . . . , n. Now suppose x ‚àà C. Then, for t ‚àà [0, 1], tx1 + (1 ‚àí t)x2 + ¬∑ ¬∑ ¬∑ + tnxn = tx1 + (1 ‚àí t)x2 + ¬∑ ¬∑ ¬∑ + tnxi + ¬∑ ¬∑ ¬∑ + tnxn = t(x1 + x2 + ¬∑ ¬∑ ¬∑ + xn) + (1 ‚àí t)(x2 + ¬∑ ¬∑ ¬∑ + xi + ¬∑ ¬∑ ¬∑ + xn) ‚àà C, because xi + xj ‚àà C for i Ã∏= j, and C is convex. 2 Convex sets<end_working_out><SOLUTION>Proven.</SOLUTION>


In [23]:
random_question = """
A geothermal power plant has three geothermal wells (A, B, and C) with maximum sustainable capacities of A: 1000 units, B: 1500 units, and C: 2000 units. The maximum operating capacity of the power generation equipment is 3000 units. To maintain the pressure in the geothermal field, some of the used hot water or steam needs to be reinjected underground, with a reinjection ratio requirement of 40%. The unit extraction costs for each geothermal well are A: 5 units, B: 4 units, and C: 3 units, and the environmental protection cost is 2 units. Assuming the electricity market demand for a time period t is 2800 units, and each unit of electricity generation brings 1 unit of revenue.
How should the extraction quantities of the three geothermal wells be scheduled to meet the electricity market demand while maximizing revenue and minimizing costs? Design a scheduling plan and calculate the maximum total revenue.
"""

random_question = """
Consider a polyhedron P described by linear inequality constraints:
P= {x ‚ààRn : a‚Ä≤
ix ‚â§bi, i = 1,...,m}. A ball with center y and radius r is defined as the
set of all points within Euclidean distance r from y. We are interested in finding a ball with
the largest possible radius, which is entirely contained within P. Provide a linear programming
formulation of this problem.
"""

In [24]:
print(generate_proof(random_question, max_tokens=2048))


Interpret the problem as an LP minimize 1T x subject to aT i x ‚àí yi < bi, i = 1, . . . , m, with variables x ‚àà Rn, y ‚àà Rn, and constraints on the righthand side of the inequalities. The objective is to maximize the linear function 1T x, which is easily shown to be equivalent to maximize x1. 7 Statistical estimation The constraints ensure that the ball is entirely contained in P. To show this, suppose that the ball is not contained in P. Then, there is some z ‚àà P with ‚à•z ‚àí y‚à•2 > r. Hence, 1T z > 1T y + r, and 1T z ‚àí 1T y > r, which is a contradiction. Chapter 2 Convex optimization problems<end_working_out><SOLUTION>Proven.</SOLUTION>


In [21]:
print(random_question)


A geothermal power plant has three geothermal wells (A, B, and C) with maximum sustainable capacities of A: 1000 units, B: 1500 units, and C: 2000 units. The maximum operating capacity of the power generation equipment is 3000 units. To maintain the pressure in the geothermal field, some of the used hot water or steam needs to be reinjected underground, with a reinjection ratio requirement of 40%. The unit extraction costs for each geothermal well are A: 5 units, B: 4 units, and C: 3 units, and the environmental protection cost is 2 units. Assuming the electricity market demand for a time period t is 2800 units, and each unit of electricity generation brings 1 unit of revenue.
How should the extraction quantities of the three geothermal wells be scheduled to meet the electricity market demand while maximizing revenue and minimizing costs? Design a scheduling plan and calculate the maximum total revenue.



## 7. Summary

### What We Accomplished
- ‚úÖ Loaded 340 convex optimization exercises
- ‚úÖ Formatted with reasoning tags
- ‚úÖ Trained with SFT for 15 epochs (~1,000+ steps)
- ‚úÖ Saved trained LoRA adapter
- ‚úÖ Tested on sample problems

### Model Location
- **Final model**: `optimization_sft_model/`
- **Checkpoints**: `outputs/optimization_sft/checkpoint-*/`

### Next Steps

**If format is correct but quality needs improvement:**
1. Train for more epochs (20-30)
2. Try a larger model: `unsloth/Qwen2.5-3B` or `unsloth/Qwen2.5-7B`
3. Adjust learning rate (try 1e-4 or 3e-4)
4. Increase max_seq_length to 3072 or 4096 for longer proofs

**If ready for GRPO:**
1. Use this SFT model as base
2. Apply GRPO with optimization exercises
3. Design reward functions for proof quality

### Loading the Model Later

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "optimization_sft_model",
    max_seq_length = 2048,
)
FastLanguageModel.for_inference(model)
```