# Lab 8: Instruction-Based Data Generation and Classification Using Gemma-3 and Decision Trees

## Step 1: Launch a Colab Notebook and Set Up Environment

Install the necessary packages first (`google-genai` and `python-dotenv` are needed by the imports in Step 2):

```bash
!pip install torch torchvision torchaudio transformers accelerate bitsandbytes huggingface_hub scikit-learn pandas numpy matplotlib google-genai python-dotenv --quiet
```

## Step 2: Load and Run Gemma-3 via the API

In [1]:
# Load the API key from a .env file into the environment
from dotenv import load_dotenv
import os

load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")

In [2]:
from google import genai

client = genai.Client(api_key=gemini_api_key)

model = "gemma-3-27b-it"
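
If `GEMINI_API_KEY` is not defined, `os.getenv` returns `None` instead of raising, which can make the eventual error harder to trace. The following is a minimal sketch of a fail-fast check. The helper name `require_env` is hypothetical and is not part of `google-genai` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or to the environment."
        )
    return value
```

One way to use it would be `gemini_api_key = require_env("GEMINI_API_KEY")` right after `load_dotenv()`.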

## Step 3: Run a Basic Prompt 
Here's an example to demonstrate how Gemma-3 responds to instructions:

In [3]:
prompt = """ 
Generate ONLY CSV data without any explanation. It should contain exactly 30 
rows with the columns: age, income, decision (yes/no).  
Ensure that: - age values are between 18 and 65, - income ranges from 30000 to 150000, - decision has a roughly equal number of 'yes' and 'no'. 
""" 
# Send the prompt to the model; no generation config is passed, so the API defaults apply
response = client.models.generate_content(model=model, contents=prompt)
print(response.text) 

```csv
age,income,decision
25,65000,yes
38,120000,no
42,80000,yes
21,35000,no
55,100000,yes
31,45000,no
61,140000,no
19,32000,yes
48,95000,yes
28,50000,no
35,70000,yes
50,110000,no
23,40000,yes
63,150000,no
30,60000,no
45,85000,yes
27,38000,no
58,130000,yes
33,55000,yes
20,31000,no
40,75000,no
52,90000,yes
29,42000,yes
60,125000,no
36,68000,no
49,105000,yes
22,36000,no
57,115000,yes
39,72000,yes
26,48000,no
```
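
The prompt states several constraints, but nothing guarantees that the model honors them, so it may be worth checking the output before training on it. The sketch below is one way to do that. The helper name `check_generated_csv` is hypothetical, and the 40% to 60% tolerance used for "roughly equal" is an assumption, not something the lab specifies.

```python
import io

import pandas as pd


def check_generated_csv(csv_text: str, expected_rows: int = 30) -> dict:
    """Check CSV text against the constraints stated in the prompt.

    Hypothetical helper; the ranges mirror the prompt, and the
    'roughly equal' tolerance (40-60% 'yes') is an assumption.
    """
    df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
    yes_share = (df["decision"].str.strip().str.lower() == "yes").mean()
    return {
        "columns_ok": list(df.columns) == ["age", "income", "decision"],
        "row_count_ok": len(df) == expected_rows,
        "age_ok": bool(df["age"].between(18, 65).all()),
        "income_ok": bool(df["income"].between(30000, 150000).all()),
        "balanced_ok": bool(0.4 <= yes_share <= 0.6),
    }


# First four rows of the output above, used as a small sample
sample = """age,income,decision
25,65000,yes
38,120000,no
42,80000,yes
21,35000,no
"""
print(check_generated_csv(sample, expected_rows=4))
```

The 30-row output shown above contains 15 `yes` and 15 `no` rows, with ages from 19 to 63 and incomes from 31000 to 150000, so it would pass all five checks once the Markdown fence is stripped.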


## Step 4: Continue with Model Training 

In [4]:
import pandas as pd 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split

In [5]:
# `response` is the result returned by generate_content in the previous cell
response_text = response.text

# Remove markdown code block indicators if they exist
if "```csv" in response_text and "```" in response_text:
    # Extract content between ```csv and the last ```
    start_idx = response_text.find("```csv") + 6  # Length of ```csv
    end_idx = response_text.rfind("```")
    response_text = response_text[start_idx:end_idx].strip()
# If there's just ``` without csv specification
elif response_text.startswith("```") and "```" in response_text[3:]:
    start_idx = response_text.find("```") + 3
    end_idx = response_text.rfind("```")
    response_text = response_text[start_idx:end_idx].strip()

# Split the response text into lines and then into data points 
data_points = [] 
for line in response_text.strip().split('\n'): 
    if line: 
        try: 
            age, income, purchase = line.split(',') 
            data_points.append([int(age), int(income), purchase.strip()]) 
        except ValueError: 
            # Lines that don't match the expected format are skipped; the header
            # row also ends up here because 'age' and 'income' are not integers
            print(f"Skipping line: {line}") 
 
# Create a pandas DataFrame (the CSV's 'decision' column is named 'purchase' here) 
df = pd.DataFrame(data_points, columns=['age', 'income', 'purchase']) 
 
# Convert 'purchase' to numerical (0 for 'no', 1 for 'yes') 
df['purchase'] = df['purchase'].map({'no': 0, 'yes': 1}) 
# Define X and y 
X = df[['age', 'income']] 
y = df['purchase']

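# Hold out 20% of the rows for testing, then fit a shallow decision tree.
# Note: this rebinds `model`, which held the Gemma model name in cell [2];
# re-run that cell before calling generate_content again.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) 
model = DecisionTreeClassifier(max_depth=5) 
model.fit(X_train, y_train) 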

Skipping line: age,income,decision
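
The manual parsing in this cell treats the header as a malformed row and depends on exact string positions for the fence. As an alternative, here is a minimal sketch that extracts the fenced body with a regular expression and lets `pandas.read_csv` handle the header. The helper name `parse_csv_response` is hypothetical, and it assumes the columns are `age`, `income`, and `decision`.

```python
import io
import re

import pandas as pd

# Three backticks, built programmatically to keep this snippet easy to embed
FENCE = "`" * 3
# Matches a fenced block with an optional language tag and captures its body
FENCE_RE = re.compile(FENCE + r"[a-zA-Z]*\s*\n(.*?)" + FENCE, re.DOTALL)


def parse_csv_response(text: str) -> pd.DataFrame:
    """Parse model output into a DataFrame, with or without a Markdown fence.

    Hypothetical helper; assumes the columns age, income, decision.
    """
    match = FENCE_RE.search(text)
    body = match.group(1) if match else text
    df = pd.read_csv(io.StringIO(body.strip()), skipinitialspace=True)
    # Normalize labels, then encode 'no' as 0 and 'yes' as 1
    labels = df["decision"].astype(str).str.strip().str.lower()
    df["decision"] = labels.map({"no": 0, "yes": 1})
    # Drop rows whose label was neither 'yes' nor 'no'
    return df.dropna(subset=["decision"]).astype({"decision": int})
```

With this approach, `parse_csv_response(response.text)` would return a DataFrame whose `decision` column is already numeric, so the explicit loop and the separate `map` step are not needed.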


In [6]:
from sklearn.model_selection import train_test_split 
from sklearn.tree import DecisionTreeClassifier, export_text 
from sklearn.metrics import classification_report 
# Decision rules visualization 
rules = export_text(model, feature_names=['age', 'income']) 
print("Decision Tree Rules:\n", rules) 
# Evaluate the model on the held-out test set 
predictions = model.predict(X_test) 
print("Classification Report:\n", classification_report(y_test, predictions))

Decision Tree Rules:
 |--- age <= 59.00
|   |--- age <= 41.00
|   |   |--- income <= 73500.00
|   |   |   |--- income <= 69000.00
|   |   |   |   |--- age <= 26.00
|   |   |   |   |   |--- class: 1
|   |   |   |   |--- age >  26.00
|   |   |   |   |   |--- class: 0
|   |   |   |--- income >  69000.00
|   |   |   |   |--- class: 1
|   |   |--- income >  73500.00
|   |   |   |--- class: 0
|   |--- age >  41.00
|   |   |--- class: 1
|--- age >  59.00
|   |--- class: 0
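
To make the printed rules easier to read, they can be transcribed by hand into nested conditions. The sketch below does that for the tree shown above. The function name `predict_from_rules` is hypothetical, and the thresholds are specific to this run: because `train_test_split` is called without a fixed `random_state`, a rerun may produce a different tree.

```python
def predict_from_rules(age: float, income: float) -> int:
    """Hand transcription of the printed tree; returns 0 ('no') or 1 ('yes')."""
    if age > 59:
        return 0
    if age > 41:
        return 1
    # From here on, age <= 41
    if income > 73500:
        return 0
    if income > 69000:
        return 1
    # From here on, income <= 69000
    return 1 if age <= 26 else 0


print(predict_from_rules(45, 80000))  # age in (41, 59] -> 1
print(predict_from_rules(30, 50000))  # age in (26, 41], income <= 69000 -> 0
```

Several of the splits, such as the narrow income band between 69000 and 73500, isolate very few training rows, which suggests the tree is fitting noise in the generated labels.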

Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.20      0.33         5
           1       0.20      1.00      0.33         1

    accuracy                           0.33         6
   macro avg       0.60      0.60      0.33         6
weighted avg       0.87      0.33      0.33         6
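
The report can be checked by hand. Class 0 has support 5 and recall 0.20, so only one of the five true `0` rows was predicted as `0`; class 1 has support 1 and recall 1.00, so its single row was predicted correctly. The sketch below rebuilds label vectors consistent with those counts (the row order is arbitrary) and recomputes the metrics.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Labels implied by the report above; the row order is arbitrary
y_true = [0, 0, 0, 0, 0, 1]
y_pred = [0, 1, 1, 1, 1, 1]

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
print("accuracy:", round(accuracy_score(y_true, y_pred), 2))
print("precision per class:", precision_score(y_true, y_pred, average=None))
print("recall per class:", recall_score(y_true, y_pred, average=None))
```

Only 2 of the 6 test rows are classified correctly, which gives the 0.33 accuracy. With six test rows, each row moves accuracy by about 17 percentage points, so this report is a very noisy estimate of how well the tree generalizes.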

