# KANSR Class-Based API Example

This notebook demonstrates how to use the KANSR class from the LLMSR package to perform symbolic regression using Kolmogorov-Arnold Networks (KANs).

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from openai import OpenAI
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Import the KANSR class
from LLMSR.kansr import KANSR

## Set up the client for API calls

In [None]:
# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

## Define a function to model

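Let's use a simple target, f(x) = 2 sin(2x) + 0.5 x², which mixes a trigonometric term with a polynomial term. The function accepts either a torch tensor or a NumPy array so it can be used for both training and plotting.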

In [None]:
# Define the target function
def target_function(x):
    if isinstance(x, torch.Tensor):
        return 2.0 * torch.sin(2.0 * x) + 0.5 * x**2
    else:
        return 2.0 * np.sin(2.0 * x) + 0.5 * x**2

# Plot the function
x = np.linspace(-3, 3, 100)
y = target_function(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title("Target Function: 2.0 * sin(2.0 * x) + 0.5 * x²")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()

## Method 1: Step-by-Step Approach

The KANSR class lets you run each stage individually: create a dataset, train (and optionally prune) the KAN, then convert the trained network to symbolic expressions.

In [None]:
# Initialize the KANSR instance
# The architecture [1,4,1] means: 1 input node, 4 hidden nodes, 1 output node
kansr = KANSR(
    client=client,      # API client for LLM calls
    width=[1, 4, 1],    # Network architecture
    grid=5,             # Grid size for KAN
    k=3,                # Spline order (degree of the piecewise polynomials)
    seed=42             # Random seed for reproducibility
)
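
To get a feel for how `width`, `grid`, and `k` determine model size, the sketch below counts edges and spline coefficients. It assumes that every pair of nodes in adjacent layers is joined by one learnable edge function, as in a standard KAN, and uses the B-spline fact that a spline of order `k` on `grid` intervals has `grid + k` basis functions. The helper names are hypothetical and not part of LLMSR, and any additional per-edge scale or base-function parameters are ignored.

```python
def count_edges(width):
    """Number of learnable edge functions in a fully connected KAN."""
    return sum(n_in * n_out for n_in, n_out in zip(width[:-1], width[1:]))


def count_spline_coefficients(width, grid, k):
    """Spline coefficients only: each edge has grid + k B-spline basis functions."""
    return count_edges(width) * (grid + k)


print(count_edges([1, 4, 1]))                      # 1*4 + 4*1 edges
print(count_spline_coefficients([1, 4, 1], 5, 3))  # edges * (5 + 3)
```

A larger `grid` gives each edge function more resolution at the cost of more coefficients to fit.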

In [None]:
# Step 1: Create a dataset for training
dataset = kansr.create_dataset(
    f=target_function,    # The function to approximate
    ranges=(-3, 3),       # Input range
    n_var=1,              # Number of input variables
    train_num=5000,       # Number of training points
    test_num=1000         # Number of test points
)
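
For intuition, a dataset of this kind can be pictured as inputs sampled uniformly from `ranges` together with the corresponding function values. The sketch below builds such a dictionary with NumPy. The key names and the function `make_dataset_sketch` are assumptions for illustration and may not match what `create_dataset` actually returns.

```python
import numpy as np


def make_dataset_sketch(f, ranges, n_var, train_num, test_num, seed=0):
    """Sample inputs uniformly in `ranges` and evaluate f on them (illustrative only)."""
    rng = np.random.default_rng(seed)
    low, high = ranges

    def sample(n):
        x = rng.uniform(low, high, size=(n, n_var))
        return x, f(x)

    train_x, train_y = sample(train_num)
    test_x, test_y = sample(test_num)
    return {
        "train_input": train_x, "train_label": train_y,
        "test_input": test_x, "test_label": test_y,
    }


def target(x):
    return 2.0 * np.sin(2.0 * x) + 0.5 * x**2


data = make_dataset_sketch(target, ranges=(-3, 3), n_var=1, train_num=5000, test_num=1000)
print(data["train_input"].shape, data["test_input"].shape)
```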

In [None]:
# Step 2: Train the KAN model
kansr.train_kan(
    dataset=dataset,   # The dataset to train on
    opt="LBFGS",       # Optimization algorithm
    steps=50,          # Number of optimization steps
    prune=True,        # Whether to prune the model after training
    node_th=0.2,       # Node threshold for pruning
    edge_th=0.2        # Edge threshold for pruning
)
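
Threshold-based pruning can be pictured as discarding nodes or edges whose importance score falls below the cutoff. The sketch below applies that idea to made-up scores. How `train_kan` computes the scores internally is not shown in this notebook, so both the scores and the helper `keep_mask` are hypothetical.

```python
def keep_mask(scores, threshold):
    """Return True for entries whose importance score exceeds the threshold."""
    return [score > threshold for score in scores]


# Hypothetical importance scores for the 4 hidden nodes of a [1, 4, 1] network
node_scores = [0.85, 0.05, 0.40, 0.15]
mask = keep_mask(node_scores, threshold=0.2)
print(mask)
print(f"{sum(mask)} of {len(mask)} hidden nodes kept")
```

Raising `node_th` or `edge_th` removes more of the network, which tends to give simpler expressions at the risk of dropping terms that matter.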

In [None]:
# Step 3: Convert the KAN model to symbolic expressions
best_expressions, best_chi_squareds, results_dicts = kansr.get_symbolic(
    client=client,              # Client for LLM API calls
    population=10,              # Population size for genetic algorithm
    generations=3,              # Number of generations for genetic algorithm
    temperature=0.1,            # Temperature parameter for genetic algorithm
    gpt_model="openai/gpt-4o",  # GPT model to use
    verbose=1,                  # Verbosity level
    use_async=True,             # Whether to use async execution
    plot_fit=True               # Whether to plot fitting results
)

In [None]:
# Print the best expression and its chi-squared value
print(f"Best expression: {best_expressions[0]}")
print(f"Chi-squared: {best_chi_squareds[0]}")
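
As a sanity check on the reported fit quality, a candidate expression can be evaluated against the target on a grid. The sketch below uses the sum of squared residuals as the chi-squared statistic, which assumes unit uncertainty on every point. The definition used inside `get_symbolic` may differ (for example in normalization), and the candidate functions here are hypothetical.

```python
import numpy as np


def chi_squared(candidate, target, x):
    """Sum of squared residuals between candidate and target on the points x."""
    residuals = candidate(x) - target(x)
    return float(np.sum(residuals**2))


def target(x):
    return 2.0 * np.sin(2.0 * x) + 0.5 * x**2


def exact_candidate(x):
    return 2.0 * np.sin(2.0 * x) + 0.5 * x**2


def rough_candidate(x):
    # Drops the trigonometric term, so the fit should be visibly worse
    return 0.5 * x**2


x = np.linspace(-3, 3, 200)
print(chi_squared(exact_candidate, target, x))
print(chi_squared(rough_candidate, target, x))
```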

In [None]:
# Plot the results
fig, ax = kansr.plot_results(
    ranges=(-3, 3),           # Input range for plotting
    result_dict=results_dicts[0],  # Results dictionary from get_symbolic
    title="KAN Symbolic Regression Results"  # Plot title
)
plt.show()

## Method 2: All-in-One Pipeline

For convenience, you can run the entire pipeline in one call.

In [None]:
# Initialize a new KANSR instance
kansr2 = KANSR(
    client=client,
    width=[1, 5, 1],  # Using a slightly different architecture
    grid=6,
    k=3,
    seed=17
)

In [None]:
# Run the complete pipeline
results = kansr2.run_complete_pipeline(
    client=client,
    f=target_function,
    ranges=(-3, 3),
    train_steps=50,
    generations=3,
    gpt_model="openai/gpt-4o",
    node_th=0.2,
    edge_th=0.2,
    optimizer="LBFGS",
    population=10,
    temperature=0.1,
    verbose=1,
    use_async=True,
    plot_fit=True
)

In [None]:
# Print the best expression from the pipeline
print(f"Best expression from pipeline: {results['best_expressions'][0]}")
print(f"Chi-squared: {results['best_chi_squareds'][0]}")
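
The cell above relies on only two keys of the returned dictionary. When several candidates are present, a small helper such as the hypothetical `pick_best` below selects the pair with the lowest chi-squared. The sample dictionary is made up for illustration and is not real pipeline output.

```python
def pick_best(results):
    """Return the (expression, chi_squared) pair with the smallest chi-squared."""
    pairs = zip(results["best_expressions"], results["best_chi_squareds"])
    return min(pairs, key=lambda pair: pair[1])


# Made-up values with the same two keys used in the cell above
example_results = {
    "best_expressions": ["0.5*x0**2", "2.0*sin(2.0*x0) + 0.5*x0**2"],
    "best_chi_squareds": [412.7, 0.003],
}

expression, chi2 = pick_best(example_results)
print(expression, chi2)
```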

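## Method 3: Using the Standalone Function

You can also call the standalone `run_complete_pipeline` function from the `LLMSR.kansr` module. It takes the network settings (`width`, `grid`, `k`, `seed`) as arguments, so no KANSR instance is needed.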

In [None]:
from LLMSR.kansr import run_complete_pipeline

# Run the complete pipeline using the standalone function
results_standalone = run_complete_pipeline(
    client=client,
    f=target_function,
    ranges=(-3, 3),
    width=[1, 4, 1],
    grid=5,
    k=3,
    train_steps=50,
    generations=3,
    gpt_model="openai/gpt-4o",
    device='cpu',
    node_th=0.2,
    edge_th=0.2,
    optimizer="LBFGS",
    population=10,
    temperature=0.1,
    verbose=1,
    use_async=True,
    plot_fit=True,
    seed=42
)

In [None]:
# Print the best expression from the standalone pipeline
print(f"Best expression from standalone pipeline: {results_standalone['best_expressions'][0]}")
print(f"Chi-squared: {results_standalone['best_chi_squareds'][0]}")

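## Advanced Example: Multivariate Function

The KANSR class can also handle multivariate functions: set the first entry of `width` and the `n_var` argument of `create_dataset` to the number of input variables.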

In [None]:
# Define a multivariate function
def multivariate_function(x):
    """
    Function with two input variables: f(x0, x1) = sin(x0) + 0.5 * x1^2
    
    Args:
        x: Tensor or NumPy array of shape [n, 2], where n is the batch size
        
    Returns:
        Tensor or NumPy array of shape [n] with function values
    """
    if isinstance(x, torch.Tensor):
        return torch.sin(x[:, 0]) + 0.5 * x[:, 1]**2
    else:
        return np.sin(x[:, 0]) + 0.5 * x[:, 1]**2
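
Before building a dataset, it can help to confirm the shape contract described in the docstring. The sketch below re-implements the NumPy branch only (so it runs without torch) and checks that an input of shape `[n, 2]` produces an output of shape `[n]`. The name `multivariate_numpy` is introduced here for illustration.

```python
import numpy as np


def multivariate_numpy(x):
    """NumPy-only version of f(x0, x1) = sin(x0) + 0.5 * x1^2."""
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2


points = np.array([
    [0.0, 0.0],
    [0.0, 2.0],
    [np.pi / 2, 0.0],
    [np.pi / 2, 2.0],
])
values = multivariate_numpy(points)
print(values.shape)
print(values)
```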

In [None]:
# Initialize a KANSR instance for the multivariate function
multivariate_kansr = KANSR(
    client=client,
    width=[2, 5, 1],  # 2 inputs, 5 hidden nodes, 1 output
    grid=5,
    k=3,
    seed=42
)

In [None]:
# Create a dataset for the multivariate function
multivariate_dataset = multivariate_kansr.create_dataset(
    f=multivariate_function,
    ranges=(-3, 3),  # Same range for both variables
    n_var=2,  # Two input variables
    train_num=10000,
    test_num=1000
)

In [None]:
# Train the KAN model
multivariate_kansr.train_kan(
    dataset=multivariate_dataset,
    opt="LBFGS",
    steps=75,  # More steps for multivariate function
    prune=True,
    node_th=0.2,
    edge_th=0.2
)

In [None]:
# Convert to symbolic expressions
best_expressions, best_chi_squareds, results_dicts = multivariate_kansr.get_symbolic(
    client=client,
    population=10,
    generations=3,
    temperature=0.1,
    gpt_model="openai/gpt-4o",
    verbose=1,
    use_async=True,
    plot_fit=True
)

In [None]:
# Print the best expression
print(f"Best multivariate expression: {best_expressions[0]}")
print(f"Chi-squared: {best_chi_squareds[0]}")

## Summary

This notebook demonstrated three ways to run KAN-based symbolic regression with the LLMSR package:

1. **Step-by-Step Approach**: Creating a KANSR instance and calling each method individually
2. **All-in-One Pipeline**: Using the `run_complete_pipeline` method of the KANSR class
3. **Standalone Function**: Using the standalone `run_complete_pipeline` function

We also demonstrated how to handle multivariate functions with the KANSR class.

The KANSR class provides a convenient and flexible interface for performing symbolic regression with Kolmogorov-Arnold Networks.