Decision Tree Genetic Programming classifier with a scikit-learn-style API.
pip install gpclassify

from gpclassify import GPClassifier
X = [
    [4.0, 1.0],
    [5.0, 2.0],
    [1.0, 3.0],
    [2.0, 5.0],
]
y = [1 if row[0] > row[1] else 0 for row in X]
clf = GPClassifier(
    num_models=40,
    generations=40,
    max_depth=6,
    selection_method="pareto_tournament",
    fitness_method="pearson_r2",
    random_state=42,
)
clf.fit(X, y)
print(clf.predict(X))
print(clf.predict_proba(X))
print(clf.score(X, y))

GPClassifier supports multiclass classification with one-vs-rest training.
from gpclassify import GPClassifier
X = [
    [9.0, 1.0, 1.0],
    [1.0, 9.0, 1.0],
    [1.0, 1.0, 9.0],
    [8.0, 2.0, 1.0],
    [1.0, 8.0, 2.0],
    [2.0, 1.0, 8.0],
]
y = [0, 1, 2, 0, 1, 2]
clf = GPClassifier(num_models=20, generations=20, random_state=7)
clf.fit(X, y)
pred = clf.predict(X)
proba = clf.predict_proba(X)

You can inspect evolved models as expressions or tree-like text.
expr = clf.view_model() # one model as a readable expression
top3_expr = clf.view_model(3) # top 3 models as expressions
tree = clf.view_model_tree() # one model in tree-like format
top2_trees = clf.view_model_tree(2)  # top 2 models in tree-like format

For multiclass one-vs-rest models, both inspection methods include class labels in the output so it is clear which class each displayed model belongs to.
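For readers unfamiliar with one-vs-rest, the following is a minimal, library-independent sketch of the general idea: one binary target per class, then a per-sample pick of the class whose model scores highest. The helper names and the score values are hypothetical, and GPClassify's internal implementation may differ.

```python
def one_vs_rest_targets(y):
    """Build one binary target vector per class label."""
    classes = sorted(set(y))
    return {c: [1 if label == c else 0 for label in y] for c in classes}


def predict_from_scores(scores_by_class):
    """Pick, per sample, the class whose binary model scored highest."""
    classes = list(scores_by_class)
    n_samples = len(scores_by_class[classes[0]])
    return [
        max(classes, key=lambda c: scores_by_class[c][i])
        for i in range(n_samples)
    ]


y = [0, 1, 2, 0, 1, 2]
targets = one_vs_rest_targets(y)
# targets[1] marks class 1 against the rest: [0, 1, 0, 0, 1, 0]

# Hypothetical per-class scores for two samples (illustrative values only).
scores = {0: [0.9, 0.2], 1: [0.1, 0.7], 2: [0.3, 0.4]}
labels = predict_from_scores(scores)  # [0, 1]
```

Because each class gets its own evolved model, labeling the inspected output by class is what ties a displayed expression back to the binary problem it solves.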
- Leaves are dataset variables (x[i]) and numeric constants.
- Comparison operands can include nested math layers (none, one, or many).
- Left and right sides are not required to be symmetric.
This allows regression-like value expressions at the bottom of boolean trees.
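The structure described above can be illustrated with a small evaluator. This is only a sketch: the tuple-based node encoding, the operator names, and the inclusion of `and`/`or` nodes are assumptions made for illustration, not GPClassify's internal representation.

```python
import operator

# Hypothetical node encoding (not GPClassify's internal format):
#   ("var", i), ("const", v), or (op_name, left, right)
VALUE_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
COMPARE_OPS = {"gt": operator.gt, "lt": operator.lt}
BOOL_OPS = {"and": lambda a, b: a and b, "or": lambda a, b: a or b}


def evaluate(node, x):
    """Recursively evaluate a tree against one feature row x."""
    kind = node[0]
    if kind == "var":
        return x[node[1]]
    if kind == "const":
        return node[1]
    left, right = evaluate(node[1], x), evaluate(node[2], x)
    for table in (VALUE_OPS, COMPARE_OPS, BOOL_OPS):
        if kind in table:
            return table[kind](left, right)
    raise ValueError(f"unknown node type: {kind}")


# (x[0] * 2.0 - x[1]) > x[1]
# Two math layers on the left operand, none on the right (asymmetric sides).
tree = (
    "gt",
    ("sub", ("mul", ("var", 0), ("const", 2.0)), ("var", 1)),
    ("var", 1),
)

first = evaluate(tree, [4.0, 1.0])   # 4*2 - 1 = 7 > 1
second = evaluate(tree, [1.0, 3.0])  # 1*2 - 3 = -1 > 3
```

Here the left operand of the comparison is a small arithmetic expression while the right is a bare variable, which is the kind of asymmetry the list above permits.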
- num_models: population size
- generations: evolution steps
- crossover_rate: crossover fraction
- mutation_rate: mutation fraction
- elitist_rate: elite carryover fraction
- max_depth: maximum tree depth
- tournament_size: tournament selection size
- selection_method: parent selection strategy ("tournament" or "pareto_tournament")
- fitness_method: fitness objective ("accuracy", "f1_score", or "pearson_r2")
- enable_trig_functions: include trig unary operators (sin, cos, tanh) in tree value expressions
- random_state: reproducibility seed
- show_training_curve: print generation-by-generation best fitness
Set selection_method="pareto_tournament" to optimize with two objectives during tournament
selection:
- maximize fitness (classification performance)
- minimize complexity (tree size)
For each tournament draw, GPClassify computes the full non-dominated front and samples parents from that front.
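The selection step described above can be sketched roughly as follows. The dictionary-based model records, the function names, and the uniform sampling from the front are assumptions for illustration; GPClassify's actual implementation may differ in its details.

```python
import random


def dominates(a, b):
    """a dominates b: no worse on both objectives, strictly better on one."""
    no_worse = a["fitness"] >= b["fitness"] and a["size"] <= b["size"]
    better = a["fitness"] > b["fitness"] or a["size"] < b["size"]
    return no_worse and better


def non_dominated_front(candidates):
    """Keep candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


def pareto_tournament(population, tournament_size, rng):
    """Draw a tournament, then sample one parent from its Pareto front."""
    draw = rng.sample(population, tournament_size)
    return rng.choice(non_dominated_front(draw))


# Hypothetical population: higher fitness is better, smaller size is better.
population = [
    {"name": "a", "fitness": 0.90, "size": 15},
    {"name": "b", "fitness": 0.85, "size": 7},
    {"name": "c", "fitness": 0.80, "size": 9},  # dominated by b
    {"name": "d", "fitness": 0.60, "size": 3},
]
front = non_dominated_front(population)  # models a, b, and d
parent = pareto_tournament(population, tournament_size=3, rng=random.Random(42))
```

Note that model `c` can still be selected when `b` is not part of a given draw, since the front is computed per tournament rather than over the whole population.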
Set fitness_method to choose the optimization target used by evolutionary scoring (default: "pearson_r2"):
- "accuracy": maximize classification agreement (with inversion symmetry)
- "f1_score": maximize F1 score (with inversion symmetry)
- "pearson_r2": maximize squared Pearson correlation between predictions and labels
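As a rough illustration of the three objectives, the sketch below assumes that "inversion symmetry" means scoring a model by the better of its predictions and their logical inverse. That reading, along with the function names, is an assumption rather than a statement of how GPClassify computes fitness.

```python
def accuracy(y_true, y_pred):
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def with_inversion(metric, y_true, y_pred):
    """Assumed meaning of inversion symmetry: also score flipped predictions."""
    inverted = [1 - p for p in y_pred]
    return max(metric(y_true, y_pred), metric(y_true, inverted))


def pearson_r2(y_true, scores):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(y_true)
    mean_t, mean_s = sum(y_true) / n, sum(scores) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(y_true, scores))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_t == 0 or var_s == 0:
        return 0.0
    return cov * cov / (var_t * var_s)


y_true = [1, 1, 0, 0]
y_pred = [0, 0, 1, 1]  # perfectly inverted predictions

raw_acc = accuracy(y_true, y_pred)                  # 0.0
sym_acc = with_inversion(accuracy, y_true, y_pred)  # 1.0
sym_f1 = with_inversion(f1_score, y_true, y_pred)   # 1.0
r2 = pearson_r2(y_true, y_pred)                     # 1.0
```

Squared correlation needs no explicit inversion step: squaring removes the sign, so a perfectly anti-correlated model scores the same as a perfectly correlated one.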
Haut, Nathan. Active Learning in Genetic Programming. Michigan State University, 2023.