<a href="https://colab.research.google.com/github/janoPig/HROCH/blob/main/examples/SymbolicFuzzy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Symbolic fuzzy regression/classification demo

In [None]:
%pip install -U HROCH
#Penn Machine Learning Benchmarks
%pip install -U pmlb

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from pmlb import fetch_data
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from HROCH import PHCRegressor

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Regression

Let's create a data set with 40 elements and satisfying the equation
y = ((X1 & X16) | (!X4 & X19)) & (X23 | X26)

In [None]:
X = np.random.uniform(low=0.0, high=1.0, size=(10000, 40))
A = X[:, 0] * X[:, 15]
B = (1.0 - X[:, 3]) * X[:, 18]
C = A + B - A * B  # A or b
D = X[:, 22] + X[:, 25] - X[:, 22] * X[:, 25]
y = C * D

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=42)

Fit with PHCRegressor produce correct equation



In [None]:
reg = PHCRegressor(iter_limit=5000000, num_threads=1, problem='fuzzy', stopping_criteria=0, random_state=42)
reg.fit(X_train, y_train)

# predict
y_predicted = reg.predict(X_test)
test_mse = mean_squared_error(y_predicted, y_test)
test_r2 = r2_score(y_predicted, y_test)

print(f'PHCRegressor: mse= {test_mse} r2= {test_r2} eq= {str(reg.sexpr)} ')

PHCRegressor: mse= 8.485926183887007e-16 r2= 0.9999999999999803 eq= ((x23|x26)&((x16&x1)|(x19&(!x4))))
 


Compare with DecisionTreeRegressor and RandomForestRegressor



In [None]:
reg = DecisionTreeRegressor()
reg.fit(X_train, y_train)

y_predicted = reg.predict(X_test)
test_mse = mean_squared_error(y_predicted, y_test)
test_r2 = r2_score(y_predicted, y_test)

print(f'DecisionTreeRegressor: mse= {test_mse} r2= {test_r2}')


DecisionTreeRegressor: mse= 0.006423857643730605 r2= 0.8501713285772761


In [None]:
reg = RandomForestRegressor()
reg.fit(X_train, y_train)

y_predicted = reg.predict(X_test)
test_mse = mean_squared_error(y_predicted, y_test)
test_r2 = r2_score(y_predicted, y_test)

print(f'RandomForestRegressor: mse= {test_mse} r2= {test_r2}')

RandomForestRegressor: mse= 0.0017124935140343546 r2= 0.9518370843775761


## Classification

A good simple example is the parity5 and parity5+5 dataset from pmlb. 
PHCRegressor will find equations (((x5^(x4^x3))^x1)^x2) or similar that can be simplified to this form. The equation Xor fits the parity calculation perfectly. The DecisionTreeClassifier and RandomForestClassifier fit the training data with an r2 score of 1.0, but absolutely not the test data.

In [43]:
datasets = [(fetch_data('parity5'), 'parity5'), (fetch_data('parity5+5'), 'parity5+5')]
random_states = [42, 1083, 20133, 35879, 45688, 211565, 1212248, 58985945, 48994485, 5454544]
classifiers = {PHCRegressor: {'problem':'fuzzy', 'iter_limit':5000000, 'num_threads':1}, DecisionTreeClassifier: {}, RandomForestClassifier: {}}

for classifier, params in classifiers.items():
  print(classifier.__name__)
  for dataset, dataset_name in datasets:
    print(dataset_name)
    Y = np.ravel(pd.DataFrame(dataset, columns=['target']).to_numpy())
    X = dataset.drop(columns=['target']).to_numpy()
    for rs in random_states:
      X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.75, test_size=0.25, random_state=rs)
      clf = classifier(random_state=rs, **params)
      clf.fit(X_train, y_train)
      yp_train = clf.predict(X_train)
      if classifier is PHCRegressor:
        yp_train = (yp_train > 0.5)*1.0
      r2_train = r2_score(y_train, yp_train)
      rms_train = np.sqrt(mean_squared_error(y_train, yp_train))
      yp = clf.predict(X_test)
      if classifier is PHCRegressor:
        yp = (yp > 0.5)*1.0
      r2 = r2_score(y_test, yp)
      rms = np.sqrt(mean_squared_error(y_test, yp))
      print(f'train: r2={r2_train} rms={rms_train} test: r2={r2} rms={rms}')
      if classifier is PHCRegressor:
        print(f'eq: {clf.sexpr}')





PHCRegressor
parity5
train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: ((0.00000000000000000000)|(x2^((x5|x5)^(x4^(x1^x3)))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: ((x4^x2)^(x5^((0.00000000000000000000)|((x1&x1)^x3))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (((x2&x2)^(((x1&x1)^x3)^(x5^x4)))&((x2&x2)^(((x1&x1)^x3)^(x5^x4))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (x4^(x5^(x3^(x1^x2))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (!((x5^(x4&x4))^((x2^(!x3))^x1)))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (((x5^(x4^x3))^x1)^x2)

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (x2^(((x3^x1)^(x4^x5))|((x3^x1)^(x4^x5))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (x5^((x4&(!(!x4)))^(x3^(x2^x1))))

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (((x3^(x4^x2))^((0.00000000000000000000)|x1))^x5)

train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (((0.00000000000000000000)|x1)^(x4^((x3^x2)^x5)))

parity5+5
train: r2=1.0 rms=0.0 test: r2=1.0 rms=0.0
eq: (((!x3)