# Hypothesis exploration with the QLattice from Abzu

This was one of the first datasets we used to test our early algorithms: first `Tiamat`, and then its successor, `the wave model`, which resembles the QLattice to some degree.

Disclaimer: I'm a backend engineer at Abzu.

# Inspect the dataset

In [None]:
import pandas as pd

data = pd.read_csv("/kaggle/input/predicting-pulsar-starintermediate/pulsar_data_train.csv")
data.head()

In [None]:
# Rows containing NaN values
data[data.isna().any(axis=1)]

In [None]:
# Class balance
data.target_class.value_counts()
# Around 10/90

In [None]:
# Drop the rows with NaN values
data = data.dropna()
data.target_class.value_counts()
# Still around 10/90
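
To put numbers on the "around 10/90" remark, the class shares can be read off with `value_counts(normalize=True)` before and after `dropna()`. A minimal sketch on a small made-up frame; the column `feature` and all values are hypothetical and not taken from the pulsar data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 40 rows, 4 positives, missing values in the first 10 rows
toy = pd.DataFrame({
    "feature": [np.nan] * 10 + [1.0] * 30,
    "target_class": ([1] + [0] * 9) * 4,
})

# Share of each class before and after dropping rows with missing values
share_before = toy["target_class"].value_counts(normalize=True)
share_after = toy.dropna()["target_class"].value_counts(normalize=True)

print(share_before)
print(share_after)
```

If the two sets of shares differ noticeably, dropping rows has shifted the class balance and the missing values may not be missing at random.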

In [None]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(
    data, stratify=data["target_class"], train_size=0.66, random_state=1
)
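
As a quick sanity check on what `stratify` does, the sketch below splits a made-up imbalanced frame and compares the positive-class share in each part. The toy data and the column `x` are hypothetical; only `train_test_split` and its arguments mirror the cell above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with a 10/90 class split (100 rows)
toy = pd.DataFrame({
    "x": range(100),
    "target_class": [1] * 10 + [0] * 90,
})

toy_train, toy_test = train_test_split(
    toy, stratify=toy["target_class"], train_size=0.66, random_state=1
)

# With stratification, both parts keep roughly the original 10% positive share
train_share = toy_train["target_class"].mean()
test_share = toy_test["target_class"].mean()
print(len(toy_train), len(toy_test), train_share, test_share)
```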

# Produce hypotheses with the QLattice

The QLattice is a supervised machine learning tool for symbolic regression developed by Abzu. It is inspired by Richard Feynman's path integral formulation, which is why the Python module for it is called Feyn and why the Q in QLattice stands for Quantum.
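
To illustrate what a symbolic-regression hypothesis for a binary target can look like, here is a hand-written stand-in: a short closed-form expression passed through a logistic function so the output reads as a probability. It is not produced by the QLattice; the features `x1`, `x2` and the weights are made up for illustration.

```python
import math

def hypothesis(x1: float, x2: float) -> float:
    """Hand-written example expression: logistic(1.5*x1 - 0.8*log(x2) + 0.2).

    The features x1, x2 and all weights are hypothetical.
    """
    z = 1.5 * x1 - 0.8 * math.log(x2) + 0.2
    return 1.0 / (1.0 + math.exp(-z))

print(hypothesis(0.0, 1.0))
print(hypothesis(2.0, 1.0))
```

The appeal of this kind of model is that the whole hypothesis fits on one line and can be read, questioned and compared with domain knowledge.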

Abzu provides free QLattices to anyone for non-commercial use. A free community QLattice is allocated automatically when Feyn is used without an active subscription, as in this notebook. Read more about how it works here: https://docs.abzu.ai/docs/guides/getting_started/community.html

Documentation and tutorials can be found here: https://docs.abzu.ai

In [None]:
# Note: the pip install will fail unless you enable Internet in the settings to the right --->
!pip install feyn

In [None]:
import feyn
ql = feyn.connect_qlattice()

In [None]:
# Reset with a seed for reproducible results
ql.reset(random_seed=1)

In [None]:
models = ql.auto_run(train_data, output_name="target_class", kind="classification", n_epochs=20)

In [None]:
# Plot the different hypotheses
from IPython.display import display

for model in feyn.best_diverse_models(models):
    display(model.plot(data=train_data, test=test_data))

# Evaluate the best hypothesis

In [None]:
best_model = models[0] # Just blindly picking the one that scored best in the auto_run

In [None]:
# As a graph
best_model

In [None]:
# As a math expression
best_model.sympify(2, symbolic_lr=True)

In [None]:
best_model.plot_roc_curve(train_data, label="Training data")
best_model.plot_roc_curve(test_data, label="Test data")

In [None]:
best_model.plot_confusion_matrix(test_data)
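
For readers who want to see what the confusion matrix counts, here is a plain-Python sketch. It is not Feyn's implementation; the labels, the probabilities and the 0.5 threshold are all assumptions made for illustration.

```python
# Hypothetical true labels and predicted probabilities (not from the pulsar data)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.60, 0.10, 0.05, 0.40, 0.90, 0.45]

threshold = 0.5  # assumed decision threshold
y_pred = [int(p >= threshold) for p in y_prob]

# Count the four cells of the binary confusion matrix
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(tp, tn, fp, fn, accuracy, recall, precision)
```

With classes split roughly 10/90, accuracy can look good while recall on the minority class stays low, so the individual cells are worth reading alongside the ROC curve.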

## To-do
- Predict on the supplied test set
- Can we get simpler, more easily understandable models? Try different criteria (AIC and BIC), or restrict the search with queries.
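
For reference on the criteria mentioned in the list above: AIC and BIC both trade goodness of fit against the number of parameters, and BIC penalises extra parameters more heavily as the sample grows. A minimal sketch of the textbook definitions with made-up numbers; how Feyn computes these internally is not shown here and may differ.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

# Hypothetical models: the complex one fits slightly better but has more parameters
n = 1000
simple = {"log_likelihood": -250.0, "n_params": 3}
complex_ = {"log_likelihood": -247.0, "n_params": 9}

for name, m in [("simple", simple), ("complex", complex_)]:
    print(name, aic(**m), bic(**m, n_samples=n))
```

Lower is better for both criteria. In this made-up case the small gain in likelihood does not pay for six extra parameters, so both criteria favour the simpler model.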