In [None]:
!pip3 install feyn

# QLattice for diabetes prediction

Here we use a QLattice to determine whether `age` or `gender` has any predictive power for whether someone develops diabetes. We use the community QLattice, which is available for non-commercial use.

In [None]:
import feyn
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

In [None]:
random_seed = 123

In [None]:
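# The CSV uses commas as decimal separators
data = pd.read_csv("/kaggle/input/predict-diabetes-based-on-diagnostic-measures/diabetes.csv", decimal=",")
# Encode the target as 1 = diabetes, 0 = no diabetes
data.replace(to_replace="Diabetes", value=1, inplace=True)
data.replace(to_replace="No diabetes", value=0, inplace=True)
# Drop the patient ID, which is only an identifier
data.drop("patient_number", axis=1, inplace=True)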

In [None]:
train, test = train_test_split(data, test_size=0.4, random_state=random_seed, stratify=data['diabetes'])
train.head()
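`stratify=data['diabetes']` asks scikit-learn to keep the diabetes/no-diabetes ratio the same in the train and test sets. The sketch below illustrates that idea in plain Python on hypothetical toy labels; `stratified_split` is an illustrative helper, not scikit-learn's implementation, which may shuffle and round per-class counts differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed):
    """Split sample indices so every class keeps the same train/test ratio.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # shuffle within each class
        n_test = round(len(idx) * test_size)  # take the same fraction from every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 80 negatives, 20 positives
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.4, seed=123)


def positive_rate(idx):
    """Fraction of positive labels among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)


print(len(train_idx), len(test_idx), positive_rate(train_idx), positive_rate(test_idx))
```

Both splits end up with the same 20% positive rate as the full toy set, which is what stratification is meant to guarantee.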

The dataset is somewhat imbalanced, so we use sample weights that up-weight the minority class.

In [None]:
sw = compute_sample_weight("balanced", train["diabetes"])
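With `"balanced"`, scikit-learn weights each sample by `n_samples / (n_classes * count of its class)`, so the rarer class is up-weighted and both classes end up with the same total weight. A minimal plain-Python sketch of that formula on hypothetical toy labels (`balanced_sample_weights` is an illustrative name, not a library function):

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Per-sample weights: n_samples / (n_classes * count of the sample's class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Toy labels (hypothetical): 6 negatives, 2 positives
y = [0, 0, 0, 0, 0, 0, 1, 1]
w = balanced_sample_weights(y)

# Each class now carries the same total weight (here 4.0 each)
print(sum(w[:6]), sum(w[6:]))
```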

In [None]:
ql = feyn.connect_qlattice()
ql.reset(random_seed)

Here we simply use `auto_run` to find the best predictors.

In [None]:
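models = ql.auto_run(train,
                     output_name="diabetes",
                     kind="classification",
                     stypes={"gender": "c"},  # treat gender as a categorical input
                     criterion="bic",         # rank models by BIC, which penalises complexity
                     sample_weights=sw,       # balanced weights computed above
                    )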
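`criterion="bic"` tells `auto_run` to rank models with the Bayesian information criterion, which trades goodness of fit against the number of parameters. The sketch below uses the textbook definition, `BIC = k * ln(n) - 2 * ln(L)`, for a binary classifier on hypothetical toy predictions; feyn's internal computation may differ in details (for example in how parameters are counted or how sample weights enter).

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def bic(y_true, p_pred, n_params):
    """Textbook BIC: lower is better."""
    n = len(y_true)
    return n_params * math.log(n) - 2 * log_likelihood(y_true, p_pred)


# Hypothetical toy data: a simple model and a more complex, slightly better-fitting one
y = [0, 1, 1, 0]
p_simple = [0.2, 0.8, 0.7, 0.3]   # 2 parameters
p_complex = [0.1, 0.9, 0.8, 0.2]  # 8 parameters

bic_simple = bic(y, p_simple, n_params=2)
bic_complex = bic(y, p_complex, n_params=8)
print(bic_simple, bic_complex)
```

Even though the complex model fits the toy labels better, its extra parameters cost more than the improvement in likelihood, so the simple model gets the lower BIC.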

In [None]:
best = models[0]

In [None]:
best.plot(train, test)

`glucose` appears to be the best predictor here. `age` adds very little predictive power, and `gender` was not picked up at all.

In [None]:
best.plot_roc_curve(train)
best.plot_roc_curve(test)
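A ROC curve is often summarised by the area under it (AUC), which equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting half. A small plain-Python sketch of that pairwise definition (`roc_auc` is an illustrative helper; `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently):

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 means the scores rank positives no better than chance, while 1.0 means every positive outranks every negative.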

## Only `age` and `glucose`

`glucose` clearly has strong predictive power, but does its effect vary across age groups? Here we use the `query_string` parameter of `auto_run` to tell the `QLattice` that we want very simple models that include only `glucose` and `age`.

In [None]:
ql.reset(random_seed)

In [None]:
query = "_['age','glucose', 2]"

In [None]:
models = ql.auto_run(train,
                     output_name="diabetes",
                     kind="classification",
                     stypes={"gender": "c"},                 
                     criterion="bic",
                     sample_weights=sw,
                     query_string=query
                    )

In [None]:
best = models[0]

In [None]:
best.plot(train, test)

This much simpler model has almost the same predictive power. Let's plot it!

In [None]:
best.plot_partial2d(train)

In [None]:
best.plot_partial(train, by="age")

Here you can see that the older someone is, the more likely they are to develop diabetes. However, this effect is amplified at higher glucose levels.
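To make the age-glucose interaction described above concrete, here is a toy logistic model with an age-by-glucose interaction term. The coefficients are made up for illustration and are not the model found by the QLattice; the point is only that, in such a model, the increase in predicted risk across ages is larger when glucose is held at a high value.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to the data)
B0, B_AGE, B_GLU, B_INT = -10.0, 0.02, 0.04, 0.0003


def predicted_risk(age, glucose):
    """Toy logistic model: P(diabetes) with an age-by-glucose interaction."""
    logit = B0 + B_AGE * age + B_GLU * glucose + B_INT * age * glucose
    return 1.0 / (1.0 + math.exp(-logit))


def age_effect(glucose, young=20, old=80):
    """Change in predicted risk between a young and an old patient at fixed glucose."""
    return predicted_risk(old, glucose) - predicted_risk(young, glucose)


ages = list(range(20, 81, 10))
for glucose in (90, 200):
    # Hold glucose fixed and sweep age, similar in spirit to a partial plot
    curve = [predicted_risk(a, glucose) for a in ages]
    print(glucose, [round(p, 3) for p in curve], round(age_effect(glucose), 3))
```

Risk rises with age at both glucose levels, but the rise from age 20 to 80 is much steeper at glucose 200 than at glucose 90.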

## Only `gender` and `glucose`

Here we do the same as above, but look at `gender` instead of `age`.

In [None]:
ql.reset(random_seed)

In [None]:
query = "_['gender','glucose', 2]"

In [None]:
models = ql.auto_run(train,
                     output_name="diabetes",
                     kind="classification",
                     stypes={"gender": "c"},                 
                     criterion="bic",
                     sample_weights=sw,
                     query_string=query
                    )

In [None]:
best = models[0]

In [None]:
best.plot(train, test)

In [None]:
best.plot_partial2d(train)

Here we can see that `gender` has very little effect on whether someone develops diabetes.

## Conclusion

In conclusion, higher `age` increases one's chance of developing diabetes, particularly at higher `glucose` levels, whereas `gender` shows no noticeable effect.