# How to predict missing data

LinkML implements the "CRUDSI" design pattern. In addition to **Create**, **Read**, **Update**, and **Delete**, LinkML also supports **Search** and **Inference**.

The framework is designed to support different kinds of inference, including rule-based inference and LLMs. This notebook shows simple ML-based inference using scikit-learn decision trees.

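We will use a 100-row version of the Iris dataset containing two species (*setosa* and *versicolor*). The `describe` command summarizes each attribute: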

```bash
linkml-store -i ../../tests/input/iris.jsonl describe
```

```
              count unique     top freq   mean       std  min  25%   50%    75%  max
petal_length  100.0    NaN     NaN  NaN  2.861  1.449549  1.0  1.5  2.45  4.325  5.1
petal_width   100.0    NaN     NaN  NaN  0.786  0.565153  0.1  0.2   0.8    1.3  1.8
sepal_length  100.0    NaN     NaN  NaN  5.471  0.641698  4.3  5.0   5.4    5.9  7.0
sepal_width   100.0    NaN     NaN  NaN  3.099  0.478739  2.0  2.8  3.05    3.4  4.4
species         100      2  setosa   50    NaN       NaN  NaN  NaN   NaN    NaN  NaN
```

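The following is a rough, stdlib-only Python equivalent of what `describe` reports. It is a sketch, not linkml-store's implementation. Because the table above resembles the output of pandas' `DataFrame.describe()`, the sketch assumes two things: the sample standard deviation (n - 1) and linearly interpolated quartiles. It also assumes the input is JSON Lines with one flower per line. The `describe` function name is our own.

```python
import json
import statistics
from collections import Counter
from typing import Iterable


def describe(lines: Iterable[str]) -> dict:
    """Summarise JSON Lines records roughly the way pandas' describe() does."""
    records = [json.loads(line) for line in lines if line.strip()]
    summary = {}
    for field in sorted({key for record in records for key in record}):
        values = [r[field] for r in records if r.get(field) is not None]
        if all(isinstance(v, (int, float)) for v in values):
            ordered = sorted(values)  # needs at least two values
            # "inclusive" interpolation matches pandas' default (linear) quartiles
            q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
            summary[field] = {
                "count": len(ordered),
                "mean": statistics.mean(ordered),
                "std": statistics.stdev(ordered),  # sample std (n - 1), like pandas
                "min": ordered[0],
                "25%": q1,
                "50%": q2,
                "75%": q3,
                "max": ordered[-1],
            }
        else:
            # Non-numeric fields get count, number of distinct values, and the mode
            counts = Counter(values)
            top, freq = counts.most_common(1)[0]
            summary[field] = {
                "count": len(values),
                "unique": len(counts),
                "top": top,
                "freq": freq,
            }
    return summary
```

If these assumptions hold, `describe(open("../../tests/input/iris.jsonl"))["petal_width"]` should reproduce the `petal_width` row above.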

## Training and Inference

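We can perform training and inference in a single step. Here `-t sklearn` selects the scikit-learn inference engine, `-T species` sets the target attribute to predict, and `-q` gives the object to make a prediction for: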

```bash
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -T species -q "{petal_length: 2.5, petal_width: 0.5, sepal_length: 5.0, sepal_width: 3.5}"
```
```yaml
predicted_object:
  species: setosa
confidence: 1.0
```

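The single-step command trains a scikit-learn decision tree on the collection and then applies it to the `-q` object. To illustrate what such a model learns, here is a from-scratch depth-1 decision tree (a "stump") in plain Python. It uses Gini impurity and midpoint thresholds, as scikit-learn's `DecisionTreeClassifier` does by default. It is only an illustration: linkml-store delegates to scikit-learn, and real trees can be deeper. The names `gini`, `fit_stump`, and `predict` are hypothetical. In this sketch, confidence is the share of training samples in the chosen leaf that carry the predicted label, which is how scikit-learn's `predict_proba` works for trees. Whether linkml-store's `confidence` field is computed the same way is an assumption.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(records: List[Dict[str, Any]], features: List[str], target: str) -> Dict[str, Any]:
    """Choose the single 'feature <= threshold' split with the lowest weighted Gini impurity."""
    best = None
    for feature in features:
        values = sorted({r[feature] for r in records})
        # Candidate thresholds: midpoints between consecutive distinct values
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in records if r[feature] <= threshold]
            right = [r[target] for r in records if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
            if best is None or score < best[0]:
                best = (score, feature, threshold, Counter(left), Counter(right))
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    _, feature, threshold, left_counts, right_counts = best
    return {"feature": feature, "threshold": threshold,
            "left": left_counts, "right": right_counts}


def predict(stump: Dict[str, Any], query: Dict[str, float]) -> Tuple[str, float]:
    """Return (predicted label, share of training samples in that leaf with the label)."""
    side = "left" if query[stump["feature"]] <= stump["threshold"] else "right"
    label, count = stump[side].most_common(1)[0]
    return label, count / sum(stump[side].values())
```

A stump is the simplest possible tree. For this two-species data, a single petal threshold may already separate the classes, which would explain a confidence of 1.0.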

## Saving the Model

Performing training and inference in a single step is convenient when training is fast, but more typically we would want to save the trained model for later use. Passing a file path to `-E` exports the model:

```bash
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -T species -E "tmp/iris-model.joblib"
```

We can then load the saved model with `-L` and use it for inference without retraining:

```bash
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -L "tmp/iris-model.joblib" -q "{petal_length: 2.5, petal_width: 0.5, sepal_length: 5.0, sepal_width: 3.5}"
```
```yaml
predicted_object:
  species: setosa
confidence: 1.0
```

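The `.joblib` extension suggests that the model is persisted with joblib. This is one of the approaches scikit-learn's documentation describes for saving models, and it builds on Python's pickle. The following sketch shows the same save-once, load-later pattern behind `-E` and `-L`, using only the standard library's `pickle`. The dict-based model and the `save_model`, `load_model`, and `predict` helpers are hypothetical stand-ins, not linkml-store code. Only load pickle or joblib files from sources you trust, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model: dict, path: Path) -> None:
    """Persist a trained model once (the role -E plays above)."""
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path) -> dict:
    """Reload a model for later predictions (the role -L plays above)."""
    with path.open("rb") as f:
        return pickle.load(f)  # only unpickle trusted files


def predict(model: dict, obj: dict) -> str:
    # Hypothetical one-split model: "feature <= threshold" selects the left label
    if obj[model["feature"]] <= model["threshold"]:
        return model["left"]
    return model["right"]


# Round trip through a temporary directory, then predict without retraining
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "iris-model.pkl"
    save_model({"feature": "petal_width", "threshold": 0.8,
                "left": "setosa", "right": "versicolor"}, path)
    restored = load_model(path)

print(predict(restored, {"petal_width": 0.5}))  # prints "setosa"
```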

## Exporting models to explainable visualizations

We can also export the model to a visual representation to make it more explainable:

```bash
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -L "tmp/iris-model.joblib" -E "tmp/iris-model.png"
```

![Decision tree learned for the Iris model](tmp/iris-model.png)

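An image works well for people, but the same tree can also be printed as indented text. scikit-learn provides `sklearn.tree.export_text` for this purpose. The following plain-Python sketch roughly mimics that layout. It assumes a hypothetical nested-dict tree in which internal nodes hold `feature` and `threshold` and leaves hold `label`. It uses the split that appears in the rules exported later in this notebook. It is not how linkml-store draws the PNG.

```python
from typing import List


def render(node: dict, depth: int = 0) -> List[str]:
    """Render a nested-dict decision tree as indented if/else lines."""
    indent = "|   " * depth
    if "label" in node:  # leaf: report the predicted class
        return [f"{indent}|--- class: {node['label']}"]
    lines = [f"{indent}|--- {node['feature']} <= {node['threshold']:.2f}"]
    lines += render(node["left"], depth + 1)   # branch where the test holds
    lines.append(f"{indent}|--- {node['feature']} >  {node['threshold']:.2f}")
    lines += render(node["right"], depth + 1)  # branch where it fails
    return lines


tree = {
    "feature": "petal_width",
    "threshold": 0.8,
    "left": {"label": "setosa"},
    "right": {"label": "versicolor"},
}
print("\n".join(render(tree)))
```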
## Generating a rule-based model

Although ML is traditionally used for *statistical inference*, we may sometimes want to use ML (e.g., decision trees) to generate simple, purely deterministic rule-based models.

linkml-store has a different kind of inference engine that works using LinkML schema constructs, specifically:

- `rules` at the class and slot level
- `expressions` that combine slot assignments logically and arithmetically

Some ML models can be exported to this format. Here we export the saved decision tree to a rule-based YAML file:

```bash
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -L tmp/iris-model.joblib -E tmp/iris-model.rulebased.yaml
```

The exported file looks like this:

```bash
cat tmp/iris-model.rulebased.yaml
```

```yaml
class_rules: null
config:
  feature_attributes:
  - petal_length
  - petal_width
  - sepal_length
  - sepal_width
  target_attributes:
  - species
slot_expressions:
  species: ("setosa" if ({petal_width} <= 0.8000) else "versicolor")
slot_rules: null
```

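We haven't checked how linkml-store performs this conversion internally. The following minimal sketch assumes the same hypothetical nested-dict tree representation: internal nodes hold `feature` and `threshold`, and leaves hold `label`. It compiles that tree into a nested conditional expression in the same format as the `slot_expressions` entry above. Each split becomes `(<left> if ({feature} <= threshold) else <right>)`, so deeper trees produce nested conditionals.

```python
import json


def to_expression(node: dict) -> str:
    """Compile a nested-dict decision tree into a nested conditional expression string."""
    if "label" in node:
        return json.dumps(node["label"])  # leaf -> double-quoted string literal
    condition = f"{{{node['feature']}}} <= {node['threshold']:.4f}"
    left = to_expression(node["left"])    # taken when the condition holds
    right = to_expression(node["right"])  # taken otherwise
    return f"({left} if ({condition}) else {right})"


tree = {
    "feature": "petal_width",
    "threshold": 0.8,
    "left": {"label": "setosa"},
    "right": {"label": "versicolor"},
}
print(to_expression(tree))
```

For this one-split tree, the printed string is identical to the `species` expression in the exported file.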

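The exported rules can then be loaded with `-L` into the `rulebased` inference engine and applied to the same query object:

```bash
linkml-store --stacktrace -i ../../tests/input/iris.jsonl infer -t rulebased -L tmp/iris-model.rulebased.yaml -q "{petal_length: 2.5, petal_width: 0.5, sepal_length: 5.0, sepal_width: 3.5}"
```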

```
EVAL {'petal_length': 2.5, 'petal_width': 0.5, 'sepal_length': 5.0, 'sepal_width': 3.5}
predicted_object:
  petal_length: 2.5
  petal_width: 0.5
  sepal_length: 5.0
  sepal_width: 3.5
  species: setosa
```

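The `EVAL` line suggests that the engine substitutes the object's slot values into the expression and then evaluates it. We haven't checked how linkml-store does this. LinkML's expression language uses the same `{slot}` placeholder syntax. As an independent sketch, the hypothetical `evaluate` function below uses Python's `ast` module to evaluate such expressions. It allows only literals, slot names, comparisons, basic arithmetic, `and`/`or`/`not`, and conditional expressions, so a rule file cannot run arbitrary code.

```python
import ast
import operator
import re

# Whitelisted operators; any other syntax is rejected
_COMPARE = {
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}
_ARITHMETIC = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}


def evaluate(expression: str, obj: dict):
    """Evaluate a LinkML-style slot expression (with {slot} placeholders) for obj."""
    # "{slot}" placeholders become bare names (naive: braces inside string
    # literals would also be rewritten); the result is parsed, never eval'd
    tree = ast.parse(re.sub(r"\{(\w+)\}", r"\1", expression), mode="eval")

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):  # numbers, strings, True/False/None
            return node.value
        if isinstance(node, ast.Name):  # a slot reference
            return obj[node.id]
        if isinstance(node, ast.IfExp):  # a if test else b
            return ev(node.body) if ev(node.test) else ev(node.orelse)
        if isinstance(node, ast.BoolOp):  # and / or, short-circuiting, returns a bool
            values = (ev(v) for v in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _ARITHMETIC:
            return _ARITHMETIC[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Compare):  # supports chains like 1 < {x} <= 2
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError(f"unsupported comparison: {type(op).__name__}")
                right = ev(comparator)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return ev(tree)
```

Applied to the `species` expression from the exported file and the query object above, `evaluate` returns `"setosa"`, matching the rule-based engine's prediction.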