# Quick Start

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


{: .attention }
> Once PyXAI has been installed, you can use the following commands:
>
> ```python3 <file.py>```: Execute a Python file containing code that uses PyXAI\\
> ```python3 -m pyxai -gui```: Open PyXAI's Graphical User Interface\\
> ```python3 -m pyxai -explanations```: Copy the explanation backups of the GUI into your current directory\\
> ```python3 -m pyxai -examples```: Copy the examples into your current directory

Let us give a quick illustration of PyXAI, showing how to compute explanations for an ML model.
<img src="attachment:irislatex.png" alt="Iris" width="800" />

The first thing to do is to import the components of PyXAI. So that a project can import only the methods it needs, PyXAI is split into three distinct modules: ```Learning```, ```Explainer```, and ```Tools```.

In [2]:
!pip install pyxai -q



In [5]:
from pyxai import Learning, Explainer, Tools




If this import fails, it is most likely because the PyXAI package is not installed on your system. You can install it with a command such as ```python3 -m pip install pyxai```. See the [Installation](/documentation/installation) page for details.

In most situations, using the PyXAI library involves two successive steps: first, generating an ML model from a dataset (with the ```Learning``` module); second, given the learned model, computing explanations for some instances (with the ```Explainer``` module).

## Machine Learning

For this example, we create a decision tree classifier for the Iris dataset using [Scikit-learn](https://scikit-learn.org/stable/).

In [45]:
file_dataset_path = "/content/drive/MyDrive/PROJETO - THIAGO ALVES /pyxai/iris.csv"

In [46]:
learner = Learning.Scikitlearn(file_dataset_path, learner_type=Learning.CLASSIFICATION)  # adapt file_dataset_path to the location of your dataset

data:
     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width         Species
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

[150 rows x 5 columns]
--------------   Information   ---------------
Dataset name: /content/drive/MyDriv

This dataset can be downloaded from the [UCI Machine Learning Repository -- Iris Data Set](http://archive.ics.uci.edu/ml/datasets/Iris) or [here](/assets/notebooks/dataset/iris.csv). In this notebook, it is read from Google Drive (see ```file_dataset_path``` above). The parameter ```learner_type=Learning.CLASSIFICATION``` specifies a classification task. The Iris dataset contains 150 samples, 50 for each of three Iris species (Iris setosa, Iris virginica and Iris versicolor), described by four features: the length and width of the sepals and of the petals. The goal of the classifier is to predict, for each instance, the right class among the three: setosa, virginica and versicolor.

To create models, PyXAI implements methods that directly run an ML experimental protocol based on the train-test split technique. Several validation methods (```Learning.HOLD_OUT```, ```Learning.K_FOLDS```, ```Learning.LEAVE_ONE_GROUP_OUT```) and model types (```Learning.DT``` for decision trees, ```Learning.RF``` for random forests, ```Learning.BT``` for boosted trees) are available.
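The hold-out protocol keeps a fraction of the rows aside for testing. The evaluation output below reports 105 training and 45 test instances, i.e., a 70/30 split of the 150 rows. As a minimal sketch (the helper `hold_out_split` and its 30% default are our assumptions based on those counts, not PyXAI's implementation), such a split can be written with the standard library:

```python
import random


def hold_out_split(n_rows, test_ratio=0.3, seed=0):
    """Hypothetical helper: shuffle row indices, then split them into
    (train_indices, test_indices) according to test_ratio."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_test = round(n_rows * test_ratio)
    return indices[n_test:], indices[:n_test]


train, test = hold_out_split(150)
print(len(train), len(test))  # 105 45
```

Fixing the seed plays the same role as ```random_state``` in the evaluation output: rerunning the protocol yields the same split.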

In this example, we learn a decision tree (parameter ```output=Learning.DT```) and evaluate it with the hold-out method (```method=Learning.HOLD_OUT```).

In [47]:
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)  # output selects the model type, e.g. Learning.DT (decision tree) or Learning.RF (random forest)

---------------   Evaluation   ---------------
method: HoldOut
output: DT
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   micro_averaging_accuracy: 98.51851851851852
   micro_averaging_precision: 97.77777777777777
   micro_averaging_recall: 97.77777777777777
   macro_averaging_accuracy: 98.51851851851853
   macro_averaging_precision: 97.22222222222221
   macro_averaging_recall: 98.14814814814815
   true_positives: {'Iris-setosa': 16, 'Iris-versicolor': 17, 'Iris-virginica': 11}
   true_negatives: {'Iris-setosa': 29, 'Iris-versicolor': 27, 'Iris-virginica': 33}
   false_positives: {'Iris-setosa': 0, 'Iris-versicolor': 0, 'Iris-virginica': 1}
   false_negatives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 0}
   accuracy: 97.77777777777777
   sklearn_confusion_matrix: [[16, 0, 0], [0, 17, 1], [0, 0, 11]]
nTraining instances: 105
nTest instances: 45
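The aggregated metrics above can be recomputed from ```sklearn_confusion_matrix```, whose rows are true classes and columns are predicted classes (scikit-learn's convention). The sketch below is plain Python with hypothetical helper names. With the matrix reported above it reproduces the printed values (accuracy 97.78, macro precision 97.22, macro recall 98.15, micro and macro accuracy 98.52), which suggests these definitions, although we have not checked them against PyXAI's source.

```python
def ratio(num, den):
    """Safe division for per-class scores (0.0 when the denominator is 0)."""
    return num / den if den else 0.0


def per_class_counts(cm):
    """Return one (TP, FP, FN, TN) tuple per class.

    Rows of `cm` are true classes and columns are predicted classes.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        tn = total - tp - fp - fn
        counts.append((tp, fp, fn, tn))
    return counts


def summarize(cm):
    """Micro- and macro-averaged scores, in percent."""
    counts = per_class_counts(cm)
    k = len(counts)
    tp, fp, fn, tn = (sum(col) for col in zip(*counts))
    return {
        "accuracy": 100 * ratio(tp, sum(sum(row) for row in cm)),
        "micro_precision": 100 * ratio(tp, tp + fp),
        "micro_recall": 100 * ratio(tp, tp + fn),
        "micro_accuracy": 100 * ratio(tp + tn, tp + tn + fp + fn),
        "macro_precision": 100 * sum(ratio(t, t + f) for t, f, _, _ in counts) / k,
        "macro_recall": 100 * sum(ratio(t, t + m) for t, _, m, _ in counts) / k,
        "macro_accuracy": 100 * sum(ratio(t + n, t + f + m + n) for t, f, m, n in counts) / k,
    }


cm = [[16, 0, 0], [0, 17, 1], [0, 0, 11]]  # sklearn_confusion_matrix from the output above
for name, value in summarize(cm).items():
    print(f"{name}: {value:.4f}")
```

For instance, the single versicolor instance predicted as virginica counts as one false negative for versicolor and one false positive for virginica, which is why only those two classes have precision or recall below 100%.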


Once the model is created, we select an instance for which explanations will be derived. Here, a well-classified instance is chosen: the model predicts the first class ```0``` (i.e., the Iris setosa class), as requested by the ```correct=True``` and ```predictions=[0]``` parameters.



In [79]:
instance, prediction = learner.get_instances(model, n=2, correct=True, predictions=[0])

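# n = number of instances to retrieve with the requested prediction (predictions=[0]).
# With n=2, get_instances returns a list of (instance, prediction) pairs, so the line
# above assigns the first pair to `instance` and the second pair to `prediction`.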

print(instance)
print('\n', prediction)
instance[0]

---------------   Instances   ----------------
number of instances selected: 2
----------------------------------------------
(array([5.1, 3.5, 1.4, 0.2]), 0)

 (array([4.9, 3. , 1.4, 0.2]), 0)


array([5.1, 3.5, 1.4, 0.2])
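With ```n=2```, ```get_instances``` returns a list of ```(instance, prediction)``` pairs, as the printed tuples above show (the loop at the end of this page relies on the same structure). Unpacking that list into two names therefore assigns one pair to each name instead of separating instances from predictions. The stand-in below uses plain Python lists instead of NumPy arrays so that it runs without PyXAI:

```python
# Stand-ins for the (instance, prediction) pairs printed above.
pairs = [([5.1, 3.5, 1.4, 0.2], 0), ([4.9, 3.0, 1.4, 0.2], 0)]

# Unpacking a two-element list assigns one *pair* to each name,
# which is what happened in the cell above.
instance, prediction = pairs
print(instance)    # ([5.1, 3.5, 1.4, 0.2], 0)
print(prediction)  # ([4.9, 3.0, 1.4, 0.2], 0)

# Iterating over the pairs keeps each instance with its own prediction.
for instance, prediction in pairs:
    print("instance:", instance, "prediction:", prediction)
```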

In [80]:
# default usage: with n=1, a single (instance, prediction) pair is returned
instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])

# n = number of instances to retrieve with the requested prediction (predictions=[0])

print(instance)
print('\n', prediction)

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------
[5.1 3.5 1.4 0.2]

 0
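Under our reading of the parameters, ```get_instances``` acts as a filter: ```correct=True``` keeps instances the model classifies correctly (```correct=False``` the misclassified ones), ```predictions``` restricts the predicted class, and ```n``` caps the number of results. The function ```select_instances``` below is a hypothetical sketch of that filtering logic, not PyXAI's implementation. Unlike PyXAI, which returns a single pair when ```n=1``` (as the output above shows), it always returns a list; the toy rows and labels are made up for illustration.

```python
def select_instances(X, y_true, y_pred, n=None, correct=None, predictions=None):
    """Hypothetical sketch: return up to n (instance, prediction) pairs that
    match the requested criteria (None means "no constraint")."""
    selected = []
    for x, truth, pred in zip(X, y_true, y_pred):
        if correct is not None and (truth == pred) != correct:
            continue  # correctness does not match the request
        if predictions is not None and pred not in predictions:
            continue  # predicted class was not requested
        selected.append((x, pred))
        if n is not None and len(selected) == n:
            break
    return selected


# Toy data (made up): three rows, the last one misclassified.
X = [[5.1, 3.5, 1.4, 0.2], [6.0, 2.2, 4.0, 1.0], [6.0, 2.7, 5.1, 1.6]]
y_true = [0, 1, 1]
y_pred = [0, 1, 2]

print(select_instances(X, y_true, y_pred, n=1, correct=True, predictions=[0]))  # [([5.1, 3.5, 1.4, 0.2], 0)]
print(select_instances(X, y_true, y_pred, n=1, correct=False))  # [([6.0, 2.7, 5.1, 1.6], 2)]
```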


Please consult the [Learning](/documentation/learning/generating) page for more details about this ML part.

## Explainer

The ```Explainer``` module contains different methods for generating explanations. To use them, the model and the target instance are given as parameters to the ```initialize``` function of this module.


In [81]:
explainer = Explainer.initialize(model, instance)

The ```initialize``` function converts the instance into binary variables (called its binary representation) that encode the associated model. More precisely, each binary variable represents a condition (feature $op$ value?) appearing in the model, where $op$ is a standard comparison operator that depends on the library used to learn the model (e.g., [Scikit-learn](https://scikit-learn.org/stable/) or [XGBoost](https://xgboost.readthedocs.io/en/stable/)). For the Scikit-learn tree used here, the conditions have the form feature $>$ threshold, whose negation is feature $\le$ threshold, as the outputs below show. The sign of a binary variable indicates whether the instance satisfies the corresponding condition (positive) or not (negative). Below, we display the instance, its binary representation, and the conditions related to this representation, obtained with the ```to_features``` method explained further down.

In [54]:
print("instance:", instance)
print("binary representation:", explainer.binary_representation)
print("conditions related to the binary representation:", explainer.to_features(explainer.binary_representation,eliminate_redundant_features=False))
print("conditions related to the binary representation:", explainer.to_features(explainer.binary_representation,eliminate_redundant_features=True))


instance: [5.1 3.5 1.4 0.2]
binary representation: (-1, -2, -3, 4, -5)
conditions related to the binary representation: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')
conditions related to the binary representation: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75')


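The binary representation of this instance contains more variables (five) than the instance has features (four) because the model's decision tree involves five distinct conditions, one binary variable per condition. In fact, the feature Petal.Width appears in three of these conditions, while Sepal.Length does not appear in any. This is why eliminating redundant features (```eliminate_redundant_features=True```) keeps only the tightest condition on Petal.Width. See the [concepts](/documentation/explainer/concepts/) page for more information about binary representations.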
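The binary representation and the redundancy elimination can be mimicked in a few lines. In this sketch, the condition table is an assumption: variables 1 (Petal.Width > 0.75) and 4 (Sepal.Width > 3.1) follow from the outputs on this page, while the numbering of variables 2, 3 and 5 is hypothetical (for this instance all three conditions are false, so any numbering yields the same signs). Thresholds are rounded; scikit-learn computes thresholds from feature values cast to 32-bit floats, hence outputs such as ```3.100000023841858```. The functions are illustrative stand-ins, not PyXAI's API.

```python
# Hypothetical condition table: variable -> (feature, threshold), where a
# positive literal v means "feature > threshold" and -v means "feature <= threshold".
CONDITIONS = {
    1: ("Petal.Width", 0.75),
    2: ("Petal.Length", 4.95),
    3: ("Petal.Width", 1.75),
    4: ("Sepal.Width", 3.1),
    5: ("Petal.Width", 1.65),
}
FEATURES = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]


def binary_representation(instance):
    """Sign each variable by whether the instance satisfies its condition."""
    literals = []
    for var, (name, threshold) in sorted(CONDITIONS.items()):
        value = instance[FEATURES.index(name)]
        literals.append(var if value > threshold else -var)
    return tuple(literals)


def to_features(literals, eliminate_redundant_features=True):
    """Turn literals into readable conditions, optionally keeping only the
    tightest threshold per (feature, operator) pair."""
    bounds = {}  # (feature, operator) -> thresholds seen
    for lit in literals:
        name, threshold = CONDITIONS[abs(lit)]
        operator = ">" if lit > 0 else "<="
        bounds.setdefault((name, operator), []).append(threshold)
    conditions = []
    for (name, operator), thresholds in bounds.items():
        if eliminate_redundant_features:
            # x > a and x > b reduce to x > max(a, b); x <= a and x <= b to x <= min(a, b)
            thresholds = [max(thresholds) if operator == ">" else min(thresholds)]
        conditions.extend(f"{name} {operator} {t}" for t in thresholds)
    return tuple(sorted(conditions))


literals = binary_representation([5.1, 3.5, 1.4, 0.2])
print(literals)  # (-1, -2, -3, 4, -5)
print(to_features(literals, eliminate_redundant_features=False))
print(to_features(literals))  # ('Petal.Length <= 4.95', 'Petal.Width <= 0.75', 'Sepal.Width > 3.1')
```

The reduced result contains the same three conditions as PyXAI's output above, up to ordering and threshold rounding.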

### Abductive explanations

PyXAI offers several types of explanations. In their binary form, i.e., as sets of conditions, they are called reasons. In our example, we compute one of the most popular types of explanations: a sufficient reason. A sufficient reason for an instance X is an abductive explanation (any other instance X' satisfying the conditions of this reason is classified by the model in the same way as X) such that no proper subset of it is also an abductive explanation (i.e., the explanation is minimal with respect to set inclusion).

In [60]:
sufficient_reason = explainer.sufficient_reason(n=1)
print("sufficient_reason:", sufficient_reason)

sufficient_reason: (-1,)


We can obtain the conditions involved in the reason using the ```to_features``` method:

In [58]:
print("to_features:", explainer.to_features(sufficient_reason))

to_features: ('Petal.Width <= 0.75',)


In [63]:
# Example of the `to_features` method with the `details` parameter
detailed_features = explainer.to_features(sufficient_reason, details=True)

# Display the detailed output (Jupyter shows the value of a cell's last expression)
"Detailed features: ", detailed_features

('Detailed features: ',
 OrderedDict([('Petal.Width',
               [{'id': 4,
                 'name': 'Petal.Width',
                 'operator': <OperatorCondition.GT: 52>,
                 'sign': True,
                 'operator_sign_considered': <OperatorCondition.LE: 51>,
                 'threshold': 0.75,
                 'weight': None,
                 'theory': None,
                 'string': 'Petal.Width <= 0.75'}])]))

The ```to_features``` method eliminates redundant features by default and is also able to return more information about the features using the ```details``` parameter. This method is described in the [concepts](/documentation/explainer/concepts/) page.  

We can check whether the derived explanation is indeed a sufficient reason.

In [64]:
print("is sufficient: ", explainer.is_sufficient_reason(sufficient_reason))

is sufficient:  True
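To make the definition of a sufficient reason concrete, here is a minimal sketch on a *hypothetical* decision tree over five binary variables (1: Petal.Width > 0.75, 2: Petal.Length > 4.95, 3: Petal.Width > 1.75, 4: Sepal.Width > 3.1, 5: Petal.Width > 1.65). The numbering of variables 2, 3 and 5 and the tree shape are our assumptions, not the model learned above. Checking sufficiency amounts to verifying that every leaf consistent with the reason predicts the same class; a subset-minimal reason can then be obtained by greedily deleting literals, since any superset of a sufficient reason is itself sufficient. For simplicity, the sketch treats the variables as independent (it ignores that Petal.Width > 1.75 implies Petal.Width > 1.65), and we make no claim that PyXAI proceeds this way.

```python
# Hypothetical decision tree over binary variables 1..5:
#   1: Petal.Width > 0.75   2: Petal.Length > 4.95   3: Petal.Width > 1.75
#   4: Sepal.Width > 3.1    5: Petal.Width > 1.65
# A leaf is a class index (0 = setosa, 1 = versicolor, 2 = virginica);
# an internal node is (variable, subtree_if_false, subtree_if_true).
TREE = (1,
        0,                          # Petal.Width <= 0.75 -> setosa
        (3,
         (2, (5, 1, 2), 2),         # Petal.Width <= 1.75
         (4, 2, 1)))                # Petal.Width > 1.75


def reachable_classes(tree, literals):
    """Classes of all leaves consistent with the fixed literals."""
    if not isinstance(tree, tuple):
        return {tree}
    var, if_false, if_true = tree
    if var in literals:
        return reachable_classes(if_true, literals)
    if -var in literals:
        return reachable_classes(if_false, literals)
    # The variable is not fixed: both branches remain possible.
    return reachable_classes(if_false, literals) | reachable_classes(if_true, literals)


def is_sufficient(tree, reason, target):
    """True if every leaf consistent with `reason` predicts `target`."""
    return reachable_classes(tree, set(reason)) == {target}


def sufficient_reason(tree, binary_representation):
    """Greedy deletion: drop each literal whose removal keeps the reason sufficient.

    Supersets of a sufficient reason are sufficient, so the result is
    minimal with respect to set inclusion.
    """
    (target,) = reachable_classes(tree, set(binary_representation))
    reason = list(binary_representation)
    for literal in list(reason):
        candidate = [lit for lit in reason if lit != literal]
        if is_sufficient(tree, candidate, target):
            reason = candidate
    return tuple(reason)


reason = sufficient_reason(TREE, (-1, -2, -3, 4, -5))
print("sufficient reason:", reason)                      # (-1,)
print("is sufficient:", is_sufficient(TREE, reason, 0))  # True
```

On the binary representation ```(-1, -2, -3, 4, -5)```, this hypothetical tree yields the reason ```(-1,)```, which mirrors the result obtained with PyXAI above.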


{: .attention }
> It is important to note that computing reasons and checking them are done independently.

To conclude, the sufficient reason (```('Petal.Width <= 0.75',)```) explains why the model classifies the instance ```[5.1 3.5 1.4 0.2]``` as Iris-setosa, which is correct: the fourth feature (petal width, in cm) equals 0.2 cm, which is not greater than 0.75 cm (see the figure below).
<img src="attachment:irislatex.png" alt="Iris" width="800" />

### Contrastive explanations

Now, let us consider another instance, a wrongly classified one, obtained with the parameter ```correct=False``` of the ```get_instances``` function. We pass this instance to the explainer with the ```set_instance``` method.

In [65]:
instance, prediction = learner.get_instances(model, n=1, correct=False)
explainer.set_instance(instance)

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------


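A contrastive explanation explains why this instance is **not** classified differently: it is a minimal subset of the conditions satisfied by the instance such that, if these conditions no longer held, the model could output a different prediction.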

In [66]:
contrastive_reason = explainer.contrastive_reason()
print("contrastive reason", contrastive_reason)
print("to_features:", explainer.to_features(contrastive_reason, contrastive=True))

contrastive reason (1,)
to_features: ('Petal.Width > 0.75',)
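A contrastive reason can be sketched as the smallest set of the instance's literals that, once freed, allows the tree to reach a leaf with another class. The tree below (variables 1: Petal.Width > 0.75, 2: Petal.Length > 4.95, 3: Petal.Width > 1.75, 4: Sepal.Width > 3.1, 5: Petal.Width > 1.65), the variable numbering, and the binary representation ```(1, -2, -3, -4, 5)``` are hypothetical, since we do not know the representation of the misclassified instance above. Searching subsets by increasing size returns a contrastive reason of minimum cardinality; we make no claim that PyXAI proceeds this way.

```python
from itertools import combinations

# Hypothetical tree: a leaf is a class index (0 = setosa, 1 = versicolor,
# 2 = virginica); a node is (variable, subtree_if_false, subtree_if_true).
TREE = (1, 0, (3, (2, (5, 1, 2), 2), (4, 2, 1)))


def reachable_classes(tree, literals):
    """Classes of all leaves consistent with the fixed literals."""
    if not isinstance(tree, tuple):
        return {tree}
    var, if_false, if_true = tree
    if var in literals:
        return reachable_classes(if_true, literals)
    if -var in literals:
        return reachable_classes(if_false, literals)
    return reachable_classes(if_false, literals) | reachable_classes(if_true, literals)


def contrastive_reason(tree, binary_representation):
    """Smallest set of literals that, once freed, lets the tree reach a leaf
    whose class differs from the current prediction (None if no such set)."""
    fixed = set(binary_representation)
    (prediction,) = reachable_classes(tree, fixed)
    # Trying subsets by increasing size yields a minimum-cardinality answer.
    for size in range(1, len(binary_representation) + 1):
        for subset in combinations(binary_representation, size):
            if reachable_classes(tree, fixed - set(subset)) - {prediction}:
                return subset
    return None


print(contrastive_reason(TREE, (1, -2, -3, -4, 5)))  # (1,)
```

Here, freeing the condition Petal.Width > 0.75 lets the hypothetical tree reach the setosa leaf, so ```(1,)``` is a contrastive reason, analogous in shape to the one PyXAI returned above.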


More information about explanations can be found in the [Explainer Principles](/documentation/explainer/) page, the [Explaining Classification](/documentation/classification/) page and the [Explaining Regression](/documentation/regression/) page.

In [69]:
instances_with_prediction = learner.get_instances(model, n=10, indexes=Learning.TEST)
for instance, prediction in instances_with_prediction:
    print("instance:", instance)
    print("prediction", prediction)


---------------   Instances   ----------------
number of instances selected: 10
----------------------------------------------
instance: [5.8 2.8 5.1 2.4]
prediction 2
instance: [6.  2.2 4.  1. ]
prediction 1
instance: [5.5 4.2 1.4 0.2]
prediction 0
instance: [7.3 2.9 6.3 1.8]
prediction 2
instance: [5.  3.4 1.5 0.2]
prediction 0
instance: [6.3 3.3 6.  2.5]
prediction 2
instance: [5.  3.5 1.3 0.3]
prediction 0
instance: [6.7 3.1 4.7 1.5]
prediction 1
instance: [6.8 2.8 4.8 1.4]
prediction 1
instance: [6.1 2.8 4.  1.3]
prediction 1
