# Problem Description.

The task of this project is to develop an ASP encoding for
learning binary decision trees from a set of examples.

Every example may be positive or negative, 
and is associated with a set of attributes that hold in the example.

In our running example, 
the following table shows 6 examples over 22 attributes:
* example 1 is positive and has attributes 3, 4, 8, 9, 13, 14 and 22,
* example 2 is negative and has only attribute 13, and so on...

Example | Class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | ... | 20 | 21 | 22 
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- 
1 | pos | | | X| X| | | | X| X| | | |  X|  X| | | | X|    
2 | neg | | |  |  | | | |  |  | | | |  X|   | | | |  |    
3 | pos | X| | X| | X| | | X| | X| | | X| | | | | |    
4 | neg | | | | | | | | | X| | | | | X| | | | |    
5 | pos | | | | | | | | | | | X| | | | | X| X| X|    
6 | neg | | | | X| | | | | | | | | | | | | | |    

The goal of this learning problem is to find a decision tree that 
correctly classifies (as positive or negative) all the examples.

In this project, the input also provides a parameter `n` that
fixes the number of nodes of the learned decision tree. 

A decision tree is a binary directed tree.
Each leaf is labeled as either positive or negative.
Each of the remaining nodes is labeled by a single attribute.
One of its outgoing edges is associated with the case where the attribute holds, and 
the other outgoing edge is associated with the case where the attribute does not hold.

We make two restrictions on the form of decision trees:
* an attribute cannot occur more than once on any path from the root to a leaf, and
* two leaves of the same parent cannot have the same label.

A decision tree maps each example to the (unique) path from the root to a leaf
such that the edges agree with the attributes that hold in the example.
Then, the decision tree classifies each example according to the label of that leaf.
Finally, the decision tree solves the learning problem if it correctly classifies all given examples.

The next image shows a decision tree that solves our problem.

<img width="175" height="125" src="img/decision-tree-19.png">

Green edges represent the case where an attribute holds, 
while red edges represent the other case.

This decision tree first decides on attribute 1.

If it holds, then the example belongs to the positive class (this classifies example 3).

Otherwise, the tree decides on attribute 22.

If it holds, then the example belongs to the positive class (this classifies examples 1 and 5), 
and otherwise the example belongs to the negative class (this classifies examples 2, 4 and 6).

If we replaced attribute 22 by attribute 14, for instance, 
then the tree would not solve our problem because it would classify 
example 4 as positive and example 5 as negative.
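
The walkthrough above can be replayed with a small Python sketch. The nested-tuple tree format and the helper names (`make_tree`, `classify`, `misclassified`) are assumptions made for this illustration only; they are not part of the project framework.

```python
# Class and attributes of each example, copied from the table above.
EXAMPLES = {
    1: ("pos", {3, 4, 8, 9, 13, 14, 22}),
    2: ("neg", {13}),
    3: ("pos", {1, 3, 5, 8, 10, 13}),
    4: ("neg", {9, 14}),
    5: ("pos", {11, 20, 21, 22}),
    6: ("neg", {4}),
}


def make_tree(second_attribute):
    # Hypothetical format: an inner node is (attribute, if_holds, if_not),
    # a leaf is the string "pos" or "neg".
    return (1, "pos", (second_attribute, "pos", "neg"))


def classify(tree, attributes):
    # Follow the edges that agree with the attributes until a leaf is reached.
    while not isinstance(tree, str):
        attribute, if_holds, if_not = tree
        tree = if_holds if attribute in attributes else if_not
    return tree


def misclassified(tree):
    # Examples whose leaf label differs from their class.
    return sorted(
        e for e, (cls, attrs) in EXAMPLES.items() if classify(tree, attrs) != cls
    )


print(misclassified(make_tree(22)))  # → []
print(misclassified(make_tree(14)))  # → [4, 5]
```

The second call shows the effect of replacing attribute 22 by attribute 14: examples 4 and 5 end up in the wrong leaf.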

There are other decision trees that explain our set of examples, but all of them have at least 5 nodes.


# Representation in ASP.

The examples are represented by facts over the following predicates:    
```
example(E,C). % example E belongs to class C (where C can be either pos or neg) 
  holds(E,A). % in example E the attribute A holds
```
Every problem instance also contains a constant declaration of the form
```
#const n=v.
```
for some **odd** integer `v`, 
that fixes the number of nodes of the tree.

Our running example is represented as follows:
```
#const n=5.
example(1,pos). holds(1,3). holds(1,4). holds(1,8). holds(1,9). holds(1,13). holds(1,14). holds(1,22).
example(2,neg). holds(2,13).
example(3,pos). holds(3,1). holds(3,3). holds(3,5). holds(3,8). holds(3,10). holds(3,13).
example(4,neg). holds(4,9). holds(4,14).
example(5,pos). holds(5,11). holds(5,20). holds(5,21). holds(5,22).
example(6,neg). holds(6,4).
```
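
If you want to build your own test instances, the facts can be generated from a table-like structure. The following is a minimal sketch; the function name `instance_facts` and the dictionary layout are assumptions made for this illustration and are not part of the project files.

```python
# Class and attributes of each example of the running example.
EXAMPLES = {
    1: ("pos", [3, 4, 8, 9, 13, 14, 22]),
    2: ("neg", [13]),
    3: ("pos", [1, 3, 5, 8, 10, 13]),
    4: ("neg", [9, 14]),
    5: ("pos", [11, 20, 21, 22]),
    6: ("neg", [4]),
}


def instance_facts(n, examples):
    # One line for the constant, then one line of facts per example.
    lines = [f"#const n={n}."]
    for e in sorted(examples):
        cls, attrs = examples[e]
        facts = [f"example({e},{cls})."]
        facts += [f"holds({e},{a})." for a in sorted(attrs)]
        lines.append(" ".join(facts))
    return "\n".join(lines)


print(instance_facts(5, EXAMPLES))
```

For the dictionary above, the printed text matches the representation of the running example line by line.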

A solution is represented by atoms over the following predicates:  
```
edge(X,Y).    % there is an edge between node X and node Y
decision(N,A). % node N decides on attribute A
positive(N).  % node N is assigned to the positive class
```   
Predicate `edge/2` defines the form of the tree, while `decision/2` and `positive/1` assign the labels.

The solution of our example consists of the following atoms:
```
edge(1,2) edge(1,3) edge(2,4) edge(2,5) 
decision(1,1) decision(2,22) positive(3) positive(5) 
```

Above we have stated two restrictions on the form of the trees:
* an attribute cannot occur more than once on any path from the root to a leaf, and
* two leaves of the same parent cannot have the same label.

In our representation in ASP, nodes are identified by numbers from `1` to `n`. 
To reduce the number of possible solutions, we consider the following 
additional restrictions:
* the root is number `1`,
* the (two) children of a node X must be consecutive nodes bigger than X,
* the smaller child of a node is associated with the case where the attribute of the node does not hold, 
  and the bigger child is associated with the case where the attribute holds, and
* if X and Y are two nodes and X is smaller than Y, then the children of X are also smaller than the children of Y.
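
To debug a candidate answer, it can help to check these restrictions outside of ASP. The sketch below is a partial check only, written under assumptions: the function name `check_form` and the Python representation of the atoms (a set of edge pairs, a dictionary for `decision/2`, a set for `positive/1`) are hypothetical, and it does not verify that every node from `2` to `n` is reached.

```python
def check_form(edges, decision, positive):
    """Return a list of violated restrictions (empty if none was found)."""
    children = {}
    for x, y in sorted(edges):
        children.setdefault(x, []).append(y)
    inner = sorted(children)  # nodes with outgoing edges
    errors = []

    # Numbering restrictions.
    if inner and inner[0] != 1:
        errors.append("node 1 is not the smallest inner node")
    for x in inner:
        kids = children[x]
        if len(kids) != 2 or kids[1] != kids[0] + 1 or kids[0] <= x:
            errors.append(f"children of {x} are not two consecutive bigger nodes")
    for x, y in zip(inner, inner[1:]):
        if max(children[x]) >= min(children[y]):
            errors.append(f"children of {x} are not smaller than children of {y}")
    if errors:
        return errors  # the checks below assume a well-numbered tree

    # Two leaves of the same parent cannot have the same label.
    for x in inner:
        kids = children[x]
        if all(k not in children for k in kids):
            if (kids[0] in positive) == (kids[1] in positive):
                errors.append(f"the leaves below {x} have the same label")

    # An attribute cannot occur more than once on a path from the root.
    def walk(node, seen):
        if node in children:
            attribute = decision.get(node)
            if attribute in seen:
                errors.append(f"attribute {attribute} is repeated at node {node}")
            for k in children[node]:
                walk(k, seen | {attribute})

    walk(1, frozenset())
    return errors


# The solution of the running example passes all checks.
EDGES = {(1, 2), (1, 3), (2, 4), (2, 5)}
print(check_form(EDGES, {1: 1, 2: 22}, {3, 5}))  # → []
```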

According to this, the root `1` always has children `2` and `3`.
In fact, there is a unique unlabeled tree of `3` nodes,
and there are exactly two unlabeled trees of `5` nodes.
You can find all unlabeled trees of sizes `3`, `5`, `7`, `9` and `11` in the folder `asp/trees`.
See below how to visualize them.
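
The counts for `3` and `5` nodes can be reproduced with a short Python sketch. It relies on one reading of the numbering restrictions, which is an interpretation and not something the text states explicitly: every node except the root is a child, children come in consecutive pairs, and the pairs are ordered like their parents, so the j-th smallest inner node has children `2j` and `2j+1`. The function name `unlabeled_trees` is hypothetical.

```python
from itertools import combinations


def unlabeled_trees(n):
    """Yield each unlabeled tree with n nodes as a list of (parent, child) edges."""
    if n < 1 or n % 2 == 0:
        return  # a tree where every inner node has two children has odd size
    k = (n - 1) // 2  # number of inner nodes
    for inner in combinations(range(1, n + 1), k):
        edges = []
        for j, x in enumerate(inner, start=1):
            if 2 * j <= x:
                break  # the children 2j and 2j+1 would not be bigger than x
            edges += [(x, 2 * j), (x, 2 * j + 1)]
        else:
            yield edges


for n in (3, 5):
    print(n, list(unlabeled_trees(n)))
# → 3 [[(1, 2), (1, 3)]]
# → 5 [[(1, 2), (1, 3), (2, 4), (2, 5)], [(1, 2), (1, 3), (3, 4), (3, 5)]]
```

This is only a cross-check for your ASP encoding; the files in `asp/trees` remain the reference.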

# Visualization

You can use clingraph to visualize the unlabeled trees as well as the decision trees:
* https://github.com/potassco/clingraph

For example, you can visualize the unlabeled trees of `5` nodes with this command:

In [None]:
! cat asp/trees/n_equals_05.json | clingraph --viz-encoding=asp/tree_viz.lp --out=render --format=png --dir img --name-format=tree-{model_number}

It generates the following images:

<img src="img/tree-0.png" alt="Image 1" width="200" height="150"> <img src="img/tree-1.png" alt="Image 2" width="200" height="150">

Note that clingraph does not print the children of a node in order.
For example, in those images, node `3` occurs to the left of node `2`.

Once you have an encoding that generates unlabeled trees, you can visualize the answers with this command:

In [None]:
! clingo -c n=5 asp/decision-trees.lp 0 --outf=2 | clingraph --viz-encoding=asp/tree_viz.lp --out=render --format=png --dir img --name-format=tree-{model_number}

The solutions of our running example can be visualized with this command:

In [None]:
! cat asp/solutions/instance-06-05.json | clingraph --viz-encoding=asp/viz.lp --out=render --format=png --dir img --name-format=decision-tree-{model_number}

The first image of this notebook was generated in this way. 

Once you have your final encoding, you can visualize the answers with this command:

In [None]:
! clingo asp/decision-trees.lp asp/instances/instance-06-05.lp 0 --outf=2 | clingraph --viz-encoding=asp/viz.lp --out=render --format=png --dir img --name-format=decision-tree-{model_number}

# Framework.

The directory ``asp`` contains the files that you need for the project. In the directory ``asp/instances`` you can find the instances (our example is ``instance-06-05.lp``), and in the directory ``asp/solutions`` you can find their solutions in ``json`` format. 
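
If you want to inspect answers in Python, the following sketch reads the atoms from JSON text. It assumes the layout that clingo prints with `--outf=2`, where the models are listed under `Call`, `Witnesses` and `Value`; whether the files in `asp/solutions` follow exactly this layout is an assumption, and the sample string and the function name `read_models` are made up for this illustration.

```python
import json
import re

# Hand-written sample in the assumed layout (not copied from a project file).
SAMPLE = """
{"Call": [{"Witnesses": [{"Value": [
    "edge(1,2)", "edge(1,3)", "edge(2,4)", "edge(2,5)",
    "decision(1,1)", "decision(2,22)", "positive(3)", "positive(5)"
]}]}], "Result": "SATISFIABLE"}
"""

ATOM = re.compile(r"(\w+)\((\d+(?:,\d+)*)\)")


def read_models(text):
    """Return one dictionary per model, mapping predicate names to argument tuples."""
    data = json.loads(text)
    models = []
    for call in data.get("Call", []):
        for witness in call.get("Witnesses", []):
            model = {"edge": set(), "decision": set(), "positive": set()}
            for atom in witness.get("Value", []):
                match = ATOM.fullmatch(atom)
                if match and match.group(1) in model:
                    args = tuple(int(a) for a in match.group(2).split(","))
                    model[match.group(1)].add(args)
            models.append(model)
    return models


model = read_models(SAMPLE)[0]
print(sorted(model["edge"]))  # → [(1, 2), (1, 3), (2, 4), (2, 5)]
```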

You have to submit a file named ``decision-trees.lp``, included as a template in the directory ``asp``, that contains the following lines (and no other ``#show`` statements) so that only the atoms of predicates ``edge/2``, ``decision/2`` and ``positive/1`` appear in the output:

```
#show edge/2.
#show decision/2.
#show positive/1.
```

You can check whether your encoding correctly solves all instances by running the ``Python`` script ``test.py`` as follows:
* ``python asp/test.py -e asp/decision-trees.lp -i asp/instances -s asp/solutions -t 180``

**NOTE: The script ``test.py`` is not available yet, but we will upload it soon.**

In this case, the timeout for each instance is set to `180` seconds, but you can use any other value instead.

For help, type `python asp/test.py --help`.

We recommend that you work locally on your computer, using your own installation of ``clingo``.

For this, you can run the next cell to generate a zip file of this directory. The zip file will be stored in the parent directory with the name `decision-trees.zip`. You can click on the folder symbol on the left of the screen to look for it and download it.

In [None]:
import os
from shutil import make_archive
make_archive('../decision-trees', 'zip', os.getcwd())

You can also run your encoding in the next cell. It is not recommended to work in this notebook at ``Binder``, but if you do, remember to download the files that you modify to your computer. Otherwise you will lose your changes.

In [None]:
%%clingo 0 asp/instances/instance-06-05.lp -

node(1..n).

% Your encoding please...

#show edge/2.
#show decision/2.
#show positive/1.

# Formalities.
You can work on the solution alone or in groups of two people. 
Different groups have to submit different solutions. In case
of plagiarism, all groups involved will fail the project. 

Your solution should correctly represent all solutions for every instance, 
but you can still pass the project if some instances are not solved within
the time limit.
This is tested automatically by the script ``test.py``. 

We will send you further instructions about the submission process via Moodle.

# Tips:

* First, write an encoding that generates unlabeled trees, and 
  check that it generates the solutions in `asp/trees` for `n=3,5,7,9,11`.
  
* Then, extend the encoding to represent decision trees, 
  and try it with the smallest instances in `asp/instances` one by one. 
  Only once this works, you should try all instances using the script `test.py`.
  
* If you are stuck, you can contact us. We will do our best to answer all your questions. You can send us questions and remarks either via Moodle or by email.

* Start as soon as possible to avoid running out of time. If you still find that you cannot finish before the deadline, please contact us instead of copying another solution.