# Problem Description.

The task of this project is to write an ASP encoding that solves
the problem of learning propositional formulas that classify some data.

The *input* to the problem consists of a set of examples, where each example has an identifier,
is labeled as either positive or negative, and is associated with a set of attributes.

The input also includes a parameter *size* that fixes the size of the propositional formula to be learned.

For instance, 
the following table shows 6 examples over 26 attributes (from 'a' to 'z'):
* example 1 is positive and has attributes c, d, h, i, m, n, and v;
* example 2 is negative and has only attribute m;
* example 3 is positive and has attributes a, c, e, h, j and m; and so on...

Example | Class | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | pos |   |   | X | X |   |   |   | X | X |   |   |   | X | X |   |   |   |   |   |   |   | X |   |   |   |   |
2 | neg |   |   |   |   |   |   |   |   |   |   |   |   | X |   |   |   |   |   |   |   |   |   |   |   |   |   |
3 | pos | X |   | X |   | X |   |   | X |   | X |   |   | X |   |   |   |   |   |   |   |   |   |   |   |   |   |
4 | neg |   |   |   |   |   |   |   |   | X |   |   |   |   | X |   |   |   |   |   |   |   |   |   |   |   |   |
5 | pos |   |   |   |   |   |   |   |   |   |   | X |   |   |   |   |   |   |   |   | X | X | X |   |   |   |   |
6 | neg |   |   |   | X |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |


In an application in medicine,
each example could stand for a person with some attributes
who is labeled as positive if they have some illness and as negative otherwise.

The *propositional formulas* for a given problem are defined inductively as follows:
* Every attribute of the problem is a propositional formula.
* If $F$ is a propositional formula, then $\neg F$ is a propositional formula.
* If $F$ and $G$ are propositional formulas, then $(F \wedge G)$ and $(F \vee G)$ are propositional formulas.

These are some propositional formulas for our running example:
$a$, $b$, $c$, ..., $z$, $\neg d$, $(a \vee v)$, and $\neg((d \wedge e)\vee (h \wedge \neg i))$.

The *size* of a propositional formula $F$ is defined inductively as follows:
* If $F$ is an attribute then its size is $1$.
* If $F$ has the form $\neg G$ then its size is $1$ plus the size of $G$.
* If $F$ has the form $(G \wedge H)$ or $(G \vee H)$ then its size is $1$ plus the size of $G$ plus the size of $H$.

For instance, $c$ has size $1$, $\neg d$ has size $2$, $(a \vee v)$ has size $3$, and $\neg((d \wedge e)\vee (h \wedge \neg i))$ has size $9$.
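The inductive definition of size can be checked with a few lines of Python. This is only an illustrative sketch: the nested-tuple representation of formulas and the function name `size` are assumptions made here, not part of the project files.

```python
# Formulas as nested tuples (a representation assumed for this sketch):
#   ("atom", "c"), ("neg", F), ("and", F, G), ("or", F, G)

def size(formula):
    """Return the size of a formula, following the inductive definition."""
    kind = formula[0]
    if kind == "atom":
        return 1
    if kind == "neg":
        return 1 + size(formula[1])
    if kind in ("and", "or"):
        return 1 + size(formula[1]) + size(formula[2])
    raise ValueError(f"unknown formula type: {kind!r}")

c = ("atom", "c")
not_d = ("neg", ("atom", "d"))
a_or_v = ("or", ("atom", "a"), ("atom", "v"))
# neg((d and e) or (h and neg i))
big = ("neg", ("or",
               ("and", ("atom", "d"), ("atom", "e")),
               ("and", ("atom", "h"), ("neg", ("atom", "i")))))

print([size(f) for f in (c, not_d, a_or_v, big)])  # [1, 2, 3, 9]
```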

A propositional formula $F$ classifies an example $E$ *as positive* if:
* $F$ is an attribute and $E$ has attribute $F$.
* $F$ has the form $\neg G$ and $G$ does not classify $E$ as positive.
* $F$ has the form $G \wedge H$ and both $G$ and $H$ classify $E$ as positive.
* $F$ has the form $G \vee H$ and either $G$ or $H$ (or both) classify $E$ as positive.

A propositional formula $F$ classifies an example $E$ *as negative* if it does not classify it as positive.

For instance: 
* $c$ classifies examples 1 and 3 as positive and the rest as negative, 
* $\neg d$ classifies examples 2, 3, 4 and 5 as positive and the rest as negative, 
* $(a \vee v)$ classifies examples 1, 3 and 5 as positive and the rest as negative.
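The classification rules translate directly into a recursive function. The following Python sketch evaluates them on the running example; the nested-tuple representation of formulas and the names `HAS`, `classifies_positive` and `positives` are assumptions of this sketch.

```python
# Attributes of each example, copied from the table of the running example.
HAS = {
    1: {"c", "d", "h", "i", "m", "n", "v"},
    2: {"m"},
    3: {"a", "c", "e", "h", "j", "m"},
    4: {"i", "n"},
    5: {"k", "t", "u", "v"},
    6: {"d"},
}

def classifies_positive(formula, attrs):
    """True if the formula classifies an example with attributes attrs as positive."""
    kind = formula[0]
    if kind == "atom":
        return formula[1] in attrs
    if kind == "neg":
        return not classifies_positive(formula[1], attrs)
    if kind == "and":
        return (classifies_positive(formula[1], attrs)
                and classifies_positive(formula[2], attrs))
    if kind == "or":
        return (classifies_positive(formula[1], attrs)
                or classifies_positive(formula[2], attrs))
    raise ValueError(f"unknown formula type: {kind!r}")

def positives(formula):
    """Identifiers of the examples that the formula classifies as positive."""
    return sorted(e for e, attrs in HAS.items()
                  if classifies_positive(formula, attrs))

print(positives(("atom", "c")))                          # [1, 3]
print(positives(("neg", ("atom", "d"))))                 # [2, 3, 4, 5]
print(positives(("or", ("atom", "a"), ("atom", "v"))))   # [1, 3, 5]
```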


A propositional formula $F$ classifies an example $E$ *correctly* if:
* either $E$ is positive and $F$ classifies $E$ as positive,
* or $E$ is negative and $F$ classifies $E$ as negative.

For instance:
* $c$ correctly classifies all examples except 5,
* $\neg d$ correctly classifies examples 3, 5 and 6,
* $(a \vee v)$ correctly classifies all examples.
  
A *solution* is a propositional formula $F$ of the given size that classifies correctly as many examples as possible.
In other words, there must be no other propositional formula $G$ of the given size that classifies correctly more examples than $F$.

For instance, $c$ is a solution for size $1$, $\neg d$ is a solution for size $2$, and $(a \vee v)$ is a solution for size $3$. 
There are many solutions for each of those sizes.
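For small sizes, the notions of correct classification and solution can be checked by brute force. The Python sketch below is not the requested ASP encoding: it simply enumerates every formula of a given size over the attributes of the running example and keeps those with the highest number of correctly classified examples. The nested-tuple representation and all function names are assumptions of this sketch, and the enumeration does not apply the additional restrictions on the form of formulas that this section introduces.

```python
from itertools import product

# Running example: class and attributes of each example.
CLASS = {1: "pos", 2: "neg", 3: "pos", 4: "neg", 5: "pos", 6: "neg"}
HAS = {
    1: {"c", "d", "h", "i", "m", "n", "v"},
    2: {"m"},
    3: {"a", "c", "e", "h", "j", "m"},
    4: {"i", "n"},
    5: {"k", "t", "u", "v"},
    6: {"d"},
}
ATTRIBUTES = sorted(set().union(*HAS.values()))

def holds(f, attrs):
    """True if formula f classifies an example with attributes attrs as positive."""
    if f[0] == "atom":
        return f[1] in attrs
    if f[0] == "neg":
        return not holds(f[1], attrs)
    if f[0] == "and":
        return holds(f[1], attrs) and holds(f[2], attrs)
    return holds(f[1], attrs) or holds(f[2], attrs)  # "or"

def correct(f):
    """Number of examples that f classifies correctly."""
    return sum(holds(f, HAS[e]) == (CLASS[e] == "pos") for e in CLASS)

def formulas(size):
    """Enumerate all formulas of exactly the given size (size >= 1)."""
    if size == 1:
        for a in ATTRIBUTES:
            yield ("atom", a)
        return
    for g in formulas(size - 1):
        yield ("neg", g)
    for left in range(1, size - 1):
        right = size - 1 - left
        for op, g, h in product(("and", "or"), formulas(left), formulas(right)):
            yield (op, g, h)

def solutions(size):
    """Return the best score and all formulas of the given size reaching it."""
    candidates = list(formulas(size))
    best = max(correct(f) for f in candidates)
    return best, [f for f in candidates if correct(f) == best]

for n in (1, 2, 3):
    best, sols = solutions(n)
    print(n, best, len(sols))
```

On the running example, the best scores found this way are 5, 3 and 6 correctly classified examples for sizes 1, 2 and 3, which agrees with $c$, $\neg d$ and $(a \vee v)$ being solutions.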

In our application in medicine, the solutions
aim to represent the conditions for having the illness.
For instance, solution $(a \vee v)$ would suggest that persons having attribute $a$ or attribute $v$ also have the investigated illness.

To limit the number of possible propositional formulas, we make the following additional restrictions over their form:
* If $A$ and $B$ are attributes and $A$ is smaller than $B$ (according to the order of terms in *clingo*)
  then $(B \wedge A)$ and $(B \vee A)$ are not valid propositional formulas.
* Additionally, a formula of the form $(F \wedge G)$ or $(F \vee G)$ is not valid if either:
  - $F$ is not an attribute but $G$ is an attribute.
  - $F$ has the form $(H \wedge I)$ or $(H \vee I)$, and $G$ has the form $\neg J$.
  - $F$ has the form $(H \wedge I)$ and $G$ has the form $(J \vee K)$.

For instance, these formulas are valid:
* $(e \vee v)$, $(d \vee (a \vee z))$, $(\neg (x \wedge y) \vee (a \vee b))$, and $((c \vee d) \wedge (a \wedge b))$
  
but these are not:
* $(v \vee e)$, $((a \vee z) \vee d)$, $((a \vee b) \vee \neg (x \wedge y))$, and $( (a \wedge b) \wedge (c \vee d))$.

In this project we do not consider propositional formulas that are not valid, or that contain propositional formulas that are not valid.
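The restrictions above can be read as a recursive validity check. The following Python sketch is one possible reading, under two assumptions: formulas are nested tuples such as `("or", ("atom", "e"), ("atom", "v"))`, and Python string comparison stands in for the order of terms in *clingo*, which is only meant to cover single-letter attributes like the ones used here.

```python
def is_valid(formula):
    """Check the restrictions on a formula and on all of its subformulas."""
    kind = formula[0]
    if kind == "atom":
        return True
    if kind == "neg":
        return is_valid(formula[1])
    left, right = formula[1], formula[2]
    lk, rk = left[0], right[0]
    # (B op A) is not valid if A and B are attributes and A is smaller than B.
    if lk == "atom" and rk == "atom" and right[1] < left[1]:
        return False
    # The left side is not an attribute but the right side is.
    if lk != "atom" and rk == "atom":
        return False
    # The left side is a conjunction or disjunction and the right side a negation.
    if lk in ("and", "or") and rk == "neg":
        return False
    # The left side is a conjunction and the right side a disjunction.
    if lk == "and" and rk == "or":
        return False
    return is_valid(left) and is_valid(right)

def atom(x):
    return ("atom", x)

valid = [
    ("or", atom("e"), atom("v")),
    ("or", atom("d"), ("or", atom("a"), atom("z"))),
    ("or", ("neg", ("and", atom("x"), atom("y"))), ("or", atom("a"), atom("b"))),
    ("and", ("or", atom("c"), atom("d")), ("and", atom("a"), atom("b"))),
]
invalid = [
    ("or", atom("v"), atom("e")),
    ("or", ("or", atom("a"), atom("z")), atom("d")),
    ("or", ("or", atom("a"), atom("b")), ("neg", ("and", atom("x"), atom("y")))),
    ("and", ("and", atom("a"), atom("b")), ("or", atom("c"), atom("d"))),
]
print(all(is_valid(f) for f in valid), any(is_valid(f) for f in invalid))
```

The two lists contain exactly the valid and invalid formulas given as examples above.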


# Representation in ASP.

The examples are represented by facts over the following predicates:    
```
attribute(A).    % A is an attribute
example(E,C).    % example E is of class C (where C can be either pos or neg) 
    has(E,A).    % example E has attribute A
```
Every problem instance also includes a constant declaration of the form
```
#const size=v.
```
where `v` is the size of the learned propositional formulas.

Our running example is represented as follows:
```
#const size=3.
attribute(a;c;d;e;h;i;j;k;m;n;t;u;v).
example(1,pos). has(1,c). has(1,d). has(1,h). has(1,i). has(1,m). has(1,n). has(1,v).
example(2,neg). has(2,m).
example(3,pos). has(3,a). has(3,c). has(3,e). has(3,h). has(3,j). has(3,m).
example(4,neg). has(4,i). has(4,n).
example(5,pos). has(5,k). has(5,t). has(5,u). has(5,v).
example(6,neg). has(6,d).
```
Observe that the facts over predicate `attribute/1` refer only to attributes that belong to at least one example.
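To make the link between the table and the facts explicit, the following Python sketch generates the instance text from the data of the running example. The dictionaries `CLASS` and `HAS` and the function `instance_facts` are hypothetical helpers written for this illustration; they are not part of the project files.

```python
# Running example: class and attributes of each example.
CLASS = {1: "pos", 2: "neg", 3: "pos", 4: "neg", 5: "pos", 6: "neg"}
HAS = {
    1: {"c", "d", "h", "i", "m", "n", "v"},
    2: {"m"},
    3: {"a", "c", "e", "h", "j", "m"},
    4: {"i", "n"},
    5: {"k", "t", "u", "v"},
    6: {"d"},
}

def instance_facts(size):
    """Render the instance as text in the input format described above."""
    # Only attributes that belong to at least one example are declared.
    used = sorted(set().union(*HAS.values()))
    lines = [f"#const size={size}.", f"attribute({';'.join(used)})."]
    for e in sorted(CLASS):
        facts = [f"example({e},{CLASS[e]})."]
        facts += [f"has({e},{a})." for a in sorted(HAS[e])]
        lines.append(" ".join(facts))
    return "\n".join(lines)

print(instance_facts(3))
```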

A solution is represented by atoms over the following predicate:  
```
for(I,T,X).   % there is a formula with identifier I,
              % of type T (which can be either `atom`, `neg`, `or`, or `and`)
              % and X is an attribute if T is `atom` and otherwise it is an identifier
```
* The solution formula has identifier 1.
* If `T` is `atom` then `X` is an attribute, and `for(I,atom,X)` means that formula `I` is the atom (or attribute) `X`.
* If `T` is `neg` then `X` is the identifier of another formula, and `for(I,neg,X)` means that formula `I` is the negation of formula `X`.
* If `T` is `or` then `X` and `X+1` are identifiers of other formulas, and `for(I,or,X)` means that formula `I` is the disjunction ($\vee$) over formulas `X` and `X+1`.
* If `T` is `and` then `X` and `X+1` are identifiers of other formulas, and `for(I,and,X)` means that formula `I` is the conjunction ($\wedge$) over formulas `X` and `X+1`.

The solution $c$ for size 1 is represented by 
```
for(1,atom,c)
```

The solution $\neg d$ for size 2 is represented by 
```
for(1,neg,2) for(2,atom,d)
```
which states that formula 1 is the negation of formula 2, 
which in turn is the atom $d$.

The solution $(a \vee v)$ for size 3  is represented by the following atoms:
```
for(1,or,2) for(2,atom,a) for(3,atom,v)
```
where: 
* `for(1,or,2)` means that formula 1 is an or ($\vee$) over formulas 2 and 3,
* `for(2,atom,a)` means that formula 2 is the atom $a$, and
* `for(3,atom,v)` means that formula 3 is the atom $v$.

The formula $\neg((d \wedge e) \vee (h \wedge \neg i))$ of size 9 is represented by the following atoms:
```
for(1,neg,2) for(2,or,3) for(3,and,5) for(4,and,7) for(5,atom,d) for(6,atom,e) for(7,atom,h) for(8,neg,9) for(9,atom,i)
```

To ensure that every valid propositional formula has a unique representation in this format, we require that:
* The identifiers `I` are less than or equal to the input parameter *size*.
* If the atoms `for(I,T1,I1)` and `for(J,T2,I2)` belong to an answer set, `I < J`, and `T1` and `T2` are each one of `neg`, `or` or `and`, then `I1 < I2`. Moreover, if these conditions hold and `T1` is `or` or `and`, then `I1 + 1 < I2`.
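The identifiers in the examples of this section follow a breadth-first numbering of the subformulas, and that numbering satisfies the conditions above. The Python sketch below produces the `for/3` atoms of a formula in this way. It assumes formulas given as nested tuples such as `("neg", ("atom", "d"))`, and the function name `encode` is hypothetical.

```python
from collections import deque

def encode(formula):
    """Return the for/3 atoms of a formula, numbering subformulas breadth-first."""
    atoms = []
    queue = deque([formula])
    current = 0   # identifier of the formula being processed
    next_id = 2   # next free identifier (identifier 1 is the whole formula)
    while queue:
        node = queue.popleft()
        current += 1
        kind = node[0]
        if kind == "atom":
            atoms.append(f"for({current},atom,{node[1]})")
        else:
            # The subformulas get the next free identifiers, in order.
            atoms.append(f"for({current},{kind},{next_id})")
            children = node[1:]
            queue.extend(children)
            next_id += len(children)
    return atoms

a_or_v = ("or", ("atom", "a"), ("atom", "v"))
big = ("neg", ("or",
               ("and", ("atom", "d"), ("atom", "e")),
               ("and", ("atom", "h"), ("neg", ("atom", "i")))))

print(" ".join(encode(a_or_v)))  # for(1,or,2) for(2,atom,a) for(3,atom,v)
print(" ".join(encode(big)))
```

Because subformulas are numbered in the order in which their parents are processed, a formula with a smaller identifier always refers to smaller identifiers than one with a larger identifier, and the largest identifier used equals the size of the formula.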

# Visualization

You can use clingraph to visualize the solutions:
* https://github.com/potassco/clingraph

For example, you can visualize the solutions to our example (`instance-06-03.lp`) with this command:

In [None]:
! cat asp/solutions/instance-06-03.json | clingraph --viz-encoding=asp/viz.lp --out=render --format=png --dir img --name-format=instance-06-03-{model_number} 

For the first answer set, representing solution $(a \vee v)$, it generates the following image:

<img src="img/instance-06-03-0.png" alt="Image 1" width="150" height="120"> 


Once you have a working encoding, you can visualize the answers with this command:

In [None]:
! clingo asp/propositional.lp asp/instances/instance-06-03.lp | clingraph --viz-encoding=asp/viz.lp --out=render --format=png --dir img --name-format=instance-06-03-{model_number} 

# Framework.

The directory ``asp`` contains the files that you need for the project. In the directory ``asp/instances`` you can find the instances (our example is ``instance-06-03.lp``), and in the directory ``asp/solutions`` you can find their solutions in ``json`` format. 

You have to submit a file named ``propositional.lp``, included as a template in the directory ``asp``, that contains the following line (and no other ``#show`` statements) so that only the atoms of predicate ``for/3`` are shown in the output:

```
#show for/3.
```

You can check whether your encoding correctly solves all instances by running the ``Python`` script ``test.py`` as follows:
* ``python asp/test.py -e asp/propositional.lp  -i asp/instances -s asp/solutions -opt -t 100 -m 1 --correct``

Note that this call only checks if one optimal model is a solution of the problem.

The timeout for each instance is set to `100` seconds, but you can use any other value instead.

For help, type `python asp/test.py --help`.

We recommend that you work locally on your computer, using your own installation of ``clingo``.

You can also run your encoding in the next cell. Working in this notebook at ``Binder`` is not recommended, but if you do, remember to download the files that you modify to your computer; otherwise you will lose your changes.

In [None]:
%%clingo asp/instances/instance-06-03.lp -


%
% Guess solution candidates
%


%
% Check solution candidates
%


%
% Optimize solutions
%


%
% Display
%

#show for/3.

# Formalities.

You can work on the solution alone or in groups of two people. 
Different groups have to submit different solutions. In case
of plagiarism, all groups involved will fail the project. 

For every instance, 
your encoding together with the instance must have at least one answer set, 
and every optimal answer set of that program must be a solution for that instance.
This is tested automatically by the script ``test.py``. 

You can still pass the project if some instances are not solved within
the time limit.


We will send you further instructions about the submission process via Moodle.

# Tips:

* Commands to find all optimal stable models look as follows:
```
clingo propositional.lp instance.lp --opt-mode=optN 0
```

* With option `--quiet=1` you can avoid printing non-optimal solutions. More clingo options can be found using option `--help=3`.
  
* If you are stuck, you can contact us. We will do our best to answer all your questions. You can send us questions and remarks either via Moodle or by email.

* Start as soon as possible to avoid running out of time. If you nevertheless find that you will have trouble finishing before the deadline, please contact us instead of copying another solution.