Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = ""
COLLABORATORS = ""

---

# In this problem we will use the publications dataset and write some Datalog rules to check data integrity.

## A few points:
* Refer to [Clingo with Jupyter Intro](Dlv_with_Jupyter_Intro.ipynb) before attempting this notebook.
* Run the following cell first; the rest of the notebook depends on it.
* It's always a good idea to run cells in order. If you have run cells out of order and want to start fresh, restart the kernel from the menu above.
* All clingo cells start with `%%clingo`.
* You can run your clingo cells against basic facts and rules from a file. `%set_db_file $filepath` sets the file against which your clingo cells will run.
* Each clingo cell is independent of others. Rules defined in one cell won't be available in others.
* It's nice to be able to execute clingo from within your notebook, but don't forget to practice from the command line. `%%clingo` is just a thin wrapper over the command line, and it's best to know how to use the underlying tool.
* Upon assignment submission, we will run your code against a different set of facts. Please don't hardcode answers and save yourself the embarrassment.
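Since `%%clingo` wraps the command line, here is a minimal sketch of the equivalent call made from Python. It assumes the `clingo` executable is on your `PATH` and that the magic essentially passes clingo the database file together with your cell contents (an assumption; the magic's internals are not shown here). `clingo_command` and `run_clingo` are hypothetical helpers, and your rules are assumed to live in a file of your own.

```python
import subprocess


def clingo_command(db_file, rules_file):
    """Build the argument list for running clingo on the base facts plus your rules.

    clingo reads and grounds all input files together, so facts and rules
    can live in separate files.
    """
    return ["clingo", db_file, rules_file]


def run_clingo(db_file, rules_file):
    """Run clingo and return its standard output (hypothetical helper)."""
    result = subprocess.run(
        clingo_command(db_file, rules_file),
        capture_output=True,
        text=True,
    )
    # clingo reports its outcome through nonzero exit codes (e.g. 10 for
    # SATISFIABLE) even when it succeeds, so check=True would raise spuriously.
    return result.stdout
```

For example, `run_clingo(os.path.expanduser('~/data_readonly/datalog/publications_base.lp'), 'my_rules.lp')`, where `my_rules.lp` is a hypothetical file holding your rules. On the command line, clingo prints every atom in the answer set, base facts included; adding a directive such as `#show icv_pid_key/4.` to your rules file restricts the output to that predicate.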

### Good luck!!

In [2]:
%reload_ext lib.clingo.clingo_magic
import os
from lib.clingo.clingo_evaluate_util import clingo_evaluate

In [3]:
# All clingo cells will run against this file containing some base facts.
publications_base_facts_and_rules_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
%set_db_file $publications_base_facts_and_rules_file

## We will now write various rules to find "bad" (inconsistent) data

### [10 points] The key attribute ID should uniquely determine all other attributes.

In DENIAL form, we report all IC violations, i.e., cases where at least two rows have the same ID but differ in some attribute.
Here we report both the name of the attribute and the two conflicting values.
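To make the denial-form check concrete, here is a minimal Python sketch of the same logic over in-memory tuples. It assumes each publication is a 10-tuple in the order `publication(I, A, Y, T, J, V, N, F, L, P)` noted in a later cell, represents `null` as the string `"null"`, and orders integers before symbolic constants, which is what the expected output for this rule suggests (e.g. `1` before `null`). The helper names and toy rows are hypothetical and not part of the assignment.

```python
# Attribute names in publication-tuple order, after the ID (assumed layout:
# id, author, year, title, journal, vol, no, fp, lp, publisher).
ATTRS = ["author", "year", "title", "journal", "vol", "no", "fp", "lp", "publisher"]


def term_key(value):
    """Sort key placing integers before symbolic constants (assumed clingo-like order)."""
    return (0, value) if isinstance(value, int) else (1, str(value))


def key_violations(publications):
    """Return (id, attribute, v1, v2) with v1 < v2 for each attribute on which
    two rows sharing an ID disagree, mirroring the icv_pid_key rules."""
    violations = set()
    for row1 in publications:
        for row2 in publications:
            if row1[0] != row2[0]:
                continue  # only rows with the same ID can violate the key
            for i, attr in enumerate(ATTRS, start=1):
                v1, v2 = row1[i], row2[i]
                # Requiring v1 < v2 reports each conflicting pair exactly once.
                if term_key(v1) < term_key(v2):
                    violations.add((row1[0], attr, v1, v2))
    return violations


# Hypothetical toy data: two rows share ID 1 but differ in year and publisher.
rows = [
    (1, "doe", 1969, "t1", "j1", 10, 1, 10, 20, "null"),
    (1, "doe", 2015, "t1", "j1", 10, 1, 10, 20, "pub2"),
    (2, "roe", 2000, "t2", "j1", 5, 2, 1, 9, "pub1"),
]
print(key_violations(rows))
```

Comparing every pair of rows with the same ID is the tuple-level analogue of joining `publication` with itself on `ID`, as the ASP rules do.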


In [4]:
%%clingo {"predicate" : "icv_pid_key", "predicate_arity" : 4, "result_var": "Icv_pid_key"}

% The following code snippet and its result will be assigned to local variable Icv_pid_key

% Change the following expressions.
% In DENIAL form, we report all IC violations, i.e., cases where at least two rows
% have the same ID but differ in some attribute.
% Here we report both the name of the attribute and the two conflicting values.
icv_pid_key(ID,author,A1,A2) :- 
    publication(ID, A1, _, _, _, _, _, _, _, _),
    publication(ID, A2, _, _, _, _, _, _, _, _),
    A1 < A2.

icv_pid_key(ID,year,Y1,Y2) :- 
    publication(ID, _, Y1, _, _, _, _, _, _, _),
    publication(ID, _, Y2, _, _, _, _, _, _, _),
    Y1 < Y2.

icv_pid_key(ID,title,T1,T2) :- 
    publication(ID, _, _, T1, _, _, _, _, _, _),
    publication(ID, _, _, T2, _, _, _, _, _, _),
    T1 < T2.

icv_pid_key(ID,journal,J1,J2) :- 
    publication(ID, _, _, _, J1, _, _, _, _, _),
    publication(ID, _, _, _, J2, _, _, _, _, _),
    J1 < J2.

icv_pid_key(ID,vol,V1,V2) :- 
    publication(ID, _, _, _, _, V1, _, _, _, _),
    publication(ID, _, _, _, _, V2, _, _, _, _),
    V1 < V2.

icv_pid_key(ID,no,N1,N2) :- 
    publication(ID, _, _, _, _, _, N1, _, _, _),
    publication(ID, _, _, _, _, _, N2, _, _, _),
    N1 < N2.

icv_pid_key(ID,fp,F1,F2) :- 
    publication(ID, _, _, _, _, _, _, F1, _, _),
    publication(ID, _, _, _, _, _, _, F2, _, _),
    F1 < F2.

icv_pid_key(ID,lp,L1,L2) :- 
    publication(ID, _, _, _, _, _, _, _, L1, _),
    publication(ID, _, _, _, _, _, _, _, L2, _),
    L1 < L2.

icv_pid_key(ID,publisher,P1,P2) :- 
    publication(ID, _, _, _, _, _, _, _, _, P1),
    publication(ID, _, _, _, _, _, _, _, _, P2),
    P1 < P2.


Saving output to local variable Icv_pid_key['result']
Saving code snippet to local variable Icv_pid_key['code']



#### [3 points] Test 1 for icv_pid_key.
The following test compares the output of your icv_pid_key rule against the expected output.
You must have run all clingo cells above for the test to pass.
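Because the order of atoms in the output doesn't matter, the comparison amounts to comparing two sets of atoms. The sketch below illustrates that idea; the actual implementation of `clingo_evaluate` is not shown in this notebook and may differ, and `atoms`, `diff_atoms`, and the `icv_example` predicate are hypothetical.

```python
def atoms(output):
    """Split answer-set text into a set of atoms.

    Assumes atoms are separated by whitespace and contain no spaces
    themselves, as in the expected outputs in this notebook.
    """
    return set(output.split())


def diff_atoms(actual, expected):
    """Compare two outputs, ignoring atom order and surrounding whitespace."""
    got, want = atoms(actual), atoms(expected)
    return {"missing": want - got, "unexpected": got - want}


expected = """
icv_example(1,a) icv_example(2,b)
"""
actual = "icv_example(2,b) icv_example(1,a)"
print(diff_atoms(actual, expected))
```

Reporting missing and unexpected atoms separately makes it easier to tell whether a rule is too strict or too permissive.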

In [5]:
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run to see the expected output with syntax highlighting.
expected_output = '''
icv_pid_key(4407,author,doe,kummel) icv_pid_key(4407,year,1969,2015) icv_pid_key(4407,title,ammonoids,foobar) icv_pid_key(4407,vol,10,137) icv_pid_key(4407,no,1,3) icv_pid_key(4407,fp,10,476) icv_pid_key(4407,lp,1,null) icv_pid_key(4407,publisher,null,publisher2)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
clingo_evaluate(db_file, Icv_pid_key['code'], 'icv_pid_key', 4, expected_output)


#### [7 points] Test 2 for icv_pid_key.
The following is a hidden test case. It always passes in the student version of the notebook but is actually evaluated after submission.
* We will first add some facts that are hidden from students.
* We will then run the icv_pid_key rule against these new facts and check that it still behaves correctly.

In [6]:
# This cell will test the icv_pid_key rule with these new facts.
# The contents of this cell are not present in the student version of the assignment.
# This will only be evaluated after submission.


### [10 points] Every journal has a single publisher, i.e., Journal $\rightarrow$ Publisher
In DENIAL form, we report the journals that have multiple publishers, one pair of publishers at a time.
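A minimal Python sketch of this functional-dependency check follows, under the same assumptions as before: 10-tuples in `publication(I, A, Y, T, J, V, N, F, L, P)` order, `null` stored as the string `"null"`, and integers ordered before symbolic constants. The `ignore_null` flag is a design choice of this sketch that shows one possible answer to the "food for thought" question in the cell below; the expected output for this rule does report `null` as a publisher, which corresponds to the default `ignore_null=False`. All names and rows are hypothetical.

```python
def term_key(value):
    """Sort key placing integers before symbolic constants (assumed clingo-like order)."""
    return (0, value) if isinstance(value, int) else (1, str(value))


def journal_publisher_violations(publications, ignore_null=False):
    """Return (journal, p1, p2) with p1 < p2 for every journal that has more
    than one publisher. Assumed tuple layout:
    (id, author, year, title, journal, vol, no, fp, lp, publisher)."""
    publishers = {}
    for row in publications:
        journal, publisher = row[4], row[9]
        if ignore_null and publisher == "null":
            continue  # optional: treat null as "unknown" rather than as a value
        publishers.setdefault(journal, set()).add(publisher)

    violations = set()
    for journal, pubs in publishers.items():
        for p1 in pubs:
            for p2 in pubs:
                if term_key(p1) < term_key(p2):
                    violations.add((journal, p1, p2))
    return violations


# Hypothetical toy data: journal "ja" appears with three distinct publisher values.
rows = [
    (1, "a1", 2000, "t1", "ja", 1, 1, 1, 5, "null"),
    (2, "a2", 2001, "t2", "ja", 1, 2, 6, 9, "pub1"),
    (3, "a3", 2002, "t3", "ja", 2, 1, 1, 4, "pub2"),
    (4, "a4", 2003, "t4", "jb", 1, 1, 1, 3, "pub1"),
]
print(journal_publisher_violations(rows))
print(journal_publisher_violations(rows, ignore_null=True))
```

Dropping `null` removes the pairs that involve it, which changes which repairs look necessary: with nulls ignored, only the conflict between the two known publishers remains.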

In [7]:
%%clingo {"predicate" : "icv_journal_publisher", "predicate_arity" : 3, "result_var": "Icv_journal_publisher"}

% The following code snippet and its result will be assigned to local variable Icv_journal_publisher

% Food for thought: How are null values for publishers handled by your rules?
% Do you notice different repair options, depending on whether or not a null value is reported?

% publication(I, A, Y, T, J, V, N, F, L, P).
icv_journal_publisher(J,P1,P2) :- 
    publication(_, _, _, _, J, _, _, _, _, P1),
    publication(_, _, _, _, J, _, _, _, _, P2),
    P1 < P2.


Saving output to local variable Icv_journal_publisher['result']
Saving code snippet to local variable Icv_journal_publisher['code']


#### [3 points] Test 1 for icv_journal_publisher.
The following test compares the output of your icv_journal_publisher rule against the expected output.
You must have run all clingo cells above for the test to pass.

In [8]:
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run to see the expected output with syntax highlighting.
expected_output = '''
icv_journal_publisher(bullmcz,null,publisher1) icv_journal_publisher(bullmcz,publisher1,publisher2) icv_journal_publisher(bullmcz,null,publisher2)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
clingo_evaluate(db_file, Icv_journal_publisher['code'], 'icv_journal_publisher', 3, expected_output)

#### [7 points] Test 2 for icv_journal_publisher.
The following is a hidden test case. It always passes in the student version of the notebook but is actually evaluated after submission.
* We will first add some facts that are hidden from students.
* We will then run the icv_journal_publisher rule against these new facts and check that it still behaves correctly.

In [9]:
# This cell will test the icv_journal_publisher rule with these new facts.
# The contents of this cell are not present in the student version of the assignment.
# This will only be evaluated after submission.


### [10 points] The last page Lp cannot be smaller than the first page Fp.
In DENIAL form, we report the publications whose last page is smaller than their first page.
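A minimal Python sketch of this row-level check, assuming the same 10-tuple layout as the `publication` facts. Unlike the ASP rule, it skips rows where either page is not an integer (e.g. `null`). That is a deliberate choice of this sketch: in clingo, comparing an integer with a symbolic constant such as `null` follows the term order rather than numeric order, so the ASP rule can treat such rows differently. The function name and rows are hypothetical.

```python
def page_violations(publications):
    """Return (id, fp, lp) for every row whose last page is smaller than its
    first page. Assumed tuple layout:
    (id, author, year, title, journal, vol, no, fp, lp, publisher)."""
    violations = set()
    for row in publications:
        pid, fp, lp = row[0], row[7], row[8]
        # Compare only when both pages are integers; rows with a null page
        # are skipped instead of being ordered against integers.
        if isinstance(fp, int) and isinstance(lp, int) and lp < fp:
            violations.add((pid, fp, lp))
    return violations


# Hypothetical toy data.
rows = [
    (1, "a1", 2000, "t1", "ja", 1, 1, 91, 9, "pub1"),      # lp < fp: violation
    (2, "a2", 2001, "t2", "ja", 1, 2, 10, 20, "pub1"),     # consistent
    (3, "a3", 2002, "t3", "ja", 2, 1, 5, "null", "pub1"),  # null last page: skipped
]
print(page_violations(rows))
```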

In [10]:
%%clingo {"predicate" : "icv_firstpage_lastpage", "predicate_arity" : 3, "result_var": "Icv_firstpage_lastpage"}

% The following code snippet and its result will be assigned to local variable Icv_firstpage_lastpage

% Change the following expression.

% publication(I, A, Y, T, J, V, N, F, L, P).
icv_firstpage_lastpage(ID,F,L) :- 
    publication(ID, _, _, _, _, _, _, F, L, _),
    F > L.



Saving output to local variable Icv_firstpage_lastpage['result']
Saving code snippet to local variable Icv_firstpage_lastpage['code']



#### [3 points] Test 1 for icv_firstpage_lastpage.
The following test compares the output of your icv_firstpage_lastpage rule against the expected output.
You must have run all clingo cells above for the test to pass.

In [11]:
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run to see the expected output with syntax highlighting.
expected_output = '''
icv_firstpage_lastpage(6755,91,9) icv_firstpage_lastpage(4407,10,1)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
clingo_evaluate(db_file, Icv_firstpage_lastpage['code'], 'icv_firstpage_lastpage', 3, expected_output)



#### [7 points] Test 2 for icv_firstpage_lastpage.
The following is a hidden test case. It always passes in the student version of the notebook but is actually evaluated after submission.
* We will first add some facts that are hidden from students.
* We will then run the icv_firstpage_lastpage rule against these new facts and check that it still behaves correctly.

In [12]:
# This cell will test the icv_firstpage_lastpage rule with these new facts.
# The contents of this cell are not present in the student version of the assignment.
# This will only be evaluated after submission.


### [10 points] Inclusion Dependency: Every cited publication in CITES also occurs in PUBLICATION.

In DENIAL form, we report the cited publications that appear in CITES but not in PUBLICATION.
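An inclusion dependency check is a set difference: cited IDs minus known publication IDs. The minimal Python sketch below assumes `cites` is a list of `(citing_id, cited_id)` pairs matching `cites(Pid1, Pid2)` and that publications are tuples whose first field is the ID; the function name and toy data are hypothetical.

```python
def dangling_citations(publications, cites):
    """Return the IDs cited in CITES that have no row in PUBLICATION.

    cites holds (citing_id, cited_id) pairs, matching cites(Pid1, Pid2).
    """
    known_ids = {row[0] for row in publications}
    # The set difference plays the role of "not cited_publication(P2)" in the ASP rule.
    return {cited for _citing, cited in cites} - known_ids


# Hypothetical toy data; only the ID column of each publication row is used.
rows = [
    (1, "a1", 2000, "t1", "ja", 1, 1, 1, 5, "pub1"),
    (2, "a2", 2001, "t2", "ja", 1, 2, 6, 9, "pub1"),
]
cites = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(dangling_citations(rows, cites))
```

Note that publication 3 cites publication 4 even though 3 itself is missing; only the cited side matters for this constraint.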

In [13]:
%%clingo {"predicate" : "icv_cited_publication", "predicate_arity" : 1, "result_var": "Icv_cited_publication"}

% The following code snippet and its result will be assigned to local variable Icv_cited_publication

% Change the following expression.
% (Inclusion Dependency): Every cited publication in CITES also occurs in PUBLICATION.

% cites(Pid1, Pid2) says that Pid1 is citing Pid2, i.e., Pid2 is cited.
cited_publication(P2) :- publication(P2, _, _, _, _, _, _, _, _, _), cites(_, P2).
icv_cited_publication(P2) :- cites(_, P2), not cited_publication(P2).


Saving output to local variable Icv_cited_publication['result']
Saving code snippet to local variable Icv_cited_publication['code']


#### [3 points] Test 1 for icv_cited_publication.
The following test compares the output of your icv_cited_publication rule against the expected output.
You must have run all clingo cells above for the test to pass.

In [14]:
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run to see the expected output with syntax highlighting.
expected_output = '''
icv_cited_publication(2020) icv_cited_publication(3799)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
clingo_evaluate(db_file, Icv_cited_publication['code'], 'icv_cited_publication', 1, expected_output)


#### [7 points] Test 2 for icv_cited_publication.
The following is a hidden test case. It always passes in the student version of the notebook but is actually evaluated after submission.
* We will first add some facts that are hidden from students.
* We will then run the icv_cited_publication rule against these new facts and check that it still behaves correctly.

In [15]:
# This cell will test the icv_cited_publication rule with these new facts.
# The contents of this cell are not present in the student version of the assignment.
# This will only be evaluated after submission.


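### [10 points] If P1 cites P2, then P2's year of publication cannot be greater than P1's.

In DENIAL form, we report each citing publication P1 and cited publication P2, together with their years Y1 and Y2, where Y2 > Y1.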
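The check joins CITES with PUBLICATION twice, once for each side of the citation. A minimal Python sketch follows, assuming publications are 10-tuples with the year in the third field and `cites` is a list of `(citing_id, cited_id)` pairs. Because a publication ID can carry several years when the key constraint is violated, years are collected per ID, much as the ASP join matches every row. The function name and toy data are hypothetical.

```python
def citation_year_violations(publications, cites):
    """Return (p1, p2, y1, y2) where p1 cites p2 but p2's year y2 exceeds p1's year y1.

    Assumed tuple layout: (id, author, year, title, journal, vol, no, fp, lp, publisher).
    """
    years = {}
    for row in publications:
        years.setdefault(row[0], set()).add(row[2])

    violations = set()
    for p1, p2 in cites:
        # A citation whose endpoints are unknown produces no pairs, like a failed join.
        for y1 in years.get(p1, set()):
            for y2 in years.get(p2, set()):
                # Only integer years are compared in this sketch.
                if isinstance(y1, int) and isinstance(y2, int) and y2 > y1:
                    violations.add((p1, p2, y1, y2))
    return violations


# Hypothetical toy data: publication 10 (1950) cites 20 (1980), which is a violation.
rows = [
    (10, "a1", 1950, "t1", "ja", 1, 1, 1, 5, "pub1"),
    (20, "a2", 1980, "t2", "ja", 1, 2, 6, 9, "pub1"),
    (30, "a3", 1900, "t3", "ja", 2, 1, 1, 4, "pub1"),
]
cites = [(10, 20), (10, 30), (20, 99)]
print(citation_year_violations(rows, cites))
```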

In [16]:
%%clingo {"predicate" : "icv_p1_greater_p2", "predicate_arity" : 4, "result_var": "Icv_p1_greater_p2"}

% The following code snippet and its result will be assigned to local variable Icv_p1_greater_p2

% Change the following expression.
icv_p1_greater_p2(P1,P2,Y1,Y2) :- 
    publication(P1, _, Y1, _, _, _, _, _, _, _), 
    publication(P2, _, Y2, _, _, _, _, _, _, _), 
    cites(P1, P2), Y2 > Y1.

Saving output to local variable Icv_p1_greater_p2['result']
Saving code snippet to local variable Icv_p1_greater_p2['code']


#### [3 points] Test 1 for icv_p1_greater_p2.
The following test compares the output of your icv_p1_greater_p2 rule against the expected output.
You must have run all clingo cells above for the test to pass.

In [17]:
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run to see the expected output with syntax highlighting.
expected_output = '''
icv_p1_greater_p2(2044,2580,1934,1962)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/publications_base.lp')
clingo_evaluate(db_file, Icv_p1_greater_p2['code'], 'icv_p1_greater_p2', 4, expected_output)

#### [7 points] Test 2 for icv_p1_greater_p2.
The following is a hidden test case. It always passes in the student version of the notebook but is actually evaluated after submission.
* We will first add some facts that are hidden from students.
* We will then run the icv_p1_greater_p2 rule against these new facts and check that it still behaves correctly.

In [18]:
# This cell will test the icv_p1_greater_p2 rule with these new facts.
# The contents of this cell are not present in the student version of the assignment.
# This will only be evaluated after submission.
