# Logic Programming with the MARS Output

MARS outputs rules weighted by confidence scores, so we can turn that output into a logic program and use it to estimate how likely each discovered MoA is.

In [1]:
import scallopy
import json
import os
from collections import defaultdict
import ast

In [2]:
ctx = scallopy.ScallopContext()

Let's start with some example output.

We'll pull from the following directory:

In [3]:
EXAMPLE_DIR = '../data/output/KG_size_experiment/full_KG/PoLo-2'

First, let's get the confidences. Since we ran 5 replicates, we average each rule's confidence score across them.

In [4]:
# Collect each rule's confidence scores across replicates, keyed by the rule's string form
confidences_by_rule = defaultdict(list)

# Walk EXAMPLE_DIR; each replicate subdirectory may hold a 'confidences.txt'
for subdir, _, _ in os.walk(EXAMPLE_DIR):
    confidences_file = os.path.join(subdir, 'confidences.txt')

    # Skip directories that have no 'confidences.txt'
    if os.path.isfile(confidences_file):
        # The file is parsed as JSON despite its .txt extension
        with open(confidences_file, 'r') as f:
            current_dict = json.load(f)
        # Each entry is [confidence, head, body_1, ...]; the key is everything after the confidence
        for lst in current_dict['CtBP']:
            confidences_by_rule[str(lst[1:])].append(float(lst[0]))

# Average each rule's confidence over the replicates in which it appears
averaged_dict = {key: sum(values) / len(values) for key, values in confidences_by_rule.items()}
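
To make the averaging concrete, here is a small self-contained sketch on made-up data. The `replicates` list and its scores are hypothetical; only the entry layout (confidence first, then the rule's head and body relations) is taken from the cell above.

```python
from collections import defaultdict

# Hypothetical contents of two replicates' confidences files (made-up scores).
replicates = [
    {"CtBP": [["0.75", "CtBP", "CdG", "GpBP"], ["0.125", "CtBP", "CuG", "GpBP"]]},
    {"CtBP": [["0.25", "CtBP", "CdG", "GpBP"]]},
]

# Gather every score seen for a rule, keyed by the stringified rule.
scores = defaultdict(list)
for rep in replicates:
    for entry in rep["CtBP"]:
        scores[str(entry[1:])].append(float(entry[0]))

# Average over the replicates in which the rule was found.
averaged = {rule: sum(vals) / len(vals) for rule, vals in scores.items()}

for rule, value in averaged.items():
    print(rule, value)
```

Note that a rule missing from a replicate is averaged only over the replicates where it appears, not over all 5. If a missing rule should count as a confidence of 0, the divisor would have to be the number of replicates instead.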

## Adding the Relations

Each of these rules refers to relations that also need to be declared in the scallopy logic program. First, we collect the distinct relation names:

In [5]:
relations = set()

for key in averaged_dict.keys():
    as_list = ast.literal_eval(key)  # Keys are stringified lists, so parse them back
    # Scallop doesn't like leading underscores, so move the underscore to the end ('_GiG' -> 'GiG_')
    as_list = [i[1:] + '_' if i.startswith('_') else i for i in as_list]
    relations.update(as_list)

In [6]:
relations

{'CdG', 'CtBP', 'CuG', 'GiG', 'GiG_', 'GpBP', 'GpBP_', 'NO_OP'}

In [7]:
# Add these relations to the scallopy program:
for relation in relations:
    ctx.add_relation(relation, (str, str))  # Add them as binary relations

## Adding the Probabilistic Rules

Now, we'll add those rules in:

In [8]:
for key, val in averaged_dict.items():
    key_as_list = ast.literal_eval(key)
    key_as_list = [i[1:] + '_' if i.startswith('_') else i for i in key_as_list]  # Scallop doesn't like leading underscores
    # The list is [head, body_1, ..., body_n]; a chain of n body atoms needs n + 1 variables (a, b, c, ...)
    variables = [chr(i + 97) for i in range(len(key_as_list))]
    # Chain the body atoms so that each atom's second variable is the next atom's first:
    body = ' and '.join(f"{key_as_list[i]}({variables[i-1]}, {variables[i]})" for i in range(1, len(key_as_list)))
    rule = f"{key_as_list[0]}({variables[0]}, {variables[-1]}) = {body}"
    print(rule)
    ctx.add_rule(rule)

CtBP(a, c) = CdG(a, b) and GpBP(b, c)
CtBP(a, c) = CuG(a, b) and GpBP(b, c)
CtBP(a, d) = CtBP(a, b) and GpBP_(b, c) and GpBP(c, d)
CtBP(a, d) = CdG(a, b) and GiG_(b, c) and GpBP(c, d)
CtBP(a, d) = CdG(a, b) and GiG(b, c) and GpBP(c, d)
CtBP(a, d) = CuG(a, b) and GiG_(b, c) and GpBP(c, d)
CtBP(a, d) = CuG(a, b) and GiG(b, c) and GpBP(c, d)
CtBP(a, e) = CtBP(a, b) and GpBP_(b, c) and GiG_(c, d) and GpBP(d, e)
CtBP(a, e) = CtBP(a, b) and GpBP_(b, c) and GiG(c, d) and GpBP(d, e)
CtBP(a, e) = CdG(a, b) and GpBP(b, c) and GpBP_(c, d) and GpBP(d, e)
CtBP(a, e) = CdG(a, b) and GiG_(b, c) and GiG_(c, d) and GpBP(d, e)
CtBP(a, e) = CdG(a, b) and GiG_(b, c) and GiG(c, d) and GpBP(d, e)
CtBP(a, e) = CdG(a, b) and GiG(b, c) and GiG_(c, d) and GpBP(d, e)
CtBP(a, e) = CdG(a, b) and GiG(b, c) and GiG(c, d) and GpBP(d, e)
CtBP(a, e) = CuG(a, b) and GpBP(b, c) and GpBP_(c, d) and GpBP(d, e)
CtBP(a, e) = CuG(a, b) and GiG_(b, c) and GiG_(c, d) and GpBP(d, e)
CtBP(a, e) = CuG(a, b) and GiG_(b, c) and GiG(
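
The string construction in the loop above can be isolated as a small pure function, which makes it easy to check on a single rule without touching the Scallop context. This is a sketch; the names `scallop_name` and `build_rule` are illustrative and not part of scallopy.

```python
def scallop_name(relation):
    """Move a leading underscore to the end, e.g. '_GiG' -> 'GiG_'."""
    return relation[1:] + "_" if relation.startswith("_") else relation


def build_rule(relations):
    """Turn [head, body_1, ..., body_n] into a chain rule over variables a, b, c, ..."""
    names = [scallop_name(r) for r in relations]
    head, body_atoms = names[0], names[1:]
    # n body atoms chain n + 1 variables; single letters cover up to 25 atoms.
    variables = [chr(ord("a") + i) for i in range(len(names))]
    body = " and ".join(
        f"{atom}({variables[i]}, {variables[i + 1]})"
        for i, atom in enumerate(body_atoms)
    )
    return f"{head}({variables[0]}, {variables[-1]}) = {body}"


rule = build_rule(["CtBP", "CdG", "_GiG", "GpBP"])
print(rule)
```

The head always spans the first and last variable, so every rule describes a path from the head's first argument to its second.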

## TODO

It's not yet clear to me how to incorporate the averaged confidences into the rules. The loop above adds every rule unweighted and never uses `val`.
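
One option worth investigating is whether `add_rule` accepts an optional `tag` argument that could carry the averaged confidence. To my knowledge it does, and tags only take effect when the context is created with a probabilistic provenance rather than the default, but both points are assumptions to verify against the installed scallopy version. The sketch below uses a stand-in `RecordingContext` (hypothetical, not part of scallopy) so that it runs on its own; `add_weighted_rules` is likewise an illustrative name, and the confidences are made up.

```python
class RecordingContext:
    """Stand-in for a scallopy context; it only records add_rule calls."""

    def __init__(self):
        self.rules = []

    def add_rule(self, rule, tag=None):
        # Assumption: the real add_rule accepts a `tag` keyword like this.
        self.rules.append((rule, tag))


def add_weighted_rules(ctx, rule_confidences):
    """Add each rule to ctx with its averaged confidence as the tag."""
    for rule, confidence in rule_confidences.items():
        ctx.add_rule(rule, tag=confidence)


# Made-up confidences for the first two rules printed earlier.
rule_confidences = {
    "CtBP(a, c) = CdG(a, b) and GpBP(b, c)": 0.5,
    "CtBP(a, c) = CuG(a, b) and GpBP(b, c)": 0.125,
}

stub = RecordingContext()
add_weighted_rules(stub, rule_confidences)
print(stub.rules)
```

With scallopy itself, `stub` would be replaced by a context created with something like `scallopy.ScallopContext(provenance="minmaxprob")`, again assuming that provenance name is available.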

## Run the Program

In [9]:
ctx.run()

In [10]:
print(list(ctx.relation("CtBP")))

[]


The result is empty because we haven't grounded the program yet: no facts have been added for any of the relations.
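
Grounding means supplying facts for the body relations before the program is run. Below is a minimal sketch, assuming the knowledge graph is available as `(head, relation, tail)` triples and that a trailing-underscore relation such as `GpBP_` is the inverse of `GpBP`. Both are assumptions, and the entity names are placeholders. The helper `facts_by_relation` (an illustrative name) only groups triples into per-relation lists of pairs; handing those lists to scallopy is left as a comment because the exact call should be checked against the installed version.

```python
from collections import defaultdict


def facts_by_relation(triples, declared):
    """Group (head, relation, tail) triples into {relation: [(head, tail), ...]}.

    Only relations in `declared` are kept. For each triple, the reversed pair
    is also stored under 'REL_' when that name is declared (assumed inverse).
    """
    facts = defaultdict(list)
    for head, relation, tail in triples:
        if relation in declared:
            facts[relation].append((head, tail))
        inverse = relation + "_"
        if inverse in declared:
            facts[inverse].append((tail, head))
    return dict(facts)


# Relation names as displayed in the notebook output.
declared = {"CdG", "CtBP", "CuG", "GiG", "GiG_", "GpBP", "GpBP_", "NO_OP"}

# Hypothetical triples; the entity names are placeholders.
triples = [
    ("compound_1", "CdG", "gene_1"),
    ("gene_1", "GpBP", "process_1"),
]

facts = facts_by_relation(triples, declared)
# With scallopy, each list would then be added before running, roughly:
#     for relation, pairs in facts.items():
#         ctx.add_facts(relation, pairs)
print(facts)
```

With these two placeholder facts, the rule `CtBP(a, c) = CdG(a, b) and GpBP(b, c)` would have a matching path from `compound_1` to `process_1`.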