# LAB2

In lab2, we will guide you through the basic Scallop Python API, called scallopy.
In this tutorial, you will learn:
1. How to construct and execute a Scallop program in Python using scallopy.
2. How to import a .scl file into Python and interact with it through scallopy.
3. How to perform a learning task in Python with Scallop.

## Hello Scallopy

Let's write our first hello world Python program. You can do this purely through the scallopy interface. 

In [None]:
# We start from constructing a scallopy context
import scallopy
ctx = scallopy.ScallopContext()

# We declare a relation type using 'add_relation'.
# This is equivalent to 'type hello(String)' in a .scl file
ctx.add_relation("hello", str)

# We add the fact hello("Hello World") to the scallopy context 
ctx.add_facts("hello", [("Hello World",)])

# We can execute the context through 'run'
ctx.run()
print(list(ctx.relation("hello")))

We can also import an scl file to the scallopy context, as shown in the example below. 

In [None]:
# We define a helper function here to inspect what the scl file looks like
def print_scl_file(scl_path):
    print("---------- scl file -----------")
    !cat $scl_path
    print("-------------------------------")
    print()

In [None]:
# Setup scl file directory
import os
import scallopy

scl_dir = os.path.abspath(os.path.join(os.path.abspath("__file__"), "../scl"))
scl_path = os.path.join(scl_dir, "hello_world.scl")
print_scl_file(scl_path)

ctx = scallopy.ScallopContext()

# Import and run the scl file
ctx.import_file(scl_path)
ctx.run()
print(list(ctx.relation('hello')))
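
The `scl_dir` expression above can look puzzling, because `"__file__"` is a plain string literal there, not the `__file__` variable (which is not defined inside a notebook). The small check below, using only the standard library, illustrates that the nested expression resolves to the `scl` folder inside the current working directory. The variable names `nested` and `direct` are ours, not part of the lab.

```python
import os

# The expression used in the lab: "__file__" is treated as a file name
# inside the current working directory, and "../scl" steps back out of it.
nested = os.path.abspath(os.path.join(os.path.abspath("__file__"), "../scl"))

# A shorter expression that points at the same folder.
direct = os.path.abspath("scl")

print(nested)
print(nested == direct)
```

This means the notebook has to be started from the directory that contains the `scl` folder for the paths to resolve.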

##### Learning with .scl
Declaring types directly in Python is awkward, so the common practice is to put the type definitions and rules in a .scl file and to use scallopy only for adding the probabilistic facts that are learned during training.
This gives us the following benefits:
 - better efficiency
 - better error messages
 - better type compatibility
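
A minimal sketch of this split, assuming scallopy is installed and that the `minmaxprob` provenance accepts facts tagged as `(probability, tuple)`. The program text, the relation `is_two`, the file name, and the helper `run_demo` are hypothetical and exist only for this illustration. The helper returns `None` when scallopy cannot be imported.

```python
import os
import tempfile

# Hypothetical program: the types and the rule live in the .scl source.
SCL_SOURCE = """
type digit(i: usize, d: i32)
rel is_two(i) = digit(i, 2)
"""

def run_demo():
    """Write the program to a temporary .scl file, import it, and add
    probabilistic facts from Python. Returns None without scallopy."""
    try:
        import scallopy
    except ImportError:
        return None

    with tempfile.TemporaryDirectory() as tmp:
        scl_path = os.path.join(tmp, "is_two.scl")
        with open(scl_path, "w") as f:
            f.write(SCL_SOURCE)

        # Assumption: a probabilistic provenance is chosen at construction
        ctx = scallopy.ScallopContext(provenance="minmaxprob")
        ctx.import_file(scl_path)

        # Each fact is (probability, tuple); the probabilities are made up
        ctx.add_facts("digit", [(0.9, (0, 2)), (0.1, (0, 7)), (0.8, (1, 3))])
        ctx.run()
        return list(ctx.relation("is_two"))

print(run_demo())
```

In a learning setup the probabilities would come from a neural network instead of being written by hand.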



## MNIST
<div>
  <img src="img/mnist_example.png" width="300"/>
</div>

### P1. Count 2
- In this Jupyter notebook: using scallopy, write a relation `digit(i, d)`, where `i` is the image id and `d` is the numerical value of the corresponding image.
- In the scl file: write a rule in `scl/mnist_count_2.scl` that counts the number of 2s in the image.
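
To check your Scallop rule, it can help to know what answer to expect. The plain-Python sketch below computes the same count over a list of `digit(i, d)` facts. The fact values and the helper name `count_twos` are made up for illustration and are not part of the lab files.

```python
# Hypothetical digit(i, d) facts: (image id, digit value)
digit_facts = [(0, 2), (1, 7), (2, 2), (3, 5)]

def count_twos(facts):
    """Count how many image ids are labeled with the digit 2."""
    return sum(1 for (_image_id, d) in facts if d == 2)

print(count_twos(digit_facts))  # prints 2
```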

In [None]:
import os
import scallopy

scl_dir = os.path.abspath(os.path.join(os.path.abspath("__file__"), "../scl"))
scl_path = os.path.join(scl_dir, "mnist_count_2.scl")
print_scl_file(scl_path)

# We import the type definition and relations from the scl file 
ctx = scallopy.ScallopContext()
ctx.import_file(scl_path)

# TODO: replace these placeholder facts with your digit(i, d) facts for the image
ctx.add_facts("object_color", [(0, "red"), (1, "green"), (2, "green")])

ctx.run()
# TODO: replace "green_obj" with the name of the counting relation in your scl file
print(list(ctx.relation("green_obj")))

### Learning in Scallop

##### MNIST Sum 3
Given three MNIST images and their sum, can we learn what each of the three digits is separately?
<div>
  <img src="img/mnist_example.png" width="300"/>
</div>

**Step 1** Get familiar with the MNIST sum 3 dataloader.
We have prepared two dataloaders for you, one is for training and the other is for testing.

In [None]:
from MNIST_sum_3_dataloader import mnist_sum_3_loader
import matplotlib.pyplot as plt
import torch, random

# Ensure we get reproducible results during implementation and debugging
seed = 1234
torch.manual_seed(seed)
random.seed(seed)

# Set batch size here
batch_size_train = 8
batch_size_test = 8
train_loader, test_loader = mnist_sum_3_loader(batch_size_train, batch_size_test)

# Let's take a look at the dataset
print(f"The number of training batches is: {len(train_loader)}.")
for (x, y) in train_loader:
    # The dataloader gives you batches of three MNIST images and their sum
    (a_imgs, b_imgs, c_imgs), sums = (x, y)
    print(a_imgs.shape)

    # We can peek at the first MNIST image in the batch
    imgplot = plt.imshow(a_imgs[0].reshape(28, 28), cmap='gray')
    plt.show()
    break

**Step 2** Construct a classifier `MNISTNet` that takes in an MNIST image and returns a tensor of probabilities over the digits 0 to 9. Here is a link to a tutorial: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

In [None]:
from torch import nn
import torch.nn.functional as F

class MNISTNet(nn.Module):
  def __init__(self):
    super(MNISTNet, self).__init__()
    # TODO
    pass

  def forward(self, x):
    # TODO
    pass
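
The output of `MNISTNet` should be a probability distribution, so the raw scores of the last layer are usually passed through a softmax. The sketch below shows the computation in plain Python for one image. The logit values are made up, and in PyTorch you would typically call `F.softmax(x, dim=1)` on the batch instead of writing this by hand.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the digits 0 to 9
logits = [0.1, 2.0, -1.0, 0.0, 0.5, 0.0, 0.0, 3.0, 0.0, -0.5]
probs = softmax(logits)

# The predicted digit is the index with the highest probability
predicted = max(range(10), key=lambda d: probs[d])
print(predicted)  # prints 7
```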


**Step 3** Construct a classifier `MNISTSum3Net` that takes in three MNIST images and returns a tensor holding the distribution of their sum over 0 to 27.

In [None]:
import scallopy
class MNISTSum3Net(nn.Module):
  def __init__(self, provenance, k):
    super(MNISTSum3Net, self).__init__()
    # TODO: Initialize the neural network here. It should include:
    #       1. MNISTNet
    #       2. Scallop program
    #       3. Forward function
    pass

  def forward(self, x):
    # TODO: Write the forward function for MNISTSum3Net
    pass
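
To see what the output of `MNISTSum3Net` should look like, the plain-Python sketch below computes the distribution of the sum from three digit distributions by enumerating every combination, assuming the three images are independent. This is exact enumeration, not the Scallop program you are asked to write, and a provenance that keeps only the top-k proofs may approximate it. The helper names `sum3_distribution` and `one_hot` are ours.

```python
from itertools import product

def sum3_distribution(p_a, p_b, p_c):
    """Distribution over a + b + c (0..27), given three digit distributions."""
    dist = [0.0] * 28
    for a, b, c in product(range(10), repeat=3):
        dist[a + b + c] += p_a[a] * p_b[b] * p_c[c]
    return dist

def one_hot(d):
    """A distribution that is certain the digit is d."""
    return [1.0 if i == d else 0.0 for i in range(10)]

# The first two images are certainly 1 and 2, the third one is unknown
uniform = [0.1] * 10
dist = sum3_distribution(one_hot(1), one_hot(2), uniform)

# The sums 3..12 each receive probability 0.1, all others receive 0
print([round(p, 2) for p in dist])
```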

**Step 4** Set up the trainer and the loss function. We will use the BCE loss function for training the model.

In [None]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm

def bce_loss(output, ground_truth):
  (_, dim) = output.shape
  gt = torch.stack([torch.tensor([1.0 if i == t else 0.0 for i in range(dim)]) for t in ground_truth])
  return F.binary_cross_entropy(output, gt)

class Trainer():
  def __init__(self, train_loader, test_loader, learning_rate, k, provenance):
    self.network = MNISTSum3Net(provenance, k)
    self.optimizer = optim.Adam(self.network.parameters(), lr=learning_rate)
    self.train_loader = train_loader
    self.test_loader = test_loader
    self.loss = bce_loss

  def train_epoch(self, epoch):
    self.network.train()
    iter = tqdm(self.train_loader, total=len(self.train_loader))
    train_loss = 0
    correct = 0

    for data_ct, (data, target) in enumerate(iter):
      self.optimizer.zero_grad()
      output = self.network(data)

      loss = self.loss(output, target)
      loss.backward()
      self.optimizer.step()
      train_loss += loss.item()

      pred = output.data.max(1, keepdim=True)[1]
      correct += pred.eq(target.data.view_as(pred)).sum()
      perc = 100. * correct / ((data_ct + 1) * pred.shape[0])
      avg_loss = train_loss / (data_ct + 1)
      iter.set_description(f"[Train Epoch {epoch}] Total loss: {avg_loss:.4f}, Accuracy: {correct}/{(data_ct + 1) * pred.shape[0]} ({perc:.2f}%)")

  def test(self, epoch):
    self.network.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
      iter = tqdm(self.test_loader, total=len(self.test_loader))
      for data_ct, (data, target) in enumerate(iter):
        output = self.network(data)
        test_loss += self.loss(output, target).item()
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).sum()
        perc = 100. * correct / ((data_ct + 1) * pred.shape[0])
        avg_loss = test_loss / (data_ct + 1)
        iter.set_description(f"[Test Epoch {epoch}] Total loss: {avg_loss:.4f}, Accuracy: {correct}/{(data_ct + 1) * pred.shape[0]} ({perc:.2f}%)")

  def train(self, n_epochs):
    self.test(0)
    for epoch in range(1, n_epochs + 1):
      self.train_epoch(epoch)
      self.test(epoch)
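
The `bce_loss` function above turns each ground-truth sum into a one-hot vector and compares it with the predicted distribution. The plain-Python sketch below mirrors that computation for a single sample with mean reduction. The small `eps` clamp is our own simplification to avoid `log(0)`, and the output vectors use 4 classes instead of 28 only to keep the example short.

```python
import math

def one_hot(target, dim):
    """One-hot encode the ground truth, as bce_loss does per sample."""
    return [1.0 if i == target else 0.0 for i in range(dim)]

def binary_cross_entropy(output, gt, eps=1e-12):
    """Mean binary cross entropy between predicted and target vectors."""
    total = 0.0
    for p, y in zip(output, gt):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(output)

# Hypothetical predictions over 4 classes; the true class is 1
good = [0.1, 0.7, 0.1, 0.1]
bad = [0.7, 0.1, 0.1, 0.1]
gt = one_hot(1, 4)

print(binary_cross_entropy(good, gt))
print(binary_cross_entropy(bad, gt))
```

The prediction that puts more mass on the true class gets the lower loss, which is what drives the digit classifier toward the right labels.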

**Step 5** Train the model, and see the performance. :)

In [None]:
n_epochs=10
learning_rate=0.001
provenance="difftopkproofs"
k=3

trainer = Trainer(train_loader, test_loader, learning_rate, k, provenance)
trainer.train(n_epochs)

### CLEVR - color