# The Convolutional Tsetlin Machine

Link to the original paper: https://arxiv.org/pdf/1905.09688v5.pdf

Convolutional neural networks (CNNs) have achieved striking success on important pattern recognition tasks, but they suffer from high computational complexity and a lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy on several benchmarks while keeping the important property of interpretability. The Convolutional Tsetlin Machine (CTM) is an interpretable alternative to CNNs. Whereas the TM classifies an image by applying each clause once to the whole image, the CTM uses each clause as a convolution filter, evaluating it on every patch of the image.
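
To make the difference concrete, here is a minimal sketch of a single conjunctive clause used as a convolution filter on a binarized image. It is an illustration only: the function names and the mask-based clause encoding are assumptions made for this example, and it leaves out parts of the actual CTM such as patch-position literals and learning.

```python
import numpy as np

def clause_matches(patch, must_be_one, must_be_zero):
    """Conjunctive clause: every included pixel literal must hold on the patch."""
    return bool(np.all(patch[must_be_one] == 1) and np.all(patch[must_be_zero] == 0))

def convolve_clause(image, must_be_one, must_be_zero):
    """Evaluate the clause on every patch; output 1 if any patch matches (logical OR)."""
    ph, pw = must_be_one.shape
    height, width = image.shape
    for r in range(height - ph + 1):
        for c in range(width - pw + 1):
            if clause_matches(image[r:r + ph, c:c + pw], must_be_one, must_be_zero):
                return 1
    return 0

# Hypothetical 2x2 clause: left column set, right column clear (a right-hand edge).
must_be_one = np.array([[True, False], [True, False]])
must_be_zero = np.array([[False, True], [False, True]])

image = np.zeros((4, 4), dtype=int)
image[1:3, 1] = 1  # short vertical stroke

edge_found = convolve_clause(image, must_be_one, must_be_zero)
blank_found = convolve_clause(np.zeros((4, 4), dtype=int), must_be_one, must_be_zero)
```

A plain TM clause, by contrast, would be evaluated once, with masks covering the whole 28x28 image.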

Here I have applied the Convolutional Tsetlin Machine to the MNIST problem.

* [Load data](#section-one)
* [Install the pyTsetlinMachineParallel package](#section-two)
* [Unpickle function to load data](#section-three)
* [The Convolutional Tsetlin Machine](#section-four)
* [Plot validation accuracy](#section-five)
* [Evaluate on the test set](#section-six)
* [Save and submit](#section-seven)

<a id="section-one"></a>
## 1. Load data

In [None]:
import os
import numpy as np 
import pandas as pd 

# List input files
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<a id="section-two"></a>
## 2. Install the pyTsetlinMachineParallel package

In [None]:
!pip install pyTsetlinMachineParallel

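# Set the OpenMP thread count for this kernel.
# (`!export` would only affect a temporary subshell, not the notebook process.)
%env OMP_NUM_THREADS=10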

<a id="section-three"></a>
## 3. Unpickle function to load data

In [None]:
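import pickle

def unpickle(file):
    """Load a pickled object (here, the dataset dictionary) from disk."""
    with open(file, 'rb') as fo:
        data_dict = pickle.load(fo, encoding='bytes')  # avoid shadowing the built-in `dict`
    return data_dict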
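
As a quick check of what `unpickle` returns, the sketch below writes a toy dictionary to a temporary file and reads it back. The `'data'` and `'labels'` keys mirror how the dataset file is accessed later in this notebook; the toy values are made up for illustration.

```python
import os
import pickle
import tempfile

def unpickle(file):
    # Same logic as the loader above, repeated so this sketch runs on its own.
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')

# Toy stand-in for the dataset file (values are made up).
toy = {'data': [[0, 255], [128, 64]], 'labels': [3, 7]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.pkl')
    with open(path, 'wb') as fo:
        pickle.dump(toy, fo)
    loaded = unpickle(path)

print(sorted(loaded.keys()))
```

The `encoding='bytes'` argument only matters for pickles written by Python 2, where string keys would come back as `bytes`.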

<a id="section-four"></a>
## 4. The Convolutional Tsetlin Machine

In [None]:
from pyTsetlinMachineParallel.tm import MultiClassTsetlinMachine
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from time import time
from tqdm import tqdm

epochs = 200

# Use the extended QMNIST dataset (120k images) to boost performance
qmnist = unpickle("/kaggle/input/qmnist-the-extended-mnist-dataset-120k-images/MNIST-120k")
data = qmnist['data']
labels = qmnist['labels']

# Alternatively, the conventional MNIST dataset from keras.datasets can be used:
# from keras.datasets import mnist
# (X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Split data
X_train, X_val, Y_train, Y_val = train_test_split(data, labels, test_size=0.05, random_state=31)

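# Data pre-processing: flatten each 28x28 image to 784 features,
# then binarize (pixel value > 75 becomes 1, otherwise 0)
X_train = np.where(X_train.reshape((X_train.shape[0], 28*28)) > 75, 1, 0)
X_val = np.where(X_val.reshape((X_val.shape[0], 28*28)) > 75, 1, 0)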

Y_train = Y_train.flatten()
Y_val = Y_val.flatten()

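# Tsetlin Machine definition: 2000 clauses, threshold T=50, specificity s=10.0.
# Note: MultiClassTsetlinMachine is the standard (non-convolutional) TM, applied here to
# flattened pixels; the package's convolutional variant is MultiClassConvolutionalTsetlinMachine2D.
tm = MultiClassTsetlinMachine(2000, 50, 10.0)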


acc_test = []
print("\nAccuracy over {} epochs:\n".format(epochs))
for i in tqdm(range(epochs)):
    start_training = time()
    tm.fit(X_train, Y_train, epochs=1, incremental=True)
    stop_training = time()

    start_testing = time()
    result = 100*(tm.predict(X_val) == Y_val).mean()
    stop_testing = time()

    #print("#%d Accuracy: %.2f%% Training: %.2fs Testing: %.2fs" % (i+1, result, stop_training-start_training, stop_testing-start_testing))
    acc_test.append(result)  # Record this epoch's validation accuracy (in percent)
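
The pre-processing step in the cell above turns grayscale pixels into Boolean features, which is the kind of input a Tsetlin Machine works on. A minimal sketch on two made-up 2x2 "images" (the pixel values are illustrative; the threshold of 75 is the one used above):

```python
import numpy as np

# Two made-up 2x2 grayscale images with values in 0..255.
images = np.array([[[0, 75], [76, 255]],
                   [[10, 200], [74, 80]]])

# Flatten each image to one row, then threshold: strictly greater than 75 -> 1.
flat = images.reshape((images.shape[0], 2 * 2))
binary = np.where(flat > 75, 1, 0)

print(binary)
```

Note that a pixel exactly equal to 75 maps to 0 because the comparison is strict.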

<a id="section-five"></a>
## 5. Plot validation accuracy

In [None]:
plt.figure(figsize=(10, 5))

plt.title("Accuracy on validation set")
plt.plot(acc_test, label='validation')
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.legend()
plt.show()
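
Besides the plot, it can be handy to report the best epoch directly. A small sketch, using a made-up accuracy list in place of `acc_test` (the name `acc_history` and its values are assumptions for illustration):

```python
# Made-up per-epoch validation accuracies (percent), standing in for acc_test.
acc_history = [96.1, 97.4, 97.9, 97.8, 98.0, 97.9]

# Index of the highest accuracy; the first occurrence wins on ties.
best_epoch = max(range(len(acc_history)), key=acc_history.__getitem__)
best_acc = acc_history[best_epoch]

print("Best validation accuracy: %.2f%% at epoch %d" % (best_acc, best_epoch + 1))
```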

<a id="section-six"></a>
## 6. Evaluate on the test set

In [None]:
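# Load the Kaggle Digit Recognizer test images (one flattened image per row)
X_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

# Binarize with the same threshold used for the training data, then predict
X_test = np.where(X_test.to_numpy() > 75, 1, 0)
Y_test = tm.predict(X_test)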

<a id="section-seven"></a>
## 7. Save and submit

In [None]:
# One row per test image; ImageId is 1-based and derived from the number of predictions
d = {'ImageId': np.arange(1, len(Y_test) + 1), 'Label': Y_test}
df = pd.DataFrame(data=d)

In [None]:
df.to_csv('submission.csv', index=False)