# Naive Bayes Implementation on SPECT Data From Scratch

In [6]:
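# Parse a comma-separated text file and return a 2D list of ints

def parseTxtSPECT(filename):
    # Use a context manager so the file is closed after reading
    with open(filename) as f:
        lines = f.read().split("\n")
    dataset = []
    for line in lines:
        # Skip blank lines (e.g. from a trailing newline), which would break int()
        if not line.strip():
            continue
        dataset.append([int(value) for value in line.split(",")])
    return dataset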


Since the features are binary, a Bernoulli Naive Bayes Classifier (**BNBC**) is the natural choice. I will build a Bernoulli naive Bayes classifier from scratch.

First, I split the training dataset by class and calculate the statistics the BNBC needs, i.e. the per-class means of all features and the class prior probabilities. We assume all features are conditionally independent given the class.
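
As a sketch of the model this describes (the symbols are my own notation, not taken from the original): let $x_j \in \{0, 1\}$ be the $j$-th feature, $y$ the class, and $p_{yj}$ the mean of feature $j$ within class $y$, which serves as the estimate of $P(x_j = 1 \mid y)$. Under the conditional independence assumption, the posterior is proportional to the prior times a product of Bernoulli terms, and the prediction is the class with the largest value:

$$
P(y \mid x) \;\propto\; P(y) \prod_{j=1}^{d} p_{yj}^{\,x_j} \, (1 - p_{yj})^{\,1 - x_j},
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{j=1}^{d} p_{yj}^{\,x_j} \, (1 - p_{yj})^{\,1 - x_j}
$$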

In [7]:
# the first column is the response

def sepByClass(dataset):
    dictMatch = {}
    for i in range(len(dataset)):
        row = dataset[i]
        if row[0] not in dictMatch:
            dictMatch[row[0]] = []
        dictMatch[row[0]].append(row[1:])
    return dictMatch

def priorProb(trainset):
    trainSep = sepByClass(trainset)
    probClass = {}
    for key in trainSep.keys():
        probClass[key] = float(len(trainSep[key])/len(trainset))
    return probClass


# conditional mean

def mean(numbers):
    return float(sum(numbers)/len(numbers))

def meanCol(trainset):
    means = [mean(attribute) for attribute in zip(*trainset)]
    return means
    
def meanByClass(trainset):
    trainSep = sepByClass(trainset)
    priorProbs = priorProb(trainset)
    meanPerFeature = {}
    for y, features in trainSep.items():
        meanPerFeature[y] = meanCol(features)
    return meanPerFeature, priorProbs
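
To make the fitted statistics concrete, here is a minimal, self-contained sketch on a made-up four-row dataset. The function name `fit_bernoulli_nb` and the toy data are hypothetical and only for illustration; the function condenses the same grouping, prior, and per-class mean steps into one place and returns them in the same `(means, priors)` order.

```python
from collections import defaultdict

# Toy rows (invented for illustration): first column is the class label,
# the remaining columns are binary features.
toy_train = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

def fit_bernoulli_nb(rows):
    """Return (per-class feature means, class priors) for label-first rows."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[0]].append(row[1:])
    # Prior of a class = share of training rows with that label
    priors = {y: len(feats) / len(rows) for y, feats in by_class.items()}
    # Mean of a binary column = estimated P(feature = 1 | class)
    means = {
        y: [sum(col) / len(col) for col in zip(*feats)]
        for y, feats in by_class.items()
    }
    return means, priors

means, priors = fit_bernoulli_nb(toy_train)
print(priors)  # {1: 0.5, 0: 0.5}
print(means)   # {1: [1.0, 0.5], 0: [0.0, 0.5]}
```

Note that the first feature has a mean of exactly 1.0 in class 1 and 0.0 in class 0, so any row that disagrees with its class on that feature gets a likelihood of zero for that class.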


Based on the fitted statistics, the posterior probability of every class is calculated (up to a normalizing constant), and we pick the class that maximizes it.

In addition, a function is written to apply the classifier to every row of the SPECT test dataset.

In [8]:
# conditional prob is bernoulli
def Bernoulli(x, p):
    return p**x * (1-p)**(1-x)


# Calculate the (unnormalized) posterior probs
def testProb(NBFit, testFeatures):
    priorProbs = NBFit[1]
    probs = {}
    for y, meanFeature in NBFit[0].items():
        probs[y] = priorProbs[y]
        for i in range(len(meanFeature)):
            probs[y] *= Bernoulli(testFeatures[i], meanFeature[i])
    return probs


# Use the probs to predict the response of testset
def bernoulliNBPred(NBFit, testFeatures):
    probs = testProb(NBFit, testFeatures)
    # probPred: final predicted prob
    classPred, probPred = None, 0
    for y, prob in probs.items():
        if classPred is None or prob > probPred:
            probPred = prob
            classPred = y
    return classPred

def bernoulliNBPredTestset(NBFit, testset):
    classPreds = []
    for i in range(len(testset)):
        testFeatures = testset[i][1:]
        classPreds.append(bernoulliNBPred(NBFit, testFeatures))
    return classPreds


def accuracyNB(predTest, testset):
    correctPred = 0
    for i in range(len(testset)):
        if predTest[i] == testset[i][0]:
            correctPred += 1
    return correctPred/len(predTest)
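
One caveat with multiplying raw probabilities: a class-conditional mean of exactly 0 or 1 drives the whole product to zero for any test row that disagrees on that feature, and with many features the product can underflow. A common remedy, sketched below, is additive (Laplace) smoothing of the means combined with summing log-probabilities. This is a variant and not what the code above does; the names `smoothed_mean`, `log_posterior`, `predict_log` and the parameter `alpha` are hypothetical.

```python
import math

def smoothed_mean(column, alpha=1.0):
    """Additive smoothing: keeps the estimate strictly inside (0, 1)."""
    return (sum(column) + alpha) / (len(column) + 2 * alpha)

def log_posterior(prior, feature_probs, features):
    """Unnormalized log posterior: log P(y) + sum_j log Bernoulli(x_j; p_j)."""
    total = math.log(prior)
    for x, p in zip(features, feature_probs):
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def predict_log(fit, features):
    """Pick the class with the largest log posterior; fit = (means, priors)."""
    means, priors = fit
    scores = {y: log_posterior(priors[y], means[y], features) for y in means}
    return max(scores, key=scores.get)

# Toy fit (invented): two classes, two binary features, two rows per class
fit = (
    {
        1: [smoothed_mean([1, 1]), smoothed_mean([0, 1])],
        0: [smoothed_mean([0, 0]), smoothed_mean([1, 0])],
    },
    {1: 0.5, 0: 0.5},
)
print(predict_log(fit, [1, 0]))  # 1
print(predict_log(fit, [0, 1]))  # 0
```

Because the logarithm is monotonic, the argmax over log posteriors selects the same class as the argmax over the raw products whenever the latter are nonzero; smoothing, however, changes the estimates themselves and may change predictions.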


Finally, we combine everything into a single function that takes the training and test datasets as inputs and returns the predictions and the accuracy.

In [9]:
## A final bernoulli naive bayesian classifier for SPECT dataset
def bernoulliNBClassifier(trainset, testset):
    bernoulliNBFit = meanByClass(trainset)
    predTest = bernoulliNBPredTestset(bernoulliNBFit, testset)
    accu = accuracyNB(predTest, testset)
    return predTest,accu



We now run the model on the SPECT training and test files.

In [10]:
Strain = parseTxtSPECT("SPECTtrain.txt")
Stest = parseTxtSPECT("SPECTtest.txt")
predictionsTest, accuracy = bernoulliNBClassifier(Strain, Stest)
print("""The predictions of test dataset and accuracy for a BNBC on SPECT dataset is \n [%s] \n
and %.3f""" %(predictionsTest, accuracy))


The predictions of test dataset and accuracy for a BNBC on SPECT dataset is 
 [[1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0]] 

and 0.775
