# Naive Bayes

Given $n$ features, $\mathbf{x} = [x_1,\ldots,x_n]$, naive Bayes assumes that the features are **conditionally independent** given the class, and uses this to produce the probabilities over $K$ classes:

$p(C_k|x_1,\ldots,x_n)$

and the independence assumption allows us to produce a simplification relying on Bayes' rule,

$p(C_k|\mathbf{x}) = \frac{p(C_k)p(\mathbf{x}|C_k)}{p(\mathbf{x})} \propto p(C_k,x_1,\ldots,x_n)$   

and the chain rule of probability,

$p(C_k,x_1,\ldots,x_n) = p(x_1,\ldots,x_n,C_k) = p(x_1|x_2,\ldots,x_n,C_k)\,p(x_2,\ldots,x_n,C_k) = p(x_1|x_2,\ldots,x_n,C_k)\,p(x_2|x_3,\ldots,x_n,C_k)\cdots p(x_{n-1}|x_n,C_k)\,p(x_n|C_k)\,p(C_k)$


with the independence assumption

$p(x_i|x_{i+1},\ldots,x_n,C_k) = p(x_i|C_k)$ 

and,

$p(C_k|x_1,\ldots,x_n) \propto p(C_k,x_1,\ldots,x_n) = p(C_k)\,p(x_1|C_k)\,p(x_2|C_k)\,p(x_3|C_k)\cdots p(x_n|C_k) = p(C_k)\prod^n_{i=1}p(x_i|C_k)$

producing,

$p(C_k|x_1,\ldots,x_n) = \frac{1}{Z} p(C_k)\prod^n_{i=1}p(x_i|C_k)$, with $Z = p(\mathbf{x}) = \sum_k p(C_k)p(\mathbf{x}|C_k)$
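As a small numeric illustration of the normalization, the sketch below evaluates $p(C_k)\prod_i p(x_i|C_k)$ for $K=2$ classes and $n=3$ features and then divides by $Z$. The prior and likelihood values are made up for the example and do not come from any dataset; the variable names are likewise only illustrative.

```julia
# Hypothetical numbers for K = 2 classes and n = 3 features (not from real data).
prior = [0.6, 0.4]            # p(C_k) for k = 1, 2
lik = [0.5 0.2 0.1;           # row 1: p(x_i | C_1) for i = 1, 2, 3
       0.3 0.4 0.5]           # row 2: p(x_i | C_2) for i = 1, 2, 3

# Unnormalized posterior: p(C_k) * prod_i p(x_i | C_k), one entry per class.
joint = prior .* vec(prod(lik, dims = 2))

# Z = p(x) is the sum of the unnormalized terms over all classes.
Z = sum(joint)

# Normalized posterior p(C_k | x); the entries sum to 1.
post = joint ./ Z
```

Because $Z$ is the same for every class, dividing by it rescales the scores without changing which class has the largest one.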

The classifier then searches for the class with the maximum posterior,

$\operatorname{argmax}_{k}\; p(C_k) \prod^n_{i=1}p(x_i|C_k)$
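To make the decision rule concrete, here is a minimal from-scratch sketch of a word-count naive Bayes classifier in plain Julia. It is not the TextAnalysis implementation: the helper names (`tokenize`, `train_nb`, `log_scores`, `posterior`, `classify`), the tokenization rule, the add-one smoothing, and the toy training strings are all assumptions made for illustration. The product is evaluated as a sum of logarithms, which gives the same argmax while avoiding numerical underflow for long documents.

```julia
# Illustrative sketch only; every helper below is hypothetical.

# Lowercase the text and split on anything that is not a letter, digit, or apostrophe.
tokenize(text) = String.(split(lowercase(text), r"[^a-z0-9']+"; keepempty = false))

function train_nb(docs, labels)
    classes = unique(labels)
    vocab = unique(vcat([tokenize(d) for d in docs]...))
    # Start every count at 1 (add-one smoothing) so no p(x_i | C_k) is exactly 0.
    counts = Dict(c => Dict(w => 1 for w in vocab) for c in classes)
    ndocs = Dict(c => 0 for c in classes)
    for (d, c) in zip(docs, labels)
        ndocs[c] += 1
        for w in tokenize(d)
            counts[c][w] += 1
        end
    end
    return (classes = classes, vocab = Set(vocab), counts = counts, ndocs = ndocs)
end

# log p(C_k) + sum_i log p(x_i | C_k) for each class (unnormalized log posterior).
function log_scores(model, text)
    total_docs = sum(values(model.ndocs))
    scores = Dict{String,Float64}()
    for c in model.classes
        total_words = sum(values(model.counts[c]))
        s = log(model.ndocs[c] / total_docs)            # log p(C_k)
        for w in tokenize(text)
            w in model.vocab || continue                # ignore words never seen in training
            s += log(model.counts[c][w] / total_words)  # log p(x_i | C_k)
        end
        scores[c] = s
    end
    return scores
end

# Normalize by Z, subtracting the largest log score first for numerical stability.
function posterior(model, text)
    s = log_scores(model, text)
    m = maximum(values(s))
    unnorm = Dict(c => exp(v - m) for (c, v) in s)
    Z = sum(values(unnorm))
    return Dict(c => v / Z for (c, v) in unnorm)
end

# argmax over classes: findmax on a Dict returns (largest value, its key).
classify(model, text) = findmax(log_scores(model, text))[2]

# Toy data, invented for this example.
docs = ["super discount sale", "discount sale this friday", "how are you", "hope you are well"]
labels = ["spam", "spam", "normal", "normal"]
nb = train_nb(docs, labels)
label = classify(nb, "super sale")
probs = posterior(nb, "super sale")
```

Skipping out-of-vocabulary words is one of several reasonable choices; another is to reserve an explicit "unknown" token with its own smoothed count.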

In [3]:
# using the TextAnalysis package
using TextAnalysis: NaiveBayesClassifier, fit!, predict

In [4]:
classes = ["normal","spam"]

2-element Array{String,1}:
 "normal"
 "spam"

In [5]:
model = NaiveBayesClassifier(classes)

NaiveBayesClassifier{String}(String[], ["normal", "spam"], Array{Int64}(undef,0,2))

In [8]:
email1 = "This is Bob, how are you?"
email2 = "Hi Alex, how are you this is Cat."
email3 = "You are being notified that you have a bill approaching."
email4 = "Hello, hope you are doing well, let me know if you want to go to the park, Derek."
email5 = "Amazin discount on a new car only \$5 for this super sale"
email6 = "Don't miss this super discount sale only this Friday"
email7 = "Do you like low prices and sales? This fantastic sale will discount all your favorite items!"
email8 = "This is hurricane season in FL be prepared"

emails = [email1,email2,email3,email4,email5,email6,email7,email8]
email_classes = [classes[1],classes[1],classes[1],classes[1],classes[2],classes[2],classes[2],classes[1]]

8-element Array{String,1}:
 "normal"
 "normal"
 "normal"
 "normal"
 "spam"
 "spam"
 "spam"
 "normal"

In [10]:
modelfit(ind) = fit!(model, emails[ind], email_classes[ind])

map(ii -> modelfit(ii), 1:length(emails))

8-element Array{NaiveBayesClassifier{String},1}:
 NaiveBayesClassifier{String}([",", "how", "you", "is", "This", "are", "?", "Bob", "Hi", "Cat"  …  "items", "prices", "sales", "like", "FL", "prepared", "in", "season", "be", "hurricane"], ["normal", "spam"], [6 1; 3 1; … ; 2 1; 2 1])
 NaiveBayesClassifier{String}([",", "how", "you", "is", "This", "are", "?", "Bob", "Hi", "Cat"  …  "items", "prices", "sales", "like", "FL", "prepared", "in", "season", "be", "hurricane"], ["normal", "spam"], [6 1; 3 1; … ; 2 1; 2 1])
 NaiveBayesClassifier{String}([",", "how", "you", "is", "This", "are", "?", "Bob", "Hi", "Cat"  …  "items", "prices", "sales", "like", "FL", "prepared", "in", "season", "be", "hurricane"], ["normal", "spam"], [6 1; 3 1; … ; 2 1; 2 1])
 NaiveBayesClassifier{String}([",", "how", "you", "is", "This", "are", "?", "Bob", "Hi", "Cat"  …  "items", "prices", "sales", "like", "FL", "prepared", "in", "season", "be", "hurricane"], ["normal", "spam"], [6 1; 3 1; … ; 2 1; 2 1])
 NaiveBayes

In [14]:
new_email1 = "Don't miss the super sale this Saturday only. Last chance ever!"
new_email2 = "Hello, it is Bob, remember that this Tuesday has a football game"

"Hello, it is Bob, remember that this Tuesday has a football game"

In [15]:
predict(model, new_email1)

Dict{String,Float64} with 2 entries:
  "normal" => 0.000486431
  "spam"   => 0.999514

In [16]:
predict(model, new_email2)

Dict{String,Float64} with 2 entries:
  "normal" => 0.984046
  "spam"   => 0.0159545

In [23]:
new_email3 = "Hello, remember that this Tuesday has a football game, so let's go check out the fantastic car sale"

"Hello, remember that this Tuesday has a football game, so let's go check out the fantastic car sale"

In [24]:
predict(model, new_email3)

Dict{String,Float64} with 2 entries:
  "normal" => 0.622803
  "spam"   => 0.377197
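Since `predict` returns a `Dict` mapping each class name to its probability, a final label can be read off with `findmax`. The sketch below copies the probabilities printed above for `new_email3` into a literal `Dict` so that it runs without the package. The `is_spam` helper and its `threshold` keyword are hypothetical additions, shown as one way to demand more confidence before flagging a message.

```julia
# Probabilities copied from the predict(model, new_email3) output above.
scores = Dict("normal" => 0.622803, "spam" => 0.377197)

# findmax on a Dict returns (largest value, corresponding key).
prob, label = findmax(scores)

# Hypothetical helper: flag a message only if p(spam) reaches a chosen threshold.
is_spam(scores; threshold = 0.5) = get(scores, "spam", 0.0) >= threshold

flagged = is_spam(scores)                         # default threshold of 0.5
flagged_lenient = is_spam(scores; threshold = 0.3) # a lower, more aggressive threshold
```

For this mixed message the most probable label is "normal", yet a lower threshold would still flag it, which shows how the threshold trades false positives against missed spam.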