Here we return to the problem we encountered while explaining the perceptron: sometimes there are two distinct groups of data, but each group has outliers, so the groups can't be perfectly separated by a line (or, in higher dimensions, by any linear object such as a hyperplane).

In [None]:
import Pkg
Pkg.activate(joinpath("..", "..", "juMLia")) # Build the path portably instead of hard-coding Windows separators
import MLDatasets: Wine
using Plots, DataFrames, JSON3
wine = Wine()
logregdata = subset(wine.dataframe, :Wine => x -> (x .== 1 .|| x .== 2))[:, [:OD, :Proline, :Wine]]
logregfeatures = logregdata[:, [:OD, :Proline]]

In [None]:
wine1 = subset(logregdata, :Wine => x -> x .== 1) # Only select cultivar 1
wine2 = subset(logregdata, :Wine => x -> x .== 2)
scatter(wine1[:, :OD], wine1[:, :Proline], mc="red")
scatter!(wine2[:, :OD], wine2[:, :Proline], mc="blue")

Once again we have our clearly grouped but linearly inseparable data. This time we don't use any heuristics to comb out the inseparable parts; we just toss the whole thing into the machine.

In [None]:
include("SingleNeuron.jl")

In [None]:
logreglabels = logregdata[:, :Wine] .- 1 # Map cultivars 1 and 2 to labels 0 and 1

"Logistic" in this context refers to the sigmoid (logistic) function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, which approaches 0 as $z \to -\infty$ and 1 as $z \to +\infty$, and makes the transition from 0 to 1 in a narrow range around $z = 0$.

In [None]:
domain = range(-10, 10, 100)
plot(domain, sigmoid(domain))
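
For reference, here is a minimal sketch of the function being plotted. The `sigmoid` used in the cell above comes from `SingleNeuron.jl`, whose definition isn't shown here, so `sigmoid_sketch` is a hypothetical stand-in and not necessarily how that file implements it.

```julia
# Hypothetical stand-in for the `sigmoid` provided by SingleNeuron.jl;
# the real definition may differ.
sigmoid_sketch(z::Real) = 1 / (1 + exp(-z))

# Evaluate at a few preactivation values to see the saturation at both ends.
zs = [-10.0, -2.0, 0.0, 2.0, 10.0]
outputs = sigmoid_sketch.(zs)

for (z, s) in zip(zs, outputs)
    println("z = ", z, "  ->  sigmoid = ", round(s; digits=4))
end
```

The output is exactly 0.5 at zero and is already very close to 0 or 1 a few units away in either direction, which is the narrow transition described above.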

This allows for "probabilistic" classification: with the sigmoid as the activation function, most data points receive an output very close to 0 or 1, but points near the boundary between the groups receive an output somewhere in between. Because the preactivation is still a linear function of the features, the "border" where the output is exactly 0.5 (halfway between the two classes) is the set where the preactivation equals zero, which is a straight line in our two-feature plot.
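
To make the border concrete, here is a small sketch under stated assumptions. The weights `w_demo` and bias `b_demo` are made-up values chosen for illustration, not the ones learned by `train!`, and `sigmoid_sketch`, `neuron_output`, and `boundary_x2` are hypothetical helper names. With two features, the output is 0.5 exactly when $w_1 x_1 + w_2 x_2 + b = 0$, which we can solve for $x_2$.

```julia
# Hypothetical weights and bias, chosen only for illustration;
# they are NOT the values learned by `train!`.
w_demo = [2.0, -0.004]   # one weight each for (OD, Proline)
b_demo = -1.0

sigmoid_sketch(z) = 1 / (1 + exp(-z))

# Output of a single sigmoid neuron for a two-feature input x.
neuron_output(x) = sigmoid_sketch(w_demo[1] * x[1] + w_demo[2] * x[2] + b_demo)

# On the border the preactivation is zero, so
# x2 = -(w1 * x1 + b) / w2   (this assumes w2 != 0).
boundary_x2(x1) = -(w_demo[1] * x1 + b_demo) / w_demo[2]

x1 = 2.5
x2 = boundary_x2(x1)
println("point on the border: (", x1, ", ", x2, ")")
println("neuron output there: ", neuron_output([x1, x2]))
```

Any point produced by `boundary_x2` should give an output of 0.5 up to floating-point rounding, while points on either side of the line move toward 0 or 1.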

In [None]:
logregmodel = SingleNeuron(2, :logisticregression)
train!(logregmodel, logregfeatures, logreglabels)
plotneuron(logregmodel; leftbound=1, rightbound=4)
scatter!(wine1[:, :OD], wine1[:, :Proline], mc="red")
scatter!(wine2[:, :OD], wine2[:, :Proline], mc="blue")

And there we go! "Most" of each group is on one side of the border.
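
To put a number on "most", one option is to threshold the neuron's output at 0.5 and count how many points end up on the correct side. The sketch below uses made-up toy points and hypothetical weights (`w_toy`, `b_toy`) rather than the Wine data and the trained model, since the prediction interface of `SingleNeuron.jl` isn't shown here.

```julia
# Toy points (one per row) and labels, made up for illustration.
# The last point is an "outlier" of class 0 sitting on the class-1 side.
points = [1.0 2.0; 2.0 1.0; 3.0 4.0; 4.0 3.0; 2.5 2.6]
labels = [0, 0, 1, 1, 0]

# Hypothetical weights and bias standing in for a trained neuron.
w_toy = [1.0, 1.0]
b_toy = -5.0

sigmoid_sketch(z) = 1 / (1 + exp(-z))

# Sigmoid output per point, then a hard 0/1 decision at the 0.5 border.
probabilities = sigmoid_sketch.(points * w_toy .+ b_toy)
predictions = Int.(probabilities .>= 0.5)

# Fraction of points that fall on the correct side of the border.
accuracy = sum(predictions .== labels) / length(labels)
println("accuracy = ", accuracy)
```

An accuracy below 1 here isn't a failure of training: with linearly inseparable groups, no straight border can classify every point correctly.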