In [2]:
push!(LOAD_PATH, joinpath(Pkg.dir("SpectralClustering"), "docs"));

# Eigenvector Clustering

Once the eigenvectors are obtained, we have a continuous solution to a discrete problem. To obtain a cluster assignment for every pattern, the eigenvectors must be discretized.
Obtaining this discrete solution often requires solving another clustering problem, albeit in a lower-dimensional space. That is, the eigenvectors are treated as the geometric coordinates of a point set: each pattern becomes a point whose coordinates are its entries in the eigenvectors.

This library provides two methods to obtain the discrete solution:
- K-means, by means of [Clustering.jl](https://github.com/JuliaStats/Clustering.jl)
- The eigenvector rotation method proposed in [Multiclass spectral clustering](#stella2003multiclass)
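
The idea of treating eigenvector rows as points can be illustrated with a minimal, self-contained sketch. It does not use this library's API: the toy affinity matrix `W`, the helper `tiny_kmeans`, and the fixed choice of initial centers are all assumptions made for illustration, and only Julia's standard `LinearAlgebra` module (Julia 1.x) is used.

```julia
using LinearAlgebra

# Toy affinity matrix (hypothetical data): two groups of three patterns,
# strongly connected inside each group and weakly connected across groups.
W = fill(0.01, 6, 6)
W[1:3, 1:3] .= 1.0
W[4:6, 4:6] .= 1.0

# Symmetrically normalized affinity D^(-1/2) * W * D^(-1/2).
d = vec(sum(W, dims=2))
Dinv = Diagonal(1 ./ sqrt.(d))
A = Symmetric(Dinv * W * Dinv)

# The k eigenvectors with the largest eigenvalues form the continuous solution.
k = 2
F = eigen(A)                          # eigenvalues come in ascending order
V = F.vectors[:, (end - k + 1):end]   # n x k matrix, one row per pattern

# Each row is a point in R^k; scale every row to unit length.
X = V ./ sqrt.(sum(abs2, V, dims=2))

# Minimal Lloyd-style k-means on the rows of X (illustration only).
function tiny_kmeans(X::AbstractMatrix, init::Vector{Int}; iters::Int=10)
    centers = X[init, :]
    assignments = zeros(Int, size(X, 1))
    for _ in 1:iters
        # Assignment step: nearest center for every row.
        for i in 1:size(X, 1)
            dists = [sum(abs2, X[i, :] .- centers[c, :]) for c in 1:size(centers, 1)]
            assignments[i] = argmin(dists)
        end
        # Update step: each center becomes the mean of its members.
        for c in 1:size(centers, 1)
            members = findall(==(c), assignments)
            if !isempty(members)
                centers[c, :] = vec(sum(X[members, :], dims=1)) ./ length(members)
            end
        end
    end
    return assignments
end

labels = tiny_kmeans(X, [1, 6])   # seed the centers with patterns 1 and 6
println(labels)                   # prints [1, 1, 1, 2, 2, 2]
```

Seeding the centers by hand keeps the sketch deterministic; a real K-means implementation such as the one in Clustering.jl chooses its own initialization.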


## Text clustering
In this example, the [Health News in Twitter Data Set](https://archive.ics.uci.edu/ml/datasets/Health+News+in+Twitter) is clustered into 7 groups. Every tweet is represented as a Boolean term vector, and the similarity between two tweets is defined as:

$$
w(i,j) = \exp\left( -\dfrac{\operatorname{Hamming}(x_i, x_j)}{2\alpha^2} \right)
$$

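where $x_i$ and $x_j$ are two tweets and $\alpha$ is the mean Hamming distance between the first 1000 tweets. Note that the code cell below computes the distance with `Jaccard()` from Distances.jl and squares it before applying the exponential.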
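
As a small worked instance of the formula as written (Hamming distance, not squared), the sketch below evaluates $w(i,j)$ on a handful of made-up Boolean term vectors. The matrix `tweets` and the helpers `hamming` and `w` are hypothetical names introduced only for this illustration, and $\alpha$ is averaged over the full pairwise distance matrix of the toy data.

```julia
using Statistics

# Hypothetical Boolean term vectors: one column per tweet, one row per term.
tweets = Bool[1 1 0 0;
              1 1 0 1;
              0 0 1 1;
              0 1 1 0;
              1 0 0 1]

# Hamming distance: number of positions where two vectors differ.
hamming(a, b) = count(a .!= b)

# α is the mean Hamming distance over the full pairwise distance matrix.
n = size(tweets, 2)
α = mean([hamming(tweets[:, i], tweets[:, j]) for i in 1:n, j in 1:n])

# Similarity between tweets i and j, following the formula above.
w(i, j) = exp(-hamming(tweets[:, i], tweets[:, j]) / (2 * α^2))

println(w(1, 1))   # identical tweets have similarity 1.0
println(w(1, 2))   # tweets 1 and 2 differ in two terms
println(w(1, 3))   # tweets 1 and 3 differ in every term, so the weight is smaller
```

The weight decays from 1 towards 0 as two tweets share fewer terms, and $\alpha$ controls how quickly it decays.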

In [2]:
using Extras, Distances, SpectralClustering, TextAnalysis, Plots, Clustering

# Similarity between one tweet and its neighbors:
# a Gaussian kernel over the squared Jaccard distance.
function weight(i::Integer, j::Vector{<:Integer}, tweet_i, neighbors_data, scale)
    d = Distances.colwise(Jaccard(), tweet_i, neighbors_data)
    return exp.(-(d.^2 / (2 * scale^2)))
end

# Load the tweets and build the document-term matrix.
data        = Extras.health_tweets(number_of_tweets=10000);
docu_term_m = DocumentTermMatrix(data);
# Kernel scale: mean pairwise Jaccard distance among the first 1000 tweets.
scale       = mean(pairwise(Jaccard(), full(docu_term_m.dtm[1:1000,:]')))
# Build the similarity graph using a random neighborhood of size 20.
nconfig     = RandomNeighborhood(20);
G           = create(Float32, nconfig, (i, j, t_i, n_d) -> weight(i, j, t_i, n_d, scale), docu_term_m.dtm');
# Embed the graph with NgLaplacian(7), then discretize with both methods.
emb         = embedding(NgLaplacian(7), G);
c1          = SpectralClustering.clusterize(YuEigenvectorRotation(), emb);
c2          = SpectralClustering.clusterize(KMeansClusterizer(7), emb);
output      = tweets_output([c1, c2], ["Eigenvector Rotation", "KMeans"], docu_term_m);

In [26]:
using DocUtils, JSON
display("text/javascript",file("tweets.js", Dict{String,String}("data"=>JSON.json(output))))

In [24]:
push!(LOAD_PATH, joinpath(Pkg.dir("SpectralClustering"), "docs"));
using DocUtils
display("text/html",file("tweets_html.html"))

# References

In [4]:
using DocUtils
display("text/html",bibliography(["stella2003multiclass"]))